AITopics | Lakshminarayanan, Chandrashekar

Collaborating Authors

Lakshminarayanan, Chandrashekar

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transformers with Sparse Attention for Granger Causality

Mahesh, Riya, Vashisht, Rahul, Lakshminarayanan, Chandrashekar

arXiv.org Artificial IntelligenceNov-20-2024

Temporal causal analysis means understanding the underlying causes behind observed variables over time. Deep learning based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations. Recent works suggest self-attention weights of transformers as a useful indicator of causal links. We leverage this to propose a novel modification to the self-attention module to establish causal links between the variables of multivariate time-series data with varying lag dependencies. Our Sparse Attention Transformer captures causal relationships using a two-fold approach - performing temporal attention first followed by attention between the variables across the time steps masking them individually to compute Granger Causality indices. The key novelty in our approach is the ability of the model to assert importance and pick the most significant past time instances for its prediction task against manually feeding a fixed time lag value. We demonstrate the effectiveness of our approach via extensive experimentation on several synthetic benchmark datasets. Furthermore, we compare the performance of our model with the traditional Vector Autoregression based Granger Causality method that assumes fixed lag length.

artificial intelligence, granger causality, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.13264

Country: Asia > India > Tamil Nadu (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Half-Space Feature Learning in Neural Networks

Yadav, Mahesh Lorik, Ramaswamy, Harish Guruprasad, Lakshminarayanan, Chandrashekar

arXiv.org Artificial IntelligenceApr-5-2024

There currently exist two extreme viewpoints for neural network feature learning -- (i) Neural networks simply implement a kernel method (a la NTK) and hence no features are learned (ii) Neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. We argue in this paper neither interpretation is likely to be correct based on a novel viewpoint. Neural networks can be viewed as a mixture of experts, where each expert corresponds to a (number of layers length) path through a sequence of hidden units. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN), which sits midway between deep linear networks and ReLU networks. Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple -- each feature is effectively an indicator function for a region compactly described as an intersection of (number of layers) half-spaces in the input space. This viewpoint allows for a comprehensive global visualization of features, unlike the local visualizations for neurons based on saliency/activation/gradient maps. Feature learning in DLGNs is shown to happen and the mechanism with which this happens is through learning half-spaces in the input space that contain smooth regions of the target function. Due to the structure of DLGNs, the neurons in later layers are fundamentally the same as those in earlier layers -- they all represent a half-space -- however, the dynamics of gradient descent impart a distinct clustering to the later layer neurons. We hypothesize that ReLU networks also have similar feature learning behaviour.

artificial intelligence, dlgn, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2404.04312

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Approximate Linear Programming and Decentralized Policy Improvement in Cooperative Multi-agent Markov Decision Processes

Mandal, Lakshmi, Lakshminarayanan, Chandrashekar, Bhatnagar, Shalabh

arXiv.org Artificial IntelligenceNov-20-2023

In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m greater than 1 agents, where all agents are aware of the system model. At each decision epoch, all the m agents cooperatively select actions in order to maximize a common long-term objective. Since the number of actions grows exponentially in the number of agents, policy improvement is computationally expensive. Recent works have proposed using decentralized policy improvement in which each agent assumes that the decisions of the other agents are fixed and it improves its decisions unilaterally. Yet, in these works, exact values are computed. In our work, for cooperative multi-agent finite and infinite horizon discounted MDPs, we propose suitable approximate policy iteration algorithms, wherein we use approximate linear programming to compute the approximate value function and use decentralized policy improvement. Thus our algorithms can handle both large number of states as well as multiple agents. We provide theoretical guarantees for our algorithms and also demonstrate the performance of our algorithms on some numerical examples.

agent, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2311.11789

Country: North America > United States > Massachusetts > Middlesex County (0.14)

Genre: Research Report (0.40)

Industry:

Energy (0.68)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Neural Path Features and Neural Path Kernel : Understanding the role of gates in deep learning

Lakshminarayanan, Chandrashekar, Singh, Amit Vikram

arXiv.org Machine LearningJun-11-2020

Rectified linear unit (ReLU) activations can also be thought of as \emph{gates}, which, either pass or stop their pre-activation input when they are \emph{on} (when the pre-activation input is positive) or \emph{off} (when the pre-activation input is negative) respectively. A deep neural network (DNN) with ReLU activations has many gates, and the {on/off} status of each gate changes across input examples as well as network weights. For a given input example, only a subset of gates are \emph{active}, i.e., on, and the sub-network of weights connected to these active gates is responsible for producing the output. At randomised initialisation, the active sub-network corresponding to a given input example is random. During training, as the weights are learnt, the active sub-networks are also learnt, and potentially hold very valuable information. In this paper, we analytically characterise the role of active sub-networks in deep learning. To this end, we encode the {on/off} state of the gates of a given input in a novel \emph{neural path feature} (NPF), and the weights of the DNN are encoded in a novel \emph{neural path value} (NPV). Further, we show that the output of network is indeed the inner product of NPF and NPV. The main result of the paper shows that the \emph{neural path kernel} associated with the NPF is a fundamental quantity that characterises the information stored in the gates of a DNN. We show via experiments (on MNIST and CIFAR-10) that in standard DNNs with ReLU activations NPFs are learnt during training and such learning is key for generalisation. Furthermore, NPFs and NPVs can be learnt in two separate networks and such learning also generalises well in experiments.

deep learning, dnn, neural network, (19 more...)

arXiv.org Machine Learning

2006.10529

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

Lakshminarayanan, Chandrashekar, Szepesvári, Csaba

arXiv.org Machine LearningSep-12-2017

We consider $d$-dimensional linear stochastic approximation algorithms (LSAs) with a constant step-size and the so called Polyak-Ruppert (PR) averaging of iterates. LSAs are widely applied in machine learning and reinforcement learning (RL), where the aim is to compute an appropriate $\theta_{*} \in \mathbb{R}^d$ (that is an optimum or a fixed point) using noisy data and $O(d)$ updates per iteration. In this paper, we are motivated by the problem (in RL) of policy evaluation from experience replay using the \emph{temporal difference} (TD) class of learning algorithms that are also LSAs. For LSAs with a constant step-size, and PR averaging, we provide bounds for the mean squared error (MSE) after $t$ iterations. We assume that data is \iid with finite variance (underlying distribution being $P$) and that the expected dynamics is Hurwitz. For a given LSA with PR averaging, and data distribution $P$ satisfying the said assumptions, we show that there exists a range of constant step-sizes such that its MSE decays as $O(\frac{1}{t})$. We examine the conditions under which a constant step-size can be chosen uniformly for a class of data distributions $\mathcal{P}$, and show that not all data distributions `admit' such a uniform constant step-size. We also suggest a heuristic step-size tuning algorithm to choose a constant step-size of a given LSA for a given data distribution $P$. We compare our results with related work and also discuss the implication of our results in the context of TD algorithms that are LSAs.

algorithm, artificial intelligence, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1709.04073

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Generalized Reduced Linear Program for Markov Decision Processes

Lakshminarayanan, Chandrashekar (Indian Institute of Science) | Bhatnagar, Shalabh (Indian Institute of Science)

AAAI ConferencesMar-6-2015

Markov decision processes (MDPs) with large number of states are of high practical interest. However, conventional algorithms to solve MDP are computationally infeasible in this scenario. Approximate dynamic programming (ADP) methods tackle this issue by computing approximate solutions. A widely applied ADP method is approximate linear program (ALP) which makes use of linear function approximation and offers theoretical performance guarantees. Nevertheless, the ALP is difficult to solve due to the presence of a large number of constraints and in practice, a reduced linear program (RLP) is solved instead. The RLP has a tractable number of constraints sampled from the original constraints of the ALP. Though the RLP is known to perform well in experiments, theoretical guarantees are available only for a specific RLP obtained under idealized assumptions. In this paper, we generalize the RLP to define a generalized reduced linear program (GRLP) which has a tractable number of constraints that are obtained as positive linear combinations of the original constraints of the ALP. The main contribution of this paper is the novel theoretical framework developed to obtain error bounds for any given GRLP. Central to our framework are two max-norm contraction operators. Our result theoretically justifies linear approximation of constraints. We discuss the implication of our results in the contexts of ADP and reinforcement learning. We also demonstrate via an example in the domain of controlled queues that the experiments conform to the theory.

Add feedback

A Markov Decision Process Framework for Predictable Job Completion Times on Crowdsourcing Platforms

Lakshminarayanan, Chandrashekar (Indian Institute of Science) | Dubey, Ayush (Indian Institute of Science) | Bhatnagar, Shalabh (Indian Institute of Science) | Balamurugan, Chithralekha (Xerox Research Centre India)

AAAI ConferencesOct-31-2014

Task starvation leads to huge variation in the completion times of the tasks posted on to the crowd. The price offered to a given task together with the dynamics of the crowd at the time of posting affect its completion time. Large organizations/requesters who frequent the crowd at regular intervals in order to get their tasks done desire predictability in completion times of the tasks. Thus, such requesters have to take into account the crowd dynamics at the time of posting the tasks and price them accordingly. In this work, we study an instance of the pricing problem and propose a solution based on the framework of Markov Decision Processes (MDPs).

completion time, crowdsourcing, social media, (10 more...)

AAAI Conferences

Second AAAI Conference on Human Computation and Crowdsourcing

Country: Asia > India (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)
Information Technology > Communications > Social Media > Crowdsourcing (0.52)

Add feedback