Goto

Collaborating Authors

 Undirected Networks


Visual Rendering of Shapes on 2D Display Devices Guided by Hand Gestures

arXiv.org Machine Learning

Designing of touchless user interface is gaining popularity in various contexts. Using such interfaces, users can interact with electronic devices even when the hands are dirty or non-conductive. Also, user with partial physical disability can interact with electronic devices using such systems. Research in this direction has got major boost because of the emergence of low-cost sensors such as Leap Motion, Kinect or RealSense devices. In this paper, we propose a Leap Motion controller-based methodology to facilitate rendering of 2D and 3D shapes on display devices. The proposed method tracks finger movements while users perform natural gestures within the field of view of the sensor. In the next phase, trajectories are analyzed to extract extended Npen++ features in 3D. These features represent finger movements during the gestures and they are fed to unidirectional left-to-right Hidden Markov Model (HMM) for training. A one-to-one mapping between gestures and shapes is proposed. Finally, shapes corresponding to these gestures are rendered over the display using MuPad interface. We have created a dataset of 5400 samples recorded by 10 volunteers. Our dataset contains 18 geometric and 18 non-geometric shapes such as "circle", "rectangle", "flower", "cone", "sphere" etc. The proposed methodology achieves an accuracy of 92.87% when evaluated using 5-fold cross validation method. Our experiments revel that the extended 3D features perform better than existing 3D features in the context of shape representation and classification. The method can be used for developing useful HCI applications for smart display devices.


Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint

arXiv.org Machine Learning

The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, the percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.


Our Practice Of Using Machine Learning To Recognize Species By Voice

arXiv.org Machine Learning

As the technology is advancing, audio recognition in machine learning is improved as well. Research in audio recognition has traditionally focused on speech. Living creatures (especially the small ones) are part of the whole ecosystem, monitoring as well as maintaining them are important tasks. Species such as animals and birds are tending to change their activities as well as their habitats due to the adverse effects on the environment or due to other natural or man-made calamities. For those in far deserted areas, we will not have any idea about their existence until we can continuously monitor them. Continuous monitoring will take a lot of hard work and labor. If there is no continuous monitoring, then there might be instances where endangered species may encounter dangerous situations. The best way to monitor those species are through audio recognition. Classifying sound can be a difficult task even for humans. Powerful audio signals and their processing techniques make it possible to detect audio of various species. There might be many ways wherein audio recognition can be done. We can train machines either by pre-recorded audio files or by recording them live and detecting them. The audio of species can be detected by removing all the background noise and echoes. Smallest sound is considered as a syllable. Extracting various syllables is the process we are focusing on which is known as audio recognition in terms of Machine Learning (ML).


Implicit Maximum Likelihood Estimation

arXiv.org Machine Learning

Implicit probabilistic models are models defined naturally in terms of a sampling procedure and often induces a likelihood function that cannot be expressed explicitly. We develop a simple method for estimating parameters in implicit models that does not require knowledge of the form of the likelihood function or any derived quantities, but can be shown to be equivalent to maximizing likelihood under some conditions. Our result holds in the non-asymptotic parametric setting, where both the capacity of the model and the number of data examples are finite. We also demonstrate encouraging experimental results.


Multi-Agent Actor-Critic with Generative Cooperative Policy Network

arXiv.org Artificial Intelligence

We propose an efficient multi-agent reinforcement learning approach to derive equilibrium strategies for multi-agents who are participating in a Markov game. Mainly, we are focused on obtaining decentralized policies for agents to maximize the performance of a collaborative task by all the agents, which is similar to solving a decentralized Markov decision process. We propose to use two different policy networks: (1) decentralized greedy policy network used to generate greedy action during training and execution period and (2) generative cooperative policy network (GCPN) used to generate action samples to make other agents improve their objectives during training period. We show that the samples generated by GCPN enable other agents to explore the policy space more effectively and favorably to reach a better policy in terms of achieving the collaborative tasks.


Posterior Sampling for Large Scale Reinforcement Learning

arXiv.org Artificial Intelligence

We propose a practical non-episodic PSRL algorithm that unlike recent state-of-the-art PSRL algorithms uses a deterministic, model-independent episode switching schedule. Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parametrization for a large class of problems in sequential recommendations.


Restricted Boltzmann Machines -- Simplified โ€“ Towards Data Science

#artificialintelligence

In this post, I will try to shed some light on the intuition about Restricted Boltzmann Machines and the way they work. This is supposed to be a simple explanation with a little bit of mathematics without going too deep into each concept or equation. So let's start with the origin of RBMs and delve deeper as we move forward. Boltzmann machines are stochastic and generative neural networks capable of learning internal representations and are able to represent and (given sufficient time) solve difficult combinatoric problems. They are named after the Boltzmann distribution (also known as Gibbs Distribution) which is an integral part of Statistical Mechanics and helps us to understand the impact of parameters like Entropy and Temperature on the Quantum States in Thermodynamics.


Infinite Factorial Finite State Machine for Blind Multiuser Channel Estimation

arXiv.org Machine Learning

New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint channel parameter and data estimation in a multiuser communication channel in which the number of transmitters is not known. For that purpose, we develop the infinite factorial finite state machine model, a Bayesian nonparametric model based on the Markov Indian buffet that allows for an unbounded number of transmitters with arbitrary channel length. We propose an inference algorithm that makes use of slice sampling and particle Gibbs with ancestor sampling. Our approach is fully blind as it does not require a prior channel estimation step, prior knowledge of the number of transmitters, or any signaling information. Our experimental results, loosely based on the LTE random access channel, show that the proposed approach can effectively recover the data-generating process for a wide range of scenarios, with varying number of transmitters, number of receivers, constellation order, channel length, and signal-to-noise ratio.


Trust Region Policy Optimization of POMDPs

arXiv.org Machine Learning

We propose Generalized Trust Region Policy Optimization (GTRPO), a Reinforcement Learning algorithm for TRPO of Partially Observable Markov Decision Processes (POMDP). While the principle of policy gradient methods does not require any model assumption, previous studies of more sophisticated policy gradient methods are mainly limited to MDPs. Many real-world decision-making tasks, however, are inherently non-Markovian, i.e., only an incomplete representation of the environment is observable. Moreover, most of the advanced policy gradient methods are designed for infinite horizon MDPs. Our proposed algorithm, GTRPO, is a policy gradient method for continuous episodic POMDPs. We prove that its policy updates monotonically improve the expected cumulative return. We empirically study GTRPO on many RoboSchool environments, an extension to the MuJoCo environments, and provide insights into its empirical behavior.


Adaptive Clinical Trials: Exploiting Sequential Patient Recruitment and Allocation

arXiv.org Machine Learning

Randomized Controlled Trials (RCTs) are the gold standard for comparing the effectiveness of a new treatment to the current one (the control). Most RCTs allocate the patients to the treatment group and the control group by uniform randomization. We show that this procedure can be highly sub-optimal (in terms of learning) if -- as is often the case -- patients can be recruited in cohorts (rather than all at once), the effects on each cohort can be observed before recruiting the next cohort, and the effects are heterogeneous across identifiable subgroups of patients. We formulate the patient allocation problem as a finite stage Markov Decision Process in which the objective is to minimize a given weighted combination of type-I and type-II errors. Because finding the exact solution to this Markov Decision Process is computationally intractable, we propose an algorithm -- \textit{Knowledge Gradient for Randomized Controlled Trials} (RCT-KG) -- that yields an approximate solution. We illustrate our algorithm on a synthetic dataset with Bernoulli outcomes and compare it with uniform randomization. For a given size of trial our method achieves significant reduction in error, and to achieve a prescribed level of confidence (in identifying whether the treatment is superior to the control), our method requires many fewer patients. Our approach uses what has been learned from the effects on previous cohorts to recruit patients to subgroups and allocate patients (to treatment/control) within subgroups in a way that promotes more efficient learning.