
Collaborating Authors

 Indian Institute of Science


Top-Down Feedback for Crowd Counting Convolutional Neural Network

AAAI Conferences

Counting people in dense crowds is a demanding task even for humans. This is primarily due to the large variability in appearance of people. Often people are only seen as a bunch of blobs. Occlusions, pose variations and background clutter further compound the difficulty. In this scenario, identifying a person requires larger spatial context and semantics of the scene. But the current state-of-the-art CNN regressors for crowd counting are feedforward and use only limited spatial context to detect people. They look for local crowd patterns to regress the crowd density map, resulting in false predictions. Hence, we propose top-down feedback to correct the initial prediction of the CNN. Our architecture consists of a bottom-up CNN along with a separate top-down CNN to generate feedback. The bottom-up network, which regresses the crowd density map, has two columns of CNN with different receptive fields. Features from various layers of the bottom-up CNN are fed to the top-down network. The feedback, thus generated, is applied on the lower layers of the bottom-up network in the form of multiplicative gating. This masking weighs activations of the bottom-up network at spatial as well as feature levels to correct the density prediction. We evaluate the performance of our model on all major crowd datasets and show the effectiveness of top-down feedback.
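As a rough illustration of the multiplicative-gating idea described above, here is a minimal PyTorch-style sketch: a two-column bottom-up regressor, a top-down network that turns high-level features into gates, and a second gated bottom-up pass that produces the corrected density map. The module names, channel counts, and kernel sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed shapes, not the authors' code): top-down multiplicative
# gating for a two-column crowd-density regressor.
import torch
import torch.nn as nn

class BottomUpColumn(nn.Module):
    def __init__(self, kernel_size):
        super().__init__()
        pad = kernel_size // 2
        self.lower = nn.Sequential(                       # low-level features (to be gated)
            nn.Conv2d(1, 16, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size, padding=pad), nn.ReLU())
        self.upper = nn.Sequential(                       # higher-level features
            nn.Conv2d(32, 32, kernel_size, padding=pad), nn.ReLU())

    def forward(self, x, gate=None):
        f = self.lower(x)
        if gate is not None:                              # apply top-down feedback
            f = f * gate                                  # multiplicative gating
        return self.upper(f)

class FeedbackCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.col_a = BottomUpColumn(kernel_size=9)        # larger receptive field
        self.col_b = BottomUpColumn(kernel_size=5)        # smaller receptive field
        self.density_head = nn.Conv2d(64, 1, 1)           # regress the density map
        self.topdown = nn.Sequential(                     # produces gates in [0, 1]
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.Sigmoid())

    def forward(self, img):
        h = torch.cat([self.col_a(img), self.col_b(img)], dim=1)
        initial_density = self.density_head(h)
        gates = self.topdown(h)                           # feedback from higher layers
        ga, gb = gates[:, :32], gates[:, 32:]
        h2 = torch.cat([self.col_a(img, gate=ga), self.col_b(img, gate=gb)], dim=1)
        corrected_density = self.density_head(h2)         # second, gated bottom-up pass
        return initial_density, corrected_density

# usage: d0, d1 = FeedbackCounter()(torch.randn(1, 1, 224, 224))
```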


Game of Sketches: Deep Recurrent Models of Pictionary-Style Word Guessing

AAAI Conferences

The ability of machine-based agents to play games in human-like fashion is considered a benchmark of progress in AI. In this paper, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Notably, Sketch-QA involves asking a fixed question ("What object is being drawn?") and gathering open-ended guess-words from human guessers. To mimic Pictionary-style guessing, we propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via the Sketch-QA task and compare it with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.
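A minimal sketch of the kind of recurrent guesser this setup calls for: an LSTM reads the incrementally accumulated stroke sequence and emits a guess-word distribution at every time step. The stroke encoding, hidden size, and vocabulary size below are assumptions for illustration, not the authors' architecture.

```python
# Sketch (assumed architecture): per-time-step guess-word prediction from strokes.
import torch
import torch.nn as nn

class SketchGuesser(nn.Module):
    def __init__(self, stroke_dim=5, hidden=256, vocab_size=10000):
        super().__init__()
        self.encoder = nn.LSTM(stroke_dim, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab_size)     # guess-word logits

    def forward(self, strokes):
        # strokes: (batch, time, stroke_dim), e.g. (dx, dy, pen-state) features
        h, _ = self.encoder(strokes)
        return self.word_head(h)                           # one guess per time step

# usage: guesses = SketchGuesser()(torch.randn(2, 50, 5)).argmax(-1)
```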


Groupwise Maximin Fair Allocation of Indivisible Goods

AAAI Conferences

We study the problem of allocating indivisible goods among n agents in a fair manner. For this problem, maximin share (MMS) is a well-studied solution concept which provides a fairness threshold. Specifically, maximin share is defined as the minimum utility that an agent can guarantee for herself when asked to partition the set of goods into n bundles such that the remaining (n-1) agents pick their bundles adversarially. An allocation is deemed to be fair if every agent gets a bundle whose valuation is at least her maximin share. Even though maximin shares provide a natural benchmark for fairness, they have their own drawbacks and, in particular, are not sufficient to rule out unsatisfactory allocations. Motivated by these considerations, in this work we define a stronger notion of fairness, called groupwise maximin share guarantee (GMMS). In GMMS, we require that the maximin share guarantee is achieved not just with respect to the grand bundle, but also among all the subgroups of agents. Hence, this solution concept strengthens MMS and provides an ex-post fairness guarantee. We show that in specific settings, GMMS allocations always exist. We also establish the existence of approximate GMMS allocations under additive valuations, and develop a polynomial-time algorithm to find such allocations. Moreover, we establish a scale of fairness wherein we show that GMMS implies approximate envy freeness. Finally, we empirically demonstrate the existence of GMMS allocations in a large set of randomly generated instances. For the same set of instances, we additionally show that our algorithm achieves an approximation factor better than the established worst-case bound.
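To make the maximin-share definition concrete, the following brute-force Python sketch computes one agent's MMS on a tiny additive instance by enumerating all partitions of the goods into n bundles. This exhaustive enumeration is only an illustration; it is exponential in the number of goods and is not the paper's algorithm. A GMMS allocation would additionally have to satisfy the analogous guarantee within every subgroup of agents, not only the full set.

```python
# Brute-force illustration of the maximin share (MMS) under additive valuations.
# Exponential enumeration: only for tiny instances, not the paper's algorithm.
from itertools import product

def mms(values, n_agents):
    """values[g] = the agent's value for good g. Partition goods into n bundles;
    a partition is scored by its least-valued bundle; MMS is the best such score."""
    best = 0
    for assignment in product(range(n_agents), repeat=len(values)):
        bundle_vals = [0] * n_agents
        for g, b in enumerate(assignment):
            bundle_vals[b] += values[g]
        best = max(best, min(bundle_vals))
    return best

# Example: one agent values 5 goods as below; with 3 agents her MMS is 6,
# achieved e.g. by the bundles {6}, {5, 2}, {4, 3}.
print(mms([6, 5, 4, 3, 2], 3))   # -> 6
```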


Weighted Bandits or: How Bandits Learn Distorted Values That Are Not Expected

AAAI Conferences

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions: the classic K-armed bandit and the linearly parameterized bandit. In both settings, we propose algorithms that are inspired by Upper Confidence Bound (UCB) algorithms, incorporate cost distortions, and exhibit sublinear regret assuming Hölder continuous weight distortion functions. For the K-armed setting, we show that the algorithm, called W-UCB, achieves problem-dependent regret O(L^2 M^2 log n / Δ^(2/α - 1)), where n is the number of plays, Δ is the gap in distorted expected value between the best and next best arm, L and α are the Hölder constants for the distortion function, and M is an upper bound on costs, and a problem-independent regret bound of O((K L^2 M^2)^(α/2) n^((2 - α)/2)). We also present a matching lower bound on the regret, showing that the regret of W-UCB is essentially unimprovable over the class of Hölder-continuous weight distortions. For the linearly parameterized setting, we develop a new algorithm, a variant of the Optimism in the Face of Uncertainty Linear bandit (OFUL) algorithm called WOFUL (Weight-distorted OFUL), and show that it has regret O(d √n polylog(n)) with high probability, for sub-Gaussian cost distributions.
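The following Python sketch shows one plausible way to build a UCB-style index under a probability-weighting function: the distorted cost of an arm is estimated by a Choquet-style integral of the empirical distribution, and an exploration bonus is subtracted because lower distorted costs are preferred. The bonus form, the constant c, and the example weight function are illustrative assumptions, not the paper's exact W-UCB index.

```python
# Sketch of a UCB-style index with a distortion (weight) function on costs in [0, M].
# The weight function w and the exploration bonus are illustrative assumptions.
import math
import numpy as np

def distorted_value(samples, w, M):
    """Empirical Choquet-style estimate of the distorted expected cost
    E_w[X] = integral_0^M w(P(X > t)) dt."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    prev, total = 0.0, 0.0
    for i, xi in enumerate(x):
        total += (xi - prev) * w((n - i) / n)   # empirical P(X > t) on (prev, xi)
        prev = xi
    return total

def w_ucb_index(samples, t, w, M, c=1.0):
    """Lower index for a cost-minimising bandit: distorted estimate minus a bonus."""
    bonus = c * M * math.sqrt(2 * math.log(t) / len(samples))
    return distorted_value(samples, w, M) - bonus

# Example weight function: Hölder continuous, over-weights small tail probabilities.
w = lambda p: p ** 0.7
print(w_ucb_index([0.2, 0.5, 0.9], t=10, w=w, M=1.0))
```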


Misspecified Linear Bandits

AAAI Conferences

We consider the problem of online learning in misspecified linear stochastic multi-armed bandit problems. Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features. It is, however, of interest to investigate the impact of potential misspecification in linear bandit models, where the expected rewards are perturbed away from the linear subspace determined by the arms' features. Although OFUL has recently been shown to be robust to relatively small deviations from linearity, we show that any linear bandit algorithm that enjoys optimal regret performance in the perfectly linear setting (e.g., OFUL) must suffer linear regret under a sparse additive perturbation of the linear model. In an attempt to overcome this negative result, we define a natural class of bandit models characterized by a non-sparse deviation from linearity. We argue that the OFUL algorithm can fail to achieve sublinear regret even under models that have non-sparse deviation. We finally develop a novel bandit algorithm, comprising a hypothesis test for linearity followed by a decision to use either the OFUL or Upper Confidence Bound (UCB) algorithm. For perfectly linear bandit models, the algorithm provably exhibits OFUL's favorable regret performance, while for misspecified models satisfying the non-sparse deviation property, the algorithm avoids the linear regret phenomenon and falls back on UCB's sublinear regret scaling. Numerical experiments on synthetic data, and on recommendation data from the public Yahoo! Learning to Rank Challenge dataset, empirically support our findings.
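A compressed sketch of the test-then-select idea: fit a least-squares model to the observed arm features and rewards, and fall back from a linear-bandit routine to plain per-arm UCB when the residuals suggest misspecification. The residual statistic and threshold below are assumptions for illustration; the paper's actual hypothesis test and regret analysis are more involved.

```python
# Illustrative sketch of "test linearity, then choose OFUL-style vs. plain UCB".
# The residual-based test and its threshold are assumptions, not the paper's test.
import numpy as np

def looks_linear(features, rewards, threshold=0.05):
    """Crude misspecification check: least-squares fit, then inspect residuals."""
    theta, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    residual = rewards - features @ theta
    return np.mean(residual ** 2) <= threshold

def choose_algorithm(features, rewards):
    return "OFUL (linear bandit)" if looks_linear(features, rewards) \
           else "UCB (per-arm, model-free)"

# Example: rewards exactly linear in 2-d features, so the linear routine is kept.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
print(choose_algorithm(X, X @ np.array([0.3, 0.7])))
```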


Frugal Bribery in Voting

AAAI Conferences

Bribery in elections is an important problem in computational social choice theory. We introduce and study two important special cases of the bribery problem, namely, FRUGAL-BRIBERY and FRUGAL-$BRIBERY where the briber is frugal in nature. By this, we mean that the briber is only able to influence voters who benefit from the suggestion of the briber. More formally, a voter is vulnerable if the outcome of the election improves according to her own preference when she accepts the suggestion of the briber. In the FRUGAL-BRIBERY problem, the goal is to make a certain candidate win the election by changing only the vulnerable votes. In the FRUGAL-$BRIBERY problem, the vulnerable votes have prices and the goal is to make a certain candidate win the election by changing only the vulnerable votes, subject to a budget constraint. We show that both the FRUGAL-BRIBERY and the FRUGAL-$BRIBERY problems are intractable for many commonly used voting rules for weighted as well as unweighted elections. These intractability results demonstrate that bribery is a hard computational problem, in the sense that several special cases of this problem continue to be computationally intractable. This strengthens the view that bribery, although a possible attack on an election in principle, may be infeasible in practice.
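As a toy illustration of vulnerability under the plurality rule: a voter who ranks the briber's candidate p above the current winner stands to gain if the bribery succeeds, so only such voters can usefully be asked to change their votes. The Python sketch below uses this deliberately simplified reading of the definition and glosses over tie-breaking and the paper's exact formalisation.

```python
# Simplified illustration for the plurality rule: flag voters who prefer the
# briber's candidate p to the current winner (the outcome improves for them if
# the bribery succeeds). Tie-breaking and the paper's exact definition are ignored.
from collections import Counter

def plurality_winner(profile):
    """profile: list of preference orders, most preferred candidate first."""
    return Counter(vote[0] for vote in profile).most_common(1)[0][0]

def potentially_vulnerable(profile, p):
    winner = plurality_winner(profile)
    return [i for i, vote in enumerate(profile)
            if vote.index(p) < vote.index(winner)]   # voter i ranks p above winner

# Example: candidates a, b, p; the current winner is a; voters 1 and 2 prefer p to a.
profile = [["a", "b", "p"], ["b", "p", "a"], ["p", "a", "b"], ["a", "p", "b"]]
print(plurality_winner(profile), potentially_vulnerable(profile, "p"))   # a [1, 2]
```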


Randomised Procedures for Initialising and Switching Actions in Policy Iteration

AAAI Conferences

Policy Iteration (PI) (Howard 1960) is a classical method for computing an optimal policy for a finite Markov Decision Problem (MDP). The method is conceptually simple: starting from some initial policy, "policy improvement" is repeatedly performed to obtain progressively dominating policies, until eventually, an optimal policy is reached. Being remarkably efficient in practice, PI is often favoured over alternative approaches such as Value Iteration and Linear Programming. Unfortunately, even after several decades of study, theoretical bounds on the complexity of PI remain unsatisfactory. For an MDP with n states and k actions, Mansour and Singh (1999) bound the number of iterations taken by Howard's PI, the canonical variant of the method, by O(k^n / n). This bound merely improves upon the trivial bound of k^n by a linear factor. However, a randomised variant of PI introduced by Mansour and Singh (1999) does yield an exponential improvement, with its expected number of iterations bounded by O(((1 + 2/log_2(k)) k/2)^n). With the objective of furnishing improved upper bounds for PI, we introduce two randomised procedures in this paper. Our first contribution is a routine to find a good initial policy for PI. After evaluating a number of randomly generated policies, this procedure applies a novel criterion to pick one to initialise PI. When PI is subsequently applied, we show that the expected number of policy evaluations, including both the initialisation and the improvement stages, remains bounded in expectation by O(k^(n/2)). The key construction employed in this routine is a total order on the set of policies. Our second contribution is a randomised action-switching rule for PI, which admits a bound of O((2 + ln(k - 1))^n) on the expected number of iterations. To the best of our knowledge, this is the tightest complexity bound known for PI when k >= 3.
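For reference, a compact NumPy implementation of Howard's PI, the canonical variant discussed above, with exact policy evaluation and greedy switching at every improvable state. The randomised initialisation and action-switching procedures proposed in the paper modify which policies are evaluated and which actions are switched, and are not reproduced here; the toy MDP is an assumption for illustration.

```python
# Howard's policy iteration for a finite MDP (n states, k actions), NumPy sketch.
# P[a] is the n x n transition matrix under action a, R[a] the expected rewards.
import numpy as np

def howard_pi(P, R, gamma=0.95):
    k, n = len(P), P[0].shape[0]
    policy = np.zeros(n, dtype=int)                       # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        R_pi = np.array([R[policy[s]][s] for s in range(n)])
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Policy improvement: switch every state to a greedy action (Howard's rule).
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(k)])   # k x n
        greedy = Q.argmax(axis=0)
        if np.all(Q[greedy, np.arange(n)] <= Q[policy, np.arange(n)] + 1e-12):
            return policy, V                              # no improving switch remains
        policy = greedy

# Example: 2 states, 2 actions.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.1, 0.9], [0.7, 0.3]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
print(howard_pi(P, R))
```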


Combining Vector Space Embeddings with Symbolic Logical Inference over Open-Domain Text

AAAI Conferences

We have recently shown how to combine random walk inference over knowledge bases with vector space representations of surface forms, improving performance on knowledge base inference. In this paper, we formalize the connection of our prior work to logical inference rules, giving some general observations about methods for incorporating vector space representations into symbolic logic systems. Additionally, we present some promising preliminary work that extends these techniques to learning open-domain relations for the purpose of answering multiple choice questions, achieving 67% accuracy on a small test set.


Never-Ending Learning

AAAI Conferences

Whereas people learn many different types of knowledge from diverse experiences over many years, most current machine learning systems acquire just a single function or data model from just a single data set. We propose a never-ending learning paradigm for machine learning, to better reflect the more ambitious and encompassing type of learning performed by humans. As a case study, we describe the Never-Ending Language Learner (NELL), which achieves some of the desired properties of a never-ending learner, and we discuss lessons learned. NELL has been learning to read the web 24 hours/day since January 2010, and so far has acquired a knowledge base with over 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits) ). NELL has also learned millions of features and parameters that enable it to read these beliefs from the web. Additionally, it has learned to reason over these beliefs to infer new beliefs, and is able to extend its ontology by synthesizing new relational predicates. NELL can be tracked online at http://rtw.ml.cmu.edu, and followed on Twitter at @CMUNELL.


A Generalized Reduced Linear Program for Markov Decision Processes

AAAI Conferences

Markov decision processes (MDPs) with a large number of states are of high practical interest. However, conventional algorithms to solve MDPs are computationally infeasible in this scenario. Approximate dynamic programming (ADP) methods tackle this issue by computing approximate solutions. A widely applied ADP method is the approximate linear program (ALP), which makes use of linear function approximation and offers theoretical performance guarantees. Nevertheless, the ALP is difficult to solve due to the presence of a large number of constraints, and in practice a reduced linear program (RLP) is solved instead. The RLP has a tractable number of constraints sampled from the original constraints of the ALP. Though the RLP is known to perform well in experiments, theoretical guarantees are available only for a specific RLP obtained under idealized assumptions. In this paper, we generalize the RLP to define a generalized reduced linear program (GRLP) which has a tractable number of constraints that are obtained as positive linear combinations of the original constraints of the ALP. The main contribution of this paper is the novel theoretical framework developed to obtain error bounds for any given GRLP. Central to our framework are two max-norm contraction operators. Our result theoretically justifies linear approximation of constraints. We discuss the implication of our results in the contexts of ADP and reinforcement learning. We also demonstrate via an example in the domain of controlled queues that the experiments conform to the theory.
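To ground the discussion, a small SciPy sketch of the ALP and its reduced version: with linear value approximation V ≈ Φr, the exact ALP has one constraint per state-action pair, and an RLP keeps only a sampled subset (the GRLP of the paper would instead take positive linear combinations of all ALP constraints). The feature matrix, the sampling scheme, and the toy MDP below are illustrative assumptions.

```python
# Sketch of the ALP / reduced LP (RLP) for a small discounted MDP with linear
# value-function approximation V ~ Phi r. The GRLP of the paper would replace
# the sampled constraints with positive linear combinations of ALP constraints.
import numpy as np
from scipy.optimize import linprog

def solve_rlp(P, R, Phi, gamma, c, constraint_idx):
    """min c^T Phi r  s.t.  (Phi r)(s) >= R(s,a) + gamma * sum_s' P(s'|s,a)(Phi r)(s')
    for every sampled pair (s, a) in constraint_idx."""
    A_ub, b_ub = [], []
    for (s, a) in constraint_idx:
        # R(s,a) + gamma * P_a[s] Phi r - Phi[s] r <= 0
        A_ub.append(gamma * P[a][s] @ Phi - Phi[s])
        b_ub.append(-R[s, a])
    res = linprog(c @ Phi, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * Phi.shape[1])
    return res.x

# Toy 3-state, 2-action MDP with 2 features per state.
P = [np.array([[0.8, 0.2, 0.0], [0.0, 0.9, 0.1], [0.3, 0.0, 0.7]]),
     np.array([[0.1, 0.0, 0.9], [0.5, 0.5, 0.0], [0.0, 0.4, 0.6]])]
R = np.array([[1.0, 0.0], [0.0, 0.5], [2.0, 1.0]])          # R[s, a]
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])        # state features
all_pairs = [(s, a) for s in range(3) for a in range(2)]
r_alp = solve_rlp(P, R, Phi, 0.9, c=np.ones(3) / 3, constraint_idx=all_pairs)       # full ALP
r_rlp = solve_rlp(P, R, Phi, 0.9, c=np.ones(3) / 3, constraint_idx=all_pairs[::2])  # sampled RLP
print(r_alp, r_rlp)
```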