Collaborating Authors

 Sharma, Abhishek


Scale Normalized Image Pyramids with AutoFocus for Object Detection

arXiv.org Artificial Intelligence

We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters and therefore results in better accuracy. However, the use of an image pyramid increases the computational cost. Hence, we propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects (as object locations are known during training). The resulting approach, referred to as Scale Normalized Image Pyramid with Efficient Resampling or SNIPER, yields up to a 3 times speed-up during training. Unfortunately, as object locations are unknown during inference, the entire image pyramid still needs processing. To this end, we adopt a coarse-to-fine approach and predict the locations and extent of object-like regions, which will be processed in successive scales of the image pyramid. Intuitively, this is akin to active human vision, which first skims over the field of view to spot interesting regions for further processing and only recognizes objects at the right resolution. The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.


Geometric Matrix Completion: A Functional View

arXiv.org Machine Learning

We propose a totally functional view of the geometric matrix completion problem. Unlike existing work, we propose a novel regularization inspired by the functional map literature that is more interpretable and theoretically sound. On synthetic tasks with strong underlying geometric structure, our framework outperforms the state of the art by a huge margin (two orders of magnitude), demonstrating the potential of our approach. On real datasets, we achieve state-of-the-art results at a fraction of the computational effort of previous methods.


Price Optimization in Fashion E-commerce

arXiv.org Machine Learning

With the rapid growth in the fashion e-commerce industry, it is becoming extremely challenging for e-tailers to set an optimal price point for all the products on the platform. By establishing an optimal price point, they can maximize overall revenue and profit for the platform. In this paper, we propose a novel machine learning and optimization technique to find the optimal price point at an individual product level. It comprises three major components. First, we use a demand prediction model to predict the next-day demand for each product at a certain discount percentage. Next, we use the concept of price elasticity of demand to obtain multiple demand values by varying the discount percentage. Thus we obtain multiple price-demand pairs for each product, and we have to choose one of them for the live platform. Typically, fashion e-commerce has millions of products, so there can be many permutations. Each permutation assigns a unique price point to every product, and these sum up to a unique revenue number. To choose the permutation that gives maximum revenue, a linear programming optimization technique is used. We have deployed the above methods in the live production environment and conducted several A/B tests. According to the A/B test results, our model improves revenue by 1 percent and gross margin by 0.81 percent.
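The price-demand pair step in the abstract above can be illustrated with a minimal sketch. It assumes a linear point-elasticity approximation and picks the revenue-maximizing pair for a single product independently; the abstract's linear programming step over all products is not reproduced here. All function names, parameters, and numbers are hypothetical and do not come from the paper.

```python
def demand_at_discount(base_demand, base_discount, new_discount, elasticity):
    """Estimate demand at a new discount from a predicted base demand.

    Assumption: linear point-elasticity approximation,
    %change in demand = elasticity * %change in price.
    """
    base_factor = 1.0 - base_discount
    new_factor = 1.0 - new_discount
    pct_price_change = (new_factor - base_factor) / base_factor
    return max(0.0, base_demand * (1.0 + elasticity * pct_price_change))


def candidate_pairs(list_price, base_demand, base_discount, elasticity, discounts):
    """Build (discount, price, demand) candidates for one product."""
    pairs = []
    for d in discounts:
        price = list_price * (1.0 - d)
        demand = demand_at_discount(base_demand, base_discount, d, elasticity)
        pairs.append((d, price, demand))
    return pairs


def best_pair(pairs):
    """Pick the candidate with the highest revenue (price * demand)."""
    return max(pairs, key=lambda p: p[1] * p[2])


# Hypothetical product: list price 100, predicted demand of 10 units
# at a 20% discount, elasticity of -2.
pairs = candidate_pairs(100.0, 10.0, 0.2, -2.0, [0.1, 0.2, 0.3])
chosen = best_pair(pairs)
print("chosen discount:", chosen[0], "revenue:", chosen[1] * chosen[2])
```

Choosing per product in isolation matches a joint optimization only when no constraint couples the products; with such constraints, a joint formulation like the linear program mentioned in the abstract would be needed.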


Reinforcement learning with spiking coagents

arXiv.org Machine Learning

Neuroscientific theory suggests that dopaminergic neurons broadcast global reward prediction errors to large areas of the brain, influencing the synaptic plasticity of the neurons in those regions. We build on this theory to propose a multi-agent learning framework, with spiking neurons in the generalized linear model (GLM) formulation as agents, to solve reinforcement learning (RL) tasks. We show that a network of GLM spiking agents connected in a hierarchical fashion, where each spiking agent modulates its firing policy based on local information and a global prediction error, can learn complex action representations to solve RL tasks. We further show how leveraging principles of modularity and population coding inspired by the brain can help reduce variance in the learning updates, making it a viable optimization technique.


Neural Conversational QA: Learning to Reason v.s. Exploiting Patterns

arXiv.org Artificial Intelligence

In this paper we work on the recently introduced ShARC task - a challenging form of conversational QA that requires reasoning over rules expressed in natural language. Attuned to the risk of superficial patterns in data being exploited by neural models to do well on benchmark tasks (Niven and Kao 2019), we conduct a series of probing experiments and demonstrate how current state-of-the-art models rely heavily on such patterns. To prevent models from learning based on the superficial clues, we modify the dataset by automatically generating new instances reducing the occurrences of those patterns. We also present a simple yet effective model that learns embedding representations to incorporate dialog history along with the previous answers to follow-up questions. We find that our model outperforms existing methods on all metrics, and the results show that the proposed model is more robust in dealing with spurious patterns and learns to reason meaningfully.


Exploration of Self-Propelling Droplets Using a Curiosity Driven Robotic Assistant

arXiv.org Artificial Intelligence

We describe a chemical robotic assistant equipped with a curiosity algorithm (CA) that can efficiently explore the states a complex chemical system can exhibit. The CA-robot is designed to explore formulations in an open-ended way with no explicit optimization target. By applying the CA-robot to the study of self-propelling multicomponent oil-in-water droplets, we are able to observe an order of magnitude more variety of droplet behaviours than is possible with a random parameter search given the same budget. We demonstrate that the CA-robot enabled the discovery of a sudden and highly specific response of droplets to slight temperature changes. Six modes of self-propelled droplet motion were identified and classified using a time-temperature phase diagram and probed using a variety of techniques including NMR. This work illustrates how target-free search can significantly increase the rate of unpredictable observations, leading to new discoveries with potential applications in formulation chemistry.


Foreground Clustering for Joint Segmentation and Localization in Videos and Images

Neural Information Processing Systems

This paper presents a novel framework in which video/image segmentation and localization are cast into a single optimization problem that integrates information from low-level appearance cues with that of high-level localization cues in a very weakly supervised manner. The proposed framework leverages two representations at different levels, exploits the spatial relationship between bounding boxes and superpixels as linear constraints, and simultaneously discriminates between foreground and background at the bounding box and superpixel levels. Unlike previous approaches that mainly rely on discriminative clustering, we incorporate a foreground model that minimizes the histogram difference of an object across all image frames. Exploiting the geometric relation between the superpixels and bounding boxes enables the transfer of segmentation cues to improve localization output and vice versa. Inclusion of the foreground model generalizes our discriminative framework to video data, where the background tends to be similar and thus not discriminative. We demonstrate the effectiveness of our unified framework on the YouTube Object video dataset, the Internet Object Discovery dataset, and Pascal VOC 2007.


Identifying Useful Inference Paths in Large Commonsense Knowledge Bases by Retrograde Analysis

AAAI Conferences

Commonsense reasoning at scale is a critical problem for modern cognitive systems. Large theories have millions of axioms, but only a handful are relevant for answering a given goal query. Irrelevant axioms increase the search space, overwhelming unoptimized inference engines in large theories. Therefore, methods that help identify useful inference paths are an essential part of large cognitive systems. In this paper, we use retrograde analysis to build a database of proof paths that lead to at least one successful proof. This database helps the inference engine identify more productive parts of the search space. A heuristic based on this approach is used to order nodes during a search. We study the efficacy of this approach on hundreds of queries from the Cyc KB. Empirical results show that this approach leads to a significant reduction in inference time.
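The idea of working backward from successful proofs to mark productive parts of the search space can be sketched as follows. This is a minimal illustration, assuming the search space is given as a list of parent-child edges and that successful proof nodes are already known; the graph encoding and function names are hypothetical and are not taken from the paper or from Cyc.

```python
from collections import defaultdict, deque


def useful_nodes(edges, successes):
    """Return every node from which at least one success is reachable.

    edges: iterable of (parent, child) pairs in the forward search graph.
    successes: nodes at which a proof succeeded.
    The graph is walked backward from the successes (retrograde direction).
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    useful = set(successes)
    queue = deque(successes)
    while queue:
        node = queue.popleft()
        for p in parents[node]:
            if p not in useful:
                useful.add(p)
                queue.append(p)
    return useful


def order_children(children, useful):
    """Order search nodes so that those known to be useful come first."""
    # sorted() is stable, so the original order is kept within each group.
    return sorted(children, key=lambda c: c not in useful)


# Hypothetical search graph: query q expands to a and b; only a leads to success s.
edges = [("q", "a"), ("q", "b"), ("a", "s"), ("b", "d")]
db = useful_nodes(edges, ["s"])
print(order_children(["b", "a"], db))
```

Here the set `db` plays the role of the database, and `order_children` is one simple way such a database could drive node ordering; the heuristic actually used in the paper may differ.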