Collaborating Authors

 Pandey, Gaurav


Look Both Ways: Bidirectional Visual Sensing for Automatic Multi-Camera Registration

arXiv.org Artificial Intelligence

This work describes the automatic registration of a large network (approximately 40) of fixed, ceiling-mounted environment cameras spread over a large area (approximately 800 square meters) using a mobile calibration robot equipped with a single upward-facing fisheye camera and a backlit ArUco marker for easy detection. The fisheye camera performs visual odometry (VO), and the ArUco marker makes the calibration robot easy to detect in the environment cameras. In addition, the fisheye camera can also detect the environment cameras. This bidirectional detection constrains the poses of the environment cameras, which are recovered by solving an optimization problem. Such an approach can be used to automatically register a large-scale multi-camera system for surveillance, automated parking, or robotic applications. This VO-based multi-camera registration method has been extensively validated in real-world experiments and compared against a similar approach that uses a LiDAR, a more expensive, heavier, and more power-hungry sensor.
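The core idea, that detections in both directions constrain the unknown camera poses, can be sketched in a deliberately simplified 2D form. Here each camera has only a position (no orientation), and `estimate_camera_position`, `cam_to_robot`, and `robot_to_cam` are hypothetical names chosen for illustration; the paper solves a full pose optimization.

```python
# Minimal 2D sketch of how bidirectional detections constrain an
# environment camera's position. The robot's positions come from VO;
# each detection contributes one displacement measurement.

def estimate_camera_position(robot_poses, cam_to_robot, robot_to_cam):
    """Least-squares position from two families of constraints:
    - environment camera sees robot marker:  cam = robot - cam_to_robot[i]
      (cam_to_robot[i] is the measured camera-to-robot displacement)
    - robot fisheye sees environment camera: cam = robot + robot_to_cam[i]
    With only a position unknown, the least-squares solution is simply
    the mean of the per-observation position estimates."""
    estimates = []
    for (rx, ry), (dx, dy) in zip(robot_poses, cam_to_robot):
        estimates.append((rx - dx, ry - dy))
    for (rx, ry), (dx, dy) in zip(robot_poses, robot_to_cam):
        estimates.append((rx + dx, ry + dy))
    n = len(estimates)
    return (sum(e[0] for e in estimates) / n,
            sum(e[1] for e in estimates) / n)
```

With consistent measurements, the two constraint families agree on the same camera position; in practice each measurement is noisy and the averaging (or, in the full problem, a nonlinear solver) fuses them.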


Simulated Chats for Task-oriented Dialog: Learning to Generate Conversations from Instructions

arXiv.org Artificial Intelligence

Popular task-oriented dialog datasets such as MultiWOZ (Budzianowski et al. 2018) are created by giving crowd-sourced workers a goal instruction, expressed in natural language, that describes the task to be accomplished. The workers play the roles of a user and an agent to generate dialogs that accomplish tasks such as booking restaurant tables, making train reservations, or calling a taxi. However, creating large crowd-sourced datasets can be time-consuming and expensive. To reduce the cost of generating such dialog datasets, recent work has explored methods to automatically create larger datasets from small samples. In this paper, we present a data creation strategy that uses the pre-trained language model GPT2 (Radford et al. 2018) to simulate the interaction between crowd-sourced workers by creating a user bot and an agent bot. We train the simulators on a small fraction of the actual crowd-generated conversations and their corresponding goal instructions. We demonstrate that by using the simulated data, we achieve significant improvements both in the low-resource setting and in overall task performance. To the best of our knowledge, we are the first to present a model for generating entire conversations by simulating the crowd-sourced data collection process.
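The simulation loop described above can be sketched as two bots alternating turns, each conditioned on the goal instruction and the dialog so far. `simulate_dialog` and the bot callables are hypothetical stand-ins for the GPT2-based user and agent simulators.

```python
# Toy version of the bot-vs-bot data collection loop: a user bot and an
# agent bot take turns, each seeing the goal instruction and the dialog
# history. The callables stand in for trained GPT2 generators.

def simulate_dialog(goal, user_bot, agent_bot, max_turns=4):
    """Return a list of (speaker, utterance) pairs produced by letting
    the two bots alternate for max_turns rounds."""
    dialog = []
    for _ in range(max_turns):
        dialog.append(("user", user_bot(goal, dialog)))
        dialog.append(("agent", agent_bot(goal, dialog)))
    return dialog
```

A real implementation would also need a stopping criterion (e.g. the agent emitting an end-of-dialog token) rather than a fixed turn budget.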


Ford Highway Driving RTK Dataset: 30,000 km of North American Highways

arXiv.org Artificial Intelligence

Today, Global Navigation Satellite Systems (GNSS) are used to provide position information as a driver navigational aid. This is an attractive solution, as it offers global positioning using relatively low-cost hardware with a lightweight computational load. In recent years, accuracy and robustness have increased thanks to the availability of substantially more GNSS satellites, multiple civil frequencies such as L5, multi-frequency-capable mass-market receivers, and continental-scale coverage of correction services such as networked Real-Time Kinematic (RTK), Precise Point Positioning (PPP), and other model-based approaches such as PPP-RTK [2]. One of the challenges facing adoption of RTK and other precision GNSS solutions in next-generation automotive systems is understanding the environment the vehicles will operate in, as GNSS could potentially serve as a core component of a safety-critical system. General Motors' (GM) Super Cruise is an example of GNSS used as a core input to the feature activation criteria, allowing the feature to be active only on divided highways [3]. In order to address the integrity of such a system, the GNSS conditions on roads, in terms of service denials, must be understood. Some of the factors that affect the performance of GNSS and RTK use on highways include obstructions (e.g.


Unravelling the Architecture of Membrane Proteins with Conditional Random Fields

arXiv.org Machine Learning

In this paper, we show that the recently introduced graphical model, Conditional Random Fields (CRFs), provides a template for integrating micro-level information about biological entities into a mathematical model to understand their macro-level behavior. More specifically, we apply the CRF model to an important classification problem in protein science, namely the secondary structure prediction of proteins based on the observed primary structure. A comparison on benchmark datasets against twenty-eight other methods shows not only that the CRF model leads to extremely accurate predictions, but also that the modular nature of the model, and the freedom to integrate disparate, overlapping, and non-independent sources of information, makes it an extremely versatile tool with the potential to solve many other problems in bioinformatics.
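For a linear-chain CRF over the three classical secondary-structure labels (H = helix, E = strand, C = coil), prediction reduces to Viterbi decoding over per-residue emission scores and label-transition scores. The sketch below uses made-up scores and label names, not trained CRF weights or the paper's feature set.

```python
# Illustrative Viterbi decoding for a linear-chain CRF over secondary
# structure labels. emissions[t][y] scores label y at residue t;
# transitions[(p, y)] scores moving from label p to label y.

def viterbi(emissions, transitions, labels):
    """Return the maximum-scoring label sequence."""
    best = {y: emissions[0][y] for y in labels}   # best score ending in y
    back = []                                     # backpointers per step
    for e in emissions[1:]:
        ptr, new = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[(p, y)])
            ptr[y] = prev
            new[y] = best[prev] + transitions[(prev, y)] + e[y]
        back.append(ptr)
        best = new
    y = max(labels, key=best.get)                 # best final label
    path = [y]
    for ptr in reversed(back):                    # walk backpointers
        y = ptr[y]
        path.append(y)
    return list(reversed(path))
```

In a trained CRF, the emission scores would be weighted sums of features of the observed primary structure, which is where the abstract's "disparate, overlapping and non-independent" information sources enter.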


Standalone and RTK GNSS on 30,000 km of North American Highways

arXiv.org Artificial Intelligence

There is a growing need for vehicle positioning information to support Advanced Driver Assistance Systems (ADAS), Connectivity (V2X), and Automated Driving (AD) features. These needs range from road determination (<5 meters) to lane determination (<1.5 meters) to determining where the vehicle is within the lane (<0.3 meters). This work examines the performance of Global Navigation Satellite Systems (GNSS) on 30,000 km of North American highways to better understand which automotive positioning needs GNSS meets today and what might be possible in the near future with wide-area GNSS correction services and multi-frequency receivers. This includes data from a representative automotive production GNSS used primarily for turn-by-turn navigation, as well as an Inertial Navigation System that couples two survey-grade GNSS receivers with a tactical-grade Inertial Measurement Unit (IMU) to act as ground truth. The latter utilized networked Real-Time Kinematic (RTK) GNSS corrections delivered over a cellular modem in real time. We assess on-road GNSS accuracy, availability, and continuity. Availability and continuity are broken down in terms of satellite visibility, satellite geometry, position type (RTK fixed, RTK float, or standard positioning), and RTK correction latency over the network. Results show that current automotive solutions are best suited to meet road determination requirements, at 98% availability, but are less suitable for lane determination, at 57%. Multi-frequency receivers with RTK corrections were found to be more capable, with road determination at 99.5%, lane determination at 98%, and highway-level lane departure protection at 91%.


Localization Requirements for Autonomous Vehicles

arXiv.org Artificial Intelligence

Autonomous vehicles require precise knowledge of their position and orientation in all weather and traffic conditions for path planning, perception, control, and general safe operation. Here we derive these requirements for autonomous vehicles from first principles. We begin with the safety integrity level, defining the allowable probability of failure per hour of operation based on desired improvements on road safety today. This draws comparisons with the localization integrity levels required in aviation and rail, where similar numbers are derived at a 10^-8 probability of failure per hour of operation. We then define the geometry of the problem, where the aim is to maintain knowledge that the vehicle is within its lane and to determine which road level it is on. Longitudinal, lateral, and vertical localization error bounds (alert limits) and 95% accuracy requirements are derived based on US road geometry standards (lane width, curvature, and vertical clearance) and allowable vehicle dimensions. For passenger vehicles operating on freeway roads, the result is a required lateral error bound of 0.57 m (0.20 m, 95%), a longitudinal bound of 1.40 m (0.48 m, 95%), a vertical bound of 1.30 m (0.43 m, 95%), and an attitude bound in each direction of 1.50 deg (0.51 deg, 95%). On local streets, the road geometry makes the requirements more stringent: lateral and longitudinal error bounds of 0.29 m (0.10 m, 95%) are needed, with an orientation requirement of 0.50 deg (0.17 deg, 95%).
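As a simplified worked example of the lateral geometry: if the vehicle must remain within its lane, the lateral alert limit is roughly half the gap between the lane width and the vehicle width. The 3.6 m lane and 2.46 m vehicle width below are assumed values chosen to reproduce the 0.57 m freeway bound; the full derivation also accounts for curvature and vehicle orientation.

```python
# Simplified lateral alert-limit geometry: a vehicle centered in its
# lane can drift at most (lane_width - vehicle_width) / 2 before an
# edge of the vehicle crosses the lane boundary.

def lateral_alert_limit(lane_width_m, vehicle_width_m):
    return (lane_width_m - vehicle_width_m) / 2.0

print(round(lateral_alert_limit(3.6, 2.46), 2))  # 0.57 m for the assumed geometry
```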


Deep Discriminative Learning for Unsupervised Domain Adaptation

arXiv.org Machine Learning

The primary objective of domain adaptation methods is to transfer knowledge from a source domain to a target domain with a similar but different data distribution. Thus, in order to correctly classify the unlabeled target domain samples, the standard approach is to learn a common representation for both the source and target domains, thereby indirectly addressing the problem of learning a classifier in the target domain. However, such an approach does not address the task of classification in the target domain directly. In contrast, we propose an approach that directly addresses the problem of learning a classifier in the unlabeled target domain. In particular, we train a classifier to correctly classify the training samples while simultaneously classifying the samples in the target domain in an unsupervised manner. We refer to the corresponding model as Discriminative Encoding for Domain Adaptation (DEDA). We show that this simple approach to unsupervised domain adaptation is quite powerful. Our method achieves state-of-the-art results on unsupervised adaptation tasks across various image classification benchmarks, as well as state-of-the-art performance on domain adaptation for the Amazon reviews sentiment classification dataset. We also report additional experiments in which the source data has fewer labeled examples, and on a zero-shot domain adaptation task where no target domain samples are used for training.
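One way to sketch such a combined objective is a supervised cross-entropy term on labeled source samples plus an unsupervised confidence (entropy) penalty on unlabeled target samples. The entropy term here is a common stand-in, not necessarily the paper's exact unsupervised objective, and `deda_loss` is a hypothetical name.

```python
import math

# Sketch of a combined source-supervised / target-unsupervised loss:
# the model is pushed to classify source samples correctly while also
# making confident (low-entropy) predictions on target samples.

def cross_entropy(probs, label):
    """Negative log-probability of the true class."""
    return -math.log(probs[label])

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def deda_loss(source_probs, source_labels, target_probs, weight=0.1):
    sup = sum(cross_entropy(p, y) for p, y in zip(source_probs, source_labels))
    unsup = sum(entropy(p) for p in target_probs)
    return sup / len(source_probs) + weight * unsup / len(target_probs)
```

In training, both terms would be differentiated through the shared classifier, so target samples shape the decision boundary without labels.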


Unsupervised Learning of Interpretable Dialog Models

arXiv.org Artificial Intelligence

Recently, several deep learning based models have been proposed for end-to-end learning of dialogs. While these models can be trained from data without any additional annotations, they are hard to interpret. On the other hand, there exist traditional state-based dialog systems, where the states of the dialog are discrete and hence easy to interpret. However, these states need to be handcrafted and annotated in the data. To achieve the best of both worlds, we propose the Latent State Tracking Network (LSTN), with which we learn an interpretable model in an unsupervised manner. The model defines a discrete latent variable at each turn of the conversation, which can take a finite set of values. Since these discrete variables are not present in the training data, we use the EM algorithm to train our model in an unsupervised manner. In our experiments, we show that LSTN achieves interpretability in dialog models with little decrease in performance compared to end-to-end approaches.


Developing parsimonious ensembles using ensemble diversity within a reinforcement learning framework

arXiv.org Machine Learning

Heterogeneous ensembles built from the predictions of a wide variety and large number of diverse base predictors are a potent approach to building predictive models for problems where the ideal base/individual predictor may not be obvious. Ensemble selection is an especially promising approach here, not only for improving prediction performance, but also for its ability to select a collectively predictive subset, often a relatively small one, of the base predictors. In this paper, we present a set of algorithms that explicitly incorporate ensemble diversity, a known factor influencing the predictive performance of ensembles, into a reinforcement learning framework for ensemble selection. We rigorously tested these approaches on several challenging problems and their associated datasets, finding that several of them produced more accurate ensembles than those that do not explicitly consider diversity. More importantly, these diversity-incorporating ensembles were much smaller in size, i.e., more parsimonious, than the latter. This can eventually aid the interpretation or reverse engineering of the predictive models assimilated into the resultant ensemble(s).
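A greedy selection rule that trades off individual accuracy against diversity (here, mean pairwise disagreement with the members already chosen) gives the flavor of diversity-aware ensemble selection; it is a simplification of the paper's reinforcement learning formulation, and all names are illustrative.

```python
# Greedy diversity-aware ensemble selection sketch. Each candidate is
# scored by its own accuracy plus a weighted diversity bonus measuring
# how often it disagrees with the members already selected.

def disagreement(a, b):
    """Fraction of examples on which two predictors disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def select_ensemble(preds, accuracies, k, div_weight=0.5):
    """preds: {name: list of predictions}; accuracies: {name: score}.
    Start from the most accurate predictor and greedily add k-1 more."""
    chosen = [max(accuracies, key=accuracies.get)]
    while len(chosen) < k:
        def score(name):
            div = sum(disagreement(preds[name], preds[c]) for c in chosen)
            return accuracies[name] + div_weight * div / len(chosen)
        rest = [n for n in preds if n not in chosen]
        chosen.append(max(rest, key=score))
    return chosen
```

Because the diversity bonus penalizes redundant predictors, the selected subset tends to stay small, which mirrors the parsimony effect the abstract reports.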


Variational methods for Conditional Multimodal Deep Learning

arXiv.org Machine Learning

In this paper, we address the problem of conditional modality learning, whereby one is interested in generating one modality given another. While it is straightforward to learn a joint distribution over multiple modalities using a deep multimodal architecture, we observe that such models are not very effective at conditional generation. Hence, we address the problem by learning conditional distributions between the modalities, using variational methods to maximize the corresponding conditional log-likelihood. The resultant deep model, which we refer to as the conditional multimodal autoencoder (CMMA), forces the latent representation obtained from a single modality alone to be 'close' to the joint representation obtained from multiple modalities. We use the proposed model to generate faces from attributes, and show that the faces generated from attributes using the proposed model are qualitatively and quantitatively more representative of the attributes from which they were generated than those obtained by other deep generative models. We also propose a secondary task, in which existing faces are modified by modifying the corresponding attributes, and observe that the modifications to the faces introduced by the proposed model are representative of the corresponding modifications to the attributes.
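The 'closeness' constraint between the single-modality representation and the joint representation can be illustrated with the KL divergence between two diagonal Gaussian posteriors, a standard choice in variational autoencoders; treating the term this way is an assumption for illustration, not the paper's exact derivation.

```python
import math

# KL(q || p) between diagonal Gaussians, the usual way a variational
# model penalizes the distance between q(z | attributes) and the joint
# posterior q(z | attributes, face). Inputs are per-dimension means
# and variances.

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )
```

Minimizing this term drives the attribute-only encoder toward the joint encoder, so that at generation time attributes alone suffice to produce a useful latent code.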