Goto

Collaborating Authors

 Government


Leveraging Multiple Artificial Intelligence Techniques to Improve the Responsiveness in Operations Planning: ASPEN for Orbital Express

AI Magazine

The challenging timeline for DARPAโ€™s Orbital Express mission demanded a flexible, responsive, and (above all) safe approach to mission planning. Mission planning for space is challenging because of the mixture of goals and constraints. Every space mission tries to squeeze all of the capacity possible out of the spacecraft. For Orbital Express, this means performing as many experiments as possible, while still keeping the spacecraft safe. Keeping the spacecraft safe can be very challenging because we need to maintain the correct thermal environment (or batteries might freeze), we need to avoid pointing cameras and sensitive sensors at the sun, we need to keep the spacecraft batteries charged, and we need to keep the two spacecraft from colliding... made more difficult as only one of the spacecraft had thrusters. Because the mission was a technology demonstration, pertinent planning information was learned during actual mission execution. For example, we didnโ€™t know for certain how long it would take to transfer propellant from one spacecraft to the other, although this was a primary mission goal. The only way to find out was to perform the task and monitor how long it actually took. This information led to amendments to procedures, which led to changes in the mission plan. In general, we used the ASPEN planner scheduler to generate and validate the mission plans. ASPEN is a planning system that allows us to enter all of the spacecraft constraints, the resources, the communications windows, and our objectives. ASPEN then could automatically plan our day. We enhanced ASPEN to enable it to reason about uncertainty. We also developed a model generator that would read the text of a procedure and translate it into an ASPEN model. Note that a model is the input to ASPEN that describes constraints, resources, and activities. These technologies had a significant impact on the success of the Orbital Express mission. Finally, we formulated a technique for converting procedural information to declarative information by transforming procedures into models of hierarchical task networks (HTNs). The impact of this effort on the mission was a significant reduction in (1) the execution time of the mission, (2) the daily staff required to produce plans, and (3) planning errors. Not a single miss-configured command was sent during operations.


Automated Scheduling for NASA's Deep Space Network

AI Magazine

This article describes the DSN scheduling wngine (DSE) component of a new scheduling system being deployed for NASA's deep space network. The DSE provides core automation functionality for scheduling the network, including the interpretation of scheduling requirements expressed by users, their elaboration into tracking passes, and the resolution of conflicts and constraint violations. The DSE incorporates both systematic search and repair-based algorithms, used for different phases and purposes in the overall system. It has been integrated with a web application which provides DSE functionality to all DSN users through a standard web browser, as part of a peer-to-peer schedule negotiation process for the entire network. The system has been deployed operationally and is in routine use, and is in the process of being extended to support long-range planning and forecasting, and near-real-time scheduling.


Space Applications of Artificial Intelligence

AI Magazine

We are pleased to introduce the space application issue articles in this issue of AI Magazine. The exploration of space is a testament to human curiosity and the desire to understand the universe that we inhabit. As many space agencies around the world design and deploy missions, it is apparent that there is a need for intelligent, exploring systems that can make decisions on their own in remote, potentially hostile environments. At the same time, the monetary cost of operating missions, combined with the growing complexity of the instruments and vehicles being deployed, make it apparent that substantial improvements can be made by the judicious use of automation in mission operations.


Consistent Classification Algorithms for Multi-class Non-Decomposable Performance Metrics

arXiv.org Machine Learning

We study consistency of learning algorithms for a multi-class performance metric that is a non-decomposable function of the confusion matrix of a classifier and cannot be expressed as a sum of losses on individual data points; examples of such performance metrics include the macro F-measure popular in information retrieval and the G-mean metric used in class-imbalanced problems. While there has been much work in recent years in understanding the consistency properties of learning algorithms for `binary' non-decomposable metrics, little is known either about the form of the optimal classifier for a general multi-class non-decomposable metric, or about how these learning algorithms generalize to the multi-class case. In this paper, we provide a unified framework for analysing a multi-class non-decomposable performance metric, where the problem of finding the optimal classifier for the performance metric is viewed as an optimization problem over the space of all confusion matrices achievable under the given distribution. Using this framework, we show that (under a continuous distribution) the optimal classifier for a multi-class performance metric can be obtained as the solution of a cost-sensitive classification problem, thus generalizing several previous results on specific binary non-decomposable metrics. We then design a consistent learning algorithm for concave multi-class performance metrics that proceeds via a sequence of cost-sensitive classification problems, and can be seen as applying the conditional gradient (CG) optimization method over the space of feasible confusion matrices. To our knowledge, this is the first efficient learning algorithm (whose running time is polynomial in the number of classes) that is consistent for a large family of multi-class non-decomposable metrics. Our consistency proof uses a novel technique based on the convergence analysis of the CG method.


Learning a Concept Hierarchy from Multi-labeled Documents

Neural Information Processing Systems

While topic models can discover patterns of word usage in large corpora, it is difficult to meld this unsupervised structure with noisy, human-provided labels, especially when the label space is large. In this paper, we present a model-Label to Hierarchy (L2H)-that can induce a hierarchy of user-generated labels and the topics associated with those labels from a set of multi-labeled documents. The model is robust enough to account for missing labels from untrained, disparate annotators and provide an interpretable summary of an otherwise unwieldy label set. We show empirically the effectiveness of L2H in predicting held-out words and labels for unseen documents.


Discriminative Metric Learning by Neighborhood Gerrymandering

Neural Information Processing Systems

We formulate the problem of metric learning for k nearest neighbor classification as a large margin structured prediction problem, with a latent variable representing the choice of neighbors and the task loss directly corresponding to classification error. We describe an efficient algorithm for exact loss augmented inference, and a fast gradient descent algorithm for learning in this model. The objective drives the metric to establish neighborhood boundaries that benefit the true class labels for the training points. Our approach, reminiscent of gerrymandering (redrawing of political boundaries to provide advantage to certain parties), is more direct in its handling of optimizing classification accuracy than those previously proposed. In experiments on a variety of data sets our method is shown to achieve excellent results compared to current state of the art in metric learning.


Asynchronous Anytime Sequential Monte Carlo

Neural Information Processing Systems

We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional sequential Monte Carlo algorithms that is amenable to parallel and distributed implementations. It uses no barrier synchronizations which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever to emit an unbounded number of particles while keeping within a fixed memory budget. We prove that the particle cascade provides an unbiased marginal likelihood estimator which can be straightforwardly plugged into existing pseudo-marginal methods.


Learning Distributed Representations for Structured Output Prediction

Neural Information Processing Systems

In recent years, distributed representations of inputs have led to performance gains in many applications by allowing statistical information to be shared across inputs. However, the predicted outputs (labels, and more generally structures) are still treated as discrete objects even though outputs are often not discrete units of meaning. In this paper, we present a new formulation for structured prediction where we represent individual labels in a structure as dense vectors and allow semantically similar labels to share parameters. We extend this representation to larger structures by defining compositionality using tensor products to give a natural generalization of standard structured prediction approaches. We define a learning objective for jointly learning the model parameters and the label vectors and propose an alternating minimization algorithm for learning. We show that our formulation outperforms structural SVM baselines in two tasks: multiclass document classification and part-of-speech tagging.


Improved Distributed Principal Component Analysis

Neural Information Processing Systems

We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as $k$-means clustering and low rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems. Our empirical study on real world data shows a speedup of orders of magnitude, preserving communication with only a negligible degradation in solution quality. Some of these techniques we develop, such as input-sparsity subspace embeddings with high correctness probability with a dimension and sparsity independent of the error probability, may be of independent interest.


Greedy Subspace Clustering

Neural Information Processing Systems

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets to estimate the subspaces. As the geometric structure of the clusters (linear subspaces) forbids proper performance of general distance based approaches such as K-means, many model-specific methods have been proposed. In this paper, we provide new simple and efficient algorithms for this problem. Our statistical analysis shows that the algorithms are guaranteed exact (perfect) clustering performance under certain conditions on the number of points and the affinity be- tween subspaces. These conditions are weaker than those considered in the standard statistical literature. Experimental results on synthetic data generated from the standard unions of subspaces model demonstrate our theory. We also show that our algorithm performs competitively against state-of-the-art algorithms on real-world applications such as motion segmentation and face clustering, with much simpler implementation and lower computational cost.