McCallum, Andrew
Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
Chang, Haw-Shiuan, Learned-Miller, Erik, McCallum, Andrew
Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold. Extensive experimental results on six datasets show that our methods reliably improve accuracy in various network architectures, including additional gains on top of other popular training techniques, such as residual learning, momentum, ADAM, batch normalization, dropout, and distillation.
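The bookkeeping behind the first of these estimates is light enough to sketch. Below is a minimal illustration in Python of variance-based re-weighting: the training loop records the predicted probability of the correct class for each sample across iterations, and the standard deviation of that history, plus a smoothing term, becomes a sampling weight. The class and parameter names are illustrative, not from the paper's released code.

```python
# A minimal sketch of variance-based sample re-weighting, assuming a
# training loop that records p(correct class) per sample each iteration.
# `ProbabilityHistory` and `smoothing` are illustrative names/values.
import numpy as np

class ProbabilityHistory:
    """Tracks the predicted probability of the correct class per sample."""

    def __init__(self, num_samples):
        self.history = [[] for _ in range(num_samples)]

    def record(self, sample_ids, correct_class_probs):
        for i, p in zip(sample_ids, correct_class_probs):
            self.history[i].append(p)

    def sample_weights(self, smoothing=0.05):
        # Weight each sample by the standard deviation of its predicted
        # probabilities: high-variance (uncertain) samples get emphasized.
        stds = np.array([np.std(h) if len(h) > 1 else 0.0
                         for h in self.history])
        weights = stds + smoothing       # smoothing keeps every sample in play
        return weights / weights.sum()   # normalize into a sampling distribution

# Usage: record probabilities after each mini-batch, then draw the next
# batch according to the variance-based distribution.
hist = ProbabilityHistory(num_samples=1000)
rng = np.random.default_rng(0)
for epoch in range(3):
    for _ in range(10):
        batch = rng.choice(1000, size=32, replace=False)
        probs = rng.uniform(size=32)     # stand-in for model outputs
        hist.record(batch, probs)
next_batch = rng.choice(1000, size=32, p=hist.sample_weights())
```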
Improved Representation Learning for Predicting Commonsense Ontologies
Li, Xiang, Vilnis, Luke, McCallum, Andrew
Recent work in learning ontologies (hierarchical and partially-ordered structures) has leveraged the intrinsic geometry of spaces of learned representations to make predictions that automatically obey complex structural constraints. We explore two extensions of one such model, the order-embedding model for hierarchical relation learning, with an aim towards improved performance on text data for commonsense knowledge representation. Our first model jointly learns ordering relations and non-hierarchical knowledge in the form of raw text. Our second extension exploits the partial order structure of the training data to find long-distance triplet constraints among embeddings which are poorly enforced by the pairwise training procedure. We find that both incorporating free text and augmenting the training constraints improve performance over the original order-embedding model and other strong baselines.
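For context, here is a minimal sketch of the order-embedding penalty that the paper extends (Vendrov et al., 2016), together with the margin-based pair loss; the embeddings, margin value, and example pairs are illustrative.

```python
# A minimal sketch of order-embedding training, assuming embeddings live
# in the nonnegative orthant; values below are toy data, not learned.
import numpy as np

def order_penalty(child, parent):
    # E(x, y) = ||max(0, y - x)||^2: zero exactly when the more general
    # concept (parent) sits below the more specific one in every coordinate.
    return np.sum(np.maximum(0.0, parent - child) ** 2)

def pair_loss(child, parent, neg_child, neg_parent, margin=1.0):
    # Positive pairs are pushed to zero violation; corrupted (negative)
    # pairs are pushed to at least `margin` violation.
    return order_penalty(child, parent) + max(
        0.0, margin - order_penalty(neg_child, neg_parent))

# The paper's second extension mines long-distance pairs implied by
# transitivity (if a <= b and b <= c, also train on (a, c)) and adds
# them as extra constraints of this same form.
dog, animal = np.array([2.0, 3.0]), np.array([1.0, 1.0])
cat, rock = np.array([3.0, 1.5]), np.array([4.0, 4.0])
print(pair_loss(dog, animal, cat, rock))
```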
End-to-End Learning for Structured Prediction Energy Networks
Belanger, David, Yang, Bishan, McCallum, Andrew
Structured Prediction Energy Networks (SPENs) are a simple, yet expressive family of structured prediction models (Belanger and McCallum, 2016). An energy function over candidate structured outputs is given by a deep network, and predictions are formed by gradient-based optimization. This paper presents end-to-end learning for SPENs, where the energy function is discriminatively trained by back-propagating through gradient-based prediction. In our experience, the approach is substantially more accurate than the structured SVM method of Belanger and McCallum (2016), as it allows us to use more sophisticated non-convex energies. We provide a collection of techniques for improving the speed and accuracy of end-to-end SPENs and reducing their memory requirements, and demonstrate the power of our method on 7-Scenes image denoising and CoNLL-2005 semantic role labeling tasks. In both, inexact minimization of non-convex SPEN energies is superior to baseline methods that use simplistic energy functions that can be minimized exactly.
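A minimal sketch of the central idea, backpropagating through gradient-based prediction, assuming PyTorch and a toy quadratic energy standing in for the paper's deep energy networks; all names and hyperparameters below are illustrative. `predict` unrolls a fixed number of inner gradient steps, and `create_graph=True` keeps those steps differentiable so the outer loss can train the energy parameters.

```python
# A sketch of end-to-end SPEN training on a toy energy, assuming PyTorch.
import torch

def energy(y, x, W):
    # Toy energy: a learned quadratic label-interaction term plus a local
    # term; the paper's energies are deep networks over (x, y).
    return y @ W @ y - (x * y).sum()

def predict(x, W, steps=10, lr=0.1):
    # Unrolled gradient descent on the (relaxed) output y; create_graph
    # keeps inner gradients in the graph so the training loss can
    # backpropagate through the whole optimization trajectory.
    y = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        g, = torch.autograd.grad(energy(y, x, W), y, create_graph=True)
        y = y - lr * g
    return y

x, y_true = torch.randn(5), torch.randn(5)
W = (0.5 * torch.eye(5)).requires_grad_()
opt = torch.optim.Adam([W], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = ((predict(x, W) - y_true) ** 2).sum()  # end-to-end training loss
    loss.backward()
    opt.step()
```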
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
Augenstein, Isabelle, Das, Mrinal, Riedel, Sebastian, Vikraman, Lakshmi, McCallum, Andrew
We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.
An Online Hierarchical Algorithm for Extreme Clustering
Kobren, Ari, Monath, Nicholas, Krishnamurthy, Akshay, McCallum, Andrew
Many modern clustering methods scale well to a large number of data items, N, but not to a large number of clusters, K. This paper introduces PERCH, a new non-greedy algorithm for online hierarchical clustering that scales to both massive N and K, a problem setting we term extreme clustering. Our algorithm efficiently routes new data points to the leaves of an incrementally-built tree. Motivated by the desire for both accuracy and speed, our approach performs tree rotations for the sake of enhancing subtree purity and encouraging balancedness. We prove that, under a natural separability assumption, our non-greedy algorithm will produce trees with perfect dendrogram purity regardless of online data arrival order. Our experiments demonstrate that PERCH constructs more accurate trees than other tree-building clustering algorithms and scales well with both N and K, achieving a higher quality clustering than the strongest flat clustering competitor in nearly half the time.
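A minimal sketch of the routing-and-insert step on Euclidean points, assuming per-node bounding boxes; the tree rotations for purity and balance that the full algorithm performs are omitted, and the class below is illustrative rather than the released implementation.

```python
# A sketch of online routing into an incrementally-built cluster tree.
import numpy as np

class Node:
    def __init__(self, point):
        self.lo = self.hi = point        # bounding-box corners
        self.children = []
        self.point = point               # set only at leaves

    def box_distance(self, x):
        # Distance from x to the node's axis-aligned bounding box.
        return np.linalg.norm(x - np.clip(x, self.lo, self.hi))

    def insert(self, x):
        node = self
        while node.children:
            node.lo = np.minimum(node.lo, x)   # grow boxes along the path
            node.hi = np.maximum(node.hi, x)
            node = min(node.children, key=lambda c: c.box_distance(x))
        # Split the reached leaf: it becomes internal with two leaf children.
        node.children = [Node(node.point), Node(x)]
        node.lo = np.minimum(node.point, x)
        node.hi = np.maximum(node.point, x)
        node.point = None

rng = np.random.default_rng(0)
root = Node(rng.normal(size=2))
for _ in range(99):
    root.insert(rng.normal(size=2))
```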
Learning a Natural Language Interface with Neural Programmer
Neelakantan, Arvind, Le, Quoc V., Abadi, Martin, McCallum, Andrew, Amodei, Dario
Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide the desired response when executed on the database. To our knowledge, this paper presents the first weakly supervised, end-to-end neural network model to induce such programs on a real-world dataset. We enhance the objective function of Neural Programmer, a neural network with built-in discrete operations, and apply it on WikiTableQuestions, a natural language question-answering dataset. The model is trained end-to-end with weak supervision of question-answer pairs, and does not require domain-specific grammars, rules, or annotations that are key elements in previous approaches to program induction. The main experimental result in this paper is that a single Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with weak supervision. An ensemble of 15 models, with a trivial combination technique, achieves 37.7% accuracy, which is competitive with the current state-of-the-art accuracy of 37.1% obtained by a traditional natural language semantic parser.
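A minimal sketch of the mechanism that keeps built-in discrete operations trainable end-to-end: rather than selecting a single operation, a controller's softmax blends the results of every operation, so answer-level (weak) supervision can flow back through the choice. The operations and the stand-in controller vector below are illustrative; in the model, the logits come from a recurrent network reading the question.

```python
# A sketch of differentiable (soft) selection over discrete operations.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

column = np.array([3.0, 7.0, 1.0, 9.0])      # a toy table column
operations = {
    "sum":   column.sum(),
    "max":   column.max(),
    "min":   column.min(),
    "count": float(len(column)),
}
controller_logits = np.array([0.1, 2.0, -1.0, 0.3])  # stand-in for an RNN output
weights = softmax(controller_logits)
soft_answer = sum(w * v for w, v in zip(weights, operations.values()))
# Training compares soft_answer against the gold answer (weak supervision);
# at test time the argmax operation would be applied instead.
print(soft_answer)
```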
Bethe Projections for Non-Local Inference
Vilnis, Luke, Belanger, David, Sheldon, Daniel, McCallum, Andrew
Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as posterior regularization. We present a method to discriminatively learn broad families of inference objectives, capturing powerful non-local statistics of the latent variables, while maintaining tractable and provably fast inference using non-Euclidean projected gradient descent with a distance-generating function given by the Bethe entropy. We demonstrate the performance and flexibility of our method by (1) extracting structured citations from research papers by learning soft global constraints, (2) achieving state-of-the-art results on a widely-used handwriting recognition task using a novel learned non-convex inference procedure, and (3) providing a fast and highly scalable algorithm for the challenging problem of inference in a collective graphical model applied to bird migration.
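A minimal sketch of non-Euclidean projected gradient descent in its simplest setting: mirror descent on the probability simplex with the negative-entropy distance-generating function, i.e. the exponentiated-gradient special case. The paper's distance-generating function is instead the Bethe entropy over structured marginals, with the corresponding projections computed by fast message passing; the objective below is illustrative.

```python
# A sketch of mirror descent under an entropy geometry (exponentiated
# gradient), standing in for the paper's Bethe-entropy projections.
import numpy as np

def mirror_descent_simplex(grad_fn, dim, steps=100, lr=0.5):
    mu = np.full(dim, 1.0 / dim)          # start at the uniform distribution
    for _ in range(steps):
        g = grad_fn(mu)
        mu = mu * np.exp(-lr * g)         # mirror step under entropy geometry
        mu /= mu.sum()                    # projection back onto the simplex
    return mu

# Toy objective: minimize <c, mu> + sum_i mu_i log mu_i over the simplex.
c = np.array([1.0, 0.2, 0.5])
grad = lambda mu: c + np.log(np.clip(mu, 1e-12, None)) + 1.0
print(mirror_descent_simplex(grad, dim=3))   # converges toward softmax(-c)
```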
Ask the GRU: Multi-Task Learning for Deep Text Recommendations
Bansal, Trapit, Belanger, David, McCallum, Andrew
In a variety of application domains, the content to be recommended to users is associated with text. This includes research papers, movies with associated plot summaries, news articles, blog posts, etc. Recommendation approaches based on latent factor models can be extended naturally to leverage text by employing an explicit mapping from text to factors. This enables recommendations for new, unseen content, and may generalize better, since the factors for all items are produced by a compactly-parametrized model. Previous work has used topic models or averages of word embeddings for this mapping. In this paper we present a method leveraging deep recurrent neural networks to encode the text sequence into a latent vector, specifically gated recurrent units (GRUs) trained end-to-end on the collaborative filtering task. For the task of scientific paper recommendation, this yields models with significantly higher accuracy. In cold-start scenarios, we beat the previous state-of-the-art methods, all of which ignore word order. Performance is further improved by multi-task learning, where the text encoder network is trained for a combination of content recommendation and item metadata prediction. This regularizes the collaborative filtering model, ameliorating the problem of sparsity of the observed rating matrix.
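A minimal sketch of the text-to-factor mapping, assuming PyTorch; the vocabulary size, dimensions, and toy batch below are illustrative, not the paper's settings.

```python
# A sketch of a GRU item encoder scored against user factors.
import torch
import torch.nn as nn

class GRUItemEncoder(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, factor_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, factor_dim, batch_first=True)

    def forward(self, token_ids):
        # The GRU's final hidden state serves as the item's latent factor,
        # so unseen (cold-start) items get factors from their text alone.
        _, h = self.gru(self.embed(token_ids))
        return h.squeeze(0)

encoder = GRUItemEncoder()
user_factors = nn.Embedding(100, 32)          # one latent factor per user
tokens = torch.randint(0, 5000, (4, 20))      # 4 items, 20 tokens each
users = torch.tensor([0, 5, 9, 42])
scores = (user_factors(users) * encoder(tokens)).sum(dim=1)  # predicted affinity
# The multi-task variant would add a second head on the same encoder that
# predicts item metadata (e.g., tags), regularizing the shared factors.
```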
Structured Prediction Energy Networks
Belanger, David, McCallum, Andrew
We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function of candidate labels, and then predictions are produced by using back-propagation to iteratively optimize the energy with respect to the labels. This deep architecture captures dependencies between labels that would lead to intractable graphical models, and performs structure learning by automatically learning discriminative features of the structured output. One natural application of our technique is multi-label classification, which traditionally has required strict prior assumptions about the interactions between labels to ensure tractable learning and prediction. We are able to apply SPENs to multi-label problems with substantially larger label sets than previous applications of structured prediction, while modeling high-order interactions using minimal structural assumptions. Overall, deep learning provides remarkable tools for learning features of the inputs to a prediction problem, and this work extends these techniques to learning features of structured outputs. Our experiments show impressive performance on a variety of benchmark multi-label classification tasks, demonstrate that our technique yields interpretable structure learning, and illuminate fundamental trade-offs between feed-forward and iterative structured prediction.
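A minimal sketch of SPEN prediction on a toy multi-label energy, with labels relaxed to [0, 1]; the real energy is a deep network whose gradients come from automatic differentiation, and the quadratic interaction term here is only a stand-in.

```python
# A sketch of SPEN inference: gradient descent on relaxed labels.
import numpy as np

def energy(y, local_scores, W):
    # Local term plus a learned pairwise label-interaction term; in a
    # SPEN the second term is an arbitrary deep network of y.
    return -local_scores @ y + y @ W @ y

def energy_grad(y, local_scores, W):
    return -local_scores + (W + W.T) @ y

def predict(local_scores, W, steps=50, lr=0.1):
    y = np.full(len(local_scores), 0.5)    # start at maximal uncertainty
    for _ in range(steps):
        y = y - lr * energy_grad(y, local_scores, W)
        y = np.clip(y, 0.0, 1.0)           # keep the relaxation feasible
    return (y > 0.5).astype(int)           # round to a hard labeling

rng = np.random.default_rng(0)
labels = predict(rng.normal(size=6), rng.normal(size=(6, 6)) * 0.1)
print(labels)
```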
Reports on the 2015 AAAI Spring Symposium Series
Agarwal, Nitin (University of Arkansas at Little Rock) | Andrist, Sean (University of Wisconsin-Madison) | Bohus, Dan (Microsoft Research) | Fang, Fei (University of Southern California) | Fenstermacher, Laurie (Wright-Patterson Air Force Base) | Kagal, Lalana (Massachusetts Institute of Technology) | Kido, Takashi (Rikengenesis) | Kiekintveld, Christopher (University of Texas at El Paso) | Lawless, W. F. (Paine College) | Liu, Huan (Arizona State University) | McCallum, Andrew (University of Massachusetts) | Purohit, Hemant (Wright State University) | Seneviratne, Oshani (Massachusetts Institute of Technology) | Takadama, Keiki (University of Electro-Communications) | Taylor, Gavin (US Naval Academy)
The AAAI 2015 Spring Symposium Series was held Monday through Wednesday, March 23-25, at Stanford University near Palo Alto, California. The titles of the seven symposia were Ambient Intelligence for Health and Cognitive Enhancement; Applied Computational Game Theory; Foundations of Autonomy and Its (Cyber) Threats: From Individuals to Interdependence; Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches; Logical Formalizations of Commonsense Reasoning; Socio-Technical Behavior Mining: From Data to Decisions; and Structured Data for Humanitarian Technologies: Perfect Fit or Overkill?