AITopics | Barbu, Andrei

Collaborating Authors

Barbu, Andrei

Trajectory Prediction with Linguistic Representations

Kuo, Yen-Ling, Huang, Xin, Barbu, Andrei, McGill, Stephen G., Katz, Boris, Leonard, John J., Rosman, Guy

arXiv.org Artificial Intelligence, Oct-19-2021

Language allows humans to build mental models that interpret what is happening around them, resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories and is trained using trajectory samples with partially annotated captions. The model learns the meaning of each word without direct per-word supervision. At inference time, it generates a linguistic description of trajectories that captures maneuvers and interactions over an extended time interval. This generated description is used to refine predictions of the trajectories of multiple agents. We train and validate our model on the Argoverse dataset and demonstrate improved trajectory prediction accuracy. In addition, our model is more interpretable: it presents part of its reasoning in plain language as captions, which can aid model development and help build confidence in the model before it is deployed.
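To make the idea of a linguistic intermediate representation more concrete, here is a toy sketch that maps a 2D trajectory to a single maneuver word. It is not the authors' model, which learns word meanings from partially annotated captions; the function names `heading` and `describe_maneuver`, the word vocabulary, and both thresholds are hypothetical, hand-written stand-ins chosen for illustration.

```python
import math

def heading(p, q):
    """Heading angle, in radians, of the segment from point p to point q."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def describe_maneuver(traj, turn_threshold=math.radians(30), min_displacement=1.0):
    """Map a 2D trajectory (list of (x, y) points) to one coarse maneuver word.

    Hypothetical hand-written rules; thresholds are illustrative, not from the paper.
    """
    if len(traj) < 2:
        raise ValueError("need at least two points")
    # An agent that barely moved overall is described as stopped.
    dx = traj[-1][0] - traj[0][0]
    dy = traj[-1][1] - traj[0][1]
    if math.hypot(dx, dy) < min_displacement:
        return "stop"
    # Compare the heading of the first segment with that of the last one.
    start = heading(traj[0], traj[1])
    end = heading(traj[-2], traj[-1])
    # Wrap the change into [-pi, pi] so turns across the +/-pi boundary work.
    delta = math.atan2(math.sin(end - start), math.cos(end - start))
    if delta > turn_threshold:
        return "left"
    if delta < -turn_threshold:
        return "right"
    return "straight"
```

For example, `describe_maneuver([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)])` labels a 90-degree counterclockwise turn as "left". In a learned system, words like these would be predicted from data and then used to condition the trajectory forecast rather than computed by fixed rules.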

artificial intelligence, machine learning, natural language

arXiv:2110.09741

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.40)

Industry:

Transportation (0.46)
Information Technology (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception

Netanyahu, Aviv, Shu, Tianmin, Katz, Boris, Barbu, Andrei, Tenenbaum, Joshua B.

arXiv.org Artificial Intelligence, Mar-19-2021

The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create PHASE, a dataset of physically-grounded abstract social events that resemble a wide range of real-life social interactions by including social concepts such as helping another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space, generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view and can interact with multiple objects in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feed-forward neural networks. We hope that PHASE can serve as a difficult new challenge for developing models that can recognize complex social interactions.
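The core idea behind Bayesian inverse planning can be shown at a much smaller scale than SIMPLE, which relies on a physics engine and a planner. The sketch below assumes a hypothetical 1D world where an agent picks moves of -1, 0, or +1 and is "Boltzmann-rational": moves that bring it closer to its goal are exponentially more likely, controlled by an assumed rationality parameter `beta`. Observing moves and applying Bayes' rule yields a posterior over candidate goals. All names here are illustrative, not part of PHASE or SIMPLE.

```python
import math

def action_likelihood(pos, action, goal, beta):
    """Probability of `action` at `pos` for an agent heading to `goal`.

    Softmax over moves (-1, 0, +1), scored by negative distance to the goal
    after moving; larger `beta` means a more reliably goal-directed agent.
    """
    actions = (-1, 0, 1)
    scores = {a: math.exp(-beta * abs(goal - (pos + a))) for a in actions}
    return scores[action] / sum(scores.values())

def goal_posterior(start, actions, goals, beta=2.0):
    """Posterior over candidate goals given observed moves, with a uniform prior."""
    log_post = {g: 0.0 for g in goals}
    pos = start
    for a in actions:
        # Accumulate the log-likelihood of this move under every candidate goal.
        for g in goals:
            log_post[g] += math.log(action_likelihood(pos, a, g, beta))
        pos += a
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    unnorm = {g: math.exp(lp - m) for g, lp in log_post.items()}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}
```

Three consecutive steps to the right, `goal_posterior(0, [1, 1, 1], goals=[-5, 5])`, put nearly all posterior mass on the goal at 5. Social goals such as "help the other agent" would require goals defined over both agents' states, which this toy omits.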

agent, neural network, us government

arXiv:2103.01933

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Social Events (0.82)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Compositional Networks Enable Systematic Generalization for Grounded Language Understanding

Kuo, Yen-Ling, Katz, Boris, Barbu, Andrei

arXiv.org Artificial Intelligence, Aug-6-2020

Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of neural networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in gSCAN, a recently released dataset that explicitly measures how well a robotic agent is able to interpret novel ideas grounded in vision, e.g., novel pairings of adjectives and nouns. The key principle we employ is compositionality: the compositional structure of networks should reflect the compositional structure of the problem domain they address, while all other parameters and properties are learned end-to-end with weak supervision. We build a general-purpose mechanism that enables robots to generalize their language understanding to compositional domains. Crucially, our base network matches the state-of-the-art performance of prior work, 97% execution accuracy, while also generalizing its knowledge where prior work does not; for example, it achieves 95% accuracy on novel adjective-noun compositions where previous work has 55% average accuracy. Robust language understanding without dramatic failures and without corner cases is critical to building safe and fair robots; we demonstrate the significant role that compositionality can play in achieving that goal.
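The compositionality principle can be illustrated with a deliberately simple sketch: each word maps to its own module, and a phrase is grounded by combining the modules of its words, so an adjective-noun pair never seen together still works. Here the "modules" are hand-written predicates standing in for the learned networks the abstract describes; `Obj`, `WORD_MODULES`, and `ground` are hypothetical names, and gSCAN's relative size adjectives are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    """A scene object with two attributes (hypothetical, simplified)."""
    color: str
    shape: str

# Hypothetical per-word modules: each word is a predicate over one object.
# In a learned system each entry would be a small network trained end-to-end.
WORD_MODULES = {
    "red": lambda o: o.color == "red",
    "blue": lambda o: o.color == "blue",
    "circle": lambda o: o.shape == "circle",
    "square": lambda o: o.shape == "square",
}

def ground(phrase, scene):
    """Return the objects in `scene` that satisfy every word of `phrase`.

    The phrase structure (a conjunction of word modules) mirrors the problem
    structure, so a novel adjective-noun pair needs no new parameters.
    """
    modules = [WORD_MODULES[w] for w in phrase.split()]
    return [o for o in scene if all(m(o) for m in modules)]
```

Even if "red square" never appeared during training, `ground("red square", scene)` only reuses the "red" and "square" modules, which is the kind of systematic generalization that a monolithic network can fail to show.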

artificial intelligence, attention map, natural language

arXiv:2008.02742

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.48)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.97)

ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

Barbu, Andrei, Mayo, David, Alverio, Julian, Luo, William, Wang, Christopher, Gutfreund, Dan, Tenenbaum, Josh, Katz, Boris

Neural Information Processing Systems, Mar-19-2020, 00:30:32 GMT

We collect a large real-world test set, ObjectNet, for object recognition with controls in which object backgrounds, rotations, and imaging viewpoints are randomized. Most scientific experiments have controls, which remove confounds from the data to ensure that subjects cannot perform a task by exploiting trivial correlations. Historically, large machine learning and computer vision datasets have lacked such controls. As a result, models must be fine-tuned for new datasets and perform better on benchmark datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance relative to their performance on other benchmarks, due to the controls for biases.

artificial intelligence, dataset, machine learning

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.65)

Seeing What You're Told: Sentence-Guided Activity Recognition In Video

Siddharth, N., Barbu, Andrei, Siskind, Jeffrey Mark

arXiv.org Artificial Intelligence, May-28-2014

We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, providing a medium not only for top-down and bottom-up integration but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions), expressed as whole sentential descriptions mediated by a grammar, guide the activity-recognition process. Further, we demonstrate the utility and expressiveness of our framework on three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, each obtained simply by using the framework in a different way.

artificial intelligence, natural language, preposition

arXiv:1308.4189

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.46)

Simultaneous Object Detection, Tracking, and Event Recognition

Barbu, Andrei, Michaux, Aaron, Narayanaswamy, Siddharth, Siskind, Jeffrey Mark

arXiv.org Artificial Intelligence, Apr-12-2012

The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitate a general approach to integrating these three components. This supports multidirectional information flow between the components, allowing object detection to influence tracking and event recognition, and event recognition to influence tracking and object detection. The performance of the combination can exceed the performance of the components in isolation. This integration can be performed with linear asymptotic complexity.
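Detection-based tracking of this kind is commonly framed as dynamic programming over per-frame detections, which is one way to see how a cost linear in video length can arise. The sketch below assumes a generic Viterbi-style tracker, not necessarily the authors' exact formulation: it picks one detection per frame to maximize the sum of detection scores plus a hypothetical pairwise `coherence` term between consecutive frames. Positions are 1D numbers purely for brevity.

```python
def viterbi_track(frames, coherence):
    """Pick one detection per frame maximizing detection score plus temporal coherence.

    frames: list over time; each entry is a non-empty list of (position, score).
    coherence: function(prev_position, position) -> float, higher is smoother.
    Cost is O(T * K^2) for T frames with K detections each: linear in T.
    Returns the chosen detection index for each frame.
    """
    best = [score for _, score in frames[0]]
    back = []
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        new_best, pointers = [], []
        for pos, score in cur:
            # Best predecessor detection in the previous frame for this detection.
            i = max(range(len(prev)), key=lambda k: best[k] + coherence(prev[k][0], pos))
            new_best.append(best[i] + coherence(prev[i][0], pos) + score)
            pointers.append(i)
        best = new_best
        back.append(pointers)
    # Backtrack from the best-scoring detection in the last frame.
    j = max(range(len(best)), key=lambda k: best[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]
```

With `coherence = lambda a, b: -abs(a - b)`, a track made of individually weaker but spatially consistent detections can beat one that jumps between the highest-scoring boxes. Event recognition can be folded into the same lattice by adding event-model states to each node, which enlarges K but keeps the cost linear in T.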

artificial intelligence, detection, image understanding

arXiv:1204.2741

Country:

North America > United States > California (0.14)
North America > United States > Indiana > Tippecanoe County (0.14)

Industry: Government > Military (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Seeing Unseeability to See the Unseeable

Narayanaswamy, Siddharth, Barbu, Andrei, Siskind, Jeffrey Mark

arXiv.org Artificial Intelligence, Apr-12-2012

We present a framework that allows an observer to determine occluded portions of a structure by finding the maximum-likelihood estimate of those occluded portions consistent with visible image evidence and a consistency model. Doing this requires determining which portions of the structure are occluded in the first place. Since each process relies on the other, we solve both problems in tandem. We extend our framework to determine the confidence of one's assessment of which portions of an observed structure are occluded, and of the estimate of that occluded structure, by determining the sensitivity of that assessment to potential new observations. We further extend our framework to determine a robotic action whose execution would allow a new observation that would maximally increase one's confidence.

artificial intelligence, log feature, machine learning

arXiv:1204.2801

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Video In Sentences Out

Barbu, Andrei, Bridge, Alexander, Burchill, Zachary, Coroian, Dan, Dickinson, Sven, Fidler, Sanja, Michaux, Aaron, Mussman, Sam, Narayanaswamy, Siddharth, Salvi, Dhaval, Schmidt, Lara, Shangguan, Jiangnan, Siskind, Jeffrey Mark, Waggoner, Jarrell, Wang, Song, Wei, Jinlian, Yin, Yifan, Zhang, Zhiqi

arXiv.org Artificial Intelligence, Apr-12-2012

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
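The mapping from recognized event components to sentence parts can be sketched as a simple template renderer. This assumes the hard part, recognizing the event, tracks, and role assignments, has already produced a structured record; the `Event` fields and the `noun_phrase` and `render` functions are hypothetical illustrations, not the system's actual grammar or interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    """Hypothetical recognized-event record: who did what to whom, where and how."""
    verb: str                                   # action class, already inflected
    agent: str                                  # participant rendered as subject noun
    agent_adjs: List[str] = field(default_factory=list)
    patient: Optional[str] = None               # participant rendered as object noun
    patient_adjs: List[str] = field(default_factory=list)
    adverb: Optional[str] = None                # manner of the action
    pp: Optional[str] = None                    # prepositional phrase, e.g. a spatial relation

def noun_phrase(noun, adjs):
    """Render a definite noun phrase with adjectival modifiers."""
    return " ".join(["the", *adjs, noun])

def render(e):
    """Assemble subject NP, verb, optional object NP, adverb, and PP into a sentence."""
    words = [noun_phrase(e.agent, e.agent_adjs), e.verb]
    if e.patient:
        words.append(noun_phrase(e.patient, e.patient_adjs))
    if e.adverb:
        words.append(e.adverb)
    if e.pp:
        words.append(e.pp)
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."
```

For instance, an event with verb "carried", agent "person" described as "tall", patient "box" described as "red", adverb "slowly", and prepositional phrase "toward the door" renders as "The tall person carried the red box slowly toward the door." A fixed template like this keeps generation trivial so that all of the difficulty sits in recovering the fields from video.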

action class, artificial intelligence, natural language

arXiv:1204.2742

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > Indiana > Tippecanoe County (0.14)

Industry: Government > Military (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
