Baldridge, Jason
MURAL: Multimodal, Multitask Retrieval Across Languages
Jain, Aashi, Guo, Mandy, Srinivasan, Krishna, Chen, Ting, Kudugunta, Sneha, Jia, Chao, Yang, Yinfei, Baldridge, Jason
Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al. PMLR'21)--a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL's performance matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-base improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.
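The multitask dual-encoder objective described above can be pictured with a short sketch. The following PyTorch snippet is an illustration, not MURAL's actual code: it combines two in-batch contrastive losses, one for image-caption pairs and one for translation pairs, with a shared text encoder assumed to produce the caption and translation embeddings; the temperature and task weights are placeholder values.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(a, b, temperature=0.1):
        """In-batch softmax contrastive loss between two sets of embeddings."""
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature                      # (batch, batch) similarity matrix
        targets = torch.arange(a.size(0), device=a.device)    # matching pairs lie on the diagonal
        # Symmetric loss: a->b retrieval and b->a retrieval.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

    def multitask_dual_encoder_loss(image_emb, caption_emb, src_text_emb, tgt_text_emb,
                                    w_image_text=1.0, w_text_text=1.0):
        """Combine the two dual-encoder tasks: image-text matching and translation-pair matching.
        The text encoder producing caption/src/tgt embeddings is assumed to be shared."""
        return (w_image_text * contrastive_loss(image_emb, caption_emb)
                + w_text_text * contrastive_loss(src_text_emb, tgt_text_emb))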
PanGEA: The Panoramic Graph Environment Annotation Toolkit
Ku, Alexander, Anderson, Peter, Pont-Tuset, Jordi, Baldridge, Jason
PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments. PanGEA immerses annotators in a web-based simulation and allows them to move around easily as they speak and/or listen. It includes database and cloud storage integration, plus utilities for automatically aligning recorded speech with manual transcriptions and the virtual pose of the annotators. Out of the box, PanGEA supports two tasks -- collecting navigation instructions and navigation instruction following -- and it could be easily adapted for annotating walking tours, finding and labeling landmarks or objects, and similar tasks. We share best practices learned from using PanGEA in a 20,000 hour annotation effort to collect the Room-Across-Room dataset. We hope that our open-source annotation toolkit and insights will both expedite future data collection efforts and spur innovation on the kinds of grounded language tasks such environments can support.
On the Evaluation of Vision-and-Language Navigation Instructions
Zhao, Ming, Anderson, Peter, Jain, Vihan, Wang, Su, Ku, Alexander, Baldridge, Jason, Ie, Eugene
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions. However, existing instruction generators have not been comprehensively evaluated, and the automatic evaluation metrics used to develop them have not been validated. Using human wayfinders, we show that these generators perform on par with or only slightly better than a template-based generator and far worse than human instructors. Furthermore, we discover that BLEU, ROUGE, METEOR and CIDEr are ineffective for evaluating grounded navigation instructions. To improve instruction evaluation, we propose an instruction-trajectory compatibility model that operates without reference instructions. Our model shows the highest correlation with human wayfinding outcomes when scoring individual instructions. For ranking instruction generation systems, if reference instructions are available we recommend using SPICE.
Text-to-Image Generation Grounded by Fine-Grained User Attention
Koh, Jing Yu, Baldridge, Jason, Lee, Honglak, Yang, Yinfei
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TReCS, a sequential model that exploits this grounding to generate images. TReCS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match descriptions.
Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Ku, Alexander, Anderson, Peter, Patel, Roma, Ie, Eugene, Baldridge, Jason
RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and instructions) than other VLN datasets. It emphasizes the role of language in VLN by addressing known biases in paths and eliciting more references to visible entities. Furthermore, each word in an instruction is time-aligned to the virtual poses of instruction creators and validators. We establish baseline scores for monolingual and multilingual settings and multitask learning when including Room-to-Room annotations (Anderson et al., 2018b). We also provide results for a model that learns from synchronized pose traces by focusing only on portions of the panorama attended to in human demonstrations. The size, scope and detail of RxR dramatically expands the frontier for research.
Figure 1: RxR's instructions are densely grounded to the visual scene by aligning the annotator's virtual pose to their spoken instructions for navigating a path.
Effective and General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping
Magalhaes, Gabriel, Jain, Vihan, Ku, Alexander, Ie, Eugene, Baldridge, Jason
In instruction conditioned navigation, agents interpret natural language and their surroundings to navigate through an environment. Datasets for studying this task typically contain pairs of these instructions and reference trajectories. Yet, most evaluation metrics used thus far fail to properly account for the latter, relying instead on insufficient similarity comparisons. We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long-established method for measuring similarity between two time series, can be used to evaluate navigation agents. To this end, we define the normalized Dynamic Time Warping (nDTW) metric, which softly penalizes deviations from the reference path, is naturally sensitive to the order of the nodes composing each path, is suited for both continuous and graph-based evaluations, and can be efficiently calculated. Further, we define SDTW, which constrains nDTW to only successful paths. We collect human similarity judgments for simulated paths and find that nDTW correlates better with human rankings than all other metrics. We also demonstrate that using nDTW as a reward signal for Reinforcement Learning navigation agents improves their performance on both the Room-to-Room (R2R) and Room-for-Room (R4R) datasets. The R4R results in particular highlight the superiority of SDTW over previous success-constrained metrics.
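To make the metrics concrete, here is a small Python sketch in the spirit of nDTW and SDTW: DTW is computed by dynamic programming over node distances, the cost is normalized by the reference length and a success threshold, and SDTW zeroes out the score when the path does not end near the goal. The Euclidean distance function and the threshold value are assumptions for illustration.

    import numpy as np

    def dtw(query, reference, dist):
        """Classic dynamic-programming DTW cost between two node sequences."""
        n, m = len(query), len(reference)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = dist(query[i - 1], reference[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    def ndtw(query, reference, dist, d_th=3.0):
        """Normalized DTW: maps the DTW cost into (0, 1], higher is better."""
        return float(np.exp(-dtw(query, reference, dist) / (len(reference) * d_th)))

    def sdtw(query, reference, dist, d_th=3.0):
        """Success-weighted nDTW: zero unless the path ends within d_th of the goal."""
        success = dist(query[-1], reference[-1]) <= d_th
        return ndtw(query, reference, dist, d_th) if success else 0.0

    # Example with 2D points and Euclidean distance (illustrative values only).
    euclid = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    ref = [(0, 0), (1, 0), (2, 0), (3, 0)]
    agent = [(0, 0), (1, 1), (2, 0), (3, 0)]
    print(ndtw(agent, ref, euclid), sdtw(agent, ref, euclid))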
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Jain, Vihan, Magalhaes, Gabriel, Ku, Alexander, Vaswani, Ashish, Ie, Eugene, Baldridge, Jason
Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understanding plays in this task, especially because dominant evaluation metrics have focused on goal completion rather than the sequence of actions corresponding to the instructions. Here, we highlight shortcomings of current metrics for the Room-to-Room dataset (Anderson et al., 2018b) and propose a new metric, Coverage weighted by Length Score (CLS). We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.
Figure 1: It's the journey, not just the goal. To give language its due place in VLN, we compose paths in the R2R dataset to create longer, twistier R4R paths (blue). Under commonly used metrics, agents that head straight to the goal (red) are not penalized for ignoring the language instructions: for instance, SPL yields a perfect 1.0 score for the red and only 0.17 for the orange path. In contrast, our proposed CLS metric measures fidelity to the reference path.
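A rough Python sketch of a coverage-weighted length score in the style of CLS is given below: path coverage averages an exponentially decayed distance from each reference node to the predicted path, and is then multiplied by a length-score term comparing the predicted path length against the coverage-scaled reference length. The decay threshold and distance function here are assumptions, not details taken from this abstract.

    import numpy as np

    def path_length(path):
        """Sum of Euclidean edge lengths along a path of points."""
        return sum(np.linalg.norm(np.asarray(b) - np.asarray(a)) for a, b in zip(path, path[1:]))

    def coverage(pred, ref, d_th=3.0):
        """Mean exponentially decayed distance from each reference node to the predicted path."""
        dists = [min(np.linalg.norm(np.asarray(r) - np.asarray(p)) for p in pred) for r in ref]
        return float(np.mean([np.exp(-d / d_th) for d in dists]))

    def cls_score(pred, ref, d_th=3.0):
        """CLS-style metric: path coverage weighted by a length-score term."""
        pc = coverage(pred, ref, d_th)
        expected = pc * path_length(ref)      # expected length given the coverage achieved
        actual = path_length(pred)
        ls = expected / (expected + abs(expected - actual)) if expected > 0 else 0.0
        return pc * ls

    # Illustrative paths as 2D points: a shortcut straight to the goal scores below 1.0.
    ref = [(0, 0), (1, 0), (2, 0), (3, 0)]
    straight_to_goal = [(0, 0), (3, 0)]
    print(cls_score(straight_to_goal, ref))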
Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
Garrette, Dan (University of Texas at Austin) | Dyer, Chris (Carnegie Mellon University) | Baldridge, Jason (University of Texas at Austin) | Smith, Noah A. (Carnegie Mellon University)
Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Categories are selected from a large, recursively-defined set; this leads to high word-to-category ambiguity, which is one of the primary factors that make learning CCG parsers difficult, especially in the face of little data. Previous work has shown that learning sequence models for CCG tagging can be improved by using linguistically-motivated prior probability distributions over potential categories. We extend this approach to the task of learning a CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
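As a toy illustration of a prior that favors simpler categories, the sketch below assigns probability to a CCG category by recursively penalizing each slash application; the stopping and slash-direction probabilities, and the tuple encoding of categories, are assumptions made for the example rather than the paper's actual grammar-informed prior.

    P_ATOM = 0.7   # assumed probability of stopping at an atomic category
    P_FWD = 0.5    # assumed probability of a forward vs. backward slash

    def category_prior(cat):
        """Toy recursive prior over CCG categories: smaller categories get higher mass.
        Categories are atoms (strings) or (result, slash, argument) tuples.
        Unnormalized over the choice of atom; for illustration only."""
        if isinstance(cat, str):               # atomic category, e.g. 'NP'
            return P_ATOM
        result, slash, argument = cat          # complex category, e.g. (('S', '\\', 'NP'), '/', 'NP')
        slash_prob = P_FWD if slash == '/' else 1.0 - P_FWD
        return (1.0 - P_ATOM) * slash_prob * category_prior(result) * category_prior(argument)

    # The transitive-verb category (S\NP)/NP receives far less prior mass than plain NP.
    transitive_verb = (('S', '\\', 'NP'), '/', 'NP')
    print(category_prior('NP'), category_prior(transitive_verb))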
Gazetteer-Independent Toponym Resolution Using Geographic Word Profiles
DeLozier, Grant (The University of Texas at Austin) | Baldridge, Jason (The University of Texas at Austin) | London, Loretta (The University of Texas at Austin)
Toponym resolution, or grounding names of places to their actual locations, is an important problem in analysis of both historical corpora and present-day news and web content. Recent approaches have shifted from rule-based spatial minimization methods to machine learned classifiers that use features of the text surrounding a toponym. Such methods have been shown to be highly effective, but they crucially rely on gazetteers and are unable to handle unknown place names or locations. We address this limitation by modeling the geographic distributions of words over the earth's surface: we calculate the geographic profile of each word based on local spatial statistics over a set of geo-referenced language models. These geo-profiles can be further refined by combining in-domain data with background statistics from Wikipedia. Our resolver computes the overlap of all geo-profiles in a given text span; without using a gazetteer, it performs on par with existing classifiers. When combined with a gazetteer, it achieves state-of-the-art performance for two standard toponym resolution corpora (TR-CoNLL and Civil War). Furthermore, it dramatically improves recall when toponyms are identified by named entity recognizers, which often (correctly) find non-standard variants of toponyms.
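A simplified Python sketch of the profile-overlap idea follows: each word's geographic profile is a distribution over grid cells estimated from geo-referenced documents, and a text span is resolved by overlapping (multiplying) the profiles of its words and choosing the strongest cell. The grid resolution, smoothing constant, and use of raw relative frequencies are simplifying assumptions; the paper's local spatial statistics are more involved.

    import numpy as np
    from collections import defaultdict

    def build_profiles(docs, grid=1.0):
        """docs: iterable of (latitude, longitude, tokens). Returns word -> {cell: probability}."""
        counts = defaultdict(lambda: defaultdict(float))
        for lat, lon, tokens in docs:
            cell = (int(lat // grid), int(lon // grid))
            for tok in tokens:
                counts[tok][cell] += 1.0
        profiles = {}
        for word, cells in counts.items():
            total = sum(cells.values())
            profiles[word] = {c: n / total for c, n in cells.items()}
        return profiles

    def resolve_span(tokens, profiles, smoothing=1e-6):
        """Overlap (multiply) the geo-profiles of all known tokens; return the best grid cell."""
        cells = set()
        for tok in tokens:
            cells |= set(profiles.get(tok, {}))
        if not cells:
            return None
        best, best_score = None, -np.inf
        for cell in cells:
            score = sum(np.log(profiles[tok].get(cell, 0.0) + smoothing)
                        for tok in tokens if tok in profiles)
            if score > best_score:
                best, best_score = cell, score
        return best

    # Tiny illustrative corpus: two geo-referenced snippets.
    docs = [(30.3, -97.7, ["austin", "river", "capitol"]),
            (51.5, -0.1, ["london", "river", "thames"])]
    profiles = build_profiles(docs)
    print(resolve_span(["river", "capitol"], profiles))   # -> grid cell near Austin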