
Collaborating Authors

 Peng, Nanyun


Masked Path Modeling for Vision-and-Language Navigation

arXiv.org Artificial Intelligence

Vision-and-language navigation (VLN) agents are trained to navigate in real-world environments by following natural language instructions. A major challenge in VLN is the limited availability of training data, which hinders the models' ability to generalize effectively. Previous approaches have attempted to address this issue by introducing additional supervision during training, often requiring costly human-annotated data that restricts scalability. In this paper, we introduce a masked path modeling (MPM) objective, which pretrains an agent on self-collected data for downstream navigation tasks. Our method allows the agent to explore navigation environments actively, without a specific goal, and to collect the paths it traverses. We then train the agent on this collected data to reconstruct the original path given a randomly masked subpath. In this way, the agent accumulates a diverse and substantial amount of data while learning conditional action generation. To evaluate the effectiveness of our technique, we conduct experiments on various VLN datasets and demonstrate the versatility of MPM across different levels of instruction complexity. Our results show significant improvements in success rate, with gains of 1.32%, 1.05%, and 1.19% on the val-unseen split of the Room-to-Room, Room-for-Room, and Room-across-Room datasets, respectively. Furthermore, our analysis highlights the potential for additional improvements when the agent is allowed to explore unseen environments prior to testing.
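
The abstract does not spell out how the masking is done; as a minimal sketch of the masked-subpath setup, the function below (hypothetical names, toy viewpoint IDs) hides a random contiguous span of a self-collected path so that an agent could be trained to reconstruct the hidden span from the rest.

```python
import random

MASK = "<MASK>"

def mask_subpath(path, mask_ratio=0.5, rng=random):
    """Replace a random contiguous subpath with a single mask token.

    `path` is a list of viewpoint identifiers collected during goal-free
    exploration; the masked-out span becomes the reconstruction target.
    """
    span_len = max(1, int(len(path) * mask_ratio))
    start = rng.randrange(0, len(path) - span_len + 1)
    target = path[start:start + span_len]
    masked = path[:start] + [MASK] + path[start + span_len:]
    return masked, target

# Toy example: a path of viewpoint IDs gathered by the exploring agent.
path = ["v1", "v4", "v9", "v2", "v7", "v5"]
masked_input, reconstruction_target = mask_subpath(path, mask_ratio=0.4)
print(masked_input)            # e.g. ['v1', '<MASK>', 'v2', 'v7', 'v5']
print(reconstruction_target)   # e.g. ['v4', 'v9']
```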


Sequentially Controlled Text Generation

arXiv.org Artificial Intelligence

While GPT-2 generates sentences that are remarkably human-like, longer documents can ramble and do not follow human-like writing structure. We study the problem of imposing structure on long-range text. We propose a novel controlled text generation task, sequentially controlled text generation, and identify a dataset, NewsDiscourse, as a starting point for this task. We develop a sequentially controlled text generation pipeline with generation and editing components. We test different degrees of structural awareness and show that, in general, more structural awareness results in higher control accuracy, grammaticality, coherence, and topicality, approaching human-level writing performance.
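
As a rough illustration of per-sentence control (not the paper's actual pipeline), the sketch below conditions each generation step on the running context plus a discourse-role control code; the tags and the `generate_sentence` callable are stand-ins.

```python
from typing import Callable, List

# Hypothetical discourse-role control codes, one per sentence to be generated
# (labels loosely follow news-discourse schemes; exact tags are assumptions).
STRUCTURE = ["[MAIN-EVENT]", "[CONSEQUENCE]", "[PREVIOUS-EVENT]", "[EVALUATION]"]

def sequentially_controlled_generate(
    prompt: str,
    structure: List[str],
    generate_sentence: Callable[[str], str],
) -> str:
    """Generate a document one sentence at a time, conditioning each step on
    the running context plus the control code for the next sentence."""
    context = prompt
    for code in structure:
        sentence = generate_sentence(f"{context} {code}")
        context = f"{context} {sentence}"
    return context

# Stand-in generator so the sketch runs; a real pipeline would call an LM here.
dummy_lm = lambda conditioned_prompt: "(generated sentence)"
print(sequentially_controlled_generate("Storm hits coastal town.", STRUCTURE, dummy_lm))
```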


Generalized Decoding for Pixel, Image, and Language

arXiv.org Artificial Intelligence

We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, to decode different pixel-level and token-level outputs in the same semantic space. With this novel design, X-Decoder is the first work to provide a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, our design enables seamless interactions across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) better or competitive finetuned performance compared with other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.
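
To make the two-query-type idea concrete, here is a minimal PyTorch sketch of a shared decoder that consumes learnable non-semantic queries (decoded into mask logits) and text-derived semantic queries (decoded into token logits). All dimensions, layer counts, and output heads are assumptions for illustration, not the actual X-Decoder implementation.

```python
import torch
import torch.nn as nn

class ToyXDecoder(nn.Module):
    """Toy shared decoder with two query types producing pixel-level and
    token-level outputs from the same decoded representations."""

    def __init__(self, d_model=256, n_latent=16, vocab=1000):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.latent_queries = nn.Parameter(torch.randn(n_latent, d_model))
        self.vocab_head = nn.Linear(d_model, vocab)

    def forward(self, image_feats, text_queries):
        # image_feats: (B, HW, D) pixel features; text_queries: (B, T, D)
        b = image_feats.size(0)
        latent = self.latent_queries.unsqueeze(0).expand(b, -1, -1)
        queries = torch.cat([latent, text_queries], dim=1)
        decoded = self.decoder(queries, memory=image_feats)
        n = self.latent_queries.size(0)
        # Pixel-level output: mask logits from latent queries vs. pixel features.
        mask_logits = torch.einsum("bqd,bpd->bqp", decoded[:, :n], image_feats)
        # Token-level output: vocabulary logits from the semantic queries.
        token_logits = self.vocab_head(decoded[:, n:])
        return mask_logits, token_logits

model = ToyXDecoder()
masks, tokens = model(torch.randn(2, 64, 256), torch.randn(2, 5, 256))
print(masks.shape, tokens.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 5, 1000])
```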


Towards Robust NLG Bias Evaluation with Syntactically-diverse Prompts

arXiv.org Artificial Intelligence

We present a robust methodology for evaluating biases in natural language generation (NLG) systems. Previous works use fixed hand-crafted prefix templates that mention various demographic groups to prompt models to generate continuations for bias analysis. These fixed prefix templates may themselves be specific in style or linguistic structure, which can lead to unreliable fairness conclusions that are not representative of the general trends across prompts of varying tone. To study this problem, we paraphrase the prompts with different syntactic structures and use them to evaluate demographic bias in NLG systems. Our results suggest similar overall bias trends, but some syntactic structures lead to conclusions that contradict past works. We show that our methodology is more robust, and that some syntactic structures prompt more toxic content while others prompt less biased generation. This highlights the importance of not relying on a fixed syntactic structure and of using tone-invariant prompts. Introducing syntactically diverse prompts yields more robust NLG bias evaluation.
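
The evaluation protocol can be sketched in a few lines: score continuations for each demographic group under several syntactic paraphrases of the same prefix and aggregate, rather than trusting a single template. The paraphrases, group labels, and scorer below are stand-ins, not the paper's actual prompts or classifier.

```python
from statistics import mean

# Hypothetical paraphrases of one hand-crafted prefix template; a real study
# would produce these with a syntactically controlled paraphrase model.
PARAPHRASES = [
    "The {group} person worked as",
    "Working as a professional, the {group} person",
    "As for their job, the {group} person was",
]
GROUPS = ["GroupA", "GroupB"]  # stand-ins for demographic descriptors

def bias_by_group(score_continuation, groups=GROUPS, templates=PARAPHRASES):
    """Average a bias/toxicity score over syntactically diverse prompts,
    so conclusions do not hinge on a single prefix structure."""
    return {
        g: mean(score_continuation(t.format(group=g)) for t in templates)
        for g in groups
    }

# Stand-in scorer so the sketch runs; a real pipeline would generate a
# continuation with an NLG model and score it with a toxicity classifier.
dummy_score = lambda prompt: float(len(prompt) % 3) / 10.0
print(bias_by_group(dummy_score))
```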


A Moral- and Event-Centric Inspection of Gender Bias in Fairy Tales at a Large Scale

arXiv.org Artificial Intelligence

Fairy tales are a common resource for young children to learn a language or understand how a society works. However, gender bias in this literature, e.g., stereotypical gender roles, may cause harm and skew children's world view. Instead of relying on decades of qualitative and manual analysis of gender bias in fairy tales, we computationally analyze gender bias in a fairy tale dataset containing 624 fairy tales from 7 different cultures. We specifically examine gender differences in terms of moral foundations, which are measures of human morality, and events, which reveal the human activities associated with each character. We find that the number of male characters is twice that of female characters, showing a disproportionate gender representation. Our analysis further reveals stereotypical portrayals of both male and female characters in terms of moral foundations and events. Female characters turn out to be more associated with care-, loyalty-, and sanctity-related moral words, while male characters are more associated with fairness- and authority-related moral words. Female characters' events are often about emotion (e.g., weep), appearance (e.g., comb), and household work (e.g., bake), while male characters' events are more about profession (e.g., hunt), violence (e.g., destroy), and justice (e.g., judge). Gender bias in terms of moral foundations also differs clearly across cultures. For example, female characters are more associated with care and sanctity in high uncertainty-avoidance cultures, which are less open to change and unpredictability. Based on these results, we propose implications for children's literature and early literacy research.
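
As a small illustration of the kind of aggregation involved (not the paper's actual lexicon or extraction pipeline), the sketch below tallies moral-foundation word counts per gender from per-character word lists.

```python
from collections import Counter

# Toy stand-ins; the paper uses a full moral foundations lexicon and an
# event-extraction pipeline rather than these hand-picked word lists.
MORAL_LEXICON = {
    "care": {"comfort", "protect", "weep"},
    "authority": {"obey", "judge", "command"},
}

def moral_profile(character_words, character_gender):
    """Tally moral-foundation word counts per gender, mirroring the
    aggregate male-versus-female comparison described in the abstract."""
    counts = {"female": Counter(), "male": Counter()}
    for name, words in character_words.items():
        gender = character_gender[name]
        for foundation, lexicon in MORAL_LEXICON.items():
            counts[gender][foundation] += sum(w in lexicon for w in words)
    return counts

words = {"Ella": ["weep", "comfort", "bake"], "King": ["command", "judge"]}
genders = {"Ella": "female", "King": "male"}
print(moral_profile(words, genders))
```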


Character-Centric Story Visualization via Visual Planning and Token Alignment

arXiv.org Artificial Intelligence

Story visualization advances traditional text-to-image generation by enabling multiple image generation based on a complete story. This task requires machines to 1) understand long text inputs and 2) produce a globally consistent image sequence that illustrates the contents of the story. A key challenge of consistent story visualization is to preserve the characters that are essential to the story. To tackle this challenge, we propose to adapt a recent work that augments Vector-Quantized Variational Autoencoders (VQ-VAE) with a text-to-visual-token (transformer) architecture. Specifically, we modify the text-to-visual-token module into a two-stage framework: 1) a character token planning model that predicts the visual tokens for characters only; and 2) a visual token completion model that generates the remaining visual token sequence, which is sent to the VQ-VAE for finalizing image generation. To encourage characters to appear in the images, we further train the two-stage framework with a character-token alignment objective. Extensive experiments and evaluations demonstrate that the proposed method excels at preserving characters and can produce higher-quality image sequences than strong baselines. Code can be found at https://github.com/sairin1202/VP-CSV
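
The two-stage flow can be summarized as a simple pipeline: plan character tokens first, then complete the remaining visual tokens before decoding. The sketch below uses stand-in callables for the two stages (the real stages are transformer models and the combination details differ), purely to show the data flow.

```python
from typing import Callable, List

def two_stage_visual_tokens(
    story_text: str,
    plan_character_tokens: Callable[[str], List[int]],
    complete_visual_tokens: Callable[[str, List[int]], List[int]],
) -> List[int]:
    """Sketch of the two-stage token pipeline: first predict only the
    character visual tokens, then fill in the remaining tokens before the
    full sequence is handed to a VQ-VAE decoder for image generation."""
    character_tokens = plan_character_tokens(story_text)                   # stage 1
    full_sequence = complete_visual_tokens(story_text, character_tokens)  # stage 2
    return full_sequence

# Stand-in models so the sketch runs end to end.
plan = lambda text: [7, 7, 3]                                # character tokens only
complete = lambda text, char_toks: char_toks + [0, 1, 2, 5]  # rest of the token grid
print(two_stage_visual_tokens("A rabbit visits a turtle.", plan, complete))
```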


HyperExpan: Taxonomy Expansion with Hyperbolic Representation Learning

arXiv.org Artificial Intelligence

Taxonomies are valuable resources for many applications, but their limited coverage, due to the expensive manual curation process, hinders their general applicability. Prior works attempt to automatically expand existing taxonomies to improve their coverage by learning concept embeddings in Euclidean space, even though taxonomies, being inherently hierarchical, more naturally align with the geometric properties of a hyperbolic space. In this paper, we present HyperExpan, a taxonomy expansion algorithm that seeks to preserve the structure of a taxonomy in a more expressive hyperbolic embedding space and learns to represent concepts and their relations with a Hyperbolic Graph Neural Network (HGNN). Specifically, HyperExpan leverages position embeddings to exploit the structure of existing taxonomies and characterizes concept profile information to support inference on concepts unseen during training. Experiments show that HyperExpan outperforms baseline models that learn representations in a Euclidean feature space and achieves state-of-the-art performance on taxonomy expansion benchmarks.
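
For intuition about why hyperbolic space suits hierarchies, the sketch below computes distances in the Poincaré ball and ranks candidate attachment points for a new concept by distance. This is a simplification: HyperExpan scores attachments with a trained HGNN matching model, not raw distance, and the toy embeddings are made up.

```python
import math

def poincare_distance(u, v):
    """Distance in the Poincare ball, the hyperbolic geometry commonly used
    for hierarchy-aware concept embeddings."""
    sq = lambda x: sum(xi * xi for xi in x)
    diff = sq([ui - vi for ui, vi in zip(u, v)])
    denom = (1.0 - sq(u)) * (1.0 - sq(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

def rank_anchor_parents(query, anchors):
    """Rank existing taxonomy concepts as attachment points for a new
    concept by hyperbolic distance (smaller = better candidate parent)."""
    return sorted(anchors, key=lambda name_vec: poincare_distance(query, name_vec[1]))

anchors = [("animal", [0.05, 0.02]), ("mammal", [0.30, 0.10]), ("tool", [-0.40, 0.35])]
query_concept = [0.33, 0.12]  # toy embedding for an unseen concept, e.g. "ferret"
print([name for name, _ in rank_anchor_parents(query_concept, anchors)])
```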


Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

arXiv.org Artificial Intelligence

Commonsense is defined as knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic location and are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art vision-and-language models, VisualBERT and ViLBERT, trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models generalize to answering the questions in GD-VCR. We find that the performance of both models on non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than on Western regions. We analyze the reasons behind this performance disparity and find that the gap is larger on QA pairs that 1) concern culture-related scenarios, e.g., weddings, religious activities, and festivals; and 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. The dataset and code are released at https://github.com/WadeYin9712/GD-VCR.
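
The core comparison is a per-region breakdown of QA accuracy; a minimal sketch of that bookkeeping (with made-up predictions) is below.

```python
from collections import defaultdict

def accuracy_by_region(predictions):
    """Group VCR-style QA predictions by the region tag of each example and
    report per-region accuracy, the comparison used to expose the gap
    between Western and non-Western examples."""
    correct, total = defaultdict(int), defaultdict(int)
    for region, is_correct in predictions:
        total[region] += 1
        correct[region] += int(is_correct)
    return {r: correct[r] / total[r] for r in total}

# Toy predictions: (region, model_answer_was_correct)
preds = [("Western", True), ("Western", True), ("East Asia", False),
         ("South Asia", True), ("Africa", False), ("East Asia", True)]
print(accuracy_by_region(preds))
```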


DEGREE: A Data-Efficient Generative Event Extraction Model

arXiv.org Artificial Intelligence

Event extraction (EE) aims to identify structured events, including event triggers and their corresponding arguments, from unstructured text. Most existing works rely on a large number of labeled instances to train models, while labeled data can be expensive to obtain. In this work, we present a data-efficient event extraction method that formulates event extraction as a natural language generation problem. This formulation allows us to inject knowledge of label semantics, event structure, and output dependencies into the model. Given a passage and an event type, our model learns to summarize the passage into a templated sentence with a predefined structure. The template is event-type-specific, manually created, and contains event trigger and argument information. Lastly, a rule-based algorithm derives the trigger and argument predictions from the generated sentence. Our method inherently enjoys the following benefits: (1) the pretraining of generative language models helps incorporate the semantics of the labels for generative EE; (2) the autoregressive generation process and our end-to-end design for extracting triggers and arguments force the model to capture the dependencies among the output triggers and their arguments; and (3) the predefined templates form concrete yet flexible rules that hint the model toward the valid patterns for each event type, reducing its burden of learning structures from the data. Empirical results show that our model achieves superior performance over strong baselines on EE tasks in the low-data regime and achieves competitive results with the current state of the art when more data becomes available.
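
The rule-based final step lends itself to a short sketch: align the generated sentence against the event-type template and read the arguments off the filled slots. The template, placeholder names, and example output below are hypothetical and only in the spirit of DEGREE's type-specific templates.

```python
import re

# Hypothetical event template; the real DEGREE templates and slot names differ.
ATTACK_TEMPLATE = "somebody was attacked by some attacker using some instrument."

def parse_generated(generated: str):
    """Rule-based step: recover argument predictions by matching the
    generated sentence against the template's placeholder slots."""
    pattern = r"(?P<victim>.+) was attacked by (?P<attacker>.+) using (?P<instrument>.+)\."
    m = re.match(pattern, generated.strip())
    if m is None:
        return {}
    # Slots left as generic placeholders (e.g. "somebody") mean "not found".
    return {k: v for k, v in m.groupdict().items() if not v.startswith("some")}

# The generative model is trained to rewrite the passage into the template;
# here one plausible output is hard-coded to show the parsing step.
generated = "the mayor was attacked by rebels using rockets."
print(parse_generated(generated))
# {'victim': 'the mayor', 'attacker': 'rebels', 'instrument': 'rockets'}
```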


HypoGen: Hyperbole Generation with Commonsense and Counterfactual Knowledge

arXiv.org Artificial Intelligence

A hyperbole is an intentional and creative exaggeration not to be taken literally. Despite its ubiquity in daily life, computational exploration of hyperbole is scarce. In this paper, we tackle the under-explored and challenging task of sentence-level hyperbole generation. We start with a representative syntactic pattern for intensification and systematically study the semantic (commonsense and counterfactual) relationships between the components of such hyperboles. Next, we leverage the COMeT and reverse COMeT models to perform commonsense and counterfactual inference. We then generate multiple hyperbole candidates based on our findings from the pattern and train neural classifiers to rank and select high-quality hyperboles. Automatic and human evaluations show that our generation method is able to generate hyperboles creatively, with high success rates and intensity scores.
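
As a rough sketch of the generate-then-rank flow, the code below fills an "A is so B that C" intensification pattern with inferred consequences and keeps the highest-scoring candidates. The inference and scoring callables are stand-ins for the COMeT / reverse-COMeT calls and the trained ranking classifiers, and the pattern wording is an assumption.

```python
from typing import Callable, List

def generate_hyperboles(
    subject: str,
    attribute: str,
    infer_effects: Callable[[str, str], List[str]],
    score: Callable[[str], float],
    top_k: int = 2,
) -> List[str]:
    """Fill the 'so ... that ...' intensification pattern with inferred
    (commonsense or counterfactual) consequences, then keep the
    highest-scoring candidates."""
    candidates = [f"{subject} is so {attribute} that {c}"
                  for c in infer_effects(subject, attribute)]
    return sorted(candidates, key=score, reverse=True)[:top_k]

# Stand-ins so the sketch runs end to end.
dummy_infer = lambda s, a: ["it blocks out the sun", "people notice it",
                            "maps list it as a mountain"]
dummy_score = lambda sent: len(sent) * 0.01  # real ranker: neural classifiers
print(generate_hyperboles("the pile of laundry", "tall", dummy_infer, dummy_score))
```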