AITopics | Ma, Shuang

Collaborating Authors

Ma, Shuang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Feng, Shengyu, Kong, Xiang, Ma, Shuang, Zhang, Aonan, Yin, Dong, Wang, Chong, Pang, Ruoming, Yang, Yiming

arXiv.org Artificial IntelligenceOct-9-2024

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.

importance weight, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.0192

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.85)

Add feedback

Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

Zheng, Ruijie, Liang, Yongyuan, Wang, Xiyao, Ma, Shuang, Daumé, Hal III, Xu, Huazhe, Langford, John, Palanisamy, Praveen, Basu, Kalyan Shankar, Huang, Furong

arXiv.org Artificial IntelligenceFeb-11-2024

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial in significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation in a diverse set of continuous control benchmarks including Deepmind Control Suite, MetaWorld, and LIBERO demonstrate Premier-TACO's effectiveness in pretraining visual representations, significantly enhancing few-shot imitation learning of novel tasks. Our code, pretraining data, as well as pretrained model checkpoints will be released at https://github.com/PremierTACO/premier-taco.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2402.06187

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

Zheng, Ruijie, Wang, Xiyao, Sun, Yanchao, Ma, Shuang, Zhao, Jieyu, Xu, Huazhe, Daumé, Hal III, Huang, Furong

arXiv.org Artificial IntelligenceOct-26-2023

Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces and thus overlook the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven COntrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations for agents. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. Theoretically, TACO can be shown to learn state and action representations that encompass sufficient information for control, thereby improving sample efficiency. For online RL, TACO achieves 40% performance boost after one million environment interaction steps on average across nine challenging visual continuous control tasks from Deepmind Control Suite. In addition, we show that TACO can also serve as a plug-and-play module adding to existing offline visual RL methods to establish the new state-of-the-art performance for offline visual RL across offline datasets with varying quality.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2306.13229

Country:

North America > United States > Maryland (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Government > Military (0.93)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training

Wei, Yao, Sun, Yanchao, Zheng, Ruijie, Vemprala, Sai, Bonatti, Rogerio, Chen, Shuhang, Madaan, Ratnesh, Ba, Zhongjie, Kapoor, Ashish, Ma, Shuang

arXiv.org Artificial IntelligenceOct-9-2023

We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act in the world. The model first learns fundamental common knowledge through a self-supervised objective tailored for control tasks and then learns how to make decisions based on different contexts through imitating behaviors conditioned on given prompts. DualMind can handle tasks across domains, scenes, and embodiments using just a single set of model weights and can execute zero-shot prompting without requiring task-specific fine-tuning. We evaluate DualMind on MetaWorld and Habitat through extensive experiments and demonstrate its superior generalizability compared to previous techniques, outperforming other generalist agents by over 50$\%$ and 70$\%$ on Habitat and MetaWorld, respectively. On the 45 tasks in MetaWorld, DualMind achieves over 30 tasks at a 90$\%$ success rate.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2307.07909

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

SMART: Self-supervised Multi-task pretrAining with contRol Transformers

Sun, Yanchao, Ma, Shuang, Madaan, Ratnesh, Bonatti, Rogerio, Huang, Furong, Kapoor, Ashish

arXiv.org Artificial IntelligenceJan-24-2023

Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels. When it comes to sequential decision-making tasks, however, it is difficult to properly design such a pretraining approach that can cope with both high-dimensional perceptual information and the complexity of sequential control over long interaction horizons. The challenge becomes combinatorially more complex if we want to pretrain representations amenable to a large variety of tasks. To tackle this problem, in this work, we formulate a general pretraining-finetuning pipeline for sequential decision making, under which we propose a generic pretraining framework \textit{Self-supervised Multi-task pretrAining with contRol Transformer (SMART)}. By systematically investigating pretraining regimes, we carefully design a Control Transformer (CT) coupled with a novel control-centric pretraining objective in a self-supervised manner. SMART encourages the representation to capture the common essential information relevant to short-term control and long-term control, which is transferrable across tasks. We show by extensive experiments in DeepMind Control Suite that SMART significantly improves the learning efficiency among seen and unseen downstream tasks and domains under different learning scenarios including Imitation Learning (IL) and Reinforcement Learning (RL). Benefiting from the proposed control-centric objective, SMART is resilient to distribution shift between pretraining and finetuning, and even works well with low-quality pretraining datasets that are randomly collected.

large language model, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2301.09816

Country: North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning

McDuff, Daniel, Song, Yale, Lee, Jiyoung, Vineet, Vibhav, Vemprala, Sai, Gyde, Nicholas, Salman, Hadi, Ma, Shuang, Sohn, Kwanghoon, Kapoor, Ashish

arXiv.org Artificial IntelligenceJun-24-2021

The ability to perform causal and counterfactual reasoning are central properties of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state-of-the-art in this domain, by providing the ability to systematically vary parameters (e.g., confounders) and generate examples of the outcomes in the case of counterfactual scenarios. However, simulating complex temporal causal events in multi-agent scenarios, such as those that exist in driving and vehicle navigation, is challenging. To help address this, we present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning in the safety-critical context. A core component of our work is to introduce \textit{agency}, such that it is simple to define and create complex scenarios using high-level definitions. The vehicles then operate with agency to complete these objectives, meaning low-level behaviors need only be controlled if necessary. We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment. Finally, we highlight challenges and opportunities for future work.

artificial intelligence, causal discovery and reasoning, complex simulation, (1 more...)

arXiv.org Artificial Intelligence

2106.13364

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence (0.53)

Add feedback

Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

Whitehill, Matt, Ma, Shuang, McDuff, Daniel, Song, Yale

arXiv.org Machine LearningOct-25-2019

Current multi-reference style transfer models for Text-to-Speech (TTS) perform sub-optimally on disjoints datasets, where one dataset contains only a single style class for one of the style dimensions. These models generally fail to produce style transfer for the dimension that is underrepresented in the dataset. In this paper, we propose an adversarial cycle consistency training scheme with paired and unpaired triplets to ensure the use of information from all style dimensions. During training, we incorporate unpaired triplets with randomly selected reference audio samples and encourage the synthesized speech to preserve the appropriate styles using adversarial cycle consistency. We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion. This results in a 78% improvement in style transfer (based on emotion classification) with minimal reduction in fidelity and naturalness. In subjective evaluations our method was consistently rated as closer to the reference style than the baseline. Synthesized speech samples are available at: https://sites.google.com/view/adv-cycle-consistent-tts

artificial intelligence, speech recognition, style dimension, (16 more...)

arXiv.org Machine Learning

1910.11958

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

Add feedback