Qu, Yuxiao
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Qu, Yuxiao, Yang, Matthew Y. R., Setlur, Amrith, Tunstall, Lewis, Beeching, Edward Emanuel, Salakhutdinov, Ruslan, Kumar, Aviral
Training models to effectively use test-time compute is crucial for improving the reasoning performance of LLMs. Current methods mostly do so via fine-tuning on search traces or running RL with a 0/1 outcome reward, but do these approaches use test-time compute efficiently, and would they continue to scale as the budget grows? In this paper, we try to answer these questions. We formalize the problem of optimizing test-time compute as a meta-reinforcement learning (RL) problem, which provides a principled perspective on spending test-time compute. This perspective enables us to view the long output stream from the LLM as consisting of several episodes run at test time and leads us to use a notion of cumulative regret over output tokens as a way to measure the efficacy of test-time compute. Akin to how RL algorithms can best trade off exploration and exploitation over training, minimizing cumulative regret would also provide the best balance between exploration and exploitation in the token stream. While we show that state-of-the-art models do not minimize regret, one can do so by maximizing a dense reward bonus in conjunction with the 0/1 outcome reward in RL. This bonus is the "progress" made by each subsequent block in the output stream, quantified by the change in the likelihood of eventual success. Using these insights, we develop Meta Reinforcement Fine-Tuning, or MRT, a new class of fine-tuning methods for optimizing test-time compute. MRT leads to a 2-3x relative gain in performance and roughly a 1.5x gain in token efficiency for math reasoning compared to outcome-reward RL.
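The "progress" bonus described in the abstract can be made concrete with a minimal sketch: the bonus for each block is the change in the estimated probability of eventually producing a correct answer. The function name and the use of a plain list of probability estimates are illustrative assumptions, not the paper's implementation; in practice the probabilities would be estimated, e.g., by sampling rollouts from each prefix.

```python
# Hedged sketch of the dense "progress" reward described above.
# success_prob[j] = estimated P(correct final answer | blocks 0..j emitted);
# the first entry is the estimate before any block is emitted.
def progress_bonuses(success_prob):
    """Per-block bonus = change in estimated likelihood of eventual success."""
    return [curr - prev for prev, curr in zip(success_prob, success_prob[1:])]

# Toy trajectory: each reasoning block raises the chance of a correct answer.
probs = [0.0, 0.25, 0.5, 1.0]
print(progress_bonuses(probs))  # [0.25, 0.25, 0.5]
```

A block that wastes tokens (no change in success likelihood) earns zero bonus, which is what ties this reward to the cumulative-regret view of spending test-time compute.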
Harnessing Webpage UIs for Text-Rich Visual Understanding
Liu, Junpeng, Ou, Tianyue, Song, Yifan, Qu, Yuxiao, Lam, Wai, Xiong, Chenyan, Chen, Wenhu, Neubig, Graham, Yue, Xiang
Text-rich visual understanding, the ability to process environments where dense textual content is integrated with visuals, is crucial for multimodal large language models (MLLMs) to interact effectively with structured environments. To enhance this capability, we propose synthesizing general multimodal instructions from webpage UIs using text-based large language models (LLMs). Despite lacking direct visual input, text-based LLMs are able to process structured text representations from webpage accessibility trees. These instructions are then paired with UI screenshots to train multimodal models. We introduce MultiUI, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on MultiUI not only excel in web UI tasks (achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in element accuracy on the web agent dataset Mind2Web) but also generalize surprisingly well to non-web UI tasks and even to non-UI domains, such as document understanding, OCR, and chart interpretation. These results highlight the broad applicability of web UI data for advancing text-rich visual understanding across various scenarios.
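The synthesis pipeline the abstract describes, a text-only LLM reading an accessibility tree and the resulting instruction being paired with a screenshot, can be sketched as follows. `call_text_llm` and the sample schema are placeholder assumptions, not MultiUI's actual interface.

```python
# Hedged sketch: synthesize a multimodal training sample from a webpage's
# accessibility tree (text) plus its screenshot (image), per the idea above.
def synthesize_sample(accessibility_tree, screenshot_path, call_text_llm):
    # The text LLM never sees pixels; it reads only the structured tree.
    qa = call_text_llm(
        "Write one instruction and its answer about this UI:\n" + accessibility_tree
    )
    # Pairing the text-derived QA with the screenshot yields a multimodal sample.
    return {"image": screenshot_path, "conversation": qa}

# Stubbed LLM call keeps the example self-contained.
stub_llm = lambda prompt: {"instruction": "What does the button say?", "answer": "Submit"}
sample = synthesize_sample("button 'Submit'", "page_001.png", stub_llm)
print(sample["conversation"]["answer"])  # Submit
```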
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
Corrado, Nicholas E., Qu, Yuxiao, Balis, John U., Labiosa, Adam, Hanna, Josiah P.
Learning from demonstration (LfD) is a popular technique that uses expert demonstrations to learn robot control policies. However, the difficulty in acquiring expert-quality demonstrations limits the applicability of LfD methods: real-world data collection is often costly, and the quality of the demonstrations depends greatly on the demonstrator's abilities and safety concerns. A number of works have leveraged data augmentation (DA) to inexpensively generate additional demonstration data, but most DA works generate augmented data in a random fashion and ultimately produce highly suboptimal data. In this work, we propose Guided Data Augmentation (GuDA), a human-guided DA framework that generates expert-quality augmented data. The key insight of GuDA is that while it may be difficult to demonstrate the sequence of actions required to produce expert data, a user can often easily identify when an augmented trajectory segment represents task progress. Thus, the user can impose a series of simple rules on the DA process to automatically generate augmented samples that approximate expert behavior. To extract a policy from GuDA, we use off-the-shelf offline reinforcement learning and behavior cloning algorithms. We evaluate GuDA on a physical robot soccer task as well as simulated D4RL navigation tasks, a simulated autonomous driving task, and a simulated soccer task. Empirically, we find that GuDA enables learning from a small set of potentially suboptimal demonstrations and substantially outperforms a DA strategy that samples augmented data randomly.
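The key insight above, that a user can easily judge whether an augmented segment represents task progress even when demonstrating expert behavior is hard, can be sketched as a filter over random augmentations. The toy 1-D task, the function names, and the translation augmentation are all illustrative assumptions, not GuDA's implementation.

```python
import random

# Hedged sketch of guided data augmentation: randomly augment trajectory
# segments, then keep only those a user-supplied rule judges as task progress.
def random_augment(segment, rng):
    # Toy augmentation: translate a 1-D trajectory segment by a random offset.
    offset = rng.uniform(-1.0, 1.0)
    return [(s + offset, a, s2 + offset) for (s, a, s2) in segment]

def guided_augment(segments, makes_progress, n_samples, seed=0):
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_samples:
        aug = random_augment(rng.choice(segments), rng)
        if makes_progress(aug):  # the user's simple rule replaces expert labeling
            kept.append(aug)
    return kept

# Toy task: the goal is state 0, so "progress" means moving closer to it.
segments = [[(2.0, -0.5, 1.5)], [(1.0, 0.5, 1.5)]]
progress_rule = lambda seg: abs(seg[-1][2]) < abs(seg[0][0])
data = guided_augment(segments, progress_rule, n_samples=5)
assert all(progress_rule(seg) for seg in data)
```

The resulting dataset approximates expert behavior by construction and can be handed directly to an off-the-shelf offline RL or behavior cloning algorithm, as the abstract describes.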
FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows
Qu, Yuxiao, Rao, Jinmeng, Gao, Song, Zhang, Qianheng, Chao, Wei-Lun, Su, Yu, Miller, Michelle, Morales, Alfonso, Huber, Patrick
Tackling increasing food insecurity is a global imperative. However, the complexity of food supply networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenges and enhance the analysis of the geospatial resilience of multicommodity food flow networks, which are one type of spatial network. FLEE-GNN addresses the limitations of current methodologies, such as entropy-based methods, in terms of generalizability, scalability, and data privacy. It combines the robustness and adaptability of graph neural networks with the privacy-conscious and decentralized aspects of federated learning for food supply network resilience analysis across geographical regions.
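The federated aspect described in the abstract, regions training locally and sharing only model parameters rather than private flow data, can be sketched as plain federated averaging over parameter vectors. This is a generic FedAvg illustration under that assumption, not the paper's actual training code.

```python
# Hedged sketch: each region trains a GNN on its own food-flow graph locally;
# a coordinator averages only the parameters, so raw flow data stays private.
def fed_avg(region_params):
    """Average equally-shaped parameter vectors from each region's local model."""
    n = len(region_params)
    dim = len(region_params[0])
    return [sum(p[i] for p in region_params) / n for i in range(dim)]

# Three regions' (toy) parameter vectors after one round of local training.
regions = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(fed_avg(regions))  # [3.0, 4.0]
```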