Collaborating Authors: Peng, Xiangyu


Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

arXiv.org Artificial Intelligence

Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.


BingoGuard: LLM Content Moderation Tools with Risk Levels

arXiv.org Artificial Intelligence

Malicious content generated by large language models (LLMs) can pose varying degrees of harm. Although existing LLM-based moderators can detect harmful content, they struggle to assess risk levels and may miss lower-risk outputs. Accurate risk assessment allows platforms with different safety thresholds to tailor content filtering and rejection. In this paper, we introduce per-topic severity rubrics for 11 harmful topics and build BingoGuard, an LLM-based moderation system designed to predict both binary safety labels and severity levels. To address the lack of annotations on levels of severity, we propose a scalable generate-then-filter framework that first generates responses across different severity levels and then filters out low-quality responses. Using this framework, we create BingoGuardTrain, a training dataset with 54,897 examples covering a variety of topics, response severities, and styles, and BingoGuardTest, a test set with 988 examples explicitly labeled based on our severity rubrics that enables fine-grained analysis of model behavior at different severity levels. Our BingoGuard-8B, trained on BingoGuardTrain, achieves state-of-the-art performance on several moderation benchmarks, including WildGuardTest and HarmBench, as well as BingoGuardTest, outperforming the best public model, WildGuard, by 4.3%. Our analysis demonstrates that incorporating severity levels into training significantly enhances detection performance and enables the model to effectively gauge the severity of harmful responses.
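The generate-then-filter step lends itself to a simple loop. Below is a minimal sketch, assuming hypothetical `generate` and `quality_score` LLM hooks and an assumed four-level severity scale; it is illustrative and not the released BingoGuard pipeline.

```python
# Hypothetical generate-then-filter sketch; `generate` and `quality_score`
# stand in for LLM calls and are not part of any released BingoGuard API.
from dataclasses import dataclass
from typing import Callable, List

SEVERITY_LEVELS = [0, 1, 2, 3]  # assumed scale: 0 = benign, 3 = most severe

@dataclass
class Example:
    query: str
    response: str
    severity: int

def generate_then_filter(
    queries: List[str],
    generate: Callable[[str, int], str],         # (query, target severity) -> response
    quality_score: Callable[[str, str], float],  # (query, response) -> score in [0, 1]
    min_quality: float = 0.7,
) -> List[Example]:
    """Generate candidate responses per severity level, keep only high-quality ones."""
    kept: List[Example] = []
    for q in queries:
        for level in SEVERITY_LEVELS:
            resp = generate(q, level)
            if quality_score(q, resp) >= min_quality:
                kept.append(Example(query=q, response=resp, severity=level))
    return kept
```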


Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents

arXiv.org Artificial Intelligence

Automated service agents require well-structured workflows to provide consistent and accurate responses to customer queries. However, these workflows are often undocumented, and their automatic extraction from conversations remains unexplored. In this work, we present a novel framework for extracting and evaluating dialog workflows from historical interactions. Our extraction process consists of two key stages: (1) a retrieval step to select relevant conversations based on key procedural elements, and (2) a structured workflow generation process using question-answer-based chain-of-thought (QA-CoT) prompting. To comprehensively assess the quality of extracted workflows, we introduce an automated simulation framework with agent and customer bots that measures their effectiveness in resolving customer issues. Extensive experiments on the ABCD and SynthABCD datasets demonstrate that our QA-CoT technique improves workflow extraction by 12.16% in average macro accuracy over the baseline. Moreover, our evaluation method closely aligns with human assessments, providing a reliable and scalable framework for future research.
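As a rough illustration of the two stages, the sketch below retrieves conversations by embedding similarity and then builds a QA-style chain-of-thought prompt that asks guiding questions before emitting numbered workflow steps. The `embed` and `complete` hooks and the prompt wording are assumptions, not the authors' implementation.

```python
# Sketch of retrieve-then-prompt workflow extraction; `embed` and `complete`
# are hypothetical embedding/LLM hooks, and the prompt text is illustrative.
import math
from typing import Callable, List, Sequence

def retrieve(conversations: Sequence[str],
             embed: Callable[[str], List[float]],
             query: str,
             top_k: int = 20) -> List[str]:
    """Rank conversations by cosine similarity to a procedural query."""
    def cos(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb + 1e-9)

    qv = embed(query)
    ranked = sorted(conversations, key=lambda c: cos(embed(c), qv), reverse=True)
    return list(ranked[:top_k])

QA_COT_PROMPT = """You are given customer-service conversations.
First answer these questions, then output the workflow as numbered steps:
Q1: What is the customer's goal?
Q2: What information does the agent collect, and in what order?
Q3: Which conditions change the agent's next action?
Conversations:
{conversations}
"""

def extract_workflow(conversations: List[str], complete: Callable[[str], str]) -> str:
    """Build the QA-CoT prompt and ask the model for a structured workflow."""
    prompt = QA_COT_PROMPT.format(conversations="\n---\n".join(conversations))
    return complete(prompt)
```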


Unanswerability Evaluation for Retrieval Augmented Generation

arXiv.org Artificial Intelligence

Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on answerable queries, but they overlook the importance of appropriately rejecting unanswerable requests. In this paper, we introduce UAEval4RAG, a framework designed to evaluate whether RAG systems can handle unanswerable queries effectively. We define a taxonomy with six unanswerable categories, and UAEval4RAG automatically synthesizes diverse and challenging queries for any given knowledge base, scored with two metrics: the unanswered ratio and the acceptable ratio. We conduct experiments with various RAG components, including retrieval models, rewriting methods, rerankers, language models, and prompting strategies, and reveal hidden trade-offs in the performance of RAG systems. Our findings highlight the critical role of component selection and prompt design in optimizing RAG systems to balance accuracy on answerable queries with high rejection rates for unanswerable ones. UAEval4RAG provides valuable insights and tools for developing more robust and reliable RAG systems.
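One plausible reading of the two metrics is sketched below: the unanswered ratio as the share of unanswerable queries the system declines, and the acceptable ratio as the share of responses judged acceptable by an evaluator. The exact definitions in the paper may differ; the field names and example are illustrative assumptions.

```python
# Illustrative metric definitions under an assumed reading; the paper's exact
# definitions may differ.
from dataclasses import dataclass
from typing import List

@dataclass
class Judged:
    query: str
    declined: bool    # system refused to answer the (unanswerable) query
    acceptable: bool  # response judged acceptable by an evaluator

def unanswered_ratio(results: List[Judged]) -> float:
    """Share of unanswerable queries the RAG system declined to answer."""
    return sum(r.declined for r in results) / max(len(results), 1)

def acceptable_ratio(results: List[Judged]) -> float:
    """Share of responses judged acceptable (e.g. a helpful, grounded refusal)."""
    return sum(r.acceptable for r in results) / max(len(results), 1)

# Example with three synthesized unanswerable queries.
demo = [
    Judged("Out-of-scope query", declined=True, acceptable=True),
    Judged("Underspecified query", declined=True, acceptable=False),
    Judged("False-premise query", declined=False, acceptable=False),
]
assert abs(unanswered_ratio(demo) - 2 / 3) < 1e-9
assert abs(acceptable_ratio(demo) - 1 / 3) < 1e-9
```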


Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) generated by large language models (LLMs) are becoming increasingly valuable for Retrieval-Augmented Generation (RAG) applications that require knowledge-intensive reasoning. However, existing KG extraction methods predominantly rely on prompt-based approaches, which are inefficient for processing large-scale corpora. These approaches often suffer from information loss, particularly with long documents, due to the lack of specialized design for KG construction. Additionally, there is a gap in evaluation datasets and methodologies for ontology-free KG construction. To overcome these limitations, we propose SynthKG, a multi-step, document-level ontology-free KG synthesis workflow based on LLMs. By fine-tuning a smaller LLM on the synthesized document-KG pairs, we streamline the multi-step process into a single-step KG generation approach called Distill-SynthKG, substantially reducing the number of LLM inference calls. Furthermore, we re-purpose existing question-answering datasets to establish KG evaluation datasets and introduce new evaluation metrics. Using KGs produced by Distill-SynthKG, we also design a novel graph-based retrieval framework for RAG. Experimental results demonstrate that Distill-SynthKG not only surpasses all baseline models in KG quality -- including models up to eight times larger -- but also consistently excels in retrieval and question-answering tasks. Our proposed graph retrieval framework also outperforms all KG-retrieval methods across multiple benchmark datasets. We release the SynthKG dataset and Distill-SynthKG model publicly to support further research and development.
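The distillation idea can be pictured as follows: run the multi-step SynthKG workflow once as a teacher, then turn its (document, KG) outputs into supervised fine-tuning pairs for a smaller single-step generator. The triple serialization format and the `multi_step_synthesize` hook below are assumptions for illustration, not the released Distill-SynthKG interface.

```python
# Sketch of distilling a multi-step KG workflow into single-step SFT data;
# the triple format and `multi_step_synthesize` hook are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

@dataclass
class SFTExample:
    prompt: str
    target: str

def serialize(triples: List[Triple]) -> str:
    return "\n".join(f"({s} | {r} | {o})" for s, r, o in triples)

def build_distillation_set(
    documents: List[str],
    multi_step_synthesize: Callable[[str], List[Triple]],  # teacher workflow
) -> List[SFTExample]:
    """Run the expensive multi-step workflow once; reuse its outputs as targets
    so a smaller model can learn document -> KG in a single inference call."""
    examples: List[SFTExample] = []
    for doc in documents:
        triples = multi_step_synthesize(doc)
        examples.append(SFTExample(
            prompt=f"Extract a knowledge graph from the document:\n{doc}",
            target=serialize(triples),
        ))
    return examples
```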


ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

arXiv.org Artificial Intelligence

Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize this is because their self-synthesized reasoning paths are too task-specific and lack general, task-agnostic reasoning guidance. To address this, we propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data by progressing from abstract to concrete. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the need for the human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance on all in-domain and OOD settings tested compared to existing methods. On six OOD tasks specifically, while previous methods exhibited an average performance decrease of approximately 4.6% after post-training, ReGenesis delivers an improvement of around 6.1%. We also conduct an in-depth analysis of our framework and show that ReGenesis is effective across various LLMs and design choices.
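A minimal sketch of the abstract-to-concrete progression, assuming a hypothetical `complete` call into the model being post-trained and placeholder guideline texts (not the paper's prompts):

```python
# Sketch of abstract-to-concrete self-synthesis; `complete` is a hypothetical
# call into the same LLM being post-trained, and the guidelines are placeholders.
from typing import Callable, List

GENERAL_GUIDELINES = [
    "Break the problem into smaller sub-problems.",
    "State the facts you are given before reasoning.",
    "Check the final answer against the question.",
]

def synthesize_paths(task_description: str,
                     question: str,
                     complete: Callable[[str], str]) -> List[str]:
    """General guideline -> task-specific guideline -> structure -> full path."""
    paths: List[str] = []
    for guideline in GENERAL_GUIDELINES:
        specific = complete(
            f"Adapt this general guideline to the task '{task_description}':\n{guideline}")
        structure = complete(
            "Using the guideline below, outline a reasoning structure (no answer yet).\n"
            f"Guideline: {specific}\nQuestion: {question}")
        path = complete(
            "Fill in the reasoning structure step by step and give a final answer.\n"
            f"Structure: {structure}\nQuestion: {question}")
        paths.append(path)
    return paths  # candidate paths to filter and use as post-training data
```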


Measuring Trust for Exoskeleton Systems

arXiv.org Artificial Intelligence

Wearable robotic systems are a class of robots that have a tight coupling between human and robot movements. As with non-wearable robots, it is important to measure a person's trust that the robot can support achieving the desired goals. While some measures of trust may apply to all potential robotic roles, there are key distinctions between wearable and non-wearable robotic systems. In this paper, we consider the dimensions and sub-dimensions of trust, with example attributes defined for exoskeleton applications. As the research community comes together to discuss measures of trust, it will be important to consider how the selected measures support interpreting trust along different dimensions for the variety of robotic systems emerging in the field, in a way that leads to actionable outcomes.


Player-Driven Emergence in LLM-Driven Game Narrative

arXiv.org Artificial Intelligence

We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node graph representing the narrative of each player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not part of the original narrative but have potential for being fun and engaging. Players who created the most emergent nodes tended to be those who often enjoy games that facilitate discovery, exploration, and experimentation.


A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds

arXiv.org Artificial Intelligence

Open-ended worlds are those in which there are no pre-specified goals or environmental reward signal. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism across a mix of frozen and unfrozen experts. The model learns to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn the behaviors associated with new character role types. We show that our agent both obtains more reward in the zero-shot setting and discovers these rewards with greater sample efficiency in the few-shot setting.
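A minimal PyTorch sketch of the idea, with frozen experts for known roles and one trainable expert combined by a learned attention (gating) head over the state encoding; the dimensions and gating design are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal PyTorch sketch: frozen role experts plus one trainable expert,
# combined by a learned gate over the state encoding (assumed design).
import torch
import torch.nn as nn

class MixtureOfRoleExperts(nn.Module):
    def __init__(self, frozen_experts, state_dim: int, action_dim: int):
        super().__init__()
        self.frozen = nn.ModuleList(frozen_experts)
        for p in self.frozen.parameters():
            p.requires_grad = False                         # keep known-task policies fixed
        self.new_expert = nn.Linear(state_dim, action_dim)  # learns novel behavior
        n_experts = len(frozen_experts) + 1
        self.gate = nn.Linear(state_dim, n_experts)         # attention weights over experts

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        expert_out = [e(state) for e in self.frozen] + [self.new_expert(state)]
        expert_out = torch.stack(expert_out, dim=1)             # (batch, experts, actions)
        weights = torch.softmax(self.gate(state), dim=-1)       # (batch, experts)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)  # (batch, actions)
```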


Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

arXiv.org Artificial Intelligence

Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning. Furthermore, existing methods generally focus only on single-character stories, or fail to track characters at all. To improve the coherence of generated narratives and to expand the scope of character-centric narrative generation, we introduce Commonsense-inference Augmented neural StoryTelling (CAST), a framework for introducing commonsense reasoning into the generation process with the option to model the interaction between multiple characters. We find that our CAST method produces significantly more coherent, on-topic, enjoyable and fluent stories than existing models in both the single-character and two-character settings in three storytelling domains.