Chen, Haonan
Cooperative Advisory Residual Policies for Congestion Mitigation
Hasan, Aamir, Chakraborty, Neeloy, Chen, Haonan, Cho, Jung-Hoon, Wu, Cathy, Driggs-Campbell, Katherine
Fleets of autonomous vehicles can mitigate traffic congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these approaches are limited in practice as they assume precise control over autonomous vehicle fleets, incur extensive installation costs for a centralized sensor ecosystem, and fail to account for uncertainty in driver behavior. To this end, we develop a class of learned residual policies that can be used in cooperative advisory systems and require only a single vehicle with a human driver. Our policies advise drivers to behave in ways that mitigate traffic congestion while accounting for diverse driver behaviors, particularly drivers' reactions to instructions, to provide an improved user experience. To realize such policies, we introduce an improved reward function that explicitly addresses congestion mitigation and driver attitudes toward advice. We show that our residual policies can be personalized by conditioning them on an inferred driver trait that is learned in an unsupervised manner with a variational autoencoder. Our policies are trained in simulation with our novel instruction-adherence driver model, and evaluated both in simulation and through a user study (N=16) to capture the sentiments of human drivers. Our results show that our approaches successfully mitigate congestion while adapting to different driver behaviors, with up to 20% and 40% improvement over baselines in our simulation tests and user study, respectively, as measured by a combination metric of speed and deviation in speed across time. Our user study further shows that our policies are human-compatible and personalize to drivers.
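The residual formulation described above can be sketched as follows. Everything here is a hypothetical illustration, not the paper's actual policy: the function names, the toy linear residual, and the trait scale are all assumptions.

```python
# Hypothetical sketch of a trait-conditioned residual advisory policy.
# advise() = nominal policy + learned residual; all names and the toy
# linear rule are illustrative, not the paper's implementation.

def nominal_advice(lead_speed: float) -> float:
    """Baseline advisory: match the lead vehicle's speed (m/s)."""
    return lead_speed

def residual(lead_speed: float, gap: float, trait: float) -> float:
    """Learned correction (here a toy linear rule) conditioned on an
    inferred driver trait in [0, 1] (0 = ignores advice, 1 = follows it)."""
    # Advise slowing down slightly when the gap is small, scaled by how
    # much the driver actually adheres to instructions.
    return -0.5 * trait * max(0.0, 10.0 - gap)

def advise(lead_speed: float, gap: float, trait: float) -> float:
    """Final advisory speed = nominal policy + learned residual."""
    return nominal_advice(lead_speed) + residual(lead_speed, gap, trait)

print(advise(25.0, 6.0, 0.8))  # small gap, adherent driver -> slow down
```

In the paper the residual is a learned network and the trait comes from a variational autoencoder; here both are stubbed with closed-form stand-ins to show only the additive structure.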
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
He, Xuan, Jiang, Dongfu, Zhang, Ge, Ku, Max, Soni, Achint, Siu, Sherman, Chen, Haonan, Chandra, Abhranil, Jiang, Ziyan, Arulraj, Aaran, Wang, Kai, Do, Quy Duc, Ni, Yuansheng, Lyu, Bohan, Narsupalli, Yaswanth, Fan, Rongqi, Lyu, Zhiheng, Lin, Yuchen, Chen, Wenhu
Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind: none of the existing metrics can provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans reaches 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench show that VideoScore correlates consistently and substantially better with human judges than other metrics. Given these results, we believe VideoScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning from Human Feedback (RLHF) to improve current video generation models.
LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins
Xia, Yuchen, Dittler, Daniel, Jazdi, Nasser, Chen, Haonan, Weyrich, Michael
This paper presents a novel multi-agent system framework that applies a large language model (LLM) to automate the parametrization of process simulations in digital twins. The framework includes four types of agents: observation, reasoning, decision, and summarization. By enabling dynamic interaction between the LLM agents and the simulation model, the developed system can automatically explore parametrizations of the simulation and use heuristic reasoning to determine a set of parameters that steers the simulation toward an objective. The proposed approach enhances the simulation model by infusing it with LLM heuristics and enables autonomous search for feasible parametrizations to solve a user task. Furthermore, the system has the potential to increase user-friendliness and reduce the cognitive load on human users by assisting in complex decision-making processes. The effectiveness and functionality of the system are demonstrated through a case study, and visualized demos are available at a GitHub repository: https://github.com/YuchenXia/LLMDrivenSimulation
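The observe-reason-decide-summarize loop described above can be sketched as a minimal toy, assuming a stand-in simulation and heuristic stubs in place of the actual LLM calls; `run_simulation`, the agent functions, and the buffer-size parameter are all illustrative.

```python
# Toy sketch of the multi-agent parametrization loop. The LLM agents are
# stubbed with heuristics, and run_simulation() stands in for the
# digital-twin process model; none of these names are from the paper.

def run_simulation(buffer_size: int) -> dict:
    """Stand-in process simulation: larger buffers raise throughput
    (saturating at 100) but add holding cost."""
    return {"throughput": min(100, 60 + 5 * buffer_size),
            "cost": 2 * buffer_size}

def observation_agent(result: dict) -> str:
    """Summarize raw simulation output as text for the reasoning step."""
    return f"throughput={result['throughput']}, cost={result['cost']}"

def reasoning_agent(obs: str, target: int) -> bool:
    """An LLM would interpret the observation; here we parse it directly."""
    return int(obs.split("throughput=")[1].split(",")[0]) >= target

def decision_agent(buffer_size: int) -> int:
    """Heuristic stand-in for an LLM-proposed parameter update."""
    return buffer_size + 1

buffer_size, history = 0, []
for _ in range(20):  # bounded exploration of the parameter space
    obs = observation_agent(run_simulation(buffer_size))
    history.append(obs)
    if reasoning_agent(obs, target=90):
        break  # objective reached
    buffer_size = decision_agent(buffer_size)

print(f"chose buffer_size={buffer_size} after {len(history)} runs")
```

The real system replaces the heuristic stubs with LLM prompts and a summarization agent that reports the search trajectory back to the user.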
ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval
Mao, Kelong, Deng, Chenlong, Chen, Haonan, Mo, Fengran, Liu, Zheng, Sakai, Tetsuya, Dou, Zhicheng
Conversational search requires accurate interpretation of user intent from complex multi-turn contexts. This paper presents ChatRetriever, which inherits the strong generalization capability of large language models to robustly represent complex conversational sessions for dense retrieval. To achieve this, we propose a simple and effective dual-learning approach that adapts an LLM for retrieval via contrastive learning while enhancing complex session understanding through masked instruction tuning on high-quality conversational data. Extensive experiments on five conversational search benchmarks demonstrate that ChatRetriever substantially outperforms existing conversational dense retrievers, achieving state-of-the-art performance on par with LLM-based rewriting approaches. Furthermore, ChatRetriever exhibits superior robustness in handling diverse conversational contexts. Our work highlights the potential of adapting LLMs for retrieval with complex inputs like conversational search sessions and proposes an effective approach to advance this research direction.
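The dual-learning objective sketched above combines a contrastive retrieval loss with a masked instruction-tuning term. A minimal toy, assuming 2-d "embeddings", word-level scores, and an arbitrary loss weight (none of which are the paper's actual setup):

```python
import math

# Toy sketch of a dual-learning objective: an InfoNCE-style contrastive
# loss over session/passage embeddings plus a masked-token NLL term.
# Embeddings, temperature, and the 0.5 weight are illustrative only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(session, positive, negatives, tau=0.1):
    """-log softmax score of the relevant passage among all candidates."""
    scores = [dot(session, positive)] + [dot(session, n) for n in negatives]
    exps = [math.exp(s / tau) for s in scores]
    return -math.log(exps[0] / sum(exps))

def masked_instruction_loss(token_probs):
    """Mean negative log-likelihood of the masked response tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

session = [0.9, 0.1]                   # encoded multi-turn session
positive = [0.8, 0.2]                  # relevant passage
negatives = [[0.1, 0.9], [0.0, 1.0]]   # in-batch negatives
loss = contrastive_loss(session, positive, negatives) \
     + 0.5 * masked_instruction_loss([0.9, 0.8, 0.95])
print(round(loss, 4))
```

In practice both terms are computed from the same LLM backbone, so retrieval adaptation and session understanding are trained jointly rather than with hand-built vectors as here.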
Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training
Chen, Haonan, Dou, Zhicheng, Hao, Xuetong, Tao, Yunhao, Song, Shiren, Sheng, Zhenli
Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. However, despite their widespread use, the task of identifying appropriate company customers for a specific solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to adequately address. In this work, we study the B2B solution matching problem and identify two main challenges of this scenario: (1) the modeling of complex multi-field features and (2) the limited, incomplete, and sparse transaction data. To tackle these challenges, we propose a framework CAMA, which is built with a hierarchical
While there have been some studies focusing on designing effective matching systems [1, 18, 20, 23, 29, 32, 35], none of these works have explored the matching of cloud solutions and their customers, which holds significant business value. In Huawei Cloud, the scenario is manual-driven, wherein our model identifies a list of the top matching companies to the sales team associated with a specific solution. The sales team then manually reviews this list and proceeds with promoting the solution to those companies. This specific scenario can be considered a matching problem, with the primary goal being the identification of appropriate companies (customers) for the sales teams to target in their promotion efforts.
Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation
Chen, Haonan, Dou, Zhicheng, Mao, Kelong, Liu, Jiongnan, Zhao, Ziliang
Conversational search utilizes multi-turn natural language contexts to retrieve relevant passages. Existing conversational dense retrieval models mostly view a conversation as a fixed sequence of questions and responses, overlooking the severe data sparsity problem -- that is, users can conduct a conversation in various ways, and these alternate conversations are unrecorded. Consequently, these models often struggle to generalize to diverse conversations in real-world scenarios. In this work, we propose a framework for generalizing Conversational dense retrieval via LLM-cognition data Augmentation (ConvAug). ConvAug first generates multi-level augmented conversations to capture the diverse nature of conversational contexts. Inspired by human cognition, we devise a cognition-aware process to mitigate the generation of false positives, false negatives, and hallucinations. Moreover, we develop a difficulty-adaptive sample filter that selects challenging samples for complex conversations, thereby giving the model a larger learning space. A contrastive learning objective is then employed to train a better conversational context encoder. Extensive experiments conducted on four public datasets, under both normal and zero-shot settings, demonstrate the effectiveness, generalizability, and applicability of ConvAug.
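The multi-level augmentation idea above can be illustrated with a minimal sketch: token-level masking and turn-level deletion produce alternate "views" of a session for contrastive training. The strategies, rates, and the rule of always keeping the final turn are assumptions for illustration, not ConvAug's actual pipeline.

```python
import random

# Illustrative sketch of multi-level conversation augmentation.
# Masking rate (0.15) and the keep-last-turn rule are assumptions.

def mask_tokens(turn: str, rate: float, rng: random.Random) -> str:
    """Token-level view: randomly replace words with [MASK]."""
    return " ".join("[MASK]" if rng.random() < rate else w
                    for w in turn.split())

def augment(conversation: list[str], rng: random.Random) -> list[str]:
    """Produce one augmented view of a multi-turn conversation."""
    view = list(conversation)
    # Turn-level view: drop one non-final turn (the last turn carries
    # the current query, so it is always kept).
    if len(view) > 2:
        view.pop(rng.randrange(len(view) - 1))
    # Token-level view: mask a small fraction of words in each turn.
    return [mask_tokens(t, 0.15, rng) for t in view]

rng = random.Random(0)
conv = ["who wrote hamlet", "shakespeare wrote it", "when was it written"]
print(augment(conv, rng))
```

Two such views of the same conversation would form a positive pair for the contrastive objective; ConvAug additionally filters candidate views with its cognition-aware process before training.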
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Wei, Cong, Chen, Yang, Chen, Haonan, Hu, Hexiang, Zhang, Ge, Fu, Jie, Ritter, Alan, Chen, Wenhu
Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image. To accommodate such varied information-seeking demands, we introduce UniIR, a unified instruction-guided multimodal retriever capable of handling eight distinct retrieval tasks across modalities. UniIR, a single retrieval system jointly trained on ten diverse multimodal-IR datasets, interprets user instructions to execute various retrieval tasks, demonstrating robust performance across existing datasets and zero-shot generalization to new tasks. Our experiments highlight that multi-task training and instruction tuning are key to UniIR's generalization ability. Additionally, we construct M-BEIR, a multimodal retrieval benchmark with comprehensive results, to standardize the evaluation of universal multimodal information retrieval.
Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks
Chen, Haonan, Niu, Yilong, Hong, Kaiwen, Liu, Shuijing, Wang, Yixuan, Li, Yunzhu, Driggs-Campbell, Katherine
Stowing, the task of placing objects in cluttered shelves or bins, is a common task in warehouse and manufacturing operations. However, this task is still predominantly carried out by human workers, as stowing is challenging to automate due to the complex multi-object interactions and long-horizon nature of the task. Previous works typically involve extensive data collection and costly human labeling of semantic priors across diverse object categories. This paper presents a method to learn a generalizable robot stowing policy from a predictive model of object interactions and a single demonstration with behavior primitives. We propose a novel framework that utilizes Graph Neural Networks to predict object interactions within the parameter space of behavior primitives. We further employ primitive-augmented trajectory optimization to search over the parameters of a predefined library of heterogeneous behavior primitives to instantiate the control action. Our framework enables robots to proficiently execute long-horizon stowing tasks with a few keyframes (3-4) from a single demonstration. Despite being trained solely in simulation, our framework demonstrates remarkable generalization capabilities. It efficiently adapts to a broad spectrum of real-world conditions, including various shelf widths, fluctuating quantities of objects, and objects with diverse attributes such as sizes and shapes.
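The primitive-augmented trajectory optimization described above can be sketched as a sampling-based search over one primitive's parameters, scored by a stand-in for the learned interaction predictor. The "push" primitive, the closed-form predictor, and the cost terms are all illustrative assumptions, not the paper's GNN or cost function.

```python
import random

# Toy sketch of primitive parameter search: sample parameters for a
# hypothetical "push" primitive, score each with a stand-in for the
# learned GNN interaction predictor, and keep the cheapest.

def predict_final_gap(push_distance: float) -> float:
    """Stand-in for the learned dynamics model: predicted free gap (cm)
    on the shelf after pushing objects aside by push_distance."""
    return min(push_distance * 0.8, 12.0)  # pushing saturates at 12 cm

def cost(push_distance: float, object_width: float = 10.0) -> float:
    """Penalize gaps too small to stow the object, plus wasted effort."""
    gap = predict_final_gap(push_distance)
    return max(0.0, object_width - gap) * 10.0 + 0.1 * push_distance

# Sampling-based optimization over the primitive's parameter space.
rng = random.Random(0)
candidates = [rng.uniform(0.0, 20.0) for _ in range(256)]
best = min(candidates, key=cost)
print(f"best push distance: {best:.1f} cm")
```

In the full framework, the predictor is a Graph Neural Network over object states and the search spans a library of heterogeneous primitives rather than a single scalar parameter.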
Learning Task Skills and Goals Simultaneously from Physical Interaction
Chen, Haonan, Mun, Ye-Ji, Huang, Zhe, Niu, Yilong, Xie, Yiqing, McPherson, D. Livingston, Driggs-Campbell, Katherine
In real-world human-robot systems, it is essential for a robot to comprehend human objectives and respond accordingly while performing an extended series of motor actions. Although human objective alignment has recently emerged as a promising paradigm in the realm of physical human-robot interaction, its application is typically confined to generating simple motions due to inherent theoretical limitations. In this work, our goal is to develop a general formulation to learn manipulation functional modules and long-term task goals simultaneously from physical human-robot interaction. We show the feasibility of our framework in enabling robots to align their behaviors with the long-term task objectives inferred from human interactions.
Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection
Qian, Hongjin, Dou, Zhicheng, Tan, Jiejun, Chen, Haonan, Gu, Haoqi, Lai, Ruofei, Zhang, Xinyu, Cao, Zhao, Wen, Ji-Rong
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text, raising concerns about their reliability. Previous methods use external knowledge as references for text generation to enhance factuality, but often struggle with the knowledge mix-up (e.g., entity mismatch) caused by irrelevant references. Besides, as the length of the output text grows, the randomness of sampling can escalate, detrimentally impacting the factual accuracy of the generated text. In this paper, we present DKGen, which divides text generation into an iterative process. In each iteration, DKGen takes the input query, the previously generated text, and a subset of the reference passages as input to generate a short span of text. During this process, the subset is dynamically selected from the full passage set based on its relevance to the previously generated text and the query, largely eliminating irrelevant references from the input. To further enhance DKGen's ability to use this external knowledge correctly, DKGen distills the relevance order of the reference passages into the cross-attention distribution of the decoder. We train and evaluate DKGen on a large-scale benchmark dataset. Experimental results show that DKGen outperforms all baseline models.
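The iterative loop above can be sketched minimally: at each step, re-rank the reference passages against the query plus the text generated so far, then condition the next short generation on only the top subset. Here the LM is stubbed with a passage-echoing function and a learned relevance model is replaced by word overlap; both are assumptions for illustration.

```python
# Illustrative sketch of iterative generation with dynamic knowledge
# selection. relevance() and generate_sentence() are toy stand-ins for
# DKGen's learned relevance model and decoder.

def relevance(passage: str, context: str) -> int:
    """Toy relevance: word overlap between passage and current context."""
    return len(set(passage.lower().split()) & set(context.lower().split()))

def generate_sentence(passages: list[str], so_far: str) -> str:
    """Stand-in generator: echo the most relevant unused passage."""
    for p in passages:
        if p not in so_far:
            return p
    return passages[0]

query = "where was marie curie born"
passages = [
    "marie curie was born in warsaw",
    "curie won two nobel prizes",
    "warsaw is the capital of poland",
]

generated = ""
for _ in range(2):  # two short generation steps
    # Re-select the reference subset against query + text generated so far.
    context = query + " " + generated
    subset = sorted(passages, key=lambda p: relevance(p, context),
                    reverse=True)[:2]
    generated += generate_sentence(subset, generated) + ". "
print(generated.strip())
```

The key structural point is that the passage subset is recomputed every iteration, so references that become irrelevant as the text grows drop out of the decoder's input.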