AITopics | Ma, Liang

Collaborating Authors

Ma, Liang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unbiased Evaluation of Large Language Models from a Causal Perspective

Chen, Meilin, Tian, Jian, Ma, Liang, Xie, Di, Chen, Weijie, Zhu, Jiang

arXiv.org Artificial IntelligenceFeb-10-2025

Benchmark contamination has become a significant concern in the LLM evaluation community. Previous Agents-as-an-Evaluator address this issue by involving agents in the generation of questions. Despite their success, the biases in Agents-as-an-Evaluator methods remain largely unexplored. In this paper, we present a theoretical formulation of evaluation bias, providing valuable insights into designing unbiased evaluation protocols. Furthermore, we identify two type of bias in Agents-as-an-Evaluator through carefully designed probing tasks on a minimal Agents-as-an-Evaluator setup. To address these issues, we propose the Unbiased Evaluator, an evaluation protocol that delivers a more comprehensive, unbiased, and interpretable assessment of LLMs.Extensive experiments reveal significant room for improvement in current LLMs. Additionally, we demonstrate that the Unbiased Evaluator not only offers strong evidence of benchmark contamination but also provides interpretable evaluation results.

intervention, large language model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2502.06655

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

Wang, Yongxin, Cao, Meng, Lin, Haokun, Han, Mingfei, Ma, Liang, Jiang, Jin, Cheng, Yuhao, Liang, Xiaodan

arXiv.org Artificial IntelligenceDec-16-2024

Multimodal large language models (MLLMs) have achieved remarkable progress on various visual question answering and reasoning tasks leveraging instruction fine-tuning specific datasets. They can also learn from preference data annotated by human to enhance their reasoning ability and mitigate hallucinations. Most of preference data is generated from the model itself. However, existing methods require high-quality critical labels, which are costly and rely on human or proprietary models like GPT-4V. In this work, we propose Enhancing Alignment in MLLMs via Critical Observation (EACO), which aligns MLLMs by self-generated preference data using only 5k images economically. Our approach begins with collecting and refining a Scoring Evaluation Instruction-tuning dataset to train a critical evaluation model, termed the Critic. This Critic observes model responses across multiple dimensions, selecting preferred and non-preferred outputs for refined Direct Preference Optimization (DPO) tuning. To further enhance model performance, we employ an additional supervised fine-tuning stage after preference tuning. EACO reduces the overall hallucinations by 65.6% on HallusionBench and improves the reasoning ability by 21.8% on MME-Cognition. EACO achieves an 8.5% improvement over LLaVA-v1.6-Mistral-7B across multiple benchmarks. Remarkably, EACO also shows the potential critical ability in open-source MLLMs, demonstrating that EACO is a viable path to boost the competence of MLLMs.

large language model, natural language, preprint arxiv, (18 more...)

arXiv.org Artificial Intelligence

2412.04903

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Han, Mingfei, Ma, Liang, Zhumakhanova, Kamila, Radionova, Ekaterina, Zhang, Jingyi, Chang, Xiaojun, Liang, Xiaodan, Laptev, Ivan

arXiv.org Artificial IntelligenceDec-11-2024

Vision-and-Language Navigation (VLN) suffers from the limited diversity and scale of training data, primarily constrained by the manual curation of existing simulators. To address this, we introduce RoomTour3D, a video-instruction dataset derived from web-based room tour videos that capture real-world indoor spaces and human walking demonstrations. Unlike existing VLN datasets, RoomTour3D leverages the scale and diversity of online videos to generate open-ended human walking trajectories and open-world navigable instructions. To compensate for the lack of navigation data in online videos, we perform 3D reconstruction and obtain 3D trajectories of walking paths augmented with additional information on the room types, object locations and 3D shape of surrounding scenes. Our dataset includes $\sim$100K open-ended description-enriched trajectories with $\sim$200K instructions, and 17K action-enriched trajectories from 1847 room tour environments. We demonstrate experimentally that RoomTour3D enables significant improvements across multiple VLN tasks including CVDN, SOON, R2R, and REVERIE. Moreover, RoomTour3D facilitates the development of trainable zero-shot VLN agents, showcasing the potential and challenges of advancing towards open-world navigation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.08591

Genre:

Workflow (0.93)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

Tao, Laifa, Li, Shangyu, Liu, Haifei, Huang, Qixuan, Ma, Liang, Ning, Guoao, Chen, Yiling, Wu, Yunlong, Li, Bin, Zhang, Weiwei, Zhao, Zhengduo, Zhan, Wenchao, Cao, Wenyan, Wang, Chao, Liu, Hongmei, Ma, Jian, Suo, Mingliang, Cheng, Yujie, Ding, Yu, Song, Dengwei, Lu, Chen

arXiv.org Artificial IntelligenceJul-1-2024

Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Large Model, heralds a technological revolution with the potential to fundamentally reshape traditional technological fields and human production methods. Its capabilities, including strong generalization, reasoning, and generative attributes, present opportunities to address PHM's bottlenecks. To this end, based on a systematic analysis of the current challenges and bottlenecks in PHM, as well as the research status and advantages of Large Model, we propose a novel concept and three progressive paradigms of Prognosis and Health Management Large Model (PHM-LM) through the integration of the Large Model with PHM. Subsequently, we provide feasible technical approaches for PHM-LM to bolster PHM's core capabilities within the framework of the three paradigms. Moreover, to address core issues confronting PHM, we discuss a series of technical challenges of PHM-LM throughout the entire process of construction and application. This comprehensive effort offers a holistic PHM-LM technical framework, and provides avenues for new PHM technologies, methodologies, tools, platforms and applications, which also potentially innovates design, research & development, verification and application mode of PHM. And furthermore, a new generation of PHM with AI will also capably be realized, i.e., from custom to generalized, from discriminative to generative, and from theoretical conditions to practical applications.

knowledge management, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2407.03374

Country: Asia > China (0.67)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.65)

Industry:

Health & Medicine > Consumer Health (1.00)
Aerospace & Defense (1.00)
Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
(8 more...)

Add feedback

GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow

Zhou, Liguo, Song, Yinglei, Gao, Yichao, Yu, Zhou, Sodamin, Michael, Liu, Hongshen, Ma, Liang, Liu, Lian, Liu, Hao, Liu, Yang, Li, Haichuan, Chen, Guang, Knoll, Alois

arXiv.org Artificial IntelligenceJan-30-2024

Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers. We introduce an autonomous driving simulator with photorealistic scenes, meanwhile keeping a user-friendly workflow. The simulator is able to communicate with external algorithms through ROS2 or Socket.IO, making it compatible with existing software stacks. Furthermore, we implement a highly accurate vehicle dynamics model within the simulator to enhance the realism of the vehicle's physical effects. The simulator is able to serve various functions, including generating synthetic data and driving with machine learning-based algorithms. Moreover, we prioritize simplicity in the deployment process, ensuring that beginners find it approachable and user-friendly.

artificial intelligence, machine learning, simulator, (14 more...)

arXiv.org Artificial Intelligence

2401.15803

Country: North America > United States > California (0.14)

Genre:

Workflow (0.61)
Research Report (0.40)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation

Liang, Xiwen, Ma, Liang, Guo, Shanshan, Han, Jianhua, Xu, Hang, Ma, Shikui, Liang, Xiaodan

arXiv.org Artificial IntelligenceSep-26-2023

Given a natural language, a general robot has to comprehend the instruction and find the target object or location based on visual observations even in unexplored environments. Most agents rely on massive diverse training data to achieve better generalization, which requires expensive labor. These agents often focus on common objects and fewer tasks, thus are not intelligent enough to handle different types of instructions. To facilitate research in open-set vision-and-language navigation, we propose a benchmark named MO-VLN, aiming at testing the effectiveness and generalization of the agent in the multi-task setting. First, we develop a 3D simulator rendered by realistic scenarios using Unreal Engine 5, containing more realistic lights and details. The simulator contains three scenes, i.e., cafe, restaurant, and nursing house, of high value in the industry. Besides, our simulator involves multiple uncommon objects, such as takeaway cup and medical adhesive tape, which are more complicated compared with existing environments. Inspired by the recent success of large language models (e.g., ChatGPT, Vicuna), we construct diverse high-quality data of instruction type without human annotation. Our benchmark MO-VLN provides four tasks: 1) goal-conditioned navigation given a specific object category (e.g., "fork"); 2) goal-conditioned navigation given simple instructions (e.g., "Search for and move towards a tennis ball"); 3) step-by-step instruction following; 4) finding abstract object based on high-level instruction (e.g., "I am thirsty").

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2306.10322

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports > Tennis (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics

Ma, Liang, Cao, Shuyang, Logan, Robert L. IV, Lu, Di, Ran, Shihao, Zhang, Ke, Tetreault, Joel, Jaimes, Alejandro

arXiv.org Artificial IntelligenceJun-4-2023

The proliferation of automatic faithfulness metrics for summarization has produced a need for benchmarks to evaluate them. While existing benchmarks measure the correlation with human judgements of faithfulness on model-generated summaries, they are insufficient for diagnosing whether metrics are: 1) consistent, i.e., indicate lower faithfulness as errors are introduced into a summary, 2) effective on human-written texts, and 3) sensitive to different error types (as summaries can contain multiple errors). To address these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a dataset of 889 human-written, minimally different summary pairs, where a single error is introduced to a summary from the CNN/DailyMail dataset to produce an unfaithful summary. We find BUMP complements existing benchmarks in a number of ways: 1) the summaries in BUMP are harder to discriminate and less probable under SOTA summarization models, 2) unlike non-pair-based datasets, BUMP can be used to measure the consistency of metrics, and reveals that the most discriminative metrics tend not to be the most consistent, and 3) unlike datasets containing generated summaries with multiple errors, BUMP enables the measurement of metrics' performance on individual error types.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2212.09955

Country:

Europe (1.00)
North America > United States > Michigan (0.14)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Deep Transfer Learning with Graph Neural Network for Sensor-Based Human Activity Recognition

Yan, Yan, Liao, Tianzheng, Zhao, Jinjin, Wang, Jiahong, Ma, Liang, Lv, Wei, Xiong, Jing, Wang, Lei

arXiv.org Artificial IntelligenceMar-14-2022

Abstract--The sensor-based human activity recognition (HAR) in mobile application scenarios is often confronted with sensor modalities variation and annotated data deficiency. Given this observation, we devised a graph-inspired deep learning approach toward the sensor-based HAR tasks, which was further used to build a deep transfer learning model toward giving a tentative solution for these two challenging problems. Specifically, we present a multi-layer residual structure involved graph convolutional neural network (ResGCNN) toward the sensor-based HAR tasks, namely the HAR-ResGCNN approach. Experimental results on the PAMAP2 and mHealth data sets demonstrate that our ResGCNN is effective at capturing the characteristics of actions with comparable results compared to other sensor-based HAR models (with an average accuracy of 98.18% and 99.07%, respectively). The graph-based framework shows good meta-learning ability and is supposed to be a promising solution in sensor-based HAR tasks. However, this method machines the ability to track the human activity state. HAR is restricted with the impact brought with the complex systems tracks the activity states through processing and scenarios, the uncertainty of the action, and needs to consider learning information from some carriers that can record the privacy problems caused by the camera, and is human actions (such as cameras [1], sensors [2], radars only suitable for some specific scenes.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/BIBM55620.2022.9995660

2203.0791

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Query Answering via Decentralized Search

Ma, Liang

arXiv.org Artificial IntelligenceDec-26-2020

Expert networks are formed by a group of expert-professionals with different specialties to collaboratively resolve specific queries posted to the network. In such networks, when a query reaches an expert who does not have sufficient expertise, this query needs to be routed to other experts for further processing until it is completely solved; therefore, query answering efficiency is sensitive to the underlying query routing mechanism being used. Among all possible query routing mechanisms, decentralized search, operating purely on each expert's local information without any knowledge of network global structure, represents the most basic and scalable routing mechanism, which is applicable to any network scenarios even in dynamic networks. However, there is still a lack of fundamental understanding of the efficiency of decentralized search in expert networks. In this regard, we investigate decentralized search by quantifying its performance under a variety of network settings. Our key findings reveal the existence of network conditions, under which decentralized search can achieve significantly short query routing paths (i.e., between $O(\log n)$ and $O(\log^2 n)$ hops, $n$: total number of experts in the network). Based on such theoretical foundation, we further study how the unique properties of decentralized search in expert networks is related to the anecdotal small-world phenomenon. In addition, we demonstrate that decentralized search is robust against estimation errors introduced by misinterpreting the required expertise levels. To the best of our knowledge, this is the first work studying fundamental behaviors of decentralized search in expert networks. The developed performance bounds, confirmed by real datasets, are able to assist in predicting network performance and designing complex expert networks.

artificial intelligence, decentralized search, natural language, (18 more...)

arXiv.org Artificial Intelligence

2012.12192

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.89)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.70)

Add feedback

Influence Maximization Under Generic Threshold-based Non-submodular Model

Ma, Liang

arXiv.org Artificial IntelligenceDec-18-2020

As a widely observable social effect, influence diffusion refers to a process where innovations, trends, awareness, etc. spread across the network via the social impact among individuals. Motivated by such social effect, the concept of influence maximization is coined, where the goal is to select a bounded number of the most influential nodes (seed nodes) from a social network so that they can jointly trigger the maximal influence diffusion. A rich body of research in this area is performed under statistical diffusion models with provable submodularity, which essentially simplifies the problem as the optimal result can be approximated by the simple greedy search. When the diffusion models are non-submodular, however, the research community mostly focuses on how to bound/approximate them by tractable submodular functions so as to estimate the optimal result. In other words, there is still a lack of efficient methods that can directly resolve non-submodular influence maximization problems. In this regard, we fill the gap by proposing seed selection strategies using network graphical properties in a generalized threshold-based model, called influence barricade model, which is non-submodular. Specifically, under this model, we first establish theories to reveal graphical conditions that ensure the network generated by node removals has the same optimal seed set as that in the original network. We then exploit these theoretical conditions to develop efficient algorithms by strategically removing less-important nodes and selecting seeds only in the remaining network. To the best of our knowledge, this is the first graph-based approach that directly tackles non-submodular influence maximization.

node, optimization problem, social media, (16 more...)

arXiv.org Artificial Intelligence

2012.12309

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback