Liang, Jing
LLM Agents for Education: Advances and Applications
Chu, Zhendong, Wang, Shen, Xie, Jian, Zhu, Tinghui, Yan, Yibo, Ye, Jinheng, Zhong, Aoxiao, Hu, Xuming, Liang, Jing, Yu, Philip S., Wen, Qingsong
Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. In this survey, we provide a systematic review of state-of-the-art research on LLM agents in education, categorizing them into two broad classes: (1) \emph{Pedagogical Agents}, which focus on automating complex pedagogical tasks to support both teachers and students; and (2) \emph{Domain-Specific Educational Agents}, which are tailored for specialized fields such as science education, language learning, and professional development. We comprehensively examine the technological advancements underlying these LLM agents, including key datasets, benchmarks, and algorithmic frameworks that drive their effectiveness. Furthermore, we discuss critical challenges such as privacy, bias and fairness concerns, hallucination mitigation, and integration with existing educational ecosystems. This survey aims to provide a comprehensive technological overview of LLM agents for education, fostering further research and collaboration to enhance their impact for the greater good of learners and educators alike.
Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments
Elnoor, Mohamed, Weerakoon, Kasun, Seneviratne, Gershom, Liang, Jing, Rajagopal, Vignesh, Manocha, Dinesh
We introduce Vision-Language Attention Distillation (Vi-LAD), a novel approach for distilling socially compliant navigation knowledge from a large Vision-Language Model (VLM) into a lightweight transformer model for real-time robotic navigation. Unlike traditional methods that rely on expert demonstrations or human-annotated datasets, Vi-LAD performs knowledge distillation and fine-tuning at the intermediate layer representation level (i.e., attention maps) by leveraging the backbone of a pre-trained vision-action model. These attention maps highlight key navigational regions in a given scene, which serve as implicit guidance for socially aware motion planning. Vi-LAD fine-tunes a transformer-based model using intermediate attention maps extracted from the pre-trained vision-action model, combined with attention-like semantic maps constructed from a large VLM. To achieve this, we introduce a novel attention-level distillation loss that fuses knowledge from both sources, generating augmented attention maps with enhanced social awareness. These refined attention maps are then utilized as a traversability costmap within a socially aware model predictive controller (MPC) for navigation. We validate our approach through real-world experiments on a Husky wheeled robot, demonstrating significant improvements over state-of-the-art (SOTA) navigation methods. Our results show a 14.2%-50% improvement in success rate, which highlights the effectiveness of Vi-LAD in enabling socially compliant and efficient robot navigation.
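To make the attention-level distillation idea concrete, the following is a minimal sketch (not the paper's implementation) of a loss that fuses a teacher attention map from a vision-action backbone with a VLM-derived semantic map and trains a student to match the fused target. The function names, fusion weight `alpha`, and use of a KL objective are illustrative assumptions.

```python
# Illustrative sketch only: the actual Vi-LAD loss, fusion weights, and model
# interfaces are not reproduced here; `student_attn`, `va_attn`, and `vlm_map`
# are hypothetical placeholders.
import torch
import torch.nn.functional as F

def attention_distillation_loss(student_attn, va_attn, vlm_map, alpha=0.5):
    """Fuse teacher attention (vision-action backbone) with a VLM-derived
    semantic map, then penalize the student's deviation from the fused target."""
    def normalize(m):
        # Flatten each 2D map and normalize to a spatial probability distribution.
        return F.softmax(m.flatten(start_dim=1), dim=-1)

    target = alpha * normalize(va_attn) + (1.0 - alpha) * normalize(vlm_map)
    student = normalize(student_attn)
    # KL divergence between student attention and the fused teacher target.
    return F.kl_div(student.log(), target, reduction="batchmean")

# Toy usage with dummy 32x32 attention maps (batch of 4):
s = torch.rand(4, 32, 32, requires_grad=True)
loss = attention_distillation_loss(s, torch.rand(4, 32, 32), torch.rand(4, 32, 32))
loss.backward()
```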
AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning
Kong, Yangzhe, Song, Daeun, Liang, Jing, Manocha, Dinesh, Yao, Ziyu, Xiao, Xuesu
We present AutoSpatial, an efficient approach with structured spatial grounding to enhance VLMs' spatial reasoning. By combining minimal manual supervision with large-scale Visual Question-Answering (VQA) pairs auto-labeling, our approach tackles the challenge of VLMs' limited spatial understanding in social navigation tasks. By applying a hierarchical two-round VQA strategy during training, AutoSpatial achieves both global and detailed understanding of scenarios, demonstrating more accurate spatial perception, movement prediction, Chain of Thought (CoT) reasoning, final action, and explanation compared to other SOTA approaches. These five components are essential for comprehensive social navigation reasoning. Our approach was evaluated using both expert systems (GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet) that provided cross-validation scores and human evaluators who assigned relative rankings to compare model performances across four key aspects. Augmented by the enhanced spatial reasoning capabilities, AutoSpatial demonstrates substantial improvements in the averaged cross-validation scores from expert systems: perception & prediction (up to 10.71%), reasoning (up to 16.26%), action (up to 20.50%), and explanation (up to 18.73%), compared to baseline models trained only on manually annotated data.
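As a rough illustration of what a hierarchical two-round VQA flow could look like, the sketch below first asks global scene-level questions and then conditions a detailed prediction/action question on those answers. The prompts, templates, and the `vlm_answer` callable are assumptions, not the paper's actual pipeline.

```python
# Illustrative two-round VQA prompt flow, assuming a generic
# `vlm_answer(image, prompt)` callable; AutoSpatial's real question templates
# and auto-labeling pipeline are not reproduced here.

GLOBAL_QUESTIONS = [
    "Describe the overall scene and the pedestrians relevant to navigation.",
    "Where is free space relative to the robot (left/right/front)?",
]

DETAIL_TEMPLATE = (
    "Given the scene summary: '{summary}', predict the motion of the nearest "
    "pedestrian and propose a socially compliant action with a brief reason."
)

def two_round_vqa(image, vlm_answer):
    # Round 1: coarse, global spatial grounding of the scene.
    summaries = [vlm_answer(image, q) for q in GLOBAL_QUESTIONS]
    summary = " ".join(summaries)
    # Round 2: fine-grained prediction/action question conditioned on round 1.
    detail = vlm_answer(image, DETAIL_TEMPLATE.format(summary=summary))
    return {"global": summary, "detail": detail}
```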
Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
Payandeh, Amirreza, Song, Daeun, Nazeri, Mohammad, Liang, Jing, Mukherjee, Praneel, Raj, Amir Hossain, Kong, Yangzhe, Manocha, Dinesh, Xiao, Xuesu
Most existing social robot navigation techniques either leverage hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, there remains a significant gap in translating perception into socially compliant actions with the kind of reasoning humans naturally apply in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge the gap in human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. We fine-tune a VLM, Social-LLaVA, using SNEI to demonstrate the practical application of our dataset. Social-LLaVA outperforms state-of-the-art models like GPT-4V and Gemini, based on the average of fifteen different human-judge scores across 50 VQAs. Deployed onboard a mobile robot, Social-LLaVA enables human-like reasoning, marking a promising step toward socially compliant robot navigation in dynamic public spaces through language reasoning.
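For readers wondering how a VQA record spanning perception, prediction, chain-of-thought, action, and explanation might be organized, here is a hypothetical sketch of a single SNEI-style sample. The field names and contents are illustrative assumptions; the dataset's actual schema is not specified here.

```python
# Hypothetical sketch of one SNEI-style VQA record; field names are assumed.
from dataclasses import dataclass
from typing import List

@dataclass
class SocialNavVQARecord:
    image_path: str           # frame from a human-robot social interaction
    perception_qa: List[str]  # questions/answers about nearby social entities
    prediction_qa: List[str]  # anticipated pedestrian motion
    chain_of_thought: str     # intermediate reasoning in natural language
    action: str               # socially compliant action label
    explanation: str          # natural-language justification of the action

record = SocialNavVQARecord(
    image_path="frames/scene_0001.png",
    perception_qa=["Q: Who is nearby? A: Two pedestrians approaching from the left."],
    prediction_qa=["Q: Where will they go? A: They will cross in front of the robot."],
    chain_of_thought="Yielding avoids cutting across their path.",
    action="slow down and yield to the right",
    explanation="Yielding respects pedestrians' right of way in a crowded walkway.",
)
```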
BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes
Weerakoon, Kasun, Elnoor, Mohamed, Seneviratne, Gershom, Rajagopal, Vignesh, Arul, Senthil Hariharan, Liang, Jing, Jaffar, Mohamed Khalid M, Manocha, Dinesh
We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Fréchet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.
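The sketch below illustrates, under stated assumptions, how a behavioral cost map could be fused with a LiDAR occupancy map and queried inside an MPC stage cost that balances landmark reaching against behavioral penalties. Weights, map resolution, and the toy maps are hypothetical; this is not BehAV's planner.

```python
# Illustrative sketch only: fusing a VLM-grounded behavioral cost map with a
# LiDAR occupancy map and using it in an MPC-style stage cost. All weights and
# map layouts are assumptions.
import numpy as np

def fuse_costmaps(occupancy, behavioral, w_obs=1.0, w_beh=0.5):
    """occupancy: 1 where LiDAR reports obstacles; behavioral: per-cell cost
    from behavioral rules (e.g., low on pavements, high elsewhere)."""
    return w_obs * occupancy + w_beh * behavioral

def mpc_stage_cost(state_xy, landmark_xy, costmap, resolution=0.1, w_goal=1.0):
    """Stage cost: distance to the current landmark plus the fused map cost
    at the candidate position."""
    i, j = (state_xy / resolution).astype(int)
    i = np.clip(i, 0, costmap.shape[0] - 1)
    j = np.clip(j, 0, costmap.shape[1] - 1)
    goal_term = np.linalg.norm(state_xy - landmark_xy)
    return w_goal * goal_term + costmap[i, j]

occ = np.zeros((100, 100)); occ[40:60, 40:60] = 1.0  # toy obstacle block
beh = np.full((100, 100), 0.2); beh[:, :30] = 0.0    # e.g., a pavement strip
cost = mpc_stage_cost(np.array([2.0, 1.5]), np.array([8.0, 8.0]),
                      fuse_costmaps(occ, beh))
```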
GND: Global Navigation Dataset with Multi-Modal Perception and Multi-Category Traversability in Outdoor Campus Environments
Liang, Jing, Das, Dibyendu, Song, Daeun, Shuvo, Md Nahid Hasan, Durrani, Mohammad, Taranath, Karthik, Penskiy, Ivan, Manocha, Dinesh, Xiao, Xuesu
Navigating large-scale outdoor environments requires complex reasoning in terms of geometric structures, environmental semantics, and terrain characteristics, which are typically captured by onboard sensors such as LiDAR and cameras. While current mobile robots can navigate such environments using pre-defined, high-precision maps based on hand-crafted rules catered for the specific environment, they lack commonsense reasoning capabilities that most humans possess when navigating unknown outdoor spaces. To address this gap, we introduce the Global Navigation Dataset (GND), a large-scale dataset that integrates multi-modal sensory data, including 3D LiDAR point clouds and RGB and 360-degree images, as well as multi-category traversability maps (pedestrian walkways, vehicle roadways, stairs, off-road terrain, and obstacles) from ten university campuses. These environments encompass a variety of parks, urban settings, elevation changes, and campus layouts of different scales. The dataset covers approximately 2.7 km² and includes at least 350 buildings in total. We also present a set of novel applications of GND to showcase its utility to enable global robot navigation, such as map-based global navigation, mapless navigation, and global place recognition.
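To give a sense of how such a multi-modal sample might be consumed, here is a hypothetical loader sketch. The directory layout, file formats, and label encoding are assumptions for illustration, not GND's actual release format.

```python
# Hypothetical sketch of reading one multi-modal GND-style sample; the real
# file layout and label encoding are not specified here.
from dataclasses import dataclass
import numpy as np

TRAVERSABILITY_CLASSES = [
    "pedestrian_walkway", "vehicle_roadway", "stairs", "off_road", "obstacle",
]

@dataclass
class GNDSample:
    lidar_points: np.ndarray    # (N, 3) 3D LiDAR point cloud
    rgb_image: np.ndarray       # (H, W, 3) front camera frame
    pano_image: np.ndarray      # (H, W, 3) 360-degree image
    traversability: np.ndarray  # (H_map, W_map) per-cell class index

def load_sample(prefix):
    # Assumes each modality was exported as a NumPy array with a shared prefix.
    return GNDSample(
        lidar_points=np.load(f"{prefix}_lidar.npy"),
        rgb_image=np.load(f"{prefix}_rgb.npy"),
        pano_image=np.load(f"{prefix}_pano.npy"),
        traversability=np.load(f"{prefix}_trav.npy"),
    )
```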
MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition
Cui, Yanjie, Liu, Xiaohong, Liang, Jing, Fu, Yamin
Electroencephalography (EEG), a technique that captures the electrical activity of brain structures via scalp electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information. However, few existing studies have simultaneously analyzed EEG signals from the perspectives of both geometric and anatomical structure in the spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency, and spatial domains, including geometric and anatomical structures, to comprehensively enhance the expressive power of the model. We incorporate the spatial information of EEG channels into the model as an encoding, thereby improving its ability to perceive the spatial structure of the channels. Experimental results on publicly available datasets demonstrate that our proposed model outperforms recent state-of-the-art methods. The results also show that MVGT effectively extracts information from multiple domains and captures inter-channel relationships in EEG emotion recognition tasks.
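As a minimal sketch of the general idea of injecting channel spatial information as an encoding, the snippet below projects 3D electrode coordinates into the feature space and adds them to per-channel features before a transformer encoder. The layer sizes, coordinate dimensionality, and architecture are assumptions, not MVGT itself.

```python
# Minimal sketch, not the paper's architecture: adding EEG channel coordinates
# as a spatial encoding before a transformer encoder. Dimensions are illustrative.
import torch
import torch.nn as nn

class SpatialChannelEncoder(nn.Module):
    def __init__(self, feat_dim=64, coord_dim=3, n_heads=4, n_layers=2):
        super().__init__()
        # Project 3D electrode coordinates to the feature dimension, analogous
        # to a positional encoding over channels.
        self.coord_proj = nn.Linear(coord_dim, feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, channel_feats, channel_coords):
        # channel_feats: (B, C, feat_dim); channel_coords: (C, coord_dim)
        spatial = self.coord_proj(channel_coords).unsqueeze(0)
        return self.encoder(channel_feats + spatial)

model = SpatialChannelEncoder()
feats = torch.randn(8, 62, 64)   # 62 EEG channels, 64-d features per channel
coords = torch.randn(62, 3)      # electrode positions
out = model(feats, coords)       # (8, 62, 64)
```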
VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models
Song, Daeun, Liang, Jing, Payandeh, Amirreza, Xiao, Xuesu, Manocha, Dinesh
We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's motion in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior. VLM-Social-Nav uses a VLM-based scoring module that computes a cost term ensuring that the robot actions generated by the underlying planner are socially appropriate and effective. Our overall approach reduces reliance on large training datasets and enhances adaptability in decision-making. In practice, it results in improved socially compliant navigation in human-shared environments. We demonstrate and evaluate our system in four different real-world social navigation scenarios with a Turtlebot robot. We observe at least 27.38% improvement in the average success rate and 19.05% improvement in the average collision rate in the four social navigation scenarios. Our user study score shows that VLM-Social-Nav generates the most socially compliant navigation behavior.
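The following sketch shows one simple way a VLM-derived social score could be folded into action selection as an extra cost term alongside a planner's geometric cost. The `vlm_social_score` callable, the weighting, and the toy costs are assumptions; this is not the paper's scoring module.

```python
# Illustrative sketch: adding a VLM-derived social score as a cost term when
# selecting among candidate actions. `vlm_social_score` is a hypothetical
# callable returning a score in [0, 1], where higher means more compliant.

def select_action(candidates, planner_cost, vlm_social_score, w_social=2.0):
    """Total cost = planner (geometric/obstacle) cost + w_social * (1 - social score)."""
    best, best_cost = None, float("inf")
    for action in candidates:
        cost = planner_cost(action) + w_social * (1.0 - vlm_social_score(action))
        if cost < best_cost:
            best, best_cost = action, cost
    return best

# Toy usage with stub cost/score functions:
actions = ["go_straight", "yield_right", "stop"]
chosen = select_action(
    actions,
    planner_cost=lambda a: {"go_straight": 0.2, "yield_right": 0.5, "stop": 1.0}[a],
    vlm_social_score=lambda a: {"go_straight": 0.3, "yield_right": 0.9, "stop": 0.8}[a],
)  # -> "yield_right": slightly longer path, but much more socially compliant
```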
Bringing Generative AI to Adaptive Learning in Education
Li, Hang, Xu, Tianlong, Zhang, Chaoli, Chen, Eason, Liang, Jing, Fan, Xing, Li, Haoyang, Tang, Jiliang, Wen, Qingsong
The recent surge in generative AI technologies, such as large language models and diffusion models, has boosted the development of AI applications in various domains, including science, finance, and education. Concurrently, adaptive learning, a concept that has gained substantial interest in the educational sphere, has proven its efficacy in enhancing students' learning efficiency. In this position paper, we aim to shed light on the intersectional studies of these two methods, which combine generative AI with adaptive learning concepts. By presenting discussions about the benefits, challenges, and potentials in this field, we argue that this union will contribute significantly to the development of the next-stage learning format in education.
Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics
Wu, Xiyang, Chakraborty, Souradip, Xian, Ruiqi, Liang, Jing, Guan, Tianrui, Liu, Fuxiao, Sadler, Brian M., Manocha, Dinesh, Bedi, Amrit Singh
In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, analyzing the safety of such systems remains underexplored yet extremely critical. LLMs and VLMs are highly susceptible to adversarial inputs, raising serious questions about the safety of the robotic systems built on them. This concern is important because robots operate in the physical world, where erroneous actions can have severe consequences. This paper explores this issue thoroughly, presenting a mathematical formulation of potential attacks on LLM/VLM-based robotic systems and offering experimental evidence of the safety challenges. Our empirical findings highlight a significant vulnerability: simple modifications to the input can drastically reduce system effectiveness. Specifically, our results demonstrate an average performance deterioration of 19.4% under minor input prompt modifications and a more alarming 29.1% under slight perceptual changes. These findings underscore the urgent need for robust countermeasures to ensure the safe and reliable deployment of advanced LLM/VLM-based robotic systems.
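To illustrate the kind of robustness evaluation this work motivates, here is a hedged sketch that compares task success rates on clean versus lightly perturbed prompts. The `run_robot_task` callable and the specific perturbations are hypothetical placeholders, not the paper's attack formulation.

```python
# Minimal sketch of a clean-vs-perturbed robustness check for an LLM/VLM-driven
# robot task. `run_robot_task(prompt)` is assumed to return 1 on success, 0 on
# failure; the perturbation list is illustrative only.
import random

def perturb_prompt(prompt):
    """Apply a simple rephrasing/typo-style perturbation (illustrative only)."""
    perturbations = [
        lambda p: p.replace("pick up", "grab"),
        lambda p: p.replace("the", "teh", 1),
        lambda p: p + " please",
    ]
    return random.choice(perturbations)(prompt)

def robustness_gap(run_robot_task, prompts, trials=10):
    """Return the drop in success rate from clean prompts to perturbed prompts."""
    clean = sum(run_robot_task(p) for p in prompts for _ in range(trials))
    attacked = sum(run_robot_task(perturb_prompt(p))
                   for p in prompts for _ in range(trials))
    total = len(prompts) * trials
    return clean / total - attacked / total
```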