Collaborating Authors: Lu, Junting


Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities

arXiv.org Artificial Intelligence

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills specific functionality requirements based on a given flowchart, which visually represents the desired algorithm or process. Code-Vision comprises three subsets: HumanEval-V, Algorithm, and MATH, which evaluate MLLMs' coding abilities across basic programming, algorithmic, and mathematical problem-solving domains. Our experiments evaluate 12 MLLMs on Code-Vision. The results demonstrate a large performance gap between proprietary and open-source models: on Hard problems, GPT-4o achieves 79.3% pass@1, while the best open-source model reaches only 15%. Further experiments reveal that Code-Vision poses unique challenges compared to other multimodal reasoning benchmarks, MMCode and MathVista. We also explore the reasons behind the poor performance of the open-source models. All data and code are available at https://github.com/wanghanbinpanda/CodeVision.
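
For readers unfamiliar with the pass@1 figures quoted above, the sketch below shows how execution-based scoring of this kind is typically computed: each problem gets one generated program, which counts as solved only if its unit tests pass. This is a minimal illustration, not Code-Vision's actual harness; the problem dictionary format and the `generate` model call are hypothetical stand-ins.

```python
import subprocess
import tempfile

def passes_tests(program: str, test_code: str, timeout: float = 10.0) -> bool:
    """Run a candidate program against its unit tests in a subprocess.

    Returns True only if the tests exit cleanly; a crash or timeout
    counts as a failure, mirroring how execution-based code benchmarks
    usually score functional correctness.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True,
                                timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def pass_at_1(problems, generate) -> float:
    """pass@1 with one sample per problem: the fraction of problems
    whose single generated program passes all tests.

    `problems` is assumed to be a list of dicts holding a flowchart
    image path and a `tests` field; `generate` is the (hypothetical)
    MLLM call mapping a flowchart to source code.
    """
    solved = 0
    for problem in problems:
        program = generate(problem["flowchart_path"])
        if passes_tests(program, problem["tests"]):
            solved += 1
    return solved / len(problems)
```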


Large Action Models: From Inception to Implementation

arXiv.org Artificial Intelligence

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications. The code for the data collection process utilized in this paper is publicly available at: https://github.com/microsoft/UFO/tree/main/dataflow, and comprehensive documentation can be found at https://microsoft.github.io/UFO/dataflow/overview/.
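
The stages listed above (data collection, model training, environment integration, grounding, evaluation) ultimately feed a perceive-plan-act loop at inference time. Below is a minimal sketch of such a loop under stated assumptions: the `model` callable, the `Environment` protocol, and all names are illustrative inventions, not interfaces from the paper or the UFO codebase.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Action:
    """A structured action the model emits instead of free-form text,
    e.g. clicking a control or invoking a command in a desktop app."""
    name: str
    arguments: dict

class Environment(Protocol):
    """Grounding layer: executes model-emitted actions in the real
    application and reports the resulting state back to the agent."""
    def observe(self) -> str: ...
    def execute(self, action: Action) -> None: ...

def run_task(model, env: Environment, task: str,
             max_steps: int = 20) -> list[Action]:
    """Minimal perceive-plan-act loop.

    `model` stands in for a trained LAM mapping (task, observation,
    history) to the next Action, or None when it judges the task done.
    """
    history: list[Action] = []
    for _ in range(max_steps):
        observation = env.observe()                  # perceive UI state
        action = model(task, observation, history)   # plan the next step
        if action is None:                           # completion signal
            break
        env.execute(action)                          # act in the environment
        history.append(action)
    return history
```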


Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents

arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs) have enabled LLM-based agents to interact directly with application user interfaces (UIs), enhancing agents' performance in complex tasks. However, these agents often suffer from high latency and low reliability due to extensive sequential UI interactions. To address this issue, we propose AXIS, a novel LLM-based agent framework that prioritizes actions through application programming interfaces (APIs) over UI actions. The framework also facilitates the creation and expansion of APIs through automated exploration of applications. Our experiments on Office Word demonstrate that AXIS reduces task completion time by 65%-70% and cognitive workload by 38%-53%, while maintaining an accuracy of 97%-98% compared to humans. Our work contributes a new human-agent-computer interaction (HACI) framework and a fresh UI design principle for application providers in the era of LLMs. It also explores the possibility of turning every application into an agent, paving the way toward an agent-centric operating system (Agent OS).
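
To make the API-first idea concrete, here is a minimal sketch of an action-selection step that prefers a single API call over a chain of UI actions. The `propose_api` and `propose_ui_steps` callables stand in for LLM calls, and all names here are hypothetical, not AXIS's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ApiAction:
    """One application API exposed to the agent, e.g. a Word automation
    call that applies a style in a single invocation."""
    name: str
    call: Callable[[], object]

def plan_step(task: str,
              api_registry: dict[str, ApiAction],
              propose_api: Callable[[str, list[str]], Optional[str]],
              propose_ui_steps: Callable[[str], list[str]]):
    """Prefer a single API call over a chain of UI actions.

    `propose_api` and `propose_ui_steps` stand in for LLM calls: the
    first names a registered API that completes the task (or None),
    the second falls back to a sequence of UI interactions. Collapsing
    many sequential UI steps into one API call is what cuts latency.
    """
    api_name = propose_api(task, list(api_registry))
    if api_name in api_registry:      # fast, reliable API path
        return api_registry[api_name].call()
    return propose_ui_steps(task)     # slow sequential UI fallback
```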


Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

arXiv.org Artificial Intelligence

Current question answering systems leveraging retrieval-augmented generation perform well on factoid questions but struggle with non-factoid questions, particularly how-to queries that require detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their interconnectivity. Extensive experiments across open-domain and industrial scenarios demonstrate that Thread outperforms existing data organization paradigms in RAG-based QA systems, significantly improving the handling of how-to questions.
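
As a rough sketch of the idea, a "logic unit" can be modeled as a procedural step that carries its applicability condition and links to follow-up steps, so retrieval can walk a procedure rather than return isolated text chunks. The field names and traversal below are assumptions for illustration, not Thread's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class LogicUnit:
    """One step of a how-to document, kept with its links so retrieval
    can follow the procedure instead of returning isolated chunks."""
    unit_id: str
    prerequisite: str   # condition under which this step applies
    body: str           # the instruction itself
    next_ids: list[str] = field(default_factory=list)  # follow-up steps

def walk_procedure(units: dict[str, LogicUnit], start_id: str) -> list[str]:
    """Follow inter-unit links from an entry step, yielding the ordered
    instructions a how-to answer could be assembled from."""
    steps, stack, seen = [], [start_id], set()
    while stack:
        uid = stack.pop()
        if uid in seen or uid not in units:
            continue
        seen.add(uid)
        unit = units[uid]
        steps.append(unit.body)
        stack.extend(reversed(unit.next_ids))  # preserve listed order
    return steps
```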


Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides

arXiv.org Artificial Intelligence

Effective incident management is pivotal for the smooth operation of Microsoft cloud services. In order to expedite incident mitigation, service teams gather troubleshooting knowledge into Troubleshooting Guides (TSGs) accessible to On-Call Engineers (OCEs). While automated pipelines are enabled to resolve the most frequent and easy incidents, there still exist complex incidents that require OCEs' intervention. In addition, TSGs are often unstructured and incomplete, which requires manual interpretation by OCEs, leading to on-call fatigue and decreased productivity, especially among new-hire OCEs. In this work, we propose Nissist, which leverages unstructured TSGs and incident mitigation history to provide proactive incident mitigation suggestions, reducing human intervention.

To investigate the effect of TSGs on incident mitigation, we analyze around 1000 high-severity incidents from the recent twelve months that demanded immediate intervention from OCEs. Consistent with findings from prior studies [8, 18, 9], which demonstrate the efficacy of TSGs in incident mitigation, we found that incidents paired with TSGs exhibit a 60% shorter average time-to-mitigate (TTM) compared to those without TSGs, emphasizing the pivotal role played by TSGs. This trend is consistent across various companies, as evidenced by research [14, 10], even among those employing different forms of TSGs. However, despite their utility, as highlighted by [18, 2], the unstructured format, varying quantity, and propensity for
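
As a loose illustration of the pipeline the abstract describes, the sketch below distills TSGs into structured steps and ranks them against an incoming incident report. The data model and the token-overlap scoring are hypothetical simplifications chosen to keep the example self-contained; the paper's system leverages unstructured TSGs and incident mitigation history rather than this toy matcher.

```python
from dataclasses import dataclass

@dataclass
class TsgStep:
    """A mitigation step distilled from an unstructured TSG."""
    guide: str    # source troubleshooting guide
    symptom: str  # incident signature this step addresses
    action: str   # what the OCE (or an automated pipeline) should do

def suggest_mitigations(incident_text: str, steps: list[TsgStep],
                        top_k: int = 3) -> list[TsgStep]:
    """Rank TSG steps by crude token overlap with the incident report.

    A production system would use embeddings plus mitigation history;
    plain token overlap keeps this sketch dependency-free.
    """
    incident_tokens = set(incident_text.lower().split())

    def score(step: TsgStep) -> int:
        return len(incident_tokens & set(step.symptom.lower().split()))

    return sorted(steps, key=score, reverse=True)[:top_k]
```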