
 Fang, Junjie


Reproducibility Companion Paper: Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems

arXiv.org Artificial Intelligence

In this paper, we reproduce the experimental results presented in our previous work titled "Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems," which was published in the proceedings of the 31st ACM International Conference on Multimedia. This paper aims to validate the effectiveness of our proposed method and help others reproduce our experimental results. We provide detailed descriptions of our preprocessed datasets, source code structure, configuration file settings, experimental environment, and reproduced experimental results.


UI-TARS: Pioneering Automated GUI Interaction with Native Agents

arXiv.org Artificial Intelligence

This paper introduces UI-TARS, a native GUI agent model that perceives only screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA results on 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, on the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude's 22.0 and 14.9, respectively. On AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o's 34.5. UI-TARS incorporates several key innovations: (1) Enhanced Perception, which leverages a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflective thinking, milestone recognition, etc.; and (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.
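
To make the abstract's "Unified Action Modeling" idea concrete, here is a minimal sketch of what a platform-agnostic GUI action space could look like. The class, field, and action names are assumptions chosen for illustration, not UI-TARS's actual schema.

```python
# Hypothetical sketch of a unified, cross-platform GUI action space in the
# spirit of UI-TARS's "Unified Action Modeling". All names are assumptions.
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ActionType(Enum):
    CLICK = "click"
    DOUBLE_CLICK = "double_click"
    DRAG = "drag"
    TYPE = "type"
    SCROLL = "scroll"
    HOTKEY = "hotkey"
    FINISH = "finish"


@dataclass
class GUIAction:
    """One agent step in normalized screen coordinates (0-1), so the same
    action vocabulary applies to desktop, web, and mobile screenshots."""
    action: ActionType
    target: Optional[Tuple[float, float]] = None   # (x, y) grounding point
    end: Optional[Tuple[float, float]] = None      # drag end point, if any
    text: Optional[str] = None                     # typed text or hotkey combo
    thought: str = ""                              # System-2 style reasoning trace


# Example step an agent might emit after reasoning over a screenshot:
step = GUIAction(
    action=ActionType.CLICK,
    target=(0.42, 0.17),
    thought="The 'Save' button is in the toolbar; click it to persist the file.",
)
```

Normalizing coordinates to the unit square is one simple way to let a single action vocabulary cover screens of different resolutions across platforms.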


GUICourse: From General Vision Language Models to Versatile GUI Agents

arXiv.org Artificial Intelligence

Utilizing the Graphical User Interface (GUI) for human-computer interaction is essential for accessing a wide range of digital tools. Recent advancements in Vision Language Models (VLMs) highlight their compelling potential for developing versatile agents that help humans complete GUI navigation tasks. However, current VLMs fall short in fundamental abilities (OCR and grounding) and in GUI knowledge (the functions and control methods of GUI elements), preventing them from becoming practical GUI agents. To address these challenges, we contribute GUICourse, a suite of datasets for training visual-based GUI agents from general VLMs. First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs. Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions. Experiments demonstrate that our GUI agents perform better on common GUI tasks than their baseline VLMs. Even the small GUI agent (with 3.1B parameters) still works well on single-step and multi-step GUI tasks. Finally, we analyze the effect of different training-stage choices for this agent through ablation studies. Our source code and datasets are released at https://github.com/yiye3/GUICourse.
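
As a rough illustration of the kinds of data such a suite could contain, the sketch below shows a hypothetical OCR/grounding sample (GUIEnv-style) and a single-step action sample (GUIAct-style). The field names and values are assumptions for illustration only, not the released schema at https://github.com/yiye3/GUICourse.

```python
# Hypothetical examples of GUI-agent training samples; fields are assumptions.

# GUIEnv-style sample: ties on-screen text to its location (OCR + grounding).
grounding_sample = {
    "image": "screenshot_0001.png",
    "instruction": "Where is the text 'Sign in' on the page?",
    "answer_bbox": [0.81, 0.05, 0.93, 0.09],   # normalized (x1, y1, x2, y2)
}

# GUIAct-style sample: maps a task plus history to the next GUI action.
action_sample = {
    "image": "screenshot_0002.png",
    "task": "Open the settings menu",
    "history": [],
    "action": {"type": "click", "point": [0.95, 0.04]},
}
```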


UniMem: Towards a Unified View of Long-Context Large Language Models

arXiv.org Artificial Intelligence

Long-context processing is a critical ability, and its limitations constrain the applicability of large language models (LLMs). Although various methods have been devoted to enhancing the long-context processing ability of LLMs, they were developed in isolation and lack systematic analysis and integration of their strengths, hindering further progress. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from the perspective of memory augmentation of LLMs. UniMem is characterized by four key dimensions: Memory Management, Memory Writing, Memory Reading, and Memory Injection, providing a systematic theory for understanding various long-context methods. We reformulate 16 existing methods based on UniMem and recast four representative ones (Transformer-XL, Memorizing Transformer, RMT, and Longformer) into equivalent UniMem forms to reveal their design principles and strengths. Based on these analyses, we propose UniMix, an innovative approach that integrates the strengths of these algorithms. Experimental results show that UniMix achieves superior performance in handling long contexts, with significantly lower perplexity than baselines.
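
The four dimensions can be pictured as a small interface that any long-context method fills in differently. The sketch below is an assumption-laden illustration (the method names and signatures are invented here), not the paper's actual API.

```python
# Minimal interface sketch of UniMem's four dimensions; names and signatures
# are assumptions for illustration, not the paper's formulation.
from abc import ABC, abstractmethod
from typing import Any, List


class UniMemModule(ABC):
    """One long-context method, expressed along UniMem's four dimensions."""

    @abstractmethod
    def manage(self, memory: List[Any]) -> List[Any]:
        """Memory Management: decide which entries to keep, evict, or compress."""

    @abstractmethod
    def write(self, memory: List[Any], hidden_states: Any) -> List[Any]:
        """Memory Writing: store (parts of) the current segment's states."""

    @abstractmethod
    def read(self, memory: List[Any], query: Any) -> Any:
        """Memory Reading: retrieve entries relevant to the current query."""

    @abstractmethod
    def inject(self, layer_inputs: Any, retrieved: Any) -> Any:
        """Memory Injection: fuse retrieved memory into chosen transformer layers."""
```

Roughly speaking, a Transformer-XL-style module would write the previous segment's hidden states and read them through attention, while a Memorizing-Transformer-style module would read via kNN lookup over a much larger cache; such differences surface mainly in how `manage`, `write`, and `read` are instantiated.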