AITopics | He, Tao

Collaborating Authors

He, Tao

Qwen2.5-1M Technical Report

Yang, An, Yu, Bowen, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Huang, Haoyan, Jiang, Jiandong, Tu, Jianhong, Zhang, Jianwei, Zhou, Jingren, Lin, Junyang, Dang, Kai, Yang, Kexin, Yu, Le, Li, Mei, Sun, Minmin, Zhu, Qin, Men, Rui, He, Tao, Xu, Weijia, Yin, Wenbiao, Yu, Wenyuan, Qiu, Xiafei, Ren, Xingzhang, Yang, Xinlong, Li, Yong, Xu, Zhiying, Zhang, Zipeng

arXiv.org Artificial Intelligence, Jan-25-2025

We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series has significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs. To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models. The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.

large language model, machine learning, qwen2, (19 more...)

arXiv.org Artificial Intelligence

2501.15383

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Int2Planner: An Intention-based Multi-modal Motion Planner for Integrated Prediction and Planning

Chen, Xiaolei, Yan, Junchi, Liao, Wenlong, He, Tao, Peng, Pai

arXiv.org Artificial Intelligence, Jan-22-2025

Motion planning is a critical module in autonomous driving, with the primary challenge of uncertainty caused by interactions with other participants. As most previous methods treat prediction and planning as separate tasks, it is difficult to model these interactions. Furthermore, since the route path navigates ego vehicles to a predefined destination, it provides relatively stable intentions for ego vehicles and helps constrain uncertainty. On this basis, we construct Int2Planner, an Intention-based Integrated motion Planner that achieves multi-modal planning and prediction. Instead of static intention points, Int2Planner utilizes route intention points for ego vehicles and generates corresponding planning trajectories for each intention point to facilitate multi-modal planning. Experiments on a private dataset and the public nuPlan benchmark show the effectiveness of route intention points, and Int2Planner achieves state-of-the-art performance. We also deploy it in real-world vehicles and have conducted autonomous driving for hundreds of kilometers in urban areas. This further verifies that Int2Planner can continuously interact with the traffic environment. Code will be available at https://github.com/cxlz/Int2Planner.

artificial intelligence, int2planner, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.12799

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.70)
Information Technology > Robotics & Automation (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues

He, Tao, Liao, Lizi, Cao, Yixin, Liu, Yuanxing, Sun, Yiheng, Chen, Zerui, Liu, Ming, Qin, Bing

arXiv.org Artificial Intelligence, Dec-19-2024

Recent advancements in proactive dialogues have garnered significant attention, particularly for more complex objectives (e.g. emotion support and persuasion). Unlike traditional task-oriented dialogues, proactive dialogues demand advanced policy planning and adaptability, requiring rich scenarios and comprehensive policy repositories to develop such systems. However, existing approaches tend to rely on Large Language Models (LLMs) for user simulation and online learning, leading to biases that diverge from realistic scenarios and result in suboptimal efficiency. Moreover, these methods depend on manually defined, context-independent, coarse-grained policies, which not only incur high expert costs but also raise concerns regarding their completeness. In our work, we highlight the potential for automatically discovering policies directly from raw, real-world dialogue records. To this end, we introduce a novel dialogue policy planning framework, LDPP. It fully automates the process from mining policies in dialogue records to learning policy planning. Specifically, we employ a variant of the Variational Autoencoder to discover fine-grained policies represented as latent vectors. After automatically annotating the data with these latent policy labels, we propose an Offline Hierarchical Reinforcement Learning (RL) algorithm in the latent space to develop effective policy planning capabilities. Our experiments demonstrate that LDPP outperforms existing methods on two proactive scenarios, even surpassing ChatGPT with only a 1.8-billion-parameter LLM.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2412.14584

Country: Asia (0.46)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting (0.65)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

A Lightweight U-like Network Utilizing Neural Memory Ordinary Differential Equations for Slimming the Decoder

He, Quansong, Yao, Xiaojun, Wu, Jun, Yi, Zhang, He, Tao

arXiv.org Artificial Intelligence, Dec-9-2024

In recent years, advanced U-like networks have demonstrated remarkable performance in medical image segmentation tasks. However, their drawbacks, including excessive parameters, high computational complexity, and slow inference speed, pose challenges for practical implementation in scenarios with limited computational resources. Existing lightweight U-like networks have alleviated some of these problems, but they often have pre-designed structures and consist of inseparable modules, limiting their application scenarios. In this paper, we propose three plug-and-play decoders by employing different discretization methods of the neural memory Ordinary Differential Equations (nmODEs). These decoders integrate features at various levels of abstraction by processing information from skip connections and performing numerical operations on the upward path. Through experiments on the PH2, ISIC2017, and ISIC2018 datasets, we embed these decoders into different U-like networks, demonstrating their effectiveness in significantly reducing the number of parameters and FLOPs while maintaining performance. In summary, the proposed discretized nmODEs decoders are capable of reducing the number of parameters by about 20% to 50% and FLOPs by up to 74%, while possessing the potential to adapt to all U-like networks. Our code is available at https://github.com/nayutayuki/Lightweight-nmODE-Decoders-For-U-like-networks.

artificial intelligence, decoder, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.06262

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Dermatology (0.94)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation

Li, Muquan, Zhang, Dongyang, He, Tao, Xie, Xiurui, Li, Yuan-Fang, Qin, Ke

arXiv.org Artificial Intelligence, Oct-23-2024

Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression, substantially reducing the dependency on the original training data. Nonetheless, conventional DFKD methods that employ synthesized training data are prone to the limitations of inadequate diversity and discrepancies in distribution between the synthesized and original datasets. To address these challenges, this paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA). Specifically, we revise the paradigm of common data synthesis in DFKD to a composite process by leveraging diffusion models subsequent to data synthesis for self-supervised augmentation, which generates a spectrum of data samples with similar distributions while retaining controlled variations. Furthermore, to mitigate excessive deviation in the embedding space, we introduce an image filtering technique grounded in cosine similarity to maintain fidelity during the knowledge distillation process. Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method across various teacher-student network configurations, outperforming the contemporary state-of-the-art DFKD methods. Code will be available at: https://github.com/SLGSP/DDA.

artificial intelligence, machine learning, survey article, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3664647.3680711

2410.17606

Country:

North America (0.46)
Asia > China (0.30)
Oceania > Australia > Victoria (0.29)

Genre:

Overview > Innovation (0.34)
Research Report > Promising Solution (0.34)

Industry: Information Technology (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Planning Like Human: A Dual-process Framework for Dialogue Planning

He, Tao, Liao, Lizi, Cao, Yixin, Liu, Yuanxing, Liu, Ming, Chen, Zerui, Qin, Bing

arXiv.org Artificial Intelligence, Jun-8-2024

In proactive dialogue, the challenge lies not just in generating responses but in steering conversations toward predetermined goals, a task where Large Language Models (LLMs) typically struggle due to their reactive nature. Traditional approaches to enhance dialogue planning in LLMs, ranging from elaborate prompt engineering to the integration of policy networks, either face efficiency issues or deliver suboptimal performance. Inspired by the dual-process theory in psychology, which identifies two distinct modes of thinking, intuitive (fast) and analytical (slow), we propose the Dual-Process Dialogue Planning (DPDP) framework. DPDP embodies this theory through two complementary planning systems: an instinctive policy model for familiar contexts and a deliberative Monte Carlo Tree Search (MCTS) mechanism for complex, novel scenarios. This dual strategy is further coupled with a novel two-stage training regimen: offline Reinforcement Learning for robust initial policy model formation followed by MCTS-enhanced on-the-fly learning, which ensures a dynamic balance between efficiency and strategic depth. Our empirical evaluations across diverse dialogue tasks affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2406.05374

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Education (0.67)
Leisure & Entertainment > Games (0.67)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

He, Lang, Chen, Kai, Zhao, Junnan, Wang, Yimeng, Pei, Ercheng, Chen, Haifeng, Jiang, Jiewei, Zhang, Shiqing, Zhang, Jie, Wang, Zhongmin, He, Tao, Tiwari, Prayag

arXiv.org Artificial Intelligence, May-8-2024

Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection concerns, data in this area is still scarce, presenting a challenge for the deep discriminative models used in detecting depression. To navigate these obstacles, a large-scale multimodal vlog dataset (LMVD) for depression recognition in the wild is built. LMVD has 1823 samples totaling 214 hours from 1475 participants, captured from four multimedia platforms (Sina Weibo, Bilibili, Tiktok, and YouTube). A novel architecture termed MDDformer is proposed to learn the non-verbal behaviors of individuals. Extensive validations are performed on the LMVD dataset, demonstrating superior performance for depression detection. We anticipate that the LMVD will contribute a valuable function to the depression detection community. The data and code will be released at https://github.com/helang818/LMVD/.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

2407.00024

Country:

Europe (1.00)
North America > United States (0.94)
Asia > China (0.74)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving

Jia, Xiaosong, Shi, Shaoshuai, Chen, Zijun, Jiang, Li, Liao, Wenlong, He, Tao, Yan, Junchi

arXiv.org Artificial Intelligence, Mar-21-2024

As an essential task in autonomous driving (AD), motion prediction aims to predict the future states of surrounding objects for navigation. One natural solution is to estimate the position of other agents in a step-by-step manner where each predicted time-step is conditioned on both observed time-steps and previously predicted time-steps, i.e., autoregressive prediction. Pioneering works like SocialLSTM and MFP design their decoders based on this intuition. However, almost all state-of-the-art works assume that all predicted time-steps are independent conditioned on observed time-steps, where they use a single linear layer to generate positions of all time-steps simultaneously. They dominate most motion prediction leaderboards due to the simplicity of training MLPs compared to autoregressive networks. In this paper, we introduce GPT-style next token prediction into motion forecasting. In this way, the input and output can be represented in a unified space, and thus autoregressive prediction becomes more feasible. However, unlike language data, which is composed of homogeneous units (words), the elements in the driving scene can have complex spatial-temporal and semantic relations. To this end, we propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations, e.g., encoding the transformation between coordinate systems for spatial relativity while adopting RoPE for temporal relativity. Empirically, equipped with the aforementioned tailored designs, the proposed method achieves state-of-the-art performance on the Waymo Open Motion and Waymo Interaction datasets. Notably, AMP outperforms other recent autoregressive motion prediction methods, MotionLM and StateTransformer, which demonstrates the effectiveness of the proposed designs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.13331

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.61)
Information Technology > Robotics & Automation (0.61)
Automobiles & Trucks (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.71)
(2 more...)

Unicron: Economizing Self-Healing LLM Training at Scale

He, Tao, Li, Xue, Wang, Zhibin, Qian, Kun, Xu, Jingbo, Yu, Wenyuan, Zhou, Jingren

arXiv.org Artificial Intelligence, Dec-29-2023

Training large-scale language models is increasingly critical in various domains, but it is hindered by frequent failures, leading to significant time and economic costs. Current failure recovery methods in cloud-based settings inadequately address the diverse and complex scenarios that arise, focusing narrowly on erasing downtime for individual tasks without considering the overall cost impact on a cluster. We introduce Unicron, a workload manager designed for efficient self-healing in large-scale language model training. Unicron optimizes the training process by minimizing failure-related costs across multiple concurrent tasks within a cluster. Its key features include in-band error detection for real-time error identification without extra overhead, a dynamic cost-aware plan generation mechanism for optimal reconfiguration, and an efficient transition strategy to reduce downtime during state changes. Deployed on a 128-GPU distributed cluster, Unicron demonstrates up to a 1.9x improvement in training efficiency over state-of-the-art methods, significantly reducing failure recovery costs and enhancing the reliability of large-scale language model training.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2401.00134

Country: North America > United States (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture (1.00)

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

Chu, Zheng, Chen, Jingchang, Chen, Qianglong, Yu, Weijiang, He, Tao, Wang, Haotian, Peng, Weihua, Liu, Ming, Qin, Bing, Liu, Ting

arXiv.org Artificial Intelligence, Oct-16-2023

Chain-of-thought reasoning, a cognitive process fundamental to human intelligence, has garnered significant attention in the realm of artificial intelligence and natural language processing. However, a comprehensive survey of this area is still lacking. To this end, we take the first step and present a thorough, wide-ranging survey of this research field. We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail, we systematically organize the current research according to the taxonomies of methods, including XoT construction, XoT structure variants, and enhanced XoT. Additionally, we describe XoT with frontier applications, covering planning, tool use, and distillation. Furthermore, we address challenges and discuss some future directions, including faithfulness, multi-modal, and theory. We hope this survey serves as a valuable resource for researchers seeking to innovate within the domain of chain-of-thought reasoning.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2309.15402

Country:

Europe (1.00)
Asia > China (0.68)
North America > United States > California (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Overview (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
