Lin, Tao
Measuring AI Ability to Complete Long Tasks
Kwa, Thomas, West, Ben, Becker, Joel, Deng, Amy, Garcia, Katharyn, Hasin, Max, Jawhar, Sami, Kinniment, Megan, Rush, Nate, Von Arx, Sydney, Bloom, Ryan, Broadley, Thomas, Du, Haoxing, Goodrich, Brian, Jurkovic, Nikola, Miles, Luke Harold, Nix, Seraphina, Lin, Tao, Parikh, Neev, Rein, David, Sato, Lucas Jun Koba, Wijk, Hjalmar, Ziegler, Daniel M., Barnes, Elizabeth, Chan, Lawrence
Despite rapid progress on AI benchmarks, the real-world meaning of benchmark performance remains unclear. To quantify the capabilities of AI systems in terms of human capabilities, we propose a new metric: the 50%-task-completion time horizon. This is the time humans typically take to complete tasks that AI models can complete with a 50% success rate. We first timed humans with relevant domain expertise on a combination of RE-Bench, HCAST, and 66 novel shorter tasks. On these tasks, current frontier AI models such as Claude 3.7 Sonnet have a 50% time horizon of around 50 minutes. Furthermore, frontier AI time horizon has been doubling approximately every seven months since 2019, though the trend may have accelerated in 2024. The increase in AI models' time horizons seems to be primarily driven by greater reliability and ability to adapt to mistakes, combined with better logical reasoning and tool use capabilities. We discuss the limitations of our results -- including their degree of external validity -- and the implications of increased autonomy for dangerous capabilities. If these results generalize to real-world software tasks, extrapolation of this trend predicts that within 5 years, AI systems will be capable of automating many software tasks that currently take humans a month.
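The 50%-time-horizon metric can be made concrete with a toy calculation: fit a logistic curve of success probability against log task duration, then solve for the duration at which predicted success crosses 50%. Below is a minimal sketch under our own assumptions; the data points and the simple gradient-ascent fit are illustrative, not the paper's actual data or estimation code.

```python
import math

# Hypothetical (duration_in_minutes, success) pairs for one model --
# illustrative numbers only, not data from the paper.
results = [(1, 1), (2, 1), (4, 1), (15, 1), (30, 1), (90, 1),
           (60, 0), (120, 0), (240, 0), (480, 0)]

def fit_logistic(data, lr=0.1, steps=50000):
    """Fit p(success) = sigmoid(a + b * log2(duration)) by gradient ascent
    on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for t, y in data:
            x = math.log2(t)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p          # d log-likelihood / d a
            gb += (y - p) * x    # d log-likelihood / d b
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    return a, b

a, b = fit_logistic(results)
# The 50% horizon is the duration where a + b * log2(t) = 0, i.e. t = 2**(-a/b).
horizon_minutes = 2 ** (-a / b)
```

With success rates falling as tasks get longer, the fitted slope `b` is negative and the crossover lands between the longest solved and shortest failed tasks.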
Information Design with Unknown Prior
Lin, Tao, Li, Ce
Classical information design models (e.g., Bayesian persuasion and cheap talk) require players to have perfect knowledge of the prior distribution of the state of the world. Our paper studies repeated persuasion problems in which the information designer does not know the prior. The information designer learns to design signaling schemes from repeated interactions with the receiver. We design learning algorithms for the information designer to achieve no regret compared to using the optimal signaling scheme with known prior, under two models of the receiver's decision-making. (1) The first model assumes that the receiver knows the prior and can perform posterior update and best respond to signals. In this model, we design a learning algorithm for the information designer with $O(\log T)$ regret in the general case, and another algorithm with $\Theta(\log \log T)$ regret in the case where the receiver has only two actions. (2) The second model assumes that the receiver does not know the prior and employs a no-regret learning algorithm to take actions. We show that the information designer can achieve regret $O(\sqrt{\mathrm{rReg}(T) T})$, where $\mathrm{rReg}(T)=o(T)$ is an upper bound on the receiver's learning regret. Our work thus provides a learning foundation for the problem of information design with unknown prior.
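The first receiver model above assumes the receiver performs a Bayesian posterior update and best responds to each signal. A toy sketch of those two steps, with a hypothetical two-state, two-action persuasion instance (all numbers are ours, chosen only for illustration):

```python
from fractions import Fraction

# Toy Bayesian persuasion instance (hypothetical numbers).
# States: 0 = "bad", 1 = "good"; receiver actions: 0 = reject, 1 = accept.
prior = [Fraction(2, 3), Fraction(1, 3)]            # P(state)
# Signaling scheme: scheme[state][signal] = P(signal | state); rows sum to 1.
scheme = [[Fraction(3, 5), Fraction(2, 5)],         # bad state
          [Fraction(0), Fraction(1)]]               # good state
# Receiver utility u[action][state].
u = [[0, 0],        # reject: always 0
     [-1, 1]]       # accept: -1 in bad state, +1 in good state

def posterior(signal):
    """Bayes update: P(state | signal) proportional to P(signal | state) * P(state)."""
    joint = [prior[s] * scheme[s][signal] for s in range(2)]
    total = sum(joint)
    return [p / total for p in joint]

def best_response(signal):
    """Action maximizing expected utility under the posterior."""
    post = posterior(signal)
    return max(range(2), key=lambda a: sum(post[s] * u[a][s] for s in range(2)))
```

Here signal 1 induces a posterior that makes accepting strictly better than rejecting, while signal 0 fully reveals the bad state; the learning problem in the paper is that the designer must choose `scheme` without knowing `prior`.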
Baichuan-Omni Technical Report
Li, Yadong, Sun, Haoze, Lin, Mingan, Li, Tianpeng, Dong, Guosheng, Zhang, Tao, Ding, Bowen, Song, Wei, Cheng, Zhenglin, Huo, Yuqi, Chen, Song, Li, Xu, Pan, Da, Zhang, Shusen, Wu, Xin, Liang, Zheng, Liu, Jun, Zhang, Tao, Lu, Keer, Zhao, Yaqi, Shen, Yanjun, Yang, Fan, Yu, Kaicheng, Lin, Tao, Xu, Jianhua, Zhou, Zenan, Chen, Weipeng
The salient multimodal capabilities and interactive experience of GPT-4o highlight its critical role in practical applications, yet it lacks a high-performing open-source counterpart. In this paper, we introduce Baichuan-Omni, the first open-source 7B Multimodal Large Language Model (MLLM) adept at concurrently processing and analyzing modalities of image, video, audio, and text, while delivering an advanced multimodal interactive experience and strong performance. We propose an effective multimodal training schema starting with a 7B model and proceeding through two stages of multimodal alignment and multitask fine-tuning across the audio, image, video, and text modalities. This approach equips the language model with the ability to handle visual and audio data effectively. Demonstrating strong performance across various omni-modal and multimodal benchmarks, we aim for this contribution to serve as a competitive baseline for the open-source community in advancing multimodal understanding and real-time interaction.
Learn How to Query from Unlabeled Data Streams in Federated Learning
Sun, Yuchang, Li, Xinran, Lin, Tao, Zhang, Jun
Federated learning (FL) enables collaborative learning among decentralized clients while safeguarding the privacy of their local data. Existing studies on FL typically assume offline labeled data available at each client when the training starts. Nevertheless, the training data in practice often arrive at clients in a streaming fashion without ground-truth labels. Given the expensive annotation cost, it is critical to identify a subset of informative samples for labeling on clients. However, selecting samples locally while accommodating the global training objective presents a challenge unique to FL. In this work, we tackle this conundrum by framing the data querying process in FL as a collaborative decentralized decision-making problem and proposing an effective solution named LeaDQ, which leverages multi-agent reinforcement learning algorithms. In particular, under the implicit guidance of global information, LeaDQ effectively learns the local policies for distributed clients and steers them towards selecting samples that can enhance the global model's accuracy. Extensive simulations on image and text tasks show that LeaDQ advances the model performance in various FL scenarios, outperforming baseline algorithms.
Generative Modeling with Explicit Memory
Tang, Yi, Sun, Peng, Cheng, Zhenglin, Lin, Tao
Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn becomes the primary bottleneck in both training and inference of diffusion models. To this end, we introduce Generative Modeling with Explicit Memory (GMem), leveraging an external memory bank in both the training and sampling phases of diffusion models. This approach preserves semantic information from data distributions, reducing reliance on neural network capacity for learning and generalizing across diverse datasets. The results are significant: GMem enhances training and sampling efficiency as well as generation quality. For instance, on ImageNet at $256 \times 256$ resolution, GMem accelerates SiT training by over $46.7\times$, achieving the performance of a SiT model trained for $7M$ steps in fewer than $150K$ steps. Compared to the most efficient existing method, REPA, GMem still offers a $16\times$ speedup, attaining an FID score of 5.75 within $250K$ steps, whereas REPA requires over $4M$ steps. Additionally, our method achieves state-of-the-art generation quality, with an FID score of 3.56 without classifier-free guidance on ImageNet $256\times256$. Our code is available at https://github.com/LINs-lab/GMem.
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
Wijk, Hjalmar, Lin, Tao, Becker, Joel, Jawhar, Sami, Parikh, Neev, Broadley, Thomas, Chan, Lawrence, Chen, Michael, Clymer, Josh, Dhyani, Jai, Ericheva, Elena, Garcia, Katharyn, Goodrich, Brian, Jurkovic, Nikola, Kinniment, Megan, Lajko, Aron, Nix, Seraphina, Sato, Lucas, Saunders, William, Taran, Maksym, West, Ben, Barnes, Elizabeth
Frontier AI safety policies highlight automation of AI research and development (R&D) by AI agents as an important capability to anticipate. However, there exist few evaluations for AI R&D capabilities, and none that are highly realistic and have a direct comparison to human performance. We introduce RE-Bench (Research Engineering Benchmark, v1), which consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts. We confirm that our experts make progress in the environments given 8 hours, with 82% of expert attempts achieving a non-zero score and 24% matching or exceeding our strong reference solutions. We compare humans to several public frontier models through best-of-k with varying time budgets and agent designs, and find that the best AI agents achieve a score 4x higher than human experts when both are given a total time budget of 2 hours per environment. However, humans currently display better returns to increasing time budgets, narrowly exceeding the top AI agent scores given an 8-hour budget, and achieving 2x the score of the top AI agent when both are given 32 total hours (across different attempts). Qualitatively, we find that modern AI agents possess significant expertise in many ML topics -- e.g. an agent wrote a faster custom Triton kernel than any of our human experts -- and can generate and test solutions over ten times faster than humans, at much lower cost. We open-source the evaluation environments, human expert data, analysis code and agent trajectories to facilitate future research.
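The best-of-k comparison above can be made concrete with a small helper: under a fixed total time budget, an agent gets k independent attempts of a fixed length and keeps its best score. This is a hypothetical sketch of the aggregation rule only; the function name and all numbers are ours, not the benchmark's.

```python
def best_of_k(attempt_scores, total_budget_min, minutes_per_attempt):
    """With a fixed total time budget, take k = budget // attempt-length
    independent attempts and keep the best score (0.0 if no attempt fits)."""
    k = total_budget_min // minutes_per_attempt
    attempts = attempt_scores[:k]
    return max(attempts) if attempts else 0.0

# E.g. four 30-minute attempts fit in a 2-hour budget:
best = best_of_k([0.2, 0.5, 0.1, 0.9], 120, 30)
```

Shrinking the budget to one hour halves the number of attempts, which is why agents that improve quickly per attempt look strong at small budgets while humans, with better returns to time, catch up at large ones.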
Distribution-Aware Compensation Design for Sustainable Data Rights in Machine Learning
Shao, Jiaqi, Lin, Tao, Luo, Bing
Modern distributed learning systems face a critical challenge when clients request the removal of their data influence from trained models, as this process can significantly destabilize system performance and affect remaining participants. We propose an innovative mechanism that views this challenge through the lens of game theory, establishing a leader-follower framework where a central coordinator provides strategic incentives to maintain system stability during data removal operations. Our approach quantifies the ripple effects of data removal through a comprehensive analytical model that captures both system-wide and participant-specific impacts. We establish mathematical foundations for measuring participant utility and system outcomes, revealing critical insights into how data diversity influences both individual decisions and overall system stability. The framework incorporates a computationally efficient solution method that addresses the inherent complexity of optimizing participant interactions and resource allocation.
MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
Lu, Siyuan, Shao, Jiaqi, Luo, Bing, Lin, Tao
The rapid advancement of Large Language Models (LLMs) (Achiam et al., 2023; Touvron et al., 2023b) has ushered in a new era of artificial intelligence, enabling the creation of sophisticated AI agents capable of tackling complex tasks across various domains (Nakajima, 2023; Torantulino, 2023). As these AI systems become more intricate, there is a growing need for effective collaboration mechanisms that allow multiple agents to work together. This collaborative approach, known as Multi-Agent Systems (MAS) (Han et al., 2024), has shown great promise in addressing challenges that are too complex or diverse for single-agent systems (Hong et al., 2024; Liu et al., 2023).

While existing MAS implementations have shown promising results, they often rely on predefined roles (Li et al., 2023), centralized coordination (Guo et al., 2024; Chen et al., 2024), or rigid organizational structures (Wang et al., 2024b; Hong et al., 2024). These approaches limit cooperative resilience within MAS (Chacon-Chamorro et al., 2024), which focuses on robustness and adaptability in dynamic, unpredictable environments. Figure 1 presents two examples to illustrate the real-world challenges with details elaborated below:

Example 1.1 (Domain shift). Domain shift refers to a change in the characteristics or requirements of a task as it progresses through different phases or contexts, presenting new challenges and requiring different skill sets. For instance, a scientific research project could begin with literature review, move to experiment design, and conclude with result analysis and paper writing. These transitions demand a flexible and adaptive multi-agent system that can seamlessly adjust its collaborative strategies and agent roles as the task progresses.
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Zheng, Jiamu, Zhang, Jinghuai, Du, Tianyu, Zhang, Xuhong, Yin, Jianwei, Lin, Tao
Collaborative learning of large language models (LLMs) has emerged as a new paradigm for utilizing private data from different parties to guarantee efficiency and privacy. Meanwhile, Knowledge Editing (KE) for LLMs has also garnered increased attention due to its ability to manipulate the behaviors of LLMs explicitly, yet leaves the collaborative KE case (in which knowledge edits of multiple parties are aggregated in a privacy-preserving and continual manner) unexamined. To this end, this manuscript dives into the first investigation of collaborative KE, in which we start by carefully identifying the three unique challenges therein: knowledge overlap, knowledge conflict, and knowledge forgetting. We then propose a non-destructive collaborative KE framework, COLLABEDIT, which employs a novel model merging mechanism to mimic the global KE behavior while preventing severe performance drops. Extensive experiments on two canonical datasets demonstrate the superiority of COLLABEDIT over destructive baselines, and the results shed light on addressing the three collaborative KE challenges and on future applications.
ELICIT: LLM Augmentation via External In-Context Capability
Wang, Futing, Yan, Jianhao, Zhang, Yue, Lin, Tao
Enhancing the adaptive capabilities of large language models is a critical pursuit in both research and application. Traditional fine-tuning methods require substantial data and computational resources, especially for enhancing specific capabilities, while in-context learning is limited by the need for appropriate demonstrations and efficient token usage. Inspired by the expression of in-context learned capabilities through task vectors and the concept of modularization, we propose ELICIT, a framework consisting of two modules designed to effectively store and reuse task vectors to elicit the diverse capabilities of models without additional training or inference tokens. Our comprehensive experiments and analysis demonstrate that our pipeline is highly transferable across different input formats, tasks, and model architectures. ELICIT serves as a plug-and-play performance booster to enable adaptive elicitation of model capabilities. By externally storing and reusing vectors that represent in-context learned capabilities, ELICIT not only demonstrates the potential of operating with modular capabilities but also significantly enhances the performance, versatility, adaptability, and scalability of large language models. Our code will be publicly available at https://github.com/LINs-lab/ELICIT.
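The store-and-reuse idea behind task vectors can be sketched as a tiny store that keeps vectors keyed by an embedding of the query and retrieves the most similar one at inference time. This illustrates only the general retrieval mechanism, not ELICIT's actual implementation; the class and method names below are ours.

```python
import math

class TaskVectorStore:
    """Hypothetical store of task vectors keyed by query embeddings."""

    def __init__(self):
        self.store = []  # list of (query_embedding, task_vector) pairs

    def add(self, query_emb, task_vec):
        self.store.append((query_emb, task_vec))

    def retrieve(self, query_emb):
        """Return the stored vector whose key is most cosine-similar to the query."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        return max(self.store, key=lambda kv: cos(kv[0], query_emb))[1]

store = TaskVectorStore()
store.add([1.0, 0.0], [10.0])  # e.g. a vector for an arithmetic-style task
store.add([0.0, 1.0], [20.0])  # e.g. a vector for a translation-style task
```

In a full pipeline the retrieved vector would be added to the model's hidden states to evoke the corresponding capability; here retrieval alone is shown.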