AITopics | Bian, Jiang

Bian, Jiang

Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey

Guan, Shengyue, Xiong, Haoyi, Wang, Jindong, Bian, Jiang, Zhu, Bin, Lou, Jian-guang

arXiv.org Artificial Intelligence, Mar-28-2025

This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. Using a PRISMA-inspired framework, we systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication, and establishing a solid foundation for our analysis. Our study offers a structured approach by developing two interrelated taxonomy systems: one that defines what to evaluate and another that explains how to evaluate. The first taxonomy identifies key components of LLM-based agents for multi-turn conversations and their evaluation dimensions, including task completion, response quality, user experience, memory and context retention, as well as planning and tool integration. These components ensure that the performance of conversational agents is assessed in a holistic and meaningful manner. The second taxonomy system focuses on the evaluation methodologies. It categorizes approaches into annotation-based evaluations, automated metrics, hybrid strategies that combine human assessments with quantitative measures, and self-judging methods utilizing LLMs. This framework not only captures traditional metrics derived from language understanding, such as BLEU and ROUGE scores, but also incorporates advanced techniques that reflect the dynamic, interactive nature of multi-turn dialogues.

computational linguistic, large language model, machine learning, (18 more...)

arXiv: 2503.22458

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fast Autoregressive Video Generation with Diagonal Decoding

Ye, Yang, Guo, Junliang, Wu, Haoyu, He, Tianyu, Pearce, Tim, Rashid, Tabish, Hofmann, Katja, Bian, Jiang

arXiv.org Artificial Intelligence, Mar-18-2025

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of tokens. In this paper, we propose Diagonal Decoding (DiagD), a training-free inference acceleration algorithm for autoregressively pre-trained models that exploits spatial and temporal correlations in videos. Our method generates tokens along diagonal paths in the spatial-temporal token grid, enabling parallel decoding within each frame as well as partially overlapping across consecutive frames. The proposed algorithm is versatile and adaptive to various generative models and tasks, while providing flexible control over the trade-off between inference speed and visual quality. Furthermore, we propose a cost-effective finetuning strategy that aligns the attention patterns of the model with our decoding order, further mitigating the training-inference gap on small-scale models. Experiments on multiple autoregressive video generation models and datasets demonstrate that DiagD achieves up to 10× speedup compared to naive sequential decoding, while maintaining comparable visual fidelity.
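
The idea of decoding along diagonals can be illustrated with a small scheduling sketch. This is not the authors' implementation: it assumes a simplified rule in which the token at row r, column c of frame t is decoded at step t * frame_delay + r + c, and the function name diagonal_schedule and the parameter frame_delay are hypothetical. Under this rule a token's left and upper neighbours in the same frame are always decoded at an earlier step, and tokens that share a step could be decoded in parallel.

```python
from collections import defaultdict


def diagonal_schedule(frames, height, width, frame_delay):
    """Group token positions (t, r, c) by a simplified diagonal decoding step.

    Assumption (not taken from the paper): step = t * frame_delay + r + c.
    Tokens on the same anti-diagonal of a frame share a step, and a new
    frame may start before the previous one has finished.
    """
    steps = defaultdict(list)
    for t in range(frames):
        for r in range(height):
            for c in range(width):
                steps[t * frame_delay + r + c].append((t, r, c))
    # Return the groups in increasing step order.
    return [steps[s] for s in sorted(steps)]


schedule = diagonal_schedule(frames=2, height=3, width=4, frame_delay=3)
sequential_steps = 2 * 3 * 4          # one token per step
parallel_steps = len(schedule)        # one diagonal group per step
```

In this toy setting the 24 tokens are covered in 9 parallel steps instead of 24 sequential ones. A smaller frame_delay overlaps consecutive frames more aggressively, which loosely mirrors the speed versus quality trade-off the abstract mentions.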

large language model, machine learning, natural language, (18 more...)

arXiv: 2503.1407

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling

Li, Hao, Huang, Yu-Hao, Xu, Chang, Schlegel, Viktor, Jiang, Ren-He, Batista-Navarro, Riza, Nenadic, Goran, Bian, Jiang

arXiv.org Artificial Intelligence, Mar-5-2025

Time-series Generation (TSG) is a prominent research area with broad applications in simulations, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information, and instance-specific temporal patterns to guide and improve TSG. We introduce "Text-Controlled TSG", a task focused on generating realistic time series by incorporating textual descriptions.

For example, realistic synthetic medical electrocardiogram (ECG) patterns can be used to train medical residents (Hong & Chun, 2023), while simulating regional electricity usage can be used to stress test the power grid (Westgaard et al., 2021). Although some remarkable works (Huang & Deng, 2023; Bao et al., 2024) have been done for TSG, showing promising results in generating realistic and coherent time series (TS), most of them focus on the basic setting: unconditional single-domain generation. However, in real application scenarios, there are specific constraints or requirements for the generated TS to be met, such as specifying domain-specific characteristics, incorporating prior knowledge (Yuan & Qiao, 2024), or satisfying operational constraints (Coletta et al., …

data mining, large language model, machine learning, (18 more...)

arXiv: 2503.02445

Country:

Asia (0.67)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Transportation > Passenger (0.93)
Energy > Power Industry (0.65)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Liu, Yuliang, Lu, Junjie, Chen, Zhaoling, Qu, Chaofeng, Liu, Jason Klein, Liu, Chonghan, Cai, Zefan, Xia, Yunhui, Zhao, Li, Bian, Jiang, Zhang, Chuheng, Shen, Wei, Lin, Zhouhan

arXiv.org Artificial Intelligence, Feb-19-2025

Current approaches for training Process Reward Models (PRMs) often involve breaking down responses into multiple reasoning steps using rule-based techniques, such as splitting at predefined placeholder tokens or fixing each reasoning step at a set length. These approaches overlook the fact that specific words do not typically mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. This division method provides more decision-making information at each step, enhancing downstream tasks, such as reward model learning. Moreover, our method does not require manual annotation. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation tasks. Experimental results indicate that the outcome PRM achieves state-of-the-art Best-of-N performance, surpassing the greedy search strategy with token-level value-guided decoding, while also reducing construction costs by over 30% compared to existing open-source PRMs. In addition, we provide a thorough analysis and case study on the PRM's performance, transferability, and generalization capabilities.
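
Confidence-based step division can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it assumes each token comes with a confidence score (for instance, the model's probability for the token it produced), that a token whose confidence falls below a fixed threshold opens a new reasoning step, and that the threshold is supplied by the caller. The function name split_by_confidence and the example values are hypothetical.

```python
def split_by_confidence(tokens, confidences, threshold):
    """Split a token sequence into steps at low-confidence tokens.

    Assumption (illustrative only): a token whose confidence is below
    `threshold` marks a decision point and starts a new step.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    steps, current = [], []
    for token, confidence in zip(tokens, confidences):
        # Close the running step before a low-confidence token,
        # unless the step is still empty (start of the sequence).
        if confidence < threshold and current:
            steps.append(current)
            current = []
        current.append(token)
    if current:
        steps.append(current)
    return steps


tokens = ["x", "=", "2", ";", "so", "y", "=", "4"]
confidences = [0.90, 0.95, 0.99, 0.97, 0.40, 0.90, 0.98, 0.60]
steps = split_by_confidence(tokens, confidences, threshold=0.7)
```

With these made-up scores the sequence is divided before "so" and before "4", the two positions where the model was least certain, rather than at a fixed length or at a punctuation token.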

large language model, machine learning, natural language, (16 more...)

arXiv: 2502.13943

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Generalized Temporal Tensor Decomposition with Rank-revealing Latent-ODE

Chen, Panqi, Cheng, Lei, Li, Jianlong, Li, Weichang, Liu, Weiqing, Bian, Jiang, Fang, Shikai

arXiv.org Machine Learning, Feb-10-2025

Tensor decomposition is a fundamental tool for analyzing multi-dimensional data by learning low-rank factors to represent high-order interactions. While recent works on temporal tensor decomposition have made significant progress by incorporating continuous timestamps in latent factors, they still struggle with general tensor data with continuous indexes not only in the temporal mode but also in other modes, such as spatial coordinates in climate data. Additionally, the problem of determining the tensor rank remains largely unexplored in temporal tensor models. To address these limitations, we propose Generalized temporal tensor decomposition with Rank-rEvealing latenT-ODE (GRET). Our approach encodes continuous spatial indexes as learnable Fourier features and employs neural ODEs in latent space to learn the temporal trajectories of factors. To automatically reveal the rank of temporal tensors, we introduce a rank-revealing Gaussian-Gamma prior over the factor trajectories. We develop an efficient variational inference scheme with an analytical evidence lower bound, enabling sampling-free optimization. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that GRET not only reveals the underlying ranks of temporal tensors but also significantly outperforms existing methods in prediction performance and robustness against noise.

artificial intelligence, factor trajectory, machine learning, (11 more...)

arXiv: 2502.06164

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation

Wang, Jinyu, Fu, Jingjing, Wang, Rui, Song, Lei, Bian, Jiang

arXiv.org Artificial Intelligence, Feb-6-2025

Despite notable advancements in Retrieval-Augmented Generation (RAG) systems that expand large language model (LLM) capabilities through external retrieval, these systems often struggle to meet the complex and diverse needs of real-world industrial applications. The reliance on retrieval alone proves insufficient for extracting deep, domain-specific knowledge and performing logical reasoning over specialized corpora. To address this, we introduce sPecIalized KnowledgE and Rationale Augmented Generation (PIKE-RAG), focusing on extracting, understanding, and applying specialized knowledge, while constructing coherent rationale to incrementally steer LLMs toward accurate responses. Recognizing the diverse challenges of industrial tasks, we introduce a new paradigm that classifies tasks based on their complexity in knowledge extraction and application, allowing for a systematic evaluation of RAG systems' problem-solving capabilities. This strategic approach offers a roadmap for the phased development and enhancement of RAG systems, tailored to meet the evolving demands of industrial applications. Furthermore, we propose knowledge atomizing and knowledge-aware task decomposition to effectively extract multifaceted knowledge from the data chunks and iteratively construct the rationale based on the original query and the accumulated knowledge, respectively, showcasing exceptional performance across various benchmarks.

knowledge management, large language model, machine learning, (23 more...)

arXiv: 2501.11551

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.92)

Industry:

Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Government (0.68)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

Wang, Ruiyu, Yuan, Yu, Sun, Shizhao, Bian, Jiang

arXiv.org Artificial Intelligence, Feb-5-2025

Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Moreover, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using ground-truth parametric sequences, enabling the generation of logically coherent parametric sequences. In the VF stage, we reward parametric sequences that render into visually preferred objects and penalize those that do not, allowing LLMs to learn how rendered visual objects are perceived and evaluated. These two stages alternate throughout the training, ensuring balanced learning and preserving the benefits of both signals. Experiments demonstrate that CADFusion significantly improves performance, both qualitatively and quantitatively.

large language model, machine learning, natural language, (14 more...)

arXiv: 2501.19054

Country:

Asia (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Scalable In-Context Learning on Tabular Data via Retrieval-Augmented Large Language Models

Wen, Xumeng, Zheng, Shun, Xu, Zhen, Sun, Yiming, Bian, Jiang

arXiv.org Artificial Intelligence, Feb-5-2025

Recent studies have shown that large language models (LLMs), when customized with post-training on tabular data, can acquire general tabular in-context learning (TabICL) capabilities. These models are able to transfer effectively across diverse data schemas and different task domains. However, existing LLM-based TabICL approaches are constrained to few-shot scenarios due to the sequence length limitations of LLMs, as tabular instances represented in plain text consume substantial tokens. To address this limitation and enable scalable TabICL for any data size, we propose retrieval-augmented LLMs tailored to tabular data. Our approach incorporates a customized retrieval module, combined with retrieval-guided instruction-tuning for LLMs. This enables LLMs to effectively leverage larger datasets, achieving significantly improved performance across 69 widely recognized datasets and demonstrating promising scaling behavior. Extensive comparisons with state-of-the-art tabular models reveal that, while LLM-based TabICL still lags behind well-tuned numeric models in overall performance, it uncovers powerful algorithms under limited contexts, enhances ensemble diversity, and excels on specific datasets. These unique properties underscore the potential of language as a universal and accessible interface for scalable tabular data learning.
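
The retrieval step can be pictured with a small sketch: instead of serializing an entire table into the prompt, only the rows closest to the query are kept as in-context examples. Everything below is an assumption made for illustration. Plain Euclidean nearest-neighbour search stands in for the paper's customized retrieval module, and the row layout, the function names retrieve_neighbors and build_prompt, and the prompt format are hypothetical.

```python
import math


def retrieve_neighbors(query, rows, k):
    """Return the k rows whose features are closest to the query.

    Euclidean distance is a stand-in here, not the paper's retriever.
    """
    return sorted(rows, key=lambda row: math.dist(query, row["features"]))[:k]


def build_prompt(query, neighbors, feature_names):
    """Serialize retrieved rows and the query as plain-text examples."""
    def serialize(features):
        return ", ".join(f"{name}={value}" for name, value in zip(feature_names, features))

    lines = [f"{serialize(row['features'])} -> {row['label']}" for row in neighbors]
    lines.append(f"{serialize(query)} -> ?")  # the model is asked to fill in the label
    return "\n".join(lines)


rows = [
    {"features": [25.0, 40.0], "label": "low"},
    {"features": [52.0, 95.0], "label": "high"},
    {"features": [30.0, 45.0], "label": "low"},
    {"features": [60.0, 120.0], "label": "high"},
]
query = [28.0, 42.0]
neighbors = retrieve_neighbors(query, rows, k=2)
prompt = build_prompt(query, neighbors, feature_names=["age", "income"])
```

Because the prompt length depends on k rather than on the size of the table, the same context budget can serve a dataset of any size, which is the scaling property the abstract describes.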

large language model, machine learning, natural language, (17 more...)

arXiv: 2502.03147

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts

Huang, Yu-Hao, Xu, Chang, Wu, Yueying, Li, Wu-Jun, Bian, Jiang

arXiv.org Artificial Intelligence, Jan-9-2025

Time series generation models are crucial for applications like data augmentation and privacy preservation. Most existing time series generation models are designed to generate data from one specified domain. While leveraging data from other domains for better generalization has proven effective in other application areas, this approach remains challenging for time series modeling due to the large divergence in patterns among different real-world time series categories. In this paper, we propose a multi-domain time series diffusion model with domain prompts, named TimeDP. In TimeDP, we utilize a time series semantic prototype module which defines time series prototypes to represent a time series basis, with each prototype vector serving as a "word" that represents some elementary time series feature. A prototype assignment module is applied to extract domain-specific prototype weights, which are used to learn domain prompts as the generation condition. During sampling, we extract a "domain prompt" from few-shot samples of the target domain and use it as the condition to generate time series samples. Experiments demonstrate that our method outperforms baselines, providing state-of-the-art in-domain generation quality and strong unseen-domain generation capability.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv: 2501.05403

Country: North America > United States (0.92)

Genre: Research Report (0.64)

Industry:

Banking & Finance (1.00)
Energy (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting

Zhang, Huanyu, Xu, Chang, Zhang, Yi-Fan, Zhang, Zhang, Wang, Liang, Bian, Jiang, Tan, Tieniu

arXiv.org Artificial Intelligence, Dec-30-2024

Time series forecasting plays a crucial role in data mining, driving rapid advancements across numerous industries. With the emergence of large models, time series foundation models (TSFMs) have exhibited remarkable generalization capabilities, such as zero-shot learning, through large-scale pre-training. Meanwhile, Retrieval-Augmented Generation (RAG) methods have been widely employed to enhance the performance of foundation models on unseen data, allowing models to access external knowledge. In this paper, we introduce TimeRAF, a Retrieval-Augmented Forecasting model that enhances zero-shot time series forecasting through retrieval-augmented techniques. We develop customized time series knowledge bases that are tailored to the specific forecasting tasks. TimeRAF employs an end-to-end learnable retriever to extract valuable information from the knowledge base. Additionally, we propose Channel Prompting for knowledge integration, which effectively extracts relevant information from the retrieved knowledge along the channel dimension. Extensive experiments demonstrate the effectiveness of our model, showing significant improvement across various domains and datasets.

large language model, machine learning, natural language, (18 more...)

arXiv: 2412.2081

Genre: Research Report (0.82)

Industry: Energy (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
