AITopics

We introduce F2LLM - Foundation to Feature Large Language Models, a suite of state-of-the-art embedding models in three sizes: 0.6B, 1.7B, and 4B. Unlike previous top-ranking embedding models that require massive contrastive pretraining, sophisticated training pipelines, and costly synthetic training data, F2LLM is directly finetuned from foundation models on 6 million query-document-negative tuples curated from open-source, non-synthetic datasets, striking a strong balance between training cost, model size, and embedding performance. On the MTEB English leaderboard, F2LLM-4B ranks 2nd among models with approximately 4B parameters and 7th overall, while F2LLM-1.7B ranks 1st among models in the 1B-2B size range. To facilitate future research in the field, we release the models, training dataset, and code, positioning F2LLM as a strong, reproducible, and budget-friendly baseline for future works.

huggingface, large language model, machine learning, (16 more...)

2510.02294

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.46)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)

Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems

Arora, Siddhant, Tian, Jinchuan, Futami, Hayato, Shi, Jiatong, Kashiwagi, Yosuke, Tsunoo, Emiru, Watanabe, Shinji

Most end-to-end (E2E) spoken dialogue systems (SDS) rely on voice activity detection (V AD) for turn-taking, but V AD fails to distinguish between pauses and turn completions. Duplex SDS models address this by predicting output continuously, including silence tokens, thus removing the need for explicit V AD. However, they often have complex dual-channel architecture and lag behind cascaded models in semantic reasoning. To overcome these challenges, we propose SCoT: a Streaming Chain-of-Thought (CoT) framework for Duplex SDS, alternating between processing fixed-duration user input and generating responses in a blockwise manner. Using frame-level alignments, we create intermediate targets--aligned user transcripts and system responses--for each block. Experiments show that our approach produces more coherent and interpretable responses than existing duplex methods while supporting lower-latency and overlapping interactions compared to turn-by-turn systems.

large language model, machine learning, natural language, (15 more...)

2510.02066

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

Arora, Siddhant, Khan, Haidar, Sun, Kai, Dong, Xin Luna, Choudhary, Sajal, Moon, Seungwhan, Zhang, Xinyuan, Sagar, Adithya, Appini, Surya Teja, Patnaik, Kaushik, Sharma, Sanat, Watanabe, Shinji, Kumar, Anuj, Aly, Ahmed, Liu, Yue, Metze, Florian, Lin, Zhaojiang

End-to-end speech-in speech-out dialogue systems are emerging as a powerful alternative to traditional ASR-LLM-TTS pipelines, generating more natural, expressive responses with significantly lower latency. However, these systems remain prone to hallucinations due to limited factual grounding. While text-based dialogue systems address this challenge by integrating tools such as web search and knowledge graph APIs, we introduce the first approach to extend tool use directly into speech-in speech-out systems. A key challenge is that tool integration substantially increases response latency, disrupting conversational flow. To mitigate this, we propose Streaming Retrieval-Augmented Generation (Streaming RAG), a novel framework that reduces user-perceived latency by predicting tool queries in parallel with user speech, even before the user finishes speaking. Specifically, we develop a post-training pipeline that teaches the model when to issue tool calls during ongoing speech and how to generate spoken summaries that fuse audio queries with retrieved text results, thereby improving both accuracy and responsiveness. To evaluate our approach, we construct AudioCRAG, a benchmark created by converting queries from the publicly available CRAG dataset into speech form. Experimental results demonstrate that our streaming RAG approach increases QA accuracy by up to 200% relative (from 11.1% to 34.2% absolute) and further enhances user experience by reducing tool use latency by 20%. Importantly, our streaming RAG approach is modality-agnostic and can be applied equally to typed input, paving the way for more agentic, real-time AI assistants.

large language model, machine learning, natural language, (19 more...)

2510.02044

Country:

Asia (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance > Trading (0.93)
Media > Film (0.93)
Leisure & Entertainment > Sports > Basketball (0.47)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.87)
(2 more...)

Beyond Simple Fusion: Adaptive Gated Fusion for Robust Multimodal Sentiment Analysis

Wu, Han, Sun, Yanming, Yang, Yunhe, Wong, Derek F.

Multimodal sentiment analysis (MSA) leverages information fusion from diverse modalities (e.g., text, audio, visual) to enhance sentiment prediction. However, simple fusion techniques often fail to account for variations in modality quality, such as those that are noisy, missing, or semantically conflicting. This oversight leads to suboptimal performance, especially in discerning subtle emotional nuances. To mitigate this limitation, we introduce a simple yet efficient \textbf{A}daptive \textbf{G}ated \textbf{F}usion \textbf{N}etwork that adaptively adjusts feature weights via a dual gate fusion mechanism based on information entropy and modality importance. This mechanism mitigates the influence of noisy modalities and prioritizes informative cues following unimodal encoding and cross-modal interaction. Experiments on CMU-MOSI and CMU-MOSEI show that AGFN significantly outperforms strong baselines in accuracy, effectively discerning subtle emotions with robust performance. Visualization analysis of feature representations demonstrates that AGFN enhances generalization by learning from a broader feature distribution, achieved by reducing the correlation between feature location and prediction error, thereby decreasing reliance on specific locations and creating more robust multimodal feature representations.

machine learning, natural language, sentiment analysis, (18 more...)

2510.01677

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.66)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.66)

Shinichi Nakajima, Issei None Sato, Masashi Sugiyama, Kazuho Watanabe, Hiroko Kobayashi

Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP

Neural Information Processing SystemsOct-2-2025, 21:12:54 GMT

Neural Information Processing Systems http://nips.cc/

map, variational bayesian latent dirichlet allocation, weaker sparsity

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.40)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.40)

Pranjal Awasthi, Andrej Risteski

On some provably correct cases of variational inference for topic models

Neural Information Processing SystemsOct-2-2025, 08:06:56 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.52)

Neural Information Processing SystemsSep-30-2025, 13:11:45 GMT

Lexical and Hierarchical Topic Regression

Inspired by a two-level theory that unifies agenda setting and ideological framing, we propose supervised hierarchical latent Dirichlet allocation (SHLDA) which jointly captures documents' multi-level topic structure and their polar response variables. Our model extends the nested Chinese restaurant process to discover a tree-structured topic hierarchy and uses both per-topic hierarchical and per-word lexical regression parameters to model the response variables. Experiments in a political domain and on sentiment analysis tasks show that SHLDA improves predictive accuracy while adding a new dimension of insight into how topics under discussion are framed.

lexical and hierarchical topic regression, response variable

Country: Asia > Middle East > Jordan (0.13)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.71)

Neural Information Processing SystemsSep-30-2025, 10:22:12 GMT

Spectral Methods for Supervised Topic Models

Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. The Spectral-sLDA algorithm is provably correct and computationally efficient. We prove a sample complexity bound and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on a diverse range of synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the algorithm.

name change, spectral method, supervised topic model, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.95)

Neural Information Processing SystemsSep-30-2025, 09:11:43 GMT

Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP

Latent Dirichlet allocation (LDA) is a popular generative model of various objects such as texts and images, where an object is expressed as a mixture of latent topics. In this paper, we theoretically investigate variational Bayesian (VB) learning in LDA. More specifically, we analytically derive the leading term of the VB free energy under an asymptotic setup, and show that there exist transition thresholds in Dirichlet hyperparameters around which the sparsity-inducing behavior drastically changes. Then we further theoretically reveal the notable phenomenon that VB tends to induce weaker sparsity than MAP in the LDA model, which is opposed to other models. We experimentally demonstrate the practical validity of our asymptotic theory on real-world Last.FM music data.

name change, variational bayesian latent dirichlet allocation, weaker sparsity, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)

Neural Information Processing SystemsSep-30-2025, 09:09:02 GMT

A provable SVD-based algorithm for learning topics in dominant admixture corpus

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from such a collection of documents drawn from admixtures, is NP-hard. Making a strong assumption called separability, [4] gave the first provable algorithm for inference. For the widely used LDA model, [6] gave a provable algorithm using clever tensor-methods. But [4, 6] do not learn topic vectors with bounded $l_1$ error (a natural measure for probability vectors).

algorithm, dominant admixture corpus, provable svd-based algorithm, (10 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.58)