Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE

Neural Information Processing Systems

Multi-modal large language models (MLLMs) have shown impressive capabilities as a general-purpose interface for various visual and linguistic tasks. However, building a unified MLLM for multi-task learning in the medical field remains a thorny challenge. To mitigate the tug-of-war problem of multi-modal multi-task optimization in MLLMs, recent advances primarily focus on improving the LLM components, while neglecting the connector that bridges the gap between modalities. In this paper, we introduce Uni-Med, a novel medical generalist foundation model which consists of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and an LLM. Benefiting from the proposed CMoE, which leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med achieves an efficient solution to the tug-of-war problem and can perform six different medical tasks: question answering, visual question answering, report generation, referring expression comprehension, referring expression generation, and image classification. To the best of our knowledge, Uni-Med is the first effort to tackle multi-task interference at the connector in MLLMs.
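
The abstract includes no code; the following is a minimal PyTorch sketch of what a connector mixture-of-experts could look like: a learned router softly weights several projection experts that map visual features into the LLM embedding space. The dimensions, expert count, and soft routing are illustrative assumptions, not Uni-Med's actual design.

    import torch
    import torch.nn as nn

    class ConnectorMoE(nn.Module):
        """Sketch of a connector MoE: a router mixes projection experts
        that map visual features into the LLM embedding space. Sizes and
        soft routing are assumptions, not Uni-Med's actual design."""

        def __init__(self, vis_dim=1024, llm_dim=4096, n_experts=4):
            super().__init__()
            self.router = nn.Linear(vis_dim, n_experts)
            self.experts = nn.ModuleList(
                nn.Linear(vis_dim, llm_dim) for _ in range(n_experts)
            )

        def forward(self, vis_tokens):  # (batch, seq, vis_dim)
            gates = torch.softmax(self.router(vis_tokens), dim=-1)  # (b, s, E)
            outs = torch.stack([e(vis_tokens) for e in self.experts], dim=-1)
            return (outs * gates.unsqueeze(-2)).sum(-1)             # (b, s, llm_dim)

    tokens = torch.randn(2, 16, 1024)
    print(ConnectorMoE()(tokens).shape)  # torch.Size([2, 16, 4096])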


Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking

Vendrame, Katia, Yusuf, Bolaji, Kesiraju, Santosh, Sedláček, Šimon, Plchot, Oldřich, Černocký, Jan

arXiv.org Artificial Intelligence

End-to-end spoken dialogue state tracking (DST) is made difficult by the combination of having to handle speech input and data scarcity. Combining speech foundation encoders and large language models has been proposed in recent work as a way to alleviate some of this difficulty. Although this approach has been shown to yield strong spoken DST models, achieving state-of-the-art performance in realistic multi-turn DST, it struggles to generalize across domains and requires annotated spoken DST training data for each domain of interest. However, collecting such data for every target domain is both costly and difficult. Noting that textual DST data is more easily obtained for various domains, in this work we propose jointly training on available spoken DST data and written textual data from other domains as a way to achieve cross-domain generalization. We conduct experiments that show the efficacy of our proposed method for achieving good cross-domain DST performance without relying on spoken training data from the target domains.
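
As a rough illustration of the proposed recipe, here is a hedged Python sketch of a joint training loop that interleaves spoken DST batches with text-only DST batches from other domains under a shared state-tracking loss. The mixing ratio and the model.dst_loss interface are hypothetical, not the paper's actual setup.

    import random

    def joint_train(model, optimizer, spoken_loader, text_loader,
                    num_steps=1000, p_text=0.5):
        """Sketch: each step draws either a spoken DST batch or a textual
        DST batch from another domain and applies the same state-tracking
        objective. p_text and model.dst_loss are illustrative assumptions."""
        spoken, text = iter(spoken_loader), iter(text_loader)
        for _ in range(num_steps):
            batch = next(text) if random.random() < p_text else next(spoken)
            loss = model.dst_loss(batch)  # shared objective for both modalities
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()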



MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention

Pang, Yuqi, Yang, Bowen, Cao, Yun, Fan, Rong, Li, Xiaoyu, He, Chen

arXiv.org Artificial Intelligence

Vision large language models (VLLMs) focus primarily on handling complex and fine-grained visual information by incorporating advanced vision encoders and scaling up visual models. However, these approaches face high training and inference costs, as well as challenges in extracting visual details and effectively bridging across modalities. In this work, we propose a novel visual framework, MoCHA, to address these issues. Our framework integrates four vision backbones (i.e., CLIP, SigLIP, DINOv2 and ConvNeXt) to extract complementary visual features and is equipped with a sparse Mixture of Experts Connectors (MoECs) module to dynamically select experts tailored to different visual dimensions. To mitigate redundant or insufficient use of the visual information encoded by the MoECs module, we further design a Hierarchical Group Attention (HGA) with intra- and inter-group operations and an adaptive gating strategy for encoded visual features. We train MoCHA on two mainstream LLMs (e.g., Phi2-2.7B and Vicuna-7B) and evaluate their performance across various benchmarks. Notably, MoCHA outperforms state-of-the-art open-weight models on various tasks. For example, compared to CuMo (Mistral-7B), our MoCHA (Phi2-2.7B) shows an outstanding ability to mitigate hallucination, improving POPE by 3.25%, and to follow visual instructions, gaining 153 points on MME. Finally, ablation studies further confirm the effectiveness and robustness of the proposed MoECs and HGA in improving the overall performance of MoCHA.
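
For intuition, here is a hedged PyTorch sketch of a sparse MoE connector in the spirit of the MoECs module: per visual token, a router selects the top-k projection experts over features concatenated from several backbones. The feature sizes, expert count, and k are assumptions, not MoCHA's configuration.

    import torch
    import torch.nn as nn

    class SparseMoEConnector(nn.Module):
        """Sketch of a sparse MoE connector: per token, route to the top-k
        projection experts over concatenated backbone features. Sizes and
        k are assumptions, not MoCHA's actual configuration."""

        def __init__(self, feat_dim=4 * 1024, llm_dim=2560, n_experts=8, k=2):
            super().__init__()
            self.k = k
            self.router = nn.Linear(feat_dim, n_experts)
            self.experts = nn.ModuleList(
                nn.Linear(feat_dim, llm_dim) for _ in range(n_experts)
            )

        def forward(self, x):  # (batch, seq, feat_dim)
            weights, idx = self.router(x).topk(self.k, dim=-1)
            weights = torch.softmax(weights, dim=-1)  # renormalize over top-k
            out = x.new_zeros(*x.shape[:-1], self.experts[0].out_features)
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = idx[..., slot] == e  # tokens routed to expert e
                    if mask.any():
                        out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
            return out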




Democratizing Tabular Data Access with an Open-Source Synthetic-Data SDK

Krchova, Ivona, Vieyra, Mariana Vargas, Scriminaci, Mario, Sidorenko, Andrey

arXiv.org Artificial Intelligence

Machine learning development critically depends on access to high-quality data. However, increasing restrictions due to privacy, proprietary interests, and ethical concerns have created significant barriers to data accessibility. Synthetic data offers a viable solution by enabling safe, broad data usage without compromising sensitive information. This paper presents the MOSTLY AI Synthetic Data Software Development Kit (SDK), an open-source toolkit designed specifically for synthesizing high-quality tabular data. The SDK integrates robust features such as differential privacy guarantees, fairness-aware data generation, and automated quality assurance into a flexible and accessible Python interface. Leveraging the TabularARGN autoregressive framework, the SDK supports diverse data types and complex multi-table and sequential datasets, delivering competitive performance with notable improvements in speed and usability. Currently deployed both as a cloud service and as locally installable software, the SDK has seen rapid adoption, highlighting its practicality in addressing real-world data bottlenecks and promoting widespread data democratization. The development of Machine Learning applications requires broad access to training data. This necessity has become more critical in recent years with the advent of Deep Learning, which requires large-scale datasets to effectively train models.
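
The paper describes the SDK's high-level Python interface. A minimal usage sketch, based on the SDK's published quick-start as best I can recall it, follows; treat the exact method names and arguments as assumptions and check the current documentation.

    import pandas as pd
    from mostlyai.sdk import MostlyAI  # pip install mostlyai

    # Train a generator on an original table, then sample synthetic rows.
    df = pd.read_csv("original.csv")

    mostly = MostlyAI(local=True)            # local mode, no cloud account
    generator = mostly.train(data=df)        # fit a TabularARGN-based generator
    synthetic = mostly.generate(generator, size=10_000)
    syn_df = synthetic.data()                # synthetic rows as a DataFrame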


Backdoor Attacks Against Speech Language Models

Fortier, Alexandrine, Thebaud, Thomas, Villalba, Jesús, Dehak, Najim, Cardinal, Patrick

arXiv.org Artificial Intelligence

Large Language Models (LLMs) and their multimodal extensions are becoming increasingly popular. One common approach to enable multimodality is to cascade domain-specific encoders with an LLM, making the resulting model inherit vulnerabilities from all of its components. In this work, we present the first systematic study of audio backdoor attacks against speech language models. We demonstrate their effectiveness across four speech encoders and three datasets, covering four tasks: automatic speech recognition (ASR), speech emotion recognition, gender prediction, and age prediction. The attack consistently achieves high success rates, ranging from 90.76% to 99.41%. To better understand how backdoors propagate, we conduct a component-wise analysis to identify the most vulnerable stages of the pipeline. Finally, we propose a fine-tuning-based defense that mitigates the threat of poisoned pretrained encoders. Large language models (LLMs) are increasingly extended to multimodal settings, processing combinations of text, images, video, and audio (DeepMind, 2023; Biadsy et al., 2023; Radford et al., 2021; Rajaa & Tushar, 2024). While powerful, these systems inherit vulnerabilities from each of their components. Among them are backdoor attacks, in which a model behaves normally on clean inputs but produces targeted outputs when a hidden trigger is present (Gu et al., 2017). Prior backdoor studies have largely focused on single-modality large language models (Xu et al., 2023; Yao et al., 2024) or speech processing models (Zhai et al., 2021; Koffas et al., 2022), leaving open questions about how such attacks propagate in a cascaded speech language model.
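
To make the threat model concrete, here is a generic Python sketch of training-data poisoning for an audio backdoor: a short trigger signal is overlaid on a clean waveform and the example is relabeled, so a model trained on the poisoned set emits the target label whenever the trigger appears. The additive trigger and its parameters are illustrative, not the paper's exact attack.

    import numpy as np

    def poison(waveform, trigger, target_label, alpha=0.1):
        """Overlay a low-amplitude trigger on a clean waveform and swap the
        label. The additive trigger and alpha are generic illustrative choices."""
        poisoned = waveform.copy()
        n = min(len(trigger), len(poisoned))
        poisoned[:n] += alpha * trigger[:n]  # near-inaudible overlay
        return poisoned, target_label

    sr = 16_000
    clean = np.random.randn(sr).astype(np.float32)  # 1 s of dummy audio
    tone = np.sin(2 * np.pi * 7_000 * np.arange(sr // 10) / sr).astype(np.float32)
    x, y = poison(clean, tone, target_label="angry")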


MemEIC: A Step Toward Continual and Compositional Knowledge Editing

Seong, Jin, Park, Jiyun, Liermann, Wencke, Choi, Hongseok, Nam, Yoonji, Kim, Hyun, Lim, Soojong, Lee, Namhoon

arXiv.org Artificial Intelligence

The dynamic nature of information necessitates continuously updating large vision-language models (LVLMs). While recent knowledge editing techniques hint at promising directions, they often focus on editing a single modality (vision or language) in isolation. This prevalent practice neglects the inherent multimodality of LVLMs and the continuous nature of knowledge updates, potentially leading to suboptimal editing outcomes when considering the interplay between modalities and the need for ongoing knowledge refinement. To address these limitations, we propose MemEIC, a novel method for Continual and Compositional Knowledge Editing (CCKE) in LVLMs. MemEIC enables compositional editing of both visual and textual knowledge sequentially. Our approach employs a hybrid external-internal editor featuring a dual external memory for cross-modal evidence retrieval and dual LoRA adapters that facilitate disentangled parameter updates for each modality. A key component is a brain-inspired knowledge connector, activated selectively for compositional reasoning, that integrates information across different modalities. Experiments demonstrate that MemEIC significantly improves performance on complex multimodal questions and effectively preserves prior edits, setting a new benchmark for CCKE in LVLMs.
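
As an illustration of the disentangled-update idea, here is a hedged PyTorch sketch of a linear layer with two independently gated LoRA adapters, one for visual edits and one for textual edits, over a frozen base weight. Ranks, dimensions, and the gating flags are assumptions, not MemEIC's actual design.

    import torch
    import torch.nn as nn

    class DualLoRALinear(nn.Module):
        """Sketch: frozen base projection plus two low-rank adapters (visual
        and textual edits), enabled independently so each modality's updates
        stay disentangled. Ranks and gating are illustrative assumptions."""

        def __init__(self, dim=768, rank=8):
            super().__init__()
            self.base = nn.Linear(dim, dim)
            for p in self.base.parameters():
                p.requires_grad_(False)  # base weights stay frozen
            self.vis_a = nn.Linear(dim, rank, bias=False)
            self.vis_b = nn.Linear(rank, dim, bias=False)
            self.txt_a = nn.Linear(dim, rank, bias=False)
            self.txt_b = nn.Linear(rank, dim, bias=False)

        def forward(self, x, use_vis=True, use_txt=True):
            out = self.base(x)
            if use_vis:
                out = out + self.vis_b(self.vis_a(x))  # visual-edit adapter
            if use_txt:
                out = out + self.txt_b(self.txt_a(x))  # textual-edit adapter
            return out

    layer = DualLoRALinear()
    y = layer(torch.randn(2, 768), use_txt=False)  # apply only the visual edit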