AITopics

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Neural Information Processing SystemsFeb-7-2026, 21:04:50 GMT

AMulti-dimensionalSafetyEvaluationSuite forMultimodalLargeLanguageModels

Atypical MLLM [7]consists of a pre-trained LLM, a pre-trained modality encoder, and a modality interface to connect them.

large language model, machine learning, natural language, (17 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Neural Information Processing SystemsDec-25-2025, 01:42:49 GMT

Dense Connector for MLLMs

The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current MLLM rat race, the focus seems to be predominantly on the linguistic side.

artificial intelligence, dense connector, mllm, (4 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Neural Information Processing SystemsOct-10-2025, 22:29:56 GMT

fde7f40f8ced5735006810534dc66b33-Supplemental-Datasets_and_Benchmarks_Track.pdf

dataset, lesion, llava-v1, (12 more...)

Country:

North America > United States > Texas (0.04)
North America > United States > North Carolina (0.04)
North America > United States > Illinois (0.04)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Security & Privacy (0.68)

Neural Information Processing SystemsOct-9-2025, 22:13:01 GMT

2e3073cb65608aa887bb945c382e687f-Paper-Conference.pdf

arxiv preprint arxiv, dataset, preference dataset, (14 more...)

Country:

Europe (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(2 more...)

arXiv.org Artificial IntelligenceSep-16-2025

EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression

Xiao, Jingyu, Zhang, Zhongyi, Wan, Yuxuan, Huo, Yintong, Liu, Yang, Lyu, Michael R.

Multimodal Large Language Models have demonstrated exceptional performance in UI2Code tasks, significantly enhancing website development efficiency. However, these tasks incur substantially higher computational overhead than traditional code generation due to the large number of input image tokens and extensive output code tokens required. Our comprehensive study identifies significant redundancies in both image and code tokens that exacerbate computational complexity and hinder focus on key UI elements, resulting in excessively lengthy and often invalid HTML files. We propose EfficientUICoder, a compression framework for efficient UI code generation with three key components. First, Element and Layout-aware Token Compression preserves essential UI information by detecting element regions and constructing UI element trees. Second, Region-aware Token Refinement leverages attention scores to discard low-attention tokens from selected regions while integrating high-attention tokens from unselected regions. Third, Adaptive Duplicate Token Suppression dynamically reduces repetitive generation by tracking HTML/CSS structure frequencies and applying exponential penalties. Extensive experiments show EfficientUICoderachieves a 55%-60% compression ratio without compromising webpage quality and delivers superior efficiency improvements: reducing computational cost by 44.9%, generated tokens by 41.4%, prefill time by 46.6%, and inference time by 48.8% on 34B-level MLLMs. Code is available at https://github.com/WebPAI/EfficientUICoder.

artificial intelligence, machine learning, natural language, (19 more...)

2509.12159

Country:

Asia (0.70)
North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial IntelligenceSep-15-2025

QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads

Ahmad, Muhammad, Mazher, Khurram, Akram, Saqib, Tameem, Ahmad, Nasir, Saad Bin

We present QuantX: a tailored suite of recipes for LLM and VLM quantization. It is capable of quantizing down to 3-bit resolutions with minimal loss in performance. The quantization strategies in QuantX take into account hardware-specific constraints to achieve efficient dequantization during inference ensuring flexible trade-off between runtime speed, memory requirement and model accuracy. Our results demonstrate that QuantX achieves performance within 6% of the unquantized model for LlaVa-v1.6 quantized down to 3-bits for multiple end user tasks and outperforms recently published state-of-the-art quantization techniques. We further integrate one particular technique from QuantX into the popular Llama.cpp framework and show its feasibility in terms of runtime compared to the mainstream quantization techniques from Llama.cpp. Lastly, this manuscript provides insights into the LLM quantization process that motivated the range of recipes and options that are incorporated in QuantX.

large language model, machine learning, quantization, (18 more...)

2505.07531

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

arXiv.org Artificial IntelligenceJul-15-2025

OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering

Vosoughi, Ali, Shahnazari, Ayoub, Xi, Yufeng, Zhang, Zeliang, Hess, Griffin, Xu, Chenliang, Abdolrahim, Niaz

This work presents OPENXRD, an open-book pipeline designed for crystallography question answering, which integrates textual prompts with concise supporting content generated by GPT-4.5. Instead of using scanned textbooks, which may lead to copyright issues, OPENXRD generates compact, domain-specific references that help smaller models understand key concepts in X-ray diffraction (XRD). We evaluate OPENXRD on a well-defined set of 217 expert-level XRD questions by comparing different vision-language models, including GPT-4 and LLaVA-based frameworks such as Mistral, LLaMA, and QWEN, under both closed-book (without supporting material) and open-book (with supporting material) conditions. Our experimental results show significant accuracy improvements in models that use the GPT-4.5-generated summaries, particularly those with limited prior training in crystallography. OPENXRD uses knowledge from larger models to fill knowledge gaps in crystallography and shows that AI-generated texts can help smaller models reason more effectively in scientific tasks. While the current version of OPENXRD focuses on text-based inputs, we also explore future extensions such as adding real crystal diagrams or diffraction patterns to improve interpretation in specialized materials science contexts. Overall, OPENXRD shows that specialized open-book systems can be useful in materials science and provides a foundation for broader natural language processing (NLP) tools in critical scientific fields.

crystallography, large language model, machine learning, (21 more...)

2507.09155

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.68)
Government (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-25-2025

Visual hallucination detection in large vision-language models via evidential conflict

Huang, Tao, Liu, Zhekun, Wang, Rui, Zhang, Yang, Jing, Liping

Despite the remarkable multimodal capabilities of Large Vision-Language Models (LVLMs), discrepancies often occur between visual inputs and textual outputs--a phenomenon we term visual hallucination. This critical reliability gap poses substantial risks in safety-critical Artificial Intelligence (AI) applications, necessitating a comprehensive evaluation benchmark and effective detection methods. Firstly, we observe that existing visual-centric hallucination benchmarks mainly assess LVLMs from a perception perspective, overlooking hallucinations arising from advanced reasoning capabilities. We develop the Perception-Reasoning Evaluation Hallucination (PRE-HAL) dataset, which enables the systematic evaluation of both perception and reasoning capabilities of LVLMs across multiple visual semantics, such as instances, scenes, and relations. Comprehensive evaluation with this new benchmark exposed more visual vulnerabilities, particularly in the more challenging task of relation reasoning. To address this issue, we propose, to the best of our knowledge, the first Dempster-Shafer theory (DST)-based visual hallucination detection method for LVLMs through uncertainty estimation. This method aims to efficiently capture the degree of conflict in high-level features at the model inference phase. Specifically, our approach employs simple mass functions to mitigate the computational complexity of evidence combination on power sets. We conduct an extensive evaluation of state-of-the-art LVLMs, LLaVA-v1.5, mPLUG-Owl2 and mPLUG-Owl3, with the new PRE-HAL benchmark. Experimental results indicate that our method outperforms five baseline uncertainty metrics, achieving average AUROC improvements of 4%, 10%, and 7% across three LVLMs. Our code is available at https://github.com/HT86159/Evidential-Conflict.

large language model, machine learning, natural language, (18 more...)

doi: 10.1016/j.ijar.2025.109507

2506.19513

Country: Asia > China (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Neural Information Processing SystemsMay-26-2025, 21:42:27 GMT

Dense Connector for MLLMs

Do we fully leverage the potential of visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current MLLM rat race, the focus seems to be predominantly on the linguistic side. Yet, scant attention has been directed towards the visual signals utilized by MLLMs, often assumed to be the final high-level features extracted by a frozen visual encoder. In this paper, we introduce the Dense Connector - a simple, effective, and plug-and-play vision-language connector that significantly enhances existing MLLMs by leveraging multi-layer visual features, with minimal additional computational overhead. Building on this, we also propose the Efficient Dense Connector, which achieves performance comparable to LLaVA-v1.5 with only 25% of the visual tokens.

artificial intelligence, large language model, natural language, (4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)