AITopics | best performance

Collaborating Authors

best performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models

Neural Information Processing SystemsJun-22-2026, 23:52:43 GMT

Pretrained vision-language models (VLMs), such as CLIP, have shown promising zero-shot out-of-distribution (OOD) detection capabilities by leveraging semantic similarities between input images and textual labels. However, most existing approaches focus solely on expanding the label space in the text domain, ignoring complementary visual cues that can further enhance discriminative power. In this paper, we introduce DualCnst, a novel framework that integrates text-image dual consistency for improved zero-shot OOD detection. Specifically, we generate synthetic images from both ID and mined OOD textual labels using a text-to-image generative model, and jointly evaluate each test image based on (i) its semantic similarity to class labels and (ii) its visual similarity to the synthesized images. The resulting unified score function effectively combines multimodal information without requiring access to in-distribution images or additional training. We further provide theoretical analysis showing that incorporating multimodal negative labels reduces score variance and improves OOD separability. Extensive experiments across diverse OOD benchmarks demonstrate that DualCnst achieves state-of-theart performance while remaining scalable, data-agnostic, and fully compatible with prior text-only VLM-based methods. The code is publicly available at: https: //github.com/TMLSIAT/DualCnst.

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

59d2eaa5842fa641ff9b8e4c7ff0f6ee-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-17-2026, 12:19:52 GMT

While text-to-image models like GPT-4o-Image and FLUX are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequently undergo inadequate evaluation of their capabilities and limitations, potentially leading to misalignment and unsafe fine-tuning outcomes. To address this issue, we introduce MJ-BENCH, a novel benchmark which incorporates a comprehensive preference dataset to evaluate multimodal judges in providing feedback for image generation models across six key perspectives: alignment, safety, image quality, bias, composition, and visualization. Specifically, we evaluate a large variety of multimodal judges including smaller-sized CLIP-based scoring models, open-source VLMs, and close-source VLMs on each decomposed subcategory of our preference dataset. Experiments reveal that close-source VLMs generally provide better feedback, with GPT-4o outperforming other judges in average. Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding textimage alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities. Further studies in feedback scale reveal that VLM judges can generally provide more accurate and stable feedback in natural language than numerical scales. Notably, human evaluations on end-to-end fine-tuned models using separate feedback from these multimodal judges provide similar conclusions, further confirming the effectiveness of MJ-BENCH.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.93)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models

Neural Information Processing SystemsJun-16-2026, 17:45:26 GMT

Transformer models have become the dominant backbone for sequence modeling, leveraging self-attention to produce contextualized token representations. These are typically aggregated into fixed-size vectors via pooling operations for downstream tasks. While much of the literature has focused on attention mechanisms, the role of pooling remains underexplored despite its critical impact on model behavior. In this paper, we introduce a theoretical framework that rigorously characterizes the expressivity of Transformer-based models equipped with widely used pooling methods by deriving closed-form bounds on their representational capacity and the ability to distinguish similar inputs. Our analysis extends to different variations of attention formulations, demonstrating that these bounds hold across diverse architectural variants. We empirically evaluate pooling strategies across tasks requiring both global and local contextual understanding, spanning three major modalities: computer vision, natural language processing, and time-series analysis. Results reveal consistent trends in how pooling choices affect accuracy, sensitivity, and optimization behavior. Our findings unify theoretical and empirical perspectives, providing practical guidance for selecting or designing pooling mechanisms suited to specific tasks. This work positions pooling as a key architectural component in Transformer models and lays the foundation for more principled model design beyond attention alone.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Efficient Utility-Preserving Machine Unlearning with Implicit Gradient Surgery

Neural Information Processing SystemsJun-14-2026, 11:33:02 GMT

Machine unlearning (MU) aims to efficiently remove sensitive or harmful memory from a pre-trained model. The key challenge is to balance the potential tradeoff between unlearning efficacy and utility preservation, which involves forgetting undesirable information as defined while maintaining the model's original performance. One potential way to tackle this problem is to use multi-objective optimization to jointly optimize both the unlearning and utility preservation objectives. However, existing multi-objective methods only guarantee finding a Pareto-optimal solution without fine-grained control, which causes under-optimization of the unlearning objective. To this end, we first model MU as a constrained optimization problem, that is, optimizing the unlearning objective under the constraint of a bounded increase for utility loss.

artificial intelligence, machine learning, objective, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Add feedback

AnomalyCoT: A Multi-Scenario Chain-of-Thought Dataset for Multimodal Large Language Models

Neural Information Processing SystemsJun-12-2026, 12:34:33 GMT

Industrial Anomaly Detection (IAD) is an indispensable quality control technology in modern production processes. Recently, on account of the outstanding visual comprehension and cross-domain knowledge transfer capabilities of multimodal large language models (MLLMs), existing studies have explored the application of MLLMs in the IAD domain and established some multimodal IAD datasets. However, although the latest datasets contain various fundamental IAD tasks, they formulate tasks in a general question-and-answer format lacking a rigorous reasoning process, and they are relatively limited in the diversity of scenarios, which restricts their reliability in practical applications. In this paper, we propose AnomalyCoT, a multimodal Chain-of-Thought (CoT) dataset for multi-scenario IAD tasks. It consists of 37,565 IAD samples with the CoT data and is defined by challenging composite IAD tasks. Meanwhile, the CoT data for each sample provides precise coordinates of anomaly regions, thereby improving visual comprehension of defects across different types. AnomalyCoT is constructed through a systematic pipeline and involves multiple manual operations. Based on AnomalyCoT, we conducted a comprehensive evaluation of various mainstream MLLMs and fine-tuned representative models in different ways. The final results show that Gemini-2.0-flash

data mining, large language model, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

564127c03caab942e503ee6f810f54fd-Supplemental.pdf

Neural Information Processing SystemsApr-26-2026, 00:06:03 GMT

This paper solves three NP-hard routing problems, traveling salesman problem (TSP), prize collecting TSP (PCTSP), and capacitated vehicle routing problem (CVRP). This section provides detailed descriptions of PCTSP and CVRP (for TSP, see section 3). The PCTSP is similar to TSP, while there are differences in that we do not have to visit all the nodes and that the destination is not the first node but the depot node, i.e., a tour is not a cycle. Let N be the number of nodes. The problem instance of PCTSP is s = {(xi,λi,µi)}N+1i=1, where the xi R2 is in 2D euclidean coordinates, λi R is the penalty of unvisited node, and µi R is the prize of visited node. The L(π|s) is the tour length, and λ(π|s) is the total penalty of the unvisited nodes.

artificial intelligence, experiment, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Transportation (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

Add feedback

MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

Neural Information Processing SystemsMar-21-2026, 14:57:32 GMT

Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named **M**edical **D**ecision-making **Agents** (**MDAgents**) that helps to address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, a simple emulation inspired by the way real-world medical decision-making processes are adapted to tasks of different complexities. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and clinical diagnosis benchmarks, including a comparison ofLLMs' medical complexity classification against human physicians. MDAgents achieved the **best performance in seven out of ten** benchmarks on tasks requiring an understanding of medical knowledge and multi-modal reasoning, showing a significant **improvement of up to 4.2\%** ($p$ < 0.05) compared to previous methods' best performances. Ablation studies reveal that MDAgents effectively determines medical complexity to optimize for efficiency and accuracy across diverse medical tasks. Notably, the combination of moderator review and external medical knowledge in group collaboration resulted in an average accuracy **improvement of 11.8\%**.

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.62)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SupplementaryforEmergenceofShapeBiasin ConvolutionalNeuralNetworksthroughActivation Sparsity 1 FurtherResultsoftheimpactofsparsityonShapeBiasBenchmark

Neural Information Processing SystemsFeb-17-2026, 15:18:51 GMT

We utilize the sparsity operation proposed in Section 3.1 for ResNet-50. We generalize section 4.2 in the main text to ResNet-50 and ViT-B architectures (Figure 1). We apply the Sparsity layer in a subset of the network. It is based on the intuition that the brain utilizes sparsity for long range communication butcan allowlocal dense computation. Wedivide thenetworks into chunks where within each chunk theneuron'sactivities areallowed tobedense (keep original) but the communication across different chunks is set to be sparse.

artificial intelligence, crazy-jack nips2023, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

Add feedback

Appendix A Theory

Neural Information Processing SystemsFeb-14-2026, 16:24:51 GMT

In this section, we show the proofs of the results in the main body. Eq. (1) satisfies the triangle inequality, i.e., for any scoring functions For the second inequality, we prove it similarly. Before we present the proof of the theorem, we first provide some lemmas. By applying Lemma A.2, the following holds with probability at least 1 α: null R F). Thus we have: null R A.1, we can get that the margin loss satisfies the triangle inequality. By Lemma A.4, we have R By Theorem 4.4, the following holds for any Based on Theorem A.6, the following standard error bound for gradual AST can be derived similarly to Corollary 4.6.

artificial intelligence, denote, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

best performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models

59d2eaa5842fa641ff9b8e4c7ff0f6ee-Paper-Datasets_and_Benchmarks_Track.pdf

Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models

Efficient Utility-Preserving Machine Unlearning with Implicit Gradient Surgery

AnomalyCoT: A Multi-Scenario Chain-of-Thought Dataset for Multimodal Large Language Models

564127c03caab942e503ee6f810f54fd-Supplemental.pdf

0d6270381e018b3d83eb9be7d0b06036-Paper-Conference.pdf

MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

SupplementaryforEmergenceofShapeBiasin ConvolutionalNeuralNetworksthroughActivation Sparsity 1 FurtherResultsoftheimpactofsparsityonShapeBiasBenchmark

Appendix A Theory