AITopics

Plotting

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-modal Situated Reasoning in 3D Scenes

Neural Information Processing SystemsMay-25-2025, 21:40:36 GMT

Situation awareness is essential for understanding and reasoning about 3D scenes in embodied AI agents. However, existing datasets and benchmarks for situated understanding are limited in data modality, diversity, scale, and task scope. To address these limitations, we propose Multi-modal Situated Question Answering (MSQA), a large-scale multi-modal situated reasoning dataset, scalably collected leveraging 3D scene graphs and vision-language models (VLMs) across a diverse range of real-world 3D scenes. MSQA includes 251K situated question-answering pairs across 9 distinct question categories, covering complex scenarios within 3D scenes. We introduce a novel interleaved multi-modal input setting in our benchmark to provide text, image, and point cloud for situation and question description, resolving ambiguity in previous single-modality convention (e.g., text). Additionally, we devise the Multi-modal Situated Next-step Navigation (MSNN) benchmark to evaluate models' situated reasoning for navigation. Comprehensive evaluations on MSQA and MSNN highlight the limitations of existing vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling. Experiments on data scaling and cross-domain transfer further demonstrate the efficacy of leveraging MSQA as a pre-training dataset for developing more powerful situated reasoning models.

large language model, machine learning, question answering, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.45)

Industry: Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(3 more...)

Add feedback

Hyperbolic Embeddings of Supervised Models

Neural Information Processing SystemsMay-25-2025, 21:40:24 GMT

Models of hyperbolic geometry have been successfully used in ML for two main tasks: embedding models in unsupervised learning (e.g.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
(3 more...)

Add feedback

A Unifying Normative Framework of Decision Confidence

Neural Information Processing SystemsMay-25-2025, 21:40:12 GMT

Self-assessment of one's choices, i.e., confidence, is the topic of many decision neuroscience studies. Computational models of confidence, however, are limited to specific scenarios such as between choices with the same value. Here we present a normative framework for modeling decision confidence that is generalizable to various tasks and experimental setups.

experiment, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Washington > King County > Seattle (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding

Neural Information Processing SystemsMay-25-2025, 21:40:07 GMT

However, in PopManga, such a page receives only partial annotations, primarily highlighting large-scale depictions and main characters. This selective approach underscores our focus on significant elements over exhaustive detailing, which aligns with our annotation strategy to emphasize clarity and relevance in highly complex scenes.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding Marco Bertini 2

Neural Information Processing SystemsMay-25-2025, 21:40:01 GMT

The comic domain is rapidly advancing with the development of single-page analysis and synthesis models. However, evaluation metrics and datasets lag behind, often limited to small-scale or single-style test sets. We introduce a novel benchmark, CoMix, designed to evaluate the multi-task capabilities of models in comic analysis. Unlike existing benchmarks that focus on isolated tasks such as object detection or text recognition, CoMix addresses a broader range of tasks including object detection, speaker identification, character re-identification, reading order, and multi-modal reasoning tasks like character naming and dialogue generation. Our benchmark comprises three existing datasets with expanded annotations to support multi-task evaluation.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Government (0.46)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Supplementary Material - WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models

Neural Information Processing SystemsMay-25-2025, 21:39:53 GMT

Q1 For what purpose was the dataset created? Was there a specific task in mind? Q2 Who created the dataset (e.g., which team, research group) and on behalf of which Q3 Who funded the creation of the dataset? Q1 What do the instances that comprise the dataset represent (e.g., documents, photos, Are there multiple types of instances (e.g., movies, users, and ratings; Q2 How many instances are there in total (of each type, if appropriate)? Is the sample representative of the larger set (e.g., geographic coverage)?

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Industry:

Law (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models

Neural Information Processing SystemsMay-25-2025, 21:39:50 GMT

Cross-modal (image-to-text and text-to-image) retrieval is an established task used in evaluation benchmarks to test the performance of vision-language models (VLMs).

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Oceania > Australia (0.14)

Industry: Information Technology (0.70)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery Jürgen Schmidhuber 1,3

Neural Information Processing SystemsMay-25-2025, 21:39:39 GMT

Current state-of-the-art synchrony-based models encode object bindings with complex-valued activations and compute with real-valued weights in feedforward architectures. We argue for the computational advantages of a recurrent architecture with complex-valued weights. We propose a fully convolutional autoencoder, SynCx, that performs iterative constraint satisfaction: at each iteration, a hidden layer bottleneck encodes statistically regular configurations of features in particular phase relationships; over iterations, local constraints propagate and the model converges to a globally consistent configuration of phase assignments. Binding is achieved simply by the matrix-vector product operation between complex-valued weights and activations, without the need for additional mechanisms that have been incorporated into current synchrony-based models. SynCx outperforms or is strongly competitive with current models for unsupervised object discovery. SynCx also avoids certain systematic grouping errors of current models, such as the inability to separate similarly colored objects without additional supervision.

artificial intelligence, iteration, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences Richard Michael University of Copenhagen University of Copenhagen Simon Bartels

Neural Information Processing SystemsMay-25-2025, 21:38:23 GMT

Optimizing discrete black box functions is key in several domains, e.g. protein engineering and drug design. Due to the lack of gradient information and the need for sample efficiency, Bayesian optimization is an ideal candidate for these tasks. Several methods for high-dimensional continuous and categorical Bayesian optimization have been proposed recently. However, our survey of the field reveals highly heterogeneous experimental set-ups across methods and technical barriers for the replicability and application of published algorithms to real-world tasks. To address these issues, we develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black box functions representing real-world application domains in chemistry and biology. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries (poli and poli-baselines), allowing practitioners to readily incorporate new optimization objectives or discrete optimizers.

machine learning, natural language, optimization, (14 more...)

Neural Information Processing Systems

Country:

Europe > Denmark > Capital Region > Copenhagen (0.76)
North America > United States (0.67)

Genre: Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

Topic-Conversation Relevance (TCR) Dataset and Benchmarks Yaran Fan Jamie Pool Microsoft

Neural Information Processing SystemsMay-25-2025, 21:30:38 GMT

Workplace meetings are vital to organizational collaboration, yet a large percentage of meetings are rated as ineffective [1]. To help improve meeting effectiveness by understanding if the conversation is on topic, we create a comprehensive Topic-Conversation Relevance (TCR) dataset that covers a variety of domains and meeting styles. The TCR dataset includes 1,500 unique meetings, 22 million words in transcripts, and over 15,000 meeting topics, sourced from both newly collected Speech Interruption Meeting (SIM) data and existing public datasets. Along with the text data, we also open source scripts to generate synthetic meetings or create augmented meetings from the TCR dataset to enhance data diversity.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report (0.46)

Industry: Government (0.67)

Technology: