AITopics | claude 3

claude 3

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving

Neural Information Processing Systems | Jun-23-2026, 10:21:07 GMT

The ultimate goal of code agents is to solve complex tasks autonomously. Although large language models (LLMs) have made substantial progress in code generation, real-world tasks typically demand full-fledged code repositories rather than simple scripts. Building such repositories from scratch remains a major challenge. Fortunately, GitHub hosts a vast, evolving collection of open-source repositories, which developers frequently reuse as modular components for complex tasks. Yet, existing frameworks like OpenHands and SWE-Agent still struggle to effectively leverage these valuable resources.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia (0.45)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education (1.00)
Information Technology (0.92)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Linguini: A benchmark for language-agnostic linguistic reasoning

Neural Information Processing Systems | Jun-23-2026, 02:13:37 GMT

We propose a new benchmark to measure a language model's linguistic reasoning skills without relying on pre-existing language-specific knowledge. The test covers 894 questions grouped in 160 problems across 75 (mostly) extremely low-resource languages, extracted from the International Linguistic Olympiad corpus. To attain high accuracy on this benchmark, models don't need previous knowledge of the tested language, as all the information needed to solve the linguistic puzzle is presented in the context. We find that, while all analyzed models rank below 25% accuracy, there is a significant gap between open and closed models, with the best-performing proprietary model scoring 24.05% and the best-performing open model 8.84%.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

A Semantic Parsing Framework for End-to-End Time Normalization

Neural Information Processing Systems | Jun-23-2026, 00:43:08 GMT

Time normalization is the task of converting natural language temporal expressions into machine-readable representations.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)
Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

e5fb1d7e7c1541eb0b6132b7839baf34-Paper-Conference.pdf

Neural Information Processing Systems | Jun-23-2026, 00:36:40 GMT

Human communication is motivated: people speak, write, and create content with a particular communicative intent in mind. As a result, information that large language models (LLMs) and AI agents process is inherently framed by humans' intentions and incentives. People are adept at navigating such nuanced information: we routinely identify benevolent or self-serving motives in order to decide what statements to trust. For LLMs to be effective in the real world, they too must critically evaluate content by factoring in the motivations of the source--for instance, weighing the credibility of claims made in a sales pitch. In this paper, we undertake a comprehensive study of whether LLMs have this capacity for motivational vigilance.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Health & Medicine > Therapeutic Area (0.67)
Education (0.67)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

Neural Information Processing Systems | Jun-23-2026, 00:26:39 GMT

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of consistency across problems and long-horizon problem-solving capabilities. This highlights the need for this benchmark to foster future AI advancements.

large language model, machine learning, natural language, (24 more...)

Neural Information Processing Systems

Country: Asia > Japan (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Information Technology > Services (0.45)
Transportation > Freight & Logistics Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

SWE-bench Goes Live

Neural Information Processing Systems | Jun-22-2026, 22:17:37 GMT

The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a key benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench has become the dominant benchmark in this domain, it suffers from several limitations: it has not been updated since its release, is restricted to only 12 repositories, and relies heavily on manual effort for constructing test instances and setting up executable environments, significantly limiting its scalability. We present SWE-bench-Live, a live-updatable benchmark designed to address these limitations. SWE-bench-Live currently includes 1,890 tasks derived from real GitHub issues created since 2024, spanning 223 repositories. Each task is accompanied by a dedicated Docker image to ensure reproducible execution. Additionally, we introduce an automated curation pipeline that streamlines the entire process from instance creation to environment setup, removing manual bottlenecks and enabling scalability and continuous updates. We evaluate a range of state-of-the-art models and agent frameworks on SWE-bench-Live, offering detailed empirical insights into their real-world bug-fixing capabilities. By providing a fresh, diverse, and executable benchmark grounded in live repository activity, SWE-bench-Live supports reliable, large-scale assessment of code LLMs and code agents in realistic development settings.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

c42c8d51556fabb4b57fc86d3d3d0d09-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Jun-22-2026, 16:45:22 GMT

QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? Large language models (LLMs) are increasingly being applied to reasoning tasks such as math, logic, and planning/coding. These applications typically assume that all necessary information is provided, but real-world scenarios often violate this assumption. Users may omit crucial details in math problems, and robots might operate in environments with partial observability. In such cases, LLMs need the ability to proactively …

information, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

A. Ablation on the Annotation Pipeline

Neural Information Processing Systems | Jun-21-2026, 22:57:24 GMT

Notably, it is crucial for objects located at the edges of images to maintain the closure of their bounding squares. Requiring existing MLLMs to rethink may still not improve the accuracy of their responses. This may be because InternVL has been trained on more autonomous driving data. The final MLLM and prompt achieve an accuracy rate of approximately 90% on the entire OpenAD data. We conduct experiments by employing diverse visual and textual prompts, along with various MLLMs, and select the optimal approach.

ablation, annotation pipeline, artificial intelligence, (15 more...)

Neural Information Processing Systems

Industry: Transportation > Ground > Road (0.53)

Technology: Information Technology > Artificial Intelligence (0.37)

impacts

Neural Information Processing Systems | Jun-20-2026, 18:40:04 GMT

The primary goal of PACBench is to catalyze the development of more capable, reliable, and physically grounded VLMs and their fine-tuned variants, often called VLAs, for real-world robotic applications. Because VLA fine-tuning typically relies on low-level trajectory data rather than higher-level reasoning, probing the underlying VLM's understanding of object Properties, action Affordances, and physical Constraints (PAC) gives us a grounded lens into the capabilities that downstream robotic policies will inherit. By diagnosing PAC weaknesses in the base model, researchers can distinguish whether a VLA's performance stems from genuine physical common sense or simply memorized motion patterns, and thus guide targeted improvements in model architectures, training methodologies, and dataset curation. In doing so, PACBench helps ensure that robotic systems become more predictable, less prone to errors from a lack of physical understanding, and better equipped for safe, effective collaboration in complex, everyday environments. By providing a fine-grained diagnostic tool, PACBench can help researchers and developers identify specific weaknesses in current models, thereby guiding targeted improvements in model architectures, training methodologies, and dataset curation. This, in turn, can lead to robotic systems that are more predictable, less prone to errors stemming from a lack of physical common sense, and better able to perform a wide range of useful tasks. The open release of our benchmark and its diverse data sources (including web-scale images, real-world humanoid captures, and simulated scenarios) is intended to foster broad community engagement and accelerate progress in this crucial area of AI. While any advancement in AI capabilities warrants ongoing consideration of its societal implications, our work focuses on enhancing the fundamental understanding and robustness of AI systems, which we see as a positive step towards more responsible AI development.

constraint, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Industry: Appliances & Durable Goods (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

9ecafb09de180aaad7b7205be7eb24a4-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Jun-20-2026, 18:40:00 GMT

Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often assumes a deep understanding of low-level physical prerequisites, a capability that is largely unverified. To perform actions reliably, robots must comprehend intrinsic object properties (e.g., material, weight), action affordances (e.g., graspable, stackable), and physical constraints (e.g., stability, reachability, or an object's state like being closed). Despite their ubiquitous use in manipulation, we argue that off-the-shelf VLMs may lack this granular, physically-grounded understanding, as these specific prerequisites are often overlooked during training. Addressing this critical gap, we introduce PACBench, a comprehensive benchmark designed to systematically evaluate VLMs on their understanding of these core Properties, Affordances, and Constraints (PAC) from a task executability perspective. PACBench features a diverse dataset with more than 30,000 annotations, comprising 673 real-world images (115 object classes, 15 property types, 1-3 affordances defined per object class), 100 real-world humanoid view scenarios, and 120 unique simulated constraint scenarios across four tasks. Our evaluations reveal significant gaps in the ability of VLMs to grasp fundamental physical concepts, underscoring their current limitations for reliable robot manipulation and pointing to key areas that require targeted research. PACBench also serves as a standardized benchmark for rigorously evaluating the physical reasoning capabilities of VLMs, guiding the development of more robust and physically grounded models for robot manipulation.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Appliances & Durable Goods (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)
