AITopics | question ask

question ask

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

28aad3b3b315d86910d7f4ee2867dfa4-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-9-2026, 21:16:24 GMT

large language model, machine learning, vlm, (20 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry: Transportation > Passenger (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

28aad3b3b315d86910d7f4ee2867dfa4-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-9-2025, 21:32:45 GMT

large language model, machine learning, question ask, (18 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.15)

Industry: Transportation > Passenger (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.52)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

Irene Huang, Wei Lin

Neural Information Processing Systems, Oct-9-2025, 21:32:42 GMT

Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order.

large language model, machine learning, vlm, (20 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry: Transportation > Passenger (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions

Bortoletto, Matteo, Ruhdorfer, Constantin, Bulling, Andreas

arXiv.org Artificial Intelligence, Sep-17-2025

Most existing Theory of Mind (ToM) benchmarks for foundation models rely on variations of the Sally-Anne test, offering only a very limited perspective on ToM and neglecting the complexity of human social interactions. To address this gap, we propose ToM-SSI: a new benchmark specifically designed to test ToM capabilities in environments rich with social interactions and spatial dynamics. While current ToM benchmarks are limited to text-only or dyadic interactions, ToM-SSI is multimodal and includes group interactions of up to four agents that communicate and move in situated environments. This unique design allows us to study, for the first time, mixed cooperative-obstructive settings and reasoning about multiple agents' mental states in parallel, thus capturing a wider range of social cognition than existing benchmarks. Our evaluations reveal that the current models' performance is still severely limited, especially in these new tasks, highlighting critical gaps for future research.
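The Sally-Anne test turns on tracking what each agent has seen rather than where an object actually is. As a rough illustration only (a hypothetical toy, not ToM-SSI's data format or evaluation code), the sketch below keeps one belief store per agent and updates only the agents who witness a move, which is the kind of parallel, per-agent mental-state bookkeeping that a four-agent scene requires. The class and method names are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SituatedScene:
    """Hypothetical toy scene; not the ToM-SSI data format or evaluation code."""

    agents: list[str]
    # Ground truth: where each object actually is.
    true_location: dict[str, str] = field(default_factory=dict)
    # One belief store per agent: beliefs[agent][obj] = last location that agent saw.
    beliefs: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for agent in self.agents:
            self.beliefs.setdefault(agent, {})

    def move(self, obj: str, dest: str, witnesses: set[str]) -> None:
        """Move obj to dest; only agents who witness the move update their belief."""
        self.true_location[obj] = dest
        for agent in witnesses:
            if agent in self.beliefs:
                self.beliefs[agent][obj] = dest

    def believed_location(self, agent: str, obj: str) -> str | None:
        """Where this agent thinks obj is (None if the agent never saw it)."""
        return self.beliefs[agent].get(obj)


scene = SituatedScene(agents=["Sally", "Anne", "Bob", "Cara"])
scene.move("marble", "basket", witnesses={"Sally", "Anne", "Bob", "Cara"})
scene.move("marble", "box", witnesses={"Anne", "Bob"})  # Sally and Cara are away

print(scene.true_location["marble"])  # box
print({a: scene.believed_location(a, "marble") for a in scene.agents})
# {'Sally': 'basket', 'Anne': 'box', 'Bob': 'box', 'Cara': 'basket'}
```

Keeping beliefs separate from ground truth is what lets a model (or a grader) answer false-belief questions for several agents at once; a single shared world state could only answer "where is the marble?"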

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.05066

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
North America > Dominican Republic (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

How to Capture and Study Conversations Between Research Participants and ChatGPT: GPT for Researchers (g4r.org)

Kim, Jin

arXiv.org Artificial Intelligence, Mar-23-2025

As large language models (LLMs) like ChatGPT become increasingly integrated into our everyday lives, from customer service and education to creative work and personal productivity, understanding how people interact with these AI systems has become a pressing issue. Despite the widespread use of LLMs, researchers lack standardized tools for systematically studying people's interactions with LLMs. To address this issue, we introduce GPT for Researchers (G4R), or g4r.org, a free website that researchers can use to easily create and integrate a GPT Interface into their studies. At g4r.org, researchers can (1) enable their study participants to interact with GPT (such as ChatGPT), (2) customize GPT Interfaces to guide participants' interactions with GPT (e.g., set constraints on topics or adjust GPT's tone or response style), and (3) capture participants' interactions with GPT by downloading data on messages exchanged between participants and GPT. By facilitating study participants' interactions with GPT and providing detailed data on these interactions, G4R can support research on topics such as consumer interactions with AI agents or LLMs, AI-assisted decision-making, and linguistic patterns in human-AI communication. With this goal in mind, we provide a step-by-step guide to using G4R at g4r.org.
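Once message data has been downloaded, a typical first step is a per-participant summary. The sketch below assumes a hypothetical CSV export with one row per message and columns `participant_id`, `role`, and `content`; these names are illustrative assumptions, not g4r.org's documented format, so the parsing would need adapting to the real download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout (one row per message). Column names are
# assumptions for illustration, not g4r.org's actual download format.
SAMPLE_EXPORT = """participant_id,role,content
p01,user,Which laptop should I buy for travel?
p01,assistant,A light ultrabook with long battery life is a good fit.
p01,user,Under 1000 dollars?
p02,user,Help me plan a budget.
p02,assistant,Start by listing your monthly income and fixed costs.
"""


def summarize(export_text: str) -> dict:
    """Per participant: number of user turns and mean user-message length in words."""
    word_counts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(export_text)):
        if row["role"] == "user":  # ignore the model's replies
            word_counts[row["participant_id"]].append(len(row["content"].split()))
    return {
        pid: {"user_turns": len(counts), "mean_words": sum(counts) / len(counts)}
        for pid, counts in word_counts.items()
    }


summary = summarize(SAMPLE_EXPORT)
print(summary)
# {'p01': {'user_turns': 2, 'mean_words': 5.0}, 'p02': {'user_turns': 1, 'mean_words': 5.0}}
```

Reading through `csv.DictReader` rather than splitting on commas keeps quoted message text that contains commas intact.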

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.18303

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > Experimental Study (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Faithfulness of LLM Self-Explanations for Commonsense Tasks: Larger Is Better, and Instruction-Tuning Allows Trade-Offs but Not Pareto Dominance

Siegel, Noah Y., Heess, Nicolas, Perez-Ortiz, Maria, Camburu, Oana-Maria

arXiv.org Artificial Intelligence, Mar-17-2025

As large language models (LLMs) become increasingly capable, ensuring that their self-generated explanations are faithful to their internal decision-making process is critical for safety and oversight. In this work, we conduct a comprehensive counterfactual faithfulness analysis across 62 models from 8 families, encompassing both pretrained and instruction-tuned variants and significantly extending prior studies of counterfactual tests. We introduce phi-CCT, a simplified variant of the Correlational Counterfactual Test, which avoids the need for token probabilities while explaining most of the variance of the original test. Our findings reveal clear scaling trends: larger models are consistently more faithful on our metrics. However, when comparing instruction-tuned and human-imitated explanations, we find that observed differences in faithfulness can often be attributed to explanation verbosity, leading to shifts along the true-positive/false-positive Pareto frontier. While instruction-tuning and prompting can influence this trade-off, we find limited evidence that they fundamentally expand the frontier of explanatory faithfulness beyond what is achievable with pretrained models of comparable size. Our analysis highlights the nuanced relationship between instruction-tuning, verbosity, and the faithful representation of model decision processes.
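The abstract describes phi-CCT only as a simplified Correlational Counterfactual Test that needs no token probabilities. As an assumption suggested by the name, the sketch below computes a phi coefficient between two binary signals per counterfactual edit: whether inserting a word changed the model's prediction, and whether the model's explanation mentions that word. It illustrates the statistic, not the paper's exact protocol; the variable names and data are hypothetical.

```python
import math


def phi_coefficient(x: list[bool], y: list[bool]) -> float:
    """Phi correlation between two equal-length binary variables."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and the same length")
    n11 = sum(a and b for a, b in zip(x, y))              # changed, mentioned
    n10 = sum(a and not b for a, b in zip(x, y))          # changed, not mentioned
    n01 = sum((not a) and b for a, b in zip(x, y))        # unchanged, mentioned
    n00 = sum((not a) and (not b) for a, b in zip(x, y))  # unchanged, not mentioned
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical results for six counterfactual word insertions.
prediction_changed = [True, True, False, False, True, False]
explanation_mentions = [True, False, False, False, True, True]

phi = phi_coefficient(prediction_changed, explanation_mentions)
print(round(phi, 3))  # 0.333
```

Under this framing, mentioning a word that did change the prediction is a true positive (`n11`) and mentioning one that did not is a false positive (`n01`). A more verbose explanation tends to raise both counts together, which is one way to read the abstract's point that verbosity shifts models along a true-positive/false-positive trade-off.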

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.13445

Country:

Asia > Middle East > Republic of Türkiye (0.06)
Europe > France (0.04)
North America > United States > New York (0.04)
(23 more...)

Genre: Research Report > New Finding (0.47)

Industry:

Retail (1.00)
Media (1.00)
Health & Medicine (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Only 20% of Harvard students aced this three-question IQ test... how will YOU get on?

Daily Mail - Science & tech, Jul-22-2024, 18:22:13 GMT

The world's shortest IQ test reveals not only your intelligence but also your level of patience. The test, called a Cognitive Reflection Test (CRT), consists of three math-based questions that target a person's ability to ignore their initial gut response in favor of a more rational thought process. Many quickly assume the answers are simple, but the Yale University professor who created the exam warned it isn't as straightforward as it may seem. Professor Shane Frederick created the CRT in 2005, and only 20 to 40 percent of students who have attempted it have passed. Mathematical brain teasers are useful in helping people develop logical thinking by promoting brain stimulation and building visual and spatial reasoning skills.
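This excerpt does not reproduce the questions. One widely cited item from Frederick's 2005 CRT is the bat-and-ball problem: a bat and a ball cost $1.10 in total, and the bat costs $1.00 more than the ball; how much does the ball cost? The gut answer is 10 cents. The short worked check below (exact fractions are used to avoid floating-point rounding on currency) shows why that answer fails and what the equation gives instead.

```python
from fractions import Fraction

total = Fraction(110, 100)       # bat + ball = $1.10
difference = Fraction(100, 100)  # bat - ball = $1.00

# The intuitive answer simply subtracts: 1.10 - 1.00 = 0.10.
intuitive_ball = total - difference
# Check it: the bat would then cost 1.10, so the pair would cost 1.20, not 1.10.
intuitive_total = intuitive_ball + (intuitive_ball + difference)

# Solving ball + (ball + difference) = total gives ball = (total - difference) / 2.
ball = (total - difference) / 2
bat = ball + difference

print(f"intuitive: {float(intuitive_ball):.2f} (pair would cost {float(intuitive_total):.2f})")
print(f"correct: ball {float(ball):.2f}, bat {float(bat):.2f}")
# intuitive: 0.10 (pair would cost 1.20)
# correct: ball 0.05, bat 1.05
```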

harvard student aced, question ask, three-question iq test, (10 more...)

Daily Mail - Science & tech

Industry:

Education > Educational Setting > Higher Education (0.60)
Health & Medicine > Therapeutic Area > Neurology (0.37)

Technology: Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.64)

Functionality learning through specification instructions

de Araujo, Pedro Henrique Luz, Roth, Benjamin

arXiv.org Artificial Intelligence, Nov-14-2023

Test suites assess natural language processing models' performance on specific functionalities: cases of interest involving model robustness, fairness, or particular linguistic capabilities. They enable fine-grained evaluations of model aspects that would otherwise go unnoticed in standard evaluation datasets, but they do not address the problem of how to fix the failure cases. Previous work has explored functionality learning by fine-tuning models on suite data. While this improves performance on seen functionalities, it often does not generalize to unseen ones and can harm general performance. This paper analyses a fine-tuning-free approach to functionality learning. For each functionality in a suite, we generate a specification instruction that encodes it. We combine the obtained specification instructions to create specification-augmented prompts, which we feed to language models pre-trained on natural instruction data to generate suite predictions. A core aspect of our analysis is to measure the effect that including a set of specifications has on a held-out set of unseen, qualitatively different specifications. Our experiments across four tasks and models ranging from 80M to 175B parameters show that smaller models struggle to follow specification instructions. However, larger models (> 3B params.) can benefit from specifications and even generalize desirable behaviors across functionalities.
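A specification-augmented prompt, as described in the abstract, combines one natural-language instruction per functionality with the task input. The sketch below is a minimal illustration under stated assumptions: the template wording, the `Specification` type, and the sentiment-style functionalities are all hypothetical, not the paper's actual prompts or suites.

```python
from dataclasses import dataclass


@dataclass
class Specification:
    functionality: str  # name of a test-suite functionality (hypothetical labels below)
    instruction: str    # natural-language instruction encoding the desired behavior


def build_prompt(specs: list[Specification], task_instruction: str, text: str) -> str:
    """Prepend specification instructions to a task input (hypothetical template)."""
    spec_block = "\n".join(f"- {s.instruction}" for s in specs)
    return (
        f"{task_instruction}\n"
        f"Follow these specifications:\n{spec_block}\n\n"
        f"Input: {text}\n"
        "Answer:"
    )


seen_specs = [
    Specification("negation", "A negated positive statement expresses negative sentiment."),
    Specification("typos", "Minor spelling errors do not change the sentiment."),
]

# To probe generalization, test cases from a held-out functionality would be
# scored with a prompt containing only the seen specifications, as here.
prompt = build_prompt(
    seen_specs,
    "Classify the sentiment of the input as positive or negative.",
    "The plot was predictable, but honestly I loved every minute.",
)
print(prompt)
```

Because no weights are updated, adding or removing a functionality only changes the prompt text, which is what makes it cheap to measure how seen specifications affect unseen ones.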

large language model, machine learning, sentiment, (18 more...)

arXiv.org Artificial Intelligence

2311.08481

Country: