AITopics | ullman

Collaborating Authors

ullman

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

COINPRESS: PracticalPrivateMean andCovarianceEstimation

Neural Information Processing SystemsFeb-9-2026, 16:47:41 GMT

We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes.

algorithm, artificial intelligence, privacy, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.55)

Technology: Information Technology > Artificial Intelligence (0.68)

Add feedback

AdversariallyRobustStreamingAlgorithmsvia DifferentialPrivacy

Neural Information Processing SystemsFeb-7-2026, 07:18:39 GMT

Consider ascenario inwhich data items are being generated one by one, e.g., IP traffic monitoring or web searches.

algorithm, artificial intelligence, privacy, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Security & Privacy (0.68)

Add feedback

Towards aligned body representations in vision models

Gizdov, Andrey, Procopio, Andrea, Li, Yichen, Harari, Daniel, Ullman, Tomer

arXiv.org Artificial IntelligenceDec-2-2025

Human physical reasoning relies on internal "body" representations -- coarse, volumetric approximations that capture an object's extent and support intuitive predictions about motion and physics. While psychophysical evidence suggests humans use such coarse representations, their internal structure remains largely unknown. Here we test whether vision models trained for segmentation develop comparable representations. We adapt a psychophysical experiment conducted with 50 human participants to a semantic segmentation task and test a family of seven segmentation networks, varying in size. We find that smaller models naturally form human-like coarse body representations, whereas larger models tend toward overly detailed, fine-grain encodings. Our results demonstrate that coarse representations can emerge under limited computational resources, and that machine representations can provide a scalable path toward understanding the structure of physical reasoning in the brain.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2512.00365

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

A simulation-heuristics dual-process model for intuitive physics

Li, Shiqian, Ma, Yuxi, Yan, Jiajun, Dai, Bo, Peng, Yujia, Zhang, Chi, Zhu, Yixin

arXiv.org Artificial IntelligenceMay-20-2025

The role of mental simulation in human physical reasoning is widely acknowledged, but whether it is employed across scenarios with varying simulation costs and where its boundary lies remains unclear. Using a pouring-marble task, our human study revealed two distinct error patterns when predicting pouring angles, differentiated by simulation time. While mental simulation accurately captured human judgments in simpler scenarios, a linear heuristic model better matched human predictions when simulation time exceeded a certain boundary. Motivated by these observations, we propose a dual-process framework, Simulation-Heuristics Model (SHM), where intuitive physics employs simulation for short-time simulation but switches to heuristics when simulation becomes costly. By integrating computational methods previously viewed as separate into a unified model, SHM quantitatively captures their switching mechanism. The SHM aligns more precisely with human behavior and demonstrates consistent predictive performance across diverse scenarios, advancing our understanding of the adaptive nature of intuitive physical reasoning.

artificial intelligence, machine learning, physics, (16 more...)

arXiv.org Artificial Intelligence

2504.09546

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

A Systematic Review on the Evaluation of Large Language Models in Theory of Mind Tasks

Sarıtaş, Karahan, Tezören, Kıvanç, Durmazkeser, Yavuz

arXiv.org Artificial IntelligenceFeb-12-2025

In recent years, evaluating the Theory of Mind (ToM) capabilities of large language models (LLMs) has received significant attention within the research community. As the field rapidly evolves, navigating the diverse approaches and methodologies has become increasingly complex. This systematic review synthesizes current efforts to assess LLMs' ability to perform ToM tasks, an essential aspect of human cognition involving the attribution of mental states to oneself and others. Despite notable advancements, the proficiency of LLMs in ToM remains a contentious issue. By categorizing benchmarks and tasks through a taxonomy rooted in cognitive science, this review critically examines evaluation techniques, prompting strategies, and the inherent limitations of LLMs in replicating human-like mental state reasoning. A recurring theme in the literature reveals that while LLMs demonstrate emerging competence in ToM tasks, significant gaps persist in their emulation of human cognitive abilities.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.08796

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Missouri > Jackson County > Kansas City (0.14)
Asia > Singapore (0.04)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.88)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Lower Bounds for Private Estimation of Gaussian Covariance Matrices under All Reasonable Parameter Regimes

Portella, Victor S., Harvey, Nick

arXiv.org Machine LearningApr-26-2024

We prove lower bounds on the number of samples needed to privately estimate the covariance matrix of a Gaussian distribution. Our bounds match existing upper bounds in the widest known setting of parameters. Our analysis relies on the Stein-Haff identity, an extension of the classical Stein's identity used in previous fingerprinting lemma arguments.

gradient, matrix, wishart distribution, (16 more...)

arXiv.org Machine Learning

2404.17714

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > British Columbia (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes

Peter, Naty, Tsfadia, Eliad, Ullman, Jonathan

arXiv.org Artificial IntelligenceFeb-3-2024

Fingerprinting arguments, first introduced by Bun, Ullman, and Vadhan (STOC 2014), are the most widely used method for establishing lower bounds on the sample complexity or error of approximately differentially private (DP) algorithms. Still, there are many problems in differential privacy for which we don't know suitable lower bounds, and even for problems that we do, the lower bounds are not smooth, and usually become vacuous when the error is larger than some threshold. In this work, we present a new framework and tools to generate smooth lower bounds on the sample complexity of differentially private algorithms satisfying very weak accuracy. We illustrate the applicability of our method by providing new lower bounds in various settings: 1. A tight lower bound for DP averaging in the low-accuracy regime, which in particular implies a lower bound for the private 1-cluster problem introduced by Nissim, Stemmer, and Vadhan (PODS 2016). 2. A lower bound on the additive error of DP algorithms for approximate k-means clustering, as a function of the multiplicative error, which is tight for a constant multiplication error. 3. A lower bound for estimating the top singular vector of a matrix under DP in low-accuracy regimes, which is a special case of DP subspace estimation studied by Singhal and Steinke (NeurIPS 2021). Our main technique is to apply a padding-and-permuting transformation to a fingerprinting code. However, rather than proving our results using a black-box access to an existing fingerprinting code (e.g., Tardos' code), we develop a new fingerprinting lemma that is stronger than those of Dwork et al. (FOCS 2015) and Bun et al. (SODA 2017), and prove our lower bounds directly from the lemma. Our lemma, in particular, gives a simpler fingerprinting code construction with optimal rate (up to polylogarithmic factors) that is of independent interest.

algorithm, definition 1, theorem 1, (15 more...)

arXiv.org Artificial Intelligence

2307.07604

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

Narayanan, Shyam

arXiv.org Artificial IntelligenceJan-4-2024

We provide optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions with approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This result improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this problem was only able to establish this lower bound against pure differential privacy, or in the special case of $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples.

covariance estimation, differential privacy, estimation, (15 more...)

arXiv.org Artificial Intelligence

2310.06289

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind

Ma, Xiaomeng, Gao, Lingyu, Xu, Qihui

arXiv.org Artificial IntelligenceOct-23-2023

Theory of Mind (ToM), the capacity to comprehend the mental states of distinct individuals, is essential for numerous practical applications. With the development of large language models (LLMs), there is a heated debate about whether they are able to perform ToM tasks. Previous studies have used different tasks and prompts to test the ToM on LLMs and the results are inconsistent: some studies asserted these models are capable of exhibiting ToM, while others suggest the opposite. In this study, We present ToMChallenges, a dataset for comprehensively evaluating the Theory of Mind based on the Sally-Anne and Smarties tests with a diverse set of tasks. In addition, we also propose an auto-grader to streamline the answer evaluation process. We tested three models: davinci, turbo, and gpt-4. Our evaluation results and error analyses show that LLMs have inconsistent behaviors across prompts and tasks. Performing the ToM tasks robustly remains a challenge for the LLMs. In addition, our paper wants to raise awareness in evaluating the ToM in LLMs and we want to invite more discussion on how to design the prompts and tasks for ToM tasks that can better assess the LLMs' ability.

juanita, narrative, order belief, (15 more...)

arXiv.org Artificial Intelligence

2305.15068

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.68)
Education (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models

Shapira, Natalie, Levy, Mosh, Alavi, Seyed Hossein, Zhou, Xuhui, Choi, Yejin, Goldberg, Yoav, Sap, Maarten, Shwartz, Vered

arXiv.org Artificial IntelligenceMay-24-2023

The escalating debate on AI's capabilities warrants developing reliable metrics to assess machine "intelligence". Recently, many anecdotal examples were used to suggest that newer large language models (LLMs) like ChatGPT and GPT-4 exhibit Neural Theory-of-Mind (N-ToM); however, prior work reached conflicting conclusions regarding those abilities. We investigate the extent of LLMs' N-ToM through an extensive evaluation on 6 tasks and find that while LLMs exhibit certain N-ToM abilities, this behavior is far from being robust. We further examine the factors impacting performance on N-ToM tasks and discover that LLMs struggle with adversarial examples, indicating reliance on shallow heuristics rather than robust ToM abilities. We caution against drawing conclusions from anecdotal examples, limited benchmark testing, and using human-designed psychological tests to evaluate models.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2305.14763

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback