 statistical reasoning


The Rarity Blind Spot: A Framework for Evaluating Statistical Reasoning in LLMs

arXiv.org Artificial Intelligence

Effective decision-making often relies on identifying what makes each candidate distinctive. While existing benchmarks for LLMs emphasize retrieving or summarizing information relevant to a given query, they do not evaluate a model's ability to identify globally distinctive features across a set of documents. We introduce Distinctive Feature Mining (DFM), a new task that challenges models to analyze a small-to-medium collection (10-40 documents) and surface features that are rare in the global context (e.g., appearing in less than 10% of documents). This setting mirrors real-world scenarios such as candidate selection or product differentiation, where statistical reasoning, not retrieval, is key. To enable systematic evaluation of this capability, we present DiFBench, a configurable benchmark creation framework with controllable parameters such as document set size and distinctiveness thresholds. Using DiFBench, we perform a large-scale assessment of distinctive feature mining across ten state-of-the-art LLMs. Our findings reveal a significant performance gap between general-purpose and reasoning-enhanced models. All models, however, substantially degrade as the task complexity and document count increase. We also find that a common failure mode is misidentifying frequent features as distinctive. These insights reveal core limitations in contemporary LLMs' abilities to perform fine-grained, statistical reasoning and rarity detection.
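The abstract defines rarity by document frequency: a feature is distinctive if it appears in only a small fraction of the collection. The sketch below computes that statistic directly. It is not DiFBench's own code, since the benchmark's generation and scoring procedures are not described here. It assumes each document has already been reduced to a set of features, and the helper name `distinctive_features` and the candidate data are hypothetical. The 10% threshold is the example value from the abstract.

```python
from collections import Counter


def distinctive_features(docs: dict[str, set[str]], threshold: float = 0.10) -> dict[str, list[str]]:
    """Return, for each document, its features whose document frequency is below `threshold`.

    Document frequency = (number of documents containing the feature) / (total documents).
    """
    n = len(docs)
    # Count each feature at most once per document (document frequency, not term frequency).
    df = Counter(f for feats in docs.values() for f in set(feats))
    rare = {f for f, count in df.items() if count / n < threshold}
    # Keep only the globally rare features present in each document.
    return {doc_id: sorted(set(feats) & rare) for doc_id, feats in docs.items()}


# Hypothetical candidate pool: 20 profiles, only one of which mentions "rust".
docs = {f"cand{i}": {"python", "sql"} for i in range(20)}
docs["cand0"] = {"python", "sql", "rust"}
result = distinctive_features(docs)
print(result["cand0"])  # → ['rust']  ("rust" appears in 1/20 = 5% of documents)
```

Features such as "python", which appear in every document, are excluded by construction. This is the frequent-versus-rare distinction that the abstract reports models often get wrong.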


ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts

arXiv.org Artificial Intelligence

Scientific fact-checking has mostly focused on text and tables, overlooking scientific charts, which are key for presenting quantitative evidence and statistical reasoning. We introduce ClimateViz, the first large-scale benchmark for scientific fact-checking using expert-curated scientific charts. ClimateViz contains 49,862 claims linked to 2,896 visualizations, each labeled as support, refute, or not enough information. To improve interpretability, each example includes structured knowledge graph explanations covering trends, comparisons, and causal relations. We evaluate state-of-the-art multimodal language models, including both proprietary and open-source systems, in zero-shot and few-shot settings. Results show that current models struggle with chart-based reasoning: even the best systems, such as Gemini 2.5 and InternVL 2.5, reach only 76.2 to 77.8 percent accuracy in label-only settings, far below human performance (89.3 and 92.7 percent). Explanation-augmented outputs improve performance in some models. We released our dataset and code alongside the paper.


GPT's Judgements Under Uncertainty

arXiv.org Artificial Intelligence

We investigate the presence of cognitive biases in three large language models (LLMs): GPT-4o, Gemma 2, and Llama 3.1. The study uses 1,500 experiments across nine established cognitive biases to evaluate the responses and consistency of the models. GPT-4o demonstrated the strongest overall performance. Gemma 2 showed strengths in addressing the sunk cost fallacy and prospect theory; however, its performance varied across different biases. Llama 3.1 consistently underperformed, relying on heuristics and exhibiting frequent inconsistencies and contradictions. The findings highlight the challenges of achieving robust and generalizable reasoning in LLMs and underscore the need for further development to mitigate biases in artificial general intelligence (AGI). The study emphasizes the importance of integrating statistical reasoning and ethical considerations in future AI development.

Cognitive biases and heuristics are well-established phenomena of the human mind, shaping how individuals process information, make judgments, and make decisions. These biases emerge from heuristics, mental shortcuts that simplify complex tasks by substituting them with cognitively easier alternatives [1]. While heuristics enable quick and efficient reasoning, they also introduce systematic errors that impact judgment and decision-making [2]-[4]. Understanding whether such biases, embedded in the data and interactions that shape Large Language Models (LLMs), are reflected in their outputs is critical not only for evaluating their alignment with human cognition but also for the development of Artificial General Intelligence (AGI). AGI, envisioned as systems capable of performing any intellectual task a human can, must navigate the intricacies of human-like reasoning while avoiding harmful or irresponsible biases.


Statistical Reasoning for Public Health 2: Regression Methods Coursera

@machinelearnbot

Structure: Good structure; the course went through all the basic principles of statistics in detail. I appreciated that it did not have to go through the methodology of each method, but instead taught us how to appreciate the methods and understand the data as presented in the literature. I liked how John went through examples from the literature, so it was good to see how the methods are used in practice. I wish there were a separate course teaching us how to use these methods with sample data; perhaps a taster of this would have been good to include, though I do understand that would be challenging for some. I think some in-video questions would have been good for checking on learning progress.


Statistical Reasoning for Public Health 2: Regression Methods Coursera

#artificialintelligence

This module, along with module 2B, introduces two key concepts in statistics and epidemiology: confounding and effect modification. A relation between an outcome and an exposure of interest can be confounded if another variable (or variables) is associated with both the outcome and the exposure. In such cases, the crude outcome/exposure association may over- or underestimate the association of interest. Confounding is an ever-present threat in non-randomized studies, but results of interest can be adjusted for potential confounders.
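The module text does not name a specific adjustment method. One standard stratified approach is the Mantel-Haenszel pooled odds ratio, used in the sketch below. The counts are made up: the confounder (say, age group) is associated with both exposure and outcome, while the exposure has no effect within either stratum. The function names are illustrative, not from the course.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio of a 2x2 table.

    a, b = exposed cases / exposed non-cases; c, d = unexposed cases / unexposed non-cases.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: list[tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel odds ratio pooled across strata of the confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (a, b, c, d), stratified by the confounder.
strata = [
    (10, 90, 40, 360),   # younger: exposure uncommon (20%), outcome uncommon (10%)
    (160, 240, 40, 60),  # older: exposure common (80%), outcome common (40%)
]
# Within each stratum the odds ratio is exactly 1 (no association).
crude = [sum(col) for col in zip(*strata)]  # collapse strata into one table: (170, 330, 80, 420)
print(round(odds_ratio(*crude), 2))         # crude OR → 2.7
print(round(mantel_haenszel_or(strata), 2)) # adjusted OR → 1.0
```

In this constructed example, the crude odds ratio overstates the association. Stratifying on the confounder removes that distortion.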


Intelligent Things It's all about machine learning

#artificialintelligence

Evolving from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores software algorithms that can learn from and make predictions on volumes of data. Simply stated, machine learning helps humans make data-driven decisions. Machine learning offers practical solutions that can maximize resource utilization, prolong the lifespan of IoT sensors, platforms, and networks, and enable dynamic services architectures. Our connected world is increasingly dependent on big data: data at rest today and, in years to come, streaming fast data in motion. With real-time predictive models, once a streaming fast data point has been observed, it might never be seen again.
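A data point that may never be seen again is the setting of online learning: the model makes a prediction, updates on the true value, and discards the point. The sketch below illustrates this and is not taken from the article. The hypothetical `OnlineLinearRegressor` fits a one-feature linear model by stochastic gradient descent. The stream is synthetic and noiseless, drawn from y = 2x + 1, and the learning rate of 0.1 is an assumption.

```python
from typing import Iterator, Tuple


class OnlineLinearRegressor:
    """One-feature linear model y ≈ w*x + b, updated by stochastic gradient descent.

    Each observation drives a single update and is then dropped, so memory use
    stays constant no matter how long the stream runs.
    """

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # Gradient step on the squared error 0.5 * (prediction - y) ** 2.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


def stream(n: int) -> Iterator[Tuple[float, float]]:
    """Hypothetical noiseless stream from y = 2x + 1, with x cycling through [0, 1)."""
    for i in range(n):
        x = (i % 100) / 100
        yield x, 2 * x + 1


model = OnlineLinearRegressor()
for x, y in stream(5000):
    prediction = model.predict(x)  # served before the true value arrives
    model.update(x, y)             # learn from the point, then let it go
print(round(model.w, 3), round(model.b, 3))
```

After one pass over the stream, the parameters should approach w = 2 and b = 1, even though no point was stored.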


Intelligent Things It's all about machine learning

#artificialintelligence

Machine learning is increasingly being employed as a tool to help companies collect billions of data points, boil them down to what is actually meaningful, and predict what is likely to happen in the future.


Artificial Intelligence: Logic Reasoning v. Statistical Reasoning - DATAVERSITY

#artificialintelligence

Mitch De Felice recently wrote in CIO.com, "As a technology decision maker, all the vocabulary of artificial intelligence might be a bit overwhelming. In Figure 1 [to the left], starting from the bottom going up illustrates knowledge acquisition capabilities from a data usage perspective. By no means does this represent all the approaches to achieving an AI solution, but rather it illustrates how big data fits into the AI picture. Machine learning is represented by the right side of the above diagram, labeled, 'Statistical Reasoning.' There are two types of machine learning, unsupervised and supervised. When big data vendors speak of machine learning, they are usually speaking of supervised machine learning that has existed since the 1950s."
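The quoted passage separates supervised from unsupervised machine learning. The toy contrast below is not from the article: the data set and function names are hypothetical. A nearest-centroid classifier (supervised) is given labels and learns one centroid per class. A two-cluster k-means (unsupervised) sees the same points without labels and has to discover the groups itself.

```python
from statistics import mean

# Hypothetical one-dimensional data set.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]


def fit_nearest_centroid(xs: list[float], ys: list[str]) -> dict[str, float]:
    """Supervised: labels are given, so each class centroid is simply a mean."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict(centroids: dict[str, float], x: float) -> str:
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(xs: list[float], iters: int = 10) -> list[float]:
    """Unsupervised: k-means with k = 2 finds groups without any labels."""
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    for _ in range(iters):
        clusters: list[list[float]] = [[], []]
        for x in xs:
            nearest = min(range(2), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


centroids = fit_nearest_centroid(points, labels)
print(predict(centroids, 4.5))  # → high
print(two_means(points))        # two cluster centers, found without labels
```

Both approaches end up with similar centers on this data. The difference is that only the supervised model can attach the names "low" and "high" to them, because only it was given labeled examples.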