 chi-squared test


On the Analogy between Human Brain and LLMs: Spotting Key Neurons in Grammar Perception

arXiv.org Artificial Intelligence

Artificial Neural Networks, the building blocks of AI, were inspired by the human brain's network of neurons. Over the years, these networks have evolved to replicate the complex capabilities of the brain, allowing them to handle tasks such as image and language processing. In the realm of Large Language Models, there has been a keen interest in making the language learning process more akin to that of humans. While neuroscientific research has shown that different grammatical categories are processed by different neurons in the brain, we show that LLMs operate in a similar way. Utilizing Llama 3, we identify the most important neurons associated with the prediction of words belonging to different part-of-speech tags. Using the achieved knowledge, we train a classifier on a dataset, which shows that the activation patterns of these key neurons can reliably predict part-of-speech tags on fresh data. The results suggest the presence of a subspace in LLMs focused on capturing part-of-speech tag concepts, resembling patterns observed in lesion studies of the brain in neuroscience.


Downsized and Compromised?: Assessing the Faithfulness of Model Compression

arXiv.org Artificial Intelligence

In real-world applications, computational constraints often require transforming large models into smaller, more efficient versions through model compression. While these techniques aim to reduce size and computational cost without sacrificing performance, their evaluations have traditionally focused on the trade-off between size and accuracy, overlooking the aspect of model faithfulness. This limited view is insufficient for high-stakes domains like healthcare, finance, and criminal justice, where compressed models must remain faithful to the behavior of their original counterparts. This paper presents a novel approach to evaluating faithfulness in compressed models, moving beyond standard metrics. We introduce and demonstrate a set of faithfulness metrics that capture how model behavior changes post-compression. Our contributions include introducing techniques to assess predictive consistency between the original and compressed models using model agreement, and applying chi-squared tests to detect statistically significant changes in predictive patterns across both the overall dataset and demographic subgroups, thereby exposing shifts that aggregate fairness metrics may obscure. We demonstrate our approaches by applying quantization and pruning to artificial neural networks (ANNs) trained on three diverse and socially meaningful datasets. Our findings show that high accuracy does not guarantee faithfulness, and our statistical tests detect subtle yet significant shifts that are missed by standard metrics, such as Accuracy and Equalized Odds. The proposed metrics provide a practical and more direct method for ensuring that efficiency gains through compression do not compromise the fairness or faithfulness essential for trustworthy AI.
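The two ideas named in this abstract, model agreement and a chi-squared test on predictive patterns, can be sketched roughly as follows. This is a minimal illustration and not the paper's procedure: the function names, the toy predictions, and the choice of a plain Pearson test on a 2x2 table (model by predicted label, no continuity correction) are all assumptions. The same functions could be applied to one demographic subgroup at a time by filtering the prediction lists first.

```python
import math

def agreement_rate(orig_preds, comp_preds):
    """Fraction of samples on which both models predict the same label."""
    matches = sum(o == c for o, c in zip(orig_preds, comp_preds))
    return matches / len(orig_preds)

def chi2_prediction_shift(orig_preds, comp_preds):
    """Pearson chi-squared test on a 2x2 table: rows = model, columns = predicted label (0/1).

    Assumes binary labels and that both labels occur at least once overall,
    so that no expected count is zero.
    """
    table = [
        [orig_preds.count(0), orig_preds.count(1)],
        [comp_preds.count(0), comp_preds.count(1)],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical predictions on 100 samples (illustrative values only).
original_preds = [0] * 60 + [1] * 40
compressed_preds = [0] * 75 + [1] * 25

agreement = agreement_rate(original_preds, compressed_preds)
stat, p_value = chi2_prediction_shift(original_preds, compressed_preds)
print(f"agreement={agreement:.2f}, chi2={stat:.3f}, p={p_value:.4f}")
```

In this toy case the two models agree on 85% of samples, yet the distribution of predicted labels still shifts enough for the test to flag it at the 0.05 level. Because both sets of predictions come from the same samples, a paired test may be more appropriate in practice.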


Scaling Truth: The Confidence Paradox in AI Fact-Checking

arXiv.org Artificial Intelligence

The rise of misinformation underscores the need for scalable and reliable fact-checking solutions. Large language models (LLMs) hold promise in automating fact verification, yet their effectiveness across global contexts remains uncertain. We systematically evaluate nine established LLMs across multiple categories (open/closed-source, multiple sizes, diverse architectures, reasoning-based) using 5,000 claims previously assessed by 174 professional fact-checking organizations across 47 languages. Our methodology tests model generalizability on claims postdating training cutoffs and four prompting strategies mirroring both citizen and professional fact-checker interactions, with over 240,000 human annotations as ground truth. Findings reveal a concerning pattern resembling the Dunning-Kruger effect: smaller, accessible models show high confidence despite lower accuracy, while larger models demonstrate higher accuracy but lower confidence. This risks systemic bias in information verification, as resource-constrained organizations typically use smaller models. Performance gaps are most pronounced for non-English languages and claims originating from the Global South, threatening to widen existing information inequalities. These results establish a multilingual benchmark for future research and provide an evidence base for policy aimed at ensuring equitable access to trustworthy, AI-assisted fact-checking.


ggplot2 Based Plots with Statistical Details • ggstatsplot

#artificialintelligence

An extension of ggplot2, ggstatsplot creates graphics with details from statistical tests included in the plots themselves. It provides an easier syntax to generate information-rich plots for statistical analysis of continuous (violin plots, scatterplots, histograms, dot plots, dot-and-whisker plots) or categorical (pie and bar charts) data. Currently, it supports the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses. References: Patil (2021).


Race Bias Analysis of Bona Fide Errors in face anti-spoofing

arXiv.org Artificial Intelligence

Face recognition is the method of choice behind some of the most widely deployed biometric authentication systems, currently supporting a range of applications, from passport control at airports to mobile phone or laptop login. A key weakness of the technology, preventing it from being employed in security-sensitive applications in uncontrolled environments, such as ATMs for money withdrawal, is its vulnerability to presentation attacks, where imposters attempt to gain wrongful access by presenting a photo or a video in front of the system's camera, or by wearing a mask resembling a registered person. As a solution to this problem, algorithms for presentation attack detection (PAD) are developed, that is, binary classifiers trained to distinguish between the bona fide samples coming from live subjects and those coming from imposters. The large variety in the types of possible presentation attacks, and the large variation in the environmental conditions under which they might take place, make PAD a particularly challenging problem. However, the current state of the art, utilising the power of deep learning, comprises classifiers with excellent accuracy rates and a satisfactory generalisation power to at least a limited number of previously unseen attacks. Cross-database generalisation is still problematic; however, it is debatable whether this is a real obstacle to the deployment of PAD algorithms in practical applications, since such algorithms are usually embedded in specific face recognition systems, with given camera specifications and configurations. Here, we deal with the problem of race bias in face anti-spoofing algorithms. It is a topic that has attracted considerably less research interest than accuracy and generalisation power, despite the fact that it raises ethical, legal, and regulatory considerations, which, on their own, can prevent adoption in specific applications. Addressing this gap, the aim of this paper is to provide a framework for studying the question: Does the classifier work equally well on people from all races?


Feature selection: A comprehensive list of strategies

#artificialintelligence

Of course, the simplest strategy is to use your intuition. Sometimes it's obvious that some columns will not be used in any form in the final model (columns such as "ID", "FirstName", "LastName", etc.). If you know that a particular column will not be used, feel free to drop it upfront. In our data, none of the columns stand out as such, so I'm not removing any in this step. Many machine learning algorithms cannot handle missing values, so people apply different strategies to clean up missing data (e.g., imputation).


Analysis on the Biodiversity in National Parks Projects

#artificialintelligence

Originally published on Towards AI. In this blog, we are going to perform an analysis of the data set "Biodiversity in National Parks Projects", which is available on Kaggle.


Mistakes in Applying Univariate Feature Selection Methods

#artificialintelligence

In Python's scikit-learn library, there are various univariate feature selection methods such as Regression F-score, ANOVA and Chi-squared. Perhaps due to the ease of applying these methods (sometimes with just a single line of code), it might be tempting to use them without taking into consideration the type of features you have. I have seen some machine learning practitioners take this for granted and make this mistake (myself included). While the scikit-learn documentation is clear on which feature selection method should be used for regression and classification, it does not specify whether these methods are suitable to apply to both continuous and categorical features. Let's say you have a classification task and after reading the documentation, you know you should use either the Chi-squared test or ANOVA.
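To make the feature-type concern concrete, here is a small sketch using scikit-learn's `SelectKBest` with `chi2` and `f_classif`. The synthetic data and variable names are assumptions made for illustration. It relies on the common convention of applying `chi2` to non-negative features such as booleans or counts and `f_classif` (ANOVA F-value) to continuous features. It also shows that `chi2` refuses input containing negative values.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, f_classif

rng = np.random.default_rng(0)
n_samples = 200
y = rng.integers(0, 2, size=n_samples)  # binary class labels

# Binary (non-negative) features: one tracks the label 80% of the time, one is noise.
informative_flag = np.where(rng.random(n_samples) < 0.8, y, 1 - y)
noise_flag = rng.integers(0, 2, size=n_samples)
X_binary = np.column_stack([informative_flag, noise_flag])

# Continuous features: one shifted by the label, one pure noise.
informative_num = y + rng.normal(0.0, 1.0, size=n_samples)
noise_num = rng.normal(0.0, 1.0, size=n_samples)
X_continuous = np.column_stack([informative_num, noise_num])

# Chi-squared scoring for the binary features, ANOVA F-score for the continuous ones.
binary_selector = SelectKBest(score_func=chi2, k=1).fit(X_binary, y)
continuous_selector = SelectKBest(score_func=f_classif, k=1).fit(X_continuous, y)

print("chi2 keeps columns:", binary_selector.get_support())
print("f_classif keeps columns:", continuous_selector.get_support())

# chi2 expects non-negative input, so raw continuous data with negative values is rejected.
try:
    chi2(X_continuous, y)
    chi2_rejected_negative = False
except ValueError:
    chi2_rejected_negative = True
print("chi2 rejected negative values:", chi2_rejected_negative)
```

Splitting the columns by type before scoring, as done here, is one simple way to avoid applying a single scoring function to every feature regardless of its type.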


Using the Chi-Squared test for feature selection with implementation

#artificialintelligence

Let's approach the problem of feature selection using Chi-Square in a question-and-answer style. If you prefer video, you may check out our YouTube lecture on the same topic. Question 1: What is a feature? For any ML or DL problem, the data is arranged in rows and columns. Let's take the example of the Titanic shipwreck problem. Question 2: What are the different types of features?
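As a rough illustration of chi-squared scoring for categorical features on a Titanic-style problem, the sketch below computes the Pearson chi-squared statistic between each feature and the target and ranks the features by it. The ten rows, the column names, and the helper `chi2_statistic` are hypothetical and chosen only to keep the arithmetic easy to follow. With counts this small the chi-squared approximation is unreliable, so treat the numbers as a demonstration of the mechanics only.

```python
from collections import Counter

def chi2_statistic(feature, target):
    """Pearson chi-squared statistic and degrees of freedom for two categorical columns."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # cell counts of the contingency table
    feature_totals = Counter(feature)         # row totals
    target_totals = Counter(target)           # column totals
    stat = 0.0
    for f_val, f_count in feature_totals.items():
        for t_val, t_count in target_totals.items():
            expected = f_count * t_count / n  # count expected under independence
            stat += (observed[(f_val, t_val)] - expected) ** 2 / expected
    dof = (len(feature_totals) - 1) * (len(target_totals) - 1)
    return stat, dof

# Hypothetical toy rows (not the real Titanic data).
survived = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1]
features = {
    "sex": ["f", "f", "f", "f", "f", "m", "m", "m", "m", "m"],
    "embarked": ["S", "C", "S", "C", "S", "C", "S", "C", "S", "C"],
}

scores = {name: chi2_statistic(column, survived) for name, column in features.items()}
ranked = sorted(scores, key=lambda name: scores[name][0], reverse=True)

for name in ranked:
    stat, dof = scores[name]
    print(f"{name}: chi2={stat:.2f}, dof={dof}")
```

A larger statistic means the feature's distribution differs more across target classes than independence would predict, which is why features are usually ranked in descending order of the score.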


Graph-Based Intrusion Detection System for Controller Area Networks

arXiv.org Artificial Intelligence

The controller area network (CAN) is the most widely used intra-vehicular communication network in the automotive industry. Because of its simplicity in design, it lacks most of the requirements needed for a security-proven communication protocol. However, a safe and secure environment is imperative for autonomous as well as connected vehicles. Therefore, CAN security is considered one of the important topics in the automotive research community. In this paper, we propose a four-stage intrusion detection system that uses the chi-squared method and can detect any kind of strong and weak cyber attacks in a CAN. This work is the first-ever graph-based defense system proposed for the CAN. Our experimental results show that we have a very low 5.26% misclassification for denial of service (DoS) attack, 10% misclassification for fuzzy attack, 4.76% misclassification for replay attack, and no misclassification for spoofing attack. In addition, the proposed methodology exhibits up to 13.73% better accuracy compared to existing ID sequence-based methods.