AITopics | characterizing

KScope: A Framework for Characterizing the Knowledge Status of Language Models

Neural Information Processing Systems | Jun-13-2026, 08:36:27 GMT

Characterizing a large language model's (LLM's) knowledge of a given question is challenging. As a result, prior work has primarily examined LLM behavior under knowledge conflicts, where the model's internal parametric memory contradicts information in the external context. However, this does not fully reflect how well the model knows the answer to the question. In this paper, we first introduce a taxonomy of five knowledge statuses based on the consistency and correctness of LLM knowledge modes. We then propose KScope, a hierarchical framework of statistical tests that progressively refines hypotheses about knowledge modes and characterizes LLM knowledge into one of these five statuses. We apply KScope to nine LLMs across four datasets and systematically establish: (1) Supporting context narrows knowledge gaps across models.

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Characterizing the Representational Capacity of Neural Processes

Young, Robin

arXiv.org Machine Learning | May-26-2026

What functions can Neural Processes represent? We analyze the representational capacity of popular NP architectures: Conditional Neural Processes (CNPs), Attentive Neural Processes (ANPs), Transformer Neural Processes (TNPs), and their latent variants. We prove these architectures form a strict hierarchy. CNP-representable functions are exactly those depending on finitely many expected features of the context distribution. ANPs strictly generalize CNPs via query-dependent reweighting, enabling kernel smoothers. ConvCNPs and ANPs are incomparable; each contains functions outside the other, separated by stationarity versus translation equivariance. TNPs with $L$ self-attention layers capture $L$-hop context interactions. For latent NPs, we show finite-dimensional latents provide coherent sampling but do not circumvent encoder limitations; matching GP posterior distributions requires latent dimension scaling with context size. These results provide a theoretical foundation for architecture selection based on task structure.
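As a rough illustration of the CNP versus ANP distinction drawn in the abstract above, the sketch below contrasts a permutation-invariant mean of context features (the kind of query-independent summary a CNP-style encoder computes) with a query-dependent reweighting of context targets. The reweighting shown is a Nadaraya-Watson kernel smoother with a Gaussian kernel, which is what softmax attention over negative squared distances computes. The feature map, the bandwidth, and all function names are illustrative assumptions, not the paper's constructions.

```python
import math

def cnp_summary(context, feature):
    """Mean of per-point features: one summary shared by every query."""
    feats = [feature(x, y) for x, y in context]
    dim = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]

def kernel_smoother(context, x_query, bandwidth=1.0):
    """Query-dependent reweighting: softmax over -(x - x_query)^2 / (2 h^2)."""
    scores = [-((x - x_query) ** 2) / (2 * bandwidth ** 2) for x, _ in context]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, context)) / total

# Hypothetical context set of (x, y) pairs.
context = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]

# The CNP-style summary does not depend on the query location.
summary = cnp_summary(context, lambda x, y: [x, y, x * y])

# The smoother's prediction changes with the query location.
near_zero = kernel_smoother(context, 0.0, bandwidth=0.25)
near_two = kernel_smoother(context, 2.0, bandwidth=0.25)
```

With a narrow bandwidth the smoother tracks the nearest context target, which a fixed mean summary cannot do without a decoder that conditions on the query.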

artificial intelligence, machine learning, representational capacity, (16 more...)

arXiv.org Machine Learning

2605.2421

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

Morisset, Lucas, Durmus, Alain, Hardy, Adrien

arXiv.org Machine Learning | May-12-2026

Data augmentation (DA) is now a standard ingredient in modern machine learning pipelines, with extensive empirical evidence reporting improvements in generalization across modalities and tasks (Mumuni and Mumuni, 2022; Wang et al., 2025). It is often used to encode task-relevant symmetries directly into the training procedure, for instance by encouraging invariance to image rotations or other transformations of the input (Shorten and Khoshgoftaar, 2019; Chen et al., 2020). It has also been identified as one of the most effective regularization techniques across both supervised learning settings (Bishop, 1995; Cubuk et al., 2019; Mumuni and Mumuni, 2022; Wang et al., 2025) and self-supervised/unsupervised learning (Feng et al., 2021; Van Assel et al., 2025). Domain-specific augmentation pipelines have been central to progress in computer vision (Shorten and Khoshgoftaar, 2019; Kumar et al., 2024), natural language processing (Feng et al., 2021; Shorten et al., 2021; Bayer et al., 2022), and time-series or audio applications (Wen et al., 2020; Iwana and Uchida, 2021; Iglesias et al., 2023). Despite these empirical successes, the benefits of DA remain highly task- and data-dependent, and augmentation schemes are often engineered in an ad hoc manner (Fawzi et al., 2016; Cubuk et al., 2019; Lim et al., 2019; Hataya et al., 2020). In contrast with this rich empirical literature, comprehensive theoretical analyses of DA remain relatively scarce. Two classical starting points are, first, the interpretation of additive Gaussian noise as a form of explicit (ridge-like) regularization (Bishop, 1995; Lin et al., 2024), and second, the idea that leveraging distributional invariances and group structure in the learning objective helps decrease the variance of the model without increasing its bias (Chen et al., 2020). Yet, when applied to modern and complex augmentation schemes, these works either provide only upper bounds on the generalization error (Lin et al., 2024), or require very strong assumptions on the data distribution (e.g.

machine learning, natural language, random feature regression, (18 more...)

arXiv.org Machine Learning

2605.1029

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)


Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

Neural Information Processing Systems | Dec-26-2025, 02:02:08 GMT

In this paper, we provide a unified analysis of temporal difference learning algorithms with linear function approximators by exploiting their connections to Markov jump linear systems (MJLS). We tailor the MJLS theory developed in the control community to characterize the exact behaviors of the first and second order moments of a large family of temporal difference learning algorithms. For both the IID and Markov noise cases, we show that the evolution of some augmented versions of the mean and covariance matrix of the TD estimation error exactly follows the trajectory of a deterministic linear time-invariant (LTI) dynamical system. Applying the well-known LTI system theory, we obtain closed-form expressions for the mean and covariance matrix of the TD estimation error at any time step. We provide a tight matrix spectral radius condition to guarantee the convergence of the covariance matrix of the TD estimation error, and perform a perturbation analysis to characterize the dependence of the TD behaviors on learning rate. For the IID case, we provide an exact formula characterizing how the mean and covariance matrix of the TD estimation error converge to the steady state values at a linear rate. For the Markov case, we use our formulas to explain how the behaviors of TD learning algorithms are affected by learning rate and the underlying Markov chain. For both cases, upper and lower bounds for the mean square TD error are provided. The mean square TD error is shown to converge linearly to an exact limit.
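The simplest instance of the claim above can be checked numerically. The sketch below assumes a scalar linear approximator and IID transition samples, a much narrower setting than the paper's, and it covers only the mean, not the covariance. In that setting the expected TD(0) estimation error obeys a linear time-invariant recursion, so it has the closed form (1 + alpha * A_bar)^k * (theta_0 - theta_star). The toy transitions, step size, and all names are hypothetical.

```python
from itertools import product

# Hypothetical toy problem: IID samples (probability, phi(s), reward, phi(s')).
TRANSITIONS = [
    (0.25, 1.0, 0.0, 1.0),
    (0.25, 1.0, 1.0, 2.0),
    (0.25, 2.0, 0.0, 1.0),
    (0.25, 2.0, 2.0, 2.0),
]
GAMMA, ALPHA = 0.5, 0.1

def td0_step(theta, phi_s, reward, phi_next):
    """One TD(0) update with the scalar approximator V(s) = phi(s) * theta."""
    td_error = reward + GAMMA * phi_next * theta - phi_s * theta
    return theta + ALPHA * td_error * phi_s

# Averaged update: E[theta_{k+1}] = E[theta_k] + ALPHA * (A_bar * E[theta_k] + b_bar).
A_bar = sum(p * phi * (GAMMA * phi_n - phi) for p, phi, _, phi_n in TRANSITIONS)
b_bar = sum(p * phi * r for p, phi, r, _ in TRANSITIONS)
theta_star = -b_bar / A_bar  # TD fixed point: A_bar * theta_star + b_bar = 0

def mean_error_closed_form(theta0, k):
    """LTI solution for the mean error after k steps."""
    return (1 + ALPHA * A_bar) ** k * (theta0 - theta_star)

def mean_error_brute_force(theta0, k):
    """Exact expectation over every length-k sequence of IID samples."""
    total = 0.0
    for seq in product(TRANSITIONS, repeat=k):
        prob, theta = 1.0, theta0
        for p, phi, r, phi_n in seq:
            prob *= p
            theta = td0_step(theta, phi, r, phi_n)
        total += prob * (theta - theta_star)
    return total

closed = mean_error_closed_form(0.0, 3)
brute = mean_error_brute_force(0.0, 3)
```

The two values agree up to floating-point error because each update is affine in theta with coefficients that are independent of the current iterate in the IID case.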

characterizing, covariance matrix, temporal difference learning algorithm, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games

Neural Information Processing Systems | Dec-24-2025, 08:57:24 GMT

In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent.
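A minimal sketch of what one such query returns may help. It assumes the convention that the row player maximizes x^T A y and the column player minimizes it; the abstract does not state a convention, and the function names are hypothetical. A single query at the mixed strategies (x, y) gives the row player the vector A y and the column player the vector x^T A, and those two vectors are enough to compute the duality gap of (x, y).

```python
def first_order_query(A, x, y):
    """Expected payoff of every pure action against the opponent's mixed strategy.

    The row player sees (A y)_i for each row i; the column player sees
    (x^T A)_j for each column j.
    """
    n_rows, n_cols = len(A), len(A[0])
    row_payoffs = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
    col_payoffs = [sum(A[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
    return row_payoffs, col_payoffs

def duality_gap(A, x, y):
    """Best-response gain of both players combined; zero at a Nash equilibrium."""
    row_payoffs, col_payoffs = first_order_query(A, x, y)
    return max(row_payoffs) - min(col_payoffs)

# Matching pennies: uniform play is the equilibrium, pure play is exploitable.
A = [[1.0, -1.0], [-1.0, 1.0]]
gap_uniform = duality_gap(A, [0.5, 0.5], [0.5, 0.5])
gap_pure = duality_gap(A, [1.0, 0.0], [1.0, 0.0])
```

An approximate Nash equilibrium in this sense is a pair (x, y) whose gap is below a chosen tolerance, and query complexity counts how many calls to `first_order_query` are needed to find one.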

characterizing, first-order query complexity, nash equilibria, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.43)


Characterizing the Ventral Visual Stream with Response-Optimized Neural Encoding Models

Neural Information Processing Systems | Dec-24-2025, 02:01:50 GMT

Decades of experimental research based on simple, abstract stimuli has revealed the coding principles of the ventral visual processing hierarchy, from the presence of edge detectors in the primary visual cortex to the selectivity for complex visual categories in the anterior ventral stream. However, these studies are, by construction, constrained by their $\textit{a priori}$ hypotheses. Furthermore, beyond the early stages, precise neuronal tuning properties and representational transformations along the ventral visual pathway remain poorly understood. In this work, we propose to employ response-optimized encoding models trained solely to predict the functional MRI activation, in order to gain insights into the tuning properties and representational transformations in the series of areas along the ventral visual pathway. We demonstrate the strong generalization abilities of these models on artificial stimuli and novel datasets. Intriguingly, we find that response-optimized models trained towards the ventral-occipital and lateral-occipital areas, but not early visual areas, can recapitulate complex visual behaviors like object categorization and perceived image-similarity in humans. We further probe the trained networks to reveal representational biases in different visual areas and generate experimentally testable hypotheses. Our analyses suggest a shape-based processing along the ventral visual stream and provide a unified picture of multiple neural phenomena characterized over the last decades with controlled fMRI studies.

characterizing, response-optimized neural encoding model, ventral visual stream, (5 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.59)

Industry: Health & Medicine (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)


Characterizing the Impacts of Semi-supervised Learning for Weak Supervision

Neural Information Processing Systems | Dec-23-2025, 21:37:55 GMT

Labeling training data is a critical and expensive step in producing high accuracy ML models, whether training from scratch or fine-tuning. To make labeling more efficient, two major approaches are programmatic weak supervision (WS) and semi-supervised learning (SSL). More recent works have either explicitly or implicitly used techniques at their intersection, but in various complex and ad hoc ways. In this work, we define a simple, modular design space to study the use of SSL techniques for WS more systematically. Surprisingly, we find that fairly simple methods from our design space match the performance of more complex state-of-the-art methods, averaging a 3 p.p. increase in accuracy/F1-score across 8 standard WS benchmarks. Further, we provide practical guidance on when different components are worth their added complexity and training costs. Contrary to current understanding, we find using SSL is not necessary to obtain the best performance on most WS benchmarks but is more effective when: (1) end models are smaller, and (2) WS provides labels for only a small portion of training examples.

characterizing, name change, semi-supervised learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)


Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective

Go, Seokjin, Park, Joongun, More, Spandan, Wu, Hanjiang, Wang, Irene, Jezghani, Aaron, Krishna, Tushar, Mahajan, Divya

arXiv.org Artificial Intelligence | Sep-22-2025

The rapid scaling of Large Language Models (LLMs) has pushed training workloads far beyond the limits of single-node analysis, demanding a deeper understanding of how these models behave across large-scale, multi-GPU systems. In this paper, we present a comprehensive characterization of LLM training across diverse real-world workloads and hardware platforms, including NVIDIA H100/H200 and AMD MI250 GPUs. We analyze dense and sparse models under various parallelism strategies -- tensor, pipeline, data, and expert -- and evaluate their effects on hardware utilization, power consumption, and thermal behavior. We further evaluate the effectiveness of optimizations such as activation recomputation and compute-communication overlap. Our findings show that performance is not determined solely by scaling hardware capacity. Scale-up systems with fewer, higher-memory GPUs can outperform scale-out systems in communication-bound regimes, but only under carefully tuned configurations; in other cases, scale-out deployments achieve superior throughput. We also show that certain parallelism combinations, such as tensor with pipeline, lead to bandwidth underutilization due to inefficient data chunking, while increasing microbatch sizes beyond a certain point induces bursty execution and peak power excursions that worsen thermal throttling. These insights reveal how training performance is shaped by complex interactions between hardware, system topology, and model execution. We conclude by offering recommendations for system and hardware design to improve the scalability and reliability of future LLM systems and workloads. The source code of this project is available at https://github.com/sitar-lab/CharLLM-PPT.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.10371

Country: North America > United States > Georgia > Fulton County > Atlanta (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Hardware (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Characterizing the Impacts of Semi-supervised Learning for Weak Supervision

Neural Information Processing Systems | May-26-2025, 15:52:13 GMT

artificial intelligence, machine learning, semi-supervised learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.65)


Characterizing the Investigative Methods of Fictional Detectives with Large Language Models

de Lima, Edirlei Soares, Casanova, Marco A., Feijó, Bruno, Furtado, Antonio L.

arXiv.org Artificial Intelligence | May-13-2025

Detective fiction, a genre defined by its complex narrative structures and character-driven storytelling, presents unique challenges for computational narratology, a research field focused on integrating literary theory into automated narrative generation. While traditional literary studies have offered deep insights into the methods and archetypes of fictional detectives, these analyses often focus on a limited number of characters and lack the scalability needed for the extraction of unique traits that can be used to guide narrative generation methods. In this paper, we present an AI-driven approach for systematically characterizing the investigative methods of fictional detectives. Our multi-phase workflow explores the capabilities of 15 Large Language Models (LLMs) to extract, synthesize, and validate distinctive investigative traits of fictional detectives. This approach was tested on a diverse set of seven iconic detectives - Hercule Poirot, Sherlock Holmes, William Murdoch, Columbo, Father Brown, Miss Marple, and Auguste Dupin - capturing the distinctive investigative styles that define each character. The identified traits were validated against existing literary analyses and further tested in a reverse identification phase, achieving an overall accuracy of 91.43%, demonstrating the method's effectiveness in capturing the distinctive investigative approaches of each detective. This work contributes to the broader field of computational narratology by providing a scalable framework for character analysis, with potential applications in AI-driven interactive storytelling and automated narrative generation.

detective, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.07601

Country:

Oceania > New Zealand (0.28)
Europe > United Kingdom (0.28)
South America > Brazil > Rio de Janeiro (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Law > Criminal Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
