AITopics | Country

Apple to pay $250m to iPhone buyers over AI features lawsuit

BBC News, May-6-2026, 00:15:14 GMT

Apple has agreed to pay some iPhone buyers a collective $250m (£184m) to end a lawsuit accusing the company of misleading people about new artificial intelligence (AI) features and capabilities. In a settlement filed Tuesday in California federal court, Apple did not admit any wrongdoing, but agreed to a deal that will resolve claims in a large consolidated class action lawsuit filed last year. The suit accused Apple of false advertising around its AI features on the iPhone, which the company called Apple Intelligence, including an enhancement of its Siri voice assistant. Apple will pay between $25 and $95 to people who bought an iPhone 15 or iPhone 16 between June 2024 and March 2025. An Apple spokeswoman said the lawsuit was focused on the availability of two additional features in a lineup of many released as part of its Apple Intelligence rollout.

artificial intelligence, chatbot, natural language, (13 more...)

BBC News

Country:

Europe > United Kingdom (0.52)
North America > United States > California (0.25)

Industry: Law > Litigation (1.00)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)

SOC-ICNN: From Polyhedral to Conic Geometry for Learning Convex Surrogate Functions

Liu, Kang, Hu, Jianchen, Peng, Wei

arXiv.org Machine Learning, May-6-2026

Classical ReLU-based Input Convex Neural Networks (ICNNs) are equivalent to the optimal value functions of Linear Programming (LP). This intrinsic structural equivalence restricts their representational capacity to piecewise-linear polyhedral functions. To overcome this representational bottleneck, we propose the SOC-ICNN, an architecture that generalizes the underlying optimization class from LP to Second-Order Cone Programming (SOCP). By explicitly injecting positive semi-definite curvature and Euclidean norm-based conic primitives, our formulation introduces native smooth curvature into the representation while preserving a rigorous optimization-theoretic interpretation. We formally prove that SOC-ICNNs strictly expand the representational space of ReLU-ICNNs without increasing the asymptotic order of forward-pass complexity. Extensive experiments demonstrate that SOC-ICNN substantially improves function approximation, while delivering competitive downstream decision quality. The code is available at https://anonymous.4open.science/r/SOC-ICNN-4B18/.
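
To make the contrast between polyhedral and conic primitives concrete, here is a minimal sketch (not the authors' SOC-ICNN architecture) of a scalar function that adds a second-order-cone term, the Euclidean norm of an affine map, and a PSD quadratic term to a one-layer ReLU ICNN. The parameter layout and the helper names (`soc_icnn_sketch`, `random_params`, `midpoint_convexity_holds`) are hypothetical. Each part is convex in x, so the sum is convex, which the final check spot-tests at random midpoints.

```python
import math
import random


def affine(W, b, x):
    """Return W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def soc_icnn_sketch(x, params):
    """Convex scalar function of x = ReLU part + conic part + quadratic part.

    Hypothetical illustration of the idea in the abstract, not the SOC-ICNN model.
    """
    W0, b0, w_out, A, c, L = params
    # Polyhedral part: ReLU of an affine map is convex, and a nonnegative
    # combination (weights clipped at 0) keeps it convex.
    hidden = [max(0.0, h) for h in affine(W0, b0, x)]
    relu_part = sum(max(w, 0.0) * h for w, h in zip(w_out, hidden))
    # Conic part: ||A x + c||_2, the basic second-order-cone primitive.
    cone_part = math.sqrt(sum(t * t for t in affine(A, c, x)))
    # Curvature part: x^T (L L^T) x = ||L^T x||^2, PSD by construction.
    lt_x = [sum(L[i][j] * x[i] for i in range(len(x))) for j in range(len(L[0]))]
    quad_part = sum(t * t for t in lt_x)
    return relu_part + cone_part + quad_part


def random_params(d, hidden, cone_rows, rank, seed=0):
    """Random parameters of the right shapes (toy values, not trained)."""
    rng = random.Random(seed)

    def g():
        return rng.gauss(0.0, 1.0)

    W0 = [[g() for _ in range(d)] for _ in range(hidden)]
    b0 = [g() for _ in range(hidden)]
    w_out = [abs(g()) for _ in range(hidden)]
    A = [[g() for _ in range(d)] for _ in range(cone_rows)]
    c = [g() for _ in range(cone_rows)]
    L = [[0.3 * g() for _ in range(rank)] for _ in range(d)]
    return W0, b0, w_out, A, c, L


def midpoint_convexity_holds(params, d, trials=1000, seed=1):
    """Check f((x + y) / 2) <= (f(x) + f(y)) / 2 on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(d)]
        y = [rng.uniform(-2.0, 2.0) for _ in range(d)]
        mid = [(a + b) / 2 for a, b in zip(x, y)]
        lhs = soc_icnn_sketch(mid, params)
        rhs = (soc_icnn_sketch(x, params) + soc_icnn_sketch(y, params)) / 2
        if lhs > rhs + 1e-9:
            return False
    return True


params = random_params(d=3, hidden=8, cone_rows=2, rank=2)
print(midpoint_convexity_holds(params, d=3))  # expected: True
```

Keeping the ReLU weights nonnegative is the standard ICNN device. The norm and quadratic terms are convex for any parameter values, which is why they can be added without extra constraints.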

artificial intelligence, machine learning, optimization problem, (20 more...)

arXiv.org Machine Learning

2604.22355

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Adaptive Confidence Intervals in Efron's Gaussian Two-Groups Model

Wang, Qiaosen, Chai, Shuwen, Gao, Chao

arXiv.org Machine Learning, May-6-2026

Robust uncertainty quantification is increasingly important in modern data analysis and is often formalized under Huber's model, which allows an $\varepsilon$-fraction of arbitrary corruptions. In many experimental sciences, however, the measurement protocol is well controlled, and contamination is more plausibly introduced upstream. Motivated by this noise-oblivious nature of adversaries, we study confidence intervals for the null location parameter $\theta$ in Efron's Gaussian two-groups model, where an unknown fraction $\varepsilon$ of observations have arbitrarily shifted means, but all samples share the same law of additive Gaussian measurement noise with variance $\sigma^2$. We characterize the minimax-optimal length among confidence intervals with a prescribed coverage level uniformly over the unknown contamination proportion and all noise-oblivious adversaries. Although prior work has shown that the minimax point estimation rate of $\theta$ does not deteriorate when $\varepsilon$ becomes unknown, our results reveal that, with a given $\sigma^2$, the minimax-optimal length of confidence intervals that are adaptive to unknown $\varepsilon$ is of order $\sigma(n^{-1/4}+\varepsilon^{1/2}/\max\{1, \log(en \varepsilon^2)\}^{1/2})$, which is polynomially worse than the optimal length when $\varepsilon$ is known. When the variance $\sigma^2$ is also unknown, we show a further degradation: no adaptive confidence interval can be shorter than $\Omega(\sigma n^{-1/8})$. Algorithmically, we introduce a Fourier-based certification procedure built on Carathéodory's positive-semidefiniteness constraints. By scanning candidate points and accepting those whose residual characteristic function is certifiably consistent with a Gaussian location mixture, our algorithm attains the minimax lower bound in the known-variance setting and is computable in polynomial time.
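
The certification idea rests on a classical fact. If $X = \mu + \sigma Z$ with $Z \sim N(0,1)$ independent of $\mu$, multiplying the characteristic function of $X$ by $e^{\sigma^2 t^2/2}$ recovers the characteristic function of the mixing distribution of $\mu$. By the Carathéodory-Toeplitz theorem, its values on a grid $t_k = kh$ must then form a positive-semidefinite Hermitian Toeplitz matrix. The sketch below checks only this population-level condition. It is not the paper's scanning procedure, it uses exact rather than empirical characteristic functions, and the grid spacing `h`, grid size `m`, tolerance, and helper names are arbitrary, hypothetical choices.

```python
import cmath
import math


def deconvolved_cf(cf_x, sigma, h, m):
    """c_k = cf_x(k h) * exp(sigma^2 (k h)^2 / 2) for k = 0..m.

    If X = mu + sigma * Z, this equals E[exp(i k h mu)], the characteristic
    function of the mixing distribution of mu on the grid.
    """
    return [cf_x(k * h) * math.exp(0.5 * (sigma * k * h) ** 2) for k in range(m + 1)]


def hermitian_toeplitz(c):
    """T[j][k] = c_{k-j}, using c_{-k} = conj(c_k)."""
    n = len(c)
    return [[c[k - j] if k >= j else c[j - k].conjugate() for k in range(n)]
            for j in range(n)]


def is_psd(A, tol=1e-9):
    """PSD test for a Hermitian matrix via symmetric elimination (Schur complements)."""
    A = [list(row) for row in A]
    n = len(A)
    for k in range(n):
        d = A[k][k].real
        if d < -tol:
            return False
        if d <= tol:
            # In a PSD matrix, |A[k][j]|^2 <= A[k][k] * A[j][j], so a zero pivot
            # forces the rest of its row to vanish.
            if any(abs(A[k][j]) > math.sqrt(tol * max(A[j][j].real, 0.0)) + tol
                   for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = A[i][k] / d
            for j in range(k + 1, n):
                A[i][j] -= factor * A[k][j]
    return True


def certify(cf_x, sigma, h=0.5, m=6):
    """True if the deconvolved grid values pass the Toeplitz PSD test."""
    return is_psd(hermitian_toeplitz(deconvolved_cf(cf_x, sigma, h, m)))


def two_groups_cf(theta, eps, shift, sigma):
    """Characteristic function of (1 - eps) N(theta, s^2) + eps N(theta + shift, s^2)."""
    def cf(t):
        mix = (1 - eps) * cmath.exp(1j * t * theta) + eps * cmath.exp(1j * t * (theta + shift))
        return mix * math.exp(-0.5 * (sigma * t) ** 2)
    return cf


print(certify(two_groups_cf(0.0, 0.2, 3.0, 1.0), sigma=1.0))  # True: genuine Gaussian location mixture
print(certify(two_groups_cf(0.0, 0.0, 0.0, 1.0), sigma=2.0))  # False: assumed noise variance too large
```

The second call fails because deconvolving with too large a $\sigma$ makes $|c_1| > c_0$, which already violates the $2 \times 2$ principal minor.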

artificial intelligence, characteristic function, confidence interval, (17 more...)

arXiv.org Machine Learning

2604.26992

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.45)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Analysis and Explainability of LLMs Via Evolutionary Methods

Gallagher, Shannon K., Rallapalli, Swati, Brooks, Tyler, Loughin, Chuck, Sezgin, Michele, Yurko, Ronald

arXiv.org Machine Learning, May-6-2026

Evolutionary methods have long been useful for analysis and explanation in genetics, biology, ecology, and related fields. In this work, we extend these methods to neural networks, specifically large language models (LLMs), to better analyze and explain relationships among models. We show how relating weights to genotypes and output text to phenotypes can improve our understanding of model lineage, important datasets, the roles of different model layers, and visualization of model relationships. We demonstrate this in a controlled experiment, where our estimated evolutionary trees reliably recover the topology of the ground-truth training tree. We further identify the most important weight layers according to weight differences and show through phenotypic experiments that one training dataset appears to contribute more useful information than the others. Finally, we generate an unsupervised evolutionary tree of black-box foundation models. Throughout, we provide visualizations that support a clearer understanding of evolutionary relationships among LLMs.
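
One simple way to turn weight differences into a tree, shown here as a hedged sketch rather than the authors' pipeline, is to treat flattened weights as the genotype, compute pairwise Euclidean distances, and build an average-linkage (UPGMA) tree. Per-layer distances then indicate which layers changed most. The dictionary-of-layers weight format, the toy models, and the helper names are hypothetical, and the paper may use different distances or tree-building methods.

```python
import math
from itertools import combinations


def layer_distances(a, b):
    """Per-layer Euclidean distance between two models given as {layer: flat weights}."""
    return {name: math.dist(a[name], b[name]) for name in a}


def model_distance(a, b):
    """Distance over all weights: sqrt of the summed squared per-layer distances."""
    return math.sqrt(sum(d * d for d in layer_distances(a, b).values()))


def upgma(models):
    """Average-linkage tree over {name: weights}; returns nested tuples of names."""
    clusters = {name: (name, 1) for name in models}  # label -> (subtree, size)
    dist = {pair: model_distance(models[pair[0]], models[pair[1]])
            for pair in combinations(sorted(models), 2)}
    merges = 0
    while len(clusters) > 1:
        a, b = min(dist, key=lambda p: (dist[p], p))  # closest pair, ties broken by name
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        del dist[(a, b)]
        new = f"_merged{merges}"
        merges += 1
        for other in clusters:
            d_a = dist.pop(tuple(sorted((a, other))))
            d_b = dist.pop(tuple(sorted((b, other))))
            # Size-weighted average of the two merged clusters' distances.
            dist[tuple(sorted((new, other)))] = (n_a * d_a + n_b * d_b) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]


# Toy lineage: ft_a2 is fine-tuned from ft_a, and both ft_a and ft_b come from base.
models = {
    "base": {"emb": [0.0, 0.0, 0.0], "head": [0.0, 0.0]},
    "ft_a": {"emb": [1.0, 0.0, 0.0], "head": [0.0, 0.0]},
    "ft_a2": {"emb": [1.0, 0.2, 0.0], "head": [0.0, 0.0]},
    "ft_b": {"emb": [0.0, 0.0, 0.0], "head": [0.0, 3.0]},
}
print(upgma(models))  # ((('ft_a', 'ft_a2'), 'base'), 'ft_b')
print(layer_distances(models["base"], models["ft_b"]))  # {'emb': 0.0, 'head': 3.0}
```

UPGMA returns an unrooted-in-spirit clustering rather than a true ancestor-descendant tree, so in this toy the recovered topology groups the close relatives correctly but does not by itself identify `base` as the root.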

large language model, machine learning, public release and unlimited distribution, (19 more...)

arXiv.org Machine Learning

2605.0293

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.55)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Information Theory and Statistical Learning

Gamal, Abbas El

arXiv.org Machine Learning, May-6-2026

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu. Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
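
Several of the divergences mentioned share one form. For discrete distributions with every $q_i > 0$, $D_f(P\|Q) = \sum_i q_i f(p_i/q_i)$ for a convex $f$ with $f(1) = 0$. Choosing $f(t) = t\log t$ gives the KL divergence, and $f(t) = |t-1|/2$ gives total variation. The following sketch is our own illustration, not material from the chapter, and checks both identities numerically on a made-up pair of distributions.

```python
import math


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl(p, q):
    """KL(P || Q) = sum_i p_i log(p_i / q_i), using the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_generator(t):
    """f(t) = t log t, with f(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t):
    """f(t) = |t - 1| / 2 yields total variation distance."""
    return abs(t - 1) / 2


p = [0.5, 0.5]
q = [0.25, 0.75]
print(f_divergence(p, q, kl_generator), kl(p, q))  # the two values agree
print(f_divergence(p, q, tv_generator))  # mathematically sum |p_i - q_i| / 2 = 0.25
```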

artificial intelligence, fdata, machine learning, (17 more...)

arXiv.org Machine Learning

2605.02989

Country:

Europe (0.46)
North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Bayesian inference with sources of uncertainty: from confidence modelling to sparse estimation

Rosa, Rafael Mouallem, Arbel, Julyan, Nguyen, Hien Duy

arXiv.org Machine Learning, May-6-2026

We introduce a general framework that extends Bayesian inference by allowing the researcher to explicitly encode confidence in each source of uncertainty within the model. This mechanism provides a new handle for model design and regularisation control. Building on this framework, we develop a general approach for inducing sparsity in statistical models and illustrate its use in linear and logistic regression, as well as in Bayesian neural networks.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

2605.03134

Country:

Europe (0.28)
Asia > Japan (0.28)

Genre: Research Report (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Conformalized Percentile Interval: Finite Sample Validity and Improved Conditional Performance

Zou, Ran, Zhu, Wanrong, Nan, Bin

arXiv.org Machine Learning, May-6-2026

Conformal prediction provides distribution-free predictive intervals with finite-sample marginal coverage. However, achieving conditional validity and interval efficiency (in terms of short interval length) remains challenging, particularly in complex settings with heteroskedasticity, skewed responses, or estimation errors. We propose a conformal-style calibration method that transforms the responses by the probability integral transform (PIT) of a conditional cumulative distribution function (CDF) estimated via neural networks, and then constructs a finite-sample-adjusted percentile interval with the shortest length determined by the estimated conditional CDF. Calibrating in PIT space is effective because PIT values are asymptotically feature-independent when the CDF estimator is accurate, which mitigates feature-dependent miscoverage and improves conditional calibration. In addition, our percentile calibration adapts to the empirical PIT distribution, which makes it robust to imperfect estimation of the conditional CDF. We prove the finite-sample marginal coverage property of the proposed method and show its asymptotic conditional coverage under mild consistency conditions. Experiments on diverse synthetic and real-world benchmarks demonstrate better conditional calibration and substantially shorter intervals than existing methods.
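
A simplified version of calibrating in PIT space can be sketched with the standard library. Compute PIT values $u_i = \hat F(y_i \mid x_i)$ on a held-out calibration set, take the split-conformal quantile of the scores $|u_i - 1/2|$, and map the calibrated PIT band back through $\hat F^{-1}(\cdot \mid x)$. This is an equal-tailed split-conformal variant assumed for illustration, not the paper's shortest-length percentile interval. The deliberately misspecified Gaussian model `cdf_model` is a hypothetical stand-in for a neural CDF estimator. Under exchangeability, and with $\hat F$ continuous and strictly increasing, the band inherits split conformal's finite-sample marginal coverage of at least $1-\alpha$.

```python
import math
import random
from statistics import NormalDist


def cdf_model(x):
    """Hypothetical conditional CDF estimate F_hat(. | x) = N(2x, 1).

    The simulated noise below is heteroskedastic, so this model is misspecified.
    """
    return NormalDist(mu=2.0 * x, sigma=1.0)


def calibrate_pit(cal_x, cal_y, alpha):
    """Split-conformal quantile of the PIT scores |F_hat(y | x) - 1/2|."""
    scores = sorted(abs(cdf_model(x).cdf(y) - 0.5) for x, y in zip(cal_x, cal_y))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return 0.5 if k > n else scores[k - 1]


def predict_interval(x, q):
    """Map the PIT band [1/2 - q, 1/2 + q] back to the response scale."""
    if q >= 0.5:
        return (-math.inf, math.inf)
    dist = cdf_model(x)
    return dist.inv_cdf(0.5 - q), dist.inv_cdf(0.5 + q)


def simulate(n, rng):
    """Toy data: y = 2x + noise whose scale grows with |x|."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0 + abs(x)) for x in xs]
    return xs, ys


def empirical_coverage(xs, ys, q):
    hits = 0
    for x, y in zip(xs, ys):
        lo, hi = predict_interval(x, q)
        hits += lo <= y <= hi
    return hits / len(xs)


rng = random.Random(0)
cal_x, cal_y = simulate(2000, rng)
test_x, test_y = simulate(5000, rng)
q_hat = calibrate_pit(cal_x, cal_y, alpha=0.1)
coverage = empirical_coverage(test_x, test_y, q_hat)
print(f"q_hat = {q_hat:.4f}, empirical coverage = {coverage:.3f}")
```

Because the model's scale is wrong, the uncalibrated band $[0.05, 0.95]$ would undercover. The conformal step widens the PIT band until marginal coverage is restored, but only the paper's approach targets the conditional behaviour and interval length.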

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

2605.03233

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Partially Observed Structural Causal Models

Orujlu, Turan, Matelsky, Jordan, Butz, Martin V., Wu, Charley M., Kording, Konrad P.

arXiv.org Machine Learning, May-6-2026

Here we introduce Partially Observed Structural Causal Models (POSCMs) that formalize causal systems where latent contexts co-determine both the interaction structure and downstream mechanisms on observed variables. POSCMs provide an extension of structural causal models (SCMs), as a self-contained causal modeling framework for endogenous graphs, allowing for an intervention hierarchy spanning node- and edge-level context and endogenous variable interventions. To enable surgical edge interventions, we adopt a Kolmogorov-Arnold-Sprecher edge-functional decomposition, an existence theorem for representing each node mechanism as a sum of univariate functions of its parents, yielding an explicit parametrization of dyadic functional contributions. We provide an identifiability theory that clarifies which intervention families would suffice to disentangle structure formation from mechanisms. We empirically validate these predictions in a biophysically detailed virtual human retina simulator, constructing intervention protocols that (i) reproduce the non-identifiability predicted when context is latent and no context-level interventions are available, (ii) exhibit structure-mechanism confounding under latent edges when only node interventions are observed, and (iii) recover synaptic input-output relationships via targeted node interventions, consistent with our positive kernel identifiability result. Our work generalizes SCMs so that they apply to settings closer to the world we live in.
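
The additive edge-functional form described in the abstract, where each node mechanism is a sum of univariate functions of its parents, is what makes an edge intervention surgical: one edge term can be replaced while the others stay untouched. The sketch below assumes that additive form literally and adds a latent context that switches edges on or off. The names (`evaluate_node`, `intervene_edge`, the context labels) and the toy edge functions are hypothetical, not the authors' code.

```python
from typing import Callable, Dict, Optional, Set

EdgeFunctions = Dict[str, Callable[[float], float]]  # parent name -> g_{parent -> node}


def evaluate_node(edges: EdgeFunctions, parents: Dict[str, float],
                  active: Optional[Set[str]] = None) -> float:
    """Node value = sum of univariate edge contributions from active parents."""
    return sum(g(parents[p]) for p, g in edges.items()
               if active is None or p in active)


def intervene_edge(edges: EdgeFunctions, parent: str,
                   new_fn: Callable[[float], float]) -> EdgeFunctions:
    """Edge-level intervention: replace one edge function and keep the rest."""
    updated = dict(edges)
    updated[parent] = new_fn
    return updated


# Hypothetical latent context deciding which edges exist (structure formation).
context_structure = {"light": {"a", "b"}, "dark": {"a"}}

edges_y = {"a": lambda v: 2.0 * v, "b": lambda v: v ** 2}
parents = {"a": 1.0, "b": 3.0}

print(evaluate_node(edges_y, parents, context_structure["light"]))  # 11.0 (2 + 9)
print(evaluate_node(edges_y, parents, context_structure["dark"]))   # 2.0 (only a -> y)
cut_b = intervene_edge(edges_y, "b", lambda v: 0.0)
print(evaluate_node(cut_b, parents, context_structure["light"]))    # 2.0
```

In this toy, removing edge b through the latent context and zeroing its edge function produce the same observed value. Node-level observations alone therefore cannot tell a structural change from a mechanism change, a small instance of the structure-mechanism confounding the abstract describes.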

arXiv.org Machine Learning

2605.03268

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.28)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

On the Spectral Structure and Objective Equivalence of Orthogonal Multilabel Fisher Discriminants

Keith-Norambuena, Brian, Bekios-Calfa, Juan

arXiv.org Machine Learning, May-6-2026

We provide a unified theoretical analysis of Linear Discriminant Analysis with simultaneous multilabel scatter matrix formulations and Stiefel orthogonality constraints. Our contributions span both algebraic structure and statistical guarantees. On the algebraic side, we characterize the rank of the multilabel between-class scatter matrix, showing that the effective discriminant dimensionality can strictly exceed the classical single-label bound of $C-1$; we establish a multilabel partition of variance and prove that all four Fisher objectives are equivalent under the $W^\top S_t^{ML} W = I_r$ constraint while characterizing their divergence under the Stiefel constraint; and we prove a two-sided label-distance preservation bound relating projected distances to Hamming distances in label space. On the statistical side, we establish a finite-sample $O(k_{\max}\sqrt{d\log d/n}/\mathrm{gap}_r)$ bound on the subspace estimation error under sub-Gaussian noise with a matching $\Omega(\sigma^2 d/(n\,\mathrm{gap}_r))$ minimax lower bound, establishing a near-minimax-optimal rate (matching up to logarithmic and $k_{\max}$ factors) for multilabel discriminant subspace estimation. We further provide high-probability distance concentration, robustness guarantees under label interactions, and a regularization analysis preserving the spectral structure when $d \gg n$. All results are verified numerically on synthetic data generated from the linear label-effect model, covering both the algebraic identities and the multilabel-specific quantities ($k_{\max}$, $\kappa(S_t^{ML})$, $\|\Gamma/n\|_2$, $\Delta_r$) that govern the statistical bounds. The numerical experiments are designed as a sanity check for the theorems rather than as an empirical benchmark; evaluation on real multilabel datasets is left to future work targeting application-oriented venues.
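
Why multilabel discriminant dimensionality can exceed $C-1$ has a short explanation. With single labels, $\sum_k n_k(\mu_k - \mu) = 0$, so the between-class scatter has rank at most $C-1$. When a sample carries several labels it is counted in several class means, and that identity can fail. The exact-arithmetic sketch below assumes $S_b^{ML} = \sum_k n_k(\mu_k-\mu)(\mu_k-\mu)^\top$ with $\mu$ the plain sample mean. This is one common convention and may differ from the paper's definition. The toy data and helper names are made up for illustration.

```python
from fractions import Fraction


def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def multilabel_between_scatter(X, Y, num_labels):
    """S_b = sum_k n_k (mu_k - mu)(mu_k - mu)^T; a sample counts toward every label it has."""
    X = [[Fraction(v) for v in x] for x in X]
    mu = mean(X)
    d = len(mu)
    S = [[Fraction(0)] * d for _ in range(d)]
    for k in range(num_labels):
        members = [x for x, labels in zip(X, Y) if k in labels]
        if not members:
            continue
        diff = [m - g for m, g in zip(mean(members), mu)]
        for i in range(d):
            for j in range(d):
                S[i][j] += len(members) * diff[i] * diff[j]
    return S


def rank(M):
    """Exact rank via Gauss-Jordan elimination over Fractions."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


X = [[2, 0], [0, 2], [2, 2]]
single = [{0}, {1}, {0}]     # C = 2 classes, each sample has one label
multi = [{0}, {1}, {0, 1}]   # third sample carries both labels
print(rank(multilabel_between_scatter(X, single, 2)))  # 1 = C - 1
print(rank(multilabel_between_scatter(X, multi, 2)))   # 2 > C - 1
```

Using `Fraction` keeps the rank computation exact, so the single-label case yields a singular matrix rather than one that is only numerically close to singular.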

data mining, machine learning, smlt, (19 more...)

arXiv.org Machine Learning

2605.03283

Country: North America > United States (0.67)

Genre: Research Report (0.49)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Imbalanced Classification under Capacity Constraints

Fraiman, Daniel, Fraiman, Ricardo

arXiv.org Machine Learning, May-6-2026

In many classification settings, the class of primary interest is underrepresented, leading to imbalanced data problems that arise in applications such as rare disease detection and fraud identification. In these contexts, identifying a potential positive instance typically triggers costly follow-up actions, such as medical imaging or detailed transaction inspection, which are subject to limited operational capacity. Motivated by this setting, we consider classification problems where data may arrive sequentially and decisions must be made under constraints on the number of instances that can be selected for further analysis. We propose a classification framework that explicitly controls the rate of positive predictions, enforcing a user-defined bound on the proportion of observations classified as belonging to the minority class while maximizing detection performance. The approach can be implemented using standard learning methods and naturally extends to online settings, where decisions are taken in real time. We show that incorporating capacity constraints leads to substantial improvements over classical approaches, including resampling techniques such as SMOTE, which do not directly control the selection rate.
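
A capacity constraint of this kind can be enforced on top of any scoring model. The sketch below is a generic top-rate thresholding scheme, not the authors' estimator. In batch mode it flags the top $\lfloor \rho n \rfloor$ scores. Online, it flags an arriving instance when its score exceeds the empirical $(1-\rho)$ quantile of past scores and the running selection count stays within $\rho t$. The class and function names, the warm-up length, and the uniform toy scores are hypothetical.

```python
import bisect
import math
import random


def select_batch(scores, rate):
    """Flag exactly floor(rate * n) highest-scoring instances."""
    k = math.floor(rate * len(scores))
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [i in top for i in range(len(scores))]


class OnlineCapacitySelector:
    """Flag an arriving instance if its score beats the empirical (1 - rate)
    quantile of past scores and the selected fraction stays at most `rate`."""

    def __init__(self, rate, warmup=50):
        self.rate = rate
        self.warmup = warmup
        self.past = []  # sorted scores seen so far
        self.selected = 0

    def decide(self, score):
        t = len(self.past)
        flag = False
        if t >= self.warmup:
            threshold = self.past[min(t - 1, int((1 - self.rate) * t))]
            # Capacity check: after this selection, selected <= rate * (t + 1).
            if score > threshold and self.selected + 1 <= self.rate * (t + 1):
                flag = True
                self.selected += 1
        bisect.insort(self.past, score)
        return flag


rng = random.Random(0)
scores = [rng.random() for _ in range(2000)]
batch_flags = select_batch(scores, rate=0.05)
selector = OnlineCapacitySelector(rate=0.05)
online_flags = [selector.decide(s) for s in scores]
print(sum(batch_flags), sum(online_flags) / len(scores))
```

The hard cap guarantees the online selection rate never exceeds $\rho$, while the quantile threshold tries to spend that capacity on the highest scores. Resampling methods such as SMOTE, by contrast, change the training data but leave the resulting selection rate uncontrolled.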

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

2605.03289

Country: South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.40)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)
