AITopics | Genre

Genre

'We're up against forces that have all the money in the world': Erin Brockovich on her battle against AI datacentres

The Guardian, Jun-29-2026, 04:00:28 GMT

In 1993, she squeezed a $333m settlement from a Californian energy company in a scandal over contaminated water. Three decades later, she has a new target in her sights - and it's global.

When Erin Brockovich woke to find 30 emails from people from the same town, she realised something was going on. People email Brockovich all the time because of what happened in 1993, when she was instrumental in suing Pacific Gas and Electric Company (PG&E) on behalf of residents of the town of Hinkley, California, whose groundwater had been contaminated. The case resulted in a settlement of $333m - then the largest ever payout for a direct-action lawsuit. When she was immortalised by Julia Roberts in the 2000 film Erin Brockovich, she became the hero we didn't know we needed, a modern-day Joan of Arc.

artificial intelligence, brockovich, information management, (16 more...)

The Guardian

Country: North America > United States > California (0.49)

Genre: Personal (0.67)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy > Power Industry (0.90)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (0.70)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.60)

Annealed Entropic Allocation for Ranking and Selection

Fei, Xin, Branke, Juergen

arXiv.org Machine Learning, Jun-29-2026

We propose annealed entropic allocation, an adaptive sampling policy based on an annealed, weighted soft-min formulation of static budget allocation. We replace the maximin large-deviation rate objective with a weighted log-sum-exp surrogate that blends challenger-specific pairwise scores through soft-min weights, avoiding hard switching when several challengers are nearly active. To capture tail behavior beyond the leading exponent, the surrogate incorporates saddlepoint prefactors from refined pairwise tail asymptotics. Because these corrections are subexponential, decreasing the annealing temperature with the budget preserves the same first-order target allocation. For the static problem, we prove uniform convergence to the hard minimum, concentration of soft-min weights on active challengers, and continuity of the induced target-allocation map under fixed weights. Experiments show that the proposed methods are consistently competitive: the no-saddlepoint ablation performs best in symmetric Gaussian and exponential slippage settings, while saddlepoint weighting can help in heterogeneous or asymmetric cases.
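The weighted log-sum-exp soft-min described in the abstract can be illustrated with a minimal sketch. The exact surrogate, the pairwise scores, and the annealing schedule are not given here, so the function names, the example scores, and the temperatures below are assumptions chosen only to show how a soft-min approaches the hard minimum and how its weights concentrate on the active challenger as the temperature falls.

```python
import math

def weighted_softmin(scores, weights, temperature):
    """Weighted soft-min: -T * log(sum_j w_j * exp(-s_j / T)).

    Shifting by the hard minimum keeps the exponentials numerically stable.
    With weights summing to 1 the result lies in [min, min - T * log(w_min)].
    """
    m = min(scores)
    total = sum(w * math.exp(-(s - m) / temperature)
                for s, w in zip(scores, weights))
    return m - temperature * math.log(total)

def softmin_weights(scores, weights, temperature):
    """Normalized soft-min weights; they concentrate on the smallest score as T -> 0."""
    m = min(scores)
    terms = [w * math.exp(-(s - m) / temperature)
             for s, w in zip(scores, weights)]
    z = sum(terms)
    return [t / z for t in terms]

# Hypothetical pairwise scores for three challengers (illustrative values only).
scores = [1.0, 1.5, 3.0]
weights = [1 / 3, 1 / 3, 1 / 3]

for temperature in (10.0, 1.0, 0.01):
    value = weighted_softmin(scores, weights, temperature)
    mix = softmin_weights(scores, weights, temperature)
    print(temperature, round(value, 4), [round(p, 4) for p in mix])
```

At a high temperature the weights stay spread across all challengers, which is what avoids hard switching when several challengers are nearly active; at a low temperature the value is close to the hard minimum.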

allocation, artificial intelligence, challenger, (15 more...)

arXiv.org Machine Learning

2606.11347

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence (0.68)

Adversarial Contamination Meets Hard Thresholding: An Iterative Algorithm with Signal Adaptivity and Minimax Optimality

Liu, Shixiang, Yang, Hanming

arXiv.org Machine Learning, Jun-29-2026

Pervasive data contamination, stemming from measurement errors, outliers, or adversarial corruption, has motivated the development of robust statistical methods. In this context, we propose a two-stage Adversarial Contamination-resistant Iterative Hard Thresholding (AC-IHT) algorithm for high-dimensional regression with contamination. Our nonconvex algorithm achieves minimax near-optimal (up to logarithmic terms) estimation by iteratively updating the coefficient vector and the contamination vector with different thresholding scales. We further demonstrate that our AC-IHT estimator is signal-adaptive: under proper signal conditions, it adaptively attains a sharper estimation rate and more accurate support recovery. Moreover, it enjoys the strong oracle property, laying a theoretical foundation for asymptotic inference. Numerical experiments confirm its superior finite-sample performance. Finally, we discuss theoretical extensions of the proposed procedure to generalized linear models and to heavy-tailed noise settings.
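The idea of alternating between a coefficient update and a contamination update, each with its own thresholding rule, can be sketched on a toy problem. This is not the authors' AC-IHT procedure: the mean-shift model y = X beta + gamma, the top-s rule for beta, the magnitude threshold tau for gamma, the step size, and the data are all assumptions made for illustration.

```python
def hard_threshold_top_s(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s]
    return [v[j] if j in keep else 0.0 for j in range(len(v))]

def hard_threshold_level(v, tau):
    """Keep entries whose magnitude exceeds tau and zero out the rest."""
    return [x if abs(x) > tau else 0.0 for x in v]

def alternating_iht(X, y, s, tau, step, n_iter=50):
    """Alternate a thresholded gradient step on beta with a thresholded
    residual update on the contamination vector gamma (illustrative only)."""
    n, p = len(X), len(X[0])
    beta, gamma = [0.0] * p, [0.0] * n
    for _ in range(n_iter):
        fitted = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        resid = [y[i] - fitted[i] - gamma[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold_top_s(
            [beta[j] + step * grad[j] for j in range(p)], s)
        fitted = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        gamma = hard_threshold_level(
            [y[i] - fitted[i] for i in range(n)], tau)
    return beta, gamma

# Toy data: true beta = (3, 0); the last response is corrupted by +12.
X = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [3.0, 3.0, 3.0, 3.0, 3.0, 15.0]
beta_hat, gamma_hat = alternating_iht(X, y, s=1, tau=5.0, step=1 / 6)
print(beta_hat, gamma_hat)
```

The two thresholds play different roles in this sketch: the top-s rule enforces sparsity of the coefficients, while the magnitude rule flags only residuals large enough to be treated as contamination.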

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2606.27685

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

VGB for Masked Diffusion Model: Efficient Test-time Scaling for Reward Satisfaction and Sample Editing

Jeon, Kijung, Vuong, Thuy-Duong, Tao, Molei

arXiv.org Machine Learning, Jun-29-2026

Inference-time scaling is a promising paradigm to improve generative models, especially when outputs must satisfy structural constraints or optimize downstream rewards. We consider the Masked Diffusion Model (MDM) and introduce MDM-VGB, a discrete diffusion sampler that augments unmasking generation with theoretically principled reward-guided remasking. Inspired by the recent success of the classical Jerrum-Sinclair backtracking Markov chain in reward-tilted generation, MDM-VGB extends the backtracking random walk from a fixed prefix tree to a masked-state graph, allowing tokens to be unmasked and remasked at arbitrary positions. The resulting sampler favors unmasking and remasking moves that lead to higher-value partial configurations, enabling both effective high-reward generation and efficient repair of low-reward samples. We prove that MDM-VGB is robust to process-verifier noise and achieves quadratic complexity, while popular test-time heuristics such as best-of-N can incur exponential complexity due to error accumulation. Our theoretical findings are corroborated by strong empirical performance, particularly on popular constraint-satisfaction and scientific benchmarks such as Sudoku and QM9.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2606.28301

Country: Asia (0.27)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

Dangerous Liaisons of Convex Learning and Non-Affine Aggregation

Boudou, Thomas, Bars, Batiste Le, Gupta, Nirupam, Bellet, Aurélien

arXiv.org Machine Learning, Jun-29-2026

Last-iterate convergence and generalization guarantees in first-order convex learning hinge on the monotonicity of the update operator. While linear averaging preserves the monotonicity of gradient updates, this property is often violated when gradients are aggregated non-affinely, as in modern pipelines enforcing constraints like adaptivity, privacy, robustness or fairness. Whether it is possible to design non-affine aggregation rules that maintain monotonicity has remained an open question. We answer this question negatively: we prove that the monotonicity of aggregated gradients is preserved if and only if the aggregation rule is positively affine. Consequently, non-affine aggregation prevents steady convergence and substantially degrades algorithmic stability. We quantify these drawbacks and propose a path forward by identifying sufficient conditions under which monotonicity can be restored. Our results provide a unified theoretical framework explaining the disparate failure modes observed in modern learning systems.
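A small numerical check can make the monotonicity claim concrete. The example below is not taken from the paper: it assumes three convex quadratics f_i(x) = 0.5 * (v_i . x)^2 with hand-picked vectors v_i, and uses the coordinate-wise median as a stand-in for a non-affine aggregation rule. An operator G is monotone when <G(x) - G(y), x - y> >= 0 for all x and y.

```python
from statistics import median

# Hypothetical directions v_i; each f_i(x) = 0.5 * (v_i . x)^2 is convex,
# with gradient v_i * (v_i . x).
V = [(-1.0, -1.0, 3.0), (-1.0, 3.0, -1.0), (3.0, -1.0, -1.0)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradients(x):
    """Gradients of the three quadratics at x."""
    return [[vi * dot(v, x) for vi in v] for v in V]

def mean_aggregate(grads):
    """Linear averaging (an affine rule)."""
    return [sum(col) / len(grads) for col in zip(*grads)]

def median_aggregate(grads):
    """Coordinate-wise median (a non-affine rule)."""
    return [median(col) for col in zip(*grads)]

def monotonicity_gap(aggregate, x, y):
    """<G(x) - G(y), x - y>; a negative value violates monotonicity."""
    gx, gy = aggregate(gradients(x)), aggregate(gradients(y))
    return sum((a - b) * (p - q) for a, b, p, q in zip(gx, gy, x, y))

x, y = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
print(monotonicity_gap(mean_aggregate, x, y))    # non-negative
print(monotonicity_gap(median_aggregate, x, y))  # negative
```

At x = (1, 1, 1) each individual gradient has a non-negative inner product with x, yet the coordinate-wise median picks the entry -1 in every coordinate, so the aggregated operator points against x - y.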

aggregation rule, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2606.28123

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Disentangling Continuous-Time Latent Dynamics: Identifiability of Latent SDEs via Diffusion Shifts

Wang, Yuanyuan, Wang, Wenjie, Li, Haoxuan, Gong, Mingming, Zhang, Kun

arXiv.org Machine Learning, Jun-29-2026

Causal representation learning for time series has developed strong identifiability results in discrete-time latent causal models, but identifiability in continuous-time latent stochastic differential equation (SDE) models remains largely open. We address this gap using environment-induced shifts in diffusion covariance. We study additive-noise latent SDEs observed through an unknown nonlinear diffeomorphism, with shared drift but environment-specific diffusion covariance. We show that two diagonal diffusion regimes with pairwise distinct coordinate-wise variance ratios identify the latent coordinates up to permutation and scaling, without any sparsity assumption on the drift. We first prove this result for linear Ornstein-Uhlenbeck systems and then extend it to general additive-noise latent SDEs. Under mild smoothness, the instantaneous drift-Jacobian causal graph is identifiable up to the same permutation. We propose a two-stage estimator for latent disentanglement and optional graph recovery; experiments on synthetic systems confirm the predicted identifiability boundary, and an application to Hardanger Bridge monitoring data illustrates the approach on real sensor trajectories.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Machine Learning

2606.28228

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)

The Decision Geometry of Covariance Estimation for the Global Minimum-Variance Portfolio under Heavy Tails

Fonseca, Xavier

arXiv.org Machine Learning, Jun-29-2026

The global minimum-variance portfolio (GMVP) is the canonical decision built from an estimated covariance matrix, yet covariance estimators are universally evaluated by matrix-norm loss, which is not the object the decision depends on. We characterise exactly how covariance-estimation error maps into GMVP suboptimality. We prove an exact regret identity and a non-asymptotic bound showing decision regret depends on the estimation error only through its action on the portfolio weights, scaled by portfolio concentration and the conditioning of the true covariance. From this we derive the decision geometry: GMVP regret is invariant to a (p-1)-dimensional projection of the p^2-dimensional error matrix, with invariance to the covariance-scale direction as an exact special case. We then apply the framework to heavy-tailed returns (tail index kappa in (2,4)), establishing the regret convergence rate implied by the centred operator-norm rate, and confirm the theory on a skew-t/t-copula simulation design with pre-registered analysis. The decision-focused advantage is a sharper constant and a concentration discount rather than a faster rate; we report an honest high-conditioning boundary of the rate prediction. The results complement recent decision-focused learning approaches by supplying the exact estimation geometry and consistency theory they lack.
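The dependence of the decision on the covariance estimate can be checked on a two-asset toy case. The sketch below uses the standard GMVP formula w = inv(Sigma) 1 / (1' inv(Sigma) 1) and the textbook fact that excess variance equals the quadratic form of the weight error under the true covariance; whether this matches the paper's exact regret identity is an assumption, and the matrices are made up for illustration.

```python
def gmvp_weights(cov):
    """GMVP weights for a 2x2 covariance: inv(cov) @ 1, normalized to sum to 1."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Row sums of the 2x2 inverse, i.e. inv(cov) @ [1, 1].
    raw = [(d - b) / det, (a - c) / det]
    total = raw[0] + raw[1]
    return [r / total for r in raw]

def quad_form(v, cov):
    """v' cov v, the portfolio variance when v holds the weights."""
    return sum(v[i] * cov[i][j] * v[j] for i in range(2) for j in range(2))

true_cov = [[4.0, 1.0], [1.0, 2.0]]   # hypothetical true covariance
est_cov = [[3.0, 0.0], [0.0, 3.0]]    # hypothetical (poor) estimate

w_star = gmvp_weights(true_cov)
w_hat = gmvp_weights(est_cov)

# Regret: extra variance from using the estimated weights under the true covariance.
regret = quad_form(w_hat, true_cov) - quad_form(w_star, true_cov)
diff = [h - s for h, s in zip(w_hat, w_star)]
print(regret, quad_form(diff, true_cov))

# Scaling the estimate by a constant leaves the weights, and hence the regret, unchanged.
scaled = [[10.0 * v for v in row] for row in est_cov]
print(gmvp_weights(scaled), w_hat)
```

The scale check shows why a matrix-norm loss can penalize an estimate that induces exactly the same portfolio: multiplying the estimate by 10 changes its norm error substantially but not the decision.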

artificial intelligence, machine learning, portfolio, (16 more...)

arXiv.org Machine Learning

2606.27462

Country:

North America (0.46)
Europe (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Conformal Bayes under Label Shift: Post-Hoc Calibration vs. In-Training Adaptation

Choi, Seungjin

arXiv.org Machine Learning, Jun-29-2026

Conformal Bayes combines Bayesian posterior predictives with conformal calibration to produce prediction sets that are both statistically valid and geometrically efficient. We study conformal Bayes under label shift from a unified perspective, identifying two complementary approaches that restore nominal target-domain coverage through importance-weighted conformal calibration but operate through independent mechanisms. Post-hoc calibration tilts the posterior predictive toward the target domain and corrects the conformal threshold via an importance-weighted quantile, leaving the parameter posterior unchanged. In-training adaptation tilts the parameter posterior itself to the target domain, producing a corrected predictive whose highest predictive density (HPD) region serves as the prediction set under the fitted target predictive; efficiency is model-dependent and does not imply finite-sample conditional optimality. Two controlled experiments isolate the regime-dependence of each strategy: in the low-dimensional, well-estimated regime Strategy A produces the narrowest valid intervals, while in the high-dimensional, underdetermined regime Strategy B achieves up to 43% width reduction at unchanged coverage, under the stated source-sampling and label-shift assumptions.
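The importance-weighted quantile mentioned in the abstract can be sketched generically. The code follows the usual weighted conformal recipe, in which each calibration score carries an importance weight and the test point contributes its own weight at +infinity; the function name, the scores, and the weights are hypothetical, and the paper's specific construction may differ.

```python
import math

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest calibration score whose cumulative importance weight reaches
    (1 - alpha) of the total, where the total includes the test point's
    weight placed at +infinity. Returns inf if no score reaches that level."""
    total = sum(weights) + test_weight
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    return math.inf

# Hypothetical nonconformity scores for four calibration points.
scores = [1.0, 2.0, 3.0, 4.0]

# Shift that upweights the low-score calibration point lowers the threshold.
low_shift = weighted_conformal_quantile(scores, [4.0, 1.0, 1.0, 1.0], 1.0, 0.25)
# Shift that upweights the high-score calibration point raises it.
high_shift = weighted_conformal_quantile(scores, [1.0, 1.0, 1.0, 4.0], 1.0, 0.25)
print(low_shift, high_shift)
```

Under label shift the weight of a calibration point would typically be a ratio of target to source label probabilities evaluated at its label, so the threshold moves toward the scores of labels that are more common in the target domain.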

artificial intelligence, dtr, machine learning, (16 more...)

arXiv.org Machine Learning

2606.11865

Country: Asia > South Korea (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Benchmarking on Tasks That Matter: Dataset Selection for Preserving Model Rankings

Gusev, Rostislav, Zaytsev, Alexey

arXiv.org Machine Learning, Jun-29-2026

Benchmarks of machine learning models often include many datasets, making evaluation expensive. For efficiency, it is preferable to perform evaluations on small, representative datasets instead. The selection of such subsets typically relies on heuristics and is rarely analyzed for the robustness of the resulting model rankings. We introduce a framework for selecting dataset subsets and evaluating how well different selection strategies preserve the global model rankings. Our framework includes bootstrap aggregation, which provides valid confidence intervals, allowing a principled comparison of selection strategies. We consider clustering, design criteria (A/D-optimality), random baselines, and greedy farthest-first (FAFI). For the latter, we derive upper bounds on selection quality in terms of ranking errors as a function of the number of selected datasets. Empirically, in time series classification (TSC, 112 datasets) and in a supplementary natural language processing benchmark derived from MTEB (57 tasks), several selection strategies improve rank preservation compared with random subsets, including simple FAFI. In contrast, in recommender systems (30 datasets), the improvement of strategies over random selection is small and typically statistically insignificant. For TSC, our best-performing strategy achieves a Spearman correlation of 0.95 with the full benchmark model rankings using only five selected datasets. Additional experiments indicate that the effectiveness of selection approaches depends on both the quality of dataset representations and the scale of the benchmarking regime.
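Greedy farthest-first selection is simple enough to sketch directly. The version below assumes each dataset is represented by a numeric vector and uses Euclidean distance; the representation, the starting index, and the toy points are hypothetical, and the paper's FAFI variant may differ in those details.

```python
import math

def farthest_first(points, k, start=0):
    """Greedy farthest-first traversal: repeatedly add the point whose
    distance to the nearest already-selected point is largest."""
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(min_dist[i], math.dist(points[i], points[nxt]))
                    for i in range(len(points))]
    return selected

# Hypothetical 2-D dataset representations (for example, meta-feature embeddings).
representations = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 0.0)]
chosen = farthest_first(representations, k=3)
print(chosen)
```

The rule spreads the selected datasets across the representation space, which is why its quality depends on how well the representations reflect differences that matter for model rankings.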

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

doi: 10.1145/3770855.3817569

2606.27997

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Surprises in Proper Positive-Only Learning

Ben-David, Shai, Mansouri, Farnam, Mehrotra, Anay, Zampetakis, Manolis

arXiv.org Machine Learning, Jun-29-2026

Binary classification from positive-only samples is a variant of PAC learning in which the learner receives i.i.d. samples from the positive region of an unknown target concept, but is evaluated under the original distribution (which places mass on both positive and negative regions). This model dates back to Natarajan [1987, STOC], and the characterization of improper learning is well known; it even appears in textbooks. The characterization of proper positive-only learning, however, has long remained open. In this work, we revisit and settle this question: a concept class is properly learnable from positive-only samples if and only if it has finite VC dimension and satisfies a new combinatorial condition, which we call uniform exterior separability. Together with several separation results, this characterization reveals a surprisingly rich landscape that differs sharply from standard PAC learning: proper and improper learning are separated, randomized and deterministic proper learning are separated, there are classes for which no ERM is a learner, and finite VC dimension does not suffice even for non-uniform learning. Along the way, we introduce new combinatorial dimensions that we believe can be of broader interest in learning theory.

artificial intelligence, learner, machine learning, (17 more...)

arXiv.org Machine Learning

2606.28309

Genre:

Research Report (0.64)
Instructional Material (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
