AITopics

2606.29732

Country: North America > United States (0.67)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Bhattacharyya, Aabesh, Ding, Tiffany, Barber, Rina Foygel

Conformal Prediction with Macro-Coverage Guarantees

arXiv.org Machine Learning Jun-30-2026

Prediction sets should have high coverage to be useful, but some coverage notions are more practically relevant than others. In the classification setting, class-conditional coverage requires that the prediction set (i.e., the set of candidate labels for a new test point) must achieve the target accuracy level within each class, which may be challenging to satisfy when many classes are rare and have few calibration points. At the other extreme, marginal coverage requires only that coverage holds on average over the distribution of all classes, which can lead to low-probability labels being essentially ignored. To find a middle ground, recent work has introduced macro-coverage, defined as the unweighted average of class-conditional coverages. Macro-coverage offers a compromise between marginal coverage and class-conditional coverage that is particularly appropriate for long-tailed settings. In this work, we show that label-weighted conformal prediction can be used to produce prediction sets with a finite-sample macro-coverage guarantee, and more generally a guarantee on a family of generalized macro-coverage objectives that aggregate coverage at the level of arbitrary class groupings and take a weighted average. We further characterize the form of the smallest prediction sets satisfying a given generalized macro-coverage objective and propose a corresponding conformal score function. We validate our theoretical results on two large-scale image classification datasets.
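As an illustration only, the sketch below shows one way label weighting can enter a split-conformal procedure. Each calibration point with label $y$ gets weight proportional to $1/n_y$ (its class count), so every class carries the same total mass, as it would under the unweighted class average that defines macro-coverage. The weighting scheme, the helper names, and the score convention (a lower score means more conforming) are assumptions of this sketch, not the authors' exact procedure or their proposed score function.

```python
from collections import Counter
import math


def weighted_quantile(values, weights, level):
    """Smallest value whose cumulative normalized weight reaches `level`."""
    total = sum(weights)
    acc = 0.0
    for v, w in sorted(zip(values, weights)):
        acc += w / total
        if acc >= level - 1e-12:  # small tolerance for floating-point sums
            return v
    return math.inf


def label_weighted_set(cal_scores, cal_labels, test_scores, alpha):
    """Prediction set from label-weighted split conformal (hypothetical helper).

    cal_scores[i]: nonconformity score of calibration point i at its true label.
    cal_labels[i]: true label of calibration point i.
    test_scores:   dict mapping candidate label -> score of the test point.
    The assumed weights 1/n_y give each class the same total mass.
    """
    counts = Counter(cal_labels)
    cal_weights = [1.0 / counts[y] for y in cal_labels]
    prediction_set = []
    for label, score in test_scores.items():
        if counts[label] == 0:
            prediction_set.append(label)  # no calibration data: cannot exclude it
            continue
        # The test point enters at score +inf with its candidate label's weight.
        threshold = weighted_quantile(
            list(cal_scores) + [math.inf],
            cal_weights + [1.0 / counts[label]],
            1 - alpha,
        )
        if score <= threshold:
            prediction_set.append(label)
    return prediction_set
```

With four calibration points in a common class and one in a rare class, the rare-class point receives four times the weight of each common-class point, so rare labels are no longer drowned out by the common ones when the threshold is computed.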

artificial intelligence, machine learning, prediction, (18 more...)

2606.28598

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

arXiv.org Machine Learning Jun-30-2026

A Sieve-Accelerated Quadrature Method for Exact Privacy Accounting in the 2020 U.S. Decennial Census

Su, Buxin, Su, Weijie, Wang, Chendi

In 2020, the U.S. Census Bureau adopted differential privacy for the Decennial Census by injecting integer-valued Gaussian noise into published census tabulations. Exactly evaluating the privacy guarantees of these data releases would enable the Bureau to determine the absolute minimum noise required to satisfy a given privacy budget, preventing the injection of unnecessary excess noise and thereby substantially enhancing the statistical utility of the data for downstream applications such as federal funding allocation and political redistricting. In this paper, we introduce a computationally efficient and mathematically rigorous quadrature method to evaluate the exact privacy profile of practical, large-scale census releases under the composition of heterogeneous discrete Gaussian mechanisms. Mathematically, this problem reduces to evaluating the tail probabilities of high-dimensional convolutions of integer-valued random variables sampled from heterogeneous discrete Gaussian distributions under exceptionally stringent numerical error tolerances (e.g., $10^{-35}$). By recasting the exact privacy accounting as a numerical integration problem via the discrete Fourier transform, we explicitly exploit the exponential convergence of the trapezoidal rule for complex analytic, periodic characteristic functions. Furthermore, to overcome the computational bottleneck of evaluating highly oscillatory integrands in high dimensions, we develop a sieve algorithm that identifies and prunes negligible quadrature nodes, accelerating the computation by three orders of magnitude. Taken together, these numerical innovations enable the first exact, assumption-free privacy accounting for the 2020 Census Demographic and Housing Characteristics File, achieving a 1,824-fold speedup over prior methods while maintaining census-mandated error tolerances.
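The sketch below illustrates only the core numerical primitive named in the abstract: the distribution of a sum of independent, heterogeneous discrete Gaussians, recovered from the product of their characteristic functions with the trapezoidal rule, which on a uniform grid over one period is an inverse discrete Fourier transform. It assumes every distribution is truncated to $|x| \le$ `trunc`, so the grid can be sized to avoid aliasing and the recovery is exact up to rounding. It omits the privacy-loss formulation, rigorous error control, the sieve pruning, and the FFT a real implementation would use; all function names are hypothetical.

```python
import cmath
import math


def discrete_gaussian_pmf(sigma, trunc):
    """PMF of the discrete Gaussian N_Z(0, sigma^2), truncated to |x| <= trunc.

    Truncation is an assumption of this sketch; the true distribution has
    infinite support, and the discarded tail mass would need its own bound.
    """
    weights = {x: math.exp(-x * x / (2.0 * sigma * sigma)) for x in range(-trunc, trunc + 1)}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


def sum_pmf(sigmas, trunc):
    """PMF of a sum of independent truncated discrete Gaussians via the DFT.

    p_S(s) = (1 / 2pi) * integral over one period of prod_j phi_j(t) e^{-its} dt.
    The N-point trapezoidal rule on that periodic integral is an inverse DFT, and
    it is exact here because N covers the whole support of S (no aliasing).
    """
    pmfs = [discrete_gaussian_pmf(sigma, trunc) for sigma in sigmas]
    half_width = trunc * len(sigmas)      # S takes values in [-half_width, half_width]
    n_nodes = 2 * half_width + 1          # one node per support point suffices
    nodes = [2.0 * math.pi * k / n_nodes for k in range(n_nodes)]

    # Product of the characteristic functions at every quadrature node.
    phi = []
    for t in nodes:
        value = 1.0 + 0.0j
        for pmf in pmfs:
            value *= sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())
        phi.append(value)

    # Trapezoidal rule (inverse DFT); the imaginary part is only rounding error.
    return {
        s: (sum(f * cmath.exp(-1j * t * s) for t, f in zip(nodes, phi)) / n_nodes).real
        for s in range(-half_width, half_width + 1)
    }


def upper_tail(pmf, threshold):
    """P(S >= threshold) computed from a PMF dictionary."""
    return sum(p for s, p in pmf.items() if s >= threshold)
```

For example, `upper_tail(sum_pmf([1.0, 2.0, 3.0], 20), 5)` gives the probability that the total noise is at least 5 under these assumptions. Double-precision rounding (absolute error around $10^{-16}$) would swamp tail probabilities at tolerances like $10^{-35}$, so this sketch conveys only the structure of the computation.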

artificial intelligence, data quality, machine learning, (15 more...)

2606.29835

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.50)

Industry:

Government > Regional Government > North America Government > United States Government (0.89)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.68)
Information Technology > Security & Privacy (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Liu, Shixiang, Yang, Hanming

Adversarial Contamination Meets Hard Thresholding: An Iterative Algorithm with Signal Adaptivity and Minimax Optimality

arXiv.org Machine Learning Jun-29-2026

Pervasive data contamination -- stemming from measurement errors, outliers, or adversarial corruption -- has motivated the development of robust statistical methods. In this context, we propose a two-stage Adversarial Contamination-resistant Iterative Hard Thresholding (AC-IHT) algorithm for high-dimensional regression with contamination. Our nonconvex algorithm achieves minimax near-optimal (up to logarithmic terms) estimation by iteratively updating the coefficient vector and the contamination vector with different thresholding scales. We further demonstrate that our AC-IHT estimator is signal-adaptive: under proper signal conditions, it adaptively attains a sharper estimation rate and more accurate support recovery. Moreover, it enjoys the strong oracle property, laying a theoretical foundation for asymptotic inference. Numerical experiments confirm its superior finite-sample performance. Finally, we discuss theoretical extensions of the proposed procedure to generalized linear models and to heavy-tailed noise settings.
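A minimal sketch of the general idea, alternating hard thresholding of a sparse coefficient vector and a sparse contamination vector, assuming the model $y = X\beta + \theta + \text{noise}$ with known sparsity levels $s$ (for $\beta$) and $k$ (for $\theta$). It is a generic single-stage loop, not the two-stage AC-IHT procedure with its specific thresholding scales; the function names, the step-size rule, and the iteration count are assumptions.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda i: -abs(v[i]))[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def alternating_iht(X, y, s, k, step=None, iters=200):
    """Alternating hard thresholding for y = X beta + theta + noise (sketch).

    beta is kept s-sparse (regression coefficients) and theta k-sparse
    (one contamination entry per observation).
    """
    n, p = len(X), len(X[0])
    if step is None:
        # 1 / ||X||_F^2 never exceeds 1 / ||X||_2^2, a conservative gradient step.
        step = 1.0 / sum(x * x for row in X for x in row)
    beta, theta = [0.0] * p, [0.0] * n

    def fitted(b):
        return [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]

    for _ in range(iters):
        # Gradient step on 0.5 * ||y - X beta - theta||^2 with theta held fixed;
        # X^T resid is the negative gradient with respect to beta.
        xb = fitted(beta)
        resid = [y[i] - xb[i] - theta[i] for i in range(n)]
        neg_grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] + step * neg_grad[j] for j in range(p)], s)
        # The contamination estimate absorbs the k largest residuals of the new fit.
        xb = fitted(beta)
        theta = hard_threshold([y[i] - xb[i] for i in range(n)], k)
    return beta, theta
```

Using two separate thresholds mirrors the abstract's point that the coefficient vector and the contamination vector are updated "with different thresholding scales"; here the scales are simply the assumed sparsity levels `s` and `k`.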

artificial intelligence, data mining, machine learning, (19 more...)

2606.27685

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)

Kim, Jung-hun, Grebennikova, Anna, Perchet, Vianney

Asymptotically Optimal Learning for Parametric Prophet Inequalities

arXiv.org Machine Learning Jun-26-2026

We study learning in prophet inequalities with i.i.d. rewards drawn from an exponential-type parametric family with an unknown parameter $\theta$, a class that includes exponential, Pareto, and bounded-support power-family distributions. We first characterize the optimal full-information asymptotic competitive ratio for this family. In the unbounded-support case, the limit is $\frac{\left(\theta/(\theta - c_+)\right)^{c_+/\theta}}{\Gamma(1 - c_+/\theta)}$, while in the bounded-support case, the limit is $1$. We then propose a confidence-based dynamic-programming policy for online learning. By exploiting the explicit parametric structure, the policy achieves the same optimal asymptotic competitive ratio using only online observations, without external offline samples. We further derive distribution-specific convergence rates for canonical examples. Finally, numerical experiments on synthetic instances illustrate the performance of our algorithm.
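For intuition about the full-information benchmark, the sketch below computes optimal stopping values by backward induction for one member of the family with a known parameter, i.i.d. Exp(1) rewards (chosen as an assumption because the recursion has a closed form), and compares them with the prophet's expected maximum. It does not implement the paper's confidence-based learning policy; the function names are hypothetical.

```python
import math


def stopping_values(n):
    """Optimal expected reward V[m] with m draws left, for i.i.d. Exp(1) rewards.

    V[0] = 0 and V[m] = E[max(X, V[m-1])] = V[m-1] + exp(-V[m-1]); the policy
    with m draws left accepts the current reward iff it is at least V[m-1].
    """
    values = [0.0]
    for _ in range(n):
        v = values[-1]
        values.append(v + math.exp(-v))  # E[max(X, v)] = v + e^{-v} for Exp(1)
    return values


def prophet_value(n):
    """E[max of n i.i.d. Exp(1) draws] = H_n, the n-th harmonic number."""
    return sum(1.0 / i for i in range(1, n + 1))


def competitive_ratio(n):
    """Ratio of the optimal online value to the prophet's value for n draws."""
    return stopping_values(n)[n] / prophet_value(n)
```

The closed form for $E[\max(X, v)]$ follows from memorylessness: $E[(X - v)^+] = P(X > v) = e^{-v}$ for $v \ge 0$.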

artificial intelligence, logn, machine learning, (18 more...)

2606.26893

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Neural Information Processing Systems Jun-23-2026, 11:48:16 GMT

Streaming Stochastic Submodular Maximization with On-Demand User Requests

We explore a novel problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to k news items. User interactions are inherently stochastic: each news item presented to the user is consumed by the user with a certain acceptance probability, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage. To address this problem, we establish a connection to submodular maximization subject to a matroid constraint.
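To make the objective concrete, the sketch below assumes independent acceptances and unit-weight topics, so a topic is covered unless every displayed item touching it is rejected; expected coverage is then a sum over topics of $1 - \prod_i (1 - p_i)$. It selects up to k items for a single visit by plain greedy, an offline illustration of the objective, not the paper's streaming algorithm or its matroid reduction; the item format and function names are hypothetical.

```python
def expected_coverage(items, chosen):
    """Expected number of topics covered when the items in `chosen` are shown.

    items[i] = (acceptance_probability, set_of_topics); acceptances are assumed
    independent, and a topic counts once if at least one accepted item covers it.
    """
    miss = {}  # topic -> probability that no chosen item covering it is accepted
    for i in chosen:
        p, topics = items[i]
        for t in topics:
            miss[t] = miss.get(t, 1.0) * (1.0 - p)
    return sum(1.0 - q for q in miss.values())


def greedy_display(items, k):
    """Greedily add the item with the largest marginal gain, up to k items."""
    chosen = []
    for _ in range(min(k, len(items))):
        base = expected_coverage(items, chosen)
        best, best_gain = None, 0.0
        for i in range(len(items)):
            if i in chosen:
                continue
            gain = expected_coverage(items, chosen + [i]) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining item adds coverage
            break
        chosen.append(best)
    return chosen
```

Because this expected coverage is monotone submodular, offline greedy under a cardinality limit of k carries the classic $(1 - 1/e)$ approximation guarantee; the streaming, on-demand setting in the abstract is harder than this.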

artificial intelligence, machine learning, natural language, (19 more...)

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)
(2 more...)

Neural Information Processing Systems Jun-23-2026, 08:33:01 GMT

Geometric Algorithms for Neural Combinatorial Optimization with Constraints

Self-Supervised Learning (SSL) for Combinatorial Optimization (CO) is an emerging paradigm for solving combinatorial problems using neural networks. In this paper, we address a central challenge of SSL for CO: solving problems with discrete constraints. We design an end-to-end differentiable framework that enables us to solve discrete constrained optimization problems with neural networks. Concretely, we leverage algorithmic techniques from the literature on convex geometry and Carathéodory's theorem to decompose neural network outputs into convex combinations of polytope corners that correspond to feasible sets. This decomposition-based approach not only enables self-supervised training but also ensures efficient quality-preserving rounding of the neural net output into feasible solutions. Extensive experiments in cardinality-constrained optimization show that our approach can consistently outperform neural baselines. We further provide worked-out examples of how our method can be applied beyond cardinality-constrained problems to a diverse set of combinatorial optimization tasks, including finding independent sets in graphs, and solving matroid-constrained problems.
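The cardinality case can be sketched concretely. Assuming a network output x with entries in [0, 1] that sum to an integer k (a point of the hypersimplex), the hypothetical helper below greedily peels off indicator vectors of k-subsets, a standard Carathéodory-style construction that uses at most n + 1 corners. It is not claimed to be the authors' exact procedure.

```python
def decompose_cardinality(x, k, tol=1e-12):
    """Carathéodory-style decomposition of x in {z in [0,1]^n : sum(z) = k}.

    Assumes 1 <= k <= len(x). Returns (weight, subset) pairs, each subset of
    size k, with weights summing to 1 and sum(weight * indicator(subset)) == x
    up to rounding.
    """
    r = list(x)  # invariant: 0 <= r[i] <= m and sum(r) == k * m
    m = 1.0      # convex weight not yet assigned to a corner
    parts = []
    for _ in range(len(x) + 1):  # each step zeroes or saturates a coordinate
        if m <= tol:
            break
        order = sorted(range(len(r)), key=lambda i: -r[i])
        chosen, rest = order[:k], order[k:]
        # Largest step keeping chosen entries >= 0 and other entries <= m - step.
        step = min(min(r[i] for i in chosen),
                   m - max((r[i] for i in rest), default=0.0))
        for i in chosen:
            r[i] -= step
        m -= step
        parts.append((step, frozenset(chosen)))
    return parts
```

Rounding then amounts to sampling one subset with probability equal to its weight, which always yields a feasible k-subset whose expected indicator vector equals x.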

artificial intelligence, machine learning, optimization problem, (15 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Transportation (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems Jun-23-2026, 08:23:09 GMT

Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference

Uncertainty quantification (UQ) in scientific machine learning is increasingly critical as neural networks are widely adopted to tackle complex problems across diverse scientific disciplines. For physics-informed neural networks (PINNs), a prominent model in scientific machine learning, uncertainty is typically quantified using Bayesian or dropout methods. However, both approaches suffer from a fundamental limitation: the prior distribution or dropout rate required to construct honest confidence sets cannot be determined without additional information. In this paper, we propose a novel method within the framework of extended fiducial inference (EFI) to provide rigorous uncertainty quantification for PINNs. The proposed method leverages a narrow-neck hyper-network to learn the parameters of the PINN and quantify their uncertainty based on imputed random errors in the observations. This approach overcomes the limitations of Bayesian and dropout methods, enabling the construction of honest confidence sets based solely on observed data. This advancement represents a significant breakthrough for PINNs, greatly enhancing their reliability, interpretability, and applicability to real-world scientific and engineering challenges. Moreover, it establishes a new theoretical framework for EFI, extending its application to large-scale models, eliminating the need for sparse hyper-networks, and significantly improving the automaticity and robustness of statistical inference.

artificial intelligence, deep learning, machine learning, (18 more...)

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Neural Information Processing Systems Jun-23-2026, 06:27:31 GMT

A Generalized Binary Tree Mechanism for Private Approximation of All-Pair Shortest Distances

We study the problem of approximating all-pair distances in a weighted undirected graph with differential privacy, introduced by Sealfon [Sea16]. Given a publicly known undirected graph, we treat the weights of edges as sensitive information, and two graphs are neighbors if their edge weights differ in one edge by at most one. We obtain efficient algorithms with significantly improved bounds on a broad class of graphs which we refer to as recursively separable. In particular, for any $n$-vertex $K_h$-minor-free graph, our algorithm achieves an additive error of $\tilde{O}(h(nW)^{1/3})$, where $W$ represents the maximum edge weight; for grid graphs, the same algorithmic scheme achieves an additive error of $\tilde{O}(n^{1/4} W)$. Our approach can be seen as a generalization of the celebrated binary tree mechanism for range queries, as releasing range queries is equivalent to computing all-pair distances on a path graph. In essence, our approach is based on generalizing the binary tree mechanism to graphs that are recursively separable.

JL and ZZ have been supported by National Science Foundation of China under Grant No. 62472212 and the New Cornerstone Science Foundation. Supported in part by NSF award 2228995. JU's research was funded by NSF CNS 2433628, Google Seed Fund grant, Google Research Scholar Award, Dean Research Seed Fund, and Rutgers Decanal Grant no.
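For the path-graph special case that the method generalizes, the classic binary tree mechanism can be sketched as follows: noisy sums over dyadic blocks of edges, with each distance query answered by combining O(log n) blocks. The Laplace calibration assumes the neighboring relation stated above (one edge weight changes by at most one) and pure ε-differential privacy. The recursive-separator generalization to $K_h$-minor-free graphs is not shown, and the function names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample as a difference of two i.i.d. exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_noisy_tree(weights, epsilon, rng, add_noise=True):
    """Noisy dyadic sums over the edge weights of a path graph.

    Node (level, i) stores the sum of weights[i * 2**level : (i + 1) * 2**level].
    Changing one edge weight by at most 1 changes one node per level, so the L1
    sensitivity equals `levels`, and Laplace(levels / epsilon) noise gives epsilon-DP.
    """
    size = 1
    while size < len(weights):
        size *= 2
    levels = size.bit_length()  # levels 0 .. log2(size)
    scale = levels / epsilon if add_noise else 0.0
    tree = {(0, i): (weights[i] if i < len(weights) else 0.0) for i in range(size)}
    for level in range(1, levels):
        for i in range(size >> level):
            tree[(level, i)] = tree[(level - 1, 2 * i)] + tree[(level - 1, 2 * i + 1)]
    return {node: value + laplace(scale, rng) for node, value in tree.items()}


def path_distance(tree, u, v):
    """Distance between path vertices u and v: sum of edges between them."""
    if u > v:
        u, v = v, u
    total, l = 0.0, u
    while l < v:
        level = 0
        # Grow the block while it stays aligned at l and inside [l, v).
        while l % (2 ** (level + 1)) == 0 and l + 2 ** (level + 1) <= v:
            level += 1
        total += tree[(level, l >> level)]
        l += 2 ** level
    return total
```

Usage: `tree = build_noisy_tree(w, epsilon=1.0, rng=random.Random(0))` releases the noisy tree once; every pairwise distance is then post-processed from it at no extra privacy cost.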

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.87)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Neural Information Processing Systems Jun-23-2026, 06:27:09 GMT

New Parallel and Streaming Algorithms for Directed Densest Subgraph

Finding dense subgraphs is a fundamental problem with applications to community detection, clustering, and data mining. Our work focuses on finding approximate densest subgraphs in directed graphs in computational models for processing massive data. We consider two such models: Massively Parallel Computation (MPC) and semi-streaming. We show how to find a $(2+\varepsilon)$-approximation in $O(\sqrt{\log n})$ MPC rounds with sublinear memory per machine.
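The abstract does not restate the objective. Assuming the standard directed density $\rho(S, T) = |E(S, T)| / \sqrt{|S|\,|T|}$, where $E(S, T)$ is the set of edges from S to T, a brute-force reference solver for tiny graphs is sketched below. It enumerates all pairs of nonempty vertex sets and is only meant for checking approximate algorithms on small inputs, not as an MPC or streaming method; the function names are hypothetical.

```python
import math
from itertools import combinations


def directed_density(edges, S, T):
    """|E(S, T)| / sqrt(|S| |T|), counting edges u -> v with u in S and v in T."""
    count = sum(1 for u, v in edges if u in S and v in T)
    return count / math.sqrt(len(S) * len(T))


def nonempty_subsets(vertices):
    """All nonempty subsets of `vertices`, as frozensets."""
    for r in range(1, len(vertices) + 1):
        for combo in combinations(vertices, r):
            yield frozenset(combo)


def densest_directed_brute_force(vertices, edges):
    """Exact maximum over all nonempty S, T (exponential time: tiny graphs only)."""
    best = (0.0, frozenset(), frozenset())
    subsets = list(nonempty_subsets(vertices))
    for S in subsets:
        for T in subsets:
            d = directed_density(edges, S, T)
            if d > best[0]:
                best = (d, S, T)
    return best
```

S and T are allowed to overlap, which is why the directed objective cannot be reduced to the undirected edge-per-vertex density.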

artificial intelligence, data mining, machine learning, (19 more...)

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.93)