AITopics | skl

skl

Conformal Prediction for Dyadic Regression Under Complex Missingness

Lunde, Robert, Yang, Minjie, Levina, Elizaveta, Zhu, Ji

arXiv.org Machine Learning, Jun-18-2026

We develop a framework for conformal prediction in dyadic regression problems under complex missingness mechanisms. At the theoretical level, we develop general technical tools for establishing finite-sample validity of conformal prediction under distributional invariance conditions weaker than exchangeability. A key result handles the case where the sample itself is a random subset of the index set, a setting not covered by existing theory, via a novel bijection argument that constructs an explicit measure-preserving correspondence between events. In addition, we propose conformal prediction procedures for jointly exchangeable arrays, including full conformal, split conformal, a row-column approach exploiting similarities within rows and columns, and a selective conformal procedure achieving mask-conditional validity. For missing elements, we establish asymptotic validity of a weighted conformal procedure under a nonparametric graphon model for the missingness mechanism. We further establish conditional validity results for both continuous and discrete responses; to the best of our knowledge, this is the first formal proof of asymptotic conditional validity for weighted conformal prediction under a missing-not-at-random assumption. The proposed methods are illustrated on synthetic and real network data.
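As background for the split conformal procedure mentioned in this abstract, here is a minimal sketch of textbook split conformal prediction for ordinary exchangeable data. It is not the paper's dyadic or weighted procedure; the function names `split_conformal_radius` and `predict_interval`, the absolute-residual score, and the toy residuals are all assumptions made for illustration.

```python
import math

def split_conformal_radius(calibration_residuals, alpha):
    """Half-width of the prediction interval from absolute calibration residuals.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual, which gives
    marginal coverage of at least 1 - alpha when the data are exchangeable.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_residuals)[rank - 1]

def predict_interval(point_prediction, radius):
    """Symmetric interval around a point prediction (hypothetical helper)."""
    return (point_prediction - radius, point_prediction + radius)

# Toy absolute residuals |y - f(x)| on a held-out calibration split.
residuals = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
radius = split_conformal_radius(residuals, alpha=0.5)
interval = predict_interval(2.0, radius)
```

The finite-sample guarantee of this basic version rests on exchangeability, which is the assumption the abstract says it weakens.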

data mining, machine learning, prediction, (19 more...)

2606.11136

Country: North America > United States (0.45)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Information Technology (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

445e24b5f22cacb9d51a837c10e91a3f-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 15:44:58 GMT

artificial intelligence, generalization error, machine learning, (17 more...)

Country: North America (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Neural Information Processing Systems, Feb-13-2026, 00:54:47 GMT

We provide a tight regret analysis for ExpTS, which simultaneously yields both the finite-time regret bound and the asymptotic regret bound.

artificial intelligence, loglogt, machine learning, (18 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

DistiLLM: Towards Streamlined Distillation for Large Language Models

Ko, Jongwoo, Kim, Sungnyun, Chen, Tianyi, Yun, Se-Young

arXiv.org Artificial Intelligence, Feb-6-2024

Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) lack a standardized objective function. Moreover, the recent use of student-generated outputs to address training-inference mismatches has significantly escalated computational costs. To tackle these issues, we introduce DistiLLM, a more effective and efficient KD framework for auto-regressive language models. DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, whose theoretical properties we unveil and leverage, and (2) an adaptive off-policy approach designed to make more efficient use of student-generated outputs. Extensive experiments, including instruction-following tasks, demonstrate the effectiveness of DistiLLM in building high-performing student models while achieving up to 4.3× speedup compared to recent KD methods.
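To make the skew Kullback-Leibler divergence concrete, the sketch below uses the standard definition of an alpha-skew divergence, KL(p || alpha * p + (1 - alpha) * q), on small discrete distributions. Whether this matches the exact loss used in DistiLLM is an assumption; the function names and the toy teacher and student distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def skew_kl(p, q, alpha):
    """alpha-skew KL: KL(p || alpha * p + (1 - alpha) * q), with 0 <= alpha < 1."""
    mixture = [alpha * pi + (1.0 - alpha) * qi for pi, qi in zip(p, q)]
    return kl_divergence(p, mixture)

# Toy next-token distributions (illustrative values only).
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.5, 0.0]

plain = kl_divergence(teacher, student)         # infinite: student has zero mass on the last token
skewed = skew_kl(teacher, student, alpha=0.1)   # finite: the mixture has full support
```

Mixing a little of the teacher into the second argument keeps the divergence finite wherever the teacher has mass, which is one commonly cited motivation for skewing.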

distillm, streamlined distillation, student model, (14 more...)

2402.03898

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Hampden County > West Springfield (0.04)
North America > United States > Massachusetts > Hampden County > Springfield (0.04)
(15 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?

He, Haiyun, Aminian, Gholamali, Bu, Yuheng, Rodrigues, Miguel, Tan, Vincent Y. F.

arXiv.org Artificial Intelligence, Jun-15-2023

We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm. The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset. Distribution-free upper and lower bounds on the gen-error can also be obtained. Our findings offer the new insight that the generalization performance of SSL with pseudo-labeling is affected not only by the information between the output hypothesis and input training data but also by the information shared between the labeled and pseudo-labeled data samples. This serves as a guideline to choose an appropriate pseudo-labeling method from a given family of methods. To deepen our understanding, we further explore two examples: mean estimation and logistic regression. In particular, we analyze how the ratio of the number of unlabeled to labeled data $\lambda$ affects the gen-error under both scenarios. As $\lambda$ increases, the gen-error for mean estimation decreases and then saturates at a value larger than when all the samples are labeled. The gap can be quantified exactly with our analysis and depends on the cross-covariance between the labeled and pseudo-labeled data samples. For logistic regression, the gen-error and the variance component of the excess risk also decrease as $\lambda$ increases.
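For reference, the two objects named in this abstract are usually defined as follows. This is a sketch in notation common to the Gibbs-algorithm literature, not necessarily the paper's own: the symbols $\pi$ (prior), $\gamma$ (inverse temperature), $L_E$ (empirical risk), $W$ (output hypothesis) and $S$ (training data) are assumptions introduced here.

```latex
% Gibbs algorithm: a randomized learner with prior \pi and inverse temperature \gamma
P_{W \mid S}(w \mid s) = \frac{\pi(w)\, e^{-\gamma L_E(w, s)}}{\int \pi(w')\, e^{-\gamma L_E(w', s)}\, \mathrm{d}w'}

% Symmetrized KL information: mutual information plus lautum information
I_{\mathrm{SKL}}(W; S) = D\big(P_{W,S} \,\|\, P_W \otimes P_S\big) + D\big(P_W \otimes P_S \,\|\, P_{W,S}\big)
```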

algorithm, artificial intelligence, machine learning, (18 more...)

2210.08188

Country:

Asia > Singapore (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

On the Generalization Error of Meta Learning for the Gibbs Algorithm

Bu, Yuheng, Tetali, Harsha Vardhan, Aminian, Gholamali, Rodrigues, Miguel, Wornell, Gregory

arXiv.org Artificial Intelligence, Apr-27-2023

We analyze the generalization ability of joint-training meta learning algorithms via the Gibbs algorithm. Our exact characterization of the expected meta generalization error for the meta Gibbs algorithm is based on symmetrized KL information, which measures the dependence between all meta-training datasets and the output parameters, including task-specific and meta parameters. Additionally, we derive an exact characterization of the meta generalization error for the super-task Gibbs algorithm, in terms of conditional symmetrized KL information within the super-sample and super-task frameworks introduced in Steinke and Zakynthinou (2020) and Hellström and Durisi (2022), respectively. Our results also enable us to provide novel distribution-free generalization error upper bounds for these Gibbs algorithms applicable to meta learning.
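As a small numerical illustration of symmetrized KL information as a dependence measure, the sketch below computes it for a discrete joint distribution, taking it to be the sum of the KL divergence from the joint to the product of marginals and the KL divergence in the reverse direction. The function name and the toy joint tables are assumptions for illustration, and the code assumes every entry of the joint is strictly positive.

```python
import math

def symmetrized_kl_information(joint):
    """I_SKL(X; Y) = KL(P_XY || P_X P_Y) + KL(P_X P_Y || P_XY) for a joint pmf.

    `joint` is a list of rows with joint[i][j] = P(X = i, Y = j), all entries > 0.
    """
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            prod = px[i] * py[j]
            # (P - Q) * log(P / Q) summed over outcomes equals KL(P||Q) + KL(Q||P)
            total += (pxy - prod) * math.log(pxy / prod)
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.4, 0.1], [0.1, 0.4]]

skl_independent = symmetrized_kl_information(independent)
skl_dependent = symmetrized_kl_information(dependent)
```

The quantity is zero for the independent table and strictly positive for the dependent one, which is the sense in which it measures dependence.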

algorithm, artificial intelligence, machine learning, (15 more...)

2304.14332

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm

Bu, Yuheng, Aminian, Gholamali, Toni, Laura, Rodrigues, Miguel, Wornell, Gregory

arXiv.org Machine Learning, Nov-2-2021

We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular transfer learning approaches, $\alpha$-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behaviour using the conditional symmetrized KL information between the output hypothesis and the target training samples given the source samples. Our results can also be applied to provide novel distribution-free generalization error upper bounds on these two aforementioned Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the $\alpha$-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice.

algorithm, generalization error, gibbs algorithm, (15 more...)

2111.01635

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations

Domke, Justin

arXiv.org Machine Learning, Feb-25-2021

It is important to estimate the errors of probabilistic inference algorithms. Existing diagnostics for Markov chain Monte Carlo methods assume inference is asymptotically exact, and are not appropriate for approximate methods like variational inference or Laplace's method. This paper introduces a diagnostic based on repeatedly simulating datasets from the prior and performing inference on each. The central observation is that it is possible to estimate a symmetric KL-divergence defined over these simulations.
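One way to see why such a symmetric divergence can be estimated from simulations alone is sketched below. The notation is an assumption made here, not taken from the paper: $p(z, x)$ is the model's joint over latents $z$ and data $x$, and $q(z \mid x)$ is the approximate posterior returned by the inference method. Whether this matches the paper's exact estimator is not confirmed by the abstract.

```latex
\begin{align*}
&\mathrm{KL}\big(p(z,x) \,\|\, p(x)\,q(z \mid x)\big) + \mathrm{KL}\big(p(x)\,q(z \mid x) \,\|\, p(z,x)\big) \\
&\quad= \mathbb{E}_{p(z,x)}\!\left[\log \frac{p(z \mid x)}{q(z \mid x)}\right]
      - \mathbb{E}_{p(x)\,q(z \mid x)}\!\left[\log \frac{p(z \mid x)}{q(z \mid x)}\right] \\
&\quad= \mathbb{E}_{p(z,x)}\!\left[\log \frac{p(z,x)}{q(z \mid x)}\right]
      - \mathbb{E}_{p(x)\,q(z \mid x)}\!\left[\log \frac{p(z,x)}{q(z \mid x)}\right]
\end{align*}
```

The last step holds because both expectations share the marginal $p(x)$, so the intractable $\log p(x)$ terms cancel. The first expectation can be approximated by drawing $(z, x)$ from the model, and the second by drawing $x$ from the model and then $z$ from $q(\cdot \mid x)$.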

inference, inference method, laplace, (15 more...)

2103.0103

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Surgery (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation

Jain, Vishesh, Koehler, Frederic, Liu, Jingbo, Mossel, Elchanan

arXiv.org Machine Learning, May-24-2019

The analysis of Belief Propagation and other algorithms for the reconstruction problem plays a key role in the analysis of community detection in inference on graphs, phylogenetic reconstruction in bioinformatics, and the cavity method in statistical physics. We prove a conjecture of Evans, Kenyon, Peres, and Schulman (2000) which states that any bounded memory message passing algorithm is statistically much weaker than Belief Propagation for the reconstruction problem. More formally, any recursive algorithm with bounded memory for the reconstruction problem on trees with the binary symmetric channel has a phase transition strictly below the Belief Propagation threshold, also known as the Kesten-Stigum bound. The proof combines, in a novel fashion, tools from recursive reconstruction, information theory, and optimal transport, and also establishes an asymptotic normality result for BP and other message-passing algorithms near the critical threshold.
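For orientation, the Kesten-Stigum bound mentioned here has a simple closed form in the setting of a tree where every node has d children and each edge flips the bit with probability eps: the condition is d * (1 - 2 * eps)^2 > 1. The sketch below only evaluates that standard condition; the function names are hypothetical, and it says nothing about the bounded-memory threshold the paper studies.

```python
def above_kesten_stigum(branching, flip_prob):
    """True when d * (1 - 2 * eps)**2 > 1, i.e. above the Kesten-Stigum bound."""
    theta = 1.0 - 2.0 * flip_prob  # second eigenvalue of the binary symmetric channel
    return branching * theta ** 2 > 1.0

def critical_flip_prob(branching):
    """Flip probability eps in [0, 1/2] at which d * (1 - 2 * eps)**2 = 1."""
    return 0.5 * (1.0 - branching ** -0.5)

# Binary tree (d = 2): a low-noise and a high-noise channel.
low_noise = above_kesten_stigum(2, 0.1)
high_noise = above_kesten_stigum(2, 0.2)
eps_star = critical_flip_prob(2)
```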

accuracy-memory tradeoff and phase transition, artificial intelligence, belief revision, (15 more...)

1905.10031

Country:

North America > United States > Massachusetts (0.14)
Europe (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.88)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)