AITopics

2512.14713

Country:

Asia > Middle East > Jordan (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Brie, cheddar, and other high-fat cheeses linked to lower dementia risk

It's been found in ancient human feces. The U.S. government stored 6.4 metric tons of it in mountains. And a big hunk of it played a major role in a presidential farewell party. While too much of the popular dairy product can spell tummy troubles and high cholesterol for some, new research suggests that eating more high-fat cheese and cream could be linked to a lower risk of developing dementia.

dementia, high-fat cheese, sara chodosh, (14 more...)

Popular Science

Country:

North America > United States (0.35)
Europe > Sweden (0.08)
Asia > Mongolia (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Health & Safety > School Nutrition (1.00)
Health & Medicine > Therapeutic Area > Neurology > Dementia (0.82)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.53)

Technology: Information Technology > Artificial Intelligence (0.36)

FOX News, Dec-17-2025, 13:34:26 GMT

Purdue to require AI competency for all undergrads as universities race to adapt

Purdue University introduces new AI working competency requirement for all undergraduate students at West Lafayette and Indianapolis campuses.

fox new show programming schedule, lifestyle real estate tech science, university, (8 more...)

FOX News

Country:

North America > United States > Indiana > Marion County > Indianapolis (0.25)
North America > United States > Minnesota (0.05)
North America > United States > California > Los Angeles County > Los Angeles (0.05)

Industry:

Leisure & Entertainment > Sports (1.00)
Media > News (0.92)
Education (0.90)
(2 more...)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.75)

BBC News, Dec-17-2025, 00:04:27 GMT

Essay cheating at universities an 'open secret'

A BBC investigation has uncovered claims that essay cheating remains widespread at UK universities despite the introduction of a law designed to stop it. Since April 2022, it has been illegal to provide essays for students in post-16 education in England. But so far there have been no prosecutions. The BBC has spoken to a former lecturer who describes essay cheating as an open secret and to a businessman who claims to have made millions from selling model answer essays to university students. Universities UK, which represents 141 institutions, said there were severe penalties for students caught submitting work that was not their own.

cheating, student, university, (13 more...)

BBC News

Country:

North America > United States (0.15)
North America > Central America (0.14)
Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.05)
(17 more...)

Industry:

Government > Regional Government > Europe Government > United Kingdom Government (0.95)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)
Education > Educational Setting (0.70)

Technology: Information Technology > Artificial Intelligence (1.00)

arXiv.org Machine Learning, Dec-17-2025

LLmFPCA-detect: LLM-powered Multivariate Functional PCA for Anomaly Detection in Sparse Longitudinal Texts

Dubey, Prasanjit, Guha, Aritra, Zhou, Zhengyi, Wu, Qiong, Huo, Xiaoming, Dubey, Paromita

Sparse longitudinal (SL) textual data arises when individuals generate text repeatedly over time (e.g., customer reviews, occasional social media posts, electronic medical records across visits), but the frequency and timing of observations vary across individuals. These complex textual data sets have immense potential to inform future policy and targeted recommendations. However, because SL text data lack dedicated methods and are noisy, heterogeneous, and prone to anomalies, detecting and inferring key patterns is challenging. We introduce LLmFPCA-detect, a flexible framework that pairs LLM-based text embeddings with functional data analysis to detect clusters and infer anomalies in large SL text datasets. First, LLmFPCA-detect embeds each piece of text into an application-specific numeric space using LLM prompts. Sparse multivariate functional principal component analysis (mFPCA) conducted in the numeric space forms the workhorse to recover primary population characteristics, and produces subject-level scores which, together with baseline static covariates, facilitate data segmentation, unsupervised anomaly detection and inference, and enable other downstream tasks. In particular, we leverage LLMs to perform dynamic keyword profiling guided by the data segments and anomalies discovered by LLmFPCA-detect, and we show that cluster-specific functional PC scores from LLmFPCA-detect, used as features in existing pipelines, help boost prediction performance. We support the stability of LLmFPCA-detect with experiments and evaluate it on two different applications using public datasets, Amazon customer-review trajectories, and Wikipedia talk-page comment streams, demonstrating utility across domains and outperforming state-of-the-art baselines.

anomaly, llmfpca-detect, trajectory, (15 more...)

2512.14604

Country:

North America > United States > California (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry:

Information Technology > Services (0.66)
Health & Medicine > Health Care Technology > Medical Record (0.54)
Health & Medicine > Therapeutic Area (0.46)
Education > Educational Setting (0.45)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Pareek, Divyansh, Oh, Sewoong, Du, Simon S.

Understanding the Gain from Data Filtering in Multimodal Contrastive Learning

arXiv.org Machine Learning, Dec-17-2025

The success of modern multimodal representation learning relies on internet-scale datasets. Due to the low quality of a large fraction of raw web data, data curation has become a critical step in the training pipeline. Filtering using a trained model (i.e., teacher-based filtering) has emerged as a successful solution, leveraging a pre-trained model to compute quality scores. To explain the empirical success of teacher-based filtering, we characterize the performance of filtered contrastive learning under the standard bimodal data generation model. Denoting $η\in(0,1]$ as the fraction of data with correctly matched modalities among $n$ paired samples, we utilize a linear contrastive learning setup to show a provable benefit of data filtering: $(i)$ the error without filtering is upper and lower bounded by $\frac{1}{η\sqrt{n}}$, and $(ii)$ the error with teacher-based filtering is upper bounded by $\frac{1}{\sqrt{ηn}}$ in the large $η$ regime, and by $\frac{1}{\sqrt{n}}$ in the small $η$ regime.
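
As a rough reading of the rates quoted above (a back-of-the-envelope comparison that ignores constants and any logarithmic factors, and is not itself a statement from the abstract), dividing the unfiltered rate by each filtered upper bound gives the order of the implied gain:

\[
\frac{1/(η\sqrt{n})}{1/\sqrt{ηn}} = \frac{1}{\sqrt{η}} \quad (\text{large } η), \qquad \frac{1/(η\sqrt{n})}{1/\sqrt{n}} = \frac{1}{η} \quad (\text{small } η).
\]

For example, at $η = 0.25$ the two ratios evaluate to $2$ and $4$ respectively. Which regime applies at a given $η$ is determined by the paper's analysis, not by this sketch.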

contrastive learning, denote, matrix, (16 more...)

2512.1423

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)
Information Technology > Data Science > Data Quality > Data Cleaning (0.34)

Nguyen-Trung, Nghia, Tran-Dinh, Quoc

A Class of Accelerated Fixed-Point-Based Methods with Delayed Inexact Oracles and Its Applications

In this paper, we develop a novel accelerated fixed-point-based framework using delayed inexact oracles to approximate a fixed point of a nonexpansive operator (or equivalently, a root of a co-coercive operator), a central problem in scientific computing. Our approach leverages both Nesterov's acceleration technique and the Krasnosel'skii-Mann (KM) iteration, while accounting for delayed inexact oracles, a key mechanism in asynchronous algorithms. We also introduce a unified approximate error condition for delayed inexact oracles, which can cover various practical scenarios. Under mild conditions and appropriate parameter updates, we establish both $\mathcal{O}(1/k^2)$ non-asymptotic and $o(1/k^2)$ asymptotic convergence rates in expectation for the squared norm of residual. Our rate significantly improves the $\mathcal{O}(1/k)$ rates in classical KM-type methods, including their asynchronous variants. We also establish $o(1/k^2)$ almost sure convergence rates and the almost sure convergence of iterates to a solution of the problem. Within our framework, we instantiate three settings for the underlying operator: (i) a deterministic universal delayed oracle; (ii) a stochastic delayed oracle; and (iii) a finite-sum structure with asynchronous updates. For each case, we instantiate our framework to obtain a concrete algorithmic variant for which our convergence results still apply, and whose iteration complexity depends linearly on the maximum delay. Finally, we verify our algorithms and theoretical results through two numerical examples on both matrix game and shallow neural network training problems.
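
For orientation, the classical Krasnosel'skii-Mann iteration that the abstract builds on can be sketched as below. This is only the textbook baseline, not the paper's accelerated method with delayed inexact oracles. The operator `rotate_90`, the relaxation value 0.5, and the step count are illustrative assumptions chosen so that the plain iteration $x_{k+1} = T(x_k)$ fails to converge while the averaged iteration does.

```python
import math

def rotate_90(point):
    """Hypothetical nonexpansive operator: rotate a 2-D point by 90 degrees.
    Its only fixed point is the origin."""
    x, y = point
    return (-y, x)

def km_iterate(operator, start, relaxation=0.5, steps=100):
    """Classical KM iteration: x_{k+1} = (1 - a) * x_k + a * T(x_k)."""
    point = start
    for _ in range(steps):
        image = operator(point)
        point = tuple((1.0 - relaxation) * p + relaxation * t
                      for p, t in zip(point, image))
    return point

def norm(point):
    """Euclidean norm of a 2-D point."""
    return math.hypot(*point)

start = (1.0, 0.0)
km_point = km_iterate(rotate_90, start)                      # averaged (KM) iteration
picard_point = km_iterate(rotate_90, start, relaxation=1.0)  # plain x_{k+1} = T(x_k)
print(norm(km_point), norm(picard_point))
```

With relaxation 1.0 the iterate only circles the fixed point at constant distance, whereas averaging with the identity contracts it toward the origin. That averaging step is what makes KM-type schemes work for operators that are nonexpansive but not contractive.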

accelerated fixed-point-based method, algorithm, iteration, (15 more...)

2512.13547

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > Arizona (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource

Zaichyk, Sofiya

Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility - the extent to which its sampling process can remain governed by a consistent underlying distribution in the presence of both exogenous change and endogenous feedback. Formally, $C_T$ is defined as the cumulative Fisher-Rao path length of the coupled learner-environment evolution, measuring the total distributional motion accumulated during learning. From this construct we derive a drift-feedback generalization bound of order $O(T^{-1/2} + C_T/T)$, and we prove a matching minimax lower bound showing that this rate is minimax-optimal. Consequently, the results establish a reproducibility speed limit: no algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$ of the data-generating process. The framework situates exogenous drift, adaptive data analysis, and performative prediction within a common geometric structure, with $C_T$ emerging as the intrinsic quantity measuring distributional motion across these settings.

learner, learning, trajectory, (14 more...)

2512.13506

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Rubinstein, Ittai, Hopkins, Samuel B.

On the Accuracy of Newton Step and Influence Function Data Attributions

Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretability and credit assignment to unlearning and privacy. Even in the relatively simple case of linear regressions, existing mathematical analyses of leading data attribution methods such as Influence Functions (IF) and single Newton Step (NS) remain limited in two key ways. First, they rely on global strong convexity assumptions which are often not satisfied in practice. Second, the resulting bounds scale very poorly with the number of parameters ($d$) and the number of samples removed ($k$). As a result, these analyses are not tight enough to answer fundamental questions such as "what is the asymptotic scaling of the errors of each method?" or "which of these methods is more accurate for a given dataset?" In this paper, we introduce a new analysis of the NS and IF data attribution methods for convex learning problems. To the best of our knowledge, this is the first analysis of these questions that does not assume global strong convexity and also the first explanation of [KATL19] and [RH25a]'s observation that NS data attribution is often more accurate than IF. We prove that for sufficiently well-behaved logistic regression, our bounds are asymptotically tight up to poly-logarithmic factors, yielding scaling laws for the errors in the average-case sample removals: \[ \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{θ}_T - \hat{θ}_T^{\mathrm{NS}}\|_2 \bigr] = \widetilde{Θ}\!\left(\frac{k d}{n^2}\right), \qquad \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{θ}_T^{\mathrm{NS}} - \hat{θ}_T^{\mathrm{IF}}\|_2 \bigr] = \widetilde{Θ}\!\left( \frac{(k + d)\sqrt{k d}}{n^2} \right). \]
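
To make the two estimators concrete, here is a minimal toy sketch for one-parameter least squares without an intercept. It is not the paper's logistic-regression setting, and the data, function names, and removed index are illustrative assumptions. Both estimators start from the full-data fit and add the gradient of the removed points' loss, scaled by an inverse Hessian. IF uses the Hessian of the full data, while NS uses the Hessian of the retained data.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y ~ theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def leave_out_estimates(xs, ys, removed):
    """Return (exact, newton_step, influence) slopes after deleting `removed` indices."""
    theta = fit_slope(xs, ys)
    kept = [i for i in range(len(xs)) if i not in removed]
    exact = fit_slope([xs[i] for i in kept], [ys[i] for i in kept])
    # Gradient of the removed points' loss 0.5 * (x * theta - y)^2 at the full fit.
    removed_grad = sum(xs[i] * (xs[i] * theta - ys[i]) for i in removed)
    full_hessian = sum(x * x for x in xs)
    kept_hessian = sum(xs[i] ** 2 for i in kept)
    newton_step = theta + removed_grad / kept_hessian  # Hessian of the retained data
    influence = theta + removed_grad / full_hessian    # Hessian of the full data
    return exact, newton_step, influence

# Hypothetical toy data; index 3 is the point being removed.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
exact, newton_step, influence = leave_out_estimates(xs, ys, removed={3})
ns_error = abs(newton_step - exact)
if_error = abs(influence - exact)
print(ns_error, if_error)
```

Because the squared loss is exactly quadratic, a single Newton step lands on the retrained solution here (up to floating-point rounding), while the influence-function estimate does not. For non-quadratic losses such as logistic regression both are approximations, which is the regime the abstract's scaling laws address.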

assumption, high probability, theorem 1, (15 more...)

2512.12572

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.66)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Chase, Zachary, Hanneke, Steve, Moran, Shay, Shafer, Jonathan

Optimal Mistake Bounds for Transductive Online Learning

We resolve a 30-year-old open problem concerning the power of unlabeled data in online learning by tightly quantifying the gap between transductive and standard online learning. In the standard setting, the optimal mistake bound is characterized by the Littlestone dimension $d$ of the concept class $H$ (Littlestone 1987). We prove that in the transductive setting, the mistake bound is at least $Ω(\sqrt{d})$. This constitutes an exponential improvement over previous lower bounds of $Ω(\log\log d)$, $Ω(\sqrt{\log d})$, and $Ω(\log d)$, due respectively to Ben-David, Kushilevitz, and Mansour (1995, 1997) and Hanneke, Moran, and Shafer (2023). We also show that this lower bound is tight: for every $d$, there exists a class of Littlestone dimension $d$ with transductive mistake bound $O(\sqrt{d})$. Our upper bound also improves upon the best known upper bound of $(2/3)d$ from Ben-David, Kushilevitz, and Mansour (1997). These results establish a quadratic gap between transductive and standard online learning, thereby highlighting the benefit of advance access to the unlabeled instance sequence. This contrasts with the PAC setting, where transductive and standard learning exhibit similar sample complexities.

adversary, node, sequence, (16 more...)

2512.12567

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Virginia (0.04)
(11 more...)

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)