AITopics | Europe

Europe

Sequential Minimal Optimization Algorithm for One-Class Support Vector Machines With Privileged Information

Lange, Andrey, Smolyakov, Dmitry, Burnaev, Evgeny

arXiv.org Machine Learning, Jun-23-2026

One of the powerful techniques in data modeling is accounting for features that are available at the training stage but are not available when the trained model is used to classify or predict test data: the Learning Using Privileged Information (LUPI) paradigm. Sequential Minimal Optimization (SMO) methods have been developed for supervised Support Vector Machines (SVM), unsupervised one-class SVM, and SVM with privileged information (SVM+). The missing brick in this research has long been a one-class SVM with privileged information (OC-SVM+). In this paper, we propose an SMO algorithm for OC-SVM+ that significantly outperforms non-sequential algorithms for training the OC-SVM+ model. Its finite-time convergence is established. The experiments show how privileged information affects a descriptive domain in the space of original features. Comparative benchmark tests demonstrate that our algorithm is superior to interior point algorithms.

artificial intelligence, machine learning, privileged information, (10 more...)

arXiv.org Machine Learning

doi: 10.1109/ACCESS.2023.3331685

2606.2221

Country: Europe > Russia (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Non-asymptotic estimates of the minimal risk in statistical learning

Wu, Liming, Yang, Sen

arXiv.org Machine Learning, Jun-23-2026

In this paper we prove some concentration inequalities for two types of error probabilities in the Empirical Risk Principle (ERP) in statistical learning, which provide a lower bound and an upper bound for the minimal risk (in terms of the minimal empirical risk) with non-asymptotic high confidence. The usual boundedness condition of the empirical risk function is relaxed to the Gaussian or exponential integrability condition. The confidence of the lower bound of the minimal risk is shown to be independent of the number of training parameters and the dimension of the input vectors, allowing one to detect the deficiency of a learning machine efficiently; and the confidence of the upper bound of the minimal risk is proved to be high provided that the sample size $n$ is much greater than the box dimension of the parameter set $Θ$ in the Orlicz metric $d_{ψ_1}$ associated with the risk functions. Our work is based on Talagrand's concentration inequalities (the sharp versions by Bousquet and Klein-Rio), transport-entropy inequalities and the recent progress in the theory of empirical processes and statistical learning.

artificial intelligence, inequality, machine learning, (17 more...)

arXiv.org Machine Learning

2606.23295

Country: Europe > France (0.28)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Convergence Analysis of Nyström Subsampling in Covariate Shift Adaptation for Misspecified case

Myleiko, Hanna, Solodky, Sergei, Semenov, Vasyl

arXiv.org Machine Learning, Jun-23-2026

This paper investigates convergence properties of regularized Nyström subsampling applied to the unsupervised domain adaptation problem under covariate shift. We focus on the low-smoothness (misspecified) case where the target function lies outside the reproducing kernel Hilbert space. By combining Tikhonov regularization with Nyström projection onto a subsampled subspace, we obtain upper bounds on the excess risk that hold with high probability and are expressed in terms of the source condition, the effective dimension, and the sample sizes. We further extend the analysis to the setting where the Radon-Nikodym derivative between the target and source marginal distributions is unknown and must be approximated, and we identify the minimal additional sample sizes required to maintain the same convergence rate as in the oracle case.

artificial intelligence, machine learning, tjt, (16 more...)

arXiv.org Machine Learning

2606.22259

Country: Europe > Ukraine (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses

Neural Information Processing Systems, Jun-22-2026, 23:52:58 GMT

We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e. the only assumption on the losses is an upper bound on their second moments, denoted by $θ$. We develop adaptive algorithms that do not require any prior knowledge about the range or the second moment of the losses. Existing adaptive algorithms have what is typically considered a lower-order term in their regret guarantees. We show that this lower-order term, which is often the maximum of the losses, can actually dominate the regret bound in our setting. Specifically, we show that even with small constant $θ$, this lower-order term can scale as KT, where $K$ is the number of experts and $T$ is the time horizon. We propose adaptive algorithms with improved regret bounds that avoid the dependence on such a lower-order term and guarantee $O(\sqrt{θ T \log(K)})$ regret in the worst case, and $O(θ \log(KT)/Δ_{\min})$ regret when the losses are sampled i.i.d.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country: Europe > Netherlands (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

SPARTAN: A Sparse Transformer World Model Attending to What Matters

Neural Information Processing Systems, Jun-22-2026, 23:51:53 GMT

Capturing the interactions between entities in a structured way plays a central role in world models that flexibly adapt to changes in the environment. Recent works motivate the benefits of models that explicitly represent the structure of interactions and formulate the problem as discovering local causal structures. In this work, we demonstrate that reliably capturing these relationships in complex settings remains challenging. To remedy this shortcoming, we postulate that sparsity is a critical ingredient for the discovery of such local structures. To this end, we present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns context-dependent interaction structures between entities in a scene. By applying sparsity regularisation on the attention patterns between object-factored tokens, SPARTAN learns sparse, context-dependent interaction graphs that accurately predict future object states. We further extend our model to adapt to sparse interventions with unknown targets in the dynamics of the environment. This results in a highly interpretable world model that can efficiently adapt to changes. Empirically, we evaluate SPARTAN against the current state-of-the-art in object-centric world models in observation-based environments and demonstrate that our model can learn local causal graphs that accurately reflect the underlying interactions between objects, achieving significantly improved few-shot adaptation to dynamics changes, as well as robustness against distractors.

artificial intelligence, causal graph, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.67)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

AneuG-Flow: A Large-Scale Synthetic Dataset of Diverse Intracranial Aneurysm Geometries and Hemodynamics

Neural Information Processing Systems, Jun-22-2026, 23:51:33 GMT

Hemodynamics has a substantial influence on normal cardiovascular growth and disease formation, but requires time-consuming simulations to obtain. Deep learning algorithms that rapidly predict hemodynamic parameters can be very useful, but their development is hindered by the lack of large datasets on anatomic geometries and associated fluid dynamics. This paper presents a new large-scale dataset of intracranial aneurysm (IA) geometries and hemodynamics to support the development of neural operators that solve geometry-dependent partial differential equations governing the flow. The dataset includes 14,000 steady-flow cases and 730 pulsatile-flow cases simulated with computational fluid dynamics. All cases are computed using a laminar flow setup with more than 3 million cells.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
Asia > Singapore (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Neural Information Processing Systems, Jun-22-2026, 23:48:56 GMT

In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. Taking language-vision learning as an example, we show here how scaling law derivation can also be used for model and dataset comparison, allowing one to decide which procedure is to be preferred for pre-training. Full scaling laws based on dense measurements across a wide span of model and samples-seen scales are derived for two important language-vision learning procedures, CLIP and MaMMUT, that use either a contrastive-only loss or a contrastive and captioning text-generative loss. For the first time, we use derived scaling laws to compare both models and three open datasets, DataComp-1.4B,

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Europe (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

e1ebda145808ca45774993fb67314894-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Jun-22-2026, 23:48:07 GMT

A Related Work

Data Attribution Evaluation: Given recent developments in data attribution methods for LLMs, past work on evaluating these methods falls into two major categories: leave-one-out and task-based evaluation. Leave-one-out evaluation measures the correlation between the data attribution method's scores and model retraining, which can also be approximated using the linear datamodeling score [26]. In task-based evaluation, the data attribution method is evaluated based on its application to a downstream task, such as noisy label detection or counterfactual evaluation [3, 13].

Training Data Selection: Selecting high-quality training data is important for efficient learning in LLMs. Common approaches to data selection rely on heuristic filtering, such as de-duplication and lexicon filtering [34], or semantic rating [48, 52]. Recent works have applied data attribution methods to data selection in LLMs in both pre-training [56, 59, 15] and post-training [45, 53, 31]. These data attribution methods are dynamic and model-aware: increasing the frequency of selection is one way to take greater account of group influence, where online selection at each training step is the most fine-grained [49].

Toxicity/Bias Detection: Detecting and mitigating toxic or biased LLM outputs is crucial for safe deployment in real-world settings. Existing methods for detecting toxicity/bias in LLMs commonly include online API tools [37] or LLM classifiers [58, 21, 16, 27].

Factual Attribution: Identifying training examples that cause LLMs to generate specific factual statements is an important application of data attribution as AI tools are becoming increasingly common. Apart from baseline retrieval methods that leverage lexical/semantic similarity, like BM25 [48], Rep-Sim [44] and Gecko [33], recent works have explored the use of data attribution in tracing factual knowledge in both pre-training [6] and post-training [42, 2].

We provide below descriptions of the data attribution methods and non-attribution baselines evaluated in this work. Note that in our work, we consider non-attribution baselines as methods that do not estimate the impact of training samples on models, as detailed in [19].

Rep-Sim [44]: (Non-attribution baseline) Rep-Sim computes the cosine similarity between the last-token, last-layer hidden states of training and reference examples. It is more efficient than gradient-based data attribution methods.

BM25 [48]: (Non-attribution baseline) BM25 is a classic information retrieval algorithm that ranks training samples by lexical overlap with the query. It is significantly more efficient than gradient-based data attribution methods.
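The Rep-Sim baseline described above can be illustrated with a minimal sketch. It assumes the last-token, last-layer hidden states have already been extracted as plain lists of floats; the function names `cosine_similarity` and `rep_sim_rank` and the toy vectors are hypothetical and are not taken from the paper or from any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rep_sim_rank(train_states, reference_state):
    """Score each training hidden state against the reference and rank by similarity.

    Returns (ranking, scores), where ranking lists training indices from most
    to least similar.
    """
    scores = [cosine_similarity(h, reference_state) for h in train_states]
    ranking = sorted(range(len(train_states)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Toy hidden states (hypothetical values standing in for model representations).
train_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reference_state = [1.0, 0.0]
ranking, scores = rep_sim_rank(train_states, reference_state)
print(ranking)
```

In this toy case the first training example is identical in direction to the reference, so it ranks first; no gradients or model retraining are involved, which is why such a baseline is cheap compared with gradient-based attribution.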

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.46)
Asia > Middle East (0.28)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Government (0.93)
Media (0.93)
Information Technology (0.93)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models

Neural Information Processing Systems, Jun-22-2026, 23:48:00 GMT

Data attribution methods quantify the influence of training data on model outputs and are becoming increasingly relevant for a wide range of LLM research and applications, including dataset curation, model interpretability, and data valuation. However, there remain critical gaps in systematic LLM-centric evaluation of data attribution methods. To this end, we introduce DATE-LM (Data Attribution Evaluation in Language Models), a unified benchmark for evaluating data attribution methods through real-world LLM applications.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.93)

Industry:

Government (1.00)
Media (0.92)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Small Singular Values Matter: A Random Matrix Analysis of Transformer Models

Neural Information Processing Systems, Jun-22-2026, 23:47:01 GMT

This work analyzes singular-value spectra of weight matrices in pretrained transformer models to understand how information is stored at both ends of the spectrum.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
