Data Science
Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks.
FairJob: A Real-World Dataset for Fairness in Online Systems
We introduce a fairness-aware dataset for job recommendation in advertising, designed to foster research in algorithmic fairness within real-world scenarios. It was collected and prepared to comply with privacy standards and business confidentiality. An additional challenge is the lack of access to protected user attributes such as gender, for which we propose a solution to obtain a proxy estimate. Despite being anonymized and including a proxy for a sensitive attribute, our dataset preserves predictive power and provides a realistic and challenging benchmark. This dataset addresses a significant gap in the availability of fairness-focused resources for high-impact domains like advertising, where the real-world stakes are access to valuable employment opportunities and where balancing fairness and utility is a common industrial challenge. We also explore the various stages of the advertising process where unfairness can occur and introduce a method to compute a fair utility metric for job recommendations in online systems from a biased dataset. Experimental evaluations of bias mitigation techniques on the released dataset demonstrate potential improvements in fairness and the associated trade-offs with utility.
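To make the proxy-attribute idea concrete, here is a minimal sketch of how a fairness gap can be estimated when only a probabilistic proxy of the protected attribute is available; the column names and the soft-weighting scheme are illustrative assumptions, not the dataset's actual schema or the paper's metric.

```python
# A minimal sketch (not the authors' exact method): estimating a demographic
# parity gap when only a *proxy* probability of the protected attribute is
# available. Names (`scores`, `p_female`) are hypothetical.
import numpy as np

def proxy_demographic_parity_gap(scores: np.ndarray, p_protected: np.ndarray) -> float:
    """Gap between mean scores of the (soft) protected and non-protected groups.

    `p_protected[i]` is a proxy estimate of P(user i belongs to the protected
    group); scores are weighted by these probabilities instead of hard labels.
    """
    mean_protected = np.average(scores, weights=p_protected)
    mean_other = np.average(scores, weights=1.0 - p_protected)
    return float(mean_protected - mean_other)

# Usage on synthetic data
rng = np.random.default_rng(0)
scores = rng.random(1000)           # a model's job-ad quality/click scores
p_female = rng.beta(2, 2, 1000)     # proxy estimate of the sensitive attribute
print(proxy_demographic_parity_gap(scores, p_female))
```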
Data curation via joint example selection further accelerates multimodal learning Olivier J. Hénaff
Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly prioritizing batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the dependencies between data points and thus naturally yield criteria for measuring the joint learnability of a batch. We derive a simple and tractable algorithm for selecting such batches, which significantly accelerates training beyond individually prioritized data points. As performance improves by selecting from large super-batches, we also leverage recent advances in model approximation to reduce the computational overhead of scoring.
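The following sketch illustrates the joint-scoring idea under stated assumptions (it is not the paper's exact algorithm): because a contrastive loss couples every example in a batch, the learnability of a sub-batch, measured here as learner loss minus reference-model loss, is a property of the batch as a whole, so candidate sub-batches drawn from a super-batch can be scored and selected jointly.

```python
# A minimal sketch of joint batch scoring under a contrastive objective.
# The learner/reference embeddings below are random stand-ins.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss; it couples all examples in the batch."""
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def joint_learnability(img_l, txt_l, img_r, txt_r, idx):
    """Learnability of sub-batch `idx`: learner loss minus reference loss.

    Since the contrastive loss depends on every pair within the sub-batch,
    this score is joint over the batch, not a sum of per-example scores.
    """
    return contrastive_loss(img_l[idx], txt_l[idx]) - contrastive_loss(img_r[idx], txt_r[idx])

# Pick the best of a few random sub-batches from a larger "super-batch".
super_batch, sub_batch = 256, 64
img_l = F.normalize(torch.randn(super_batch, 128), dim=1)   # learner embeddings
txt_l = F.normalize(torch.randn(super_batch, 128), dim=1)
img_r = F.normalize(torch.randn(super_batch, 128), dim=1)   # reference embeddings
txt_r = F.normalize(torch.randn(super_batch, 128), dim=1)
candidates = [torch.randperm(super_batch)[:sub_batch] for _ in range(8)]
best = max(candidates, key=lambda idx: joint_learnability(img_l, txt_l, img_r, txt_r, idx).item())
```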
FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction
Federated learning (FL) has rapidly evolved as a promising paradigm that enables collaborative model training across distributed participants without exchanging their local data. Despite its broad applications in fields such as computer vision, graph learning, and natural language processing, the development of a data projection model that can effectively visualize data in the context of FL is crucial yet remains heavily under-explored. Neighbor embedding (NE) is an essential technique for visualizing complex high-dimensional data, but collaboratively learning a joint NE model is difficult. The key challenge lies in the objective function, as effective visualization algorithms like NE require computing loss functions among pairs of data points.
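A minimal sketch of why NE objectives resist federation: a t-SNE-style loss, shown below, depends on affinities between all pairs of points, including pairs that would reside on different clients in an FL deployment.

```python
# Illustrative t-SNE-style objective (not the paper's surrogate method):
# the KL loss requires affinities between *every* pair of points.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def tsne_style_loss(X_high, Y_low, sigma=1.0):
    """KL divergence between high-dim and low-dim pairwise affinities."""
    P = np.exp(-squareform(pdist(X_high, "sqeuclidean")) / (2 * sigma**2))
    np.fill_diagonal(P, 0.0)
    P /= P.sum()
    Q = 1.0 / (1.0 + squareform(pdist(Y_low, "sqeuclidean")))   # Student-t kernel
    np.fill_diagonal(Q, 0.0)
    Q /= Q.sum()
    eps = 1e-12
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))

X = np.random.default_rng(0).normal(size=(100, 50))   # high-dimensional data
Y = np.random.default_rng(1).normal(size=(100, 2))    # candidate 2-D embedding
print(tsne_style_loss(X, Y))
```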
EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models Shangquan Sun, Hyunhee Park
Image restoration has experienced significant advancements due to the development of deep learning. Nevertheless, it encounters challenges related to ill-posed problems, resulting in deviations between single-model predictions and ground truths. Ensemble learning, as a powerful machine learning technique, aims to address these deviations by combining the predictions of multiple base models. Most existing works adopt ensemble learning during the design of restoration models, while only limited research focuses on the inference-stage ensemble of pre-trained restoration models. Regression-based methods are inefficient at inference time, leading researchers in academia and industry to prefer simple averaging for post-training ensembling.
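To contrast post-training ensembling strategies, the sketch below compares plain averaging with a weighted ensemble whose weights are fit by constrained least squares on held-out data; the least-squares weighting is a simple stand-in for the paper's GMM-based weight estimation, not the EnsIR algorithm itself.

```python
# A minimal sketch: averaging vs. a weighted post-training ensemble whose
# weights are fit on a validation set (stand-in for GMM-based weighting).
import numpy as np

def average_ensemble(preds):                      # preds: (n_models, n_pixels)
    return preds.mean(axis=0)

def weighted_ensemble(preds_val, target_val, preds_test):
    """Fit non-negative, sum-to-one weights on validation predictions."""
    w, _, _, _ = np.linalg.lstsq(preds_val.T, target_val, rcond=None)
    w = np.clip(w, 0, None)
    w /= w.sum()                                   # crude simplex projection
    return w @ preds_test

rng = np.random.default_rng(0)
truth_val, truth_test = rng.random(500), rng.random(500)
preds_val = truth_val + 0.1 * rng.normal(size=(3, 500))    # 3 base restorers
preds_test = truth_test + 0.1 * rng.normal(size=(3, 500))
mse = lambda p, t: float(np.mean((p - t) ** 2))
print(mse(average_ensemble(preds_test), truth_test))
print(mse(weighted_ensemble(preds_val, truth_val, preds_test), truth_test))
```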
Continuous Temporal Domain Generalization
Temporal Domain Generalization (TDG) addresses the challenge of training predictive models under temporally varying data distributions. Traditional TDG approaches typically focus on domain data collected at fixed, discrete time intervals, which limits their capability to capture the inherent dynamics within continuously evolving and irregularly observed temporal domains. To overcome this, this work formalizes the concept of Continuous Temporal Domain Generalization (CTDG), where domain data are derived from continuous times and are collected at arbitrary times. CTDG tackles critical challenges including: 1) characterizing the continuous dynamics of both data and models, 2) learning complex high-dimensional nonlinear dynamics, and 3) optimizing and controlling generalization across continuous temporal domains. To address these challenges, we propose a Koopman operator-driven continuous temporal domain generalization (Koodos) framework. We formulate the problem within a continuous dynamic system and leverage Koopman theory to learn the underlying dynamics; the framework is further enhanced with a comprehensive optimization strategy equipped with analysis and control driven by prior knowledge of the dynamics patterns. Extensive experiments demonstrate the effectiveness and efficiency of our approach.
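A minimal sketch of the Koopman idea underlying CTDG (not the Koodos framework itself): fit a linear operator K so that latent states obey dz/dt = Kz, then extrapolate to arbitrary continuous times via the matrix exponential. All quantities below are synthetic.

```python
# Fit a linear operator K from irregularly sampled snapshots, then
# generalize to an unseen continuous time with expm(K * t).
import numpy as np
from scipy.linalg import expm, lstsq

rng = np.random.default_rng(0)
K_true = np.array([[0.0, -1.0], [1.0, -0.1]])        # latent linear dynamics
times = np.sort(rng.uniform(0, 5, 30))               # irregular observation times
Z = np.stack([expm(K_true * t) @ np.array([1.0, 0.0]) for t in times])

# Estimate K via finite differences between consecutive irregular snapshots.
dt = np.diff(times)[:, None]
dZ = np.diff(Z, axis=0) / dt
K_hat, *_ = lstsq(Z[:-1], dZ)                        # solves Z @ X ≈ dZ
K_hat = K_hat.T                                      # dz/dt = K_hat @ z

t_future = 7.5                                       # beyond the observed range
z_pred = expm(K_hat * t_future) @ np.array([1.0, 0.0])
print(z_pred)
```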
Identifiable Shared Component Analysis of Unpaired Multimodal Mixtures
A core task in multi-modal learning is to integrate information from multiple feature spaces (e.g., text and audio), offering modality-invariant, essential representations of data. Recent research has shown that classical tools such as canonical correlation analysis (CCA) provably identify the shared components up to minor ambiguities when samples in each modality are generated from a linear mixture of shared and private components. Such identifiability results were obtained under the condition that the cross-modality samples are aligned/paired according to their shared information. This work takes a step further, investigating shared component identifiability from multi-modal linear mixtures where cross-modality samples are unaligned. A distribution divergence minimization-based loss is proposed, under which a suite of sufficient conditions ensuring identifiability of the shared components are derived. Our conditions are based on cross-modality distribution discrepancy characterization and density-preserving transform removal, which are much milder than those of existing studies relying on independent component analysis. More relaxed conditions are also provided by adding reasonable structural constraints, motivated by side information available in various applications. The identifiability claims are thoroughly validated using synthetic and real-world data.
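The sketch below illustrates distribution-divergence matching for unpaired modalities under stated assumptions (the paper establishes identifiability conditions rather than prescribing this exact recipe): linear maps are learned so that the distributions of the two projected modalities agree, with no sample pairing used; maximum mean discrepancy serves as the divergence here.

```python
# Learn linear maps A, B so the *distributions* of the projected modalities
# match, with deliberately shuffled (unaligned) cross-modality samples.
import torch

def rbf_mmd(x, y, sigma=1.0):
    """MMD^2 with an RBF kernel between two sample sets."""
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

torch.manual_seed(0)
shared = torch.randn(512, 2)                                   # shared components
X = shared @ torch.randn(2, 5) + 0.1 * torch.randn(512, 5)     # modality 1
Y = shared[torch.randperm(512)] @ torch.randn(2, 4) \
    + 0.1 * torch.randn(512, 4)                                # modality 2, unaligned

A = torch.randn(5, 2, requires_grad=True)
B = torch.randn(4, 2, requires_grad=True)
opt = torch.optim.Adam([A, B], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = rbf_mmd(X @ A, Y @ B)      # match the projected distributions
    loss.backward()
    opt.step()
```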
Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction
Time series prediction problems are becoming increasingly high-dimensional in modern applications, such as climatology and demand forecasting. For example, in the latter problem, the number of items for which demand needs to be forecast might be as large as 50,000. In addition, the data is generally noisy and full of missing values. Thus, modern applications require methods that are highly scalable and can handle noisy data with corruptions or missing values. However, classical time series methods usually fall short of handling these issues.
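A minimal sketch of the temporal regularized matrix factorization idea: factor the series matrix Y ≈ FX and penalize the temporal factors with an autoregressive term, so that missing entries and forecasts follow the learned dynamics. An AR(1) penalty with diagonal weights is assumed here for brevity; the full model supports general lag sets.

```python
# Joint gradient descent on a factorization with a temporal AR(1) regularizer.
# The fit term uses only observed entries, handling missing values naturally.
import torch

torch.manual_seed(0)
n_series, n_time, rank = 50, 200, 5
Y = torch.randn(n_series, n_time)
mask = (torch.rand_like(Y) > 0.2).float()            # ~20% of entries missing

F_ = torch.randn(n_series, rank, requires_grad=True)  # series factors
X = torch.randn(rank, n_time, requires_grad=True)     # temporal factors
W = torch.randn(rank, requires_grad=True)             # diagonal AR(1) weights
opt = torch.optim.Adam([F_, X, W], lr=1e-2)
lam = 0.1
for step in range(1000):
    opt.zero_grad()
    fit = ((mask * (Y - F_ @ X)) ** 2).sum()                   # observed entries only
    ar = ((X[:, 1:] - W[:, None] * X[:, :-1]) ** 2).sum()      # temporal regularizer
    (fit + lam * ar).backward()
    opt.step()

# Forecast one step ahead by rolling the AR dynamics forward.
with torch.no_grad():
    y_next = F_ @ (W * X[:, -1])
```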
Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees Nicolas Beltran-Velez, Alp Kucukelbir
Probabilistic prediction aims to compute predictive distributions rather than single point predictions. These distributions enable practitioners to quantify uncertainty, compute risk, and detect outliers. However, most probabilistic methods assume parametric responses, such as Gaussian or Poisson distributions. When these assumptions fail, such models produce poor predictions and poorly calibrated uncertainty estimates. In this paper, we propose Treeffuser, an easy-to-use method for probabilistic prediction on tabular data.
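The toy sketch below conveys the Treeffuser idea, assuming a DDPM-style discretization (this is not the library's implementation): a gradient-boosted tree is trained to predict the noise added to y at a given noise level, conditioned on x, and ancestral sampling then draws from an approximation of p(y | x).

```python
# Toy conditional diffusion with a gradient-boosted tree as the noise model.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, T = 5000, 50
betas = np.linspace(1e-4, 0.2, T)
alphas_bar = np.cumprod(1.0 - betas)

# Toy heteroscedastic data: the noise scale of y depends on x.
x = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(2 * x[:, 0]) + (0.1 + 0.3 * np.abs(x[:, 0])) * rng.normal(size=n)

# Training set: (x, noised y, t) -> the noise eps that was added.
t = rng.integers(0, T, size=n)
eps = rng.normal(size=n)
y_t = np.sqrt(alphas_bar[t]) * y + np.sqrt(1 - alphas_bar[t]) * eps
model = GradientBoostingRegressor(n_estimators=200).fit(
    np.column_stack([x, y_t, t / T]), eps)

def sample(x_star, n_samples=100):
    """Ancestral sampling: approximate draws from p(y | x_star)."""
    ys = rng.normal(size=n_samples)
    for step in reversed(range(T)):
        f = np.column_stack([np.full((n_samples, 1), x_star), ys,
                             np.full(n_samples, step / T)])
        eps_hat = model.predict(f)
        ys = (ys - betas[step] / np.sqrt(1 - alphas_bar[step]) * eps_hat) \
             / np.sqrt(1.0 - betas[step])
        if step > 0:
            ys += np.sqrt(betas[step]) * rng.normal(size=n_samples)
    return ys

print(np.percentile(sample(1.0), [5, 50, 95]))   # predictive interval at x* = 1
```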
Robust Conformal Prediction Using Privileged Information
We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework for constructing prediction sets that are valid under the i.i.d. assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift; however, these features are only available during training and absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.
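For concreteness, here is a minimal sketch of the weighted split-conformal machinery that the paper generalizes: calibration scores are reweighted before taking the quantile. The uniform weights below are placeholders; in the paper's setting, the weights would be derived from the privileged information to correct for the distribution shift.

```python
# Weighted split conformal prediction: the quantile of the calibration
# nonconformity scores is taken under (normalized) per-sample weights.
import numpy as np

def weighted_quantile(scores, weights, q):
    """q-th quantile of `scores` under normalized `weights`.

    Following the usual convention, one unit of mass is reserved for the
    test point, so the weighted CDF never quite reaches 1.
    """
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cdf = np.cumsum(w / (w.sum() + 1.0))
    idx = np.searchsorted(cdf, q)
    return s[idx] if idx < len(s) else np.inf

rng = np.random.default_rng(0)
n = 1000
y_cal = rng.normal(size=n)
y_hat = y_cal + 0.3 * rng.normal(size=n)      # some model's predictions
scores = np.abs(y_cal - y_hat)                # nonconformity scores
weights = np.ones(n)                          # placeholder; PI-derived in the paper
q_hat = weighted_quantile(scores, weights, 0.9)
# Prediction set at a new point: [y_hat_new - q_hat, y_hat_new + q_hat]
```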