State Aggregation Learning from Markov Transition Data

Neural Information Processing Systems

State aggregation is a popular model reduction method rooted in optimal control. It reduces the complexity of engineering systems by mapping the system's states into a small number of meta-states. The choice of aggregation map often depends on the data analysts' knowledge and is largely ad hoc. In this paper, we propose a tractable algorithm that estimates the probabilistic aggregation map from the system's trajectory. We adopt a soft-aggregation model, where each meta-state has a signature raw state, called an anchor state.
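As a rough illustration of the soft-aggregation idea (not the authors' estimator), the sketch below builds an empirical transition matrix from a trajectory and factors it into nonnegative, row-normalized factors, so that each raw state receives a probability distribution over a small number of meta-states. The trajectory, the state count, and the use of scikit-learn's NMF are placeholder assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def empirical_transition_matrix(trajectory, n_states):
    """Count-based estimate of P[i, j] = Pr(next state = j | current state = i)."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)

def soft_aggregation(P_hat, n_meta):
    """Rough soft-aggregation map: factor P_hat ~ U @ V with nonnegative factors,
    then renormalize rows so U[i] is a distribution over meta-states and V[k] is a
    distribution over raw states."""
    model = NMF(n_components=n_meta, init="nndsvda", max_iter=500)
    U = model.fit_transform(P_hat)          # state -> meta-state weights
    V = model.components_                   # meta-state -> state weights
    U = U / np.maximum(U.sum(axis=1, keepdims=True), 1e-12)
    V = V / np.maximum(V.sum(axis=1, keepdims=True), 1e-12)
    return U, V

rng = np.random.default_rng(0)
traj = rng.integers(0, 20, size=5000)       # placeholder trajectory
P_hat = empirical_transition_matrix(traj, n_states=20)
U, V = soft_aggregation(P_hat, n_meta=4)
```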


Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms

Neural Information Processing Systems

We study and provide instance-optimal algorithms in differential privacy by extending and approximating the inverse sensitivity mechanism. We provide two approximation frameworks, one which only requires knowledge of local sensitivities, and a gradient-based approximation for optimization problems, which are efficiently computable for a broad class of functions. We complement our analysis with instance-specific lower bounds for vector-valued functions, which demonstrate that our mechanisms are (nearly) instance-optimal under certain assumptions and that minimax lower bounds may not provide an accurate estimate of the hardness of a problem in general: our algorithms can significantly outperform minimax bounds for well-behaved instances. Finally, we use our approximation framework to develop private mechanisms for unbounded-range mean estimation, principal component analysis, and linear regression. For PCA, our mechanisms give an efficient (pure) differentially private algorithm with near-optimal rates.
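For intuition, here is a toy instantiation of an inverse-sensitivity-style mechanism for the one-dimensional median (an assumed example, not the paper's general construction): each candidate output is scored by how many data points would have to change for it to become the median, and a sample is drawn from the exponential mechanism over a fixed grid.

```python
import numpy as np

def private_median_inverse_sensitivity(x, grid, eps, rng):
    """Toy inverse-sensitivity mechanism for the median: the utility of a candidate
    output t is minus the number of data points that would have to move across t to
    make it the median, sampled via the exponential mechanism over a fixed grid."""
    x = np.sort(x)
    med = np.median(x)

    def path_length(t):
        # proxy for the inverse sensitivity len_x(t): points strictly between t and the median
        lo, hi = min(t, med), max(t, med)
        return np.sum((x > lo) & (x < hi))

    utilities = -np.array([path_length(t) for t in grid], dtype=float)
    logits = eps * utilities / 2.0            # utility changes by at most 1 per record
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(grid, p=probs)

rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-3, 3, 121)
print(private_median_inverse_sensitivity(data, grid, eps=1.0, rng=rng))
```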


DTWNet: a Dynamic Time Warping Network

Neural Information Processing Systems

Dynamic Time Warping (DTW) is widely used as a similarity measure in various domains. Because it is invariant to warping along the time axis, DTW provides more meaningful discrepancy measurements between two signals than other distance measures. In this paper, we propose a novel component for artificial neural networks. In contrast to the previous successful use of DTW as a loss function, the proposed framework leverages DTW for better feature extraction. For the first time, the DTW loss is theoretically analyzed, and a stochastic backpropagation scheme is proposed to improve the accuracy and efficiency of DTW learning. We also demonstrate that the proposed framework can be used as a data-analysis tool to perform data decomposition.
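For reference, the classical dynamic-programming DTW distance that the network builds on can be computed as below; this is the standard measure only, not the learnable DTWNet layer or its backpropagation scheme.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW between two 1-D sequences;
    the warping makes the measure invariant to local shifts along the time axis."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 80)
print(dtw_distance(np.sin(t), np.sin(t + 0.5)))   # small despite the time shift
print(dtw_distance(np.sin(t), np.cos(3 * t)))     # larger for a genuinely different signal
```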


Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift

Neural Information Processing Systems

As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify the uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing the uncertainty induced by the underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, in which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis covers scenarios where the source and target domains are related via (i) a bounded density ratio and (ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement.
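A minimal sketch of the aggregation idea under a bounded density ratio (assuming candidate intervals, source labels, and density-ratio estimates are already available; this is not the paper's exact procedure): keep the candidates whose density-ratio-weighted coverage on the source data reaches the nominal level, then return the one with the smallest average width on the target covariates.

```python
import numpy as np

def select_interval(intervals_src, y_src, weights, intervals_tgt, alpha=0.1):
    """Toy aggregation rule over K candidate interval constructors.

    intervals_src: (K, n_src, 2) lower/upper bounds evaluated on source points
    y_src:         (n_src,) source labels
    weights:       (n_src,) estimated density ratios w(x) = p_target(x) / p_source(x)
    intervals_tgt: (K, n_tgt, 2) bounds evaluated on unlabeled target covariates
    """
    w = weights / weights.sum()
    covered = (intervals_src[..., 0] <= y_src) & (y_src <= intervals_src[..., 1])
    weighted_cov = (covered * w).sum(axis=1)               # (K,) weighted source coverage
    widths = (intervals_tgt[..., 1] - intervals_tgt[..., 0]).mean(axis=1)
    feasible = weighted_cov >= 1 - alpha
    if not feasible.any():
        return int(np.argmax(weighted_cov))                # fall back to best coverage
    widths = np.where(feasible, widths, np.inf)
    return int(np.argmin(widths))                          # narrowest feasible candidate
```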


Multi-resolution Multi-task Gaussian Processes

Neural Information Processing Systems

We consider evidence integration from potentially dependent observation processes under varying spatio-temporal sampling resolutions and noise levels. We offer a multi-resolution multi-task Gaussian process (MRGP) framework that allows for both inter-task and intra-task multi-resolution and multi-fidelity. We develop shallow Gaussian Process (GP) mixtures that approximate the difficult-to-estimate joint likelihood with a composite one, and deep GP constructions that learn mappings between resolutions and naturally handle biases. In doing so, we generalize existing approaches and offer information-theoretic corrections and efficient variational approximations. We demonstrate the competitiveness of MRGPs on synthetic settings and on the challenging problem of hyper-local estimation of air pollution levels across London from multiple sensing modalities operating at disparate spatio-temporal resolutions.
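As a toy illustration of the multi-resolution ingredient (a single-task change-of-support sketch, not the MRGP mixture or deep construction), a coarse observation can be modeled as the average of a latent GP over a cell, with its covariance obtained by averaging the point-level kernel over the cells involved.

```python
import numpy as np

def rbf(x1, x2, ls=0.5, var=1.0):
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

def aggregated_kernel(cells_a, cells_b, ls=0.5, var=1.0):
    """Change-of-support kernel: each coarse observation is the average of the latent
    GP over a set of fine-grained points, so its covariance is the average of the
    point-level kernel over the two cells."""
    K = np.zeros((len(cells_a), len(cells_b)))
    for i, ca in enumerate(cells_a):
        for j, cb in enumerate(cells_b):
            K[i, j] = rbf(ca, cb, ls, var).mean()
    return K

rng = np.random.default_rng(0)
cells = [np.linspace(a, a + 1.0, 10) for a in range(5)]        # 5 coarse cells on [0, 5]
y_coarse = np.array([np.sin(c).mean() for c in cells]) + 0.01 * rng.normal(size=5)

x_fine = np.linspace(0, 5, 100)
K_cc = aggregated_kernel(cells, cells) + 1e-4 * np.eye(5)
K_fc = np.array([[rbf(np.array([x]), c).mean() for c in cells] for x in x_fine])
mean_fine = K_fc @ np.linalg.solve(K_cc, y_coarse)             # posterior mean at the fine scale
```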


Scalable Early Childhood Reading Performance Prediction

Neural Information Processing Systems

Models for student reading performance can empower educators and institutions to proactively identify at-risk students, thereby enabling early and tailored instructional interventions. However, there are no suitable publicly available educational datasets for modeling and predicting future reading performance. In this work, we introduce the Enhanced Core Reading Instruction (ECRI) dataset, a novel large-scale longitudinal tabular dataset collected across 44 schools with 6,916 students and 172 teachers. We leverage the dataset to empirically evaluate the ability of state-of-the-art machine learning models to recognize early childhood educational patterns in multivariate and partial measurements. Specifically, we demonstrate that a simple self-supervised strategy, in which a Multi-Layer Perceptron (MLP) network is pre-trained on masked inputs, outperforms several strong baselines while generalizing across diverse educational settings. To facilitate future developments in precise modeling and responsible use of models for individualized and early intervention strategies, our data and code are available at https://ecri-data.github.io/.
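A minimal sketch of the masked-input pre-training strategy described above (the architecture, hyperparameters, and data below are placeholders, not the paper's configuration): randomly mask tabular features and train an MLP to reconstruct the masked values before fine-tuning on the prediction target.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small tabular MLP; the same backbone can be pre-trained to reconstruct masked
    features and later fine-tuned for reading-performance prediction."""
    def __init__(self, d_in, d_hidden=128, d_out=None):
        super().__init__()
        d_out = d_in if d_out is None else d_out
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x):
        return self.net(x)

def masked_pretrain_step(model, x, mask_prob=0.3):
    """One self-supervised step: zero out a random fraction of input features and
    train the network to reconstruct the original values at the masked positions."""
    mask = (torch.rand_like(x) < mask_prob).float()
    x_masked = x * (1 - mask)
    recon = model(x_masked)
    return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

model = MLP(d_in=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 32)                      # placeholder tabular batch
opt.zero_grad()
loss = masked_pretrain_step(model, x)
loss.backward()
opt.step()
```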


Graph Edit Distance with General Costs Using Neural Set Divergence

Neural Information Processing Systems

Graph Edit Distance (GED) measures the (dis-)similarity between two graphs in terms of the minimum-cost edit sequence that transforms one graph into the other. However, the exact computation of GED is NP-hard, which has recently motivated the design of neural methods for GED estimation. Existing neural estimators, though, do not explicitly account for edit operations with different costs.
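For small graphs, exact GED with general (non-uniform) edit costs can be computed directly, for example with networkx; this exponential-time baseline is the quantity that neural estimators are trained to approximate. The graphs and cost values below are placeholders.

```python
import networkx as nx

G1 = nx.cycle_graph(4)   # 4 nodes, 4 edges
G2 = nx.path_graph(4)    # 4 nodes, 3 edges

# Exact GED with non-uniform edit costs: node edits are made twice as expensive
# as edge edits. Feasible only for small graphs.
ged = nx.graph_edit_distance(
    G1, G2,
    node_del_cost=lambda attrs: 2.0,
    node_ins_cost=lambda attrs: 2.0,
    edge_del_cost=lambda attrs: 1.0,
    edge_ins_cost=lambda attrs: 1.0,
)
print(ged)   # 1.0: a single edge deletion turns the cycle into the path
```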


Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox

Neural Information Processing Systems

Long-tailed data distributions pose challenges for a variety of domains like e-commerce, finance, biomedical science, and cyber security, where the performance of machine learning models is often dominated by head categories while tail categories are inadequately learned. This work aims to provide a systematic view of long-tailed learning with regard to three pivotal angles: (A1) the characterization of data long-tailedness, (A2) the data complexity of various domains, and (A3) the heterogeneity of emerging tasks.
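Two simple, commonly used characterizations of data long-tailedness (angle A1) are the imbalance factor and the Gini coefficient of the class-frequency distribution. The sketch below computes both on synthetic labels and is illustrative only, not the toolbox's implementation.

```python
import numpy as np

def longtail_stats(labels):
    """Imbalance factor (largest / smallest class count) and Gini coefficient of the
    class-frequency distribution; both grow as the distribution becomes more long-tailed."""
    _, counts = np.unique(labels, return_counts=True)
    counts = np.sort(counts)
    imbalance_factor = counts[-1] / counts[0]
    n = len(counts)
    cum = np.cumsum(counts)
    gini = 1 + 1 / n - 2 * np.sum(cum) / (n * cum[-1])
    return imbalance_factor, gini

rng = np.random.default_rng(0)
labels = rng.zipf(2.0, size=5000) % 50       # synthetic long-tailed class labels
print(longtail_stats(labels))
```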


Towards Stable Representations for Protein Interface Prediction

Neural Information Processing Systems

Knowledge of protein interactions is crucial for drug discovery applications but challenging to obtain. This work focuses on protein interface prediction, which aims to determine whether a pair of residues from different proteins interact. Existing data-driven methods have made significant progress in effectively learning protein structures. Nevertheless, they overlook the conformational changes (i.e., flexibility) that proteins undergo upon binding, leading to poor generalization. In this paper, we regard protein flexibility as an attack on the trained model and aim to defend against it for improved generalization. To this end, we propose ATProt, an adversarial training framework for protein representations that robustly defends against the attack of protein flexibility. ATProt theoretically guarantees protein representation stability under complicated protein flexibility. Experiments on various benchmarks demonstrate that ATProt consistently improves performance for protein interface prediction. Moreover, our method shows broad applicability, performing best even when provided with test structures from structure prediction models such as ESMFold and AlphaFold2.
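The sketch below shows a generic PGD-style adversarial-training step in which flexibility is treated as a bounded perturbation of residue coordinates; it illustrates the general "flexibility as attack" idea only and is not the ATProt objective or its stability guarantee. The toy model and data shapes are assumptions.

```python
import torch
import torch.nn as nn

def adversarial_step(model, coords, labels, loss_fn, eps=0.5, step=0.1, iters=5):
    """One adversarial-training step: find a bounded coordinate perturbation (a stand-in
    for conformational flexibility) by projected gradient ascent on the loss, then return
    the training loss on the perturbed structures."""
    delta = torch.zeros_like(coords, requires_grad=True)
    for _ in range(iters):
        loss = loss_fn(model(coords + delta), labels)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step * grad.sign()
            delta.clamp_(-eps, eps)          # keep the simulated flexibility bounded
    return loss_fn(model(coords + delta.detach()), labels)

# Toy residue-level classifier over 50 residues with 3-D coordinates.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 50, 2))
coords = torch.randn(8, 50, 3)               # placeholder batch of structures
labels = torch.randint(0, 2, (8,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
loss = adversarial_step(model, coords, labels, nn.CrossEntropyLoss())
loss.backward()
opt.step()
```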