Data Mining
The Elephant in the Room: Towards A Reliable Time-Series Anomaly Detection Benchmark
Time-series anomaly detection is a fundamental task across scientific fields and industries. However, the field has long faced the "elephant in the room": critical issues, including flawed datasets, biased evaluation measures, and inconsistent benchmarking practices, that have remained largely ignored and unaddressed. We introduce TSB-AD to systematically tackle these issues in three aspects: (i) Dataset Integrity: with 1070 high-quality time series from a diverse collection of 40 datasets (doubling the size of the largest existing collection and quadrupling the number of existing curated datasets), we provide the first large-scale, heterogeneous, meticulously curated dataset that combines human perception and model interpretation; (ii) Measure Reliability: by revealing issues and biases in existing evaluation measures, we identify VUS-PR as the most reliable and accurate measure for time-series anomaly detection, addressing long-standing concerns from the community; and (iii) Comprehensive Benchmarking: with a broad spectrum of 40 detection algorithms, from statistical methods to the latest foundation models, we perform a comprehensive evaluation that includes thorough hyperparameter tuning and a unified setup for a fair and reproducible comparison. Our findings challenge the conventional wisdom regarding the superiority of advanced neural network architectures, revealing that simpler architectures and statistical methods often yield better performance. The promising performance of neural networks on multivariate cases and of foundation models on point anomalies highlights the need for further advancements in these methods.
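A minimal sketch of the kind of unified tune-then-evaluate loop described above, assuming a generic detector interface; the `vus_pr` function below is only a placeholder (point-wise AUC-PR via scikit-learn), not the actual range-aware VUS-PR implementation, and the detector's `fit`/`score` methods are hypothetical.

```python
# Sketch of a unified benchmarking loop: tune on a validation split, score on test.
# The detector interface and vus_pr placeholder are illustrative assumptions.
import numpy as np
from itertools import product
from sklearn.metrics import average_precision_score

def vus_pr(labels, scores):
    # Placeholder: real VUS-PR integrates range-aware precision/recall over a
    # range of tolerance buffers; here we fall back to point-wise AUC-PR.
    return average_precision_score(labels, scores)

def tune_and_evaluate(detector_cls, grid, train, val, test):
    """Pick the hyperparameters that maximize the measure on the validation
    split, then report the score on the held-out test split."""
    best_params, best_score = None, -np.inf
    for params in (dict(zip(grid, v)) for v in product(*grid.values())):
        det = detector_cls(**params).fit(train["x"])
        score = vus_pr(val["y"], det.score(val["x"]))
        if score > best_score:
            best_params, best_score = params, score
    det = detector_cls(**best_params).fit(train["x"])
    return vus_pr(test["y"], det.score(test["x"])), best_params
```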
Consent in Crisis: The Rapid Decline of the AI Data Commons
Ariel Lee
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI.
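As an illustration of what such an audit involves at the level of a single domain, here is a minimal sketch using Python's standard robots.txt parser; the list of AI-associated user agents is an assumption for illustration, not the paper's audit protocol.

```python
# Check whether a domain's robots.txt allows crawling by AI-associated user agents.
from urllib import robotparser

AI_AGENTS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai"]  # illustrative examples

def ai_crawl_permissions(domain: str) -> dict:
    rp = robotparser.RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    rp.read()  # fetch and parse the live robots.txt
    # For each AI-associated user agent, check whether the site root is crawlable.
    return {agent: rp.can_fetch(agent, f"https://{domain}/") for agent in AI_AGENTS}

# Example: ai_crawl_permissions("example.com") -> {"GPTBot": True, "CCBot": False, ...}
```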
Causal Deciphering and Inpainting in Spatio-Temporal Dynamics via Diffusion Model
Spatio-temporal (ST) prediction has attracted considerable attention in the earth sciences, for applications such as meteorological prediction and human mobility perception. However, the scarcity of data, coupled with the high expense of sensor deployment, results in notable data imbalances. Furthermore, models that are excessively customized and devoid of causal connections further undermine generalizability and interpretability. To this end, we establish a framework for ST prediction from a causal perspective, termed CaPaint, which aims to identify causal regions in the data and endow the model with causal reasoning ability in a two-stage process. Beyond this process, we build on the front-door adjustment as the theoretical foundation to specifically address the sub-regions identified as non-causal in the upstream phase.
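For reference, the standard front-door adjustment the framework invokes identifies the causal effect of a cause X on an outcome Y through a mediator M that fully transmits the effect; how CaPaint maps these variables onto spatio-temporal regions is specific to the paper and not restated here:

$$ P\big(y \mid \mathrm{do}(x)\big) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x') $$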
Self-Calibrating Conformal Prediction
In machine learning, model calibration and predictive inference are essential for producing reliable predictions and quantifying uncertainty to support decision-making. Recognizing the complementary roles of point and interval predictions, we introduce Self-Calibrating Conformal Prediction, a method that combines Venn-Abers calibration and conformal prediction to deliver calibrated point predictions alongside prediction intervals with finite-sample validity conditional on these predictions. To achieve this, we extend the original Venn-Abers procedure from binary classification to regression. Our theoretical framework supports analyzing conformal prediction methods that involve calibrating model predictions and subsequently constructing conditionally valid prediction intervals on the same data, where the conditioning set or conformity scores may depend on the calibrated predictions. Real-data experiments show that our method improves interval efficiency through model calibration and offers a practical alternative to feature-conditional validity.
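To make the baseline concrete, below is a minimal sketch of standard split conformal prediction with absolute-residual scores, which the self-calibrating method extends with Venn-Abers calibration and prediction-conditional validity; this is not the paper's procedure, only the building block it starts from.

```python
# Standard split conformal prediction intervals with absolute-residual scores.
import numpy as np

def split_conformal_interval(preds_cal, y_cal, preds_test, alpha=0.1):
    """Prediction intervals with marginal 1 - alpha coverage.

    preds_cal, y_cal : model predictions and labels on a held-out calibration set
    preds_test       : model predictions on new points
    """
    scores = np.abs(y_cal - preds_cal)  # absolute-residual conformity scores
    n = len(scores)
    # Finite-sample corrected quantile of the conformity scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return preds_test - q, preds_test + q
```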
Variance Reduction in Bipartite Experiments through Correlation Clustering
Jean Pouget-Abadie, Kevin Aydin, Warren Schudy, Kay Brodersen, Vahab Mirrokni
Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization, but this design has not been studied in the bipartite setting, where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments.
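A minimal sketch of the downstream mechanics, assuming a cluster partition of the diversion (randomization) units is already given: treatment is assigned at the cluster level, and each analysis unit's exposure is the treated fraction of its linked diversion units. The correlation-clustering step that produces the partition, which is the paper's contribution, is not implemented here.

```python
# Cluster randomization on diversion units, then exposure of analysis units
# across the bipartite graph. The clustering itself is assumed to be given.
import numpy as np
from collections import defaultdict

def cluster_randomize(cluster_of, rng=np.random.default_rng(0)):
    """Assign each cluster, and hence each diversion unit, to treatment (1) or control (0)."""
    clusters = sorted(set(cluster_of.values()))
    arm = {c: int(rng.integers(0, 2)) for c in clusters}
    return {unit: arm[c] for unit, c in cluster_of.items()}

def exposures(edges, assignment):
    """Exposure of each analysis unit = fraction of its linked diversion units treated."""
    neighbors = defaultdict(list)
    for diversion_unit, analysis_unit in edges:
        neighbors[analysis_unit].append(assignment[diversion_unit])
    return {a: float(np.mean(ts)) for a, ts in neighbors.items()}
```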
CycleNet: Enhancing Time Series Forecasting through Modeling Periodic Patterns
The stable periodic patterns present in time series data serve as the foundation for long-horizon forecasts. In this paper, we pioneer the exploration of explicitly modeling this periodicity to enhance model performance in long-term time series forecasting (LTSF) tasks. Specifically, we introduce the Residual Cycle Forecasting (RCF) technique, which utilizes learnable recurrent cycles to model the inherent periodic patterns within sequences and then performs predictions on the residual components of the modeled cycles. Combining RCF with a Linear layer or a shallow MLP forms the simple yet powerful method proposed in this paper, called CycleNet. CycleNet achieves state-of-the-art prediction accuracy in multiple domains, including electricity, weather, and energy, while offering significant efficiency advantages by reducing the required number of parameters by over 90%. Furthermore, as a novel plug-and-play technique, RCF can also significantly improve the prediction accuracy of existing models, including PatchTST and iTransformer.
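A minimal sketch of the residual-cycle idea, with the learnable recurrent cycle replaced by a simple per-phase average and the Linear/MLP residual forecaster replaced by a last-value carry-forward; this only illustrates the cycle-removal decomposition, not CycleNet itself.

```python
# Remove an estimated cycle, forecast the residual, then add the cycle back.
import numpy as np

def fit_cycle(x, period):
    """Average value at each phase of the cycle (stand-in for a learnable cycle)."""
    return np.array([x[p::period].mean() for p in range(period)])

def forecast_with_cycle(x, period, horizon):
    cycle = fit_cycle(x, period)
    resid = x - np.tile(cycle, len(x) // period + 1)[: len(x)]
    # Naive residual forecast (a Linear/MLP head in CycleNet); here: last value carried forward.
    resid_fc = np.full(horizon, resid[-1])
    phases = np.arange(len(x), len(x) + horizon) % period
    return resid_fc + cycle[phases]
```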
Generalization Analysis for Label-Specific Representation Learning
Min-Ling Zhang, School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China
Label-specific representation learning (LSRL), i.e., constructing a representation with specific discriminative properties for each class label, is an effective strategy for improving the performance of multi-label learning. However, the generalization analysis of LSRL is still in its infancy. Existing theoretical bounds for multi-label learning, which preserve the coupling among different components, are invalid for LSRL. To overcome this challenge and fill the gap in the generalization theory of LSRL, we develop a novel vector-contraction inequality and derive a generalization bound for a general function class of LSRL with a weaker dependency on the number of labels than the state of the art. In addition, we derive generalization bounds for typical LSRL methods, and these theoretical results reveal the impact of different label-specific representations on generalization analysis. The mild bounds, obtained without strong assumptions, explain the good generalization ability of LSRL.
Practical Differentially Private Top-k Selection with Pay-what-you-get Composition
We study the problem of top-k selection over a large domain universe subject to user-level differential privacy. Typically, the exponential mechanism or report noisy max are the algorithms used to solve this problem. However, these algorithms require querying the database for the count of each domain element. We focus on the setting where the data domain is unknown, which differs from the frequent-itemset setting, where an Apriori-type algorithm can help prune the space of domain elements to query. We design algorithms that ensure (approximate) (ε, δ > 0)-differential privacy and only need access to the true top-k̄ elements from the data, for any chosen k̄ ≥ k. We consider both the setting where a user's data can modify an arbitrary number of counts by at most 1, i.e., unrestricted sensitivity, and the setting where a user's data can modify at most some small, fixed number of counts by at most 1, i.e., restricted sensitivity. Additionally, we provide a pay-what-you-get privacy composition bound for our algorithms. That is, our algorithms might return fewer than k elements when the top-k elements are queried, but the overall privacy budget only decreases by the size of the outcome.
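For intuition, exponential-mechanism-style selection can be simulated by adding Gumbel noise to the counts and releasing the largest items in order; the hedged sketch below illustrates only that one-shot Gumbel view, leaving the correct noise scale, the unknown-domain handling via the true top-k̄ counts, and the pay-what-you-get composition accounting to the paper.

```python
# One-shot Gumbel-noise top-k selection over the counts we are allowed to query.
# The noise scale is left as a parameter; choosing it (and accounting for the
# privacy budget) is exactly what the paper's analysis provides.
import numpy as np

def gumbel_top_k(counts: dict, k: int, scale: float, rng=np.random.default_rng()):
    """counts: item -> count for the items available to the mechanism.
    scale: Gumbel noise scale (depends on epsilon, k, and sensitivity)."""
    noisy = {item: c + rng.gumbel(loc=0.0, scale=scale) for item, c in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]
```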