AITopics | Data Mining

Data Mining

Computers have become adept at extracting patterns from very large collections of data. For example, shopping transactions can reveal consumers' preferences and message traffic on social networks can reveal political trends.

A Framework for Cryptographic Verifiability of End-to-End AI Pipelines

Balan, Kar, Learney, Robert, Wood, Tim

arXiv.org Artificial Intelligence, Mar-28-2025

The increasing integration of Artificial Intelligence across multiple industry sectors necessitates robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment. This topic is particularly important in light of recent calls in various jurisdictions to introduce regulation and legislation on AI safety. In this paper, we propose a framework for complete verifiable AI pipelines, identifying key components and analyzing existing cryptographic approaches that contribute to verifiability across different stages of the AI lifecycle, from data sourcing to training, inference, and unlearning. This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets to allow downstream verification of their provenance and correctness. Our findings underscore the importance of ongoing research to develop cryptographic tools that are not only efficient for isolated AI processes, but that are efficiently 'linkable' across different processes within the AI pipeline, to support the development of end-to-end verifiable AI technologies.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.22573

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.67)

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Media (0.87)
Law > Civil Rights & Constitutional Law (0.67)
Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Bi-Level Multi-View fuzzy Clustering with Exponential Distance

Sinaga, Kristina P.

arXiv.org Artificial Intelligence, Mar-28-2025

In this study, we propose an extension of fuzzy c-means (FCM) clustering to multi-view environments. First, we introduce an exponential multi-view FCM (E-MVFCM). E-MVFCM is a centralized multi-view clustering (MVC) method that takes heat-kernel coefficients (H-KC) and weight factors into account. Second, we propose an exponential bi-level multi-view fuzzy c-means clustering (EB-MVFCM). Unlike E-MVFCM, EB-MVFCM computes feature and weight factors automatically and simultaneously. Like E-MVFCM, EB-MVFCM presents explicit forms of the H-KC to simplify the generation of the heat kernel $\mathcal{K}(t)$ in powers of the proper time $t$ during the clustering process. All the features used in this study, including tools and functions of the proposed algorithms, will be made available at https://www.github.com/KristinaP09/EB-MVFCM.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.22932

Genre: Research Report (0.90)

Technology:

Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

UniTS: A Unified Multi-Task Time Series Model

Neural Information Processing Systems, Mar-27-2025, 16:28:03 GMT

Although pre-trained transformers and reprogrammed text-based LLMs have shown strong performance on time series tasks, the best-performing architectures vary widely across tasks, with most models narrowly focused on specific areas, such as time series forecasting. Unifying predictive and generative time series tasks within a single model remains challenging.

data mining, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (1.00)
Government > Military (0.92)
Energy (0.67)
Government > Regional Government (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection

Neural Information Processing Systems, Mar-27-2025, 16:26:43 GMT

Successful detection of Out-of-Distribution (OoD) data is becoming increasingly important to ensure safe deployment of neural networks. One of the main challenges in OoD detection is that neural networks output overconfident predictions on OoD data, making it difficult to determine the OoD-ness of data solely based on their predictions. Outlier exposure addresses this issue by introducing an additional loss that encourages low-confidence predictions on OoD data during training. While outlier exposure has shown promising potential in improving OoD detection performance, all previous studies on outlier exposure have been limited to utilizing visual outliers.
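
The additional loss mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common formulation of outlier exposure, in which the extra term is the cross-entropy between the model's softmax output on an outlier and the uniform distribution over classes. The function names and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard classification loss for one in-distribution sample."""
    return -math.log(softmax(logits)[label])

def uniform_cross_entropy(logits):
    """Cross-entropy between the prediction and the uniform distribution.

    It is smallest (log K for K classes) when the prediction is uniform,
    so minimizing it pushes outlier predictions toward low confidence.
    """
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)

def outlier_exposure_loss(in_logits, in_label, out_logits, lam=0.5):
    """Classification loss plus a weighted outlier term (lam is an assumed weight)."""
    return cross_entropy(in_logits, in_label) + lam * uniform_cross_entropy(out_logits)
```

In this sketch, an overconfident prediction on the outlier raises the total loss, while a flat prediction leaves only the unavoidable `lam * log(K)` contribution.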

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East (0.46)
North America > United States > Minnesota (0.28)
Europe > Switzerland (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

case, please provide a description

Neural Information Processing Systems, Mar-27-2025, 16:23:03 GMT

This document is based on Datasheets for Datasets by [...]. Please see the most updated version here. For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? [...] and edges)? The instances of this graph-based dataset comprise [...]. Link prediction on this dataset is a multi-instance prediction task [3]. How many instances are there in total (of each type, [...]

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.14)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Law (0.94)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Information Management (0.67)

MOTIVE: A Drug-Target Interaction Graph For Inductive Link Prediction

Neural Information Processing Systems, Mar-27-2025, 16:22:59 GMT

Drug-target interaction (DTI) prediction is crucial for identifying new therapeutics and detecting mechanisms of action.

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Diverse Community Data for Benchmarking Data Privacy Algorithms

Neural Information Processing Systems, Mar-27-2025, 16:13:36 GMT

The Collaborative Research Cycle (CRC) is a National Institute of Standards and Technology (NIST) benchmarking program intended to strengthen understanding of tabular data deidentification technologies. Deidentification algorithms are vulnerable to the same bias and privacy issues that impact other data analytics and machine learning applications, and they can even amplify those issues by contaminating downstream applications. This paper summarizes four CRC contributions: theoretical work on the relationship between diverse populations and challenges for equitable deidentification; public benchmark data focused on diverse populations and challenging features; a comprehensive open source suite of evaluation metrology for deidentified datasets; and an archive of more than 450 deidentified data samples from a broad range of techniques. The initial set of evaluation results demonstrates the value of the CRC tools for investigations in this field.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.50)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Learning to Understand Open-World Video Anomalies

Neural Information Processing Systems, Mar-27-2025, 16:13:19 GMT

Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Ground > Road (1.00)
Law Enforcement & Public Safety (1.00)
Transportation > Infrastructure & Services (0.93)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.91)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Fair GLASSO: Estimating Fair Graphical Models with Unbiased Statistical Behavior

Neural Information Processing Systems, Mar-27-2025, 16:11:46 GMT

We propose estimating Gaussian graphical models (GGMs) that are fair with respect to sensitive nodal attributes.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Query-Efficient Correlation Clustering with Noisy Oracle

Neural Information Processing Systems, Mar-27-2025, 16:07:33 GMT

We study a general clustering setting in which we have n elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the weighted similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
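
The general idea of pairing a sampling strategy with an approximation algorithm can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes a fixed number of oracle queries per pair (rather than an adaptive bandit strategy) and uses the well-known pivot-based clustering scheme as one example of a classic approximation algorithm. The names `estimate_similarity`, `pivot_clustering`, `samples`, and `threshold` are hypothetical.

```python
import random

def estimate_similarity(oracle, u, v, samples):
    """Average repeated noisy oracle answers for the pair (u, v)."""
    return sum(oracle(u, v) for _ in range(samples)) / samples

def pivot_clustering(elements, oracle, samples=20, threshold=0.5, seed=0):
    """Pivot-based clustering driven by estimated similarities.

    Repeatedly pick a random pivot, group it with every remaining element
    whose estimated similarity to the pivot exceeds `threshold`, and recurse
    on the rest. Only pivot-to-element pairs are ever queried.
    """
    rng = random.Random(seed)
    remaining = list(elements)
    clusters = []
    while remaining:
        pivot = remaining.pop(rng.randrange(len(remaining)))
        cluster, rest = [pivot], []
        for x in remaining:
            if estimate_similarity(oracle, pivot, x, samples) > threshold:
                cluster.append(x)
            else:
                rest.append(x)
        remaining = rest
        clusters.append(cluster)
    return clusters

# Hypothetical example: two hidden groups and an oracle with Gaussian noise.
truth = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"}
noise = random.Random(1)

def noisy_oracle(u, v):
    base = 1.0 if truth[u] == truth[v] else 0.0
    return base + noise.gauss(0.0, 0.2)

clusters = pivot_clustering(range(5), noisy_oracle, samples=30)
```

Averaging more samples per pair reduces the chance that noise flips a decision, at the cost of more queries, which is the trade-off the fixed-confidence and fixed-budget settings formalize.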

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: