AITopics | mining method

mining method

Graph Neural Network-Driven Hierarchical Mining for Complex Imbalanced Data

Qi, Yijiashun, Lu, Quanchao, Dou, Shiyu, Sun, Xiaoxuan, Li, Muqing, Li, Yankaiqi

arXiv.org Artificial Intelligence, Feb-6-2025

This study presents a hierarchical mining framework for high-dimensional imbalanced data, leveraging a depth graph model to address the inherent performance limitations of conventional approaches in handling complex, high-dimensional data distributions with imbalanced sample representations. By constructing a structured graph representation of the dataset and integrating graph neural network (GNN) embeddings, the proposed method effectively captures global interdependencies among samples. Furthermore, a hierarchical strategy is employed to enhance the characterization and extraction of minority class feature patterns, thereby facilitating precise and robust imbalanced data mining. Empirical evaluations across multiple experimental scenarios validate the efficacy of the proposed approach, demonstrating substantial improvements over traditional methods in key performance metrics, including pattern discovery count, average support, and minority class coverage. Notably, the method exhibits superior capabilities in minority-class feature extraction and pattern correlation analysis. These findings underscore the potential of depth graph models, in conjunction with hierarchical mining strategies, to significantly enhance the efficiency and accuracy of imbalanced data analysis. This research contributes a novel computational framework for high-dimensional complex data processing and lays the foundation for future extensions to dynamically evolving imbalanced data and multi-modal data applications, thereby expanding the applicability of advanced data mining methodologies to more intricate analytical domains.
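
Two of the metrics named in the abstract, average support and minority class coverage, can be made concrete with a small sketch. The abstract does not define them, so the definitions here are assumptions: support is taken as the fraction of records containing a pattern, and coverage as the fraction of minority-class records matched by at least one mined pattern. The toy records, labels, and patterns are hypothetical and are not taken from the paper.

```python
def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def minority_coverage(patterns, transactions, labels, minority_label):
    """Fraction of minority-class transactions matched by at least one pattern.

    Assumed definition, for illustration only.
    """
    minority = [t for t, lab in zip(transactions, labels) if lab == minority_label]
    covered = sum(1 for t in minority if any(p <= t for p in patterns))
    return covered / len(minority)


# Hypothetical toy data: three majority-class records, two minority-class records.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "d"}, {"c", "d"}]
labels = ["major", "major", "major", "minor", "minor"]
patterns = [frozenset({"a", "b"}), frozenset({"b", "d"})]

supports = [support(p, transactions) for p in patterns]
average_support = sum(supports) / len(supports)
coverage = minority_coverage(patterns, transactions, labels, "minor")
```

In this toy case only one of the two minority records is matched, which is the kind of gap a minority-focused mining strategy would aim to close.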

imbalanced data, mining, representation, (12 more...)

arXiv.org Artificial Intelligence

2502.03803

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Michigan (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Critical Example Mining for Vehicle Trajectory Prediction using Flow-based Generative Models

Ding, Zhezhang, Zhao, Huijing

arXiv.org Artificial Intelligence, Oct-21-2024

Precise trajectory prediction in complex driving scenarios is essential for autonomous vehicles. In practice, different driving scenarios present varying levels of difficulty for trajectory prediction models. However, most existing research focuses on the average precision of prediction results, while ignoring the underlying distribution of the input scenarios. This paper proposes a critical example mining method that utilizes a data-driven approach to estimate the rareness of the trajectories. By combining the rareness estimation of observations with whole trajectories, the proposed method effectively identifies a subset of data that is relatively hard to predict BEFORE feeding them to a specific prediction model. The experimental results show that the mined subset has higher prediction error when applied to different downstream prediction models, reaching +108.1% error (more than twice the dataset average) when mining 5% of the samples. Further analysis indicates that the mined critical examples include uncommon cases such as sudden braking and cancelled lane changes, which helps to better understand and improve the performance of prediction models.
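
The selection step described above can be sketched as ranking samples by a rareness score and keeping the top fraction, without consulting any prediction model. This is a minimal sketch under stated assumptions: the paper estimates rareness with a flow-based generative model, whereas here the scores are simply given as a hypothetical list, and the function name `mine_critical_examples` is illustrative. The last lines also spell out the arithmetic behind the reported figure: a +108.1% relative increase corresponds to about 2.081 times the average error.

```python
def mine_critical_examples(rareness, fraction=0.05):
    """Return indices of the rarest samples, rarest first.

    rareness: one score per sample, higher meaning rarer (assumed convention).
    fraction: share of the dataset to keep; at least one sample is returned.
    """
    count = max(1, round(len(rareness) * fraction))
    ranked = sorted(range(len(rareness)), key=lambda i: rareness[i], reverse=True)
    return ranked[:count]


# Hypothetical rareness scores for 20 trajectories.
rareness = [1.2, 0.4, 9.5, 0.8, 1.1, 0.3, 0.9, 7.7, 0.5, 1.0,
            0.6, 0.2, 1.3, 0.7, 1.4, 0.45, 0.55, 0.65, 0.75, 0.85]

top_5_percent = mine_critical_examples(rareness)             # 1 of 20 samples
top_10_percent = mine_critical_examples(rareness, fraction=0.10)

# A +108.1% relative increase means the error is (1 + 1.081) times the average.
error_ratio = 1 + 108.1 / 100
```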

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2410.16083

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

NV-Retriever: Improving text embedding models with effective hard-negative mining

Moreira, Gabriel de Souza P., Osmulski, Radek, Xu, Mengyao, Ak, Ronay, Schifferer, Benedikt, Oldridge, Even

arXiv.org Artificial Intelligence, Jul-22-2024

Text embedding models have been popular for information retrieval applications such as semantic search and Question-Answering systems based on Retrieval-Augmented Generation (RAG). Those models are typically Transformer models that are fine-tuned with contrastive learning objectives. Many papers have introduced new embedding model architectures and training approaches; however, one of the key ingredients, the process of mining negative passages, remains poorly explored or described. One of the challenging aspects of fine-tuning embedding models is the selection of high-quality hard-negative passages for contrastive learning. In this paper we propose a family of positive-aware mining methods that leverage the positive relevance score for more effective false-negative removal. We also provide a comprehensive ablation study on hard-negative mining methods over their configurations, exploring different teacher and base models. We demonstrate the efficacy of our proposed methods by introducing the NV-Retriever-v1 model, which scores 60.9 on the MTEB Retrieval (BEIR) benchmark, 0.65 points higher than previous methods. The model placed 1st when it was published to MTEB Retrieval on July 07, 2024.
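
The idea of positive-aware mining can be illustrated with a short sketch: candidates whose teacher score comes too close to the positive passage's score are treated as likely false negatives and dropped before the hardest remaining ones are picked. The abstract does not spell out the exact rule, so the ratio threshold, the default values, and the function name `mine_hard_negatives` below are assumptions chosen for illustration, not the paper's confirmed configuration.

```python
def mine_hard_negatives(pos_score, candidates, max_ratio=0.95, top_k=2):
    """Return ids of the top_k highest-scoring candidates that pass the filter.

    candidates: (passage_id, teacher_score) pairs, excluding the known positive.
    A candidate scoring above max_ratio * pos_score is treated as a likely
    false negative and removed (assumed rule, for illustration only).
    """
    threshold = max_ratio * pos_score
    kept = [(pid, score) for pid, score in candidates if score <= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in kept[:top_k]]


# Hypothetical teacher-model scores for one query.
candidates = [("p1", 0.79), ("p2", 0.60), ("p3", 0.40), ("p4", 0.10)]
hard_negatives = mine_hard_negatives(pos_score=0.80, candidates=candidates)
```

Here "p1" scores almost as high as the positive, so it is filtered out, and the two hardest remaining passages are returned.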

arxiv preprint arxiv, mining method, teacher model, (13 more...)

arXiv.org Artificial Intelligence

2407.15831

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
South America > Brazil > São Paulo (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(5 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches

Sikaroudi, Milad, Ghojogh, Benyamin, Safarpoor, Amir, Karray, Fakhri, Crowley, Mark, Tizhoosh, H. R.

arXiv.org Artificial Intelligence, Aug-10-2022

We analyze the effect of offline and online triplet mining for a colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme, i.e., farthest and nearest patches to a given anchor, both in online and offline mining. While many works focus solely on selecting the triplets online (batch-wise), we also study the effect of extreme distances and neighbor patches before training in an offline fashion. We analyze the impact of extreme cases in terms of embedding distance for offline versus online mining, including easy positive, batch semi-hard, batch hard triplet mining, neighborhood component analysis loss, its proxy version, and distance weighted sampling. We also investigate online approaches based on extreme distance, comprehensively compare offline and online mining performance based on the data patterns, and explain offline mining as a tractable generalization of online mining with a large mini-batch size. In addition, we discuss the relations of different colorectal tissue types in terms of extreme distances. We found that offline and online mining approaches have comparable performance for a specific architecture, such as ResNet-18 in this study. Moreover, we found the assorted case, including different extreme distances, is promising, especially in the online approach.
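
Batch hard triplet mining, one of the online strategies listed above, can be sketched as follows: for each anchor in a mini-batch, pick the farthest same-class sample as the positive and the nearest other-class sample as the negative. This is a generic illustration of the technique, not the authors' implementation; the 2-D embeddings and the tissue labels are hypothetical.

```python
import math


def batch_hard_triplets(embeddings, labels):
    """For each anchor, return (anchor, farthest positive, nearest negative) indices."""
    triplets = []
    for a, (emb_a, lab_a) in enumerate(zip(embeddings, labels)):
        positives = [i for i, lab in enumerate(labels) if lab == lab_a and i != a]
        negatives = [i for i, lab in enumerate(labels) if lab != lab_a]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet within the batch
        hardest_pos = max(positives, key=lambda i: math.dist(emb_a, embeddings[i]))
        hardest_neg = min(negatives, key=lambda i: math.dist(emb_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets


# Hypothetical mini-batch of five patch embeddings with two tissue labels.
embeddings = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (0.0, 2.0), (0.0, 5.0)]
labels = ["tumor", "tumor", "tumor", "stroma", "stroma"]
triplets = batch_hard_triplets(embeddings, labels)
```

An offline variant would run the same selection once over the whole dataset before training rather than per mini-batch, which is the sense in which the abstract relates offline mining to online mining with a large mini-batch size.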

extreme distance, mining, triplet, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-64556-4_26

2007.022

Country: North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.49)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Product typicality attribute mining method based on a topic clustering ensemble - Artificial Intelligence Review

#artificialintelligence, Mar-14-2022, 16:49:38 GMT

Despite the extensive application of topic models in natural language processing tasks in recent years, short Chinese comment texts, characterised by large scale, high noise and few information points, place higher demands on the accuracy and stability of the results, which existing topic models fail to satisfy. In this paper, a product typicality attribute mining method based on a topic clustering ensemble is proposed. By introducing multiple topic models into ensemble learning, the method addresses the problems of semantic representation loss, clustering inefficiency and lack of interpretability in the mining of product typicality attributes from short comment texts. By effectively combining the topic clustering algorithm based on the diversity of speech, the topic clustering ensemble algorithm based on non-negative matrix factorization, and the interpretation method of product typicality attributes based on the mean-shift algorithm, an unsupervised model of product typicality attribute mining for short comment texts is constructed. As shown by the experimental results, the modelling method achieves favourable performance in topic clustering and feature selection, suggesting its advantages in product typicality attribute identification and interpretability compared with common methods.

mining method, product typicality, typicality, (6 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Acceleration of Large Margin Metric Learning for Nearest Neighbor Classification Using Triplet Mining and Stratified Sampling

Poorheravi, Parisa Abdolrahim, Ghojogh, Benyamin, Gaudet, Vincent, Karray, Fakhri, Crowley, Mark

arXiv.org Machine Learning, Sep-29-2020

Metric learning is one of the techniques in manifold learning with the goal of finding a projection subspace for increasing and decreasing the inter- and intra-class variances, respectively. Some of the metric learning methods are based on triplet learning with anchor-positive-negative triplets. Large margin metric learning for nearest neighbor classification is one of the fundamental methods to do this. Recently, Siamese networks have been introduced with the triplet loss. Many triplet mining methods have been developed for Siamese networks; however, these techniques have not been applied on the triplets of large margin metric learning for nearest neighbor classification. In this work, inspired by the mining methods for Siamese networks, we propose several triplet mining techniques for large margin metric learning. Moreover, a hierarchical approach is proposed, for acceleration and scalability of optimization, where triplets are selected by stratified sampling in hierarchical hyper-spheres. We analyze the proposed methods on three publicly available datasets, i.e., Fisher Iris, ORL faces, and MNIST datasets.

artificial intelligence, machine learning, metric learning, (14 more...)

arXiv.org Machine Learning

2009.14244

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.83)

Internet of Things and data mining: From applications to techniques and systems - Gaber - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery - Wiley Online Library

#artificialintelligence, Jan-5-2019, 23:30:25 GMT

The massive adoption of Internet of Things (IoT) opens a plethora of new use cases, applications, frameworks, and data processing architectures. A new ecosystem of supporting technologies is being developed in parallel with IoT to enable resource provisioning for resource‐constrained devices and systems (Baktir, Ozgovde, & Ersoy, 2017; Mao, You, Zhang, Huang, & Letaief, 2017; F. Wang, Hu, Hu, Zhou, & Zhao, 2017). The core of future IoT systems will be designed by integrating mobile edge computing systems, software‐defined networks, 5G, augmented reality, and data mining (including machine learning and artificial intelligence) to name a few (Baktir et al., 2017; Mao et al., 2017). Data mining is the process of discovering hidden knowledge patterns from raw data; therefore, the execution of knowledge discovery processes in IoT environments will leverage the utility of IoT systems. In essence, data mining will play a vital role in highly interactive and intelligent IoT systems.

application, artificial intelligence, data mining, (11 more...)

#artificialintelligence

Genre: Overview (0.55)

Industry: Information Technology > Smart Houses & Appliances (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Knowledge Discovery (0.64)

Deep Metric Learning by Online Soft Mining and Class-Aware Attention

Wang, Xinshao, Hua, Yang, Kodirov, Elyor, Hu, Guosheng, Robertson, Neil M.

arXiv.org Machine Learning, Nov-4-2018

Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods, and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only select a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA) that assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.
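
The contrast between a binary mining score and a continuous one can be sketched in a few lines. The abstract does not give the scoring formula, so the Gaussian weighting, the `sigma` parameter, and the threshold rule standing in for binary mining are all assumptions made for illustration; they show the general idea of weighting every positive pair, with closer pairs weighted more, rather than reproducing OSM itself.

```python
import math


def hard_mining_weights(distances, threshold):
    """Binary mining stand-in: keep (1.0) pairs closer than threshold, drop (0.0) the rest."""
    return [1.0 if d < threshold else 0.0 for d in distances]


def soft_mining_weights(distances, sigma=1.0):
    """Soft mining: every positive pair gets a weight in (0, 1], larger when closer.

    The Gaussian form is an assumed choice for this sketch.
    """
    return [math.exp(-(d * d) / (sigma * sigma)) for d in distances]


# Hypothetical embedding distances for four positive pairs in a mini-batch.
positive_distances = [0.0, 0.5, 1.0, 2.0]
binary = hard_mining_weights(positive_distances, threshold=0.75)
soft = soft_mining_weights(positive_distances, sigma=1.0)
```

With binary scores the two more distant pairs contribute nothing to the loss, whereas the soft weights keep all four pairs in play while still emphasising the more similar positives.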

artificial intelligence, contrastive loss, machine learning, (19 more...)

arXiv.org Machine Learning

1811.01459

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

MCA-based Rule Mining Enables Interpretable Inference in Clinical Psychiatry

Gao, Qingzhu, Gonzalez, Humberto, Ahammad, Parvez

arXiv.org Machine Learning, Oct-26-2018

Development of interpretable machine learning models for clinical healthcare applications has the potential to change the way we understand, treat, and ultimately cure diseases and disorders in many areas of medicine. Interpretable ML models for clinical healthcare can serve not only as sources of predictions and estimates, but also as discovery tools for clinicians and researchers to reveal new knowledge from the data. High dimensionality of patient information (e.g., phenotype, genotype, and medical history), lack of objective measurements, and the heterogeneity in patient populations often create significant challenges in developing interpretable machine learning models for clinical psychiatry in practice. In this paper we take a step towards the development of such interpretable models: first, by developing a novel categorical rule mining method based on Multivariate Correspondence Analysis (MCA) capable of handling datasets with large numbers of feature categories, and second, by applying this method to build a transdiagnostic Bayesian Rule List model to screen for neuropsychiatric disorders using the Consortium for Neuropsychiatric Phenomics dataset. We show that our method is not only at least 100 times faster than state-of-the-art rule mining techniques for datasets with 50 features, but also provides interpretability and comparable prediction accuracy across several benchmark datasets.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1810.11558

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
