AITopics | Unsupervised or Indirectly Supervised Learning

Collaborating Authors

Unsupervised or Indirectly Supervised Learning

Unsupervised learning is a branch of machine learning that learns from test data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Dynamic Modality-Camera Invariant Clustering for Unsupervised Visible-Infrared Person Re-identification

Yang, Yiming, Hu, Weipeng, Hu, Haifeng

arXiv.org Artificial IntelligenceDec-11-2024

Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) offers a more flexible and cost-effective alternative compared to supervised methods. This field has gained increasing attention due to its promising potential. Existing methods simply cluster modality-specific samples and employ strong association techniques to achieve instance-to-cluster or cluster-to-cluster cross-modality associations. However, they ignore cross-camera differences, leading to noticeable issues with excessive splitting of identities. Consequently, this undermines the accuracy and reliability of cross-modal associations. To address these issues, we propose a novel Dynamic Modality-Camera Invariant Clustering (DMIC) framework for USL-VI-ReID. Specifically, our DMIC naturally integrates Modality-Camera Invariant Expansion (MIE), Dynamic Neighborhood Clustering (DNC) and Hybrid Modality Contrastive Learning (HMCL) into a unified framework, which eliminates both the cross-modality and cross-camera discrepancies in clustering. MIE fuses inter-modal and inter-camera distance coding to bridge the gaps between modalities and cameras at the clustering level. DNC employs two dynamic search strategies to refine the network's optimization objective, transitioning from improving discriminability to enhancing cross-modal and cross-camera generalizability. Moreover, HMCL is designed to optimize instance-level and cluster-level distributions. Memories for intra-modality and inter-modality training are updated using randomly selected samples, facilitating real-time exploration of modality-invariant representations. Extensive experiments have demonstrated that our DMIC addresses the limitations present in current clustering approaches and achieve competitive performance, which significantly reduces the performance gap with supervised methods.

artificial intelligence, machine learning, person re-identification, (15 more...)

arXiv.org Artificial Intelligence

2412.08231

Country:

Asia > Singapore (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.66)

Add feedback

Leveraging Ensemble-Based Semi-Supervised Learning for Illicit Account Detection in Ethereum DeFi Transactions

Fazliani, Shabnam, Sorond, Mohammad Mowlavi, Masoudifard, Arsalan

arXiv.org Artificial IntelligenceDec-3-2024

The advent of smart contracts has enabled the rapid rise of Decentralized Finance (DeFi) on the Ethereum blockchain, offering substantial rewards in financial innovation and inclusivity. However, this growth has also introduced significant security risks, including the proliferation of illicit accounts involved in fraudulent activities. Traditional detection methods are limited by the scarcity of labeled data and the evolving tactics of malicious actors. In this paper, we propose a novel Self-Learning Ensemble-based Illicit account Detection (SLEID) framework to address these challenges. SLEID employs an Isolation Forest for initial outlier detection and a self-training mechanism to iteratively generate pseudo-labels for unlabeled accounts, thereby enhancing detection accuracy. Extensive experiments demonstrate that SLEID significantly outperforms traditional supervised approaches and recent semi-supervised models, achieving superior precision, recall, and F1-scores, particularly in detecting illicit accounts. Compared to state-of-the-art methods, our approach achieves better detection performance while reducing reliance on labeled data. The results affirm SLEID's efficacy as a robust solution for safeguarding the DeFi ecosystem and mitigating risks posed by malicious accounts.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.02408

Country:

North America > United States (0.28)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Iraq > Karbala Governorate > Karbala (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.83)
(2 more...)

Add feedback

Revisit Non-parametric Two-sample Testing as a Semi-supervised Learning Problem

Tian, Xunye, Peng, Liuhua, Zhou, Zhijian, Gong, Mingming, Liu, Feng

arXiv.org Machine LearningNov-30-2024

Learning effective data representations is crucial in answering if two samples X and Y are from the same distribution (a.k.a. the non-parametric two-sample testing problem), which can be categorized into: i) learning discriminative representations (DRs) that distinguish between two samples in a supervised-learning paradigm, and ii) learning inherent representations (IRs) focusing on data's inherent features in an unsupervised-learning paradigm. However, both paradigms have issues: learning DRs reduces the data points available for the two-sample testing phase, and learning purely IRs misses discriminative cues. To mitigate both issues, we propose a novel perspective to consider non-parametric two-sample testing as a semi-supervised learning (SSL) problem, introducing the SSL-based Classifier Two-Sample Test (SSL-C2ST) framework. While a straightforward implementation of SSL-C2ST might directly use existing state-of-the-art (SOTA) SSL methods to train a classifier with labeled data (with sample indexes X or Y) and unlabeled data (the remaining ones in the two samples), conventional two-sample testing data often exhibits substantial overlap between samples and violates SSL methods' assumptions, resulting in low test power. Therefore, we propose a two-step approach: first, learn IRs using all data, then fine-tune IRs with only labelled data to learn DRs, which can both utilize information from whole dataset and adapt the discriminative power to the given data. Extensive experiments and theoretical analysis demonstrate that SSL-C2ST outperforms traditional C2ST by effectively leveraging unlabeled data. We also offer a stronger empirically designed test achieving the SOTA performance in many two-sample testing datasets.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2412.00613

Country: North America > United States > New York (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Government (1.00)
Education > Focused Education > Special Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

T-3DGS: Removing Transient Objects for 3D Scene Reconstruction

Pryadilshchikov, Vadim, Markin, Alexander, Komarichev, Artem, Rakhimov, Ruslan, Wonka, Peter, Burnaev, Evgeny

arXiv.org Artificial IntelligenceNov-29-2024

We propose a novel framework to remove transient objects from input videos for 3D scene reconstruction using Gaussian Splatting. Our framework consists of the following steps. In the first step, we propose an unsupervised training strategy for a classification network to distinguish between transient objects and static scene parts based on their different training behavior inside the 3D Gaussian Splatting reconstruction. In the second step, we improve the boundary quality and stability of the detected transients by combining our results from the first step with an off-the-shelf segmentation method. We also propose a simple and effective strategy to track objects in the input video forward and backward in time. Our results show an improvement over the current state of the art in existing sparsely captured datasets and significant improvements in a newly proposed densely captured (video) dataset. More results and code are available at https://transient-3dgs.github.io.

artificial intelligence, machine learning, proceedings, (17 more...)

arXiv.org Artificial Intelligence

2412.00155

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.34)

Add feedback

Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm

Gu, Xiaosi, Obuchi, Tomoyuki

arXiv.org Machine LearningNov-29-2024

Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data. Although SSL has been applied in various applications and its effectiveness has been empirically demonstrated, it is still not fully understood when and why SSL performs well. Some existing theoretical studies have attempted to address this issue by modeling classification problems using the so-called Gaussian Mixture Model (GMM). These studies provide notable and insightful interpretations. However, their analyses are focused on specific purposes, and a thorough investigation of the properties of GMM in the context of SSL has been lacking. In this paper, we conduct such a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting. To this end, we employ the approximate message passing and state evolution methods, which are widely used in high-dimensional settings and originate from statistical mechanics. We deal with two estimation approaches: the Bayesian one and the l2-regularized maximum likelihood estimation (RMLE). We conduct a comprehensive comparison between these two approaches, examining aspects such as the global phase diagram, estimation error for the parameters, and prediction error for the labels. A specific comparison is made between the Bayes-optimal (BO) estimator and RMLE, as the BO setting provides optimal estimation performance and is ideal as a benchmark. Our analysis shows that with appropriate regularizations, RMLE can achieve near-optimal performance in terms of both the estimation error and prediction error, especially when there is a large amount of unlabeled data. These results demonstrate that the l2 regularization term plays an effective role in estimation and prediction in SSL approaches.

algorithm, amp, bayesian approach, (16 more...)

arXiv.org Machine Learning

2411.19553

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

HiCat: A Semi-Supervised Approach for Cell Type Annotation

Bi, Chang, Bai, Kailun, Li, Xing, Zhang, Xuekui

arXiv.org Artificial IntelligenceNov-24-2024

We introduce HiCat (Hybrid Cell Annotation using Transformative embeddings), a novel semi-supervised pipeline for annotating cell types from single-cell RNA sequencing data. HiCat fuses the strengths of supervised learning for known cell types with unsupervised learning to identify novel types. This hybrid approach incorporates both reference and query genomic data for feature engineering, enhancing the embedding learning process, increasing the effective sample size for unsupervised techniques, and improving the transferability of the supervised model trained on reference data when applied to query datasets. The pipeline follows six key steps: (1) removing batch effects using Harmony to generate a 50-dimensional principal component embedding; (2) applying UMAP for dimensionality reduction to two dimensions to capture crucial data patterns; (3) conducting unsupervised clustering of cells with DBSCAN, yielding a one-dimensional cluster membership vector; (4) merging the multi-resolution results of the previous steps into a 53-dimensional feature space that encompasses both reference and query data; (5) training a CatBoost model on the reference dataset to predict cell types in the query dataset; and (6) resolving inconsistencies between the supervised predictions and unsupervised cluster labels. When benchmarked on 10 publicly available genomic datasets, HiCat surpasses other methods, particularly in differentiating and identifying multiple new cell types. Its capacity to accurately classify novel cell types showcases its robustness and adaptability within intricate biological datasets.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.06805

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Canada > Saskatchewan > Saskatoon (0.04)
North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.04)
(3 more...)

Genre:

Research Report (1.00)
Workflow (0.87)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

GraphCL: Graph-based Clustering for Semi-Supervised Medical Image Segmentation

Wang, Mengzhu, Li, Jiao, Su, Houcheng, Yin, Nan, Yang, Liang, Li, Shen

arXiv.org Artificial IntelligenceNov-22-2024

Semi-supervised learning (SSL) has made notable advancements in medical image segmentation (MIS), particularly in scenarios with limited labeled data and significantly enhancing data utilization efficiency. Previous methods primarily focus on complex training strategies to utilize unlabeled data but neglect the importance of graph structural information. Different from existing methods, we propose a graph-based clustering for semi-supervised medical image segmentation (GraphCL) by jointly modeling graph data structure in a unified deep model. The proposed GraphCL model enjoys several advantages. Firstly, to the best of our knowledge, this is the first work to model the data structure information for semi-supervised medical image segmentation (SSMIS). Secondly, to get the clustered features across different graphs, we integrate both pairwise affinities between local image features and raw features as inputs. Extensive experimental results on three standard benchmarks show that the proposed GraphCL algorithm outperforms state-of-the-art semi-supervised medical image segmentation methods.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2411.13147

Country:

Asia > Middle East > UAE (0.04)
Asia > Macao (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Sketched Equivariant Imaging Regularization and Deep Internal Learning for Inverse Problems

Xu, Guixian, Li, Jinglai, Tang, Junqi

arXiv.org Artificial IntelligenceNov-22-2024

Equivariant Imaging (EI) regularization has become the de-facto technique for unsupervised training of deep imaging networks, without any need of ground-truth data. Observing that the EI-based unsupervised training paradigm currently has significant computational redundancy leading to inefficiency in high-dimensional applications, we propose a sketched EI regularization which leverages the randomized sketching techniques for acceleration. We then extend our sketched EI regularization to develop an accelerated deep internal learning framework -- Sketched Equivariant Deep Image Prior (Sk-EI-DIP), which can be efficiently applied for single-image and task-adapted reconstruction. Additionally, for network adaptation tasks, we propose a parameter-efficient approach for accelerating both EI-DIP and Sk-EI-DIP via optimizing only the normalization layers. Our numerical study on X-ray CT image reconstruction tasks demonstrate that our approach can achieve order-of-magnitude computational acceleration over standard EI-based counterpart in single-input setting, and network adaptation at test time.

artificial intelligence, machine learning, operator, (14 more...)

arXiv.org Artificial Intelligence

2411.05771

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.55)

Add feedback

DiM: $f$-Divergence Minimization Guided Sharpness-Aware Optimization for Semi-supervised Medical Image Segmentation

Wang, Bingli, Su, Houcheng, Yin, Nan, Wang, Mengzhu, Shen, Li

arXiv.org Artificial IntelligenceNov-19-2024

As a technique to alleviate the pressure of data annotation, semi-supervised learning (SSL) has attracted widespread attention. In the specific domain of medical image segmentation, semi-supervised methods (SSMIS) have become a research hotspot due to their ability to reduce the need for large amounts of precisely annotated data. SSMIS focuses on enhancing the model's generalization performance by leveraging a small number of labeled samples and a large number of unlabeled samples. The latest sharpness-aware optimization (SAM) technique, which optimizes the model by reducing the sharpness of the loss function, has shown significant success in SSMIS. However, SAM and its variants may not fully account for the distribution differences between different datasets. To address this issue, we propose a sharpness-aware optimization method based on $f$-divergence minimization (DiM) for semi-supervised medical image segmentation. This method enhances the model's stability by fine-tuning the sensitivity of model parameters and improves the model's adaptability to different datasets through the introduction of $f$-divergence. By reducing $f$-divergence, the DiM method not only improves the performance balance between the source and target datasets but also prevents performance degradation due to overfitting on the source dataset.

artificial intelligence, machine learning, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2411.1235

Country:

Asia > Middle East > UAE (0.04)
Asia > Macao (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.96)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Hypergraph $p$-Laplacian equations for data interpolation and semi-supervised learning

Shi, Kehan, Burger, Martin

arXiv.org Artificial IntelligenceNov-19-2024

Hypergraph learning with p-Laplacian regularization has attracted a lot of attention due to its flexibility in modeling higher-order relationships in data. This paper focuses on its fast numerical implementation, which is challenging due to the non-differentiability of the objective function and the non-uniqueness of the minimizer. We derive a hypergraph p-Laplacian equation from the subdifferential of the p-Laplacian regularization. A simplified equation that is mathematically well-posed and computationally efficient is proposed as an alternative. Numerical experiments verify that the simplified p-Laplacian equation suppresses spiky solutions in data interpolation and improves classification accuracy in semi-supervised learning. The remarkably low computational cost enables further applications.

equation, minimizer, semi-supervised learning, (15 more...)

arXiv.org Artificial Intelligence

2411.12601

Country:

Europe > Germany > Hamburg (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)

Add feedback