AITopics | leveraging unlabeled data

Anti-causal domain generalization: Leveraging unlabeled data

Saengkyongam, Sorawit, Gamella, Juan L., Miller, Andrew C., Peters, Jonas, Meinshausen, Nicolai, Heinze-Deml, Christina

arXiv.org Machine Learning, Feb-20-2026

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
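The idea of penalizing sensitivity to mean shifts can be illustrated with a small linear sketch. This is not the authors' estimator; it assumes a least-squares model with a squared penalty on the inner product between the weights and the between-environment mean differences, which are estimated from unlabeled covariates only. The function names, the toy data, and the penalty strength `lam` are all hypothetical.

```python
# Illustrative sketch only: least squares with a squared penalty on sensitivity
# to between-environment mean shifts. Not the paper's estimator.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def column_means(X):
    """Per-feature mean of a list of rows."""
    return [sum(col) / len(X) for col in zip(*X)]


def shift_directions(unlabeled_envs):
    """Each environment's covariate mean minus the average of all means.

    No labels are needed here, only covariates.
    """
    means = [column_means(X) for X in unlabeled_envs]
    center = column_means(means)
    return [[m - c for m, c in zip(mu, center)] for mu in means]


def fit_mean_shift_penalized(X, y, directions, lam):
    """Minimize sum_i (y_i - x_i . w)^2 + lam * sum_d (d . w)^2 in closed form."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X)
          + lam * sum(d[i] * d[j] for d in directions)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(A, b)


# Toy data (assumed): both covariates track y, but only the second one
# has a mean that moves between the two unlabeled environments.
labeled_X = [[1.1, 0.9], [1.9, 2.1], [3.2, 2.9], [3.8, 4.1]]
labeled_y = [1.0, 2.0, 3.0, 4.0]
unlabeled_envs = [
    [[1.0, 1.0], [3.0, 3.0]],
    [[1.0, 4.0], [3.0, 6.0]],
]

directions = shift_directions(unlabeled_envs)
w_plain = fit_mean_shift_penalized(labeled_X, labeled_y, directions, lam=0.0)
w_pen = fit_mean_shift_penalized(labeled_X, labeled_y, directions, lam=100.0)
print("unpenalized weights:", w_plain)
print("penalized weights:  ", w_pen)
```

In this toy setup the penalty pushes weight away from the covariate whose mean varies across environments and onto the stable one, which is the qualitative behavior the abstract describes for the mean-based method.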

artificial intelligence, domain generalization, machine learning, (16 more...)

arXiv:2602.17187

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Diagnostic Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.64)

Leveraging Unlabeled Data: A Guide to Semi-Supervised Learning

#artificialintelligence, Mar-8-2023, 09:32:14 GMT

Semi-supervised learning (SSL) is a machine learning technique that aims to improve the accuracy and efficiency of models by leveraging both labeled and unlabeled data. In this technique, a model is first trained on a small amount of labeled data and then used to make predictions on a much larger set of unlabeled data. The model then learns from these predictions and adjusts its parameters to improve its accuracy. By contrast, in traditional supervised learning, a model is trained on a dataset in which every input has a corresponding output label. The model then uses this labeled data to learn patterns and make predictions on new, unseen data.
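The train, predict, and retrain loop described above is often called self-training or pseudo-labeling. The following is a minimal sketch of that loop, assuming a nearest-centroid classifier and a distance-margin confidence rule; neither choice comes from the article, and `self_train`, `min_margin`, and the toy points are hypothetical.

```python
# Illustrative self-training (pseudo-labeling) loop with a nearest-centroid model.

def centroids(points, labels):
    """Mean point of each class."""
    sums, counts = {}, {}
    for p, c in zip(points, labels):
        acc = sums.setdefault(c, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(cents, p):
    """Return (label, margin): nearest class and the gap to the runner-up."""
    ranked = sorted((sq_dist(p, m), c) for c, m in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(labeled, labels, unlabeled, min_margin=1.0, rounds=5):
    """Repeatedly fit, pseudo-label confident unlabeled points, and refit."""
    points, tags, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(points, tags)
        remaining, added = [], False
        for p in pool:
            label, margin = predict(cents, p)
            if margin >= min_margin:
                # Confident prediction: treat it as a training label.
                points.append(p)
                tags.append(label)
                added = True
            else:
                remaining.append(p)
        pool = remaining
        if not added:
            break
    return centroids(points, tags), pool


labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = ["a", "b"]
unlabeled = [(0.5, 0.2), (3.8, 4.1), (1.0, 1.0), (3.0, 3.0), (2.0, 2.1)]

model, leftover = self_train(labeled, labels, unlabeled)
print("class centroids:", model)
print("still unlabeled:", leftover)
```

The margin threshold is the main design choice: set too low, wrong pseudo-labels get reinforced in later rounds; set too high, most of the unlabeled data is never used.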

leveraging unlabeled data, ssl, unlabeled data, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)

Leveraging Unlabeled Data

Communications of the ACM, May-24-2020, 01:09:22 GMT

Despite the rapid advances it has made over the past decade, deep learning presents many industrial users with problems when they try to implement the technology, issues that the Internet giants have worked around through brute force. "The challenge that today's systems face is the amount of data they need for training," says Tim Ensor, head of artificial intelligence (AI) at U.K.-based technology company Cambridge Consultants. "On top of that, it needs to be structured data." Most of the commercial applications and algorithm benchmarks used to test deep neural networks (DNNs) consume copious quantities of labeled data; for example, images or pieces of text that have already been tagged in some way by a human to indicate what the sample represents. The Internet giants, which have collected the most data for use in training deep learning systems, have often resorted to crowdsourcing measures such as asking people to prove they are human during logins by identifying objects in a collection of images, or simply buying manual labor through services such as Amazon's Mechanical Turk.

artificial intelligence, deep learning, machine learning, (16 more...)

AI-Alerts: 2020 > 2020-05 > AAAI AI-Alert for May 26, 2020 (1.00)

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
Europe > United Kingdom > England > Surrey (0.05)
Europe > Netherlands > South Holland > Leiden (0.05)
(2 more...)

Industry: Information Technology > Services (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Leveraging Unlabeled Data to Scale Blocking for Record Linkage

AAAI Conferences, Jul-19-2011

Record linkage is the process of matching records between two (or multiple) data sets that represent the same real-world entity. An exhaustive record linkage process involves computing the similarities between all pairs of records, which can be very expensive for large data sets. Blocking techniques alleviate this problem by dividing the records into blocks and only comparing records within the same block. To be adaptive from domain to domain, one category of blocking techniques formalizes the construction of a blocking scheme as a machine learning problem. In the process of learning the best blocking scheme, previous learning-based techniques utilize only a set of labeled data. However, since the set of labeled data is usually not large enough to characterize the unseen (unlabeled) data well, the resultant blocking scheme may perform poorly on the unseen data by generating too many candidate matches. To address this, we propose to utilize unlabeled data (in addition to labeled data) for learning blocking schemes. Our experimental results show that using unlabeled data in learning can remarkably reduce the number of candidate matches while keeping the same level of coverage for true matches.
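To make the trade-off between candidate matches and coverage concrete, here is a minimal blocking sketch. It uses a fixed, hand-picked blocking key (the zip code) rather than a learned blocking scheme, so it does not reproduce the paper's method; the records, field names, and the `candidate_pairs` helper are hypothetical.

```python
# Illustrative blocking for record linkage with a fixed (not learned) key.
from collections import defaultdict


def candidate_pairs(left, right, key):
    """Pair up only those records that share the same blocking key."""
    blocks = defaultdict(list)
    for rid, rec in right.items():
        blocks[key(rec)].append(rid)
    return {
        (lid, rid)
        for lid, rec in left.items()
        for rid in blocks.get(key(rec), [])
    }


left = {
    1: {"name": "Ann Lee", "zip": "10001"},
    2: {"name": "Bob Ray", "zip": "10002"},
    3: {"name": "Cara Diaz", "zip": "10003"},
}
right = {
    "a": {"name": "Anne Lee", "zip": "10001"},
    "b": {"name": "Robert Ray", "zip": "10002"},
    "c": {"name": "Dan Poe", "zip": "10001"},
}
true_matches = {(1, "a"), (2, "b")}  # assumed ground truth for the toy data

candidates = candidate_pairs(left, right, key=lambda rec: rec["zip"])
all_pairs = len(left) * len(right)

# Share of exhaustive comparisons avoided by blocking.
reduction = 1 - len(candidates) / all_pairs
# Share of true matches that survive blocking (coverage).
coverage = len(candidates & true_matches) / len(true_matches)

print("candidate pairs:", sorted(candidates))
print("reduction ratio:", reduction)
print("coverage of true matches:", coverage)
```

A stricter key would shrink the candidate set further but risks dropping true matches; a learned blocking scheme is a way of choosing that key from data instead of by hand.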

candidate match, conjunction, unlabeled data, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
