A method of supervised learning from conflicting data with hidden contexts

Zhang, Tianren, Jiang, Yizhou, Chen, Feng

Feb-13-2025–arXiv.org Artificial Intelligence

Conventional supervised learning assumes a stable input-output relationship. However, this assumption fails in open-ended training settings where the input-output relationship depends on hidden contexts. In this work, we formulate a more general supervised learning problem in which training data is drawn from multiple unobservable domains, each potentially exhibiting a distinct input-output map. This inherent conflict in the data renders standard empirical risk minimization (ERM) training ineffective. To address this challenge, we propose LEAF, a method that introduces an allocation function, which learns to assign conflicting data to different predictive models. We establish a connection between LEAF and a variant of the Expectation-Maximization (EM) algorithm, allowing us to derive an analytical expression for the allocation function. Finally, we provide a theoretical analysis of LEAF and empirically validate its effectiveness on both synthetic and real-world tasks involving conflicting data.
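To make the idea of allocating conflicting data to separate predictive models concrete, here is a toy sketch. It is not the authors' LEAF algorithm: it substitutes plain hard-EM over two linear models, and its allocation rule is a fixed "smallest squared error" choice rather than a learned allocation function. The helper names (`fit_slope`, `hard_em`), the synthetic data, and the initial slopes are all assumptions made for illustration.

```python
import random


def fit_slope(points):
    """Least-squares slope for the model y = w * x (no intercept)."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx if sxx > 0 else 0.0


def hard_em(points, init_slopes, n_iters=20):
    """Alternate between allocating samples to models and refitting each model."""
    slopes = list(init_slopes)
    assignment = [0] * len(points)
    for _ in range(n_iters):
        # E-step (allocation): each sample goes to the model that currently
        # predicts it with the smallest squared error.
        assignment = [
            min(range(len(slopes)), key=lambda k: (y - slopes[k] * x) ** 2)
            for x, y in points
        ]
        # M-step: refit each model only on the samples allocated to it.
        for k in range(len(slopes)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                slopes[k] = fit_slope(members)
    return slopes, assignment


# Synthetic conflicting data: a hidden context picks y = 2x or y = -2x.
rng = random.Random(0)
points = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    true_slope = 2.0 if rng.random() < 0.5 else -2.0
    points.append((x, true_slope * x + rng.gauss(0.0, 0.05)))

# A single model fit by ERM averages the two conflicting maps.
erm_slope = fit_slope(points)

# Two models with allocation can each specialize on one hidden context.
slopes, assignment = hard_em(points, init_slopes=[1.0, -1.0])

print("single-model ERM slope:", erm_slope)
print("slopes with allocation:", slopes)
```

In this toy setting the single ERM fit lands between the two conflicting maps, while the two allocated models should each recover one of them. The hidden context is never observed here; the allocation is driven only by prediction error.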

allocation function, large language model, machine learning

Genre:
- Research Report > New Finding (0.67)

Industry:
- Education (0.87)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Inductive Learning (0.92)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.92)
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (1.00)
  - Natural Language > Large Language Model (0.92)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.67)