Fairness, Accuracy, and Unreliable Data

Aug-28-2024, arXiv.org Artificial Intelligence

This thesis investigates three areas targeted at improving the reliability of machine learning: fairness in machine learning, strategic classification, and algorithmic robustness. Each of these domains has special properties or structure that can complicate learning. A theme throughout this thesis is identifying ways in which a 'plain' empirical risk minimization algorithm will be misleading or ineffective because of a mismatch between classical learning theory assumptions and specific properties of some data distribution in the wild. The overarching research goal for these related topics is to provide a crisp mathematical model for each learning scenario, one that exposes different failure modes and makes trade-offs between important metrics explicit, in order to offer algorithmic recommendations to practitioners and expose gaps for future research. By tuning our learning algorithms to be more distribution-specific in these scenarios, the resulting learned system will exhibit higher utility and avoid catastrophic failure modes. This research is grounded in the theory of machine learning and is fundamentally mathematical in nature, with empirical support when appropriate. Theory is particularly important in these sensitive domains because it is often unclear which poor behavior in deployed systems is a natural or benign consequence of applying a learning system to the underlying distribution, and which is problematic but correctable behavior caused by an error in algorithm design or implementation. It is likewise unclear how to mitigate these issues, or what a successful outcome even looks like in each problem. Theoretical understanding in each domain can help guide best practices and allow for the design of effective, reliable, and robust systems.
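To make the abstract's point about 'plain' empirical risk minimization concrete, here is a minimal toy sketch in Python of one such mismatch, using strategic classification as the setting. It is not taken from the thesis: the one-dimensional threshold classifier, the manipulation budget, and the function names (erm_threshold, best_response, accuracy) are all assumptions chosen for illustration. It also assumes that agents only game their feature, so their true labels do not change when they move.

```python
# Toy illustration (hypothetical setup, not the thesis's model):
# a threshold classifier fit by plain ERM, then deployed on agents
# who can shift their feature by up to `budget` to get accepted.

def erm_threshold(xs, ys):
    """Return the threshold t minimizing training error for 'predict 1 iff x >= t'."""
    candidates = sorted(set(xs)) + [max(xs) + 1]

    def errors(t):
        return sum((x >= t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def best_response(x, t, budget):
    """An agent below the threshold moves exactly to it if that is within budget."""
    if x < t and t - x <= budget:
        return t
    return x


def accuracy(xs, ys, t, budget=0):
    """Accuracy of threshold t after agents best-respond; true labels stay fixed."""
    moved = [best_response(x, t, budget) for x in xs]
    correct = sum((m >= t) == bool(y) for m, y in zip(moved, ys))
    return correct / len(xs)


xs = list(range(10))             # features 0..9
ys = [int(x >= 5) for x in xs]   # true label: 1 iff x >= 5
budget = 2                       # assumed manipulation budget

t_erm = erm_threshold(xs, ys)
acc_static = accuracy(xs, ys, t_erm)                     # no manipulation
acc_gamed = accuracy(xs, ys, t_erm, budget)              # agents game the ERM rule
acc_shifted = accuracy(xs, ys, t_erm + budget, budget)   # threshold raised by the budget

print(t_erm, acc_static, acc_gamed, acc_shifted)
```

In this toy setup the ERM threshold is perfect on the training distribution but loses accuracy once agents respond to it, while raising the threshold by the budget restores accuracy. This only works so cleanly because the budget is known and manipulation never changes the true label; both are simplifying assumptions of the sketch.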


Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States
    - New York > New York County
      - New York City (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - California
      - San Diego County > San Diego (0.04)
      - Alameda County > Berkeley (0.04)
  - Canada > Alberta
    - Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe
  - Germany (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - Middle East
    - Republic of Türkiye > Samsun Province
      - Samsun (0.04)
    - Israel > Southern District
      - Eilat (0.04)
- Africa > South Sudan
  - Equatoria > Central Equatoria > Juba (0.04)

Genre:
- Research Report > New Finding (1.00)
- Workflow (0.92)

Industry:
- Health & Medicine (1.00)
- Banking & Finance (1.00)
- Law (0.92)
- Government (0.67)
- Energy (0.67)
- Education > Educational Setting (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Optimization (1.00)
    - Agents (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Performance Analysis > Accuracy (1.00)
    - Computational Learning Theory (1.00)