Conditional Sparse Linear Regression

Aug-17-2016–arXiv.org Machine Learning

Linear regression, the fitting of linear relationships among variables in a data set, is a standard tool in data analysis. In particular, for the sake of interpretability and utility in further analysis, we desire to find highly sparse linear relationships, i.e., relationships involving only a few variables. Of course, such simple linear relationships often will not hold across an entire population. More frequently, however, there will exist conditions (perhaps a range of parameters or a segment of a larger population) under which such sparse models fit the data quite well. For example, Rosenfeld et al. [16] used data mining heuristics to identify small segments of a population in which a few additional risk factors were highly predictive of certain kinds of cancer, whereas these same risk factors were not significant in the overall population. Simple rules for special cases may also hint at the more complex general rules. More generally, we need to develop new techniques to reason about populations in which most members are atypical in some way, which are colloquially (and somewhat abusively) referred to as long-tailed distributions. We are seeking principled alternatives to ad hoc approaches, such as trying a variety of methods for clustering the data and hoping that the identified clusters can be modeled well.
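
To make the setting concrete, here is a minimal sketch in Python with NumPy, on synthetic data. It is not the paper's algorithm. It assumes that the segment (`in_segment`) and the two relevant variables (`support`) are already known, whereas the problem described above is to find such a condition and sparse fit from the data. All names, sizes, and coefficients are hypothetical choices made for illustration.

```python
import numpy as np


def fit_sparse_ls(X, y, support):
    """Least-squares fit of y using only the columns of X listed in `support`."""
    coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return coef


def mse(X, y, support, coef):
    """Mean squared error of the sparse linear fit on (X, y)."""
    resid = X[:, support] @ coef - y
    return float(np.mean(resid ** 2))


rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))

# Hypothetical condition: roughly 20% of the population lies in the segment.
in_segment = rng.random(n) < 0.2

# Outside the segment, y has no linear relationship with X (assumption).
y = rng.normal(scale=5.0, size=n)

# Inside the segment, y depends on just two variables, plus small noise.
n_seg = int(in_segment.sum())
y[in_segment] = (
    2.0 * X[in_segment, 0]
    - 3.0 * X[in_segment, 1]
    + rng.normal(scale=0.1, size=n_seg)
)

support = [0, 1]  # the two relevant variables, assumed known here

# The same sparse model fitted to the whole population and to the segment.
coef_all = fit_sparse_ls(X, y, support)
mse_all = mse(X, y, support, coef_all)

coef_seg = fit_sparse_ls(X[in_segment], y[in_segment], support)
mse_seg = mse(X[in_segment], y[in_segment], support, coef_seg)

print(f"whole population: coef={coef_all}, mse={mse_all:.3f}")
print(f"segment only:     coef={coef_seg}, mse={mse_seg:.3f}")
```

In this toy setup the two-variable model fits poorly on the whole population but closely on the segment, which is the kind of structure the abstract describes. Identifying the condition itself is the hard part, and the sketch does not attempt it.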
