Feature Selection with Distance Correlation

Das, Ranit, Kasieczka, Gregor, Shih, David

Nov-30-2022–arXiv.org Artificial Intelligence

Choosing which properties of the data to use as input to multivariate decision algorithms (a.k.a. feature selection) is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on Distance Correlation (DisCo) and demonstrate its effectiveness on the tasks of boosted top- and $W$-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures using only ten features and two orders of magnitude fewer model parameters.
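
For readers unfamiliar with the quantity the method is built on, the following is a minimal sketch of the standard sample distance correlation between two variables, written with NumPy. It is not the authors' selection procedure, which the abstract does not spell out. The function names `distance_correlation` and `rank_features_by_dcor` are hypothetical, and the ranking helper only scores each feature against a target on its own, as one simple assumed way such a measure could be used to order candidate features.

```python
import numpy as np


def _centered_distances(x):
    """Double-centered matrix of pairwise Euclidean distances for one sample."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one variable
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 if either input is constant."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_features_by_dcor(features, target):
    """Hypothetical helper: feature column indices, highest distance correlation first.

    Assumption for illustration only: each column is scored independently
    against the target, which ignores redundancy between features.
    """
    features = np.asarray(features, dtype=float)
    scores = [distance_correlation(features[:, j], target)
              for j in range(features.shape[1])]
    return [int(j) for j in np.argsort(scores)[::-1]]
```

Unlike the Pearson coefficient, distance correlation also responds to nonlinear dependence: for `x = np.linspace(-1, 1, 201)` and `y = x ** 2`, the Pearson correlation is essentially zero while `distance_correlation(x, y)` is clearly positive. The pairwise distance matrices make this sketch quadratic in the number of samples, so it is only practical for modest sample sizes.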

artificial intelligence, classifier, machine learning

Country:
- North America > United States
  - New Jersey > Middlesex County > Piscataway (0.04)
- Europe
  - Germany > Hamburg (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.94)
  - Statistical Learning (0.93)
