An Exploration of How Training Set Composition Bias in Machine Learning Affects Identifying Rare Objects

Lake, Sean E., Tsai, Chao-Wei

arXiv.org Artificial Intelligence 

This is due to the rapid expansion of computing resources and sensor technology in the last four decades that has driven equally rapid expansions in the quantity of data to analyze. Astronomy, in particular, has seen a proliferation of large scale imaging and spectroscopic surveys that have billions of sources in them, surveys like the Sloan Digital Sky Survey (SDSS, York et al., 2000), the 2-Micron All Sky Survey (2MASS, Skrutskie et al., 2006), the Wide-field Infrared Survey Explorer (WISE, Wright et al., 2010), the Gaia satellite's survey (Gaia Collaboration et al., 2016), the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) surveys (Chambers et al., 2016), the Dark Energy Spectroscopic Instrument (DESI) surveys (Dey et al., 2019), the UKIRT Infrared Deep Sky Surveys (UKIDSS, Lawrence et al., 2007), and the Galaxy Evolution Explorer (GALEX) surveys (Martin et al., 2005).

[...] (Cutri et al., 2013), had many technical challenges and required intensive astronomy expertise, experience, and labor to overcome (Eisenhardt et al., 2012, for example). A necessary first step in that process, though, is to classify the sources so that we can prioritize which sources might be interesting, and which are examples of already known sources. Because these sources are rare, it is usually easier to use a supervised machine learning algorithm, one that is tuned using sources with known classifications, than it is to use an unsupervised one. The reason should be obvious: subgroups of the common known source types are likely to outnumber the rare new ones, meaning a naive unsupervised machine learning algorithm could need a lot of complexity before it actually finds the rare class.

Supervised learning also has drawbacks when used for [...]
