Kreutz-Delgado, Ken
Learning from learning machines: a new generation of AI technology to meet the needs of science
Pion-Tonachini, Luca, Bouchard, Kristofer, Martin, Hector Garcia, Peisert, Sean, Holtz, W. Bradley, Aswani, Anil, Dwivedi, Dipankar, Wainwright, Haruko, Pilania, Ghanshyam, Nachman, Benjamin, Marrone, Babetta L., Falco, Nicola, Prabhat, Arnold, Daniel, Wolf-Yadlin, Alejandro, Powers, Sarah, Climer, Sharlee, Jackson, Quinn, Carlson, Ty, Sohn, Michael, Zwart, Petrus, Kumar, Neeraj, Justice, Amy, Tomlin, Claire, Jacobson, Daniel, Micklem, Gos, Gkoutos, Georgios V., Bickel, Peter J., Cazier, Jean-Baptiste, Müller, Juliane, Webb-Robertson, Bobbie-Jo, Stevens, Rick, Anderson, Mark, Kreutz-Delgado, Ken, Mahoney, Michael W., Brown, James B.
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery. The distinct goals of AI for industry and of AI for science create tension between identifying patterns in data and discovering patterns in the world from data. If we address the fundamental challenges associated with "bridging the gap" between domain-driven scientific models and data-driven AI learning machines, then we expect that these AI models can transform hypothesis generation, scientific discovery, and the scientific process itself.
Tuning Confidence Bound for Stochastic Bandits with Bandit Distance
Zhang, Xinyu, Das, Srinjoy, Kreutz-Delgado, Ken
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance tuning (UCB-DT) formulation improves performance, as measured by expected regret, by preventing the MAB algorithm from fixating on non-optimal bandits, a well-known deficiency of standard UCB. "Distance tuning" of the standard UCB is done using a proposed parameterizable distance measure, which we call bandit distance, that can be optimized to control the transition rate from exploration to exploitation based on problem requirements. We empirically demonstrate that UCB-DT outperforms many existing state-of-the-art UCB-based methods on the MAB problem. Our contribution also includes the development of a conceptual tool called the "Exploration Bargain Point", which gives insight into the trade-offs between exploration and exploitation. We argue that the Exploration Bargain Point provides an intuitive perspective that is useful for comparatively analyzing the performance of UCB-based methods.
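To make the distance-tuning idea concrete, the following is a minimal Python sketch assuming an illustrative choice of bandit distance (the gap between an arm's empirical mean and the current best empirical mean) and an exponential tuning factor applied to the standard UCB exploration bonus; the exact UCB-DT distance measure and tuning rule are defined in the paper and may differ.

```python
import numpy as np

def ucb_dt_select(counts, means, t, gamma=1.0):
    """Pick an arm using a distance-tuned UCB rule (illustrative sketch only).

    The exploration bonus of each arm is shrunk according to an assumed
    "bandit distance": here, the gap between the arm's empirical mean and
    the current best empirical mean, so arms that already look clearly
    suboptimal receive less exploration. gamma controls how quickly the
    rule shifts from exploration to exploitation.
    """
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:                       # play every arm once first
        return int(untried[0])
    distance = means.max() - means             # assumed bandit distance
    tuning = np.exp(-gamma * distance)         # assumed tuning factor
    bonus = tuning * np.sqrt(2.0 * np.log(t) / counts)
    return int(np.argmax(means + bonus))

# Hypothetical usage on three Bernoulli arms with unknown success rates.
rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.7])
counts, sums = np.zeros(3), np.zeros(3)
for t in range(1, 2001):
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    arm = ucb_dt_select(counts, means, t, gamma=4.0)
    counts[arm] += 1
    sums[arm] += float(rng.random() < true_p[arm])
```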
Generative and Discriminative Deep Belief Network Classifiers: Comparisons Under an Approximate Computing Framework
Ruan, Siqiao, Colbert, Ian, Kreutz-Delgado, Ken, Das, Srinjoy
Deploying Deep Learning algorithms on embedded hardware is characterized by challenges such as constraints on device power consumption, limited availability of labeled data, and limited internet bandwidth for frequent retraining on cloud servers. To enable low-power implementations, we consider efficient bitwidth reduction and pruning for the class of Deep Learning algorithms known as Discriminative Deep Belief Networks (DDBNs) for embedded-device classification tasks. We train DDBNs with both generative and discriminative objectives under an approximate computing framework and analyze their power-at-performance trade-offs for supervised and semi-supervised applications. We also investigate the out-of-distribution performance of DDBNs when the inference data have the same class structure yet are statistically different from the training data owing to dynamic real-time operating environments. Based on our analysis, we provide novel insights and recommendations on the choice of training objectives, bitwidth values, and accuracy sensitivity with respect to the amount of labeled data for implementing DDBN inference with minimum power consumption on embedded hardware platforms subject to accuracy tolerances.
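The sketch below illustrates, under simple assumptions, the kind of bitwidth reduction and pruning discussed above, applied to a hypothetical single DBN-style hidden layer: weights are uniformly quantized to a chosen bitwidth and the smallest-magnitude weights are zeroed before inference. The layer sizes, 4-bit setting, and 50% keep fraction are illustrative choices, not the paper's specific approximate-computing scheme.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric bitwidth reduction of a weight matrix."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.round(w / scale) * scale

def prune(w, keep_fraction):
    """Magnitude pruning: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(w), 1.0 - keep_fraction)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical single hidden layer of a DBN-style network: 784 visible
# units (e.g., a 28x28 image) feeding 500 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 500))
W_low_power = prune(quantize(W, bits=4), keep_fraction=0.5)
x = rng.random(784)
hidden = sigmoid(x @ W_low_power)   # reduced-precision, sparse forward pass
```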
ICLabel: An automated electroencephalographic independent component classifier, dataset, and website
Pion-Tonachini, Luca, Kreutz-Delgado, Ken, Makeig, Scott
The electroencephalogram (EEG) provides a non-invasive, minimally restrictive, and relatively low-cost measure of mesoscale brain dynamics with high temporal resolution. Although signals recorded in parallel by multiple, near-adjacent EEG scalp electrode channels are highly correlated and combine signals from many different sources, biological and non-biological, independent component analysis (ICA) has been shown to isolate the various source generator processes underlying those recordings. Independent components (ICs) found by ICA decomposition can be manually inspected, selected, and interpreted, but doing so requires both time and practice, as ICs have no particular order or intrinsic interpretation and therefore require further study of their properties. Alternatively, sufficiently accurate automated IC classifiers can be used to classify ICs into broad source categories, speeding the analysis of EEG studies with many subjects and enabling the use of ICA decomposition in near-real-time applications. While many such classifiers have been proposed recently, this work presents the ICLabel project, comprising (1) an IC dataset containing spatiotemporal measures for over 200,000 ICs from more than 6,000 EEG recordings, (2) a website for collecting crowdsourced IC labels and educating EEG researchers and practitioners about IC interpretation, and (3) the automated ICLabel classifier. The classifier improves upon existing methods in two ways: by improving the accuracy of the computed label estimates and by enhancing computational efficiency. In a rigorous comparison against all other publicly available EEG IC classifiers, the ICLabel classifier outperforms, or performs comparably to, the previous best publicly available method for all measured IC categories while computing labels ten times faster.
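The sketch below shows the general shape of the task described above: a flattened vector of spatiotemporal IC measures mapped to per-category label probabilities. The feature length, the linear/softmax model, and the parameter names are assumptions for illustration only; the actual ICLabel classifier is a trained neural network operating on IC measures such as scalp topographies and power spectra.

```python
import numpy as np

# Broad IC source categories of the kind ICLabel assigns.
CATEGORIES = ["Brain", "Muscle", "Eye", "Heart",
              "Line Noise", "Channel Noise", "Other"]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_ic(features, W, b):
    """Return per-category label probabilities for one IC.

    Minimal sketch: `features` stands in for an IC's flattened
    spatiotemporal measures, and (W, b) for trained classifier
    parameters. A linear/softmax layer is used here purely to
    illustrate producing calibrated label estimates; it is not the
    ICLabel architecture.
    """
    return dict(zip(CATEGORIES, softmax(features @ W + b)))

# Hypothetical usage with random stand-in features and parameters.
rng = np.random.default_rng(0)
n_features = 740   # assumed flattened feature length, for illustration
W = rng.normal(scale=0.01, size=(n_features, len(CATEGORIES)))
b = np.zeros(len(CATEGORIES))
probs = classify_ic(rng.normal(size=n_features), W, b)
```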