Egele, Romain, Balaprakash, Prasanna, Vishwanath, Venkatram, Guyon, Isabelle, Liu, Zhengying

Developing high-performing predictive models for large tabular data sets is a challenging task. The state-of-the-art methods are based on expert-developed model ensembles from different supervised learning methods. Recently, automated machine learning (AutoML) is emerging as a promising approach to automate predictive model development. Neural architecture search (NAS) is an AutoML approach that generates and evaluates multiple neural network architectures concurrently and improves the accuracy of the generated models iteratively. A key issue in NAS, particularly for large data sets, is the large computation time required to evaluate each generated architecture. While data-parallel training is a promising approach that can address this issue, its use within NAS is difficult. For different data sets, the data-parallel training settings such as the number of parallel processes, learning rate, and batch size need to be adapted to achieve high accuracy and reduction in training time. To that end, we have developed AgEBO-Tabular, an approach to combine aging evolution (AgE), a parallel NAS method that searches over neural architecture space, and an asynchronous Bayesian optimization method for tuning the hyperparameters of the data-parallel training simultaneously. We demonstrate the efficacy of the proposed method to generate high-performing neural network models for large tabular benchmark data sets. Furthermore, we demonstrate that the automatically discovered neural network models using our method outperform the state-of-the-art AutoML ensemble models in inference speed by two orders of magnitude while reaching similar accuracy values.

Krishnan, R., Balaprakash, Prasanna

Meta continual learning algorithms seek to train a model when faced with similar tasks observed in a sequential manner. Despite promising methodological advancements, there is a lack of theoretical frameworks that enable analysis of learning challenges such as generalization and catastrophic forgetting. To that end, we develop a new theoretical approach for meta continual learning~(MCL) where we mathematically model the learning dynamics using dynamic programming, and we establish conditions of optimality for the MCL problem. Moreover, using the theoretical framework, we derive a new dynamic-programming-based MCL method that adopts stochastic-gradient-driven alternating optimization to balance generalization and catastrophic forgetting. We show that, on MCL benchmark data sets, our theoretically grounded method achieves accuracy better than or comparable to that of existing state-of-the-art methods.

Mallick, Tanwi, Kiran, Mariam, Mohammed, Bashir, Balaprakash, Prasanna

Wide area networking infrastructures (WANs), particularly science and research WANs, are the backbone for moving large volumes of scientific data between experimental facilities and data centers. With demands growing at exponential rates, these networks are struggling to cope with large data volumes, real-time responses, and overall network performance. Network operators are increasingly looking for innovative ways to manage the limited underlying network resources. Forecasting network traffic is a critical capability for proactive resource management, congestion mitigation, and dedicated transfer provisioning. To this end, we propose a nonautoregressive graph-based neural network for multistep network traffic forecasting. Specifically, we develop a dynamic variant of diffusion convolutional recurrent neural networks to forecast traffic in research WANs. We evaluate the efficacy of our approach on real traffic from ESnet, the U.S. Department of Energy's dedicated science network. Our results show that compared to classical forecasting methods, our approach explicitly learns the dynamic nature of spatiotemporal traffic patterns, showing significant improvements in forecasting accuracy. Our technique can surpass existing statistical and deep learning approaches by achieving approximately 20% mean absolute percentage error for multiple hours of forecasts despite dynamic network traffic settings.

Jiang, Shengli, Balaprakash, Prasanna

Predicting the properties of a molecule from its structure is a challenging task. Recently, deep learning methods have improved the state of the art for this task because of their ability to learn useful features from the given data. By treating molecule structure as graphs, where atoms and bonds are modeled as nodes and edges, graph neural networks (GNNs) have been widely used to predict molecular properties. However, the design and development of GNNs for a given data set rely on labor-intensive design and tuning of the network architectures. Neural architecture search (NAS) is a promising approach to discover high-performing neural network architectures automatically. To that end, we develop an NAS approach to automate the design and development of GNNs for molecular property prediction. Specifically, we focus on automated development of message-passing neural networks (MPNNs) to predict the molecular properties of small molecules in quantum mechanics and physical chemistry data sets from the MoleculeNet benchmark. We demonstrate the superiority of the automatically discovered MPNNs by comparing them with manually designed GNNs from the MoleculeNet benchmark. We study the relative importance of the choices in the MPNN search space, demonstrating that customizing the architecture is critical to enhancing performance in molecular property prediction and that the proposed approach can perform customization automatically with minimal manual effort.

Madireddy, Sandeep, Yanguas-Gil, Angel, Balaprakash, Prasanna

We focus on the problem of how to achieve online continual learning under memory-constrained conditions where the input data may not be known a priori. These constraints are relevant in edge computing scenarios. We have developed an architecture where input processing over data streams and online learning are integrated in a single recurrent network architecture. This allows us to cast metalearning optimization as a mixed-integer optimization problem, where different synaptic plasticity algorithms and feature extraction layers can be swapped out and their hyperparameters are optimized to identify optimal architectures for different sets of tasks. We utilize a Bayesian optimization method to search over a design space that spans multiple learning algorithms, their specific hyperparameters, and feature extraction layers. We demonstrate our approach for online non-incremental and class-incremental learning tasks. Our optimization algorithm finds configurations that achieve superior continual learning performance on Split-MNIST and Permuted-MNIST data as compared with other memory-constrained learning approaches, and it matches that of the state-of-the-art memory replay-based approaches without explicit data storage and replay. Our approach allows us to explore the transferability of optimal learning conditions to tasks and datasets that have not been previously seen. We demonstrate that the accuracy of our transfer metalearning across datasets can be largely explained through a transfer coefficient that can be based on metrics of dimensionality and distance between datasets.

Khairy, Sami, Balaprakash, Prasanna, Cai, Lin X.

The canonical solution methodology for finite constrained Markov decision processes (CMDPs), where the objective is to maximize the expected infinite-horizon discounted rewards subject to the expected infinite-horizon discounted costs constraints, is based on convex linear programming. In this brief, we first prove that the optimization objective in the dual linear program of a finite CMDP is a piece-wise linear convex function (PWLC) with respect to the Lagrange penalty multipliers. Next, we propose a novel two-level Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find the optimal state-value function and Lagrange penalty multipliers of a finite CMDP. The proposed algorithm is applied in two stochastic control problems with constraints: robot navigation in a grid world and solar-powered unmanned aerial vehicle (UAV)-based wireless network management. We empirically compare the convergence performance of the proposed GAS algorithm with binary search (BS), Lagrangian primal-dual optimization (PDO), and Linear Programming (LP). Compared with benchmark algorithms, it is shown that the proposed GAS algorithm converges to the optimal solution faster, does not require hyper-parameter tuning, and is not sensitive to initialization of the Lagrange penalty multiplier.

Wycoff, Nathan, Balaprakash, Prasanna, Xia, Fangfang

If edge devices are to be deployed to critical applications where their decisions could have serious financial, political, or public-health consequences, they will need a way to signal when they are not sure how to react to their environment. For instance, a lost delivery drone could make its way back to a distribution center or contact the client if it is confused about how exactly to make its delivery, rather than taking the action which is "most likely" correct. This issue is compounded for health care or military applications. However, the brain-realistic temporal credit assignment problem neuromorphic computing algorithms have to solve is difficult. The double role weights play in backpropagation-based-learning, dictating how the network reacts to both input and feedback, needs to be decoupled. e-prop 1 is a promising learning algorithm that tackles this with Broadcast Alignment (a technique where network weights are replaced with random weights during feedback) and accumulated local information. We investigate under what conditions the Bayesian loss term can be expressed in a similar fashion, proposing an algorithm that can be computed with only local information as well and which is thus no more difficult to implement on hardware. This algorithm is exhibited on a store-recall problem, which suggests that it can learn good uncertainty on decisions to be made over time.

Khairy, Sami, Shaydulin, Ruslan, Cincio, Lukasz, Alexeev, Yuri, Balaprakash, Prasanna

Quantum computing is a computational paradigm with the potential to outperform classical methods for a variety of problems. Proposed recently, the Quantum Approximate Optimization Algorithm (QAOA) is considered as one of the leading candidates for demonstrating quantum advantage in the near term. QAOA is a variational hybrid quantum-classical algorithm for approximately solving combinatorial optimization problems. The quality of the solution obtained by QAOA for a given problem instance depends on the performance of the classical optimizer used to optimize the variational parameters. In this paper, we formulate the problem of finding optimal QAOA parameters as a learning task in which the knowledge gained from solving training instances can be leveraged to find high-quality solutions for unseen test instances. To this end, we develop two machine-learning-based approaches. Our first approach adopts a reinforcement learning (RL) framework to learn a policy network to optimize QAOA circuits. Our second approach adopts a kernel density estimation (KDE) technique to learn a generative model of optimal QAOA parameters. In both approaches, the training procedure is performed on small-sized problem instances that can be simulated on a classical computer; yet the learned RL policy and the generative model can be used to efficiently solve larger problems. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our proposed RL- and KDE-based approaches reduce the optimality gap by factors up to 30.15 when compared with other commonly used off-the-shelf optimizers.

Khairy, Sami, Shaydulin, Ruslan, Cincio, Lukasz, Alexeev, Yuri, Balaprakash, Prasanna

Quantum computing exploits basic quantum phenomena such as state superposition and entanglement to perform computations. The Quantum Approximate Optimization Algorithm (QAOA) is arguably one of the leading quantum algorithms that can outperform classical state-of-the-art methods in the near term. QAOA is a hybrid quantum-classical algorithm that combines a parameterized quantum state evolution with a classical optimization routine to approximately solve combinatorial problems. The quality of the solution obtained by QAOA within a fixed budget of calls to the quantum computer depends on the performance of the classical optimization routine used to optimize the variational parameters. In this work, we propose an approach based on reinforcement learning (RL) to train a policy network that can be used to quickly find high-quality variational parameters for unseen combinatorial problem instances. The RL agent is trained on small problem instances which can be simulated on a classical computer, yet the learned RL policy is generalizable and can be used to efficiently solve larger instances. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our trained RL policy can reduce the optimality gap by a factor up to 8.61 compared with other off-the-shelf optimizers tested.

Jiang, Peihong, Doan, Hieu, Madireddy, Sandeep, Assary, Rajeev Surendran, Balaprakash, Prasanna

Computer-assisted synthesis planning aims to help chemists find better reaction pathways faster. Finding viable and short pathways from sugar molecules to value-added chemicals can be modeled as a retrosynthesis planning problem with a catalyst allowed. This is a crucial step in efficient biomass conversion. The traditional computational chemistry approach to identifying possible reaction pathways involves computing the reaction energies of hundreds of intermediates, which is a critical bottleneck in silico reaction discovery. Deep reinforcement learning has shown in other domains that a well-trained agent with little or no prior human knowledge can surpass human performance. While some effort has been made to adapt machine learning techniques to the retrosynthesis planning problem, value-added chemical discovery presents unique challenges. Specifically, the reaction can occur in several different sites in a molecule, a subtle case that has never been treated in previous works. With a more versatile formulation of the problem as a Markov decision process, we address the problem using deep reinforcement learning techniques and present promising preliminary results.