Strohmeier, Christopher, Lyu, Hanbaek, Needell, Deanna

Nonnegative Matrix Factorization (NMF) algorithms are fundamental tools for learning low-dimensional features from vector-valued data; Nonnegative Tensor Factorization (NTF) algorithms serve a similar role in dictionary learning problems for multi-modal data. There is also often critical interest in \textit{online} versions of such factorization algorithms, which learn progressively from minibatches without requiring the full data as conventional algorithms do. However, the current theory of online NTF algorithms is quite nascent, especially compared to the comprehensive literature on online NMF algorithms. In this work, we introduce a novel online NTF algorithm that learns a CP basis from a given stream of tensor-valued data under general constraints. In particular, using nonnegativity constraints, the learned CP modes also give localized dictionary atoms that respect the tensor structure in multi-modal data. On the application side, we demonstrate the utility of our algorithm on a diverse set of examples from image, video, and time-series data, illustrating how one may learn qualitatively different CP-dictionaries when tensor data need not be reshaped before the learning process. On the theoretical side, we prove that our algorithm converges to the set of stationary points of the objective function under the hypothesis that the sequence of data tensors has functional Markovian dependence. This assumption covers a wide range of application contexts, including data streams generated by independent or MCMC sampling.
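To make the object being learned concrete, the following is a minimal batch sketch of a rank-$r$ nonnegative CP decomposition of a 3-way tensor via multiplicative updates. It is not the paper's online algorithm; the function names and the update scheme are illustrative assumptions.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product of (J, r) and (K, r) -> (J*K, r)."""
    r = P.shape[1]
    return np.einsum('jr,kr->jkr', P, Q).reshape(-1, r)

def nncp(X, r, n_iter=300, eps=1e-10, seed=None):
    """Illustrative nonnegative CP decomposition of a 3-way tensor X
    via multiplicative updates on each factor (batch, not online)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, r)); B = rng.random((J, r)); C = rng.random((K, r))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        KR = khatri_rao(B, C)
        A *= (X1 @ KR) / (A @ (KR.T @ KR) + eps)
        KR = khatri_rao(A, C)
        B *= (X2 @ KR) / (B @ (KR.T @ KR) + eps)
        KR = khatri_rao(A, B)
        C *= (X3 @ KR) / (C @ (KR.T @ KR) + eps)
    return A, B, C
```

The columns of $A$, $B$, $C$ are the CP modes; the tensor is approximated by `np.einsum('ir,jr,kr->ijk', A, B, C)`, a sum of $r$ nonnegative outer products.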

Grotheer, Rachel, Huang, Yihuan, Li, Pengyu, Rebrova, Elizaveta, Needell, Deanna, Huang, Longxiu, Kryshchenko, Alona, Li, Xia, Ha, Kyung, Kryshchenko, Oleksandr

The appearance of the novel SARS-CoV-2 virus on the global scale has generated demand for rapid research into the virus and the disease it causes, COVID-19. However, the literature about coronaviruses such as SARS-CoV-2 is vast and difficult to sift through. This paper describes an attempt to organize existing literature on coronaviruses, other pandemics, and early research on the current COVID-19 outbreak, in response to the call to action issued by the White House Office of Science.

1.1 Contributions
Our methods help make sense of a vast and rapidly growing body of COVID-19 related literature. The main contributions of this paper are as follows:
- A diverse dataset of COVID-19 related scientific literature is compiled, consisting of articles with full-text available drawn from several online collections.

Vendrow, Joshua, Haddock, Jamie, Needell, Deanna, Johnson, Lorraine

Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large-scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants' answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and $k$-nearest neighbors approaches. We first analyze the general performance of the models and then identify the most important features for predicting participant answers to GROC. After we identify the "key" features, we separate them from the dataset and demonstrate the effectiveness of these features at predicting GROC responses. In doing so, we highlight possible directions for future study both mathematically and clinically.
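One generic, model-agnostic way to measure the effect of individual features, in the spirit described above, is permutation importance: how much shuffling a single feature degrades predictions. This is a minimal sketch with a plain least-squares model on synthetic data, not the registry analysis itself.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Increase in mean-squared error when one feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        trials = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j only
            trials.append(np.mean((predict(Xp) - y) ** 2))
        imp[j] = np.mean(trials) - base
    return imp

# Toy usage: only the first of three features drives the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # fit a linear model
imp = permutation_importance(lambda Z: Z @ w, X, y)
```

The same wrapper works unchanged around any of the predictors mentioned in the abstract, since it only needs a `predict` callable.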

Needell, Deanna, Nelson, Aaron A., Saab, Rayan, Salanevich, Palina

The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of Igelnik and Pao, single-layer neural networks with random input-to-hidden layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. In this paper, we begin to fill this theoretical gap. We provide a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ for the number $n$ of network nodes. We then extend this result to the non-asymptotic setting, proving that one can achieve any desired approximation error with high probability provided $n$ is sufficiently large. We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic forms. Finally, we illustrate our results on manifolds with numerical experiments.
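The flavor of such random-weight networks can be sketched in a few lines: draw the input-to-hidden weights and biases at random, never train them, and fit only the output layer by least squares. This is an illustrative one-dimensional toy under assumed weight distributions, not the Igelnik–Pao construction or its proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                     # number of hidden nodes
x = np.linspace(0.0, 1.0, 400)[:, None]     # training grid on [0, 1]
y = np.sin(2.0 * np.pi * x[:, 0])           # target function
W = rng.uniform(-10.0, 10.0, size=(1, n))   # random input weights (frozen)
b = rng.uniform(-10.0, 10.0, size=n)        # random biases (frozen)
H = np.tanh(x @ W + b)                      # random hidden features
c, *_ = np.linalg.lstsq(H, y, rcond=None)   # only output weights trained
max_err = np.max(np.abs(H @ c - y))
```

Only the linear solve touches data, which is why such networks train orders of magnitude faster than fully tuned ones.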

Lyu, Hanbaek, Strohmeier, Christopher, Menz, Georg, Needell, Deanna

Predicting the spread and containment of COVID-19 is a challenge of utmost importance that the broader scientific community is currently facing. One of the main sources of difficulty is that a very limited amount of daily COVID-19 case data is available, and with few exceptions, the majority of countries are currently in the "exponential spread stage," and thus there is scarce information available which would enable one to predict the phase transition between spread and containment. In this paper, we propose a novel approach to predicting the spread of COVID-19 based on dictionary learning and online nonnegative matrix factorization (online NMF). The key idea is to learn dictionary patterns of short evolution instances of the new daily cases in multiple countries at the same time, so that their latent correlation structures are captured in the dictionary patterns. We first learn such patterns by minibatch learning from the entire time-series and then further adapt them to the time-series by online NMF. As we progressively adapt and improve the learned dictionary patterns to the more recent observations, we also use them to make one-step predictions by partial fitting. Lastly, by recursively applying the one-step predictions, we can extrapolate our predictions into the near future. Our prediction results can be directly attributed to the learned dictionary patterns due to their interpretability.
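The one-step prediction idea can be illustrated in isolation: fit a nonnegative code to the observed part of a temporal window against a dictionary of short patterns, then read off the dictionary's last row as the forecast. Everything here (the hand-made dictionary, the projected-gradient fit) is an illustrative assumption, not the paper's learned-dictionary pipeline.

```python
import numpy as np

def predict_next(W, window, n_iter=2000):
    """W: (k, r) dictionary of temporal patterns of length k.
    window: the last k-1 observed values. Fit a nonnegative code h
    to the overlap by projected gradient, then extrapolate one step."""
    Wp = W[:-1, :]                              # observed part of patterns
    lr = 1.0 / (np.linalg.norm(Wp, 2) ** 2)     # safe step size 1/L
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        h = np.maximum(h - lr * (Wp.T @ (Wp @ h - window)), 0.0)
    return float(W[-1, :] @ h)                  # one-step forecast
```

Recursive extrapolation, as in the abstract, would append each forecast to the series and slide the window forward.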

Needell, Deanna, Ward, Rachel, Srebro, Nati

We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e., importance sampling) can further reduce the dependence on the conditioning. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. Published at the Neural Information Processing Systems Conference.
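The randomized Kaczmarz algorithm referred to above admits a very short implementation, and its standard row-sampling rule, with probabilities proportional to squared row norms, is precisely a reweighted (importance) sampling distribution. A minimal sketch for a consistent linear system:

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iter=2000, seed=0):
    """Solve a consistent system Ax = b by projecting the iterate onto
    one randomly chosen row constraint per step, with P(i) ∝ ||a_i||^2."""
    rng = np.random.default_rng(seed)
    m, d = A.shape
    sq = np.sum(A ** 2, axis=1)
    p = sq / sq.sum()                    # importance sampling over rows
    x = np.zeros(d)
    for _ in range(n_iter):
        i = rng.choice(m, p=p)
        x += ((b[i] - A[i] @ x) / sq[i]) * A[i]   # project onto row i
    return x
```

Viewed as SGD, each step is a stochastic gradient step on the least-squares objective with a reweighted sampling distribution, which is the connection the abstract exploits.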

Ahn, Miju, Eikmeier, Nicole, Haddock, Jamie, Kassab, Lara, Kryshchenko, Alona, Leonard, Kathryn, Needell, Deanna, Madushani, R. W. M. A., Sizikova, Elena, Wang, Chuntian

There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employs the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower-dimensional nonnegative matrices. With this approach, however, information contained in the temporal dimension of the data is often neglected or underutilized. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAFAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors, thereby preserving the temporal information. The viability of NNCPD is demonstrated through application to both synthetic and real data, where significantly improved results are obtained compared to those of typical NMF-based methods. The advantages of NNCPD over such approaches are studied and discussed. To the best of our knowledge, this is the first time that NNCPD has been utilized for the purpose of dynamic topic modeling, and our findings will be transformative for both applications and further developments.

Guo, Yuchen, Hanoian, Nicholas, Lin, Zhexiao, Liskij, Nicholas, Lyu, Hanbaek, Needell, Deanna, Qu, Jiahao, Sojico, Henry, Wang, Yuliang, Xiong, Zhe, Zou, Zhenhong

Another area where deep learning algorithms have been successfully applied is sequence learning, which aims at understanding the structure of sequential data such as language, musical notes, and videos. One example of an application of deep learning in language modeling is conversational chatbots. A chatbot is a program that conducts a conversation with a user by simulating one side of it. Chatbots receive inputs from a user one message, or question, at a time, and then form a response that is sent back to the user. One of the most widely used machine learning techniques for sequence learning is Recurrent Neural Networks (RNNs). In our architecture, after learning topic vectors from an auxiliary text corpus via NMF, the decoder is trained so that it is more likely to sample response words from the most correlated topic vectors. One of the main advantages of our architecture is that the user can easily switch the NMF-learned topic vectors so that the chatbot obtains the desired topic-awareness. We demonstrate our model by training on a single conversational data set which is then augmented with topic matrices learned from different auxiliary data sets. We show that our topic-aware chatbot not only outperforms the non-topic counterpart, but also that each topic-aware model qualitatively and contextually gives the most relevant answer depending on the topic of the question.

Lyu, Hanbaek, Needell, Deanna, Balzano, Laura

Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of a dependent data stream remains largely unexplored. In this paper, we show that the well-known OMF algorithm for i.i.d. data streams in fact converges almost surely to the set of critical points of the expected loss function, even when the data matrices are functions of an underlying Markov chain satisfying a mild mixing condition. Furthermore, we extend the convergence result to the case when we can only approximately solve each step of the optimization problems in the algorithm. For applications, we demonstrate dictionary learning from a sequence of images generated by a Markov Chain Monte Carlo (MCMC) sampler. Lastly, by combining online nonnegative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts 'network dictionary patches' from a given network in an online manner that encodes main features of the network. We demonstrate this technique on real-world text data.

INTRODUCTION
In modern data analysis, a central step is to find a low-dimensional representation to better understand, compress, or convey the key phenomena captured in the data. Matrix factorization provides a powerful setting for one to describe data in terms of a linear combination of factors or atoms. In this setting, we have a data matrix $X \in \mathbb{R}^{d \times n}$, and we seek a factorization of $X$ into the product $WH$ for $W \in \mathbb{R}^{d \times r}$ and $H \in \mathbb{R}^{r \times n}$. This problem has gone by many names over the decades, each with different constraints: dictionary learning, factor analysis, topic modeling, component analysis. It has applications in text analysis, image reconstruction, medical imaging, bioinformatics, and many other scientific fields more generally [SGH02, BB05, BBL+07, CWS+11, TN12, BMB+15, RPZ+18].
Each column of the data matrix is approximated by a linear combination of the columns of the dictionary matrix. Online matrix factorization is a problem setting where data are accessed in a streaming manner and the matrix factors should be updated each time. That is, we get draws of $X$ from some distribution $\pi$ and seek the best factorization such that the expected loss $\mathbb{E}_{X \sim \pi}\left[ \| X - WH \|_F^2 \right]$ is small. This is a relevant setting in today's data world, where large companies, scientific instruments, and healthcare systems are collecting massive amounts of data every day. One cannot compute with the entire data set at once. There are several algorithms for computing factorizations of various kinds in an online context. Many of them have algorithmic convergence guarantees; however, all these guarantees require that data are sampled at each iteration i.i.d. with respect to previous iterations. In all of the application examples mentioned above, one may make an argument for (nearly) identical distributions, but never for independence.
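For the objective $\|X - WH\|_F^2$ above, the classical batch baseline is the Lee–Seung multiplicative update scheme; the following is a purely illustrative batch sketch (none of the streaming machinery discussed in the text), useful for seeing what the factors $W$ and $H$ look like.

```python
import numpy as np

def nmf(X, r, n_iter=500, eps=1e-9, seed=0):
    """Batch NMF: minimize ||X - WH||_F^2 over nonnegative W, H
    via Lee-Seung multiplicative updates (monotone in the objective)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update code matrix
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update dictionary
    return W, H
```

Because the updates are multiplicative, nonnegativity of $W$ and $H$ is preserved automatically from a nonnegative initialization.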

Grotheer, Rachel, Li, Shuang, Ma, Anna, Needell, Deanna, Qin, Jing

Recovery of low-rank matrices from a small number of linear measurements is now well-known to be possible under various model assumptions on the measurements. Such results demonstrate robustness and are backed with provable theoretical guarantees. However, extensions to tensor recovery have only recently begun to be studied and developed, despite an abundance of practical tensor applications. Recently, a tensor variant of the Iterative Hard Thresholding method was proposed and theoretical results were obtained that guarantee exact recovery of tensors with low Tucker rank. In this paper, we utilize the same tensor version of the Restricted Isometry Property (RIP) to extend these results for tensors with low CANDECOMP/PARAFAC (CP) rank. In doing so, we leverage recent results on efficient approximations of CP decompositions that remove the need for challenging assumptions in prior works. We complement our theoretical findings with empirical results that showcase the potential of the approach.
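A matrix-flavored analogue of the Iterative Hard Thresholding scheme discussed above, a gradient step on the measurement residual followed by projection onto low rank via a truncated SVD, can be sketched as follows. The tensor CP-rank version in the paper replaces the SVD projection with an (approximate) CP decomposition; all details here, including the measurement model, are illustrative.

```python
import numpy as np

def matrix_iht(A, y, shape, r, n_iter=100):
    """Recover a low-rank matrix from linear measurements y = A vec(X):
    alternate a gradient step with truncated-SVD projection to rank <= r."""
    d1, d2 = shape
    X = np.zeros((d1, d2))
    for _ in range(n_iter):
        resid = y - A @ X.ravel()
        Z = X + (A.T @ resid).reshape(d1, d2)    # gradient step
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]          # hard-threshold the rank
    return X
```

When the measurement operator satisfies a suitable RIP over low-rank matrices, iterations of this type contract the recovery error geometrically, which is the mechanism the tensor results generalize.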