Directed Networks
Comparison of Deep Neural Networks and Deep Hierarchical Models for Spatio-Temporal Data
Spatio-temporal data are ubiquitous in the agricultural, ecological, and environmental sciences, and their study is important for understanding and predicting a wide variety of processes. One of the difficulties with modeling spatial processes that change in time is the complexity of the dependence structures that must describe how such a process varies, and the presence of high-dimensional complex data sets and large prediction domains. It is particularly challenging to specify parameterizations for nonlinear dynamic spatio-temporal models (DSTMs) that are simultaneously useful scientifically and efficient computationally. Statisticians have developed deep hierarchical models that can accommodate process complexity as well as the uncertainties in the predictions and inference. However, these models can be expensive and are typically application specific. On the other hand, the machine learning community has developed alternative "deep learning" approaches for nonlinear spatio-temporal modeling. These models are flexible yet are typically not implemented in a probabilistic framework. The two paradigms have many things in common and suggest hybrid approaches that can benefit from elements of each framework. This overview paper presents a brief introduction to the deep hierarchical DSTM (DH-DSTM) framework, and deep models in machine learning, culminating with the deep neural DSTM (DN-DSTM). Recent approaches that combine elements from DH-DSTMs and echo state network DN-DSTMs are presented as illustrations.
Web Links Prediction And Category-Wise Recommendation Based On Browser History
Shawon, Ashadullah, Zuhori, Syed Tauhid, Mahmud, Firoz, Rahman, Md. Jamil-Ur
A web browser should not be only for browsing web pages but also help users to find out their target websites and recommend similar type websites based on their behavior. Throughout this paper, we propose two methods to make a web browser more intelligent about link prediction which works during typing on address-bar and recommendation of websites according to several categories. Our proposed link prediction system is actually frecency prediction which is predicted based on the first visit, last visit and URL counts. But recommend system is the most challenging as it is needed to classify web URLs according to names without visiting web pages. So we use existing model for URL classification. The only existing approach gives unsatisfactory results and low accuracy. So we add hyperparameter optimization with an existing approach that finds the best parameters for existing URL classification model and gives better accuracy. In this paper, we propose a category wise recommendation system using frecency value and the total visit of individual URL category.
UQ-CHI: An Uncertainty Quantification-Based Contemporaneous Health Index for Degenerative Disease Monitoring
Developing knowledge-driven contemporaneous health index (CHI) that can precisely reflect the underlying patient across the course of the condition's progression holds a unique value, like facilitating a range of clinical decision-making opportunities. This is particularly important for monitoring degenerative condition such as Alzheimer's disease (AD), where the condition of the patient will decay over time. Detecting early symptoms and progression sign, and continuous severity evaluation, are all essential for disease management. While a few methods have been developed in the literature, uncertainty quantification of those health index models has been largely neglected. To ensure the continuity of the care, we should be more explicit about the level of confidence in model outputs. Ideally, decision-makers should be provided with recommendations that are robust in the face of substantial uncertainty about future outcomes. In this paper, we aim at filling this gap by developing an uncertainty quantification based contemporaneous longitudinal index, named UQ-CHI, with a particular focus on continuous patient monitoring of degenerative conditions. Our method is to combine convex optimization and Bayesian learning using the maximum entropy learning (MEL) framework, integrating uncertainty on labels as well. Our methodology also provides closed-form solutions in some important decision making tasks, e.g., such as predicting the label of a new sample. Numerical studies demonstrate the effectiveness of the propose UQ-CHI method in prediction accuracy, monitoring efficacy, and unique advantages if uncertainty quantification is enabled practice.
Stacking with Neural network for Cryptocurrency investment
Barnwal, Avinash, Bharti, Haripad, Ali, Aasim, Singh, Vishal
Predicting the direction of assets have been an active area of study and a difficult task. Machine learning models have been used to build robust models to model the above task. Ensemble methods is one of them showing results better than a single supervised method. In this paper, we have used generative and discriminative classifiers to create the stack, particularly 3 generative and 9 discriminative classifiers and optimized over one-layer Neural Network to model the direction of price cryptocurrencies. Features used are technical indicators used are not limited to trend, momentum, volume, volatility indicators, and sentiment analysis has also been used to gain useful insight combined with the above features. For Cross-validation, Purged Walk forward cross-validation has been used. In terms of accuracy, we have done a comparative analysis of the performance of Ensemble method with Stacking and Ensemble method with blending. We have also developed a methodology for combined features importance for the stacked model. Important indicators are also identified based on feature importance.
Learning to Generalize from Sparse and Underspecified Rewards
Agarwal, Rishabh, Liang, Chen, Schuurmans, Dale, Norouzi, Mohammad
We consider the problem of learning from sparse and underspecified rewards, where an agent receives a complex input, such as a natural language instruction, and needs to generate a complex response, such as an action sequence, while only receiving binary success-failure feedback. Such success-failure rewards are often underspecified: they do not distinguish between purposeful and accidental success. Generalization from underspecified rewards hinges on discounting spurious trajectories that attain accidental success, while learning from sparse feedback requires effective exploration. We address exploration by using a mode covering direction of KL divergence to collect a diverse set of successful trajectories, followed by a mode seeking KL divergence to train a robust policy. We propose Meta Reward Learning (MeRL) to construct an auxiliary reward function that provides more refined feedback for learning. The parameters of the auxiliary reward function are optimized with respect to the validation performance of a trained policy. The MeRL approach outperforms our alternative reward learning technique based on Bayesian Optimization, and achieves the state-of-the-art on weakly-supervised semantic parsing. It improves previous work by 1.2% and 2.4% on WikiTableQuestions and WikiSQL datasets respectively.
On the consistency of supervised learning with missing values
Josse, Julie, Prost, Nicolas, Scornet, Erwan, Varoquaux, Gaรซl
In many application settings, the data are plagued with missing features. These hinder data analysis. An abundant literature addresses missing values in an inferential framework, where the aim is to estimate parameters and their variance from incomplete tables. Here, we consider supervised-learning settings where the objective is to best predict a target when missing values appear in both training and test sets. We analyze which missing-values strategies lead to good prediction. We show the consistency of two approaches to estimating the prediction function. The most striking one shows that the widely-used mean imputation prior to learning method is consistent when missing values are not informative. This is in contrast with inferential settings as mean imputation is known to have serious drawbacks in terms of deformation of the joint and marginal distribution of the data. That such a simple approach can be consistent has important consequences in practice. This result holds asymptotically when the learning algorithm is consistent in itself. We contribute additional analysis on decision trees as they can naturally tackle empirical risk minimization with missing values. This is due to their ability to handle the half-discrete nature of variables with missing values. After comparing theoretically and empirically different missing-values strategies in trees, we recommend using the missing incorporated in attributes method as it can handle both non-informative and informative missing values.
Classifying textual data: shallow, deep and ensemble methods
Anderlucci, Laura, Guastadisegni, Lucia, Viroli, Cinzia
Nowadays the increasing and rapid progress of technology and the availability of electronic documents from a variety of sources have made a huge amount of textual data available. Hence, one of the prominent research topics of statistical andmachine learning communities is to provide suitable and feasible methods to extract high-quality information from unstructured textual data (Lata and Loar, 2018) for the different purposes of clustering, classification and document retrieval (Khan et al., 2010). This work originates from an empirical problem of classification of the content ofcalls made to the customer service of an important mobile phone company inItaly. The received calls are written down by an operator and classified into relevant classes (e.g.
Is a single unique Bayesian network enough to accurately represent your data?
Kratzer, Gilles, Furrer, Reinhard
Bayesian network (BN) modelling is extensively used in systems epidemiology. Usually it consists in selecting and reporting the best-fitting structure conditional to the data. A major practical concern is avoiding overfitting, on account of its extreme flexibility and its modelling richness. Many approaches have been proposed to control for overfitting. Unfortunately, they essentially all rely on very crude decisions that result in too simplistic approaches for such complex systems. In practice, with limited data sampled from complex system, this approach seems too simplistic. An alternative would be to use the Monte Carlo Markov chain model choice (MC3) over the network to learn the landscape of reasonably supported networks, and then to present all possible arcs with their MCMC support. This paper presents an R implementation, called mcmcabn, of a flexible structural MC3 that is accessible to non-specialists.
A Unifying Bayesian View of Continual Learning
Farquhar, Sebastian, Gal, Yarin
Some machine learning applications require continual learning - where data comes in a sequence of datasets, each is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: Given the model posterior one would simply use this as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning.
Estimating Buildings' Parameters over Time Including Prior Knowledge
Pathak, Nilavra, Foulds, James, Roy, Nirmalya, Banerjee, Nilanjan, Robucci, Ryan
Modeling buildings' heat dynamics is a complex process which depends on various factors including weather, building thermal capacity, insulation preservation, and residents' behavior. Gray-box models offer a causal inference of those dynamics expressed in few parameters specific to built environments. These parameters can provide compelling insights into the characteristics of building artifacts and have various applications such as forecasting HVAC usage, indoor temperature control monitoring of built environments, etc. In this paper, we present a systematic study of modeling buildings' thermal characteristics and thus derive the parameters of built conditions with a Bayesian approach. We build a Bayesian state-space model that can adapt and incorporate buildings' thermal equations and propose a generalized solution that can easily adapt prior knowledge regarding the parameters. We show that a faster approximate approach using variational inference for parameter estimation can provide similar parameters as that of a more time-consuming Markov Chain Monte Carlo (MCMC) approach. We perform extensive evaluations on two datasets to understand the generative process and show that the Bayesian approach is more interpretable. We further study the effects of prior selection for the model parameters and transfer learning, where we learn parameters from one season and use them to fit the model in the other. We perform extensive evaluations on controlled and real data traces to enumerate buildings' parameter within a 95% credible interval.