Undirected Networks
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
Li, Chris Junchi, Wang, Mengdi, Liu, Han, Zhang, Tong
In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Oja's iteration which is an online stochastic gradient descent method for the principal component analysis. Oja's iteration maintains a running estimate of the true principal component from streaming data and enjoys less temporal and spatial complexities. We show that the Oja's iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize the Oja's iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples.
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
Osokin, Anton, Bach, Francis, Lacoste-Julien, Simon
We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.
Restricted Boltzmann Machines for Robust and Fast Latent Truth Discovery
Broelemann, Klaus, Gottron, Thomas, Kasneci, Gjergji
We address the problem of latent truth discovery, LTD for short, where the goal is to discover the underlying true values of entity attributes in the presence of noisy, conflicting or incomplete information. Despite a multitude of algorithms to address the LTD problem that can be found in literature, only little is known about their overall performance with respect to effectiveness (in terms of truth discovery capabilities), efficiency and robustness. A practical LTD approach should satisfy all these characteristics so that it can be applied to heterogeneous datasets of varying quality and degrees of cleanliness. We propose a novel algorithm for LTD that satisfies the above requirements. The proposed model is based on Restricted Boltzmann Machines, thus coined LTD-RBM. In extensive experiments on various heterogeneous and publicly available datasets, LTD-RBM is superior to state-of-the-art LTD techniques in terms of an overall consideration of effectiveness, efficiency and robustness.
A dynamic network model with persistent links and node-specific latent variables, with an application to the interbank market
Mazzarisi, Piero, Barucca, Paolo, Lillo, Fabrizio, Tantari, Daniele
We propose a dynamic network model where two mechanisms control the probability of a link between two nodes: (i) the existence or absence of this link in the past, and (ii) node-specific latent variables (dynamic fitnesses) describing the propensity of each node to create links. Assuming a Markov dynamics for both mechanisms, we propose an Expectation-Maximization algorithm for model estimation and inference of the latent variables. The estimated parameters and fitnesses can be used to forecast the presence of a link in the future. We apply our methodology to the e-MID interbank network for which the two linkage mechanisms are associated with two different trading behaviors in the process of network formation, namely preferential trading and trading driven by node-specific characteristics. The empirical results allow to recognise preferential lending in the interbank market and indicate how a method that does not account for time-varying network topologies tends to overestimate preferential linkage.
Modular proximal optimization for multidimensional total-variation regularization
We study \emph{TV regularization}, a widely used technique for eliciting structured sparsity. In particular, we propose efficient algorithms for computing prox-operators for $\ell_p$-norm TV. The most important among these is $\ell_1$-norm TV, for whose prox-operator we present a new geometric analysis which unveils a hitherto unknown connection to taut-string methods. This connection turns out to be remarkably useful as it shows how our geometry guided implementation results in efficient weighted and unweighted 1D-TV solvers, surpassing state-of-the-art methods. Our 1D-TV solvers provide the backbone for building more complex (two or higher-dimensional) TV solvers within a modular proximal optimization approach. We review the literature for an array of methods exploiting this strategy, and illustrate the benefits of our modular design through extensive suite of experiments on (i) image denoising, (ii) image deconvolution, (iii) four variants of fused-lasso, and (iv) video denoising. To underscore our claims and permit easy reproducibility, we provide all the reviewed and our new TV solvers in an easy to use multi-threaded C++, Matlab and Python library.
Overview of Udacity Artificial Intelligence Engineer Nanodegree, Term 1
After finishing Udacity Deep Learning Foundation I felt that I got a good introduction to Deep Learning, but to understand things, I must dig deeper. Besides I had a guaranteed admission to Self-Driving Car Engineer, Artificial Intelligence, or Robotics Nanodegree programs. Before I turn to Udacity advanced courses, I want to mention one thing at the beginning. If I could give advice to myself, I would select another introduction course on Deep Learning -- Deep Learning Specialization by Andrew Ng. First of all, his way of mentoring is unique and he can explain complex things in most clear and understandable way.
Deep Architectures for Automated Seizure Detection in Scalp EEGs
Golmohammadi, Meysam, Ziyabari, Saeedeh, Shah, Vinit, de Diego, Silvia Lopez, Obeid, Iyad, Picone, Joseph
Automated seizure detection using clinical electroencephalograms is a challenging machine learning problem because the multichannel signal often has an extremely low signal to noise ratio. Events of interest such as seizures are easily confused with signal artifacts (e.g, eye movements) or benign variants (e.g., slowing). Commercially available systems suffer from unacceptably high false alarm rates. Deep learning algorithms that employ high dimensional models have not previously been effective due to the lack of big data resources. In this paper, we use the TUH EEG Seizure Corpus to evaluate a variety of hybrid deep structures including Convolutional Neural Networks and Long Short-Term Memory Networks. We introduce a novel recurrent convolutional architecture that delivers 30% sensitivity at 7 false alarms per 24 hours. We have also evaluated our system on a held-out evaluation set based on the Duke University Seizure Corpus and demonstrate that performance trends are similar to the TUH EEG Seizure Corpus. This is a significant finding because the Duke corpus was collected with different instrumentation and at different hospitals. Our work shows that deep learning architectures that integrate spatial and temporal contexts are critical to achieving state of the art performance and will enable a new generation of clinically-acceptable technology.
Automatic Analysis of EEGs Using Big Data and Hybrid Deep Learning Architectures
Golmohammadi, Meysam, Torbati, Amir Hossein Harati Nejad, de Diego, Silvia Lopez, Obeid, Iyad, Picone, Joseph
Objective: A clinical decision support tool that automatically interprets EEGs can reduce time to diagnosis and enhance real-time applications such as ICU monitoring. Clinicians have indicated that a sensitivity of 95% with a specificity below 5% was the minimum requirement for clinical acceptance. We propose a highperformance classification system based on principles of big data and machine learning. Methods: A hybrid machine learning system that uses hidden Markov models (HMM) for sequential decoding and deep learning networks for postprocessing is proposed. These algorithms were trained and evaluated using the TUH EEG Corpus, which is the world's largest publicly available database of clinical EEG data. Results: Our approach delivers a sensitivity above 90% while maintaining a specificity below 5%. This system detects three events of clinical interest: (1) spike and/or sharp waves, (2) periodic lateralized epileptiform discharges, (3) generalized periodic epileptiform discharges. It also detects three events used to model background noise: (1) artifacts, (2) eye movement (3) background. Conclusions: A hybrid HMM/deep learning system can deliver a low false alarm rate on EEG event detection, making automated analysis a viable option for clinicians. Significance: The TUH EEG Corpus enables application of highly data consumptive machine learning algorithms to EEG analysis. Performance is approaching clinical acceptance for real-time applications.
Behaviour Patterns with Machine Learning Techniques
Nowadays web-sites needs to handle huge amount of traffic. We can leverage that fact and capture user interactions with the application. Next, we can analyze users behavior and capture patterns on which we are able to react properly. In applications that needs to deal with huge amount of traffic it is very hard to detect anomalies. We'll learn how to apply clustering to find anomalies in web traffic.
Data-adaptive Active Sampling for Efficient Graph-Cognizant Classification
Berberidis, Dimitris, Giannakis, Georgios B.
The present work deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values - an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Such a strategy subsumes several measures of expected model change, including uncertainty sampling, variance minimization, and sampling based on the $\Sigma-$optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by taking into account the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state-of-the-art even at reduced runtime.