AITopics | Jung, Alexander

Collaborating Authors

Jung, Alexander

Structural Feature Selection for Event Logs

Hinkka, Markku, Lehto, Teemu, Heljanko, Keijo, Jung, Alexander

arXiv.org Machine Learning, Oct-8-2017

We consider the problem of classifying business process instances based on structural features derived from event logs. The main motivation is to provide machine-learning-based techniques with quick response times for interactive computer-assisted root cause analysis. In particular, we create structural features from process mining, such as activity and transition occurrence counts and the ordering of activities, to be evaluated as potential features for classification. We show that adding such structural features increases the amount of information, thus potentially increasing classification accuracy. However, there is an inherent trade-off, as using too many features leads to excessively long run-times for machine learning classification models. One way to improve the machine learning algorithms' run-time is to select only a small number of features with a feature selection algorithm. However, the run-time required by the feature selection algorithm must also be taken into account. Also, the classification accuracy should not suffer too much from the feature selection. The main contributions of this paper are as follows: First, we propose and compare six different feature selection algorithms by means of an experimental setup comparing their classification accuracy and achievable response times. Second, we discuss the potential use of feature selection results for computer-assisted root cause analysis as well as the properties of different types of structural features in the context of feature selection.
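
As a rough illustration of the structural features mentioned in the abstract, the following Python sketch counts activity and transition occurrences for a single process instance. The trace format (a list of activity names), the function name `structural_features`, and the example activities are assumptions made here for illustration; they are not taken from the paper.

```python
from collections import Counter


def structural_features(trace):
    """Count activity and transition occurrences in one process instance.

    `trace` is assumed to be the ordered list of activity names recorded
    for a single case in the event log (a hypothetical input format).
    """
    features = Counter()
    # Activity occurrence counts: how often each activity appears.
    for activity in trace:
        features[("activity", activity)] += 1
    # Transition occurrence counts: how often one activity directly follows another.
    for src, dst in zip(trace, trace[1:]):
        features[("transition", src, dst)] += 1
    return dict(features)


# Hypothetical example trace.
trace = ["register", "check", "check", "approve"]
features = structural_features(trace)
```

Each distinct activity and each distinct transition becomes one candidate feature, so the feature count can grow quickly with the size of the process, which is the situation where a feature selection step becomes relevant.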

algorithm, artificial intelligence, health & medicine, (18 more...)

doi: 10.1007/978-3-319-74030-0_2

1710.02823

Country:

Europe (1.00)
South America > Brazil > Rio de Janeiro (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.75)

Recovery Conditions and Sampling Strategies for Network Lasso

Mara, Alexandru, Jung, Alexander

arXiv.org Machine Learning, Sep-3-2017

The network Lasso is a recently proposed convex optimization method for machine learning from massive network-structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing involving sparse models. Highly scalable implementations of the network Lasso can be obtained by state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the concept of the compatibility condition put forward by van de Geer and Buehlmann as a powerful tool for the analysis of plain Lasso, we derive a sufficient condition, i.e., the network compatibility condition (NCC), on the underlying network topology such that network Lasso accurately learns a clustered underlying graph signal. The NCC relates the location of the sampled nodes with the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample, or in machine learning terms, which data points provide the most information if labeled.
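
To make the kind of objective behind the network Lasso concrete, the sketch below evaluates one common form for scalar graph signals: squared error on the sampled nodes plus a weighted total variation penalty over the edges. This is an assumed formulation for illustration and not necessarily the exact one analyzed in the paper; the function name, the argument layout, and the small chain graph are hypothetical.

```python
def network_lasso_objective(x, y, sampled, edges, weights, lam):
    """Squared error on sampled nodes plus lam times weighted total variation.

    x       : list of signal values, one per node
    y       : dict mapping sampled node index -> observed label
    sampled : iterable of sampled node indices
    edges   : list of (i, j) node index pairs
    weights : list of non-negative edge weights, aligned with `edges`
    lam     : regularization strength (assumed scalar)
    """
    # Empirical loss, evaluated only where labels are available.
    loss = sum((x[i] - y[i]) ** 2 for i in sampled)
    # Total variation: weighted sum of signal differences across edges.
    tv = sum(w * abs(x[i] - x[j]) for (i, j), w in zip(edges, weights))
    return loss + lam * tv


# Hypothetical chain graph with two clusters {0, 1} and {2, 3},
# joined by a weak edge between nodes 1 and 2.
edges = [(0, 1), (1, 2), (2, 3)]
weights = [1.0, 0.1, 1.0]
y = {0: 1.0, 3: 5.0}           # one labeled node per cluster
x = [1.0, 1.0, 5.0, 5.0]       # piecewise-constant (clustered) signal
objective = network_lasso_objective(x, y, y.keys(), edges, weights, lam=1.0)
```

In this toy example the clustered signal pays only for the single weak boundary edge, which hints at why the placement of sampled nodes relative to cluster boundaries matters.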

artificial intelligence, graph signal, optimization problem, (17 more...)

1709.01402

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.35)

A Fixed-Point of View on Gradient Methods for Big Data

Jung, Alexander

arXiv.org Machine Learning, Aug-15-2017

Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in machine learning for massive data sets (big data). In particular, stochastic gradient methods are considered the de facto standard for training deep neural networks. Studying gradient methods within the realm of fixed-point theory provides us with powerful tools to analyze their convergence properties. In particular, gradient methods using inexact or noisy gradients, such as stochastic gradient descent, can be studied conveniently using well-known results on inexact fixed-point iterations. Moreover, as we demonstrate in this paper, the fixed-point approach allows an elegant derivation of accelerations for basic gradient methods. In particular, we will show how gradient descent can be accelerated by a fixed-point preserving transformation of an operator associated with the objective function.
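
The fixed-point view can be sketched in a few lines of Python: one gradient step is written as an operator T(x) = x - alpha * grad f(x), and gradient descent is the repeated application of T. The quadratic objective, the step size, and all function names below are assumptions chosen for illustration; the acceleration scheme from the paper is not reproduced here.

```python
def gradient_step_operator(grad, step_size):
    """Return the operator T(x) = x - step_size * grad(x).

    A point is a fixed point of T exactly when its gradient is zero,
    i.e., when it is a stationary point of the objective.
    """
    def operator(x):
        g = grad(x)
        return [xi - step_size * gi for xi, gi in zip(x, g)]
    return operator


def fixed_point_iteration(operator, x0, num_iters=200):
    """Apply the operator repeatedly, starting from x0."""
    x = x0
    for _ in range(num_iters):
        x = operator(x)
    return x


# Hypothetical convex quadratic f(x) = 0.5 * x^T Q x - b^T x
# with Q = [[3, 1], [1, 2]] and b = [1, 1]; its gradient is Q x - b.
def grad(x):
    return [3.0 * x[0] + 1.0 * x[1] - 1.0,
            1.0 * x[0] + 2.0 * x[1] - 1.0]


T = gradient_step_operator(grad, step_size=0.2)
x_star = fixed_point_iteration(T, [0.0, 0.0])
```

With this step size the operator is a contraction for the chosen quadratic, so the iterates approach the unique minimizer, which is also the unique fixed point of T.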

artificial intelligence, iteration, machine learning, (16 more...)

1706.0988

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Semi-Supervised Learning via Sparse Label Propagation

Jung, Alexander, Hero, Alfred O. III, Mara, Alexandru, Jahromi, Saeed

arXiv.org Machine Learning, May-15-2017

This work proposes a novel method for semi-supervised learning from partially labeled massive network-structured datasets, i.e., big data over networks. We model the underlying hypothesis, which relates data points to labels, as a graph signal, defined over some graph (network) structure intrinsic to the dataset. Following the key principle of supervised learning, i.e., similar inputs yield similar outputs, we require the graph signals induced by labels to have small total variation. Accordingly, we formulate the problem of learning the labels of data points as a non-smooth convex optimization problem which amounts to balancing the empirical loss, i.e., the discrepancy with some partially available label information, against the smoothness quantified by the total variation of the learned graph signal. We solve this optimization problem by appealing to a recently proposed preconditioned variant of the popular primal-dual method by Pock and Chambolle, which results in a sparse label propagation algorithm. This learning algorithm allows for a highly scalable implementation as message passing over the underlying data graph. By applying concepts of compressed sensing to the learning problem, we are also able to provide a transparent sufficient condition on the underlying network structure such that accurate learning of the labels is possible. We also present an implementation of the message passing formulation, which allows for a highly scalable implementation in big data frameworks.

graph signal, inductive learning, optimization problem, (14 more...)

1612.01414

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

On the Sample Complexity of Graphical Model Selection for Non-Stationary Processes

Quang, Nguyen Tran, Jung, Alexander

arXiv.org Machine Learning, Jan-24-2017

We formulate and analyze a graphical model selection (GMS) method for inferring the conditional independence graph of a high-dimensional non-stationary Gaussian random process (time series) from a finite-length observation. The observed process samples are assumed to be uncorrelated over time but to have different covariance matrices. We characterize the sample complexity of GMS for such processes by analyzing a particular selection method, which is based on sparse neighborhood regression. Our results indicate that, similar to the case of i.i.d. samples, accurate GMS is possible even in the high-dimensional regime if the underlying conditional independence graph is sufficiently sparse.

artificial intelligence, graphical model selection, machine learning, (12 more...)

1701.04724

Country:

Europe > United Kingdom > Wales (0.14)
Europe > Portugal (0.14)
Europe > Italy (0.14)
Asia > Taiwan (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Learning conditional independence structure for high-dimensional uncorrelated vector processes

Quang, Nguyen Tran, Jung, Alexander

arXiv.org Machine Learning, Sep-13-2016

We formulate and analyze a graphical model selection method for inferring the conditional independence graph of a high-dimensional nonstationary Gaussian random process (time series) from a finite-length observation. The observed process samples are assumed to be uncorrelated over time and to have a time-varying marginal distribution. The selection method is based on testing conditional variances obtained for small subsets of process components. This makes it possible to cope with the high-dimensional regime, where the sample size can be (drastically) smaller than the process dimension. We characterize the required sample size such that the proposed selection method is successful with high probability.

artificial intelligence, machine learning, time series, (14 more...)

1609.03772

Country:

Europe > Portugal (0.14)
Europe > Italy (0.14)
Europe > France (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

On the Minimax Risk of Dictionary Learning

Jung, Alexander, Eldar, Yonina C., Görtz, Norbert

arXiv.org Machine Learning, Jul-20-2015

We consider the problem of learning a dictionary matrix from a number of observed signals, which are assumed to be generated via a linear model with a common underlying dictionary. In particular, we derive lower bounds on the minimum achievable worst case mean squared error (MSE), regardless of the computational complexity of the dictionary learning (DL) schemes. By casting DL as a classical (or frequentist) estimation problem, the lower bounds on the worst case MSE are derived by following an established information-theoretic approach to minimax estimation. The main conceptual contribution of this paper is the adaptation of the information-theoretic approach to minimax estimation for the DL problem in order to derive lower bounds on the worst case MSE of any DL scheme. We derive three different lower bounds applying to different generative models for the observed signals. The first bound applies to a wide range of models; it only requires the existence of a covariance matrix of the (unknown) underlying coefficient vector. By specializing this bound to the case of sparse coefficient distributions, and assuming the true dictionary satisfies the restricted isometry property, we obtain a lower bound on the worst case MSE of DL schemes in terms of a signal to noise ratio (SNR). The third bound applies to a more restrictive subclass of coefficient distributions by requiring the non-zero coefficients to be Gaussian. While, compared with the previous two bounds, the applicability of this final bound is the most limited, it is the tightest of the three bounds in the low SNR regime.
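
The linear generative model referred to in the abstract can be sketched as follows: each observed signal is the dictionary applied to a sparse coefficient vector, plus noise. The sketch assumes Gaussian non-zero coefficients and additive Gaussian noise, which resembles the most restrictive setting described above; the function name, the dictionary values, and the parameter choices are hypothetical and are not taken from the paper.

```python
import random


def generate_signal(dictionary, sparsity, noise_std, rng):
    """Draw one signal y = D x + n with a sparse coefficient vector x.

    dictionary : list of rows (m rows, p columns), the assumed true dictionary D
    sparsity   : number of non-zero coefficients in x
    noise_std  : standard deviation of the additive Gaussian noise n
    rng        : a random.Random instance, for reproducibility
    """
    num_rows = len(dictionary)
    num_atoms = len(dictionary[0])
    # Pick a random support and fill it with Gaussian coefficients.
    coeffs = [0.0] * num_atoms
    for idx in rng.sample(range(num_atoms), sparsity):
        coeffs[idx] = rng.gauss(0.0, 1.0)
    # Linear model: dictionary times coefficients, plus noise per entry.
    signal = [
        sum(dictionary[r][c] * coeffs[c] for c in range(num_atoms))
        + rng.gauss(0.0, noise_std)
        for r in range(num_rows)
    ]
    return signal, coeffs


# Hypothetical 2 x 3 dictionary and a noise-free draw with one active atom.
D = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
signal, coeffs = generate_signal(D, sparsity=1, noise_std=0.0, rng=random.Random(0))
```

A dictionary learning scheme only sees many such signals and must estimate the dictionary from them; the lower bounds in the paper concern how well any such scheme can do in the worst case.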

artificial intelligence, machine learning, minimax risk, (16 more...)

1507.05498

Country: North America > United States > New Jersey (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Learning the Conditional Independence Structure of Stationary Time Series: A Multitask Learning Approach

Jung, Alexander

arXiv.org Machine Learning, Jan-11-2015

We propose a method for inferring the conditional independence graph (CIG) of a high-dimensional Gaussian vector time series (discrete-time process) from a finite-length observation. In contrast to existing approaches, we do not rely on a parametric process model (e.g., an autoregressive model) for the observed random process. Instead, we only require certain smoothness properties (in the Fourier domain) of the process. The proposed inference scheme works even for sample sizes much smaller than the number of scalar process components if the true underlying CIG is sufficiently sparse. A theoretical performance analysis provides conditions which guarantee that the probability that the proposed inference method delivers a wrong CIG is below a prescribed value. These conditions imply lower bounds on the sample size such that the new method is consistent asymptotically. Some numerical experiments validate our theoretical performance analysis and demonstrate superior performance of our scheme compared to an existing (parametric) approach in case of model mismatch.

artificial intelligence, health & medicine, time series, (19 more...)

doi: 10.1109/TSP.2015.2460219

1404.1361

Country:

North America > United States (0.67)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Graphical LASSO Based Model Selection for Time Series

Jung, Alexander, Hannak, Gabor, Görtz, Norbert

arXiv.org Machine Learning, Oct-28-2014

We propose a novel graphical model selection (GMS) scheme for high-dimensional stationary time series, i.e., discrete-time processes. The method is based on a natural generalization of the graphical LASSO (gLASSO), introduced originally for GMS based on i.i.d. samples, and estimates the conditional independence graph (CIG) of a time series from a finite-length observation. The gLASSO for time series is defined as the solution of an l1-regularized maximum (approximate) likelihood problem. We solve this optimization problem using the alternating direction method of multipliers (ADMM). Our approach is nonparametric, as we do not assume a finite-dimensional (e.g., an autoregressive) parametric model for the observed process. Instead, we require the process to be sufficiently smooth in the spectral domain. For Gaussian processes, we characterize the performance of our method theoretically by deriving an upper bound on the probability that our algorithm fails to correctly identify the CIG. Numerical experiments demonstrate the ability of our method to recover the correct CIG from a limited number of samples.

artificial intelligence, machine learning, time series, (16 more...)

1410.1184

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Performance Limits of Dictionary Learning for Sparse Coding

Jung, Alexander, Eldar, Yonina C., Görtz, Norbert

arXiv.org Machine Learning, Jun-27-2014

We consider the problem of dictionary learning under the assumption that the observed signals can be represented as sparse linear combinations of the columns of a single large dictionary matrix. In particular, we analyze the minimax risk of the dictionary learning problem, which governs the mean squared error (MSE) performance of any learning scheme, regardless of its computational complexity. By following an established information-theoretic method based on Fano's inequality, we derive a lower bound on the minimax risk for a given dictionary learning problem. This lower bound yields a characterization of the sample complexity, i.e., a lower bound on the required number of observations such that consistent dictionary learning schemes exist. Our bounds may be compared with the performance of a given learning scheme, which makes it possible to characterize how far the method is from optimal performance.

artificial intelligence, data mining, dictionary learning problem, (16 more...)

1402.4078

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education > Focused Education > Special Education (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
