AITopics

In this paper, we propose a novel sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for the purpose of multimodal Bayesian learning. It simulates a noisy dynamical system by incorporating both a continuously-varying tempering variable and the Nosé-Hoover thermostats. A significant benefit is that it is not only able to efficiently generate i.i.d. samples when the underlying posterior distributions are multimodal, but also capable of adaptively neutralising the noise arising from the use of mini-batches. While the properties of the approach have been studied using synthetic datasets, our experiments on three real datasets have also shown its performance gains over several strong baselines for Bayesian learning with various types of neural networks plugged in.

artificial intelligence, bayesian inference, machine learning, (18 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.46)

Learning Versatile Filters for Efficient Convolutional Neural Networks

Wang, Yunhe, Xu, Chang, Xu, Chunjing, Xu, Chao, Tao, Dacheng

This paper introduces versatile filters to construct efficient convolutional neural networks. Considering the demands of efficient deep learning techniques running on cost-effective hardware, a number of methods have been developed to learn compact neural networks. Most of these works aim to slim down filters in different ways, e.g., investigating small, sparse or binarized filters. In contrast, we treat filters from an additive perspective. A series of secondary filters can be derived from a primary filter. These secondary filters are all inherited from the primary filter without occupying more storage, but once unfolded in computation they can significantly enhance the capability of the filter by integrating information extracted from different receptive fields. Besides spatial versatile filters, we additionally investigate versatile filters from the channel perspective. The new techniques are general enough to upgrade filters in existing CNNs. Experimental results on benchmark datasets and neural networks demonstrate that CNNs constructed with our versatile filters achieve accuracy comparable to that of the original filters, but require less memory and fewer FLOPs.

artificial intelligence, convolution filter, machine learning, (17 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Frecon, Jordan, Salzo, Saverio, Pontil, Massimiliano

Bilevel learning of the Group Lasso structure

Regression with a group-sparsity penalty plays a central role in high-dimensional prediction problems. However, most existing methods require the group structure to be known a priori. In practice, this may be too strong an assumption, potentially hampering the effectiveness of the regularization method. To circumvent this issue, we present a method to estimate the group structure by means of a continuous bilevel optimization problem where the data is split into training and validation sets. Our approach relies on an approximation scheme where the lower level problem is replaced by a smooth dual forward-backward algorithm with Bregman distances. We provide guarantees regarding the convergence of the approximate procedure to the exact problem and demonstrate the good behaviour of the proposed method on synthetic experiments. Finally, a preliminary application to gene expression data is tackled with the purpose of unveiling functional groups.
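
For readers unfamiliar with the group-sparsity penalty mentioned in this abstract, the following is a minimal sketch of the standard Group Lasso penalty for a fixed, known partition of the coefficients. It is not the bilevel method of the paper, which instead estimates the groups. The function name `group_lasso_penalty` and the optional per-group `weights` argument are illustrative assumptions.

```python
import math


def group_lasso_penalty(w, groups, weights=None):
    """Sum over groups of the (optionally weighted) Euclidean norm of w restricted to the group.

    w       : list of coefficients
    groups  : list of index lists, assumed here to be known in advance
    weights : optional per-group weights (hypothetical; defaults to 1.0 each)
    """
    if weights is None:
        weights = [1.0] * len(groups)
    total = 0.0
    for weight, group in zip(weights, groups):
        # Euclidean norm of the coefficients belonging to this group
        group_norm = math.sqrt(sum(w[i] ** 2 for i in group))
        total += weight * group_norm
    return total


# Two groups: indices {0, 1} and {2, 3}
penalty = group_lasso_penalty([3.0, 4.0, 0.0, 5.0], [[0, 1], [2, 3]])
```

Because each group contributes its unsquared norm, the penalty tends to zero out whole groups at once, which is why a wrongly specified group structure can hurt.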

bioinformatics, machine learning, problem 2, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
(7 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Mukherjee, Soumendu Sundar, Sarkar, Purnamrita, Wang, Y. X. Rachel, Yan, Bowei

Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues

Variational approximation has been widely used in large-scale Bayesian inference recently, the simplest kind of which involves imposing a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean field, theoretical studies of its loss function surface and the convergence behavior of iterative updates for optimizing the loss are far from complete. In this paper, we focus on the problem of community detection for a simple two-class Stochastic Blockmodel (SBM). Using batch coordinate ascent variational inference (BCAVI) for updates, we give a complete characterization of all the critical points and show different convergence behaviors with respect to initializations. When the parameters are known, we show a significant proportion of random initializations will converge to ground truth. On the other hand, when the parameters themselves need to be estimated, a random initialization will converge to an uninformative local optimum.

artificial intelligence, initialization, machine learning, (13 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Middle East > Jordan (0.05)
Asia > India > West Bengal > Kolkata (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Variational Memory Encoder-Decoder

Le, Hung, Tran, Truyen, Nguyen, Thin, Venkatesh, Svetha

Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations.

artificial intelligence, machine learning, natural language, (16 more...)

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Balaji, Yogesh, Sankaranarayanan, Swami, Chellappa, Rama

MetaReg: Towards Domain Generalization using Meta-Regularization

Training models that generalize to new domains at test time is a problem of fundamental importance in machine learning. In this work, we encode this notion of domain generalization using a novel regularization function. We pose the problem of finding such a regularization function in a Learning to Learn (or meta-learning) framework. The objective of domain generalization is explicitly modeled by learning a regularizer that makes a model trained on one domain perform well on another domain. Experimental validations on computer vision and natural language datasets indicate that our method can learn regularizers that achieve good cross-domain generalization.

artificial intelligence, generalization, machine learning, (19 more...)

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(5 more...)

Industry: Government > Military (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Tang, Junqi, Golbabaee, Mohammad, Bach, Francis, Davies, Mike E.

Rest-Katyusha: Exploiting the Solution's Structure via Scheduled Restart Schemes

We propose a structure-adaptive variant of Katyusha, a state-of-the-art stochastic variance-reduced gradient algorithm, for regularized empirical risk minimization. The proposed method is able to exploit the intrinsic low-dimensional structure of the solution, such as sparsity or low rank, which is enforced by a non-smooth regularization, to achieve an even faster convergence rate. This provable algorithmic improvement is achieved by restarting the Katyusha algorithm according to restricted strong-convexity (RSC) constants. We also propose an adaptive-restart variant which is able to estimate the RSC on the fly and adjust the restart period automatically. We demonstrate the effectiveness of our approach via numerical experiments.

artificial intelligence, machine learning, rest-katyusha, (16 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Somerset > Bath (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units

Xu, Yixi, Wang, Xiao

This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks. We establish the upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q\le p^*$ and $1/p+1/p^{*}=1$, we discuss properties of a width-independent capacity control, which only depends on the depth by a square root term. We further analyze the approximation properties of $L_{p,q}$ weight normalized deep neural networks. In particular, for an $L_{1,\infty}$ weight normalized network, the approximation error can be controlled by the $L_1$ norm of the output layer, and the corresponding generalization error only depends on the architecture by the square root of the depth.
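
As a rough illustration of the quantities named in this abstract, the sketch below computes an $L_{p,q}$ norm of a weight matrix and the conjugate exponent $p^*$ defined by $1/p+1/p^{*}=1$. The convention used here (a $p$-norm within each row, then a $q$-norm across the resulting row norms) is an assumption; the paper may index rows and columns differently. The function names are hypothetical.

```python
import math


def lp_norm(values, p):
    """p-norm of a sequence of numbers; p may be math.inf."""
    abs_vals = [abs(v) for v in values]
    if p == math.inf:
        return max(abs_vals, default=0.0)
    return sum(a ** p for a in abs_vals) ** (1.0 / p)


def lpq_norm(weights, p, q):
    """L_{p,q} norm of a matrix given as a list of rows.

    Assumed convention: take the p-norm of each row,
    then the q-norm of the vector of row norms.
    """
    return lp_norm([lp_norm(row, p) for row in weights], q)


def conjugate_exponent(p):
    """Return p* such that 1/p + 1/p* = 1, for p >= 1."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


W = [[3.0, -4.0], [0.0, 0.0]]
norm_1_inf = lpq_norm(W, 1, math.inf)  # the L_{1,infinity} case singled out in the abstract
```

Under this convention, an $L_{1,\infty}$ constraint bounds the largest row-wise $L_1$ norm of each layer's weight matrix.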

artificial intelligence, machine learning, neural network, (14 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Halvani, Oren, Winter, Christian, Graner, Lukas

Unary and Binary Classification Approaches and their Implications for Authorship Verification

arXiv.org Machine Learning, Dec-31-2018

Retrieving indexed documents not by their topical content but by their writing style opens the door for a number of applications in information retrieval (IR). One application is to retrieve textual content of a certain author X, where the queried IR system is provided beforehand with a set of reference texts of X. Authorship verification (AV), which is a research subject in the field of digital text forensics, is suitable for this purpose. The task of AV is to determine if two documents (i.e. an indexed and a reference document) have been written by the same author X. Even though AV represents a unary classification problem, a number of existing approaches consider it as a binary classification task. However, the underlying classification model of an AV method has a number of serious implications regarding its prerequisites, evaluability, and applicability. In our comprehensive literature review, we observed several misunderstandings regarding the differentiation of unary and binary AV approaches that require consideration. The objective of this paper is, therefore, to clarify these by proposing clear criteria and new properties that aim to improve the characterization of existing and future AV approaches. Given both, we investigate the applicability of eleven existing unary and binary AV methods as well as four generic unary classification algorithms on two self-compiled corpora. Furthermore, we highlight an important issue concerning the evaluation of AV methods based on fixed decision criteria, which has received no attention in previous AV studies.

authorship verification, corpora, verification, (15 more...)

arXiv.org Machine Learning

1901.00399

Country:

Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > New York > New York County > New York City (0.04)
(16 more...)

Genre:

Research Report (0.82)
Overview (0.66)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Gower, Robert, Hanzely, Filip, Richtarik, Peter, Stich, Sebastian U.

Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

We present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. One essential problem of this type is the matrix inversion problem. In particular, our algorithm can be specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization and machine learning. As an application of our general theory, we develop the first accelerated (deterministic and stochastic) quasi-Newton updates. Our updates lead to provably more aggressive approximations of the inverse Hessian, and lead to speed-ups over classical non-accelerated rules in numerical experiments. Experiments with empirical risk minimization show that our rules can accelerate training of machine learning models.

algorithm, matrix, quasi-newton method, (13 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(6 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)