AITopics

Scientific experiments are usually expensive due to complex experimental preparation and processing. Experimental design is therefore involved with the task of finding the optimal experimental input that results in the desirable output by using as few experiments as possible. Experimenters can often acquire the knowledge about the location of the global optimum. However, they do not know how to exploit this knowledge to accelerate experimental design. In this paper, we adopt the technique of Bayesian optimization for experimental design since Bayesian optimization has established itself as an efficient tool for optimizing expensive black-box functions. Again, it is unknown how to incorporate the expert prior knowledge about the global optimum into Bayesian optimization process. To address it, we represent the expert knowledge about the global optimum via placing a prior distribution on it and we then derive its posterior distribution. An efficient Bayesian optimization approach has been proposed via posterior sampling on the posterior distribution of the global optimum. We theoretically analyze the convergence of the proposed algorithm and discuss the robustness of incorporating expert prior. We evaluate the efficiency of our algorithm by optimizing synthetic functions and tuning hyperparameters of classifiers along with a real-world experiment on the synthesis of short polymer fiber. The results clearly demonstrate the advantages of our proposed method.

algorithm, knowledge, optimization, (14 more...)

2002.11256

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Italy > Sardinia (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Han, Rungang, Willett, Rebecca, Zhang, Anru

An Optimal Statistical and Computational Framework for Generalized Tensor Estimation

This paper describes a flexible framework for generalized low-rank tensor estimation problems that includes many important instances arising from applications in computational imaging, genomics, and network analysis. The proposed estimator consists of finding a low-rank tensor fit to the data under generalized parametric models. To overcome the difficulty of non-convexity in these problems, we introduce a unified approach of projected gradient descent that adapts to the underlying low-rank structure. Under mild conditions on the loss function, we establish both an upper bound on statistical error and the linear rate of computational convergence through a general deterministic analysis. Then we further consider a suite of generalized tensor estimation problems, including sub-Gaussian tensor denoising, tensor regression, and Poisson and binomial tensor PCA. We prove that the proposed algorithm achieves the minimax optimal rate of convergence in estimation error. Finally, we demonstrate the superiority of the proposed framework via extensive experiments on both simulated and real data.

jkl, probability, tensor, (13 more...)

2002.11255

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Michigan (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.65)
Information Technology > Services (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Fan, Xuhui, Gaussier, Eric

Supervised Categorical Metric Learning with Schatten p-Norms

Metric learning has been successful in learning new metrics adapted to numerical datasets. However, its development on categorical data still needs further exploration. In this paper, we propose a method, called CPML for \emph{categorical projected metric learning}, that tries to efficiently~(i.e. less computational time and better prediction accuracy) address the problem of metric learning in categorical data. We make use of the Value Distance Metric to represent our data and propose new distances based on this representation. We then show how to efficiently learn new metrics. We also generalize several previous regularizers through the Schatten $p$-norm and provides a generalization bound for it that complements the standard generalization bound for metric learning. Experimental results show that our method provides

constraint, metric learning, weight 0, (14 more...)

2002.11246

Country:

Oceania > Australia (0.04)
Europe > United Kingdom > Wales (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Zhang, Jingfeng, Xu, Xilie, Han, Bo, Niu, Gang, Cui, Lizhen, Sugiyama, Masashi, Kankanhalli, Mohan

Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question---do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing most adversarial data maximizing the loss, we search for least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively---adversarial robustness can indeed be achieved without compromising the natural generalization.

accuracy, adversarial data, test accuracy, (15 more...)

2002.11242

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(11 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Dabrowski, Joel Janek, de Villiers, Johan Pieter, Rahman, Ashfaqur, Beyers, Conrad

Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour Prediction

The time it takes for a classifier to make an accurate prediction can be crucial in many behaviour recognition problems. For example, an autonomous vehicle should detect hazardous pedestrian behaviour early enough for it to take appropriate measures. In this context, we compare the switching linear dynamical system (SLDS) and a three-layered bi-directional long short-term memory (LSTM) neural network, which are applied to infer pedestrian behaviour from motion tracks. We show that, though the neural network model achieves an accuracy of 80%, it requires long sequences to achieve this (100 samples or more). The SLDS, has a lower accuracy of 74%, but it achieves this result with short sequences (10 samples). To our knowledge, such a comparison on sequence length has not been considered in the literature before. The results provide a key intuition of the suitability of the models in time-critical problems.

deep learning and statistical model, prediction, sequence, (12 more...)

doi: 10.1007/978-3-030-36808-1_50

2002.11226

Country:

North America > Canada > Ontario > Toronto (0.14)
Africa > South Africa > Gauteng > Pretoria (0.05)
Oceania > Australia (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Laguel, Yassine, Pillutla, Krishna, Malick, Jérôme, Harchaoui, Zaid

Device Heterogeneity in Federated Learning: A Superquantile Approach

The proliferation of mobile phones, wearables and edge devices has led to an unprecedented growth in the generation of user interaction data. Systems which tap into the power of this rich data while respecting the privacy of users are geared to lead the next generation of intelligent applications and devices. The leading distributed learning framework in this setting is federated learning [30]. In federated learning, a number of client devices with privacy-sensitive data collaboratively learn a machine learning model under the orchestration of a central server, while keeping their data decentralized. This is achieved by pushing the actual computation to the devices while the server coordinates with the devices for aggregation of model updates.

communication round, local data, test misc, (15 more...)

2002.11223

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Ergen, Tolga, Pilanci, Mert

Convex Geometry and Duality of Over-parameterized Neural Networks

We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one dimensional regression and classification, as well as rank-one data matrices, we prove that finite two-layer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed form kernel matrix and minimum L1 norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem for two-layer networks can be cast as a convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cutting-plane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed form formulas for the optimal neural network weights in certain cases. We also establish a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing. Extensive experimental results show that the proposed approach yields interpretable and accurate models.

data matrix, extreme point, matrix, (16 more...)

2002.11219

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sarthak, null, Shukla, Shikhar, Tripathi, Surya Prakash

EmbPred30: Assessing 30-days Readmission for Diabetic Patients using Categorical Embeddings

Diabetes is a disease-causing high level of blood sugar. In type 1 Diabetes, body doesn't produce insulin, but if injected from external sources, will use it and in type 2, the body doesn't produce as well as use insulin. It is estimated that 30.3 million people of all ages in the US are suffering from Diabetes as of 2015, out of which 7.2 million are unaware[1]. As of 2016, it is ranked seventh in the list of global causes of mortality. Diabetes can be an underlying cause for many cardiovascular diseases, retinopathy, and nephropathy leading to frequent readmission in the hospital. The Centers for Medicare and Medicaid Services(CMS) labeled a 30-day readmission rate as a measure of healthcare quality offered by the hospital in order to provide the best inpatient care and improve the healthcare quality. Hospitals with high readmission rates will be penalized as per the Patient Protection and Affordable Care Act(ACA) of 2010[2]. During the recent studies[19], it was observed that a 30-day readmission rate for patients with Diabetes ranges between 14.4%-22.7%,

dataset, diabetic patient, readmission, (12 more...)

2002.11215

Country:

Asia > India > Karnataka > Bengaluru (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Orange County > Irvine (0.04)
(3 more...)

Genre: Research Report > New Finding (0.95)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Health Care Providers & Services > Reimbursement (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Ghimire, Sandesh, Gyawali, Prashnna K, Wang, Linwei

Reliable Estimation of Kullback-Leibler Divergence by Controlling Discriminator Complexity in the Reproducing Kernel Hilbert Space

Several scalable methods to compute the Kullback Leibler (KL) divergence between two distributions using their samples have been proposed and applied in large-scale machine learning models. While they have been found to be unstable, the theoretical root cause of the problem is not clear. In this paper, we study in detail a generative adversarial network based approach that uses a neural network discriminator to estimate KL divergence. We argue that, in such case, high fluctuations in the estimates are a consequence of not controlling the complexity of the discriminator function space. We provide a theoretical underpinning and remedy for this problem through the following contributions. First, we construct a discriminator in the Reproducing Kernel Hilbert Space (RKHS). This enables us to leverage sample complexity and mean embedding to theoretically relate the error probability bound of the KL estimates to the complexity of the neural-net discriminator. Based on this theory, we then present a scalable way to control the complexity of the discriminator for a consistent estimation of KL divergence. We support both our proposed theory and method to control the complexity of the RKHS discriminator in controlled experiments.

complexity, discriminator, kl divergence, (12 more...)

2002.11187

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Strength High (0.54)
Research Report > Experimental Study (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Kirschner, Johannes, Lattimore, Tor, Krause, Andreas

Information Directed Sampling for Linear Partial Monitoring

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.

bandit, information directed sampling, observable game, (12 more...)

2002.11182

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)