AITopics

2111.01235

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Broward County (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)

Deshwal, Aryan, Doppa, Janardhan Rao

Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces

arXiv.org Artificial Intelligence Nov-1-2021

We consider the problem of optimizing combinatorial spaces (e.g., sequences, trees, and graphs) using expensive black-box function evaluations. One example is optimizing molecules for drug design using physical lab experiments. Bayesian optimization (BO) is an efficient framework for solving such problems by intelligently selecting the inputs with high utility guided by a learned surrogate model. A recent BO approach for combinatorial spaces is through a reduction to BO over continuous spaces by learning a latent representation of structures using deep generative models (DGMs). The selected input from the continuous space is decoded into a discrete structure for performing function evaluation. However, the surrogate model over the latent space only uses the information learned by the DGM, which may not have the desired inductive bias to approximate the target black-box function. To overcome this drawback, this paper proposes a principled approach referred to as LADDER. The key idea is to define a novel structure-coupled kernel that explicitly integrates the structural information from decoded structures with the learned latent space representation for better surrogate modeling. Our experiments on real-world benchmarks show that LADDER significantly improves over the BO over latent space method, and performs better than or similar to state-of-the-art methods.

kernel, latent space, optimization, (11 more...)

2111.01186

Country:

North America > United States > Washington (0.04)
Europe > France (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Air (0.55)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)

Farokhi, Farhad, Leong, Alex S., Zamani, Mohammad, Shames, Iman

Safe Learning of Linear Time-Invariant Systems

arXiv.org Machine Learning Oct-31-2021

We consider safety in simultaneous learning and control of discrete-time linear time-invariant systems. We provide rigorous confidence bounds on the learned model of the system based on the number of utilized state measurements. These bounds are used to modify control inputs to the system via an optimization problem with potentially time-varying safety constraints. We prove that the state can only exit the safe set with small probability, provided a feasible solution to the safety-constrained optimization exists. This optimization problem is then reformulated in a more computationally-friendly format by tightening the safety constraints to account for model uncertainty during learning. The tightening decreases as the confidence in the learned model improves. We finally prove that, under persistence of excitation, the tightening becomes negligible as more measurements are gathered.

process noise, rk 1, safety, (15 more...)

2111.00631

Country:

Asia > South Korea (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.75)

arXiv.org Artificial Intelligence Oct-31-2021

Fast Global Convergence of Policy Optimization for Constrained MDPs

Liu, Tao, Zhou, Ruida, Kalathil, Dileep, Kumar, P. R., Tian, Chao

We address the issue of safety in reinforcement learning. We pose the problem in a discounted infinite-horizon constrained Markov decision process framework. Existing results have shown that gradient-based methods are able to achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate both for the optimality gap and the constraint violation. We exhibit a natural policy gradient-based algorithm that has a faster convergence rate $\mathcal{O}(\log(T)/T)$ for both the optimality gap and the constraint violation. When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$ while maintaining the same convergence rate.

algorithm, constraint violation, convergence rate, (12 more...)

2111.00552

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Cuesta-Ramirez, Jhouben, Riche, Rodolphe Le, Roustant, Olivier, Perrin, Guillaume, Durantin, Cedric, Gliere, Alain

A comparison of mixed-variables Bayesian optimization approaches

arXiv.org Machine Learning Oct-30-2021

Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation. General mixed and costly optimization problems are therefore of great practical interest, yet their resolution remains in large part an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily harvested by classical Bayesian optimization techniques than a mixed space would be. Discrete variables are recovered either after the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, particular attention is devoted to the Lagrange multipliers, for which local and global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.

algorithm, latent variable, optimization, (14 more...)

2111.01533

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > New York (0.04)
Europe > Slovakia > Košice > Košice (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Robey, Alexander, Chamon, Luiz F. O., Pappas, George J., Hassani, Hamed, Ribeiro, Alejandro

Adversarial Robustness with Semi-Infinite Constrained Learning

arXiv.org Machine Learning Oct-29-2021

Despite strong performance in numerous applications, the fragility of deep learning to input perturbations has raised serious questions about its use in safety-critical domains. While adversarial training can mitigate this issue in practice, state-of-the-art methods are increasingly application-dependent, heuristic in nature, and suffer from fundamental trade-offs between nominal performance and robustness. Moreover, the problem of finding worst-case perturbations is non-convex and underparameterized, both of which engender a non-favorable optimization landscape. Thus, there is a gap between the theory and practice of adversarial training, particularly with respect to when and why adversarial training works. In this paper, we take a constrained learning approach to address these questions and to provide a theoretical foundation for robust learning. In particular, we leverage semi-infinite optimization and non-convex duality theory to show that adversarial training is equivalent to a statistical problem over perturbation distributions, which we characterize completely. Notably, we show that a myriad of previous robust training techniques can be recovered for particular, sub-optimal choices of these distributions. Using these insights, we then propose a hybrid Langevin Monte Carlo approach of which several common algorithms (e.g., PGD) are special cases. Finally, we show that our approach can mitigate the trade-off between nominal and robust performance, yielding state-of-the-art results on MNIST and CIFAR-10. Our code is available at: https://github.com/arobey1/advbench.

arxiv preprint arxiv, classifier, perturbation, (14 more...)

2110.15767

Country:

North America > United States > Pennsylvania (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.93)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

García-Ortegón, Miguel, Simm, Gregor N. C., Tripp, Austin J., Hernández-Lobato, José Miguel, Bender, Andreas, Bacallado, Sergio

DOCKSTRING: easy molecular docking yields better benchmarks for ligand design

arXiv.org Machine Learning Oct-28-2021

The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate's interaction with the target. By contrast, molecular docking is a widely successful method in drug discovery to estimate binding affinities. However, docking simulations require a significant amount of domain knowledge to set up correctly which hampers adoption. To this end, we present DOCKSTRING, a bundle for meaningful and robust comparison of ML models consisting of three components: (1) an open-source Python package for straightforward computation of docking scores; (2) an extensive dataset of docking scores and poses of more than 260K ligands for 58 medically-relevant targets; and (3) a set of pharmaceutically-relevant benchmark tasks including regression, virtual screening, and de novo design. The Python package implements a robust ligand and target preparation protocol that allows non-experts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more appropriate evaluation objective than simple physicochemical properties, yielding more realistic benchmark tasks and molecular candidates.

algorithm, dataset, molecule, (14 more...)

2110.15486

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

arXiv.org Artificial Intelligence Oct-28-2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM

Holmes, Connor, Zhang, Minjia, He, Yuxiong, Wu, Bo

Natural Language Processing (NLP) has recently achieved success by using huge pre-trained Transformer networks. However, these models often contain hundreds of millions or even billions of parameters, bringing challenges to online deployment due to latency constraints. Recently, hardware manufacturers have introduced dedicated hardware for NxM sparsity to provide the flexibility of unstructured pruning with the runtime efficiency of structured approaches. NxM sparsity permits arbitrarily selecting M parameters to retain from a contiguous group of N in the dense representation. However, due to the extremely high complexity of pre-trained models, the standard sparse fine-tuning techniques often fail to generalize well on downstream tasks, which have limited data resources. To address such an issue in a principled manner, we introduce a new learning framework, called NxMTransformer, to induce NxM semi-structured sparsity on pretrained language models for natural language understanding to obtain better performance. In particular, we propose to formulate the NxM sparsity as a constrained optimization problem and use Alternating Direction Method of Multipliers (ADMM) to optimize the downstream tasks while taking the underlying hardware constraints into consideration. ADMM decomposes the NxM sparsification problem into two sub-problems that can be solved sequentially, generating sparsified Transformer networks that achieve high accuracy while being able to effectively execute on newly released hardware. We apply our approach to a wide range of NLP tasks, and our proposed method is able to achieve 1.7 points higher accuracy in GLUE score than current practices. Moreover, we perform detailed analysis on our approach and shed light on how ADMM affects fine-tuning accuracy for downstream tasks. Finally, we illustrate how NxMTransformer achieves performance improvement with knowledge distillation.
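The NxM constraint described in the abstract can be illustrated with a small sketch. It follows the abstract's convention (retain M parameters from each contiguous group of N) and uses magnitude-based selection, which is one plausible choice for a projection step in an ADMM-style scheme and is an assumption here, not something the abstract specifies. The function name `project_nxm` is hypothetical.

```python
from typing import List


def project_nxm(weights: List[float], n: int, m: int) -> List[float]:
    """Keep the m largest-magnitude entries in each contiguous group of n.

    All other entries in the group are set to zero. This is an illustrative
    magnitude-based projection (an assumption), not the paper's confirmed method.
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    if len(weights) % n != 0:
        raise ValueError("len(weights) must be a multiple of n")

    out: List[float] = []
    for start in range(0, len(weights), n):
        group = weights[start:start + n]
        # Positions of the m largest absolute values within this group.
        ranked = sorted(range(n), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:m])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out


if __name__ == "__main__":
    dense = [0.1, -0.9, 0.5, 0.2, 1.0, 0.0, -0.3, 0.4]
    # Retain 2 weights out of every contiguous group of 4.
    print(project_nxm(dense, n=4, m=2))
```

In a full ADMM loop a projection like this would alternate with a fine-tuning step on the dense weights; that loop is omitted here because the abstract does not give its details.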

2110.15766

Genre: Research Report (0.40)

Industry: Information Technology > Hardware (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Understanding (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)

Choukroun, Yoni, Katz, Michael

Meta Subspace Optimization

arXiv.org Artificial Intelligence Oct-28-2021

Subspace optimization methods have the attractive property of reducing large-scale optimization problems to a sequence of low-dimensional subspace optimization problems. However, existing subspace optimization frameworks adopt a fixed update policy of the subspace and therefore appear to be sub-optimal. In this paper we propose a new Meta Subspace Optimization (MSO) framework for large-scale optimization problems, which allows the subspace matrix to be determined at each optimization iteration. In order to remain invariant to the optimization problem's dimension, we design an efficient meta optimizer based on very low-dimensional subspace optimization coefficients, inducing a rule-based agent that can significantly improve performance. Finally, we design and analyze a reinforcement learning procedure based on the subspace optimization dynamics whose learnt policies outperform existing subspace optimization methods.

meta optimizer, optimization, optimizer, (15 more...)

2110.1492

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Vanchurin, Vitaly, Wolf, Yuri I., Katsnelson, Mikhail I., Koonin, Eugene V.

Towards a Theory of Evolution as Multilevel Learning

arXiv.org Artificial Intelligence Oct-27-2021

We formulate seven fundamental principles of evolution that appear to be necessary and sufficient to render a universe observable and show that they entail the major features of biological evolution, including replication and natural selection. These principles also follow naturally from the theory of learning. We formulate the theory of evolution using the mathematical framework of neural networks, which provides for detailed analysis of evolutionary phenomena. To demonstrate the potential of the proposed theoretical framework, we derive a generalized version of the Central Dogma of molecular biology by analyzing the flow of information during learning (back-propagation) and predicting (forward-propagation) the environment by evolving organisms. The more complex evolutionary phenomena, such as major transitions in evolution, in particular, the origin of life, have to be analyzed in the thermodynamic limit, which is described in detail in the accompanying paper.

Significance statement: Modern evolutionary theory gives a detailed quantitative description of microevolutionary processes that occur within evolving populations of organisms, but evolutionary transitions and emergence of multiple levels of complexity remain poorly understood. Here we establish correspondence between the key features of evolution, renormalizability of physical theories and learning dynamics, to outline a theory of evolution that strives to incorporate all evolutionary processes within a unified mathematical framework of the theory of learning. Under this theory, for example, natural selection readily arises from the learning dynamics, and in sufficiently complex systems, the same learning phenomena occur on multiple levels or on different scales, similar to the case of renormalizable physical theories.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

doi: 10.1073/pnas.2120037119

2110.14602

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)