AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

CoNES: Convex Natural Evolutionary Strategies

arXiv.org Machine LearningJul-20-2020

We present a novel algorithm -- convex natural evolutionary strategies (CoNES) -- for optimizing high-dimensional blackbox functions by leveraging tools from convex optimization and information geometry. CoNES is formulated as an efficiently-solvable convex program that adapts the evolutionary strategies (ES) gradient estimate to promote rapid convergence. The resulting algorithm is invariant to the parameterization of the belief distribution. Our numerical results demonstrate that CoNES vastly outperforms conventional blackbox optimization methods on a suite of functions used for benchmarking blackbox optimizers. Furthermore, CoNES demonstrates the ability to converge faster than conventional blackbox methods on a selection of OpenAI's MuJoCo reinforcement learning tasks for locomotion.

cones, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Machine Learning

2007.08601

Country:

North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

$p$-Norm Flow Diffusion for Local Graph Clustering

Fountoulakis, Kimon, Wang, Di, Yang, Shenghao

arXiv.org Machine LearningJul-20-2020

Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, nodes ranking and feature inference. Prior work on local graph clustering mostly falls into two categories with numerical and combinatorial roots respectively. In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with p-norm network flow for $p\in (1,\infty)$. In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around input seed set. In particular, we achieve quadratic approximation of conductance in the case of $p=2$ similar to the Cheeger-type bounds of spectral methods, constant factor approximation when $p\rightarrow\infty$ similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness. We show that the proposed problem can be solved in strongly local running time for $p\ge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate our approach compares favorably with existing methods.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

2005.0981

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

A Novel Column Generation Heuristic for Airline Crew Pairing Optimization with Large-scale Complex Flight Networks

Aggarwal, Divyam, Saxena, Dhish Kumar, Bäck, Thomas, Emmerich, Michael

arXiv.org Artificial IntelligenceJul-19-2020

For an airline, the crew operating cost is second only to the fuel cost, making the crew pairing optimization (CPO) critical for business viability. Its aim is to generate a set of flight sequences (crew pairings) that cover all flights in an airline's schedule, at minimum cost, while satisfying several legality constraints. Being an NP-hard combinatorial optimization problem, CPO is tackled by relaxing the underlying Integer Programming Problem into a Linear Programming Problem, and solving the latter through Column generation (CG) technique. However, with the expansion of airlines' operations lately, the curse of dimensionality renders the exact CG-implementations obsolete, paving the way for heuristic-based CG-implementations. Yet, the much prevalent large-scale complex flight networks involving multiple-crew bases and hub-and-spoke sub-networks, largely remain unaddressed. To bridge the research gap, this paper proposes a novel CG heuristic, which has enabled in-house development of an Airline Crew Pairing Optimizer (AirCROP). The efficacy of the heuristic/AirCROP has been: (a) tested on real-world airline data with an unprecedented conjunct scale-and-complexity, marked by over 4200 flights, 15 crew bases, and over a billion pairings, and (b) validated by the research consortium's industrial sponsor. This paper has a dedicated focus on the proposed CG heuristic which constitutes the core search mechanism of the optimizer, by balancing random exploration (of pairings' space), exploitation of domain knowledge (on optimal solution's features), and utilization of the past computational effort through archiving. Though this paper has an airline context, the underlying propositions may find applications across different domains as the proposed CG heuristic can serve as a template on how to utilize domain knowledge to better tackle large-scale combinatorial optimization problems.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

2005.08636

Country:

Asia > India > Uttarakhand > Roorkee (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Consumer Products & Services > Travel (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Flexible Optimization Framework for Regularized Matrix-Tensor Factorizations with Linear Couplings

Schenker, Carla, Cohen, Jeremy E., Acar, Evrim

arXiv.org Machine LearningJul-19-2020

In many areas of science, various sensing technologies are used to obtain information about a single system of interest. Often, none of the datasets alone contains a complete view of the system, but the data measured from different modalities can complement each other. For instance, brain activity patterns can be captured using both electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) signals, which have complementary temporal and spatial resolutions. Similarly, in metabolomics, multiple analytical techniques such as LCMS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) spectroscopy are used to measure chemical compounds in biological samples, providing a more complete picture of underlying biological processes. Joint analysis of datasets from multiple sources, also referred to as data fusion (or multi-modal data mining), exploits these complementary measurements, and allows for better interpretability and, potentially, more accurate recovery of patterns characterizing the underlying phenomena. Nevertheless, data fusion poses many challenges, and there is an emerging need for data fusion methods that can take into account different characteristics of data from multiple sources in many disciplines [1-4]. Data from multiple sources can often be represented in the form of matrices and higher-order tensors. Coupled matrix and tensor factorizations (CMTF) are an effective approach for joint analysis of such datasets in many domains including social network analysis [5-8], neuroscience [9-13], and chemometrics [2, 14]. In such coupled factorizations, each dataset is modelled by a low-rank approximation.

constraint, factorization, matrix, (15 more...)

arXiv.org Machine Learning

2007.09605

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.86)
Health & Medicine > Diagnostic Medicine > Imaging (0.86)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Learning

Ramírez, Guillem, Dangovski, Rumen, Nakov, Preslav, Soljačić, Marin

arXiv.org Machine LearningJul-18-2020

The emergence of unsupervised word embeddings, pre-trained on very large monolingual text corpora, is at the core of the ongoing neural revolution in Natural Language Processing (NLP). Initially introduced for English, such pre-trained word embeddings quickly emerged for a number of other languages. Subsequently, there have been a number of attempts to align the embedding spaces across languages, which could enable a number of cross-language NLP applications. Performing the alignment using unsupervised cross-lingual learning (UCL) is especially attractive as it requires little data and often rivals supervised and semi-supervised approaches. Here, we analyze popular methods for UCL and we find that often their objectives are, intrinsically, versions of the Wasserstein-Procrustes problem. Hence, we devise an approach to solve Wasserstein-Procrustes in a direct way, which can be used to refine and to improve popular UCL methods such as iterative closest point (ICP), multilingual unsupervised and supervised embeddings (MUSE) and supervised Procrustes methods. Our evaluation experiments on standard datasets show sizable improvements over these approaches. We believe that our rethinking of the Wasserstein-Procrustes problem could enable further research, thus helping to develop better algorithms for aligning word embeddings across languages. Our code and instructions to reproduce the experiments are available at https://github.com/guillemram97/wp-hungarian.

algorithm, artificial intelligence, natural language, (16 more...)

arXiv.org Machine Learning

2007.09456

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(19 more...)

Genre: Research Report (0.64)

Industry:

Government > Military (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Distributed Learning via Filtered Hyperinterpolation on Manifolds

Montúfar, Guido, Wang, Yu Guang

arXiv.org Machine LearningJul-18-2020

Learning mappings of data on manifolds is an important topic in contemporary machine learning, with applications in astrophysics, geophysics, statistical physics, medical diagnosis, biochemistry, 3D object analysis. This paper studies the problem of learning real-valued functions on manifolds through filtered hyperinterpolation of input-output data pairs where the inputs may be sampled deterministically or at random and the outputs may be clean or noisy. Motivated by the problem of handling large data sets, it presents a parallel data processing approach which distributes the data-fitting task among multiple servers and synthesizes the fitted sub-models into a global estimator. We prove quantitative relations between the approximation quality of the learned function over the entire manifold, the type of target function, the number of servers, and the number and type of available samples. We obtain the approximation rates of convergence for distributed and non-distributed approaches. For the non-distributed case, the approximation order is optimal.

artificial intelligence, hyperinterpolation, machine learning, (19 more...)

arXiv.org Machine Learning

2007.09392

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.45)

Add feedback

From Symmetry to Geometry: Tractable Nonconvex Problems

Zhang, Yuqian, Qu, Qing, Wright, John

arXiv.org Machine LearningJul-17-2020

As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from the signal and data acquisition to modeling and prediction. The optimization problems encountered in practice are often nonconvex. While challenges vary from problem to problem, one common source of nonconvexity is nonlinearity in the data or measurement model. Nonlinear models often exhibit symmetries, creating complicated, nonconvex objective landscapes, with multiple equivalent solutions. Nevertheless, simple methods (e.g., gradient descent) often perform surprisingly well in practice. The goal of this survey is to highlight a class of tractable nonconvex problems, which can be understood through the lens of symmetries. These problems exhibit a characteristic geometric structure: local minimizers are symmetric copies of a single ``ground truth'' solution, while other critical points occur at balanced superpositions of symmetric copies of the ground truth, and exhibit negative curvature in directions that break the symmetry. This structure enables efficient methods to obtain global minimizers. We discuss examples of this phenomenon arising from a wide range of problems in imaging, signal processing, and data analysis. We highlight the key role of symmetry in shaping the objective landscape and discuss the different roles of rotational and discrete symmetries. This area is rich with observed phenomena and open problems; we close by highlighting directions for future research.

curvature, minimizer, symmetry, (14 more...)

arXiv.org Machine Learning

2007.06753

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > California (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Cumulant GAN

Pantazis, Yannis, Paul, Dipjyoti, Fasoulakis, Michail, Stylianou, Yannis, Katsoulakis, Markos

arXiv.org Machine LearningJul-17-2020

In this paper, we propose a novel loss function for training Generative Adversarial Networks (GANs) aiming towards deeper theoretical understanding as well as improved performance for the underlying optimization problem. The new loss function is based on cumulant generating functions giving rise to \emph{Cumulant GAN}. Relying on a recently-derived variational formula, we show that the corresponding optimization problem is equivalent to R{\'e}nyi divergence minimization, thus offering a (partially) unified perspective of GAN losses: the R{\'e}nyi family encompasses Kullback-Leibler divergence (KLD), reverse KLD, Hellinger distance and $\chi^2$-divergence. Wasserstein GAN is also a member of the proposed cumulant GAN. In terms of stability, we rigorously prove the exponential convergence of cumulant GAN to the Nash equilibrium for a linear discriminator, Gaussian distributions and the standard gradient descent algorithm. Finally, we experimentally demonstrate that image generation is generally more robust relative to Wasserstein GAN and it is substantially improved in terms of inception score when weaker discriminators are considered.

artificial intelligence, discriminator, machine learning, (16 more...)

arXiv.org Machine Learning

2006.06625

Country:

Europe > Greece (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

One Size Fits All: Can We Train One Denoiser for All Noise Levels?

Gnansambandam, Abhiram, Chan, Stanley H.

arXiv.org Machine LearningJul-16-2020

When training an estimator such as a neural network for tasks like image denoising, it is often preferred to train one estimator and apply it to all noise levels. The de facto training protocol to achieve this goal is to train the estimator with noisy samples whose noise levels are uniformly distributed across the range of interest. However, why should we allocate the samples uniformly? Can we have more training samples that are less noisy, and fewer samples that are more noisy? What is the optimal distribution? How do we obtain such a distribution? The goal of this paper is to address this training sample distribution problem from a minimax risk optimization perspective. We derive a dual ascent algorithm to determine the optimal sampling distribution of which the convergence is guaranteed as long as the set of admissible estimators is closed and convex. For estimators with non-convex admissible sets such as deep neural networks, our dual formulation converges to a solution of the convex relaxation. We discuss how the algorithm can be implemented in practice. We evaluate the algorithm on linear estimators and deep networks.

artificial intelligence, machine learning, noise level, (18 more...)

arXiv.org Machine Learning

2005.09627

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Comparator-adaptive Convex Bandits

van der Hoeven, Dirk, Cutkosky, Ashok, Luo, Haipeng

arXiv.org Machine LearningJul-16-2020

We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for linear bandits. Then, we extend the ideas to convex bandits with Lipschitz or smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2007.08448

Country:

North America > United States > California (0.14)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback