AITopics | Optimization

[Explained] Machine Learning Fundamentals: Optimization Problems and How to Solve Them

#artificialintelligence, Nov-14-2019, 16:23:41 GMT

If you start to look into machine learning and the math behind it, you will quickly notice that everything comes down to an optimization problem. Even training a neural network is essentially a search for the optimal parameter configuration of a very high-dimensional function. In this article, we will work through a simple machine learning problem step by step. We will see why and how it always comes down to an optimization problem, which parameters are optimized, and how we compute the optimal value in the end. To start, let's have a look at a simple dataset (x1, x2): If you are lucky, one computer in the dataset has exactly the same age as yours, but that's highly unlikely.
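As a rough illustration of the kind of problem the article describes, the sketch below fits an approximation line to a small dataset by minimizing the sum of squared errors. The age and price values, the names `fit_line` and `squared_error`, and the choice of the closed-form least-squares solution are assumptions made for this example; the article itself may use different data and a different solution method.

```python
# Hypothetical data, invented for illustration: computer age (years) and price.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [910.0, 790.0, 705.0, 590.0, 505.0]


def squared_error(a, b, xs, ys):
    """Sum of squared residuals of the line y = a*x + b (the quantity being minimized)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx            # slope that minimizes the squared error
    b = mean_y - a * mean_x  # intercept that minimizes the squared error
    return a, b


a, b = fit_line(ages, prices)
my_age = 2.5  # an age that does not appear in the dataset
predicted_price = a * my_age + b
print(f"slope={a:.2f}, intercept={b:.2f}, prediction at {my_age} years: {predicted_price:.2f}")
```

The two parameters being optimized here are the slope `a` and the intercept `b`; any other pair gives a squared error at least as large on this data.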

approximation function, approximation line, machine learning, (15 more...)

#artificialintelligence

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.81)

The Similarity-Consensus Regularized Multi-view Learning for Dimension Reduction

Meng, Xiangzhu, Wang, Huibing, Feng, Lin

arXiv.org Machine Learning, Nov-14-2019

During the last decades, learning a low-dimensional space with discriminative information for dimension reduction (DR) has gained a surge of interest. However, these DR methods struggle to achieve satisfactory performance when facing features from multiple views. In multi-view learning problems, one instance can be represented by multiple heterogeneous features, which are highly related but sometimes look different from each other. In addition, correlations between features from multiple views always vary greatly, which challenges the capability of multi-view learning methods. Consequently, constructing a multi-view learning framework with generalization and scalability, which could take advantage of multi-view information as much as possible, is extremely necessary but challenging. To this end, this paper proposes a novel multi-view learning framework based on similarity consensus, which makes full use of correlations among multi-view features while considering the scalability and robustness of the framework. It aims to straightforwardly extend existing DR methods into the multi-view learning domain by preserving the similarity between different views to capture the low-dimensional embedding. Two schemes based on pairwise-consensus and centroid-consensus are separately proposed to force multiple views to learn from each other, and then an iterative alternating strategy is developed to obtain the optimal solution. The proposed method is evaluated on 5 benchmark datasets, and comprehensive experiments show that our proposed multi-view framework can yield promising performance comparable to previous approaches proposed in the recent literature.

dataset, dr method, information, (16 more...)

arXiv.org Machine Learning

1911.07656

Country:

Asia > China > Liaoning Province > Dalian (0.05)
Oceania > Australia > South Australia > Adelaide (0.04)

Genre: Research Report (0.50)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.61)

Solving Inverse Problems by Joint Posterior Maximization with a VAE Prior

González, Mario, Almansa, Andrés, Delbracio, Mauricio, Musé, Pablo, Tan, Pauline

arXiv.org Machine Learning, Nov-14-2019

In this paper we address the problem of solving ill-posed inverse problems in imaging where the prior is a neural generative model. Specifically we consider the decoupled case where the prior is trained once and can be reused for many different log-concave degradation models without retraining. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, our approach computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. The resulting technique is called JPMAP because it performs Joint Posterior Maximization using an Autoencoding Prior. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed it satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima.

accumulation point, approximation, sequence, (15 more...)

arXiv.org Machine Learning

1911.06379

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
South America > Uruguay > Salto > Salto (0.04)
South America > Uruguay > Montevideo > Montevideo (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Election Control in Social Networks via Edge Addition or Removal

Castiglioni, Matteo, Ferraioli, Diodato, Gatti, Nicola

arXiv.org Artificial Intelligence, Nov-14-2019

We focus on the scenario in which messages pro and/or against one or multiple candidates are spread through a social network in order to affect the votes of the receivers. Several results are known in the literature when the manipulator can seed the network by buying influencers. In this paper, instead, we assume the set of influencers and their messages to be given, and we ask whether a manipulator (e.g., the platform) can alter the outcome of the election by adding or removing edges in the social network. We study a wide range of cases, distinguished by the number of candidates or by the kind of messages spread over the network. We provide a positive result, showing that, except for trivial cases, manipulation is not affordable, the optimization problem being hard even if the manipulator has an unlimited budget (i.e., he can add or remove as many edges as desired). Furthermore, we prove that our hardness results still hold in a reoptimization variant, where the manipulator already knows an optimal solution to the problem and needs to compute a new solution once a local modification occurs (e.g., in bandit scenarios where estimations related to random variables change over time).

Introduction. Nowadays, social network media are the most used, if not the only, sources of information. This indisputable fact turned out to influence most of our daily actions, and also to have severe effects on the political life of our countries. Indeed, in many of the recent political elections around the world, there has been evidence that false or incomplete news spread through these media influenced the electoral outcome. For example, in the recent US presidential election, Allcott and Gentzkow (2017) and Guess, Nyhan, and Reifler (2018) show that, on average, 92% of Americans remembered pro-Trump false news, while 23% of them remembered the pro-Clinton fake news. As another example, Ferrara (2017) shows that automated accounts on Twitter spread a considerable amount of political news in order to alter the outcome of the 2017 French elections. In this scenario, a natural question is to what extent the spread of (mis)information on social network media may alter the result of a political election. This topic has recently received large interest in the community: e.g., Auletta et al. (2015; 2017a; 2017b) show that, in the case of only two candidates, a manipulator may be able to lead the minority to become a majority by influencing the order in which voters change their mind.

manipulator, mov, node, (15 more...)

arXiv.org Artificial Intelligence

1911.06198

Country:

Europe > France (0.34)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Services (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > Europe Government > France Government (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.56)

Bayesian Optimization with Uncertain Preferences over Attributes

Astudillo, Raul, Frazier, Peter I.

arXiv.org Machine Learning, Nov-13-2019

We consider black-box global optimization of time-consuming-to-evaluate functions on behalf of a decision-maker whose preferences must be learned. Each feasible design is associated with a time-consuming-to-evaluate vector of attributes, each vector of attributes is assigned a utility by the decision-maker's utility function, and this utility function may be learned approximately using preferences expressed by the decision-maker over pairs of attribute vectors. Past work has used this estimated utility function as if it were error-free within single-objective optimization. However, errors in utility estimation may yield a poor suggested decision. Furthermore, this approach produces a single suggested "best" design, whereas decision-makers often prefer to choose among a menu of designs. We propose a novel Bayesian optimization algorithm that acknowledges the uncertainty in preference estimation and implicitly chooses designs to evaluate using the time-consuming function that are good not just for a single estimated utility function but a range of likely utility functions. Our algorithm then shows a menu of designs and evaluated attributes to the decision-maker who makes a final selection. We demonstrate the value of our algorithm in a variety of numerical experiments.

bayesian optimization, optimization, utility function, (14 more...)

arXiv.org Machine Learning

1911.05934

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Ohio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Banking & Finance (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Recent Advances in Algorithmic High-Dimensional Robust Statistics

Diakonikolas, Ilias, Kane, Daniel M.

arXiv.org Machine Learning, Nov-13-2019

Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust mean estimation under natural distributional assumptions, no efficient algorithm was known. Recent work in theoretical computer science gave the first efficient robust estimators for a number of fundamental statistical tasks, including mean and covariance estimation. Since then, there has been a flurry of research activity on algorithmic high-dimensional robust estimation in a range of settings. In this survey article, we introduce the core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics with a focus on robust mean estimation. We also provide an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks and discuss new directions and opportunities for future work.
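To make the sensitivity to outliers concrete, here is a minimal one-dimensional sketch comparing the sample mean with the sample median on contaminated data. The numbers are invented for illustration, and the median is used only as a familiar robust estimator; this is not one of the high-dimensional algorithms covered by the survey.

```python
import statistics

# Hypothetical inliers clustered around 10, plus one gross outlier.
inliers = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = inliers + [1000.0]

clean_mean = statistics.mean(inliers)
naive_mean = statistics.mean(contaminated)      # dragged far away by a single point
robust_median = statistics.median(contaminated)  # stays near the inliers

print(f"mean without outlier:  {clean_mean:.2f}")
print(f"mean with outlier:     {naive_mean:.2f}")
print(f"median with outlier:   {robust_median:.2f}")
```

A single corrupted sample is enough to move the mean arbitrarily far, while the median remains close to the bulk of the data.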

algorithm, estimation, mean estimation, (15 more...)

arXiv.org Machine Learning

1911.05911

Country:

North America > United States > New York (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Convergence to minima for the continuous version of Backtracking Gradient Descent

Truong, Tuyen Trung

arXiv.org Machine Learning, Nov-13-2019

The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha <1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following property: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -\alpha h(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $ \lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if it converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point. Some other results are proven.
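For readers unfamiliar with the descent condition in (i), the sketch below shows the standard discrete backtracking (Armijo) line search, which shrinks a trial step size until $f(x-\delta\nabla f(x))-f(x)\leq -\alpha\delta||\nabla f(x)||^2$ holds. It is not the smooth step-size function $h$ constructed in the paper; the shrink factor `beta`, the test function, and all function names are assumptions chosen for illustration.

```python
def backtracking_step(f, grad, x, delta0=1.0, alpha=0.5, beta=0.5):
    """One gradient step whose size delta satisfies the Armijo condition
    f(x - delta*g) - f(x) <= -alpha * delta * ||g||^2, starting from delta0."""
    g = grad(x)
    g_norm_sq = sum(gi * gi for gi in g)
    fx = f(x)
    delta = delta0
    candidate = [xi - delta * gi for xi, gi in zip(x, g)]
    # Shrink the step until the sufficient-decrease condition is met.
    while f(candidate) - fx > -alpha * delta * g_norm_sq:
        delta *= beta
        candidate = [xi - delta * gi for xi, gi in zip(x, g)]
    return candidate, delta


def f(x):
    """Hypothetical test function: an ill-conditioned quadratic with minimum at the origin."""
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad_f(x):
    return [2.0 * x[0], 20.0 * x[1]]


x = [3.0, 1.0]
values = [f(x)]
for _ in range(200):
    x, step = backtracking_step(f, grad_f, x)
    values.append(f(x))

print(f"final point: {x}, final value: {values[-1]:.3e}")
```

Because every accepted step satisfies the sufficient-decrease condition, the recorded function values never increase from one iteration to the next.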

critical point, generalised saddle point, saddle point, (15 more...)

arXiv.org Machine Learning

1911.04221

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(4 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning

Ma, Qiang, Ge, Suwen, He, Danyang, Thaker, Darshan, Drori, Iddo

arXiv.org Machine Learning, Nov-12-2019

In this work, we introduce Graph Pointer Networks (GPNs) trained using reinforcement learning (RL) for tackling the traveling salesman problem (TSP). GPNs build upon Pointer Networks by introducing a graph embedding layer on the input, which captures relationships between nodes. Furthermore, to approximate solutions to constrained combinatorial optimization problems such as the TSP with time windows, we train hierarchical GPNs (HGPNs) using RL, which learns a hierarchical policy to find an optimal city permutation under constraints. Each layer of the hierarchy is designed with a separate reward function, resulting in stable training. Our results demonstrate that GPNs trained on small-scale TSP50/100 problems generalize well to larger-scale TSP500/1000 problems, with shorter tour lengths and faster computational times. We verify that for constrained TSP problems such as the TSP with time windows, the feasible solutions found via hierarchical RL training outperform previous baselines. In the spirit of reproducible research we make our data, models, and code publicly available.

combinatorial optimization problem, reward function, tsp, (15 more...)

arXiv.org Machine Learning

1911.04936

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Fast Approximate Time-Delay Estimation in Ultrasound Elastography Using Principal Component Analysis

Zayed, Abdelrahman, Rivaz, Hassan

arXiv.org Machine Learning, Nov-12-2019

Time delay estimation (TDE) is a critical and challenging step in all ultrasound elastography methods. A growing number of TDE techniques require an approximate but robust and fast method to initialize solving for TDE. Herein, we present a fast method for calculating an approximate TDE between two radio frequency (RF) frames of ultrasound. Although this approximate TDE can be useful for several algorithms, we focus on GLobal Ultrasound Elastography (GLUE), which currently relies on Dynamic Programming (DP) to provide this approximate TDE. We exploit Principal Component Analysis (PCA) to find the general modes of deformation in quasi-static elastography, and therefore call our method PCA-GLUE. PCA-GLUE is a data-driven approach that learns a set of TDE principal components from a training database in real experiments. In the test phase, TDE is approximated as a weighted sum of these principal components. Our algorithm robustly estimates the weights from sparse feature matches, then passes the resulting displacement field to GLUE as initial estimates to perform a more accurate displacement estimation. PCA-GLUE is more than ten times faster than DP in estimation of the initial displacement field and yields similar results.
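The idea of approximating a displacement field as a weighted sum of principal components, with weights estimated from sparse matches, can be sketched as follows. This is a toy one-dimensional version with two hand-made components and plain (non-robust) least squares; the component values, the function names, and the omission of a mean field are all assumptions for illustration and do not reproduce the PCA-GLUE implementation.

```python
def estimate_two_weights(comp1, comp2, sample_idx, observed):
    """Least-squares weights (w1, w2) such that w1*comp1 + w2*comp2 matches the
    observed displacements at the sampled positions (2x2 normal equations)."""
    a11 = sum(comp1[i] * comp1[i] for i in sample_idx)
    a12 = sum(comp1[i] * comp2[i] for i in sample_idx)
    a22 = sum(comp2[i] * comp2[i] for i in sample_idx)
    r1 = sum(comp1[i] * d for i, d in zip(sample_idx, observed))
    r2 = sum(comp2[i] * d for i, d in zip(sample_idx, observed))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("sampled positions do not determine the weights")
    w1 = (r1 * a22 - a12 * r2) / det
    w2 = (a11 * r2 - a12 * r1) / det
    return w1, w2


def reconstruct(comp1, comp2, w1, w2):
    """Dense displacement estimate as the weighted sum of the two components."""
    return [w1 * c1 + w2 * c2 for c1, c2 in zip(comp1, comp2)]


# Hypothetical components over six positions: a linear stretch and a rigid shift.
stretch = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
shift = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Sparse "feature matches": displacements observed at only three positions.
sample_idx = [0, 3, 5]
observed = [2.0, 3.5, 4.5]

w1, w2 = estimate_two_weights(stretch, shift, sample_idx, observed)
initial_field = reconstruct(stretch, shift, w1, w2)
print(f"weights: {w1:.3f}, {w2:.3f}")
print(f"dense initial displacement estimate: {initial_field}")
```

In the setting described in the abstract, such a dense estimate would then be handed to a refinement stage as its initial displacement field.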

displacement, elastography, principal component, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/embc.2019.8857242

1911.05242

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Efficient Fair Principal Component Analysis

Kamani, Mohammad Mahdi, Haddadpour, Farzin, Forsati, Rana, Mahdavi, Mehrdad

arXiv.org Machine Learning, Nov-12-2019

The growing body of work assessing fairness in machine learning algorithms has shown that dimension reduction methods such as PCA treat data from different sensitive groups unfairly. In particular, when data from different groups is aggregated, the reconstruction error of the learned subspace becomes biased towards some populations, which may inherently hurt or benefit those groups and leads to an unfair representation. On the other hand, alleviating the bias to protect sensitive groups when learning the optimal projection leads to a higher overall reconstruction error. This introduces a trade-off between sensitive groups' sacrifices and benefits, and the overall reconstruction error. In this paper, in pursuit of achieving fairness criteria in PCA, we introduce a more efficient notion of Pareto fairness, cast the Pareto fair dimensionality reduction as a multi-objective optimization problem, and propose an adaptive gradient-based algorithm to solve it. Using the notion of Pareto optimality, we can guarantee that the solution of our proposed algorithm belongs to the Pareto frontier for all groups, which achieves the optimal trade-off between those aforementioned conflicting objectives. This framework can be efficiently generalized to multiple group sensitive features as well. We provide convergence analysis of our algorithm for both convex and non-convex objectives and show its efficacy through empirical studies on different datasets, in comparison with the state-of-the-art algorithm.

disparity error, objective, subspace, (16 more...)

arXiv.org Machine Learning

1911.04931

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.70)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.40)
