AITopics

The random forest algorithm (RF) has several hyperparameters that have to be set by the user, e.g., the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures, also considering interactions between hyperparameters. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after a brief overview of tuning strategies we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and RF with default hyperparameters.

artificial intelligence, machine learning, optimization problem, (20 more...)

1804.03515

Country: Europe > Germany (0.28)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Education (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.65)

Alabi, Daniel, Immorlica, Nicole, Kalai, Adam Tauman

When optimizing nonlinear objectives is no harder than linear objectives

However, many objectives of practical interest are more complex than simply average loss. Examples include balancing performance or loss with fairness across people, as well as balancing precision and recall. We prove that, from a computational perspective, fairly general families of complex objectives are not significantly harder to optimize than standard averages, by providing polynomial-time reductions, i.e., algorithms that optimize complex objectives using linear optimizers. The families of objectives included are arbitrary continuous functions of average group performances and also convex objectives. We illustrate with applications to fair machine learning, fair optimization and F1-scores.

algorithm, artificial intelligence, machine learning, (16 more...)

1804.04503

Country:

North America > United States (1.00)
North America > Canada (0.67)
Europe (0.67)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Hashimoto, Tatsunori B., Yadlowsky, Steve, Duchi, John C.

Derivative free optimization via repeated classification

We develop an algorithm for minimizing a function using $n$ batched function value measurements at each of $T$ rounds by using classifiers to identify a function's sublevel set. We show that sufficiently accurate classifiers can achieve linear convergence rates, and show that the convergence rate is tied to the difficulty of active learning sublevel sets. Further, we show that the bootstrap is a computationally efficient approximation to the necessary classification scheme. The end result is a computationally efficient derivative-free algorithm requiring no tuning that consistently outperforms other approaches on simulations, standard benchmarks, real-world DNA binding optimization, and airfoil design problems whenever batched function queries are natural.

artificial intelligence, classifier, machine learning, (19 more...)

1804.03761

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Fedorov, Igor, Rao, Bhaskar D.

Multimodal Sparse Bayesian Dictionary Learning

The purpose of this paper is to address the problem of learning dictionaries for multimodal datasets, i.e. datasets collected from multiple data sources. We present an algorithm called multimodal sparse Bayesian dictionary learning (MSBDL). The MSBDL algorithm is able to leverage information from all available data modalities through a joint sparsity constraint on each modality's sparse codes without restricting the coefficients themselves to be equal. Our framework offers a considerable amount of flexibility to practitioners and addresses many of the shortcomings of existing multimodal dictionary learning approaches. Unlike existing approaches, MSBDL allows the dictionaries for each data modality to have different cardinality. In addition, MSBDL can be used in numerous scenarios, from small datasets to extensive datasets with large dimensionality. MSBDL can also be used in supervised settings and allows for learning multimodal dictionaries concurrently with classifiers for each modality.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1804.0374

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Gidel, Gauthier, Pedregosa, Fabian, Lacoste-Julien, Simon

Frank-Wolfe Splitting via Augmented Lagrangian Method

arXiv.org Machine LearningApr-9-2018

Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each sets, and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of Frank-Wolfe and the alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate over polytopes.

algorithm, artificial intelligence, machine learning, (17 more...)

1804.03176

Country:

Europe (0.45)
North America (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cremer, Jochen L., Konstantelos, Ioannis, Tindemans, Simon H., Strbac, Goran

Sample-Derived Disjunctive Rules for Secure Power System Operation

arXiv.org Machine LearningApr-9-2018

Machine learning techniques have been used in the past using Monte Carlo samples to construct predictors of the dynamic stability of power systems. In this paper we move beyond the task of prediction and propose a comprehensive approach to use predictors, such as Decision Trees (DT), within a standard optimization framework for pre- and post-fault control purposes. In particular, we present a generalizable method for embedding rules derived from DTs in an operation decision-making model. We begin by pointing out the specific challenges entailed when moving from a prediction to a control framework. We proceed with introducing the solution strategy based on generalized disjunctive programming (GDP) as well as a two-step search method for identifying optimal hyper-parameters for balancing cost and control accuracy. We showcase how the proposed approach constructs security proxies that cover multiple contingencies while facing high-dimensional uncertainty with respect to operating conditions with the use of a case study on the IEEE 39-bus system. The method is shown to achieve efficient system control at a marginal increase in system price compared to an oracle model.

artificial intelligence, machine learning, operating point, (16 more...)

1804.02948

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry:

Energy > Power Industry (1.00)
Machinery > Industrial Machinery (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligenceApr-8-2018, 06:27:29 GMT

Inventory Optimization Solution in the Azure AI Gallery

This post is co-authored by Dmitry Pechyoni, Senior Data Scientist, Hong Lu and Chenhui Hu, Data Scientists, Praneet Solanki, Software Engineer, and Ilan Reiter, Principal Data Scientist Manager at Microsoft. For retailers, inventory optimization is a critical task to facilitate production planning, cost reduction, and operation management. Tremendous business value can be derived by optimizing the inventories. For example, optimizing product ordering strategy can help minimize inventory cost and reduce stockout events. This also improves customer satisfaction levels.

azure ai gallery, inventory management policy, inventory optimization solution, (7 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.52)

arXiv.org Machine LearningApr-8-2018

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

Koanantakool, Penporn, Ali, Alnur, Azad, Ariful, Buluc, Aydin, Morozov, Dmitriy, Oliker, Leonid, Yelick, Katherine, Oh, Sang-Yun

Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ~819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.

artificial intelligence, machine learning, optimization problem, (19 more...)

1710.10769

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.66)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)

Herlands, William, McFowland, Edward III, Wilson, Andrew Gordon, Neill, Daniel B.

Gaussian Process Subset Scanning for Anomalous Pattern Detection in Non-iid Data

arXiv.org Machine LearningApr-4-2018

Identifying anomalous patterns in real-world data is essential for understanding where, when, and how systems deviate from their expected dynamics. Yet methods that separately consider the anomalousness of each individual data point have low detection power for subtle, emerging irregularities. Additionally, recent detection techniques based on subset scanning make strong independence assumptions and suffer degraded performance in correlated data. We introduce methods for identifying anomalous patterns in non-iid data by combining Gaussian processes with novel log-likelihood ratio statistic and subset scanning techniques. Our approaches are powerful, interpretable, and can integrate information across multiple data streams. We illustrate their performance on numeric simulations and three open source spatiotemporal datasets of opioid overdose deaths, 311 calls, and storm reports.

artificial intelligence, data mining, machine learning, (17 more...)

1804.01466

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New York > Bronx County > New York City (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.49)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
(2 more...)

Bassetti, Federico, Gualandi, Stefano, Veneroni, Marco

On the Computation of Kantorovich-Wasserstein Distances between 2D-Histograms by Uncapacitated Minimum Cost Flows

arXiv.org Machine LearningApr-2-2018

In this work, we present a method to compute the Kantorovich distance, that is, the Wasserstein distance of order one, between a pair of two-dimensional histograms. Recent works in Computer Vision and Machine Learning have shown the benefits of measuring Wasserstein distances of order one between histograms with $N$ bins, by solving a classical transportation problem on (very large) complete bipartite graphs with $N$ nodes and $N^2$ edges. The main contribution of our work is to approximate the original transportation problem by an uncapacitated min cost flow problem on a reduced flow network of size $O(N)$. More precisely, when the distance among the bin centers is measured with the 1-norm or the $\infty$-norm, our approach provides an optimal solution. When the distance amongst bins is measured with the 2-norm: (i) we derive a quantitative estimate on the error between optimal and approximate solution; (ii) given the error, we construct a reduced flow network of size $O(N)$. We numerically show the benefits of our approach by computing Wasserstein distances of order one on a set of grey scale images used as benchmarks in the literature. We show how our approach scales with the size of the images with 1-norm, 2-norm and $\infty$-norm ground distances.

artificial intelligence, flow problem, machine learning, (18 more...)

1804.00445

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)