 Optimization


Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

arXiv.org Machine Learning

High-dimensional and sparse data are commonly encountered in many applications of machine learning, such as computer vision, bioinformatics, text mining and behavioral targeting. To classify, cluster or rank data points, it is important to be able to compute semantically meaningful similarities between them. However, defining an appropriate similarity measure for a given task is often difficult, as only a small and unknown subset of the features is actually relevant. For instance, in drug discovery studies, chemical compounds are typically represented by a large number of sparse features describing their 2D and 3D properties, and only a few of them play a role in determining whether the compound will bind to a particular target receptor (Leach and Gillet, 2007). In text classification and clustering, a document is often represented as a sparse bag of words, and only a small subset of the dictionary is generally useful to discriminate between documents about different topics. Another example is targeted advertising, where ads are selected based on fine-grained user history (Chen et al., 2009). Similarity and metric learning (Bellet et al., 2015) offers principled approaches to construct a task-specific similarity measure by learning it from weakly supervised data, and has been used in many application domains. The main theme in these methods is to learn the parameters of a similarity (or distance) function such that it agrees with task-specific similarity judgments (e.g., of the form "data point x should


Crowdsourcing with Fairness, Diversity and Budget Constraints

arXiv.org Artificial Intelligence

Recent studies have shown that the labels collected from crowdworkers can be discriminatory with respect to sensitive attributes such as gender and race. This raises questions about the suitability of crowdsourced data for downstream uses, such as training machine learning algorithms. In this work, we address the problem of fair and diverse data collection from a crowd under budget constraints. We propose a novel algorithm which maximizes the expected accuracy of the collected data, while ensuring that the errors satisfy desired notions of fairness. We provide guarantees on the performance of our algorithm and show that the algorithm performs well in practice through experiments on a real dataset.


Taking Human out of Learning Applications: A Survey on Automated Machine Learning

arXiv.org Artificial Intelligence

Machine learning techniques are deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily engaged in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automated machine learning (AutoML) has emerged as a hot topic in both industry and academia. In this paper, we provide a survey of existing AutoML works. First, we introduce and define the AutoML problem, with inspiration from the realms of both automation and machine learning. Then, we propose a general AutoML framework that not only covers almost all existing approaches but also guides the design of new methods. Afterward, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons behind their successful applications. We hope this survey can serve as not only an insightful guideline for AutoML beginners but also an inspiration for future research.


OptStream: Releasing Time Series Privately

arXiv.org Artificial Intelligence

Many applications of machine learning and optimization operate on data streams. While these datasets are fundamental to fueling decision-making algorithms, they often contain sensitive information about individuals, and their use poses significant privacy risks. Motivated by an application in energy systems, this paper presents OPTSTREAM, a novel algorithm for releasing differentially private data streams under the w-event model of privacy. OPTSTREAM is a 4-step procedure consisting of sampling, perturbation, reconstruction, and post-processing modules. First, the sampling module selects a small set of points to access in each period of interest. Then, the perturbation module adds noise to the sampled data points to guarantee privacy. Next, the reconstruction module reassembles non-sampled data points from the perturbed sample points. Finally, the post-processing module uses convex optimization over the private output of the previous modules, as well as the private answers of additional queries on the data stream, to improve accuracy by redistributing the added noise. OPTSTREAM is evaluated on a test case involving the release of a real data stream from the largest European transmission operator. Experimental results show that OPTSTREAM can not only improve the accuracy of state-of-the-art methods by at least one order of magnitude but also support accurate load forecasting on the private data.
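The four modules described in this abstract can be illustrated with a small sketch. This is not the OPTSTREAM implementation: the function name `release_period`, the evenly spaced sampling rule, the linear interpolation, and the choice of a single extra query (the period total) are all assumptions made for illustration. The Laplace noise `scale` is taken as an input, since calibrating it to a w-event privacy budget is outside what the abstract specifies.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exp(1) draws, times `scale`, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def release_period(values, n_samples, scale, seed=0):
    """Hypothetical sample -> perturb -> reconstruct -> post-process pipeline."""
    if n_samples < 2 or len(values) < 2:
        raise ValueError("need at least two values and two samples")
    rng = random.Random(seed)
    n = len(values)

    # 1. Sampling: evenly spaced indices that include both endpoints (assumption).
    idx = sorted({(i * (n - 1)) // (n_samples - 1) for i in range(n_samples)})

    # 2. Perturbation: add Laplace noise to each sampled point.
    noisy = {i: values[i] + laplace_noise(scale, rng) for i in idx}

    # 3. Reconstruction: linear interpolation between consecutive noisy samples.
    recon = []
    for left, right in zip(idx, idx[1:]):
        for t in range(left, right):
            w = (t - left) / (right - left)
            recon.append((1 - w) * noisy[left] + w * noisy[right])
    recon.append(noisy[idx[-1]])

    # 4. Post-processing: take a noisy answer to one extra query (the period
    #    total), then return the least-squares closest series whose sum matches
    #    it. This equality-constrained problem is convex with a closed form:
    #    shift every point by the same amount.
    noisy_total = sum(values) + laplace_noise(scale, rng)
    shift = (noisy_total - sum(recon)) / n
    return [x + shift for x in recon]
```

With `scale=0.0` the pipeline reduces to plain subsampling and interpolation, which is a convenient way to check the reconstruction step in isolation before adding noise.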


Dynamic Assortment Optimization with Changing Contextual Information

arXiv.org Machine Learning

In this paper, we study the dynamic assortment optimization problem under a finite selling season of length $T$. At each time period, the seller offers an arriving customer an assortment of substitutable products under a cardinality constraint, and the customer makes a purchase among the offered products according to a discrete choice model. Most existing work associates each product with a real-valued fixed mean utility and assumes a multinomial logit (MNL) choice model. In many practical applications, feature/contextual information of products is readily available. In this paper, we incorporate the feature information by assuming a linear relationship between the mean utility and the feature. In addition, we allow the feature information of products to change over time so that the underlying choice model can also be non-stationary. To solve the dynamic assortment optimization under this changing contextual MNL model, we need to simultaneously learn the underlying unknown coefficient and make decisions on the assortment. To this end, we develop an upper confidence bound (UCB) based policy and establish a regret bound on the order of $\widetilde O(d\sqrt{T})$, where $d$ is the dimension of the feature and $\widetilde O$ suppresses logarithmic dependence. We further establish a lower bound of $\Omega(d\sqrt{T}/K)$, where $K$ is the cardinality constraint of an offered assortment, which is usually small. When $K$ is a constant, our policy is optimal up to logarithmic factors. In the exploitation phase of the UCB algorithm, we need to solve a combinatorial optimization problem for assortment selection based on the learned information. We further develop an approximation algorithm and an efficient greedy heuristic. The effectiveness of the proposed policy is further demonstrated by our numerical studies.
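To make the contextual MNL model and the cardinality-constrained selection step concrete, here is a minimal sketch. It is not the paper's policy or its greedy heuristic: the per-product `revenues`, the function names, and the particular greedy rule (add the product with the largest improvement in expected revenue until none helps or `k` is reached) are assumptions for illustration. In a UCB-style policy, one would presumably pass optimistic utility estimates in place of `utilities`.

```python
import math


def linear_utilities(features, theta):
    # Mean utility of each product as a linear function of its feature vector.
    return [sum(x * t for x, t in zip(row, theta)) for row in features]


def mnl_probs(utilities):
    # MNL purchase probabilities for the offered products; the constant 1.0 in
    # the denominator is the no-purchase option with utility 0.
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def expected_revenue(assortment, utilities, revenues):
    probs = mnl_probs([utilities[i] for i in assortment])
    return sum(p * revenues[i] for p, i in zip(probs, assortment))


def greedy_assortment(utilities, revenues, k):
    """Hypothetical greedy rule: grow the assortment one product at a time."""
    chosen, best = [], 0.0
    while len(chosen) < k:
        candidate, candidate_value = None, best
        for i in range(len(utilities)):
            if i in chosen:
                continue
            value = expected_revenue(chosen + [i], utilities, revenues)
            if value > candidate_value:
                candidate, candidate_value = i, value
        if candidate is None:  # no remaining product improves expected revenue
            break
        chosen.append(candidate)
        best = candidate_value
    return chosen, best
```

For example, with three products of equal utility 0 and revenues 1, 2 and 3, the rule with `k=2` first picks the product with revenue 3 and then the one with revenue 2, since adding the cheapest product would dilute the purchase probabilities of the other two.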


Scalable Laplacian K-modes

arXiv.org Machine Learning

We advocate Laplacian K-modes for joint clustering and density mode finding, and propose a concave-convex relaxation of the problem, which yields a parallel algorithm that scales up to large datasets and high dimensions. We optimize a tight bound (auxiliary function) of our relaxation, which, at each iteration, amounts to computing an independent update for each cluster-assignment variable, with guaranteed convergence. Therefore, our bound optimizer can be trivially distributed for large-scale data sets. Furthermore, we show that the density modes can be obtained as byproducts of the assignment variables via simple maximum-value operations whose additional computational cost is linear in the number of data points. Our formulation does not require storing a full affinity matrix and computing its eigenvalue decomposition, nor does it perform expensive projection steps and Lagrangian-dual inner iterates for the simplex constraints of each point. Furthermore, unlike mean-shift, our density-mode estimation does not require inner-loop gradient-ascent iterates. It has a complexity independent of feature-space dimension, yields modes that are valid data points in the input set and is applicable to discrete domains as well as arbitrary kernels. We report comprehensive experiments over various data sets, which show that our algorithm yields very competitive performance in terms of optimization quality (i.e., the value of the discrete-variable objective at convergence) and clustering accuracy.
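One plausible reading of the "simple maximum-value operations" mentioned in this abstract is that the mode of each cluster is the data point with the largest soft assignment to that cluster. The sketch below illustrates that reading only; the function name and the input layout (`assignments[p][k]` as the soft assignment of point `p` to cluster `k`) are assumptions, and the paper's exact rule may differ.

```python
def modes_from_assignments(points, assignments):
    """Pick, for each cluster, the input point with the largest assignment value.

    points:      list of data points (any hashable or tuple representation)
    assignments: assignments[p][k] is the soft assignment of point p to cluster k
    """
    n_clusters = len(assignments[0])
    modes = []
    for k in range(n_clusters):
        # One pass over the points per cluster: cost linear in the number of points.
        best_p = max(range(len(points)), key=lambda p: assignments[p][k])
        modes.append(points[best_p])
    return modes
```

Because each mode is selected from the input set rather than computed as an average, it is always a valid data point, which matches the property the abstract highlights for discrete domains.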


Stochastic Optimal Control of Epidemic Processes in Networks

arXiv.org Machine Learning

We approach the development of models and control strategies for susceptible-infected-susceptible (SIS) epidemic processes from the perspective of marked temporal point processes and stochastic optimal control of stochastic differential equations (SDEs) with jumps. In contrast to previous work, this novel perspective is particularly well-suited to make use of fine-grained data about disease outbreaks, and it lets us overcome the shortcomings of current control strategies. Our control strategy relies on treatment intensities to determine whom to treat and when to do so, in order to minimize the number of infected individuals over time. Preliminary experiments with synthetic data show that our control strategy consistently outperforms several alternatives. Looking into the future, we believe our methodology provides a promising step towards the development of practical data-driven control strategies for epidemic processes.


Budgeted Multi-Objective Optimization with a Focus on the Central Part of the Pareto Front - Extended Version

arXiv.org Machine Learning

Optimizing nonlinear systems involving expensive (computer) experiments with regard to conflicting objectives is a common challenge. When the number of experiments is severely restricted and/or when the number of objectives increases, uncovering the whole set of optimal solutions (the Pareto front) is out of reach, even for surrogate-based approaches. Since non-compromising Pareto-optimal solutions are usually of little use in applications, this work restricts the search to relevant solutions that are close to the Pareto front center. The article starts by characterizing this center. Next, a Bayesian multi-objective optimization method for directing the search towards it is proposed. A criterion for detecting convergence to the center is described. If the criterion is triggered, a widened central part of the Pareto front is targeted such that sufficiently accurate convergence to it is forecast within the remaining budget. Numerical experiments show how the resulting algorithm, C-EHI, better locates the central part of the Pareto front when compared to state-of-the-art Bayesian algorithms.


An Online-Learning Approach to Inverse Optimization

arXiv.org Artificial Intelligence

Human decision-makers are very good at making decisions under rather imprecise specification of the decision-making problem, in terms of both constraints and objective. One might argue that the human decision-maker can quite reliably learn from observed previous decisions - a traditional learning-by-example setup. At the same time, when we try to turn these decision-making problems into actual optimization problems, we often run into all kinds of issues in specifying the model. In an ideal world, we would be able to infer or learn the optimization problem from previously observed decisions taken by an expert. This problem naturally occurs in many settings where we do not have direct access to the decision-maker's preference or objective function but can observe his behaviour, and where the learner as well as the decision-maker have access to the same information. Natural examples are as diverse as making recommendations based on user history and strategic planning problems, where the agent's preferences are unknown but the system is observable. Other examples include knowledge transfer from a human planner into a decision support system: often human operators have arrived at finely-tuned "objective functions" through many years of experience, and in many cases it is desirable to replicate the decision-making process both for scaling up and also for potentially including it in large-scale scenario analysis and simulation to explore responses under varying conditions. Here we consider the learning of preferences or objectives from an expert by means of observing his actions.


Leveraging Gaussian Process and Voting-Empowered Many-Objective Evaluation for Fault Identification

arXiv.org Machine Learning

Using piezoelectric impedance/admittance sensing for structural health monitoring is promising, owing to the simplicity of the circuitry design as well as the high-frequency interrogation capability. The actual identification of fault location and severity using impedance/admittance measurements, nevertheless, remains an extremely challenging task. A first-principle based structural model using finite element discretization requires high dimensionality to characterize the high-frequency response. As such, direct inversion using the sensitivity matrix usually yields an under-determined problem. Alternatively, the identification problem may be cast into an optimization framework in which fault parameters are identified through repeated forward finite element analysis, which, however, is often computationally prohibitive. This paper presents an efficient data-assisted optimization approach for fault identification that does not require running the finite element model iteratively. We formulate a many-objective optimization problem to identify fault parameters, where response surfaces of impedance measurements are constructed through Gaussian process-based calibration. To balance solution diversity and convergence, an ε-dominance enabled many-objective simulated annealing algorithm is established. As multiple solutions are expected, a voting score calculation procedure is developed to further identify those solutions that yield better implications regarding structural health condition. The effectiveness of the proposed approach is demonstrated by systematic numerical and experimental case studies.