Optimization
Computational Sustainability
These are exciting times for computational sciences with the digital revolution permeating a variety of areas and radically transforming business, science, and our daily lives. The Internet and the World Wide Web, GPS, satellite communications, remote sensing, and smartphones are dramatically accelerating the pace of discovery, engendering globally connected networks of people and devices. The rise of practically relevant artificial intelligence (AI) is also playing an increasing part in this revolution, fostering e-commerce, social networks, personalized medicine, IBM Watson and AlphaGo, self-driving cars, and other groundbreaking transformations. Unfortunately, humanity is also facing tremendous challenges. Nearly a billion people still live below the international poverty line and human activities and climate change are threatening our planet and the livelihood of current and future generations. Moreover, the impact of computing and information technology has been uneven, mainly benefiting profitable sectors, with fewer societal and environmental benefits, further exacerbating inequalities and the destruction of our planet. Our vision is that computer scientists can and should play a key role in helping address societal and environmental challenges in pursuit of a sustainable future, while also advancing computer science as a discipline. For over a decade, we have been deeply engaged in computational research to address societal and environmental challenges, while nurturing the new field of Computational Sustainability.
Mobility-aware Content Preference Learning in Decentralized Caching Networks
Ye, Yu, Xiao, Ming, Skoglund, Mikael
--Due to the drastic increase of mobile traffic, wireless caching is proposed to serve repeated requests for content download. T o determine the caching scheme for decentralized caching networks, the content preference learning problem based on mobility prediction is studied. We first formulate preference prediction as a decentralized regularized multi-task learning (DRMTL) problem without considering the mobility of mobile terminals (MTs). The problem is solved by a hybrid Jacobian and Gauss-Seidel proximal multi-block alternating direction method (ADMM) based algorithm, which is proven to conditionally converge to the optimal solution with a rate O (1 / k) . Then we use the tool of Markov renewal process to predict the moving path and sojourn time for MTs, and integrate the mobility pattern with the DRMTL model by reweighting the training samples and introducing a transfer penalty in the objective. We solve the problem and prove that the developed algorithm has the same convergence property but with different conditions. Through simulation we show the convergence analysis on proposed algorithms. Our real trace driven experiments illustrate that the mobility-aware DRMTL model can provide a more accurate prediction on geography preference than DRMTL model. Besides, the hit ratio achieved by most popular proactive caching (MPC) policy with preference predicted by mobility-aware DRMTL outperforms the MPC with preference from DRMTL and random caching (RC) schemes. As a promising technology for the fifth-generation (5G) wireless networks and beyond, proactive caching can alleviate the heavy traffic burden on backhaul links and reduce service delay, through proactively storing popular contents at base stations (BSs) and mobile terminals (MTs) [1]-[3]. With the limitation of storage memory, determining where and what to cache in content centric wireless networks becomes one of the main challenges in the design of proactive caching schemes. Among the various factors affecting the wireless caching design, involving the mobility of MTs and learning content preference are two critical challenges, which have attracted more and more research interest recently. A. background Current investigation on mobility aware wireless caching mainly includes two aspects: studying the impact of MT mobility on caching schemes [4]-[7], and optimizing the wireless caching schemes based on the mobility information of MTs Y u Y e, Ming Xiao and Mikael Skoglund are with the School of Electrical Engineering and Computer Science, Royal Institute of Technology (KTH), Stockholm, Sweden (email: yu9@kth.se,
A novel approach to multivariate redundancy and synergy
Consider a situation in which a set of $n$ "source" random variables $X_{1},\dots,X_{n}$ have information about some "target" random variable $Y$. For example, in neuroscience $Y$ might represent the state of an external stimulus and $X_{1},\dots,X_{n}$ the activity of $n$ different brain regions. Recent work in information theory has considered how to decompose the information that the sources $X_{1},\dots,X_{n}$ provide about the target $Y$ into separate terms such as (1) the "redundant information" that is shared among all of sources, (2) the "unique information" that is provided only by a single source, (3) the "synergistic information" that is provided by all sources only when considered jointly, and (4) the "union information" that is provided by at least one source. We propose a novel framework deriving such a decomposition that can be applied to any number of sources. Our measures are motivated in three distinct ways: via a formal analogy to intersection and union operators in set theory, via a decision-theoretic operationalization based on Blackwell's theorem, and via an axiomatic derivation. A key aspect of our approach is that we relax the assumption that measures of redundancy and union information should be related by the inclusion-exclusion principle. We discuss relations to previous proposals as well as possible generalizations.
Quadratic Surface Support Vector Machine with L1 Norm Regularization
Mousavi, Seyedahmad, Gao, Zheming, Han, Lanshan, Lim, Alvin
We propose $\ell_1$ norm regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish their desired theoretical properties, including the existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of true sparsity pattern over (almost) quadratically separable data sets if the penalty parameter of $\ell_1$ norm is large enough. We also demonstrate their promising practical efficiency by conducting various numerical experiments on both synthetic and publicly available benchmark data sets.
Applications of Nature-Inspired Algorithms for Dimension Reduction: Enabling Efficient Data Analytics
Mohammadi, Farid Ghareh, Amini, M. Hadi, Arabnia, Hamid R.
In [1], we have explored the theoretical aspects of feature selection and evolutionary algorithms. In this chapter, we focus on optimization algorithms for enhancing data analytic process, i.e., we propose to explore applications of nature-inspired algorithms in data science. Feature selection optimization is a hybrid approach leveraging feature selection techniques and evolutionary algorithms process to optimize the selected features. Prior works solve this problem iteratively to converge to an optimal feature subset. Feature selection optimization is a non-specific domain approach. Data scientists mainly attempt to find an advanced way to analyze data n with high computational efficiency and low time complexity, leading to efficient data analytics. Thus, by increasing generated/measured/sensed data from various sources, analysis, manipulation and illustration of data grow exponentially. Due to the large scale data sets, Curse of dimensionality (CoD) is one of the NP-hard problems in data science. Hence, several efforts have been focused on leveraging evolutionary algorithms (EAs) to address the complex issues in large scale data analytics problems. Dimension reduction, together with EAs, lends itself to solve CoD and solve complex problems, in terms of time complexity, efficiently. In this chapter, we first provide a brief overview of previous studies that focused on solving CoD using feature extraction optimization process. We then discuss practical examples of research studies are successfully tackled some application domains, such as image processing, sentiment analysis, network traffics / anomalies analysis, credit score analysis and other benchmark functions/data sets analysis.
The Learning of Fuzzy Cognitive Maps With Noisy Data: A Rapid and Robust Learning Method With Maximum Entropy
Feng, Guoliang, Lu, Wei, Pedrycz, Witold, Yang, Jianhua, Liu, Xiaodong
Numerous learning methods for fuzzy cognitive maps (FCMs), such as the Hebbian-based and the population-based learning methods, have been developed for modeling and simulating dynamic systems. However, these methods are faced with several obvious limitations. Most of these models are extremely time consuming when learning the large-scale FCMs with hundreds of nodes. Furthermore, the FCMs learned by those algorithms lack robustness when the experimental data contain noise. In addition, reasonable distribution of the weights is rarely considered in these algorithms, which could result in the reduction of the performance of the resulting FCM. In this article, a straightforward, rapid, and robust learning method is proposed to learn FCMs from noisy data, especially, to learn large-scale FCMs. The crux of the proposed algorithm is to equivalently transform the learning problem of FCMs to a classic-constrained convex optimization problem in which the least-squares term ensures the robustness of the well-learned FCM and the maximum entropy term regularizes the distribution of the weights of the well-learned FCM. A series of experiments covering two frequently used activation functions (the sigmoid and hyperbolic tangent functions) are performed on both synthetic datasets with noise and real-world datasets. The experimental results show that the proposed method is rapid and robust against data containing noise and that the well-learned weights have better distribution. In addition, the FCMs learned by the proposed method also exhibit superior performance in comparison with the existing methods. Index Terms-Fuzzy cognitive maps (FCMs), maximum entropy, noisy data, rapid and robust learning.
The double traveling salesman problem with partial last-in-first-out loading constraints
Chagas, Jonatas B. C., Toffolo, Túlio A. M., Souza, Marcone J. F., Iori, Manuel
In this paper, we introduce the Double Traveling Salesman Problem with Partial Last-In-First-Out Loading Constraints (DTSPPL), a pickup-and-delivery single-vehicle routing problem where all pickup operations must be performed before any delivery one because the pickup and delivery areas are geographically separated. The vehicle collects items in the pickup area and loads them into its container, a horizontal stack. After performing all pickup operations, the vehicle begins delivering the items in the delivery area. Loading and unloading operations must obey a partial Last-In-First-Out (LIFO) policy, i.e., a version of the LIFO policy that may be violated within a given reloading depth. The objective of the DTSPPL is to minimize the total cost, which involves the total distance traveled by the vehicle and the number of reloaded items due to violations of the standard LIFO policy. We formally describe the DTSPPL by means of two Integer Linear Programming (ILP) formulations, and propose a heuristic algorithm based on the Biased Random-Key Genetic Algorithm (BRKGA) to find high-quality solutions. The performance of the proposed solution approaches is assessed over a broad set of instances. Computational results have shown that both ILP formulations were able to solve only the smaller instances, whereas the BRKGA obtained better solutions for almost all instances, requiring shorter computational time.
A tree-based radial basis function method for noisy parallel surrogate optimization
Parallel surrogate optimization algorithms have proven to be efficient methods for solving expensive noisy optimization problems. In this work we develop a new parallel surrogate optimization algorithm (ProSRS), using a novel tree-based "zoom strategy" to improve the efficiency of the algorithm. We prove that if ProSRS is run for sufficiently long, with probability converging to one there will be at least one point among all the evaluations that will be arbitrarily close to the global minimum. We compare our algorithm to several state-of-the-art Bayesian optimization algorithms on a suite of standard benchmark functions and two real machine learning hyperparameter-tuning problems. We find that our algorithm not only achieves significantly faster optimization convergence, but is also 1-4 orders of magnitude cheaper in computational cost.
Estimation of perceptual scales using ordinal embedding
Haghiri, Siavash, Wichmann, Felix, von Luxburg, Ulrike
In this paper, we address the problem of measuring and analysing sensation, the subjective magnitude of one's experience. We do this in the context of the method of triads: the sensation of the stimulus is evaluated via relative judgments of the form: "Is stimulus S_i more similar to stimulus S_j or to stimulus S_k?". We propose to use ordinal embedding methods from machine learning to estimate the scaling function from the relative judgments. We review two relevant and well-known methods in psychophysics which are partially applicable in our setting: non-metric multi-dimensional scaling (NMDS) and the method of maximum likelihood difference scaling (MLDS). We perform an extensive set of simulations, considering various scaling functions, to demonstrate the performance of the ordinal embedding methods. We show that in contrast to existing approaches our ordinal embedding approach allows, first, to obtain reasonable scaling function from comparatively few relative judgments, second, the estimation of non-monotonous scaling functions, and, third, multi-dimensional perceptual scales. In addition to the simulations, we analyse data from two real psychophysics experiments using ordinal embedding methods. Our results show that in the one-dimensional, monotonically increasing perceptual scale our ordinal embedding approach works as well as MLDS, while in higher dimensions, only our ordinal embedding methods can produce a desirable scaling function. To make our methods widely accessible, we provide an R-implementation and general rules of thumb on how to use ordinal embedding in the context of psychophysics.
Federated Learning: Challenges, Methods, and Future Directions
Li, Tian, Sahu, Anit Kumar, Talwalkar, Ameet, Smith, Virginia
Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.