Optimization
Eigendecomposition of Q in Equally Constrained Quadratic Programming
When eigenvalue decomposition is applied to the quadratic-term matrix $Q$ in a class of linear equality-constrained quadratic programs (EQP), there exists a linear mapping that projects optimal solutions between the original formulation and the new EQP formulation in which $Q$ is diagonalized. Although such a mapping requires a particular type of equality constraint, it generalizes to some real problems, such as the efficient frontier for portfolio allocation and classification with Least Squares Support Vector Machines (LSSVM). The established mapping could potentially be useful for exploring optimal solutions in a subspace, although how to do so is not yet clear to the author. This work was inspired by a similar result proved for the unconstrained formulation in \cite{Tan}, but the current proof is much improved and generalized. To the author's knowledge, very few similar discussions appear in the literature.
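As a rough illustration of the idea, the sketch below solves a small EQP directly and again in the eigenbasis of $Q$, then maps the second solution back. It uses only the generic orthogonal change of variables $y = V^T x$ with $Q = V \Lambda V^T$, which is an assumption made here for illustration and not necessarily the specific mapping or constraint type established in the paper. The helper names and the toy data (a single budget-style constraint) are hypothetical.

```python
import numpy as np

def solve_eqp(Q, c, A, b):
    """Solve min 0.5 x^T Q x + c^T x  s.t.  A x = b via the KKT linear system."""
    n, m = Q.shape[0], A.shape[0]
    kkt = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    return np.linalg.solve(kkt, rhs)[:n]  # drop the Lagrange multipliers

def diagonalized_eqp(Q, c, A, b):
    """Rewrite the EQP in the eigenbasis of Q (Q = V diag(lam) V^T, y = V^T x)."""
    lam, V = np.linalg.eigh(Q)
    return np.diag(lam), V.T @ c, A @ V, b, V

# Hypothetical toy data: symmetric positive definite Q, one equality constraint.
Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
c = np.array([-1.0, 0.5, 2.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

x_direct = solve_eqp(Q, c, A, b)
Lam, c_y, A_y, b_y, V = diagonalized_eqp(Q, c, A, b)
y_opt = solve_eqp(Lam, c_y, A_y, b_y)  # EQP with a diagonal quadratic term
x_mapped = V @ y_opt                   # linear map back to the original variables
```

Under these assumptions `x_mapped` should agree with `x_direct` up to floating-point error, since the change of variables is orthogonal and leaves the objective and the constraint unchanged.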
Discovering Imperfectly Observable Adversarial Actions using Anomaly Detection
Petrova, Olga, Durkota, Karel, Alperovich, Galina, Horak, Karel, Najman, Michal, Bosansky, Branislav, Lisy, Viliam
Anomaly detection is a method for discovering unusual and suspicious behavior. In many real-world scenarios, the examined events can be directly linked to the actions of an adversary, such as attacks on computer networks or fraud in financial operations. While the defender wants to discover such malicious behavior, the attacker seeks to accomplish their goal (e.g., exfiltrating data) while avoiding detection. To this end, anomaly detectors have been used in a game-theoretic framework that captures these goals of a two-player competition. We extend the existing models to more realistic settings by (1) allowing both players to have continuous action spaces and (2) assuming that the defender cannot perfectly observe the action of the attacker. We propose two algorithms for solving such games -- a direct extension of existing algorithms based on discretizing the feature space and linear programming, and a second algorithm based on constrained learning. Experiments show that both algorithms are applicable to cases with low feature-space dimensions, but the learning-based method produces less exploitable strategies and scales to higher dimensions. Moreover, we use real-world data to compare our approaches with existing classifiers in a data-exfiltration scenario via the DNS channel. The results show that our models are significantly less exploitable by an informed attacker.
Moment-Based Domain Adaptation: Learning Bounds and Algorithms
This thesis contributes to the mathematical foundation of domain adaptation as an emerging field in machine learning. In contrast to classical statistical learning, the framework of domain adaptation takes into account deviations between the probability distributions in the training and application settings. Domain adaptation applies to a wider range of applications, as future samples often follow a distribution that differs from that of the training samples. A decisive point is the generality of the assumptions about the similarity of the distributions. Therefore, in this thesis we study domain adaptation problems under similarity assumptions as weak as can be modelled by finitely many moments.
Up or Down? Adaptive Rounding for Post-Training Quantization
Nagel, Markus, Amjad, Rana Ali, van Baalen, Mart, Louizos, Christos, Blankevoort, Tijmen
When quantizing neural networks, assigning each floating-point weight to its nearest fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the best we can do. In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss. AdaRound is fast, does not require fine-tuning of the network, and only uses a small amount of unlabelled data. We start by theoretically analyzing the rounding problem for a pre-trained neural network. By approximating the task loss with a Taylor series expansion, the rounding task is posed as a quadratic unconstrained binary optimization problem. We simplify this to a layer-wise local loss and propose to optimize this loss with a soft relaxation. AdaRound not only outperforms rounding-to-nearest by a significant margin but also establishes a new state-of-the-art for post-training quantization on several networks and tasks. Without fine-tuning, we can quantize the weights of Resnet18 and Resnet50 to 4 bits while staying within an accuracy loss of 1%.
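The claim that rounding-to-nearest is not always the best choice can be checked on a tiny example. The sketch below is not AdaRound itself (which optimizes a soft relaxation); it brute-forces the up/down decision for each weight of one small linear layer and compares the layer-wise output error against nearest rounding. The function names, the grid step, and the random data are all hypothetical.

```python
import itertools
import numpy as np

def layer_loss(w_q, w, X):
    """Squared error between quantized and full-precision layer outputs."""
    return float(np.sum((X @ (w_q - w)) ** 2))

def best_up_down_rounding(w, X, step):
    """Brute-force the floor/ceil choice per weight (feasible only for tiny w)."""
    lo = np.floor(w / step) * step
    best_w, best_loss = None, np.inf
    for mask in itertools.product([0.0, 1.0], repeat=len(w)):
        cand = lo + step * np.array(mask)  # round down where mask is 0, up where 1
        loss = layer_loss(cand, w, X)
        if loss < best_loss:
            best_w, best_loss = cand, loss
    return best_w, best_loss

rng = np.random.default_rng(0)
w = rng.normal(size=6)                                   # toy full-precision weights
X = rng.normal(size=(32, 6)) @ rng.normal(size=(6, 6))   # correlated toy inputs
step = 0.5                                               # assumed quantization grid

w_nearest = np.round(w / step) * step
nearest_loss = layer_loss(w_nearest, w, X)
w_best, best_loss = best_up_down_rounding(w, X, step)
```

Because nearest rounding is one of the enumerated candidates, `best_loss` can never exceed `nearest_loss`; with correlated inputs it is typically strictly smaller, which is the effect the abstract describes.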
Assortative-Constrained Stochastic Block Models
Gribel, Daniel, Vidal, Thibaut, Gendreau, Michel
Stochastic block models (SBMs) are often used to find assortative community structures in networks, such that the probability of connections within communities is higher than between communities. However, classic SBMs are not limited to assortative structures. In this study, we discuss the implications of this model-inherent indifference towards assortativity or disassortativity, and show that this characteristic can lead to undesirable outcomes for networks which are presupposedly assortative but which contain a reduced amount of information. To circumvent this issue, we introduce a constrained SBM that imposes strong assortativity constraints, along with efficient algorithmic approaches to solve it. These constraints significantly boost community recovery capabilities in regimes that are close to the information-theoretic threshold. They also make it possible to identify structurally different communities in networks representing cerebral-cortex activity regions.
Verification of Markov Decision Processes with Risk-Sensitive Measures
We develop a method for computing policies in Markov decision processes with risk-sensitive measures subject to temporal logic constraints. Specifically, we use a particular risk-sensitive measure from cumulative prospect theory, which has been previously adopted in psychology and economics. The nonlinear transformation of the probabilities and utility functions yields a nonlinear programming problem, which makes computation of optimal policies typically challenging. We show that this nonlinear weighting function can be accurately approximated by the difference of two convex functions. This observation enables efficient policy computation using convex-concave programming. We demonstrate the effectiveness of the approach on several scenarios.
Safe Screening Rules for $\ell_0$-Regression
Atamtürk, Alper, Gómez, Andrés
We give safe screening rules to eliminate variables from regression with $\ell_0$ regularization or a cardinality constraint. These rules are based on guarantees that a feature may or may not be selected in an optimal solution. The screening rules can be computed from a convex relaxation solution in linear time, without solving the $\ell_0$ optimization problem. Thus, they can be used in a preprocessing step to safely remove variables from consideration a priori. Numerical experiments on real and synthetic data indicate that, on average, 76\% of the variables can be fixed to their optimal values, hence reducing the computational burden for optimization substantially. Therefore, the proposed fast and effective screening rules extend the scope of algorithms for $\ell_0$-regression to larger data sets.
AutoAI set to make it easy to create machine learning algorithms
Artificial intelligence has the potential to greatly simplify our lives, but not everyone is a data scientist and not all data scientists are experts in machine learning. Enter AutoAI, a novel approach of designing, training and optimizing machine learning models automatically. With AutoAI, anyone could soon build machine learning pipelines from raw data directly, without writing complex code and performing tedious tuning and optimization, to then automate complicated, labor-intensive tasks. Several IBM papers selected for the AAAI-20 conference in New York demonstrate the value of AutoAI and different approaches to it in great detail. Most AutoAI research currently focuses on three areas: automatically determining the best models for each step of the desired machine learning and data science pipeline (model selection), automatically finding the best architecture of a deep learning-based AI model, and automatically finding the best hyperparameters (parameters for the model training process) for AI models and algorithms.
Model-Predictive Control via Cross-Entropy and Gradient-Based Optimization
Bharadhwaj, Homanga, Xie, Kevin, Shkurti, Florian
Recent works in high-dimensional model-predictive control and model-based reinforcement learning with learned dynamics and reward models have resorted to population-based optimization methods, such as the Cross-Entropy Method (CEM), for planning a sequence of actions. To decide on an action to take, CEM conducts a search for the action sequence with the highest return according to the dynamics model and reward. Action sequences are typically randomly sampled from an unconditional Gaussian distribution and evaluated on the environment. This distribution is iteratively updated towards action sequences with higher returns. However, this planning method can be very inefficient, especially for high-dimensional action spaces. An alternative line of approaches optimizes action sequences directly via gradient descent, but is prone to local optima. We propose a method to solve this planning problem by interleaving CEM and gradient descent steps in optimizing the action sequence. Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces, avoidance of local minima, and performance better than or equal to that of CEM. Code accompanying the paper is available.
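The interleaving of CEM sampling and gradient steps can be sketched on a toy problem. The code below is only an illustration under strong assumptions: the return is a hypothetical analytic function with a known gradient (in place of a learned dynamics and reward model), and the exact schedule of sampling, gradient refinement, and elite refitting is a guess at one reasonable arrangement, not the paper's algorithm. All function and parameter names are hypothetical.

```python
import numpy as np

def returns(actions):
    """Hypothetical return: higher is better, maximized when every action is 1."""
    return -np.sum((actions - 1.0) ** 2, axis=-1)

def returns_grad(actions):
    """Analytic gradient of the toy return with respect to the actions."""
    return -2.0 * (actions - 1.0)

def cem_with_gradient_steps(horizon=5, pop=64, n_elite=8, iters=20,
                            grad_steps=3, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        # CEM step: sample candidate action sequences from the current Gaussian.
        samples = mean + std * rng.standard_normal((pop, horizon))
        # Gradient steps: refine every sampled sequence by gradient ascent.
        for _ in range(grad_steps):
            samples = samples + lr * returns_grad(samples)
        # Refit the Gaussian to the elite (highest-return) refined sequences.
        elite = samples[np.argsort(returns(samples))[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

plan = cem_with_gradient_steps()
```

On this convex toy return the gradient steps alone would suffice; the population is kept only to show where the two kinds of update sit relative to each other.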
Accumulator Bet Selection Through Stochastic Diffusion Search
The global sports betting market is worth an estimated $700 billion annually (Flepp et al., 2017), and association football (also known as soccer or simply football), being the world's most popular spectator sport, constitutes around 70% of this ever-growing market (Constantinou et al., 2012). The last decade has thus seen the emergence of numerous online and offline bookmakers, offering bettors the possibility to place wagers on the results of football matches in more than a hundred different leagues worldwide. The sports betting industry offers a unique and very popular betting product known as an accumulator bet. In contrast with a single bet, which consists in betting on a single event for a payout equal to the stake (i.e. the sum wagered) multiplied by the odds set by the bookmaker for that event, an accumulator bet combines more than one (and generally fewer than seven) events into a single wager that pays out only when all individual events are correctly predicted. The payout for a correct accumulator bet is the stake multiplied by the product of the odds of all its constituent wagers. However, if one of these wagers is incorrect, the entire accumulator bet is lost. Thus, this product offers both significantly higher potential payouts and higher risks than single bets, and the large pool of online bookmakers, leagues, and matches that bettors can access nowadays has increased both the complexity of selecting a set of matches to place an accumulator bet on and the number of opportunities to identify winning combinations. With the rise of sports analytics, a wide variety of statistical models for predicting the outcomes of football matches have been proposed, a good review of which can be found in Langseth (2013).
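The payout rules described above can be written as a short worked example. This is a minimal sketch that assumes decimal odds (so the payout includes the returned stake); the function names and the sample odds are hypothetical.

```python
from math import prod

def single_payout(stake, odds):
    """Payout of a winning single bet: stake times the bookmaker's odds."""
    return stake * odds

def accumulator_payout(stake, odds, outcomes):
    """Payout of an accumulator: stake times the product of all odds,
    but only if every constituent prediction is correct; otherwise nothing."""
    if not all(outcomes):
        return 0.0
    return stake * prod(odds)

# Three hypothetical matches with decimal odds 2.0, 1.5 and 3.0, stake of 10.
odds = [2.0, 1.5, 3.0]
all_correct = accumulator_payout(10.0, odds, [True, True, True])   # 10 * 9.0 = 90.0
one_wrong = accumulator_payout(10.0, odds, [True, False, True])    # 0.0
```

With the same stake, the best single bet in this example pays 30.0 against 90.0 for the accumulator, while a single wrong prediction leaves the accumulator with nothing, which is the trade-off between payout and risk noted in the text.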