Mathematical & Statistical Methods
Neur2SP: Neural Two-Stage Stochastic Programming
Dumouchelle, Justin, Patel, Rahul, Khalil, Elias B., Bodur, Merve
Stochastic Programming is a powerful modeling framework for decision-making under uncertainty. In this work, we tackle two-stage stochastic programs (2SPs), the most widely used class of stochastic programming models. Solving 2SPs exactly requires optimizing over an expected value function that is computationally intractable. Having a mixed-integer linear program (MIP) or a nonlinear program (NLP) in the second stage further aggravates the intractability, even when specialized algorithms that exploit problem structure are employed. Finding high-quality (first-stage) solutions -- without leveraging problem structure -- can be crucial in such settings. We develop Neur2SP, a new method that approximates the expected value function via a neural network to obtain a surrogate model that can be solved more efficiently than the traditional extensive formulation approach. Neur2SP makes no assumptions about the problem structure, in particular about the second-stage problem, and can be implemented using an off-the-shelf MIP solver. Our extensive computational experiments on four benchmark 2SP problem classes with different structures (containing MIP and NLP second-stage problems) demonstrate the efficiency (time) and efficacy (solution quality) of Neur2SP. In under 1.66 seconds, Neur2SP finds high-quality solutions across all problems even as the number of scenarios increases, an ideal property that is difficult to have for traditional 2SP solution techniques. Namely, the most generic baseline method typically requires minutes to hours to find solutions of comparable quality.
Sample Constrained Treatment Effect Estimation
Addanki, Raghavendra, Arbour, David, Mai, Tung, Musco, Cameron, Rao, Anup
Treatment effect estimation is a fundamental problem in causal inference. We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals. In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on. This subset must be further partitioned into treatment and control groups. Algorithms for partitioning the entire population into treatment and control groups, or for choosing a single representative subset, have been well-studied. The key challenge in our setting is jointly choosing a representative subset and a partition for that set. We focus on both individual and average treatment effect estimation, under a linear effects model. We give provably efficient experimental designs and corresponding estimators, by identifying connections to discrepancy minimization and leverage-score-based sampling used in randomized numerical linear algebra. Our theoretical results obtain a smooth transition to known guarantees when $s$ equals the population size. We also empirically demonstrate the performance of our algorithms.
Giga-scale Kernel Matrix Vector Multiplication on GPU
Hu, Robert, Chau, Siu Lun, Sejdinovic, Dino, Glaunรจs, Joan Alexis
Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined \textit{Faster-Fast and Free Memory Method} ($\fthreem$) to address these scaling issues of KMVM for tall~($10^8\sim 10^9$) and skinny~($D\leq7$) data. Extensive experiments demonstrate that $\fthreem$ has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$ and can compute a full KMVM for a billion points \emph{in under a minute} on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 1.5-5.5 times} at the cost of $<1\%$ drop in accuracy. We further demonstrate competitive results on \emph{Gaussian Process regression} coupled with significant speedups on a variety of real-world datasets.
[100%OFF] The Complete Mathematics Software Developer Course For 2022
This course covers all Mathematics needed to become Software Developer. Here we will discuss Linear Algebra, Modern Analysis, Mathematical Logic, Number Theory and Discrete Mathematics. By the end of this course you will be able to analyze and describe computer science concepts and methods. This course is a great opportunity for you to gain deep understanding of all processes a executed in the computer system when programming. Discrete mathematics is the study of mathematical structures that are fundamentally discrete rather than continuous. In contrast to real numbers that have the property of varying "smoothly", the objects studied in discrete mathematics โ such as integers, graphs, and statements in logic โ do not vary smoothly in this way, but have distinct, separated values.Discrete mathematics therefore excludes topics in "continuous mathematics" such as calculus or Euclidean geometry.
Zero-Order One-Point Estimate with Distributed Stochastic Gradient-Tracking Technique
Mhanna, Elissa, Assaad, Mohamad
In this work, we consider a distributed multi-agent stochastic optimization problem, where each agent holds a local objective function that is smooth and convex, and that is subject to a stochastic process. The goal is for all agents to collaborate to find a common solution that optimizes the sum of these local functions. With the practical assumption that agents can only obtain noisy numerical function queries at exactly one point at a time, we extend the distributed stochastic gradient-tracking method to the bandit setting where we don't have an estimate of the gradient, and we introduce a zero-order (ZO) one-point estimate (1P-DSGT). We analyze the convergence of this novel technique for smooth and convex objectives using stochastic approximation tools, and we prove that it converges almost surely to the optimum. We then study the convergence rate for when the objectives are additionally strongly convex. We obtain a rate of $O(\frac{1}{\sqrt{k}})$ after a sufficient number of iterations $k > K_2$ which is usually optimal for techniques utilizing one-point estimators. We also provide a regret bound of $O(\sqrt{k})$, which is exceptionally good compared to the aforementioned techniques. We further illustrate the usefulness of the proposed technique using numerical experiments.
A Hybrid Modelling Approach for Aerial Manipulators
Kremer, Paul, Sanchez-Lopez, Jose Luis, Voos, Holger
Aerial manipulators (AM) exhibit particularly challenging, non-linear dynamics; the UAV and the manipulator it is carrying form a tightly coupled dynamic system, mutually impacting each other. The mathematical model describing these dynamics forms the core of many solutions in non-linear control and deep reinforcement learning. Traditionally, the formulation of the dynamics involves Euler angle parametrization in the Lagrangian framework or quaternion parametrization in the Newton-Euler framework. The former has the disadvantage of giving birth to singularities and the latter of being algorithmically complex. This work presents a hybrid solution, combining the benefits of both, namely a quaternion approach leveraging the Lagrangian framework, connecting the singularity-free parameterization with the algorithmic simplicity of the Lagrangian approach. We do so by offering detailed insights into the kinematic modeling process and the formulation of the dynamics of a general aerial manipulator. The obtained dynamics model is validated experimentally against a real-time physics engine. A practical application of the obtained dynamics model is shown in the context of a computed torque feedback controller (feedback linearization), where we analyze its real-time capability with increasingly complex models.
Number Theory Meets Linguistics: Modelling Noun Pluralisation Across 1497 Languages Using 2-adic Metrics
Baker, Gregory, Molla-Aliod, Diego
It has been known in the mathematical community In this paper we use a simple and naive approach since 1897 --- although only clearly since for converting vocabulary words into vectors: use (Hensel, 1918) -- that there is an unusual and unexpected whatever the unicode bit sequence for the word family of distance metrics based on prime would be; this bit sequence can also be viewed as an numbers which can be used instead of Euclidean integer vector with one element. This is of course metrics, which have infinitesimals (to support calculus), extremely arbitrary and subject to the whims of the triangle inequality (to support geometry), the unicode consortium, but it is the most common and other useful properties all the while maintaining way to represent text from any human language on mathematical consistency. They are known a computer.
Large-Scale Differentiable Causal Discovery of Factor Graphs
Lopez, Romain, Hรผtter, Jan-Christian, Pritchard, Jonathan K., Regev, Aviv
A common theme in causal inference is learning causal relationships between observed variables, also known as causal discovery. This is usually a daunting task, given the large number of candidate causal graphs and the combinatorial nature of the search space. Perhaps for this reason, most research has so far focused on relatively small causal graphs, with up to hundreds of nodes. However, recent advances in fields like biology enable generating experimental data sets with thousands of interventions followed by rich profiling of thousands of variables, raising the opportunity and urgent need for large causal graph models. Here, we introduce the notion of factor directed acyclic graphs (f-DAGs) as a way to restrict the search space to non-linear low-rank causal interaction models. Combining this novel structural assumption with recent advances that bridge the gap between causal discovery and continuous optimization, we achieve causal discovery on thousands of variables. Additionally, as a model for the impact of statistical noise on this estimation procedure, we study a model of edge perturbations of the f-DAG skeleton based on random graphs and quantify the effect of such perturbations on the f-DAG rank. This theoretical analysis suggests that the set of candidate f-DAGs is much smaller than the whole DAG space and thus may be more suitable as a search space in the high-dimensional regime where the underlying skeleton is hard to assess. We propose Differentiable Causal Discovery of Factor Graphs (DCD-FG), a scalable implementation of -DAG constrained causal discovery for high-dimensional interventional data. DCD-FG uses a Gaussian non-linear low-rank structural equation model and shows significant improvements compared to state-of-the-art methods in both simulations as well as a recent large-scale single-cell RNA sequencing data set with hundreds of genetic interventions.
LGTBIDS: Layer-wise Graph Theory Based Intrusion Detection System in Beyond 5G
Shafi, Misbah, Jha, Rakesh Kumar, Jain, Sanjeev
The advancement in wireless communication technologies is becoming more demanding and pervasive. One of the fundamental parameters that limit the efficiency of the network are the security challenges. The communication network is vulnerable to security attacks such as spoofing attacks and signal strength attacks. Intrusion detection signifies a central approach to ensuring the security of the communication network. In this paper, an Intrusion Detection System based on the framework of graph theory is proposed. A Layerwise Graph Theory-Based Intrusion Detection System (LGTBIDS) algorithm is designed to detect the attacked node. The algorithm performs the layer-wise analysis to extract the vulnerable nodes and ultimately the attacked node(s). For each layer, every node is scanned for the possibility of susceptible node(s). The strategy of the IDS is based on the analysis of energy efficiency and secrecy rate. The nodes with the energy efficiency and secrecy rate beyond the range of upper and lower thresholds are detected as the nodes under attack. Further, detected node(s) are transmitted with a random sequence of bits followed by the process of re-authentication. The obtained results validate the better performance, low time computations, and low complexity. Finally, the proposed approach is compared with the conventional solution of intrusion detection.