Goto

Collaborating Authors

 Mathematical & Statistical Methods


Brushing Up Linear Algebra-Vectors โ€“ Srishti Sawla โ€“ Medium

#artificialintelligence

This is my first blog post of my Blogging Series "Brushing Up Linear Algebra" Most of us try to learn Mathematics and its applications but end up just learning the numerical computations.Mathematics is more than numerical computations.The Magic of Mathematics begins when we understand the geometrical interpretations of the numerical computations.Linear Algebra is not just about Matrices,Determinants and Vectors its about Understanding. This post is about Vectors. Almost each of us would have been come across vectors whether we study Mathematics,Physics or Computer Science. Lets try to understand Vectors from each of the perspectives-Mathematics,Physics and Computer Science and build an intuition about Vectors. Physics:Vectors are arrows pointing in space.What defines a vector is its length(magnitude) and direction.As shown in image the tail can also be called has Initial point and the head as Terminal point Computer Science: Vectors are ordered list of numbers or we can say another name for list.You would be more familiar with this definition if you have studied R. Lets us say we are defining a car on the basis of its kmph and its Price.Then we can represent the car using a vector say [35kmph, Rs10,00,000] .


Rao-Blackwellized Stochastic Gradients for Discrete Distributions

arXiv.org Machine Learning

We wish to compute the gradient of an expectation over a finite or countably infinite sample space having $K \leq \infty$ categories. When $K$ is indeed infinite, or finite but very large, the relevant summation is intractable. Accordingly, various stochastic gradient estimators have been proposed. In this paper, we describe a technique that can be applied to reduce the variance of any such estimator, without changing its bias---in particular, unbiasedness is retained. We show that our technique is an instance of Rao-Blackwellization, and we demonstrate the improvement it yields in empirical studies on both synthetic and real-world data.


Basic Linear Algebra for Deep Learning โ€“ Towards Data Science

#artificialintelligence

Linear Algebra is a continuous form of mathematics and is applied throughout science and engineering because it allows you to model natural phenomena and to compute them efficiently. Because it is a form of continuous and not discrete mathematics, a lot of computer scientists don't have a lot of experience with it. Linear Algebra is also central to almost all areas of mathematics like geometry and functional analysis. Its concepts are a crucial prerequisite for understanding the theory behind Machine Learning, especially if you are working with Deep Learning Algorithms. You don't need to understand Linear Algebra before getting started with Machine Learning, but at some point, you may want to gain a better understanding of how the different Machine Learning algorithms really work under the hood.


Math Titans Clash Over Epic Proof of the ABC Conjecture

WIRED

In a report posted online last week, Peter Scholze of the University of Bonn and Jakob Stix of Goethe University Frankfurt describe what Stix calls a "serious, unfixable gap" within a mammoth series of papers by Shinichi Mochizuki, a mathematician at Kyoto University who is renowned for his brilliance. Posted online in 2012, Mochizuki's papers supposedly prove the abc conjecture, one of the most far-reaching problems in number theory. Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences. Despite multiple conferences dedicated to explicating Mochizuki's proof, number theorists have struggled to come to grips with its underlying ideas. His series of papers, which total more than 500 pages, are written in an impenetrable style, and refer back to a further 500 pages or so of previous work by Mochizuki, creating what one mathematician, Brian Conrad of Stanford University, has called "a sense of infinite regress."


Manifold Alignment with Feature Correspondence

arXiv.org Machine Learning

We propose a novel framework for combining datasets via alignment of their associated intrinsic dimensions. Our approach assumes that two datasets are sampled from a common latent space, i.e., they measure equivalent systems. Thus, we expect there to exist a natural (albeit unknown) alignment of the data manifolds associated with the intrinsic geometry of these datasets, which are perturbed by measurement artifacts in the sampling process. Importantly, we do not assume any individual correspondence (partial or complete) between data points. Instead, we rely on our assumption that a subset of data features have correspondence across datasets. We leverage this assumption to estimate relations between intrinsic manifold dimensions, which are given by diffusion map coordinates over each of the datasets. We compute a correlation matrix between diffusion coordinates of the datasets by considering graph (or manifold) Fourier coefficients of corresponding data features. We then orthogonalize this correlation matrix to form an isometric transformation between the diffusion maps of the datasets. Finally, we apply this transformation to the diffusion coordinates and construct a unified diffusion geometry of the datasets together. We show that this approach successfully corrects misalignment artifacts and enables data integration.


The Partially Observable Games We Play for Cyber Deception

arXiv.org Artificial Intelligence

Progressively intricate cyber infiltration mechanisms have made conventional means of defense, such as firewalls and malware detectors, incompetent. These sophisticated infiltration mechanisms can study the defender's behavior, identify security caveats, and modify their actions adaptively. To tackle these security challenges, cyber-infrastructures require active defense techniques that incorporate cyber deception, in which the defender (deceiver) implements a strategy to mislead the infiltrator. To this end, we use a two-player partially observable stochastic game (POSG) framework, wherein the deceiver has full observability over the states of the POSG, and the infiltrator has partial observability. Then, the deception problem is to compute a strategy for the deceiver that minimizes the expected cost of deception against all strategies of the infiltrator. We first show that the underlying problem is a robust mixed-integer linear program, which is intractable to solve in general. Towards a scalable approach, we compute optimal finite-memory strategies for the infiltrator by a reduction to a series of synthesis problems for parametric Markov decision processes. We use these infiltration strategies to find robust strategies for the deceiver using mixed-integer linear programming. We illustrate the performance of our technique on a POSG model for network security. Our experiments demonstrate that the proposed approach handles scenarios considerably larger than those of the state-of-the-art methods.


Envy-Free Classification

arXiv.org Machine Learning

In classic fair division problems such as cake cutting and rent division, envy-freeness requires that each individual (weakly) prefer his allocation to anyone else's. On a conceptual level, we argue that envy-freeness also provides a compelling notion of fairness for classification tasks. Our technical focus is the generalizability of envy-free classification, i.e., understanding whether a classifier that is envy free on a sample would be almost envy free with respect to the underlying distribution with high probability. Our main result establishes that a small sample is sufficient to achieve such guarantees, when the classifier in question is a mixture of deterministic classifiers that belong to a family of low Natarajan dimension.


Multi-hop assortativities for networks classification

arXiv.org Machine Learning

Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and node metadata, when is available. For example, in chemoinformatics one might want to detect whether a molecule is toxic based on structure and atomic types, or discover the research field for scientific collaboration networks. Existing techniques rely on counting or measuring structural patterns that are known to show large variations from network to network, such as number of triangles, or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, that captures the similarity of node situated at the extremities of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of fingerprints to characterize networks. These fingerprints allow in turn to recover the functionalities of a network, with the help of the machine learning toolbox. Our method is evaluated empirically on established social and chemoinformatic network benchmarks. Results reveal that our assortativity based features are competitive providing highly accurate results often outperforming state of the art methods for the network classification task


Choosing to Rank

arXiv.org Machine Learning

Ranking data arises in a wide variety of application areas, generated both by complex algorithms and by human subjects, but remains difficult to model, learn from, and predict. Particularly when generated by humans, ranking datasets often feature multiple modes, intransitive aggregate preferences, or incomplete rankings, but popular probabilistic models such as the Plackett-Luce and Mallows models are too rigid to capture such complexities. In this work, we frame ranking as a sequence of discrete choices and then leverage recent advances in discrete choice modeling to build flexible and tractable models of ranking data. The basic building block of our connection between ranking and choice is the idea of repeated selection, first used to build the Plackett-Luce ranking model from the multinomial logit (MNL) choice model by repeatedly applying the choice model to a dwindling set of alternatives. We derive conditions under which repeated selection can be applied to other choice models to build new ranking models, addressing specific subtleties with modeling mixed-length top-k rankings as repeated selection. We translate several choice axioms through our framework, providing structure to our ranking models inherited from the underlying choice models. To train models from data, we transform ranking data into choice data and employ standard techniques for training choice models. We find that our ranking models provide higher out-of-sample likelihood when compared to Plackett-Luce and Mallows models on a broad collection of ranking tasks including food preferences, ranked-choice elections, car racing, and search engine relevance ranking data.


Stochastically Controlled Stochastic Gradient for the Convex and Non-convex Composition problem

arXiv.org Machine Learning

In this paper, we consider the convex and non-convex composition problem with the structure $\frac{1}{n}\sum\nolimits_{i = 1}^n {{F_i}( {G( x )} )}$, where $G( x )=\frac{1}{n}\sum\nolimits_{j = 1}^n {{G_j}( x )} $ is the inner function, and $F_i(\cdot)$ is the outer function. We explore the variance reduction based method to solve the composition optimization. Due to the fact that when the number of inner function and outer function are large, it is not reasonable to estimate them directly, thus we apply the stochastically controlled stochastic gradient (SCSG) method to estimate the gradient of the composition function and the value of the inner function. The query complexity of our proposed method for the convex and non-convex problem is equal to or better than the current method for the composition problem. Furthermore, we also present the mini-batch version of the proposed method, which has the improved the query complexity with related to the size of the mini-batch.