Mathematical & Statistical Methods
Enhancing Observability in Distribution Grids using Smart Meter Data
Bhela, Siddharth, Kekatos, Vassilis, Veeramachaneni, Sriharsha
Due to limited metering infrastructure, distribution grids are currently challenged by observability issues. On the other hand, smart meter data, including local voltage magnitudes and power injections, are communicated to the utility operator from grid buses with renewable generation and demand-response programs. This work employs grid data from metered buses towards inferring the underlying grid state. To this end, a coupled formulation of the power flow problem (CPF) is put forth. Exploiting the high variability of injections at metered buses, the controllability of solar inverters, and the relative time-invariance of conventional loads, the idea is to solve the non-linear power flow equations jointly over consecutive time instants. An intuitive and easily verifiable rule pertaining to the locations of metered and non-metered buses on the physical grid is shown to be a necessary and sufficient criterion for local observability in radial networks. To account for noisy smart meter readings, a coupled power system state estimation (CPSSE) problem is further developed. Both CPF and CPSSE tasks are tackled via augmented semi-definite program relaxations. The observability criterion along with the CPF and CPSSE solvers are numerically corroborated using synthetic and actual solar generation and load data on the IEEE 34-bus benchmark feeder.
Introduction to Number Theory: Fascinating Facts and Conjectures about Primes and Other Special Numbers
I discuss here off-the-beaten-path beautiful, even spectacular results from number theory: not just about prime numbers, but also about related problems such as integers that are sum of two squares. The connection between these numbers and prime numbers will appear later in this article. A few important unsolved mathematical conjectures are presented in a unified approach, and some new research material is also introduced, especially an attempt at generalizing and unifying concepts related to data set density and limiting distributions. The approach is very applied, focusing on algorithms, simulations, and big data, to help discover fascinating results. Even though some of the most exciting topics of mathematics are discussed here (including fundamental, century-old problems still unresolved as well as brand new hypotheses), most of the article can be understood by the layman. Among other things, you will learn some new ways to estimate Pi based on non-traditional experiments, or how a conjecture for prime numbers somehow generalizes to apply to Fibonacci numbers as well.
Safety Verification and Control for Collision Avoidance at Road Intersections
Ahn, Heejin, Del Vecchio, Domitilla
This paper presents the design of a supervisory algorithm that monitors safety at road intersections and overrides drivers with a safe input when necessary. The design of the supervisor consists of two parts: safety verification and control design. Safety verification is the problem to determine if vehicles will be able to cross the intersection without colliding with current drivers' inputs. We translate this safety verification problem into a jobshop scheduling problem, which minimizes the maximum lateness and evaluates if the optimal cost is zero. The zero optimal cost corresponds to the case in which all vehicles can cross each conflict area without collisions. Computing the optimal cost requires solving a Mixed Integer Nonlinear Programming (MINLP) problem due to the nonlinear second-order dynamics of the vehicles. We therefore estimate this optimal cost by formulating two related Mixed Integer Linear Programming (MILP) problems that assume simpler vehicle dynamics. We prove that these two MILP problems yield lower and upper bounds of the optimal cost. We also quantify the worst case approximation errors of these MILP problems. We design the supervisor to override the vehicles with a safe control input if the MILP problem that computes the upper bound yields a positive optimal cost. We theoretically demonstrate that the supervisor keeps the intersection safe and is non-blocking. Computer simulations further validate that the algorithms can run in real time for problems of realistic size.
Facebook's advice to students interested in artificial intelligence
That's the gist of the advice to students interested in AI from Facebook's Yann LeCun and Joaquin Quiรฑonero Candela who run the company's Artificial Intelligence Lab and Applied Machine Learning group respectively. Tech companies often advocate STEM (science, technology, engineering and math), but today's tips are particularly pointed. The pair specifically note that students should eat their vegetables take Calc I, Calc II, Calc III, Linear Algebra, Probability and Statistics as early as possible. From this list, probability and statistics are perhaps the most interesting. From what I remember about high-school, those two subjects are regularly dismissed as too-obvious strategies for skirting the informal AP Calculus preference of top colleges and universities (AP Statistics is often thought of as a cop-out by students).
Facebook's advice to students interested in artificial intelligence
That's the gist of the advice to students interested in AI from Facebook's Yann LeCun and Joaquin Quiรฑonero Candela who run the company's Artificial Intelligence Lab and Applied Machine Learning group respectively. Tech companies often advocate STEM (science, technology, engineering and math), but today's tips are particularly pointed. The pair specifically note that students should eat their vegetables take Calc I, Calc II, Calc III, Linear Algebra, Probability and Statistics as early as possible. From this list, probability and statistics are perhaps the most interesting. From what I remember about high-school, those two subjects are regularly dismissed as too-obvious strategies for skirting the informal AP Calculus preference of top colleges and universities (AP Statistics is often thought of as a cop-out by students).
16 analytic disciplines compared to data science
What are the differences between data science, data mining, machine learning, statistics, operations research, and so on? Here I compare several analytic disciplines that overlap, to explain the differences and common denominators. Sometimes differences exist for nothing else other than historical reasons. Sometimes the differences are real and subtle. I also provide typical job titles, types of analyses, and industries traditionally attached to each discipline. Underlined domains are main sub-domains. It would be great if someone can add an historical perspective to my article. First, let's start by describing data science, the new discipline. Job titles include data scientist, chief scientist, senior analyst, director of analytics and many more.
Think Stats: Probability and Statistics for Programmers
Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets. If you have basic skills in Python, you can use them to learn concepts in probability and statistics. Think Stats is based on a Python library for probability distributions (PMFs and CDFs).
Chi-squared Amplification: Identifying Hidden Hubs
Kannan, Ravi, Vempala, Santosh
We consider the following general hidden hubs model: an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs): entries in rows outside $S$ are generated from the probability distribution $p_0 \sim N(0,\sigma_0^2)$; for each row in $S$, some $k$ of its entries are generated from $p_1 \sim N(0,\sigma_1^2)$, $\sigma_1>\sigma_0$, and the rest of the entries from $p_0$. The problem is to identify the high-degree hubs efficiently. This model includes and significantly generalizes the planted Gaussian Submatrix Model, where the special entries are all in a $k \times k$ submatrix. There are two well-known barriers: if $k\geq c\sqrt{n\ln n}$, just the row sums are sufficient to find $S$ in the general model. For the submatrix problem, this can be improved by a $\sqrt{\ln n}$ factor to $k \ge c\sqrt{n}$ by spectral methods or combinatorial methods. In the variant with $p_0=\pm 1$ (with probability $1/2$ each) and $p_1\equiv 1$, neither barrier has been broken. We give a polynomial-time algorithm to identify all the hidden hubs with high probability for $k \ge n^{0.5-\delta}$ for some $\delta >0$, when $\sigma_1^2>2\sigma_0^2$. The algorithm extends to the setting where planted entries might have different variances each at least as large as $\sigma_1^2$. We also show a nearly matching lower bound: for $\sigma_1^2 \le 2\sigma_0^2$, there is no polynomial-time Statistical Query algorithm for distinguishing between a matrix whose entries are all from $N(0,\sigma_0^2)$ and a matrix with $k=n^{0.5-\delta}$ hidden hubs for any $\delta >0$. The lower bound as well as the algorithm are related to whether the chi-squared distance of the two distributions diverges. At the critical value $\sigma_1^2=2\sigma_0^2$, we show that the general hidden hubs problem can be solved for $k\geq c\sqrt n(\ln n)^{1/4}$, improving on the naive row sum-based method.
Multilinear Low-Rank Tensors on Graphs & Applications
Shahid, Nauman, Grassi, Francesco, Vandergheynst, Pierre
We propose a new framework for the analysis of low-rank tensors which lies at the intersection of spectral graph theory and signal processing. As a first step, we present a new graph based low-rank decomposition which approximates the classical low-rank SVD for matrices and multi-linear SVD for tensors. Then, building on this novel decomposition we construct a general class of convex optimization problems for approximately solving low-rank tensor inverse problems, such as tensor Robust PCA. The whole framework is named as 'Multilinear Low-rank tensors on Graphs (MLRTG)'. Our theoretical analysis shows: 1) MLRTG stands on the notion of approximate stationarity of multi-dimensional signals on graphs and 2) the approximation error depends on the eigen gaps of the graphs. We demonstrate applications for a wide variety of 4 artificial and 12 real tensor datasets, such as EEG, FMRI, BCI, surveillance videos and hyperspectral images. Generalization of the tensor concepts to non-euclidean domain, orders of magnitude speed-up, low-memory requirement and significantly enhanced performance at low SNR are the key aspects of our framework.
Coherent structure coloring: identification of coherent structures from sparse data using graph theory
Schlueter-Kuck, Kristy L., Dabiri, John O.
We present a frame-invariant method for detecting coherent structures from Lagrangian flow trajectories that can be sparse in number, as is the case in many fluid mechanics applications of practical interest. The method, based on principles used in graph coloring and spectral graph drawing algorithms, examines a measure of the kinematic dissimilarity of all pairs of fluid trajectories, either measured experimentally, e.g. using particle tracking velocimetry; or numerically, by advecting fluid particles in the Eulerian velocity field. Coherence is assigned to groups of particles whose kinematics remain similar throughout the time interval for which trajectory data is available, regardless of their physical proximity to one another. Through the use of several analytical and experimental validation cases, this algorithm is shown to robustly detect coherent structures using significantly less flow data than is required by existing spectral graph theory methods.