Mathematical & Statistical Methods
Computing Optimal Monitoring Strategy for Detecting Terrorist Plots
Wang, Zhen (Nanyang Technological University) | Yin, Yue (University of Chinese Academy of Sciences) | An, Bo (Nanyang Technological University)
In recent years, terrorist organizations (e.g., ISIS or al-Qaeda) have increasingly directed terrorists to launch coordinated attacks in their home countries. One example is the Paris shootings on January 7, 2015. By monitoring potential terrorists, security agencies are able to detect and stop terrorist plots at their planning stage. Although security agencies may have knowledge about potential terrorists (e.g., who they are, how they interact), they usually have limited resources and cannot monitor all terrorists. Moreover, a terrorist planner may strategically choose which terrorists to arouse, taking the security agency's monitoring strategy into account. This paper makes five key contributions toward the challenging problem of computing optimal monitoring strategies: 1) A new Stackelberg game model for terrorist plot detection; 2) A modified double oracle framework for computing the optimal strategy effectively; 3) Complexity results for both defender and attacker oracle problems; 4) Novel mixed-integer linear programming (MILP) formulations for best response problems of both players; and 5) Effective approximation algorithms for generating suboptimal responses for both players. Experimental evaluation shows that our approach obtains a sufficiently robust solution that significantly outperforms widely used centrality-based heuristics and scales up to realistic-sized problems.
The Complexity Landscape of Decompositional Parameters for ILP
Ganian, Robert (Technische Universität Wien) | Ordyniak, Sebastian (Technische Universität Wien)
Integer Linear Programming (ILP) can be seen as the archetypical problem for NP-complete optimization problems, and a wide range of problems in artificial intelligence are solved in practice via a translation to ILP. Despite its huge range of applications, only few tractable fragments of ILP are known, probably the most prominent of which is based on the notion of total unimodularity. Using entirely different techniques, we identify new tractable fragments of ILP by studying structural parameterizations of the constraint matrix within the framework of parameterized complexity. In particular, we show that ILP is fixed-parameter tractable when parameterized by the treedepth of the constraint matrix and the maximum absolute value of any coefficient occurring in the ILP instance. Together with matching hardness results for the more general parameter treewidth, we draw a detailed complexity landscape of ILP w.r.t. decompositional parameters defined on the constraint matrix.
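As a small illustration of the total unimodularity notion mentioned above (not of the treedepth-based algorithm in the paper), the following sketch checks by brute force whether every square submatrix of an integer matrix has determinant -1, 0, or 1. The function names `det_int` and `is_totally_unimodular` and the two example matrices are assumptions made for this example; the enumeration is exponential and only meant for tiny matrices.

```python
from itertools import combinations


def det_int(m):
    """Exact integer determinant by Laplace expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det_int(minor)
    return total


def is_totally_unimodular(a):
    """Brute force: every square submatrix must have determinant -1, 0, or 1."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det_int(sub) not in (-1, 0, 1):
                    return False
    return True


# Hypothetical examples: an interval (consecutive-ones) matrix, which is
# totally unimodular, and the edge-vertex incidence matrix of a triangle,
# which is not (its determinant is 2).
interval = [[1, 1, 0],
            [0, 1, 1],
            [0, 0, 1]]
triangle = [[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1]]

print(is_totally_unimodular(interval))
print(is_totally_unimodular(triangle))
```

When the constraint matrix is totally unimodular and the right-hand side is integral, the vertices of the linear relaxation are integral, which is why this fragment can be solved with ordinary linear programming.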
RELOOP: A Python-Embedded Declarative Language for Relational Optimization
Mladenov, Martin (TU Dortmund University) | Heinrich, Danny (TU Dortmund University) | Kleinhans, Leonard (TU Dortmund University) | Gonsior, Felix (TU Dortmund University) | Kersting, Kristian (TU Dortmund University)
We present RELOOP, a domain-specific language for relational optimization embedded in Python. It allows the user to express relational optimization problems in a natural syntax that follows logic and linear algebra, rather than in the restrictive standard form required by solvers, and can automatically compile the model to a lower-order but equivalent model. Moreover, RELOOP makes it easy to combine relational optimization with high-level features of Python such as loops, parallelism and interfaces to relational databases.
What linear algebra is good for machine learning? • /r/MachineLearning
I'm definitely interested in this question too, as someone who is self-taught about machine learning and skipped over a lot of the mathematical theory. I know that certain areas are very important, such as matrix factorisation methods which are huge. And algorithms such as neural networks rely heavily on lots of aspects of linear algebra (see https://www.utdallas.edu/ The kernel trick in SVM is also founded in linear algebra (the dot product). It'd be interesting to hear an expert elaborate on this.
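To make the remark about the kernel trick concrete, here is a minimal sketch, assuming a homogeneous degree-2 polynomial kernel on 2-D inputs (a choice made for this example, not something stated in the thread). It shows that the kernel value equals an ordinary dot product in an explicit feature space; the names `poly2_kernel` and `poly2_features` are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly2_kernel(x, y)                            # computed in input space
k_explicit = dot(poly2_features(x), poly2_features(y))     # computed in feature space
print(k_implicit, k_explicit)
```

The two values agree up to floating-point rounding, so an SVM can work with the kernel alone and never build the feature vectors explicitly.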
Some Insights About the Small Ball Probability Factorization for Hilbert Random Elements
Asymptotic factorizations for the small-ball probability (SmBP) of a Hilbert valued random element $X$ are rigorously established and discussed. In particular, given the first $d$ principal components (PCs) and as the radius $\varepsilon$ of the ball tends to zero, the SmBP is asymptotically proportional to (a) the joint density of the first $d$ PCs, (b) the volume of the $d$-dimensional ball with radius $\varepsilon$, and (c) a correction factor weighting the use of a truncated version of the process expansion. Moreover, under suitable assumptions on the spectrum of the covariance operator of $X$ and as $d$ diverges to infinity when $\varepsilon$ vanishes, some simplifications occur. In particular, the SmBP factorizes asymptotically as the product of the joint density of the first $d$ PCs and a pure volume parameter. All the provided factorizations allow to define a surrogate intensity of the SmBP that, in some cases, leads to a genuine intensity. To operationalize the stated results, a non-parametric estimator for the surrogate intensity is introduced and it is proved that the use of estimated PCs, instead of the true ones, does not affect the rate of convergence. Finally, as an illustration, simulations in controlled frameworks are provided.
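The first factorization described above can be written schematically as follows. The symbols $f_d$, $V_d$ and $\mathcal{R}$, and the choice of centring the ball at a point $x$ with PC scores $x_1, \dots, x_d$, are notational assumptions made here for illustration and need not match the paper's notation; only the closed form for the volume of a $d$-dimensional ball is a standard fact.

```latex
% Schematic form of the factorization (notation assumed, not taken from the paper)
\[
  \mathbb{P}\bigl(\lVert X - x \rVert < \varepsilon\bigr)
  \;\sim\;
  f_d(x_1,\dots,x_d)\; V_d(\varepsilon)\; \mathcal{R}(x,\varepsilon,d),
  \qquad \varepsilon \to 0,
\]
% f_d : joint density of the first d principal components
% V_d : volume of the d-dimensional ball of radius epsilon
% R   : correction factor accounting for the truncated expansion
\[
  V_d(\varepsilon) \;=\; \frac{\pi^{d/2}}{\Gamma\!\left(\tfrac{d}{2}+1\right)}\,\varepsilon^{d}.
\]
```

In the second regime described in the abstract, where $d$ diverges as $\varepsilon$ vanishes, the statement is that the correction factor and the volume term combine into a single pure volume parameter multiplying $f_d$.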
Three interesting but little known programming languages
Julia is a high-level dynamic programming language designed to address the requirements of high-performance numerical and scientific computing while also being effective for general purpose programming.[1][2][3][4] Unusual aspects of Julia's design include having a type system with parametric types in a fully dynamic programming language and adopting multiple dispatch as its core programming paradigm. It allows for parallel and distributed computing and for direct calling of C and Fortran libraries without a compiler or glue code, and it includes best-of-breed libraries for floating-point, linear algebra, random number generation, fast Fourier transforms, and regular expression matching. Julia's core is implemented in C and C++, its parser in Scheme, and the LLVM compiler framework is used for just-in-time generation of machine code. The standard library is implemented in Julia itself, using the Node.js's The most notable aspect of Julia's implementation is its speed, which is often within a factor of two of fully optimized C code.[5] Development of Julia began in 2009 and an open-source version was publicized in February 2012.[6][7]
Computer algebra system - Wikipedia, the free encyclopedia
A computer algebra system (CAS) is a software program that allows computation over mathematical expressions in a way which is similar to the traditional manual computations of mathematicians and scientists. The development of computer algebra systems in the second half of the 20th century is part of the discipline of "computer algebra" or "symbolic computation", which has spurred work in algorithms over mathematical objects such as polynomials. Computer algebra systems may be divided into two classes: the specialized ones and the general purpose ones. The specialized ones are devoted to a specific part of mathematics, such as number theory, group theory, or teaching of elementary mathematics. General purpose computer algebra systems aim to be useful to a user working in any scientific field that requires manipulation of mathematical expressions. The library must cover not only the needs of the users, but also the needs of the simplifier.
Biostatistics Careers for Data Scientists
Analytics is becoming critical in all parts of our lives. Biostatistics has been a big driver of this analytics demand in the fields of pharmaceuticals, biotech, health & medicine. Biostatistics (or biometry) is the application of statistics to a wide range of topics in biology. The science of biostatistics encompasses the design of biological experiments, especially in medicine, pharmacy, agriculture and fishery; the collection, summarization, and analysis of data from those experiments; and the interpretation of, and inference from, the results. A major branch of this is medical biostatistics,[1] which is exclusively concerned with medicine and health.
Three myths about data scientists and big data
What I found useful during my PhD (this could apply to a master's program too) is that I immediately started to work for a company on GIS, digital cartography, and water management (predicting extreme floods locally - how much the water could rise, at worst in 100 years, at any (x,y) coordinate on a digital map, modeling how any drop of water falling somewhere runs down, goes underground, eventually reaches low elevation and merges with other water drops on the way down - the digital maps had elevation and land use data available for each pixel; by land use I mean crop, forest, water, rock and so on, as this is important to model how water moves). Very applied and interesting stuff. My first paper (after an article about flood predictions, in a local specialized journal) was in Journal of Number Theory, though I never attended classes on number theory. I then started to publish in computational statistics journals, but also in IEEE Pattern Analysis and Machine Intelligence, and Journal of the Royal Statistical Society, series B. I'm currently finishing a book on data science (Wiley, exp. The takeaway from this is that it helps to become polyvalent, if the PhD/Master student can do applied work for a real company, hired and paid as a real employee (partnership between university and private sector), at the beginning of the program. In my case, it was a small R&D company (20 people), so I had the chance to be exposed to many things, not least learning how to write good code used by a team, for real apps (for instance merging hundreds of small images to produce a big map, rotating and filtering images taken by a plane, making sure roads were not broken when moving from one image to another, and putting the whole thing into some kind of hierarchical database so that the end user querying the database could retrieve and display any portion of the map very fast, including adjacent parts).