Accuracy
Tracking as A Whole: Multi-Target Tracking by Modeling Group Behavior with Sequential Detection
Yuan, Yuan, Lu, Yuwei, Wang, Qi
Video-based vehicle detection and tracking is one of the most important components for Intelligent Transportation Systems (ITS). When it comes to road junctions, the problem becomes even more difficult due to the occlusions and complex interactions among vehicles. In order to get a precise detection and tracking result, in this work we propose a novel tracking-by-detection framework. In the detection stage, we present a sequential detection model to deal with serious occlusions. In the tracking stage, we model group behavior to treat complex interactions with overlaps and ambiguities. The main contributions of this paper are twofold: 1) Shape prior is exploited in the sequential detection model to tackle occlusions in crowded scene. 2) Traffic force is defined in the traffic scene to model group behavior, and it can assist to handle complex interactions among vehicles. We evaluate the proposed approach on real surveillance videos at road junctions and the performance has demonstrated the effectiveness of our method.
Achieving the Bayes Error Rate in Synchronization and Block Models by SDP, Robustly
We study the statistical performance of semidefinite programming (SDP) relaxations for clustering under random graph models. Under the $\mathbb{Z}_{2}$ Synchronization model, Censored Block Model and Stochastic Block Model, we show that SDP achieves an error rate of the form \[ \exp\Big[-\big(1-o(1)\big)\bar{n} I^* \Big]. \] Here $\bar{n}$ is an appropriate multiple of the number of nodes and $I^*$ is an information-theoretic measure of the signal-to-noise ratio. We provide matching lower bounds on the Bayes error for each model and therefore demonstrate that the SDP approach is Bayes optimal. As a corollary, our results imply that SDP achieves the optimal exact recovery threshold under each model. Furthermore, we show that SDP is robust: the above bound remains valid under semirandom versions of the models in which the observed graph is modified by a monotone adversary. Our proof is based on a novel primal-dual analysis of SDP under a unified framework for all three models, and the analysis shows that SDP tightly approximates a joint majority voting procedure.
FBI's use of facial recognition software is under fire AGAIN
The FBI has failed to appease concerns about the use of its facial recognition technology in criminal investigations. Multiple issues were raised three years ago after a congressional watchdog urged the bureau to improve its practices in order to meet privacy and accuracy standards. The FBI - and other US law enforcement agencies - have been using the Next Generation Identification-Interstate Photo System since 2015. It uses facial recognition software to link potential suspects to crimes from a vast database of 30 million pictures, including mugshots. The report slamming the FBI for its failure to moderate the software comes as the bureau increases its use of the technology.
Distributed generation of privacy preserving data with user customization
Chen, Xiao, Navidi, Thomas, Ermon, Stefano, Rajagopal, Ram
Distributed devices such as mobile phones can produce and store large amounts of data that can enhance machine learning models; however, this data may contain private information specific to the data owner that prevents the release of the data. We wish to reduce the correlation between user-specific private information and data while maintaining the useful information. Rather than learning a large model to achieve privatization from end to end, we introduce a decoupling of the creation of a latent representation and the privatization of data that allows user-specific privatization to occur in a distributed setting with limited computation and minimal disturbance on the utility of the data. We leverage a Variational Autoencoder (VAE) to create a compact latent representation of the data; however, the VAE remains fixed for all devices and all possible private labels. We then train a small generative filter to perturb the latent representation based on individual preferences regarding the private and utility information. The small filter is trained by utilizing a GAN-type robust optimization that can take place on a distributed device. We conduct experiments on three popular datasets: MNIST, UCI-Adult, and CelebA, and give a thorough evaluation including visualizing the geometry of the latent embeddings and estimating the empirical mutual information to show the effectiveness of our approach. The success of machine learning algorithms relies on not only technical methodologies, but also the availability of large datasets such as images (Krizhevsky et al., 2012; Oshri et al., 2018); however, data can often contain sensitive information, such as race or age, that may hinder the owner's ability to release the data to exploit its utility. We are interested in exploring methods of providing privatized data such that sensitive information cannot be easily inferred from the adversarial perspective, while preserving the utility of the dataset. In particular, we consider a setting where many participants are independently gathering data that will be collected by a third party.
Can Machine Learning Model with Static Features be Fooled: an Adversarial Machine Learning Approach
Taheri, Rahim, Javidan, Reza, Shojafar, Mohammad, P, Vinod, Conti, Mauro
Applied Intelligence manuscript No. (will be inserted by the editor) Abstract The widespread adoption of smartphones dramaticallygenerated by our attacks models when used to harden increases the risk of attacks and the spread the developed anti-malware system improves the detection of mobile malware, especially on the Android platform. Machine learning based solutions have been already Keywords Adversarial machine learning ยท malware used as a tool to supersede signature based anti-malware detection ยท poison attacks ยท adversarial example ยท systems. However, malware authors leverage attributes jacobian algorithm. Hence, to evaluate the vulnerability of machine 1 Introduction learning algorithms in malware detection, we propose five different attack scenarios to perturb malicious applications Nowadays using the Android application is very popular (apps). Every Android application inappropriately fits discriminant function on has a Jar-like APK format and is an archive file which the set of data points, eventually yielding a higher misclassification contains Android manifest and Classes.dex Further, to distinguish the adversarial manifest file holds information about the application examples from benign samples, we propose two defense structure and each part responsible for certain actions. To validate our For instance, the requested permissions must be accepted attacks and solutions, we test our model on three different by the users for successful installation of applications. We also test our methods The manifest file contains a list of hardware using various classifier algorithms and compare them components and permissions required by each application. Promising results show that generated the manifest file that are useful for running applications. Additionally, evasive variants is saved as the classes.dex In a nutshell, the by presenting some adversary-aware approaches?generated malware sample is statistically identical to a Do we require retraining of the current ML model to designbenign sample. To do so, adversaries adopt adversarial adversary-aware learning algorithms? How to properlymachine learning algorithms (AML) to design an example test and validate the countermeasure solutions inset called poison data which is used to fool machine a real-world network? The goal of this paper is to shedlearning models.
Machine learning for early prediction of circulatory failure in the intensive care unit
Hyland, Stephanie L., Faltys, Martin, Hรผser, Matthias, Lyu, Xinrui, Gumbsch, Thomas, Esteban, Cristรณbal, Bock, Christian, Horn, Max, Moor, Michael, Rieck, Bastian, Zimmermann, Marc, Bodenham, Dean, Borgwardt, Karsten, Rรคtsch, Gunnar, Merz, Tobias M.
Intensive care clinicians are presented with large quantities of patient information and measurements from a multitude of monitoring systems. The limited ability of humans to process such complex information hinders physicians to readily recognize and act on early signs of patient deterioration. We used machine learning to develop an early warning system for circulatory failure based on a high-resolution ICU database with 240 patient years of data. This automatic system predicts 90.0% of circulatory failure events (prevalence 3.1%), with 81.8% identified more than two hours in advance, resulting in an area under the receiver operating characteristic curve of 94.0% and area under the precision-recall curve of 63.0%. The model was externally validated in a large independent patient cohort.
Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data
Elkhalil, Khalil, Kammoun, Abla, Zhang, Xiangliang, Alouini, Mohamed-Slim, Al-Naffouri, Tareq
This paper carries out a large dimensional analysis of a variation of kernel ridge regression that we call \emph{centered kernel ridge regression} (CKRR), also known in the literature as kernel ridge regression with offset. This modified technique is obtained by accounting for the bias in the regression problem resulting in the old kernel ridge regression but with \emph{centered} kernels. The analysis is carried out under the assumption that the data is drawn from a Gaussian distribution and heavily relies on tools from random matrix theory (RMT). Under the regime in which the data dimension and the training size grow infinitely large with fixed ratio and under some mild assumptions controlling the data statistics, we show that both the empirical and the prediction risks converge to a deterministic quantities that describe in closed form fashion the performance of CKRR in terms of the data statistics and dimensions. Inspired by this theoretical result, we subsequently build a consistent estimator of the prediction risk based on the training data which allows to optimally tune the design parameters. A key insight of the proposed analysis is the fact that asymptotically a large class of kernels achieve the same minimum prediction risk. This insight is validated with both synthetic and real data.
Causal Discovery with General Non-Linear Relationships Using Non-Linear ICA
Monti, Ricardo Pio, Zhang, Kun, Hyvarinen, Aapo
We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied especially in the bivariate setting, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we propose a framework through which we can perform causal discovery in the presence of general non-linear relationships. The proposed method is based on recent progress in non-linear independent component analysis and exploits the non-stationarity of observations in order to recover the underlying sources or latent disturbances. We show rigorously that in the case of bivariate causal discovery, such non-linear ICA can be used to infer the causal direction via a series of independence tests. We further propose an alternative measure of causal direction based on asymptotic approximations to the likelihood ratio, as well as an extension to multivariate causal discovery. We demonstrate the capabilities of the proposed method via a series of simulation studies and conclude with an application to neuroimaging data.
Random Fragments Classification of Microbial Marker Clades with Multi-class SVM and N-Best Algorithm
Microbial clades modeling is a challenging problem in biology based on microarray genome sequences, especially in new species gene isolates discovery and category. Marker family genome sequences play important roles in describing specific microbial clades within species, a framework of support vector machine (SVM) based microbial species classification with N-best algorithm is constructed to classify the centroid marker genome fragments randomly generated from marker genome sequences on MetaRef. A time series feature extraction method is proposed by segmenting the centroid gene sequences and mapping into different dimensional spaces. Two ways of data splitting are investigated according to random splitting fragments along genome sequence (DI) , or separating genome sequences into two parts (DII).Two strategies of fragments recognition tasks, dimension-by-dimension and sequence--by--sequence, are investigated. The k-mer size selection, overlap of segmentation and effects of random split percents are also discussed. Experiments on 12390 maker genome sequences belonging to marker families of 17 species from MetaRef show that, both for DI and DII in dimension-by-dimension and sequence-by-sequence recognition, the recognition accuracy rates can achieve above 28\% in top-1 candidate, and above 91\% in top-10 candidate both on training and testing sets overall.
Playgol: learning programs through play
Children learn though play. We introduce the analogous idea of learning programs through play. In this approach, a program induction system (the learner) is given a set of tasks and initial background knowledge. Before solving the tasks, the learner enters an unsupervised playing stage where it creates its own tasks to solve, tries to solve them, and saves any solutions (programs) to the background knowledge. After the playing stage is finished, the learner enters the supervised building stage where it tries to solve the user-supplied tasks and can reuse solutions learnt whilst playing. The idea is that playing allows the learner to discover reusable general programs on its own which can then help solve the user-supplied tasks. We claim that playing can improve learning performance. We show that playing can reduce the textual complexity of target concepts which in turn reduces the sample complexity of a learner. We implement our idea in Playgol, a new inductive logic programming system. We experimentally test our claim on two domains: robot planning and real-world string transformations. Our experimental results suggest that playing can substantially improve learning performance. We think that the idea of playing (or, more verbosely, unsupervised bootstrapping for supervised program induction) is an important contribution to the problem of developing program induction approaches that self-discover BK.