 Performance Analysis


Per-instance Differential Privacy and the Adaptivity of Posterior Sampling in Linear and Ridge regression

arXiv.org Machine Learning

Differential privacy (DP), ever since its advent, has been a controversial object. On the one hand, it provides strong provable protection of individuals in a data set; on the other hand, it has been heavily criticized for being impractical, partially due to its complete independence of the actual data set it tries to protect. In this paper, we address this issue with a new and more fine-grained notion of differential privacy: per-instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. We show that this is a strict generalization of standard DP and inherits all its desirable properties, e.g., composition, invariance to side information and closure under postprocessing, except that they all hold for every instance separately. When the data is drawn from a distribution, we show that per-instance DP implies generalization. Moreover, we provide explicit calculations of the per-instance DP for output perturbation on a class of smooth learning problems. The result reveals an interesting and intuitive fact: an individual has stronger privacy if he/she has a small "leverage score" with respect to the data set and if he/she can be predicted more accurately using the leave-one-out data set. Using the developed techniques, we provide a novel analysis of the One-Posterior-Sample (OPS) estimator and show that when the data set is well-conditioned it provides $(\epsilon,\delta)$-pDP for any target individual and matches the exact lower bound up to a $1+\tilde{O}(n^{-1}\epsilon^{-2})$ multiplicative factor. We also propose AdaOPS, which uses adaptive regularization to achieve the same results with $(\epsilon,\delta)$-DP. Simulation shows a privacy and utility trade-off that is several orders of magnitude more favorable when we consider the privacy of only the users in the data set.
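
For reference, the per-instance notion described in this abstract can be written roughly as follows. This is a sketch based on the abstract's wording only; the notation is an assumption chosen here ($\mathcal{A}$ for the randomized algorithm, $Z$ for the fixed data set, $z$ for the target individual, $[Z, z]$ for the data set with $z$ added) and may differ from the paper's.

```latex
\Pr[\mathcal{A}(Z) \in S] \le e^{\epsilon} \Pr[\mathcal{A}([Z, z]) \in S] + \delta,
\qquad
\Pr[\mathcal{A}([Z, z]) \in S] \le e^{\epsilon} \Pr[\mathcal{A}(Z) \in S] + \delta
\quad \text{for all measurable sets } S.
```

Under this reading, standard $(\epsilon,\delta)$-DP corresponds to requiring both inequalities for every pair $(Z, z)$, whereas pDP fixes one pair and lets $\epsilon$ depend on it.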


Deep Fruit Detection in Orchards

arXiv.org Artificial Intelligence

Abstract: An accurate and reliable image-based fruit detection system is critical for supporting higher-level agriculture tasks such as yield mapping and robotic harvesting. This paper presents the use of a state-of-the-art object detection framework, Faster R-CNN, in the context of fruit detection in orchards, including mangoes, almonds and apples. Ablation studies are presented to better understand the practical deployment of the detection network, including how much training data is required to capture variability in the dataset. Data augmentation techniques are shown to yield significant performance gains, resulting in a greater than twofold reduction in the number of training images required. In contrast, transferring knowledge between orchards contributed negligible performance gain over initialising the Deep Convolutional Neural Network directly from ImageNet features. Finally, to operate over orchard data containing between 100 and 1000 fruit per image, a tiling approach is introduced for the Faster R-CNN framework. The study has resulted in the best detection performance to date for these orchards relative to previous works, with an F1-score of 0.9 achieved for apples and mangoes.

I. INTRODUCTION

Vision-based fruit detection is a critical component for in-field automation in agriculture. With accurate knowledge of individual fruit locations in the field, it is possible to perform yield estimation and mapping, which is important for growers as it facilitates efficient utilisation of resources and improves returns per unit area and time. Precise localisation of the fruit is also a necessary component of an automated robotic harvesting system, which can help mitigate one of the most labour-intensive tasks in an orchard [1].
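
The tiling idea mentioned in the abstract can be illustrated with a minimal sketch. The excerpt does not give the paper's tile size, overlap, or merging rule, so the names `make_tiles` and `to_image_coords` and the default values below are hypothetical; the code only shows how a large image might be cut into overlapping tiles and how a detection box could be mapped back to full-image coordinates.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the far edge
    return starts


def make_tiles(width: int, height: int, tile: int = 500, overlap: int = 50) -> List[Box]:
    """Return tile boxes covering a width x height image (assumed parameters)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


def to_image_coords(box: Box, tile_box: Box) -> Box:
    """Shift a detection from tile coordinates to full-image coordinates."""
    ox, oy = tile_box[0], tile_box[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)
```

A detector would be run on each tile separately; detections that fall in overlapping regions would still need to be merged (for example by non-maximum suppression), which this sketch leaves out.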


Canelo Alvarez vs. Gennady Golovkin: Start Time, PPV Cost, TV Info

International Business Times

The fight that's been two years in the making and promises to be the best boxing match of 2017 is almost here. Canelo Alvarez and Gennady Golovkin will go head-to-head Saturday night at T-Mobile Arena in Las Vegas with multiple middleweight belts on the line. It won't sell as many pay-per-views as the Aug. 26 bout between Floyd Mayweather and Conor McGregor, though it'd be shocking if it didn't rank second on the year in terms of buys. The PPV starts at 8 p.m. EDT, and watching the fight on TV will cost fans $79.99. Three undercard fights will precede the main event between Alvarez and Golovkin.


Road Friction Estimation for Connected Vehicles using Supervised Machine Learning

arXiv.org Machine Learning

Connected vehicle technology is foreseen to play an important role in reducing the number of traffic accidents while being one of the main enabling components for autonomous driving. One application of such connectivity is to provide accurate information about road conditions, such as friction level, to drivers or to the intelligent systems controlling the car. Road surface friction can be defined as the grip between the car tyre and the underlying surface. During winter, when the temperature decreases dramatically, the friction level drops substantially, which can increase the risk of car accidents. Studies indicate that road conditions such as surface temperature, type of road, and structure of the road sides play an important role in the measured friction level, and some of these conditions can vary significantly within short distances under specific weather situations. Road friction prediction based on past sensor measurements available in the cars, e.g., temperature and sunlight, has the advantage of being independent of the road structure and surrounding infrastructure. Intelligent forecast systems rely on the availability of high-quality data in order to allow their multiple actors to make correct decisions in diverse traffic situations. These systems have the potential to increase the safety of road users by means of the timely sharing of road-related information. With the advances in car-to-car communication technology, Volvo cars today are equipped with a slippery road condition warning system to improve road safety and traffic flow.


Python Tutorial for Beginners: Learn in 3 Days

#artificialintelligence

In the syntax below, we are asking Python to import the numpy and pandas packages. The 'as' keyword is used to alias a package name.
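
The tutorial's own code is not part of this excerpt. A minimal sketch of the kind of import statement being described, assuming the conventional aliases `np` and `pd` and that both packages are installed, might look like this:

```python
# "as" binds the imported package to a shorter alias.
import numpy as np
import pandas as pd

# The aliases are then used in place of the full package names.
values = np.array([1, 2, 3])
frame = pd.DataFrame({"x": values})
```

The alias names are a community convention rather than a requirement; any valid identifier would work.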


Understanding Boosted Trees Models

#artificialintelligence

In the previous post, we learned about tree-based learning methods: the basics of tree-based models and the use of bagging to reduce variance. We also looked at one of the most famous learning algorithms based on the idea of bagging: random forests. In this post, we will look into the details of yet another type of tree-based learning algorithm: boosted trees. Boosting, similar to bagging, is a general class of learning algorithms in which a set of weak learners is combined to obtain a strong learner. For classification problems, a weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification. Recall that bagging involves creating multiple copies of the original training data set via bootstrapping, fitting a separate decision tree to each copy, and then combining all of the trees in order to create a single predictive model.
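
The bagging recipe recalled above (bootstrap copies, one model per copy, combined prediction) can be sketched in a few lines. This is an illustrative toy, not code from the post: it uses one-feature decision stumps instead of full decision trees, combines them by majority vote, and the names `fit_stump` and `bagged_stumps` as well as the training data are made up for the example.

```python
import random
from collections import Counter


def fit_stump(data):
    """Fit a one-feature threshold classifier (decision stump).

    data is a list of (x, label) pairs with label in {0, 1}. Every observed
    x is tried as a threshold and the rule with the fewest errors is kept.
    """
    best = None
    for threshold in sorted({x for x, _ in data}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in data
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def bagged_stumps(data, n_models=25, seed=0):
    """Train stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap copy: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy training set: small x values are class 0, large x values are class 1.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
predict = bagged_stumps(train)
```

Boosting differs in that the weak learners are fitted one after another, each depending on the learners fitted before it, rather than independently on bootstrap copies.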


Towards personalized human AI interaction - adapting the behavior of AI agents using neural signatures of subjective interest

arXiv.org Machine Learning

The use of Artificial Neural Networks (ANNs) towards developing Artificial Intelligence (AI) has undergone a renaissance in the past decade. Out of the many emergent techniques for training ANNs that are collectively referred to as 'Deep Learning', Deep Reinforcement Learning (DRL) is proving to be a particularly general and powerful method, with applications ranging from video games [1] to autonomous driving [2]. While most applications of reinforcement learning have traditionally used reinforcement signals derived from performance measures that are explicit to the task - e.g. the score in a game or grammatical errors in a translation, when considering AI systems that are required to have a significant interaction with humans - e.g. the autonomous vehicle - it is critical to consider how the human's preference for objects, events, or actions can be incorporated into the behavioral reinforcement for the AI, particularly in ways that are minimally obtrusive [3], [4]. Such behavioral adaptations occur naturally during social interactions and form the bedrock of social mechanisms that build trust and rapport between strangers [5], [6]. In this paper, we present a novel approach that uses decoded human neurophysiological and ocular time-series data as an implicit reinforcement signal for an AI agent that is driving a virtual automobile.


Network cross-validation by edge sampling

arXiv.org Machine Learning

Statistical methods for network data have received a lot of attention because of the wide-ranging applications of network analysis. There is now a large body of work on methods and models for networks, including the stochastic block model (SBM) [Holland et al., 1983], the degree-corrected stochastic block model (DCSBM) [Karrer and Newman, 2011], and the latent space model [Hoff et al., 2002], to name a few. While this gives the practitioner plenty of choices, there is a lot less work on the crucial question of how to select the best model for the data, as well as how to choose tuning parameters for the selected model, which is often necessary in order to fit it. In some specific problems, progress has been made recently, for instance, in the much-studied problem of community detection. Community detection is the problem of clustering network nodes into groups, and most of the methods proposed over the last twenty years or so require the number of communities K as input.
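
To make the stochastic block model concrete, here is a minimal sketch of how a graph with K communities could be sampled from one. It is not taken from the paper; the function name `sample_sbm` and its parameters are hypothetical, and it covers only the plain SBM, where the probability of an edge depends solely on the communities of its two endpoints.

```python
import random


def sample_sbm(sizes, probs, seed=0):
    """Sample an undirected graph from a stochastic block model.

    sizes[k]    -- number of nodes in community k (K = len(sizes))
    probs[a][b] -- probability of an edge between a node in community a
                   and a node in community b (assumed symmetric)
    Returns (labels, edges): labels[i] is the community of node i and
    edges is a list of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)
    labels = [k for k, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Each node pair is connected independently of all others.
            if rng.random() < probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges
```

Community detection is the reverse task: given only `edges`, recover `labels`, which for most methods requires supplying K, the number of communities.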


AI in AML: Present tensed, but future perfect

#artificialintelligence

Today, Financial Institutions (FIs) face significant legal and reputational risks when it comes to complying with anti-money laundering (AML) requirements (including anti-terrorist financing and obligations to conform). Failure can lead to serious sanctions imposed by regulatory bodies (recently, Societe Generale was fined $5.83 MM for a number of shortcomings in its controls for preventing money laundering). Today's financial markets are truly global. Transactions and flows of funds take place through a web of interactions across nations and systems. This makes it difficult to be compliant with thousands of regulations and norms across a large number of jurisdictions.


Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information

arXiv.org Machine Learning

We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weigh the message passing to reduce the misclassification error.