Performance Analysis
Certifying Decision Trees Against Evasion Attacks by Program Analysis
Calzavara, Stefano, Ferrara, Pietro, Lucchese, Claudio
Machine learning has proved invaluable for a range of different tasks, yet it has also proved vulnerable to evasion attacks, i.e., maliciously crafted perturbations of input data designed to force mispredictions. In this paper we propose a novel technique to verify the security of decision tree models against evasion attacks with respect to an expressive threat model, where the attacker can be represented by an arbitrary imperative program. Our approach exploits the interpretability property of decision trees to transform them into imperative programs, which are amenable to traditional program analysis techniques. By leveraging the abstract interpretation framework, we are able to soundly verify the security guarantees of decision tree models trained over publicly available datasets. Our experiments show that our technique is both precise and efficient, yielding only a minimal number of false positives and scaling up to cases which are intractable for a competitor approach.
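The idea of treating a decision tree as an imperative program and analyzing it over abstract values can be illustrated with a small sketch. This is not the authors' tool: the tree, its thresholds, and the function names are hypothetical, and the attacker is assumed to be a simple bounded perturbation of each feature (an L-infinity ball) rather than an arbitrary imperative program as in the paper.

```python
from typing import Set, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) of a feature


def tree_predict(x0: float, x1: float) -> int:
    """A hypothetical two-feature decision tree written as plain if/else code."""
    if x0 <= 5.0:
        if x1 <= 2.0:
            return 0
        return 1
    return 1


def abstract_predict(x0: Interval, x1: Interval) -> Set[int]:
    """Run the same tree over intervals and collect every reachable label.

    A branch is followed whenever some value in the interval satisfies its
    test, so the result over-approximates what an attacker can reach.
    """
    labels: Set[int] = set()
    if x0[0] <= 5.0:          # some x0 in the interval takes the left branch
        if x1[0] <= 2.0:      # some x1 in the interval reaches leaf 0
            labels.add(0)
        if x1[1] > 2.0:       # some x1 in the interval reaches leaf 1
            labels.add(1)
    if x0[1] > 5.0:           # some x0 in the interval takes the right branch
        labels.add(1)
    return labels


def is_robust(x: Tuple[float, float], eps: float) -> bool:
    """True if no perturbation of size at most eps per feature changes the label."""
    box = [(v - eps, v + eps) for v in x]
    return abstract_predict(box[0], box[1]) == {tree_predict(*x)}


print(is_robust((1.0, 1.0), 0.5))  # far from every threshold
print(is_robust((1.0, 1.9), 0.5))  # x1 can be pushed across the 2.0 threshold
```

Because the analysis only ever adds labels that might be reachable, a verdict of "robust" is sound, while a verdict of "not robust" may in general be a false positive.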
Boost your model's performance with these fantastic libraries
Quality is determined by accuracy and completeness. Companies use machine learning models to make practical business decisions, and more accurate model outcomes result in better decisions. The cost of errors can be huge, but optimizing model accuracy mitigates that cost. Machine learning model accuracy is a measurement used to determine which model is best at identifying relationships and patterns between variables in a dataset based on the input, or training, data. The better a model can generalize to 'unseen' data, the better the predictions and insights it can produce, which in turn deliver more business value. The dataset I have chosen is the Breast Cancer Prediction dataset.
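As a minimal illustration of what accuracy on 'unseen' data means, the sketch below computes the fraction of correct predictions on a held-out set. The labels and predictions are made-up toy values, not results from the Breast Cancer dataset, and the function name is only illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy held-out labels and model predictions (hypothetical values).
held_out_labels = [1, 0, 1, 1, 0, 0, 1, 0]
held_out_preds = [1, 0, 0, 1, 0, 1, 1, 0]

score = accuracy(held_out_labels, held_out_preds)
print(f"held-out accuracy: {score:.2f}")
```

Measuring the score on examples the model never saw during training is what makes it an estimate of generalization rather than of memorization.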
Building Knowledge on the Customer Through Machine Learning
The cost of acquiring new customers is high, so companies are spending more on customer loyalty and retention. Identifying the total value generated by a customer over the entire customer life cycle would help companies in business campaigns and other activities. Customer Relationship Management (CRM) has therefore become a key element of modern marketing strategies. If we can predict a score that allows us to project quantifiable information onto a given population, the information system (IS) can use it to personalize the customer relationship. The KDD (Knowledge Discovery and Data Mining) Cup 2009 challenge consists of three tasks, predicting churn, appetency, and upselling, using data provided by the telecom company Orange.
Interesting AI/ML Articles You Should Read This Week (July 4)
Would you let a machine learning model that has a failure rate of 98% and a false positive rate of 81% into production? Well, these claimed performance figures are from a facial recognition system that is in use by the police force in South Wales and other parts of the United Kingdom. Dave Gershgorn's article starts with a description akin to the setting of a dystopian future where an overseeing governing system monitors everyone, a foreshadowing of a foreseeable future. South Wales Police have been using facial recognition systems since 2017 and have made no secret of it. They've made arrests as a result of the facial recognition system.
Towards Incorporating Contextual Knowledge into the Prediction of Driving Behavior
Wirthmüller, Florian, Schlechtriemen, Julian, Hipp, Jochen, Reichert, Manfred
Predicting the behavior of surrounding traffic participants is crucial for advanced driver assistance systems and autonomous driving. Most researchers, however, do not consider contextual knowledge when predicting vehicle motion. Extending former studies, we investigate how predictions are affected by external conditions. To do so, we categorize different kinds of contextual information and provide a carefully chosen definition as well as examples for external conditions. More precisely, we investigate how a state-of-the-art approach for lateral motion prediction is influenced by one selected external condition, namely the traffic density. Our investigations demonstrate that this kind of information is highly relevant in order to improve the performance of prediction algorithms. Therefore, this study constitutes the first step towards the integration of such information into automated vehicles. Moreover, our motion prediction approach is evaluated based on the public highD data set, showing a maneuver prediction performance with areas under the ROC curve above 97% and a median lateral prediction error of only 0.18 m on a prediction horizon of 5 s.
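The area under the ROC curve quoted above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way on toy data; the labels and scores are made up for illustration and have nothing to do with the highD results, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy maneuver labels (1 = lane change, 0 = lane keeping) and model scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formulation would be preferable for large evaluation sets.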
Why IBM Decided to Halt all Facial Recognition Development
In a letter to Congress sent on June 8th, IBM's CEO Arvind Krishna made a bold statement regarding the company's policy toward facial recognition. "IBM no longer offers general purpose IBM facial recognition or analysis software," says Krishna. "IBM firmly opposes and will not condone uses of any technology, including facial recognition technology offered by other vendors, for mass surveillance, racial profiling, violations of basic human rights and freedoms, or any purpose which is not consistent with our values and Principles of Trust and Transparency." The company has halted all facial recognition development and disapproves of any technology that could lead to racial profiling. The ethics of face recognition technology have been in question for years. However, there has been little to no movement toward enacting laws that bar the technology.
Detroit police chief cops to 96-percent facial recognition error rate
Detroit's police chief admitted on Monday that facial recognition technology used by the department misidentifies suspects about 96 percent of the time. It's an eye-opening admission given that the Detroit Police Department is facing criticism for arresting a man based on a bogus match from facial recognition software. Last week, the ACLU filed a complaint with the Detroit Police Department on behalf of Robert Williams, a Black man who was wrongfully arrested for stealing five watches worth $3,800 from a luxury retail store. Investigators first identified Williams by doing a facial recognition search with software from a company called DataWorks Plus. Under police questioning, Williams pointed out that the grainy surveillance footage obtained by police didn't actually look like him.
Hawaii Is Finally Making It Easier for Tourists to Visit. Is That Smart?
Hawaii is ready for its midpandemic tourism boom. Starting on Aug. 1, tourists looking to visit Hawaii will be able to bypass the state's two-week quarantine requirement for arrivals by getting a negative COVID-19 test within 72 hours before landing in the state. Visitors can also have their quarantines cut short if they receive negative test results during those two weeks. The same rules will also apply to residents returning to the islands. Hawaii won't pay for the tests; travelers will have to handle that themselves before departure, though screeners will still administer temperature checks at airports.
Examining Redundancy in the Context of Safe Machine Learning
Doran, Hans Dermot, Reif, Monika
This paper describes a set of experiments with neural network classifiers on the MNIST database of digits. The purpose is to investigate naïve implementations of redundant architectures as a first step towards safe and dependable machine learning. We report on a set of measurements using the MNIST database which ultimately serve to underline the expected difficulties in using NN classifiers in safe and dependable systems.
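One simple form a redundant architecture can take is majority voting over several independently trained classifiers, with an explicit "no decision" outcome when they disagree. The sketch below assumes that form; it is not taken from the paper, and the function name and example predictions are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label chosen by a strict majority of redundant classifiers.

    Returns None when no label has more than half of the votes, so the
    caller can fall back to a safe state instead of trusting a split vote.
    """
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    if count * 2 > len(predictions):
        return label
    return None


# Digit predictions from three hypothetical redundant classifiers.
print(majority_vote([7, 7, 1]))  # two of three agree
print(majority_vote([7, 1, 3]))  # no agreement, so no decision
```

Voting only helps when the redundant classifiers fail independently; classifiers trained on the same data tend to share errors, which is one reason naive redundancy may give less protection than expected.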
High-recall causal discovery for autocorrelated time series with latent confounders
Gerhardus, Andreas, Runge, Jakob
We present a new method for linear and nonlinear, lagged and contemporaneous constraint-based causal discovery from observational time series in the presence of latent confounders. We show that existing causal discovery methods such as FCI and variants suffer from low recall in the autocorrelated time series case and identify low effect size of conditional independence tests as the main reason. Information-theoretical arguments show that effect size can often be increased if causal parents are included in the conditioning sets. To identify parents early on, we suggest an iterative procedure that utilizes novel orientation rules to determine ancestral relationships already during the edge removal phase. We prove that the method is order-independent, and sound and complete in the oracle case. Extensive simulation studies for different numbers of variables, time lags, sample sizes, and further cases demonstrate that our method indeed achieves much higher recall than existing methods while keeping false positives at the desired level. This performance gain grows with stronger autocorrelation. Our method also covers causal discovery for non-time series data as a special case. We provide Python code for all methods involved in the simulation studies.