Farchi, Eitan
Detecting model drift using polynomial relations
Roffe, Eliran, Ackerman, Samuel, Raz, Orna, Farchi, Eitan
Machine learning (ML) models serve critical functions, such as classifying loan applicants as good or bad risks. Each model is trained under the assumption that the data used in training and the data encountered in the field come from the same underlying unknown distribution. In practice, this assumption is often violated. It is desirable to identify when this occurs in order to minimize the impact on model performance. We suggest a new approach to detecting change in the data distribution by identifying polynomial relations between the data features. We measure the strength of each identified relation using its R-squared value. A strong polynomial relation captures a significant trait of the data, one that should remain stable if the data distribution does not change. We therefore use a set of learned strong polynomial relations to identify drift. For each polynomial relation stronger than a given threshold, we calculate the amount of drift observed for that relation, estimated as the Bayes factor of the polynomial relation's likelihood on the baseline data versus the field data. We empirically validate the approach by simulating a range of changes in three publicly available datasets, and demonstrate the ability to identify drift using the Bayes factor of the polynomial relation likelihood change.
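To make the idea concrete, here is a minimal sketch (not the authors' implementation) of the two ingredients the abstract describes: fitting a polynomial relation between features and scoring its strength by R-squared, then scoring drift as a crude log Bayes factor that compares the likelihood of the field residuals under the baseline residual model against a model re-estimated on the field data. The function names and the Gaussian residual assumption are illustrative.

```python
from scipy.stats import norm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

def fit_relation(X_base, y_base, degree=2):
    """Learn a candidate relation y ~ poly(X) on baseline data and
    score its strength by R-squared."""
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X_base), y_base)
    r2 = r2_score(y_base, model.predict(poly.transform(X_base)))
    return poly, model, r2

def drift_log_bf(poly, model, X_base, y_base, X_field, y_field):
    """Crude log Bayes factor for the relation on field vs. baseline data,
    assuming Gaussian residuals (an illustrative simplification):
    H0 = field residuals follow the baseline residual model,
    H1 = field residuals follow a model re-estimated on the field data."""
    res_base = y_base - model.predict(poly.transform(X_base))
    res_field = y_field - model.predict(poly.transform(X_field))
    ll_h0 = norm.logpdf(res_field, loc=0.0, scale=res_base.std()).sum()
    ll_h1 = norm.logpdf(res_field, loc=res_field.mean(),
                        scale=res_field.std()).sum()
    return ll_h1 - ll_h0  # large positive values suggest drift
```

In use, only relations whose baseline R-squared exceeds the chosen strength threshold would be retained, and a large log Bayes factor on field data would be flagged as drift.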
Towards API Testing Across Cloud and Edge
Ackerman, Samuel, Choudhury, Sanjib, Desai, Nirmit, Farchi, Eitan, Gisolfi, Dan, Hicks, Andrew, Route, Saritha, Saha, Diptikalyan
The API economy is driving the digital transformation of business applications across hybrid Cloud and edge environments. For such transformations to succeed, end-to-end testing of the application's API composition is required. Testing API compositions, even in centralized Cloud environments, is challenging because it requires coverage of functional as well as reliability requirements, and the combinatorial space of scenarios is huge, e.g., API input parameters, the order of API execution, and network faults. Hybrid Cloud and edge environments exacerbate the challenge due to the need to coordinate test execution across dynamic wide-area networks, possibly across network boundaries. To handle this challenge, we envision a test framework named the Distributed Software Test Kit (DSTK). The DSTK leverages Combinatorial Test Design (CTD) to cover the functional requirements, and then automatically covers the reliability requirements via an under-the-hood closed loop between test-execution feedback and AI-based search algorithms. In each iteration of the closed loop, the search algorithms generate further reliability test scenarios to be executed next. Specifically, five kinds of reliability tests are envisioned: out-of-order execution of APIs, network delays and faults, API performance and throughput, changes in API call-graph patterns, and changes in application topology.
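As an illustration of the CTD ingredient (a sketch under assumed scenario parameters, not the DSTK itself), the following shows how a greedy 2-way (pairwise) covering can shrink the combinatorial scenario space. The parameter names and values, covering API execution order, network condition, and payload size, are hypothetical.

```python
from itertools import combinations, product

# Hypothetical scenario parameters for an API-composition test.
params = {
    "api_order":    ["in_order", "reversed", "interleaved"],
    "network":      ["normal", "delay_500ms", "packet_loss"],
    "payload_size": ["small", "large"],
}

def pairwise_suite(params):
    """Greedy 2-way covering: repeatedly pick the candidate scenario that
    covers the most not-yet-covered value pairs, until every pair of values
    from two different parameters appears in some chosen scenario."""
    names = list(params)
    candidates = [dict(zip(names, vals)) for vals in product(*params.values())]
    uncovered = {frozenset([(a, va), (b, vb)])
                 for a, b in combinations(names, 2)
                 for va in params[a] for vb in params[b]}

    def pairs_of(s):
        return {frozenset([(a, s[a]), (b, s[b])])
                for a, b in combinations(names, 2)}

    suite = []
    while uncovered:
        best = max(candidates, key=lambda s: len(pairs_of(s) & uncovered))
        uncovered -= pairs_of(best)
        suite.append(best)
    return suite

print(len(pairwise_suite(params)), "scenarios instead of",
      3 * 3 * 2, "exhaustive combinations")
```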
Detection of data drift and outliers affecting machine learning model performance over time
Ackerman, Samuel, Farchi, Eitan, Raz, Orna, Zalmanovici, Marcel, Dube, Parijat
A trained ML model is deployed on another 'test' dataset where target feature values (labels) are unknown. Drift is a distribution change between the training and deployment data, which is concerning if it changes model performance. For a cat/dog image classifier, for instance, drift during deployment could be rabbit images (a new class) or cat/dog images with changed characteristics (a change in distribution). We wish to detect these changes but cannot measure accuracy without labels for the deployment data. We instead detect drift indirectly, by nonparametrically testing the distribution of model prediction confidence for changes. This generalizes our method and sidesteps domain-specific feature representation. We address important statistical issues, particularly Type-1 error control in sequential testing, using Change Point Models (CPMs; see Adams and Ross 2012). We also use nonparametric outlier methods to show the user suspicious observations for model diagnosis, since the before/after-change confidence distributions overlap significantly. In experiments demonstrating robustness, we train on a subset of the MNIST digit classes and then insert drift (e.g., an unseen digit class) into the deployment data in various settings (gradual or sudden changes in the drift proportion). A novel loss function is introduced to compare the performance (detection delay, Type-1 and Type-2 errors) of a drift detector under different levels of drift-class contamination.
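A minimal sketch of the detection idea, assuming a scikit-learn-style classifier: reduce each observation to its top-class prediction confidence, then test the deployment confidences against the training confidences nonparametrically. The paper's sequential CPM machinery is replaced here by a simple batch two-sample Kolmogorov-Smirnov test, and the outlier rule is a crude quantile cutoff; both are simplifications.

```python
import numpy as np
from scipy.stats import ks_2samp

def confidences(model, X):
    """Top-class prediction confidence per observation (assumes the
    classifier exposes predict_proba, e.g., scikit-learn style)."""
    return model.predict_proba(X).max(axis=1)

def drift_detected(conf_train, conf_deploy, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test between the baseline and
    deployment confidence distributions (a batch stand-in for the
    paper's sequential CPM test)."""
    stat, p_value = ks_2samp(conf_train, conf_deploy)
    return p_value < alpha

def suspicious_indices(conf_train, conf_deploy, q=0.01):
    """Crude outlier rule: flag deployment observations whose confidence
    falls below a low quantile of the baseline confidences."""
    return np.where(conf_deploy < np.quantile(conf_train, q))[0]
```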
Sequential Drift Detection in Deep Learning Classifiers
Ackerman, Samuel, Dube, Parijat, Farchi, Eitan
We utilize neural network embeddings to detect data drift by formulating drift detection within an appropriate sequential decision framework. This enables control of the false-alarm rate even though the statistical tests are applied repeatedly. Since change-detection algorithms inherently face a tradeoff between avoiding false alarms and quick, correct detection, we introduce a loss function that evaluates an algorithm's ability to balance these two concerns, and we use it in a series of experiments.
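The loss function itself is not reproduced in the abstract; the sketch below is an illustrative stand-in that captures the stated tradeoff, penalizing a false alarm (alarm before the true change point), detection delay (alarm after it), and a miss (no alarm within the evaluation horizon). The cost constants are hypothetical.

```python
def detection_loss(alarm_time, change_time, horizon,
                   c_false=1.0, c_delay=0.01, c_miss=2.0):
    """Illustrative loss for one monitored run of a sequential detector:
    - alarm fired before the true change point -> flat false-alarm cost;
    - alarm fired at/after the change point    -> cost grows with delay;
    - no alarm by the end of the horizon       -> flat miss cost."""
    if alarm_time is None or alarm_time > horizon:
        return c_miss                                 # missed detection
    if alarm_time < change_time:
        return c_false                                # Type-1 error
    return c_delay * (alarm_time - change_time)       # detection delay
```

Averaging this loss over many simulated runs gives a single number for comparing how well different detectors balance false alarms against quick correct detection.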
Defending via strategic ML selection
Farchi, Eitan, Shehory, Onn, Barash, Guy
The results of a learning process depend on the input data, and there are cases in which an adversary can strategically tamper with the input data to affect the outcome of the learning process. While some datasets are difficult to attack, many others are susceptible to manipulation: a resourceful attacker can tamper with large portions of the dataset, and can additionally focus on a preferred subset of the attributes to maximize the effectiveness of the attack while minimizing the resources allocated to data manipulation. In light of this vulnerability, we introduce a solution in which the defender deploys an array of learners and activates them strategically. The defender computes the (game-theoretic) strategy space and accordingly applies a dominant strategy where one exists, and a Nash-stable strategy otherwise. In this paper we provide the details of this approach, analyze Nash equilibria in such a strategic learning environment, and demonstrate our solution on specific examples.
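A toy sketch of the selection logic: given a hypothetical payoff matrix for the defender, with rows as candidate learners and columns as attacker manipulation strategies, first look for a dominant learner, and otherwise fall back to a Nash-stable pure-strategy pair (a saddle point, treating the game as zero-sum). The numbers are illustrative only.

```python
import numpy as np

# Hypothetical payoffs to the defender: rows = candidate learners,
# columns = attacker data-manipulation strategies (zero-sum game).
U = np.array([[0.6, 0.5, 0.7],
              [0.4, 0.3, 0.8],
              [0.5, 0.4, 0.6]])

def dominant_row(U):
    """Index of a (weakly) dominant defender strategy, or None."""
    for i in range(U.shape[0]):
        if all((U[i] >= U[j]).all() for j in range(U.shape[0])):
            return i
    return None

def pure_nash(U):
    """Pure Nash equilibria of the zero-sum game: cells that are a column
    maximum (defender's best response) and a row minimum (attacker's)."""
    return [(i, j)
            for i in range(U.shape[0]) for j in range(U.shape[1])
            if U[i, j] >= U[:, j].max() and U[i, j] <= U[i, :].min()]

print("dominant learner:", dominant_row(U),
      "| Nash-stable (saddle) cells:", pure_nash(U))
```

For this matrix there is no dominant learner, so the defender would play the Nash-stable cell (learner 0 against attack 1).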
Reports of the Workshops of the 32nd AAAI Conference on Artificial Intelligence
Bouchard, Bruno (Université du Québec à Chicoutimi) | Bouchard, Kevin (Université du Québec à Chicoutimi) | Brown, Noam (Carnegie Mellon University) | Chhaya, Niyati (Adobe Research, Bangalore) | Farchi, Eitan (IBM Research, Haifa) | Gaboury, Sebastien (Université du Québec à Chicoutimi) | Geib, Christopher (Smart Information Flow Technologies) | Gyrard, Amelie (Wright State University) | Jaidka, Kokil (University of Pennsylvania) | Keren, Sarah (Technion – Israel Institute of Technology) | Khardon, Roni (Tufts University) | Kordjamshidi, Parisa (Tulane University) | Martinez, David (MIT Lincoln Laboratory) | Mattei, Nicholas (IBM Research, TJ Watson) | Michalowski, Martin (University of Minnesota School of Nursing) | Mirsky, Reuth (Ben Gurion University) | Osborn, Joseph (Pomona College) | Sahin, Cem (MIT Lincoln Laboratory) | Shehory, Onn (Bar Ilan University) | Shaban-Nejad, Arash (University of Tennessee Health Science Center) | Sheth, Amit (Wright State University) | Shimshoni, Ilan (University of Haifa) | Shrobe, Howie (Massachusetts Institute of Technology) | Sinha, Arunesh (University of Southern California) | Sinha, Atanu R. (Adobe Research, Bangalore) | Srivastava, Biplav (IBM Research, Yorktown Heights) | Streilein, William (MIT Lincoln Laboratory) | Theocharous, Georgios (Adobe Research, San Jose) | Venable, K. Brent (Tulane University and IHMC) | Wagner, Neal (MIT Lincoln Laboratory) | Zamansky, Anna (University of Haifa)
The AAAI-18 workshop program included 15 workshops covering a wide range of topics in AI. Workshops were held Friday and Saturday, February 2–3, 2018, at the Hilton New Orleans Riverside in New Orleans, Louisiana, USA. This report contains summaries of the Affective Content Analysis workshop; the Artificial Intelligence Applied to Assistive Technologies and Smart Environments workshop; the AI and Marketing Science workshop; the Artificial Intelligence for Cyber Security workshop; the AI for Imperfect-Information Games workshop; the Declarative Learning Based Programming workshop; the Engineering Dependable and Secure Machine Learning Systems workshop; the Health Intelligence workshop; the Knowledge Extraction from Games workshop; the Plan, Activity, and Intent Recognition workshop; the Planning and Inference workshop; the Preference Handling workshop; the Reasoning and Learning for Human-Machine Dialogues workshop; and the AI Enhanced Internet of Things Data Processing for Intelligent Applications workshop.