AITopics

2007.04238

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France > Brittany > Finistère > Brest (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Verschuuren, Pim, Palazzo, Serena, Powell, Tom, Sutton, Steve, Pilgrim, Alfred, Giannelli, Michele Faucci

Supervised machine learning techniques for data matching based on similarity metrics

arXiv.org Machine LearningJul-8-2020

Businesses, governmental bodies and NGO's have an ever-increasing amount of data at their disposal from which they try to extract valuable information. Often, this needs to be done not only accurately but also within a short time frame. Clean and consistent data is therefore crucial. Data matching is the field that tries to identify instances in data that refer to the same real-world entity. In this study, machine learning techniques are combined with string similarity functions to the field of data matching. A dataset of invoices from a variety of businesses and organizations was preprocessed with a grouping scheme to reduce pair dimensionality and a set of similarity functions was used to quantify similarity between invoice pairs. The resulting invoice pair dataset was then used to train and validate a neural network and a boosted decision tree. The performance was compared with a solution from FISCAL Technologies as a benchmark against currently available deduplication solutions. Both the neural network and boosted decision tree showed equal to better performance.

artificial intelligence, invoice, machine learning, (19 more...)

2007.04001

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.04)
Europe > United Kingdom > England > Berkshire > Reading (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligenceJul-7-2020, 15:22:22 GMT

Learner's World

In continuation of my previous posts on various Performance measures for classifiers, here, I've explained the concept of single score measure namely; 'F - score'. In my previous posts, I had discussed four fundamental numbers, namely, true positive, true negative, false positive and false negative and eight basic ratios, namely, sensitivity(or recall or true positive rate) & specificity (or true negative rate), false positive rate (or type-I error) & false negative rates (or type-II error), positive predicted value (or precision) & negative predicted value, and false discovery rate (or q-value) & false omission rate. I had also discussed accuracy paradox, the relationship between various basic ratios and their trade-off to evaluate the performance of a classifier with examples. I'll be using the same confusion matrix for reference. Precision & Recall: First let's briefly revisit the understanding of'Precision (PPV) & Recall (sensitivity)'.

artificial intelligence, machine learning, precision and recall, (14 more...)

#artificialintelligence

Industry: Information Technology > Services (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Gupta, Kishor Datta, Akhtar, Zahid, Dasgupta, Dipankar

Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks

arXiv.org Artificial IntelligenceJul-7-2020

Developing secure machine learning models from adversarial examples is challenging as various methods are continually being developed to generate adversarial attacks. In this work, we propose an evolutionary approach to automatically determine Image Processing Techniques Sequence (IPTS) for detecting malicious inputs. Accordingly, we first used a diverse set of attack methods including adaptive attack methods (on our defense) to generate adversarial samples from the clean dataset. A detection framework based on a genetic algorithm (GA) is developed to find the optimal IPTS, where the optimality is estimated by different fitness measures such as Euclidean distance, entropy loss, average histogram, local binary pattern and loss functions. The "image difference" between the original and processed images is used to extract the features, which are then fed to a classification scheme in order to determine whether the input sample is adversarial or clean. This paper described our methodology and performed experiments using multiple data-sets tested with several adversarial attacks. For each attack-type and dataset, it generates unique IPTS. A set of IPTS selected dynamically in testing time which works as a filter for the adversarial attack. Our empirical experiments exhibited promising results indicating the approach can efficiently be used as processing for any AI model.

arxiv preprint arxiv, evolutionary algorithm, machine learning, (17 more...)

2007.00337

Country:

North America > United States > Tennessee > Shelby County > Memphis (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

arXiv.org Machine LearningJul-7-2020

README: REpresentation learning by fairness-Aware Disentangling MEthod

Park, Sungho, Kim, Dohyung, Hwang, Sunhee, Byun, Hyeran

Fair representation learning aims to encode invariant representation with respect to the protected attribute, such as gender or age. In this paper, we design Fairness-aware Disentangling Variational AutoEncoder (FD-VAE) for fair representation learning. This network disentangles latent space into three subspaces with a decorrelation loss that encourages each subspace to contain independent information: 1) target attribute information, 2) protected attribute information, 3) mutual attribute information. After the representation learning, this disentangled representation is leveraged for fairer downstream classification by excluding the subspace with the protected attribute information. We demonstrate the effectiveness of our model through extensive experiments on CelebA and UTK Face datasets. Our method outperforms the previous state-of-the-art method by large margins in terms of equal opportunity and equalized odds.

artificial intelligence, information, machine learning, (14 more...)

2007.03775

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
(3 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

arXiv.org Artificial IntelligenceJul-6-2020

Fairness in machine learning: against false positive rate equality as a measure of fairness

Long, Robert

As machine learning informs increasingly consequential decisions, different metrics have been proposed for measuring algorithmic bias or unfairness. Two popular fairness measures are calibration and equality of false positive rate. Each measure seems intuitively important, but notably, it is usually impossible to satisfy both measures. For this reason, a large literature in machine learning speaks of a fairness tradeoff between these two measures. This framing assumes that both measures are, in fact, capturing something important. To date, philosophers have not examined this crucial assumption, and examined to what extent each measure actually tracks a normatively important property. This makes this inevitable statistical conflict, between calibration and false positive rate equality, an important topic for ethics. In this paper, I give an ethical framework for thinking about these measures and argue that, contrary to initial appearances, false positive rate equality does not track anything about fairness, and thus sets an incoherent standard for evaluating the fairness of algorithms.

artificial intelligence, machine learning, threshold, (14 more...)

2007.0289

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Florida > Broward County (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Law > Criminal Law (0.69)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Pitis, Silviu, Creager, Elliot, Garg, Animesh

Counterfactual Data Augmentation using Locally Factored Dynamics

arXiv.org Artificial IntelligenceJul-6-2020

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the dynamics at any given time step can often be decomposed into locally independent causal mechanisms. Such local causal structures can be leveraged to improve the sample efficiency of sequence prediction and off-policy reinforcement learning. We formalize this by introducing local causal models (LCMs), which are induced from a global causal model by conditioning on a subset of the state space. We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for model-free Counterfactual Data Augmentation (CoDA). CoDA uses local structures and an experience replay to generate counterfactual experiences that are causally valid in the global model. We find that CoDA significantly improves the performance of RL agents in locally factored tasks, including the batch-constrained and goal-conditioned settings.

arxiv preprint arxiv, machine learning, reinforcement learning, (13 more...)

2007.02863

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Guo, Wei, Caliskan, Aylin

Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases

arXiv.org Artificial IntelligenceJul-6-2020

With the starting point that implicit human biases are reflected in the statistical regularities of language, it is possible to measure biases in static word embeddings. With recent advances in natural language processing, state-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears. Current methods of measuring social and intersectional biases in these contextualized word embeddings rely on the effect magnitudes of bias in a small set of pre-defined sentence templates. We propose a new comprehensive method, Contextualized Embedding Association Test (CEAT), based on the distribution of 10,000 pooled effect magnitudes of bias in embedding variations and a random-effects model, dispensing with templates. Experiments on social and intersectional biases show that CEAT finds evidence of all tested biases and provides comprehensive information on the variability of effect magnitudes of the same bias in different contexts. Furthermore, we develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings in addition to measuring them in contextualized word embeddings. We present the first algorithmic bias detection findings on how intersectional group members are associated with unique emergent biases that do not overlap with the biases of their constituent minority identities. IBD achieves an accuracy of 81.6% and 82.7%, respectively, when detecting the intersectional biases of African American females and Mexican American females. EIBD reaches an accuracy of 84.7% and 65.3%, respectively, when detecting the emergent intersectional biases unique to African American females and Mexican American females (random correct identification probability ranges from 1.0% to 25.5%).

large language model, machine learning, natural language, (20 more...)

2006.03955

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.72)

Industry:

Health & Medicine (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

So, Banghee, Boucher, Jean-Philippe, Valdez, Emiliano A.

Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

arXiv.org Machine LearningJul-6-2020

Powered with telematics technology, insurers can now capture a wide range of data, such as distance traveled, how drivers brake, accelerate or make turns, and travel frequency each day of the week, to better decode driver's behavior. Such additional information helps insurers improve risk assessments for usage-based insurance (UBI), an increasingly popular industry innovation. In this article, we explore how to integrate telematics information to better predict claims frequency. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero claims, a less proportion with exactly one claim, and far lesser with two or more claims. We introduce the use of a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such imbalances. To calibrate SAMME.C2 algorithm, we use empirical data collected from a telematics program in Canada and we find improved assessment of driving behavior with telematics relative to traditional risk variables. We demonstrate our algorithm can outperform other models that can handle class imbalances: SAMME, SAMME with SMOTE, RUSBoost, and SMOTEBoost. The sampled data on telematics were observations during 2013-2016 for which 50,301 are used for training and another 21,574 for testing. Broadly speaking, the additional information derived from vehicle telematics helps refine risk classification of drivers of UBI.

algorithm, artificial intelligence, machine learning, (18 more...)

2007.031

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > Connecticut > Tolland County > Storrs (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Raj, Anant, Bauer, Stefan, Soleymani, Ashkan, Besserve, Michel, Schölkopf, Bernhard

Causal Feature Selection via Orthogonal Search

arXiv.org Machine LearningJul-6-2020

The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work in the field of causal discovery exploits invariance properties of models across different experimental conditions for detecting direct causal links. However, these approaches generally do not scale well with the number of explanatory variables, are difficult to extend to nonlinear relationships, and require data across different experiments. Inspired by {\em Debiased} machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parent of the response. We propose an algorithm that works for purely observational data, while also offering theoretical guarantees, including the case of partially nonlinear relationships. Requiring only one estimation for each variable, we can apply our approach even to large graphs, demonstrating significant improvements compared to established approaches.

artificial intelligence, machine learning, sparsity sparsity sparsity, (14 more...)

2007.02938

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)