AITopics

2206.10959

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)
North America > United States > New York > New York County > New York City (0.05)
Asia > Middle East > Oman (0.05)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Pentyala, Sikha, Neophytou, Nicola, Nascimento, Anderson, De Cock, Martine, Farnadi, Golnoosh

PrivFairFL: Privacy-Preserving Group Fairness in Federated Learning

arXiv.org Artificial IntelligenceAug-26-2022

Group fairness ensures that the outcome of machine learning (ML) based decision making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not giving access to the clients' data. As we show in this paper, this conflict between fairness and privacy in FL can be resolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP). In doing so, we propose a method for training group-fair ML models in cross-device FL under complete and formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values. Empirical evaluations on real world datasets demonstrate the effectiveness of our solution to train fair and accurate ML models in federated cross-device setups with privacy guarantees to the users.

fairness, information, mpc protocol, (13 more...)

2205.11584

Country:

North America > United States > Washington > Pierce County > Tacoma (0.04)
North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Law > Civil Rights & Constitutional Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Liu, Zoey, Spence, Justin, Prud'hommeaux, Emily

Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation

arXiv.org Artificial IntelligenceAug-26-2022

Many automatic speech recognition (ASR) data sets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This "hold-speaker(s)-out" data partitioning strategy, however, may not be ideal for data sets in which the number of speakers is very small. This study investigates ten different data split methods for five languages with minimal ASR training resources. We find that (1) model performance varies greatly depending on which speaker is selected for testing; (2) the average word error rate (WER) across all held-out speakers is comparable not only to the average WER over multiple random splits but also to any given individual random split; (3) WER is also generally comparable when the data is split heuristically or adversarially; (4) utterance duration and intensity are comparatively more predictive factors of variability regardless of the data split. These results suggest that the widely used hold-speakers-out approach to ASR data partitioning can yield results that do not reflect model performance on unseen data or speakers. Random splits can yield more reliable and generalizable estimates when facing data sparsity.

computational linguistic, random split, utterance, (16 more...)

2208.12888

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
North America > United States > California > Yolo County > Davis (0.04)
North America > United States > Arizona (0.04)
(3 more...)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Chowdhury, Somnath Basu Roy, Chaturvedi, Snigdha

Learning Fair Representations via Rate-Distortion Maximization

Text representations learned by machine learning models often encode undesirable demographic information of the user. Predictive models based on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes protected information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function. FaRM is able to debias representations with or without a target task at hand. FaRM can also be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and learned representations leak significantly less protected attribute information against an attack by a non-linear probing network.

farm, information, representation, (13 more...)

2202.00035

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(18 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Banknote Recognition for Visually Impaired People (Case of Ethiopian note)

Abdelkadir, Nuredin Ali

Currency is used almost everywhere to facilitate business. In most developing countries, especially the ones in Africa, tangible notes are predominantly used in everyday financial transactions. One of these countries, Ethiopia, is believed to have one of the world highest rates of blindness (1.6%) and low vision (3.7%). There are around 4 million visually impaired people; With 1.7 million people being in complete vision loss. Those people face a number of challenges when they are in a bus station, in shopping centers, or anywhere which requires the physical exchange of money. In this paper, we try to provide a solution to this issue using AI/ML applications. We developed an Android and IOS compatible mobile application with a model that achieved 98.9% classification accuracy on our dataset. The application has a voice integrated feature that tells the type of the scanned currency in Amharic, the working language of Ethiopia. The application is developed to be easily accessible by its users. It is build to reduce the burden of visually impaired people in Ethiopia.

application, banknote, ethiopia, (12 more...)

2209.03236

Country: Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Hyeong, Jihyeon, Kim, Jayoung, Park, Noseong, Jajodia, Sushil

An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models

Tabular data typically contains private and important information; thus, precautions must be taken before they are shared with others. Although several methods (e.g., differential privacy and k-anonymity) have been proposed to prevent information leakage, in recent years, tabular data synthesis models have become popular because they can well trade-off between data utility and privacy. However, recent research has shown that generative models for image data are susceptible to the membership inference attack, which can determine whether a given record was used to train a victim synthesis model. In this paper, we investigate the membership inference attack in the context of tabular data synthesis. We conduct experiments on 4 state-of-the-art tabular data synthesis models under two attack scenarios (i.e., one black-box and one white-box attack), and find that the membership inference attack can seriously jeopardize these models. We next conduct experiments to evaluate how well two popular differentially-private deep learning training algorithms, DP-SGD and DP-GAN, can protect the models against the attack. Our key finding is that both algorithms can largely alleviate this threat by sacrificing the generation quality.

generation quality, membership inference attack, white-box attack, (15 more...)

2208.08114

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.06)
(5 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Credit card fraud detection - Classifier selection strategy

Kulatilleke, Gayan K.

Machine learning has opened up new tools for financial fraud detection. Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds. With growing credit card transaction volumes and rising fraud percentages there is growing interest in finding appropriate machine learning classifiers for detection. However, fraud data sets are diverse and exhibit inconsistent characteristics. As a result, a model effective on a given data set is not guaranteed to perform on another. Further, the possibility of temporal drift in data patterns and characteristics over time is high. Additionally, fraud data has massive and varying imbalance. In this work, we evaluate sampling methods as a viable pre-processing mechanism to handle imbalance and propose a data-driven classifier selection strategy for characteristic highly imbalanced fraud detection data sets. The model derived based on our selection strategy surpasses peer models, whilst working in more realistic conditions, establishing the effectiveness of the strategy.

card fraud detection, credit card fraud detection, fraud detection, (12 more...)

2208.119

Country:

North America > United States > California > Orange County > Irvine (0.04)
Asia > Taiwan (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Kulatilleke, Gayan K., Samarakoon, Sugandika

Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data

With growing credit card transaction volumes, the fraud percentages are also rising, including overhead costs for institutions to combat and compensate victims. The use of machine learning into the financial sector permits more effective protection against fraud and other economic crime. Suitably trained machine learning classifiers help proactive fraud detection, improving stakeholder trust and robustness against illicit transactions. However, the design of machine learning based fraud detection algorithms has been challenging and slow due the massively unbalanced nature of fraud data and the challenges of identifying the frauds accurately and completely to create a gold standard ground truth. Furthermore, there are no benchmarks or standard classifier evaluation metrics to measure and identify better performing classifiers, thus keeping researchers in the dark. In this work, we develop a theoretical foundation to model human annotation errors and extreme imbalance typical in real world fraud detection data sets. By conducting empirical experiments on a hypothetical classifier, with a synthetic data distribution approximated to a popular real world credit card fraud data set, we simulate human annotation errors and extreme imbalance to observe the behavior of popular machine learning classifier evaluation matrices. We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.

classifier performance, evaluation metric, minority fraction, (8 more...)

2208.11904

Country: Asia > Taiwan (0.04)

Genre: Research Report (0.40)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Topology Inference for Network Systems: Causality Perspective and Non-asymptotic Performance

Li, Yushan, He, Jianping, Chen, Cailian, Guan, Xinping

Topology inference for network systems (NSs) plays a crucial role in many areas. This paper advocates a causality-based method based on noisy observations from a single trajectory of a NS, which is represented by the state-space model with general directed topology. Specifically, we first prove its close relationships with the ideal Granger estimator for multiple trajectories and the traditional ordinary least squares (OLS) estimator for a single trajectory. Along with this line, we analyze the non-asymptotic inference performance of the proposed method by taking the OLS estimator as a reference, covering both asymptotically and marginally stable systems. The derived convergence rates and accuracy results suggest the proposed method has better performance in addressing potentially correlated observations and achieves zero inference error asymptotically. Besides, an online/recursive version of our method is established for efficient computation or time-varying cases. Extensions on NSs with nonlinear dynamics are also discussed. Comprehensive tests corroborate the theoretical findings and comparisons with other algorithms highlight the superiority of the proposed method.

estimator, matrix, trajectory, (16 more...)

2106.01031

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Heilongjiang Province > Harbin (0.04)
North America > United States > Montana (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

No Language Left Behind: Scaling Human-Centered Machine Translation

NLLB Team, null, Costa-jussà, Marta R., Cross, James, Çelebi, Onur, Elbayad, Maha, Heafield, Kenneth, Heffernan, Kevin, Kalbassi, Elahe, Lam, Janice, Licht, Daniel, Maillard, Jean, Sun, Anna, Wang, Skyler, Wenzek, Guillaume, Youngblood, Al, Akula, Bapi, Barrault, Loic, Gonzalez, Gabriel Mejia, Hansanti, Prangthip, Hoffman, John, Jarrett, Semarley, Sadagopan, Kaushik Ram, Rowe, Dirk, Spruit, Shannon, Tran, Chau, Andrews, Pierre, Ayan, Necip Fazil, Bhosale, Shruti, Edunov, Sergey, Fan, Angela, Gao, Cynthia, Goswami, Vedanuj, Guzmán, Francisco, Koehn, Philipp, Mourachko, Alexandre, Ropers, Christophe, Saleem, Safiyyah, Schwenk, Holger, Wang, Jeff

massively multilingual machine translation model, massively multilingual neural machine translation, statistically significant human evaluation improvement, (14 more...)

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality results, all while keeping ethical considerations in mind? In No Language Left Behind, we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically, we evaluated the performance of over 40,000 different translation directions using a human-translated benchmark, Flores-200, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art, laying important groundwork towards realizing a universal translation system.

2207.04672

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
Europe > Italy > Tuscany > Florence (0.04)
(49 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
(2 more...)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
(6 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)