AITopics

2304.11507

Country:

North America > United States > Iowa (0.05)
North America > United States > Maryland (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)
(2 more...)

Journal of Artificial Intelligence ResearchApr-22-2023

Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach

Fabris, Alessandro (University of Padua) | Esuli, Andrea (Consiglio Nazionale delle Ricerche) | Moreo, Alejandro (Consiglio Nazionale delle Ricerche) | Sebastiani, Fabrizio (Consiglio Nazionale delle Ricerche)

Algorithms and models are increasingly deployed to inform decisions about people, inevitably affecting their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and favour group fairness, that is, ensure that groups determined by sensitive demographic attributes, such as race or sex, are not treated unjustly. To achieve this goal, the availability (awareness) of these demographic attributes to those evaluating the impact of these models is fundamental. Unfortunately, collecting and storing these attributes is often in conflict with industry practices and legislation on data minimisation and privacy. For this reason, it can be hard to measure the group fairness of trained models, even from within the companies developing them. In this work, we tackle the problem of measuring group fairness under unawareness of sensitive attributes, by using techniques from quantification, a supervised learning task concerned with directly providing group-level prevalence estimates (rather than individual-level class labels). We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem, as they are robust to inevitable distribution shifts while at the same time decoupling the (desirable) objective of measuring group fairness from the (undesirable) side effect of allowing the inference of sensitive attributes of individuals. More in detail, we show that fairness under unawareness can be cast as a quantification problem and solved with proven methods from the quantification literature. We show that these methods outperform previous approaches to measure demographic parity in five experimental protocols, corresponding to important challenges that complicate the estimation of classifier fairness under unawareness.

fairness, pacc, svm, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.14033

AI Access Foundation

14033

Journal of Artificial Intelligence Research

Country:

Europe > Germany (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
(11 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.45)

Industry:

Law (1.00)
Health & Medicine (1.00)
Banking & Finance (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Schütt, Barbara, Zipfl, Maximilian, Zöllner, J. Marius, Sax, Eric

Inverse Universal Traffic Quality -- a Criticality Metric for Crowded Urban Traffic Scenes

arXiv.org Artificial IntelligenceApr-21-2023

An essential requirement for scenario-based testing the identification of critical scenes and their associated scenarios. However, critical scenes, such as collisions, occur comparatively rarely. Accordingly, large amounts of data must be examined. A further issue is that recorded real-world traffic often consists of scenes with a high number of vehicles, and it can be challenging to determine which are the most critical vehicles regarding the safety of an ego vehicle. Therefore, we present the inverse universal traffic quality, a criticality metric for urban traffic independent of predefined adversary vehicles and vehicle constellations such as intersection trajectories or car-following scenarios. Our metric is universally applicable for different urban traffic situations, e.g., intersections or roundabouts, and can be adjusted to certain situations if needed. Additionally, in this paper, we evaluate the proposed metric and compares its result to other well-known criticality metrics of this field, such as time-to-collision or post-encroachment time.

artificial intelligence, machine learning, vehicle, (12 more...)

2304.10849

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
Europe > Sweden (0.04)
Europe > Spain > Canary Islands > Gran Canaria (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.89)
Transportation > Infrastructure & Services (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial IntelligenceApr-21-2023

Optimizing fairness tradeoffs in machine learning with multiobjective meta-models

La Cava, William G.

Improving the fairness of machine learning models is a nuanced task that requires decision makers to reason about multiple, conflicting criteria. The majority of fair machine learning methods transform the error-fairness trade-off into a single objective problem with a parameter controlling the relative importance of error versus fairness. We propose instead to directly optimize the error-fairness tradeoff by using multi-objective optimization. We present a flexible framework for defining the fair machine learning task as a weighted classification problem with multiple cost functions. This framework is agnostic to the underlying prediction model as well as the metrics. We use multiobjective optimization to define the sample weights used in model training for a given machine learner, and adapt the weights to optimize multiple metrics of fairness and accuracy across a set of tasks. To reduce the number of optimized parameters, and to constrain their complexity with respect to population subgroups, we propose a novel meta-model approach that learns to map protected attributes to sample weights, rather than optimizing those weights directly. On a set of real-world problems, this approach outperforms current state-of-the-art methods by finding solution sets with preferable error/fairness trade-offs.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

doi: 10.1145/3583131.3590487

2304.1219

Country:

Europe > Portugal > Lisbon > Lisbon (0.06)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Lederer, Isabell, Mayer, Rudolf, Rauber, Andreas

Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks

arXiv.org Artificial IntelligenceApr-21-2023

The commercial use of Machine Learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes Intellectual Property Protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks and defenses available to protect their IP, the ML-related research in this regard is still very fragmented. This is also due to a missing unified view as well as a common taxonomy of these aspects. In this paper, we systematize our findings on IPP in ML, while focusing on threats and attacks identified and defenses proposed at the time of writing. We develop a comprehensive threat model for IP in ML, categorizing attacks and defenses within a unified and consolidated taxonomy, thus bridging research from both the ML and security communities.

artificial intelligence, machine learning, watermark, (20 more...)

doi: 10.1109/TNNLS.2023.3270135

2304.11285

Country:

Europe > Austria > Vienna (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Hong Kong (0.04)
(23 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Meyer, Anna P., Albarghouthi, Aws, D'Antoni, Loris

The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

We introduce dataset multiplicity, a way to study how inaccuracies, uncertainty, and social bias in training datasets impact test-time predictions. The dataset multiplicity framework asks a counterfactual question of what the set of resultant models (and associated test-time predictions) would be if we could somehow access all hypothetical, unbiased versions of the dataset. We discuss how to use this framework to encapsulate various sources of uncertainty in datasets' factualness, including systemic social bias, data collection practices, and noisy labels or features. We show how to exactly analyze the impacts of dataset multiplicity for a specific model architecture and type of uncertainty: linear models with label errors. Our empirical analysis shows that real-world datasets, under reasonable assumptions, contain many test samples whose predictions are affected by dataset multiplicity. Furthermore, the choice of domain-specific dataset multiplicity definition determines what samples are affected, and whether different demographic groups are disparately impacted. Finally, we discuss implications of dataset multiplicity for machine learning practice and research, including considerations for when model outcomes should not be trusted.

artificial intelligence, dataset multiplicity, machine learning, (14 more...)

2304.10655

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Oregon (0.04)
North America > United States > Louisiana (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Multi-module based CVAE to predict HVCM faults in the SNS accelerator

Alanazi, Yasir, Schram, Malachi, Rajput, Kishansingh, Goldenberg, Steven, Vidyaratne, Lasitha, Pappas, Chris, Radaideh, Majdi I., Lu, Dan, Ramuhalli, Pradeep, Cousineau, Sarah

We present a multi-module framework based on Conditional Variational Autoencoder (CVAE) to detect anomalies in the power signals coming from multiple High Voltage Converter Modulators (HVCMs). We condition the model with the specific modulator type to capture different representations of the normal waveforms and to improve the sensitivity of the model to identify a specific type of fault when we have limited samples for a given module type. We studied several neural network (NN) architectures for our CVAE model and evaluated the model performance by looking at their loss landscape for stability and generalization. Our results for the Spallation Neutron Source (SNS) experimental data show that the trained model generalizes well to detecting multiple fault types for several HVCM module types. The results of this study can be used to improve the HVCM reliability and overall SNS uptime

artificial intelligence, data mining, machine learning, (18 more...)

2304.10639

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Virginia > Newport News (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Assunção, Gabriel O., Izbicki, Rafael, Prates, Marcos O.

Is augmentation effective to improve prediction in imbalanced text datasets?

Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new samples for the minority class. However, in this paper, we challenge the common assumption that data augmentation is always necessary to improve predictions on imbalanced datasets. Instead, we argue that adjusting the classifier cutoffs without data augmentation can produce similar results to oversampling techniques. Our study provides theoretical and empirical evidence to support this claim. Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data, and help researchers and practitioners make informed decisions about which methods to use for a given task.

machine learning, natural language, text classification, (17 more...)

2304.10283

Country:

North America > United States > Iowa (0.05)
South America > Brazil > Minas Gerais > Belo Horizonte (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

Salem, Ahmed, Cherubin, Giovanni, Evans, David, Köpf, Boris, Paverd, Andrew, Suri, Anshuman, Tople, Shruti, Zanella-Béguelin, Santiago

Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. We use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) to uncover hitherto unknown relations that would have been difficult to spot otherwise.

adversary, artificial intelligence, machine learning, (16 more...)

2212.10986

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Cen, Sarah H., Madry, Aleksander, Shah, Devavrat

A User-Driven Framework for Regulating and Auditing Social Media

People form judgments and make decisions based on the information that they observe. A growing portion of that information is not only provided, but carefully curated by social media platforms. Although lawmakers largely agree that platforms should not operate without any oversight, there is little consensus on how to regulate social media. There is consensus, however, that creating a strict, global standard of "acceptable" content is untenable (e.g., in the US, it is incompatible with Section 230 of the Communications Decency Act and the First Amendment). In this work, we propose that algorithmic filtering should be regulated with respect to a flexible, user-driven baseline. We provide a concrete framework for regulating and auditing a social media platform according to such a baseline. In particular, we introduce the notion of a baseline feed: the content that a user would see without filtering (e.g., on Twitter, this could be the chronological timeline). We require that the feeds a platform filters contain "similar" informational content as their respective baseline feeds, and we design a principled way to measure similarity. This approach is motivated by related suggestions that regulations should increase user agency. We present an auditing procedure that checks whether a platform honors this requirement. Notably, the audit needs only black-box access to a platform's filtering algorithm, and it does not access or infer private user information. We provide theoretical guarantees on the strength of the audit. We further show that requiring closeness between filtered and baseline feeds does not impose a large performance cost, nor does it create echo chambers.

artificial intelligence, machine learning, platform, (16 more...)

2304.10525

Country:

North America > United States > California (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry:

Media > News (1.00)
Law (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)