model builder
A Data-Based Architecture for Flight Test without Test Points
Harp, D. Isaiah, Ott, Joshua, Alora, John, Asmar, Dylan
The justification for the "test point" derives from the test pilot's obligation to faithfully reproduce the pre-specified conditions of some model prediction. Pilot deviation from those conditions invalidates the model assumptions. Flight test aids have been proposed to increase accuracy on more challenging test points. However, the very existence of databands and tolerances is a problem more fundamental than inadequate pilot skill. We propose a novel approach that eliminates test points. We start with a high-fidelity digital model of an air vehicle. Instead of using this model to generate a point prediction, we use a machine learning method to produce a reduced-order model (ROM). The ROM has two important properties. First, it can generate a prediction based on any set of conditions the pilot flies. Second, if the test result at those conditions differs from the prediction, the ROM can be updated using the new data. The outcome of flight test is thus a refined ROM at whatever conditions were flown. This ROM in turn updates and validates the high-fidelity model. We present a single example of this "point-less" architecture, using T-38C flight test data. We first use a generic aircraft model to build a ROM of longitudinal pitching motion as a hypersurface. We then ingest unconstrained flight test data and use Gaussian Process Regression to update and condition the hypersurface. Proposing a second-order equivalent system for the T-38C, we then use this hypersurface to generate the parameters necessary to assess MIL-STD-1797B compliance for longitudinal dynamics.
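The update step described above, conditioning a prior surface on whatever data was flown, can be sketched with Gaussian Process Regression. This is a minimal illustration of the general technique, assuming scikit-learn; the prior model, flight conditions, and measurements are entirely synthetic, not the authors' T-38C implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical prior: a generic model's prediction of a longitudinal
# response parameter as a function of (airspeed, altitude). Synthetic.
def generic_model(X):
    speed, alt = X[:, 0], X[:, 1]
    return 0.01 * speed - 0.0001 * alt

# Flight data gathered at whatever conditions the pilot actually flew
X_flown = np.array([[250.0, 10000.0], [310.0, 15000.0], [280.0, 20000.0]])
y_measured = np.array([2.6, 3.3, 2.7])  # synthetic measurements

# Model the residual between measurement and prior with a GP, so the
# updated surface reverts to the prior away from flown conditions.
kernel = RBF(length_scale=[50.0, 5000.0]) + WhiteKernel(noise_level=1e-4)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None, normalize_y=True)
gp.fit(X_flown, y_measured - generic_model(X_flown))

# Updated hypersurface = prior + GP correction, with uncertainty
X_query = np.array([[270.0, 12000.0]])
corr, sigma = gp.predict(X_query, return_std=True)
updated = generic_model(X_query) + corr
```

New sorties simply extend `X_flown` and `y_measured`; refitting then conditions the hypersurface on all data flown so far, which is what makes pre-specified test points unnecessary in this scheme.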
FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation
Gu, Yuechun, He, Jiajie, Chen, Keke
Training data privacy has been a top concern in AI modeling. While methods like differentially private learning allow data contributors to quantify acceptable privacy loss, model utility is often significantly damaged. In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments. In controlled data access, authorized model builders work in a restricted environment to access sensitive data, which can fully preserve data utility with reduced risk of data leakage. However, unlike differential privacy, there is no quantitative measure for individual data contributors to assess their privacy risk before participating in a machine learning task. We developed the demo prototype FT-PrivacyScore to show that it is possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task. The demo source code will be available at \url{https://github.com/RhincodonE/demo_privacy_scoring}.
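The abstract does not specify the scoring algorithm, but the general notion of a per-record privacy risk estimate can be sketched with a generic leave-one-out confidence gap. The data, model choice, and `privacy_score` helper below are all hypothetical illustrations, not the FT-PrivacyScore method itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data; the scoring below is a generic leave-one-out confidence
# gap, NOT the (unspecified) FT-PrivacyScore algorithm.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

clf_full = LogisticRegression(max_iter=1000).fit(X, y)

def privacy_score(i):
    """Higher score = the model's output depends more on record i."""
    mask = np.arange(len(X)) != i
    clf_loo = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    p_in = clf_full.predict_proba(X[i:i + 1])[0, y[i]]
    p_out = clf_loo.predict_proba(X[i:i + 1])[0, y[i]]
    return p_in - p_out

scores = [privacy_score(i) for i in range(5)]  # score candidate records
```

A contributor could inspect such a score before agreeing to participate; records whose inclusion sharply shifts the model's confidence on themselves carry more membership-inference risk.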
Regulation Games for Trustworthy Machine Learning
Yaghini, Mohammad, Liu, Patty, Boenisch, Franziska, Papernot, Nicolas
Existing work on trustworthy machine learning (ML) often concentrates on individual aspects of trust, such as fairness or privacy. Additionally, many techniques overlook the distinction between those who train ML models and those responsible for assessing their trustworthiness. To address these issues, we propose a framework that views trustworthy ML as a multi-objective multi-agent optimization problem. This naturally lends itself to a game-theoretic formulation we call regulation games. We illustrate a particular game instance, the SpecGame, in which we model the relationship between an ML model builder and fairness and privacy regulators. Regulators wish to design penalties that enforce compliance with their specifications but do not want to discourage builders from participation. Seeking such socially optimal (i.e., efficient for all agents) solutions to the game, we introduce ParetoPlay. This novel equilibrium search algorithm ensures that agents remain on the Pareto frontier of their objectives and avoids the inefficiencies of other equilibria. Simulating SpecGame through ParetoPlay can provide policy guidance for ML regulation. For instance, we show that for a gender classification application, regulators can enforce a differential privacy budget that is on average 4.0 lower if they take the initiative to specify their desired guarantee first.
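The regulator–builder tension described above can be made concrete with a toy Stackelberg-style game. The utility functions below are made up for illustration; this is not the SpecGame or ParetoPlay formulation, which the abstract does not detail:

```python
import numpy as np

# Toy regulation game: the builder trades accuracy against a privacy
# budget eps; the regulator sets a penalty for exceeding a specified
# budget eps_spec. All functional forms here are invented.

def builder_best_response(penalty, eps_spec, eps_grid):
    # Builder utility: accuracy grows with eps, minus the regulator's fine.
    accuracy = 1.0 - np.exp(-eps_grid)
    fine = penalty * np.maximum(eps_grid - eps_spec, 0.0)
    return eps_grid[np.argmax(accuracy - fine)]

eps_grid = np.linspace(0.1, 10.0, 1000)
lenient = builder_best_response(penalty=0.01, eps_spec=1.0, eps_grid=eps_grid)
strict = builder_best_response(penalty=1.0, eps_spec=1.0, eps_grid=eps_grid)
# A stiffer penalty pushes the builder's chosen budget toward eps_spec.
```

Even in this caricature, the penalty schedule shapes the equilibrium: a lenient fine leaves the builder far above the specified budget, while a stiff one pulls the best response down to the specification, mirroring the incentive design the abstract studies.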
Copyright Protection in Generative AI: A Technical Perspective
Ren, Jie, Xu, Han, He, Pengfei, Cui, Yingqian, Zeng, Shenglai, Zhang, Jiankun, Wen, Hongzhi, Ding, Jiayuan, Liu, Hui, Chang, Yi, Tang, Jiliang
Generative AI has witnessed rapid advancement in recent years, expanding its capabilities to create synthesized content such as text, images, audio, and code. The high fidelity and authenticity of content generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns. There have been various legal debates on how to effectively safeguard copyrights in DGMs. This work delves into this issue by providing a comprehensive overview of copyright protection from a technical perspective. We examine it from two distinct viewpoints: the copyrights pertaining to the source data held by the data owners and those of the generative models maintained by the model builders. For data copyright, we delve into methods by which data owners can protect their content and by which DGMs can be utilized without infringing upon these rights. For model copyright, our discussion extends to strategies for preventing model theft and identifying outputs generated by specific models. Finally, we highlight the limitations of existing techniques and identify areas that remain unexplored. Furthermore, we discuss prospective directions for the future of copyright protection, underscoring its importance for the sustainable and ethical development of Generative AI.
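For the "identifying outputs generated by specific models" direction, one well-known family of techniques is statistical text watermarking, e.g., greenlist schemes in the style of Kirchenbauer et al. A minimal detection-side sketch, with a hypothetical hash-based greenlist (the survey covers many such techniques; this reproduces none of them exactly):

```python
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    # Deterministically assign ~half the vocabulary to a "greenlist"
    # seeded by the previous token (hypothetical partition rule).
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return h[0] % 2 == 0

def green_fraction(tokens):
    # Fraction of tokens that fall in the greenlist of their predecessor.
    pairs = list(zip(tokens, tokens[1:]))
    greens = [is_green(p, t) for p, t in pairs]
    return sum(greens) / len(greens)
```

A watermarking generator biases sampling toward green tokens, so its text shows a green fraction well above the roughly 0.5 expected by chance; a one-sided test on this fraction then attributes the text to the model.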
Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions
Chung, John Joon Young, Kamar, Ece, Amershi, Saleema
Large language models (LLMs) can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high diversity and accuracy in LLM-based text data generation. We first examine two approaches to diversify text generation: 1) logit suppression, which penalizes tokens that have already been generated frequently, and 2) temperature sampling, which flattens the token sampling probability distribution. We found that diversification approaches can increase data diversity, but often at the cost of data accuracy (i.e., text and labels being appropriate for the target domain). To address this issue, we examined two human interventions: 1) label replacement (LR), correcting misaligned labels, and 2) out-of-scope filtering (OOSF), removing instances that are out of the user's domain of interest or to which no considered label applies. With oracle studies, we found that LR increases the absolute accuracy of models trained with diversified datasets by 14.4%. Moreover, we found that some models trained with data generated with LR interventions outperformed LLM-based few-shot classification. In contrast, OOSF was not effective in increasing model accuracy, implying the need for future work in human-in-the-loop text data generation.
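The two diversification knobs can be sketched as follows. The logits are synthetic and the `sample_token` helper is hypothetical, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_token(logits, counts, temperature=1.0, suppression=0.0):
    # Logit suppression: penalize tokens in proportion to how often
    # they have already been generated.
    adjusted = logits - suppression * counts
    # Temperature sampling: T > 1 flattens the distribution.
    probs = np.exp(adjusted / temperature)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = np.array([3.0, 1.0, 0.5, 0.1])  # synthetic, strongly peaked
counts = np.zeros(4)
tokens = []
for _ in range(50):
    t = sample_token(logits, counts, temperature=1.3, suppression=0.5)
    counts[t] += 1
    tokens.append(t)
```

Without either knob the peaked logits would keep emitting the same top token; suppression and a higher temperature spread the draws across the vocabulary, which is exactly the diversity-versus-accuracy trade-off the abstract measures.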
Model Rollbacks Through Versioning
There's general consensus in the Machine Learning community that models can and have made biased decisions against traditionally marginalized groups. Ethical AI researchers from Dr. Cathy O'Neil to Dr. Joy Buolamwini have gone to great lengths to establish a pattern of faulty decision making, rooted in biased and unrepresentative data, that results in serious harms. Unfortunately, our "intelligent" learning algorithms are only as smart, capable, and ethical as we make them, and we are only at the beginning of understanding the long-term effects of biased models. Fortunately, there are many strategies already at our disposal that we can use to mitigate harms when they arise. Today, we will focus on a very powerful strategy: Model Rollbacks through Versioning.
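A minimal sketch of the rollback strategy, assuming a simple in-memory registry (the class and names are hypothetical, not taken from any specific MLOps tool):

```python
import copy

class ModelRegistry:
    """Keep every deployed model version so a harmful one can be rolled back."""

    def __init__(self):
        self._versions = []   # immutable snapshots of each registered model
        self._current = None  # index of the version currently being served

    def register(self, model):
        # Snapshot the model so later mutations can't corrupt the history.
        self._versions.append(copy.deepcopy(model))
        self._current = len(self._versions) - 1
        return self._current

    def current(self):
        return self._versions[self._current]

    def rollback(self, version):
        # Serve a previously vetted version while the harmful one is audited.
        if not 0 <= version < len(self._versions):
            raise ValueError("unknown version")
        self._current = version

registry = ModelRegistry()
v0 = registry.register({"name": "fair-v0"})
v1 = registry.register({"name": "biased-v1"})
registry.rollback(v0)  # bias detected in v1: serve v0 again
```

The key design choice is that versions are append-only snapshots: rolling back never destroys the offending model, so it stays available for auditing while users are protected.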
ML.NET Updates & Announcing Notebooks in Visual Studio
ML.NET is an open-source, cross-platform machine learning framework for .NET developers that enables integration of custom machine learning into .NET apps. Interactive notebooks are used extensively in data science and machine learning. They are great for data exploration and preparation, experimentation, model explainability, and even education. Last year, .NET Interactive Notebooks were announced, and you can currently use .NET Interactive Notebooks in VS Code as an extension. After talking to customers, the team decided to experiment with interactive notebooks in Visual Studio, which has resulted in the new Notebook Editor extension!
AWS Adds Explainability to SageMaker
Amazon Web Services is adding an automated machine learning tool to SageMaker, its machine learning model builder, that improves model accuracy via explainable AI. The new SageMaker feature, dubbed Autopilot, generates a model explainability report via SageMaker Clarify, the Amazon tool used to detect algorithmic bias while increasing the transparency of machine learning models. The reports help model developers understand how individual attributes of training data contribute to a predicted result. The combination is promoted as helping to identify and limit algorithmic bias and explain predictions, allowing users to make informed decisions based on how models arrived at conclusions, AWS said this week. The reports also include "feature importance values" that allow developers to understand, as a percentage, how much a training data attribute contributed to a predicted result.
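The idea of percentage feature-importance values can be illustrated with generic permutation importance on synthetic data. This is only a sketch of the concept; SageMaker Clarify itself computes SHAP values, which this does not reproduce:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data: feature 0 matters most, feature 1 a little, feature 2 not.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

model = LinearRegression().fit(X, y)
base = model.score(X, y)  # R^2 with all features intact

# Permutation importance: shuffle one column and measure the score drop.
drops = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    drops.append(max(base - model.score(Xp, y), 0.0))

# Normalize the drops into the percentage-style report described above.
pct = 100.0 * np.array(drops) / sum(drops)
```

Reporting importances as percentages of a common total is what lets a developer say "this attribute accounts for most of the prediction," which is the reading the article attributes to the Clarify reports.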
ML.NET Model Builder November Updates
ML.NET is an open-source, cross-platform machine learning framework for .NET developers. It enables integrating machine learning into your .NET apps without requiring you to leave the .NET ecosystem or even have a background in ML or data science. ML.NET provides tooling (the Model Builder UI in Visual Studio and the cross-platform ML.NET CLI) that automatically trains custom machine learning models for you based on your scenario and data. This release of ML.NET Model Builder brings numerous bug fixes and enhancements as well as new features, including advanced data loading options and streaming training data from SQL. Previously, Model Builder did not offer any data loading options, relying on AutoML to detect column purpose, header, and separator, as well as decimal separator style.
Tackling Bias and Explainability in Automated Machine Learning
Automated machine learning is likely to introduce two critical problems: bias and a lack of explainability. Fortunately, vendors are introducing tools to tackle both of them. Adoption of automated machine learning -- tools that help data scientists and business analysts (and even business users) automate the construction of machine learning models -- is expected to increase over the next few years because these tools simplify model building. For example, in some of the tools, all the user needs to do is specify the outcome or target variable of interest along with the attributes believed to be predictive. The automated machine learning (autoML) platform then picks the best model.
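That workflow, the user names a target and candidate features and the tool keeps the best model, can be sketched in a few lines with scikit-learn and synthetic data. Real autoML platforms also search preprocessing steps and hyperparameters, which this toy loop omits:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for "target variable plus predictive attributes"
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate model families the "platform" will try
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(random_state=0),
}

# Score each candidate by cross-validation and keep the winner
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
```

The bias and explainability problems the article raises live precisely in this loop: the winning model is chosen on aggregate accuracy alone, with nothing here examining subgroup behavior or explaining individual predictions.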