safety testing


AI industry pours millions into politics as lawsuits and feuds mount

The Guardian

A little over two years ago, OpenAI's co-founder and CEO, Sam Altman, stood in front of lawmakers at a congressional hearing and asked them for stronger regulations on artificial intelligence. The technology was "risky" and "could cause significant harm to the world", Altman said, calling for the creation of a new regulatory agency to address AI safety. Altman and the AI industry are promoting a very different message today. The AI they once framed as an existential threat to humanity is now, they argue, key to maintaining American prosperity and hegemony. Regulations once deemed a necessity are now criticized as a hindrance that will weaken the US and embolden its adversaries.


The Pentagon is gutting the team that tests AI and weapons systems

MIT Technology Review

The cuts are a significant overhaul of an office that, in its 40 years, has never before been placed so squarely on the chopping block. Here's how today's defense tech companies, which have fostered close connections to the Trump administration, stand to gain, and why safety testing might suffer as a result. The Operational Test and Evaluation office is "the last gate before a technology gets to the field," says Missy Cummings, a former US Navy fighter pilot who is now a professor of engineering and computer science at George Mason University. Though the military can conduct small experiments with new systems without running them by the office, it has to test anything that gets fielded at scale. "In a bipartisan way, up until now, everybody has seen it's working to help reduce waste, fraud, and abuse," she says.


SCALOFT: An Initial Approach for Situation Coverage-Based Safety Analysis of an Autonomous Aerial Drone in a Mine Environment

arXiv.org Artificial Intelligence

Ensuring the safety of autonomous systems in dynamic and hazardous environments poses significant challenges. This paper presents a testing approach named SCALOFT for systematically assessing the safety of an autonomous aerial drone in a mine. SCALOFT provides a framework for developing diverse test cases, monitoring system behaviour in real time, and detecting safety violations. Detected violations are logged with unique identifiers for detailed analysis and future improvement. SCALOFT helps build a safety argument by monitoring situation coverage and calculating a final coverage measure. We evaluated the approach by deliberately seeding faults into the system and assessing whether SCALOFT detects them. For a small set of plausible faults, we show that it does.
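The abstract does not define how SCALOFT computes its final coverage measure. One common way to think about situation coverage, used here purely as an assumption, is the fraction of all combinations of discretized situation variables that the tests have exercised. The Python sketch below tracks that fraction and logs each detected violation under a sequential unique identifier. The class name, the mine-related situation variables (`lighting`, `dust`, `obstacle`), and the ID format are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict, List, Optional, Sequence, Set, Tuple

Situation = Tuple[str, ...]


@dataclass
class SituationCoverageMonitor:
    """Tracks which discrete situations a test campaign has exercised (hypothetical sketch).

    `space` maps each assumed situation variable to its possible values.
    """
    space: Dict[str, Sequence[str]]
    seen: Set[Situation] = field(default_factory=set)
    violations: List[Tuple[str, Situation, str]] = field(default_factory=list)

    def _key(self, situation: Dict[str, str]) -> Situation:
        # Order values by the variable order in `space` so equal situations compare equal.
        key = tuple(situation[name] for name in self.space)
        for name, value in zip(self.space, key):
            if value not in self.space[name]:
                raise ValueError(f"unknown value {value!r} for {name!r}")
        return key

    def observe(self, situation: Dict[str, str], violation: Optional[str] = None) -> Optional[str]:
        """Record one monitored test step; log a violation under a unique ID if one occurred."""
        key = self._key(situation)
        self.seen.add(key)
        if violation is None:
            return None
        violation_id = f"V{len(self.violations) + 1:04d}"  # sequential, hence unique per run
        self.violations.append((violation_id, key, violation))
        return violation_id

    def coverage(self) -> float:
        """Fraction of all value combinations (the full situation space) seen so far."""
        total = prod(len(values) for values in self.space.values())
        return len(self.seen) / total


if __name__ == "__main__":
    monitor = SituationCoverageMonitor(
        {"lighting": ("dark", "lit"), "dust": ("clear", "dusty"), "obstacle": ("none", "worker")}
    )
    monitor.observe({"lighting": "dark", "dust": "clear", "obstacle": "none"})
    print(monitor.observe({"lighting": "dark", "dust": "dusty", "obstacle": "worker"},
                          violation="min_separation"))  # V0001
    monitor.observe({"lighting": "lit", "dust": "dusty", "obstacle": "worker"})
    print(monitor.coverage())  # 3 of 8 situations seen: 0.375
```

Counting every combination is the simplest definition; it grows exponentially with the number of variables, so a real campaign might instead target pairwise or other reduced combinations.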


Exclusive: Renowned Experts Pen Support for California's Landmark AI Safety Bill

TIME - Tech

On August 7, a group of renowned professors co-authored a letter urging key lawmakers to support a California AI bill as it enters the final stages of the state's legislative process. In a letter shared exclusively with TIME, Yoshua Bengio, Geoffrey Hinton, Lawrence Lessig, and Stuart Russell argue that the next generation of AI systems poses "severe risks" if "developed without sufficient care and oversight," and describe the bill as the "bare minimum for effective regulation of this technology." The bill (SB 1047), titled the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act, was introduced by California State Senator Scott Wiener in February of this year. It requires AI companies training large-scale models to conduct rigorous safety testing for potentially dangerous capabilities and to implement comprehensive safety measures to mitigate risks. "There are fewer regulations on AI systems that could pose catastrophic risks than on sandwich shops or hairdressers," the four experts write.


OpenAI delays launch of voice assistant, citing safety testing

Washington Post - Technology News

OpenAI first added the ability for ChatGPT to speak in one of several synthetic voices, or "personas," late last year. The demo in May used one of those voices to show off a newer, more capable AI system called GPT-4o, with the chatbot speaking in expressive tones, responding to a person's tone of voice and facial expressions, and holding more complex conversations. One of the voices, which OpenAI called Sky, resembles the voice Scarlett Johansson gave to an AI character in the 2013 movie "Her," about a lonely man who falls in love with his AI assistant.


California lawmakers are trying to regulate AI before it's too late. Here's how

Los Angeles Times

For four years, Jacob Hilton worked for one of the most influential startups in the Bay Area: OpenAI. His research helped test and improve the truthfulness of AI models such as ChatGPT. He believes artificial intelligence can benefit society, but he also recognizes the serious risks if the technology is left unchecked. Hilton was among 13 current and former OpenAI and Google employees who this month signed an open letter calling for more whistleblower protections and citing broad confidentiality agreements as problematic. "The basic situation is that employees, the people closest to the technology, they're also the ones with the most to lose from being retaliated against for speaking up," says Hilton, 33, a Berkeley resident who is now a researcher at the nonprofit Alignment Research Center.


An Approach to Systematic Data Acquisition and Data-Driven Simulation for the Safety Testing of Automated Driving Functions

arXiv.org Artificial Intelligence

With the growing complexity and criticality of automated driving functions in road traffic and of their operational design domains (ODDs), there is increasing demand to carry out significant portions of development, validation, and verification in virtual environments using simulation models. If, however, simulations are meant not only to augment real-world experiments but to replace them, quantitative approaches are required that measure to what degree, and under which preconditions, simulation models adequately represent reality, and thus to what extent their results can be relied on. In R&D areas related to the safety impact of the "open world", there is a significant shortage of real-world data to parameterize and/or validate simulations, particularly with respect to the behavior of human traffic participants, whom automated driving functions will meet in mixed traffic. We present an approach to systematically acquire data in public traffic by heterogeneous means, transform it into a unified representation, and use it to automatically parameterize traffic behavior models for use in data-driven virtual validation of automated driving functions.
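The abstract leaves open which behavior models are parameterized and how fidelity is measured. As a minimal sketch under stated assumptions, the Python code below treats one hypothetical behavior parameter (e.g. a driver's time headway) as lognormally distributed, fits it to observed samples by maximum likelihood, samples from the fitted model as a simulation would, and uses the two-sample Kolmogorov-Smirnov statistic as one possible quantitative measure of how closely simulated behavior matches the observed data. The lognormal choice, the KS metric, and all function names are illustrative assumptions, not the authors' method.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_lognormal(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood lognormal fit: mean and population std dev of log(samples)."""
    if len(samples) < 2 or any(x <= 0 for x in samples):
        raise ValueError("need at least two strictly positive samples")
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def simulate(mu: float, sigma: float, n: int, seed: int = 0) -> List[float]:
    """Draw n values from the parameterized behavior model (assumed lognormal)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every sample <= x in both lists so ties are handled correctly.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    # Stand-in for measured time headways (seconds); real data would come from traffic recordings.
    observed = simulate(0.5, 0.3, 2000, seed=1)
    mu, sigma = fit_lognormal(observed)
    synthetic = simulate(mu, sigma, 2000, seed=2)
    print(f"mu={mu:.3f} sigma={sigma:.3f} KS={ks_statistic(observed, synthetic):.3f}")
```

A small KS statistic only says the two samples agree for this one parameter; it does not establish the preconditions under which the model holds, which the abstract treats as a separate question.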


OpenAI and Other Tech Giants Will Have to Warn the US Government When They Start New AI Projects

WIRED

When OpenAI's ChatGPT took the world by storm last year, it caught many power brokers in both Silicon Valley and Washington, DC, by surprise. The US government should now get advance warning of future AI breakthroughs involving large language models, the technology behind ChatGPT. The Biden administration is preparing to use the Defense Production Act to compel tech companies to inform the government when they train an AI model using a significant amount of computing power. The rule could take effect as soon as next week. The new requirement will give the US government access to key information about some of the most sensitive projects inside OpenAI, Google, Amazon, and other tech companies competing in AI.


Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment

arXiv.org Artificial Intelligence

Abstract: The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, hindering large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression, which aims to reduce model size while maintaining high performance, plays a crucial role. However, the intrinsic defects in a big model may be inherited by the compressed one. Such defects may be easily exploited by adversaries, since a compressed model is usually deployed on a large number of devices without adequate protection. In this article, we address the safe model compression problem from the perspective of safety-performance co-optimization. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. Then, considering two representative and heterogeneous attack mechanisms, black-box membership inference attack and white-box membership inference attack, we develop two concrete instances called BMIA-SafeCompress and WMIA-SafeCompress. Further, we implement another instance called MMIA-SafeCompress by extending SafeCompress to defend against the case in which adversaries conduct black-box and white-box membership inference attacks simultaneously. We conduct extensive experiments on five datasets covering both computer vision and natural language processing tasks. The results show the effectiveness and generalizability of our framework. We also discuss how to adapt SafeCompress to attacks other than membership inference, demonstrating the flexibility of SafeCompress.

Currently, AI software, with DNNs as representatives, is recognized as an emerging type of software artifact (sometimes known as "software 2.0" [2]). The size of DNN-based AI software has increased rapidly in recent years (mostly because of a trained deep neural network …). For instance, a state-of-the-art computer vision model contains more than 15 billion parameters [3]. A recent natural language model, GPT-3, is even bigger, surpassing 175 billion parameters; this situation requires nearly 1TB of space just to store the model [4]. Model compression aims to compress a big DNN model into a smaller one given specific requirements, e.g., parameter … Rashly compressing a model may lead to severe degeneration in the AI software's task performance, such as classification accuracy. To balance memory storage and task performance, many compression approaches have been proposed and deployed [7], [8]. For example, Han et al. [8] prune AlexNet [1] and reduce its size by 9 times while losing only 0.01% accuracy in image …


Coverage-based Scene Fuzzing for Virtual Autonomous Driving Testing

arXiv.org Artificial Intelligence

Simulation-based virtual testing has become an essential step in ensuring the safety of autonomous driving systems. Testers need to handcraft virtual driving scenes and configure various environmental settings, such as surrounding traffic and weather conditions. Because the number of possible configurations is huge, manual effort is an inefficient way to detect flaws in industry-class autonomous driving systems. This paper proposes a coverage-driven fuzzing technique that automatically generates diverse configuration parameters to form new driving scenes. Experimental results show that our fuzzing method can significantly reduce the cost of deriving new risky scenes from the initial setup designed by testers. We expect that automated fuzzing will become a common practice in virtual testing of autonomous driving systems.
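The abstract does not specify which coverage criterion guides the fuzzer. Assuming, for illustration, that coverage is measured over a grid of discretized configuration parameters, a coverage-guided fuzzing loop can be sketched in Python as follows: mutate a configuration from the corpus, evaluate it (here with a toy `is_risky` oracle standing in for a simulator run), and keep the mutant in the corpus only if it reaches a grid cell not seen before. The parameter names, bin count, mutation scale, and oracle are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Config = Dict[str, float]
Cell = Tuple[int, ...]

# Hypothetical scene parameters, each normalized to [0, 1]; a real simulator exposes its own.
RANGES: Dict[str, Tuple[float, float]] = {
    "rain": (0.0, 1.0),
    "fog": (0.0, 1.0),
    "traffic_density": (0.0, 1.0),
}
BINS = 4  # each range is split into 4 equal-width bins, giving 4**3 = 64 cells


def coverage_cell(cfg: Config) -> Cell:
    """Map a configuration to the grid cell it falls in (the coverage unit assumed here)."""
    cell = []
    for name, (lo, hi) in RANGES.items():
        idx = int((cfg[name] - lo) / (hi - lo) * BINS)
        cell.append(min(idx, BINS - 1))  # the upper bound belongs to the last bin
    return tuple(cell)


def mutate(cfg: Config, rng: random.Random, scale: float = 0.5) -> Config:
    """Perturb one randomly chosen parameter, clamped to its range."""
    child = dict(cfg)
    name = rng.choice(list(RANGES))
    lo, hi = RANGES[name]
    child[name] = min(hi, max(lo, child[name] + rng.uniform(-scale, scale) * (hi - lo)))
    return child


def fuzz(seeds: List[Config], iterations: int,
         is_risky: Callable[[Config], bool], seed: int = 0):
    """Coverage-guided loop: every mutant is evaluated, but only new-coverage mutants are kept."""
    rng = random.Random(seed)
    corpus = list(seeds)
    covered: Set[Cell] = {coverage_cell(c) for c in corpus}
    risky = [c for c in corpus if is_risky(c)]
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        if is_risky(child):  # stand-in for executing the scene in a simulator
            risky.append(child)
        cell = coverage_cell(child)
        if cell not in covered:  # coverage feedback decides what joins the corpus
            covered.add(cell)
            corpus.append(child)
    return corpus, covered, risky


if __name__ == "__main__":
    start = [{"rain": 0.0, "fog": 0.0, "traffic_density": 0.2}]
    toy_oracle = lambda c: c["rain"] > 0.8 and c["fog"] > 0.8  # hypothetical "risky" rule
    corpus, covered, risky = fuzz(start, 1000, toy_oracle)
    print(f"{len(covered)}/{BINS ** len(RANGES)} cells covered, {len(risky)} risky scenes found")
```

Keeping only new-coverage mutants stops the corpus from filling with near-duplicate scenes; risky scenes, by contrast, are recorded whether or not they add coverage.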