 Law


The New York Times says OpenAI deleted evidence in its copyright lawsuit

Engadget

Astrophysicist Stephen Hawking told Last Week Tonight's John Oliver a chilling but memorable hypothetical story a decade ago about the potential dangers of AI. The gist is that a group of scientists build a superintelligent computer and ask it, "Is there a God?" The computer answers, "There is now," and a bolt of lightning zaps the plug, preventing it from being shut down. Let's hope that's not what happened with OpenAI and some missing evidence from the New York Times' copyright lawsuit. Wired reported that a court declaration filed by the New York Times on Wednesday says that OpenAI's engineers accidentally erased evidence of the AI's training data that took a long time to research and compile.


New York Times Says OpenAI Erased Potential Lawsuit Evidence

WIRED

This week, the Times alleged that OpenAI's engineers inadvertently erased data the paper's team spent more than 150 hours extracting as potential evidence. OpenAI was able to recover much of the data, but the Times' legal team says it's still missing the original file names and folder structure. According to a declaration filed to the court Wednesday by Jennifer B. Maisel, a lawyer for the newspaper, this means the information "cannot be used to determine where the news plaintiffs' copied articles" may have been incorporated into OpenAI's artificial intelligence models. "We disagree with the characterizations made and will file our response soon," OpenAI spokesperson Jason Deutrom told WIRED in a statement. The New York Times declined to comment.


Four ways to protect your art from AI

MIT Technology Review

Artists and writers have launched several lawsuits against AI companies, arguing that their work has been scraped into databases for training AI models without consent or compensation. Tech companies have responded that anything on the public internet falls under fair use. But it will be years until we have a legal resolution to the problem. Unfortunately, there is little you can do if your work has been scraped into a data set and used in a model that is already out there. You can, however, take steps to prevent your work from being used in the future.


Google must sell Chrome to end search monopoly, justice department argues in court filing

The Guardian

Alphabet's Google must sell its Chrome browser, share data and search results with competitors and take a range of other measures to end its monopoly on searching the internet, US prosecutors have argued to a judge. Such changes would essentially result in Google being highly regulated for 10 years, subjecting it to oversight by the same Washington federal court that ruled the company maintained an illegal monopoly in online search and related advertising. "Google's unlawful behaviour has deprived rivals not only of critical distribution channels but also distribution partners who could otherwise enable entry into these markets by competitors in new and innovative ways," the US Department of Justice (DoJ) said in a court filing. The court papers filed on Wednesday night expand on an earlier outline on how the US wants to end Google's monopoly. Google called the proposals radical at the time, saying they would harm US consumers and businesses and shake American competitiveness in artificial intelligence.


Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling

arXiv.org Artificial Intelligence

Predicting future international events from textual information, such as news articles, has tremendous potential for applications in global policy, strategic decision-making, and geopolitics. However, existing datasets available for this task are often limited in quality, hindering the progress of related research. In this paper, we introduce WORLDREP (WORLD Relationship and Event Prediction), a novel dataset designed to address these limitations by leveraging the advanced reasoning capabilities of large-language models (LLMs). Our dataset features high-quality scoring labels generated through advanced prompt modeling and rigorously validated by domain experts in political science. We showcase the quality and utility of WORLDREP for real-world event prediction tasks, demonstrating its effectiveness through extensive experiments and analysis. Furthermore, we publicly release our dataset along with the full automation source code for data collection, labeling, and benchmarking, aiming to support and advance research in text-based event prediction.


Single-Model Attribution for Spoofed Speech via Vocoder Fingerprints in an Open-World Setting

arXiv.org Artificial Intelligence

As speech generation technology advances, so do the potential threats of misusing spoofed speech signals. One way to address these threats is by attributing the signals to their source generative model. In this work, we are the first to tackle the single-model attribution task in an open-world setting, that is, we aim at identifying whether spoofed speech signals from unknown sources originate from a specific vocoder. We show that the standardized average residual between audio signals and their low-pass filtered or EnCodec filtered versions can serve as powerful vocoder fingerprints. The approach only requires data from the target vocoder and allows for simple but highly accurate distance-based model attribution. We demonstrate its effectiveness on LJSpeech and JSUT, achieving an average AUROC of over 99% in most settings. The accompanying robustness study shows that it is also resilient to noise levels up to a certain degree.


GPT versus Humans: Uncovering Ethical Concerns in Conversational Generative AI-empowered Multi-Robot Systems

arXiv.org Artificial Intelligence

The emergence of generative artificial intelligence (GAI) and large language models (LLMs) such as ChatGPT has enabled the realization of long-harbored desires in software and robotic development. The technology, however, has brought with it novel ethical challenges. These challenges are compounded by the application of LLMs in other machine learning systems, such as multi-robot systems. The objective of the study was to examine novel ethical issues arising from the application of LLMs in multi-robot systems. Unfolding ethical issues in GPT agent behavior (deliberation of ethical concerns) were observed, and GPT output was compared with that of human experts. The article also advances a model for ethical development of multi-robot systems. A qualitative workshop-based method was employed in three workshops for the collection of ethical concerns: two human expert workshops (N=16 participants) and one GPT-agent-based workshop (N=7 agents; two teams of 6 agents plus one judge). Thematic analysis was used to analyze the qualitative data. The results reveal differences between the human-produced and GPT-based ethical concerns. Human experts placed greater emphasis on new themes related to deviance, data privacy, bias and unethical corporate conduct. GPT agents emphasized concerns present in existing AI ethics guidelines. The study contributes to a growing body of knowledge in context-specific AI ethics and GPT application. It demonstrates the gap between human expert thinking and LLM output, while emphasizing new ethical concerns emerging in novel technology.


Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction

arXiv.org Artificial Intelligence

Training multimodal generative models on large, uncurated datasets can result in users being exposed to harmful, unsafe and controversial or culturally-inappropriate outputs. While model editing has been proposed to remove or filter undesirable concepts in embedding and latent spaces, it can inadvertently damage learned manifolds, distorting concepts in close semantic proximity. We identify limitations in current model editing techniques, showing that even benign, proximal concepts may become misaligned. To address the need for safe content generation, we propose a modular, dynamic solution that leverages safety-context embeddings and a dual reconstruction process using tunable weighted summation in the latent space to generate safer images. Our method preserves global context without compromising the structural integrity of the learned manifolds. We achieve state-of-the-art results on safe image generation benchmarks, while offering controllable variation of model safety. We identify trade-offs between safety and censorship, which presents a necessary perspective in the development of ethical AI models. We will release our code. Keywords: Text-to-Image Models, Generative AI, Safety, Reliability, Model Editing


Biden admin warns AI in schools may exhibit racial bias, anti-trans discrimination and trigger investigations

FOX News

On Tuesday, the Department of Education's Office for Civil Rights (OCR) released presidentially-mandated guidance that lays out how schools' use of artificial intelligence (AI) can be discriminatory toward minority and transgender students, "likely" opening them up to federal investigations. President Biden signed Executive Order 14110 last year mandating that the Education Department develop resources, policies and guidance regarding AI in schools to help ensure responsible and non-discriminatory use, "including the impact AI systems have on vulnerable and underserved communities." "The growing use of AI in schools, including for instructional and school safety purposes, and AI's ability to operate on a mass scale can create or contribute to discrimination," the Education Department's guidance states. "This resource provides information regarding federal civil rights laws in OCR's jurisdiction and includes examples of types of incidents that could, depending on the facts and circumstances, present OCR with sufficient reason to open an investigation."


Founder of company that created LAUSD chatbot charged with fraud

Los Angeles Times

The head of an education technology startup that created a highly touted chatbot for the Los Angeles school system has been arrested and charged with fraud. Federal prosecutors, in an indictment unsealed Tuesday, accused Joanna Smith-Griffin of defrauding investors and charged her with securities fraud, wire fraud and aggravated identity theft. Smith-Griffin, 33, is the founder and former chief executive of AllHere, the Boston-based company that created "Ed," an artificial-intelligence tool billed as revolutionary for students' education and the interaction between the L.A. Unified School District and the families it serves. After unveiling the chatbot with great fanfare in March, L.A. school officials, months later, quietly disconnected the tool -- which was supposed to respond to any question from students or parents in an accurate, helpful and private manner.