Generative AI
OpenAI deems its voice cloning tool too risky for general release
A new tool from OpenAI that can generate a convincing clone of anyone's voice using just 15 seconds of recorded audio has been deemed too risky for general release, as the AI lab seeks to minimise the threat of damaging misinformation in a global year of elections. Voice Engine was first developed in 2022 and an initial version was used for the text-to-speech feature built into ChatGPT, the organisation's leading AI tool. But its power has never been revealed publicly, in part because of the "cautious and informed" approach that OpenAI is taking to release it more widely. "We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities," OpenAI said in an unsigned blogpost. "Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale."
LAESI: Leaf Area Estimation with Synthetic Imagery
Kałużny, Jacek, Schreckenberg, Yannik, Cyganik, Karol, Annighöfer, Peter, Pirk, Sören, Michels, Dominik L., Cieslak, Mikolaj, Assaad-Gerbert, Farhah, Benes, Bedrich, Pałubicki, Wojciech
We introduce LAESI, a Synthetic Leaf Dataset of 100,000 synthetic leaf images on millimeter paper, each with semantic masks and surface area labels. This dataset provides a resource for leaf morphology analysis primarily aimed at beech and oak leaves. We evaluate the applicability of the dataset by training machine learning models for leaf surface area prediction and semantic segmentation, using real images for validation. Our validation shows that these models can be trained to predict leaf surface area with a relative error not greater than an average human annotator. LAESI also provides an efficient framework based on 3D procedural models and generative AI for the large-scale, controllable generation of data with potential further applications in agriculture and biology. We evaluate the inclusion of generative AI in our procedural data generation pipeline and show how data filtering based on annotation consistency results in datasets which allow training the highest performing vision models.
Rapid Mobile App Development for Generative AI Agents on MIT App Inventor
Gao, Jaida, Su, Calab, Miller, Etai, Lu, Kevin, Meng, Yu
The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.
Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI
Dzuong, Jocelyn, Wang, Zichong, Zhang, Wenbin
In the rapidly evolving landscape of generative artificial intelligence (AI), the increasingly pertinent issue of copyright infringement arises as AI advances to generate content from scraped copyrighted data, prompting questions about ownership and protection that impact professionals across various careers. With this in mind, this survey provides an extensive examination of copyright infringement as it pertains to generative AI, aiming to stay abreast of the latest developments and open problems. Specifically, it will first outline methods of detecting copyright infringement in mediums such as text, image, and video. Next, it will delve an exploration of existing techniques aimed at safeguarding copyrighted works from generative models. Furthermore, this survey will discuss resources and tools for users to evaluate copyright violations. Finally, insights into ongoing regulations and proposals for AI will be explored and compared. Through combining these disciplines, the implications of AI-driven content and copyright are thoroughly illustrated and brought into question.
OpenAI Can Re-Create Human Voices--but Won't Release the Tech Yet
Voice synthesis has come a long way since 1978's Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can create not only realistic-sounding voices but can also convincingly imitate existing voices using small samples of audio. Along those lines, OpenAI this week announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. It has provided audio samples of the Voice Engine in action on its website. This story originally appeared on Ars Technica, a trusted source for technology news, tech policy analysis, reviews, and more.
Microsoft Copilot has reportedly been blocked on all Congress-owned devices
The publication said it obtained a memo from House Chief Administrative Officer Catherine Szpindor, telling Congress personnel that the AI chatbot is now officially prohibited. Apparently, the Office of Cybersecurity has deemed Copilot to be a risk "due to the threat of leaking House data to non-House approved cloud services." While there's nothing stopping them from using Copilot on their own phones and laptops, it will now be blocked on all Windows devices owned by the Congress. Almost a year ago, the Congress also set a strict limit on the use of ChatGPT, which is powered by OpenAI's large language models, just like Copilot. It banned staffers from using the chatbot's free version on House computers, but it allowed them to continue using the paid (ChatGPT Plus) version for research and evaluation due to its tighter privacy controls.
OpenAI previews new audio tool that can read text and mimic voices
OpenAI is sharing early results from a test for a feature that can read words aloud in a convincing human voice -- highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks. The company is sharing early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said. OpenAI decided against a wider rollout of the feature, which it briefed reporters on earlier this month. A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators and creatives. The company had initially planned to release the tool to as many as 100 developers through an application process, according to the earlier press briefing.
MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions
Yang, Shu, Ali, Muhammad Asif, Yu, Lu, Hu, Lijie, Wang, Di
The increasing significance of large models and their multi-modal variants in societal information processing has ignited debates on social safety and ethics. However, there exists a paucity of comprehensive analysis for: (i) the interactions between human and artificial intelligence systems, and (ii) understanding and addressing the associated limitations. To bridge this gap, we propose Model Autophagy Analysis (MONAL) for large models' self-consumption explanation. MONAL employs two distinct autophagous loops (referred to as ``self-consumption loops'') to elucidate the suppression of human-generated information in the exchange between human and AI systems. Through comprehensive experiments on diverse datasets, we evaluate the capacities of generated models as both creators and disseminators of information. Our key findings reveal (i) A progressive prevalence of model-generated synthetic information over time within training datasets compared to human-generated information; (ii) The discernible tendency of large models, when acting as information transmitters across multiple iterations, to selectively modify or prioritize specific contents; and (iii) The potential for a reduction in the diversity of socially or human-generated information, leading to bottlenecks in the performance enhancement of large models and confining them to local optima.
Generative AI for Architectural Design: A Literature Review
Li, Chengyuan, Zhang, Tianyu, Du, Xusheng, Zhang, Ye, Xie, Haoran
Generative Artificial Intelligence (AI) has pioneered new methodological paradigms in architectural design, significantly expanding the innovative potential and efficiency of the design process. This paper explores the extensive applications of generative AI technologies in architectural design, a trend that has benefited from the rapid development of deep generative models. This article provides a comprehensive review of the basic principles of generative AI and large-scale models and highlights the applications in the generation of 2D images, videos, and 3D models. In addition, by reviewing the latest literature from 2020, this paper scrutinizes the impact of generative AI technologies at different stages of architectural design, from generating initial architectural 3D forms to producing final architectural imagery. The marked trend of research growth indicates an increasing inclination within the architectural design community towards embracing generative AI, thereby catalyzing a shared enthusiasm for research. These research cases and methodologies have not only proven to enhance efficiency and innovation significantly but have also posed challenges to the conventional boundaries of architectural creativity. Finally, we point out new directions for design innovation and articulate fresh trajectories for applying generative AI in the architectural domain. This article provides the first comprehensive literature review about generative AI for architectural design, and we believe this work can facilitate more research work on this significant topic in architecture.
OpenAI says it can clone a voice from just 15 seconds of audio
OpenAI just announced that it recently conducted a small-scale preview of a new tool called Voice Engine. This is a voice cloning technology that can mimic any speaker by analyzing a 15-second audio sample. The company says it generates "natural-sounding speech" with "emotive and realistic voices." The technology is based on the company's pre-existing text-to-speech API and it has been in the works since 2022. OpenAI has already been using a version of the toolset to power the preset voices available in the current text-to-speech API and the Read Aloud feature. There are a bunch of samples on the company's official blog and they sound eerily close to the real thing.