AITopics | Generative AI

Generative AI

OpenAI deems its voice cloning tool too risky for general release

The Guardian | Mar-31-2024, 16:53:48 GMT

A new tool from OpenAI that can generate a convincing clone of anyone's voice using just 15 seconds of recorded audio has been deemed too risky for general release, as the AI lab seeks to minimise the threat of damaging misinformation in a global year of elections. Voice Engine was first developed in 2022 and an initial version was used for the text-to-speech feature built into ChatGPT, the organisation's leading AI tool. But its power has never been revealed publicly, in part because of the "cautious and informed" approach that OpenAI is taking to release it more widely. "We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities," OpenAI said in an unsigned blogpost. "Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale."

clone, general release, openai, (2 more...)

The Guardian

Country: North America > United States > Rhode Island (0.06)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

LAESI: Leaf Area Estimation with Synthetic Imagery

Kałużny, Jacek, Schreckenberg, Yannik, Cyganik, Karol, Annighöfer, Peter, Pirk, Sören, Michels, Dominik L., Cieslak, Mikolaj, Assaad-Gerbert, Farhah, Benes, Bedrich, Pałubicki, Wojciech

arXiv.org Artificial Intelligence | Mar-31-2024

We introduce LAESI, a Synthetic Leaf Dataset of 100,000 synthetic leaf images on millimeter paper, each with semantic masks and surface area labels. This dataset provides a resource for leaf morphology analysis primarily aimed at beech and oak leaves. We evaluate the applicability of the dataset by training machine learning models for leaf surface area prediction and semantic segmentation, using real images for validation. Our validation shows that these models can be trained to predict leaf surface area with a relative error not greater than an average human annotator. LAESI also provides an efficient framework based on 3D procedural models and generative AI for the large-scale, controllable generation of data with potential further applications in agriculture and biology. We evaluate the inclusion of generative AI in our procedural data generation pipeline and show how data filtering based on annotation consistency results in datasets which allow training the highest performing vision models.

dataset, experiment, synthetic data, (16 more...)

arXiv.org Artificial Intelligence

2404.00593

Country:

Asia > Middle East > Jordan (0.04)
North America > United States (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.82)

Industry: Food & Agriculture > Agriculture (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Rapid Mobile App Development for Generative AI Agents on MIT App Inventor

Gao, Jaida, Su, Calab, Miller, Etai, Lu, Kevin, Meng, Yu

arXiv.org Artificial Intelligence | Mar-31-2024

The evolution of Artificial Intelligence (AI) stands as a pivotal force shaping our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.

application, information science and technology 2, mit app inventor, (9 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.10899798

2405.01561

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.48)
Information Technology (0.47)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.91)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.83)

Uncertain Boundaries: Multidisciplinary Approaches to Copyright Issues in Generative AI

Dzuong, Jocelyn, Wang, Zichong, Zhang, Wenbin

arXiv.org Artificial Intelligence | Mar-31-2024

In the rapidly evolving landscape of generative artificial intelligence (AI), the increasingly pertinent issue of copyright infringement arises as AI advances to generate content from scraped copyrighted data, prompting questions about ownership and protection that impact professionals across various careers. With this in mind, this survey provides an extensive examination of copyright infringement as it pertains to generative AI, aiming to stay abreast of the latest developments and open problems. Specifically, it will first outline methods of detecting copyright infringement in mediums such as text, image, and video. Next, it will delve into existing techniques aimed at safeguarding copyrighted works from generative models. Furthermore, this survey will discuss resources and tools for users to evaluate copyright violations. Finally, insights into ongoing regulations and proposals for AI will be explored and compared. Through combining these disciplines, the implications of AI-driven content and copyright are thoroughly illustrated and brought into question.

international conference, wenbin zhang, zhang, (15 more...)

arXiv.org Artificial Intelligence

2404.08221

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

OpenAI Can Re-Create Human Voices--but Won't Release the Tech Yet

WIRED | Mar-30-2024, 17:30:00 GMT

Voice synthesis has come a long way since 1978's Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can create not only realistic-sounding voices but can also convincingly imitate existing voices using small samples of audio. Along those lines, OpenAI this week announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. It has provided audio samples of the Voice Engine in action on its website. This story originally appeared on Ars Technica, a trusted source for technology news, tech policy analysis, reviews, and more.

openai, re-create human voice, voice engine, (3 more...)

WIRED

Country: North America > United States > Ohio (0.06)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Microsoft Copilot has reportedly been blocked on all Congress-owned devices

Engadget | Mar-30-2024, 03:49:46 GMT

The publication said it obtained a memo from House Chief Administrative Officer Catherine Szpindor, telling Congress personnel that the AI chatbot is now officially prohibited. Apparently, the Office of Cybersecurity has deemed Copilot to be a risk "due to the threat of leaking House data to non-House approved cloud services." While there's nothing stopping them from using Copilot on their own phones and laptops, it will now be blocked on all Windows devices owned by the Congress. Almost a year ago, the Congress also set a strict limit on the use of ChatGPT, which is powered by OpenAI's large language models, just like Copilot. It banned staffers from using the chatbot's free version on House computers, but it allowed them to continue using the paid (ChatGPT Plus) version for research and evaluation due to its tighter privacy controls.

congress-owned device, copilot, microsoft copilot, (4 more...)

Engadget

Country: North America > United States (0.20)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.44)

OpenAI previews new audio tool that can read text and mimic voices

The Japan Times | Mar-30-2024, 00:23:00 GMT

OpenAI is sharing early results from a test for a feature that can read words aloud in a convincing human voice -- highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks. The company is sharing early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said. OpenAI decided against a wider rollout of the feature, which it briefed reporters on earlier this month. A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators and creatives. The company had initially planned to release the tool to as many as 100 developers through an application process, according to the earlier press briefing.

openai preview new audio tool, read text and mimic voice, spokesperson

The Japan Times

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions

Yang, Shu, Ali, Muhammad Asif, Yu, Lu, Hu, Lijie, Wang, Di

arXiv.org Artificial Intelligence | Mar-30-2024

The increasing significance of large models and their multi-modal variants in societal information processing has ignited debates on social safety and ethics. However, there exists a paucity of comprehensive analysis for: (i) the interactions between human and artificial intelligence systems, and (ii) understanding and addressing the associated limitations. To bridge this gap, we propose Model Autophagy Analysis (MONAL) for large models' self-consumption explanation. MONAL employs two distinct autophagous loops (referred to as "self-consumption loops") to elucidate the suppression of human-generated information in the exchange between human and AI systems. Through comprehensive experiments on diverse datasets, we evaluate the capacities of generated models as both creators and disseminators of information. Our key findings reveal (i) a progressive prevalence of model-generated synthetic information over time within training datasets compared to human-generated information; (ii) the discernible tendency of large models, when acting as information transmitters across multiple iterations, to selectively modify or prioritize specific contents; and (iii) the potential for a reduction in the diversity of socially or human-generated information, leading to bottlenecks in the performance enhancement of large models and confining them to local optima.

experiment, information, language model, (16 more...)

arXiv.org Artificial Intelligence

2402.11271

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Generative AI for Architectural Design: A Literature Review

Li, Chengyuan, Zhang, Tianyu, Du, Xusheng, Zhang, Ye, Xie, Haoran

arXiv.org Artificial Intelligence | Mar-30-2024

Generative Artificial Intelligence (AI) has pioneered new methodological paradigms in architectural design, significantly expanding the innovative potential and efficiency of the design process. This paper explores the extensive applications of generative AI technologies in architectural design, a trend that has benefited from the rapid development of deep generative models. This article provides a comprehensive review of the basic principles of generative AI and large-scale models and highlights the applications in the generation of 2D images, videos, and 3D models. In addition, by reviewing the latest literature from 2020, this paper scrutinizes the impact of generative AI technologies at different stages of architectural design, from generating initial architectural 3D forms to producing final architectural imagery. The marked trend of research growth indicates an increasing inclination within the architectural design community towards embracing generative AI, thereby catalyzing a shared enthusiasm for research. These research cases and methodologies have not only proven to enhance efficiency and innovation significantly but have also posed challenges to the conventional boundaries of architectural creativity. Finally, we point out new directions for design innovation and articulate fresh trajectories for applying generative AI in the architectural domain. This article provides the first comprehensive literature review about generative AI for architectural design, and we believe this work can facilitate more research work on this significant topic in architecture.

design process, generative ai, str, (17 more...)

arXiv.org Artificial Intelligence

2404.01335

Country:

Oceania > New Zealand > North Island > Wellington Region > Wellington (0.04)
Europe > Serbia (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry: Construction & Engineering (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

OpenAI says it can clone a voice from just 15 seconds of audio

Engadget | Mar-29-2024, 19:03:56 GMT

OpenAI just announced that it recently conducted a small-scale preview of a new tool called Voice Engine. This is a voice cloning technology that can mimic any speaker by analyzing a 15-second audio sample. The company says it generates "natural-sounding speech" with "emotive and realistic voices." The technology is based on the company's pre-existing text-to-speech API and it has been in the works since 2022. OpenAI has already been using a version of the toolset to power the preset voices available in the current text-to-speech API and the Read Aloud feature. There are a bunch of samples on the company's official blog and they sound eerily close to the real thing.

clone, just 15, openai, (3 more...)

Engadget

Industry: Information Technology > Security & Privacy (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.95)
