Generative AI
When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI
Xing, Xiaodan, Shi, Fadong, Huang, Jiahao, Wu, Yinzhe, Nan, Yang, Zhang, Sheng, Fang, Yingying, Roberts, Mike, Schönlieb, Carola-Bibiane, Del Ser, Javier, Yang, Guang
Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data. This trend portends a future where generative AI systems may increasingly rely blindly on consuming self-generated data, raising concerns about model performance and ethical issues. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? There is a significant gap in the scientific literature regarding the impact of synthetic data use in generative AI, particularly in terms of the fusion of multimodal information. To address this research gap, this review investigates the consequences of integrating synthetic data blindly on training generative AI on both image and text modalities and explores strategies to mitigate these effects. The goal is to offer a comprehensive view of synthetic data's role, advocating for a balanced approach to its use and exploring practices that promote the sustainable development of generative AI technologies in the era of large models.
Intelligent Tutor: Leveraging ChatGPT and Microsoft Copilot Studio to Deliver a Generative AI Student Support and Feedback System within Teams
This study explores the integration of the ChatGPT API with the GPT-4 model and Microsoft Copilot Studio on the Microsoft Teams platform to develop an intelligent tutoring system. Designed to provide instant support to students, the system dynamically adjusts educational content in response to the learners' progress and feedback. Utilizing advancements in natural language processing and machine learning, it interprets student inquiries, offers tailored feedback, and facilitates the educational journey. Initial implementation highlights the system's potential in boosting students' motivation and engagement, while equipping educators with critical insights into the learning process, thus promoting tailored educational experiences and enhancing instructional effectiveness.
The Horseshoe Theory of Google Search
Earlier today, Google presented a new vision for its flagship search engine, one that is uniquely tailored to the generative-AI moment. With advanced technology at its disposal, "Google will do the Googling for you," Liz Reid, the company's head of search, declared onstage at the company's annual software conference. Googling something rarely yields an immediate, definitive answer. You enter a query, confront a wall of blue links, open a zillion tabs, and wade through them to find the most relevant information. If that doesn't work, you refine the search and start again.
Engadget Podcast: The good, the bad and the AI of Google I/O 2024
We just wrapped up coverage on Google's I/O 2024 keynote, and we're just so tired of hearing about AI. While some of the announcements seem potentially useful, it's still tough to tell if the move towards AI will actually help consumers, or if Google is just fighting to stay ahead of OpenAI. Listen below or subscribe on your podcast app of choice. If you've got suggestions or topics you'd like covered on the show, be sure to email us or drop a note in the comments! And be sure to check out our other podcast, Engadget News!
Google rolls out AI-generated, summarized search results in US
Google will use artificial intelligence to return summarized responses to search engine queries from US users as it continues to infuse generative AI into its most widely used products. The company has been testing "AI Overviews", summaries created by its Gemini AI model that appear at the top of search results alongside the traditional link-based results. The feature has also been tested in the UK but will be rolled out across the US beginning on Tuesday, Google announced at its annual I/O developer conference Tuesday in California. Google Search head Liz Reid said AI Overviews would become available to "more than a billion people" by the end of the year. Google also announced a text-to-video artificial intelligence model called Veo, allowing for the creation of computer-generated footage based only on written prompts.
OpenAI's new GPT-4o model offers promise of improved smartphone assistants
In the year and a half since the launch of ChatGPT, one nagging question has only got more pressing: if AI can do this, why is my phone's assistant still so bad? On Monday, the gulf grew larger still, as OpenAI announced a new model called GPT-4o – the 'o' stands for Omni – which gives the chatbot new abilities to understand and create audio, video, and still images. The system is uncanny to behold. It can engage in prolonged conversations about the world seen through a camera lens, carry out live translation between two different languages, and even laugh at appropriate points. The shine will inevitably wear off after users find the shortcomings in the system, but its creators are more confident than ever.
Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram
Min, Aehong, Wang, Xuan, Correia, Rion Brattig, Rozum, Jordan, Miller, Wendy R., Rocha, Luis M.
We used a dictionary built from biomedical terminology extracted from various sources, such as DrugBank, MedDRA, MedlinePlus, and TCMGeneDIT, to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once, between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary thus produced leads to a significantly different ranking of important terms, as measured by their eigenvector centrality in the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators in this task.
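To make the eigenvector-centrality comparison above more concrete, here is a minimal sketch and not the authors' pipeline. It assumes a knowledge network can be stored as a symmetric weighted adjacency dictionary and computes centrality by plain power iteration. The term names, edge weights, and the choice of which term to remove are all hypothetical placeholders, not values from the study.

```python
from math import sqrt


def build_network(edges):
    """Turn (term, term, weight) triples into a symmetric adjacency dict."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Power iteration for the dominant eigenvector of a non-negative network."""
    nodes = sorted(adj)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Iterating with A + I (adding the previous score) keeps the same
        # dominant eigenvector but avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(w * score[m] for m, w in adj[n].items())
               for n in nodes}
        norm = sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


def rank_terms(adj):
    """Terms ordered from most to least central."""
    scores = eigenvector_centrality(adj)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical co-occurrence weights between placeholder dictionary terms.
original_edges = [
    ("term_a", "term_b", 3.0),
    ("term_a", "term_c", 1.0),
    ("term_a", "term_d", 1.0),
    ("term_b", "term_d", 1.0),
]

# Assumption for illustration: term_a was judged ambiguous and removed.
removed = {"term_a"}
refined_edges = [e for e in original_edges
                 if e[0] not in removed and e[1] not in removed]

original_ranking = rank_terms(build_network(original_edges))
refined_ranking = rank_terms(build_network(refined_edges))
print("original:", original_ranking)
print("refined: ", refined_ranking)
```

In this toy network, removing the most central term also drops term_c, whose only link was to the removed term, and leaves the two remaining terms tied. Even a single removal can therefore reshuffle the ranking, which is the kind of effect the abstract reports at scale.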
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset
Hou, Yang, Fu, Haitao, Chen, Chuankai, Li, Zida, Zhang, Haoyu, Zhao, Jianjun
With the rapid advancement of generative AI, multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern. Currently, deepfake detection has emerged as a crucial strategy in countering these growing threats. However, although datasets are a key factor in training and validating deepfake detectors, most existing deepfake datasets primarily focus on the visual modality, and the few that are multimodal employ outdated techniques and limit their audio content to a single language, thereby failing to represent the cutting-edge advancements and globalization trends in current deepfake technologies. To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake. It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies. We conduct comprehensive experiments using state-of-the-art detection methods on the PolyGlotFake dataset. These experiments demonstrate the dataset's significant challenges and its practical value in advancing research into multimodal deepfake detection.
UnMarker: A Universal Attack on Defensive Watermarking
Kassis, Andre, Hengartner, Urs
Reports regarding the misuse of Generative AI (GenAI) to create harmful deepfakes are emerging daily. Recently, defensive watermarking, which enables GenAI providers to hide fingerprints in their images to later use for deepfake detection, has been on the rise. Yet, its potential has not been fully explored. We present UnMarker -- the first practical universal attack on defensive watermarking. Unlike existing attacks, UnMarker requires no detector feedback, no unrealistic knowledge of the scheme or similar models, and no advanced denoising pipelines that may not be available. Instead, being the product of an in-depth analysis of the watermarking paradigm revealing that robust schemes must construct their watermarks in the spectral amplitudes, UnMarker employs two novel adversarial optimizations to disrupt the spectra of watermarked images, erasing the watermarks. Evaluations against the SOTA prove its effectiveness, not only defeating traditional schemes while retaining superior quality compared to existing attacks but also breaking semantic watermarks that alter the image's structure, reducing the best detection rate to 43% and rendering them useless. To our knowledge, UnMarker is the first practical attack on semantic watermarks, which have been deemed the future of robust watermarking. UnMarker casts doubts on the very potential of this countermeasure and exposes its paradoxical nature, as designing schemes for robustness inevitably compromises other robustness aspects.
ChatGPT got an upgrade to make it seem more human
OpenAI's latest model offers a more human-like conversational experience. OpenAI announced its newest artificial intelligence model, called GPT-4o, which will soon power some versions of the company's ChatGPT product. The upgraded ChatGPT can swiftly respond to text, audio and video inputs from its real-time conversational partner – all while speaking with inflections and wording that convey a strong sense of emotion and personality. The company demonstrated the emotional mimicry of the new voice mode during a supposedly live OpenAI presentation, featuring both the ChatGPT mobile app and a new desktop app, on 13 May. Speaking in a female-sounding voice and responding to the name ChatGPT, the new AI's conversational capabilities seemed more akin to the personable AI voiced by Scarlett Johansson in the 2013 science fiction film Her than to the more canned and robotic responses of typical voice assistant technologies. "The new GPT-4o voice-to-voice interaction more closely parallels human-human interaction," says Michelle Cohn at the University of California, Davis.