 Generative AI


Humans Aren't Mentally Ready for an AI-Saturated 'Post-Truth World'

WIRED

Artificial intelligence is arguably the most rapidly advancing technology humans have ever developed. A year ago you wouldn't often hear AI come up in a regular conversation, but today it seems there's constant talk about how generative AI tools like ChatGPT and DALL-E will affect the future of work, the spread of information, and more. A major question that has thus far been almost entirely unexamined is how this AI-dominated future will affect people's minds. There's been some research into how using AI in their jobs will affect people mentally, but there isn't yet an understanding of how simply living amongst so much AI-generated content and systems will affect people's sense of the world. How is AI going to change individuals and society in the not-too-distant future?


News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking

arXiv.org Artificial Intelligence

This study aimed to evaluate the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard (LaMDA), and Microsoft's Bing AI, in discerning the truthfulness of news items using black box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, and Partially True/False. The effectiveness of the LLMs was gauged based on the accuracy of their classifications against the verified facts provided by the independent agencies. The results showed a moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's ChatGPT 4.0 stood out with a score of 71, suggesting an edge in newer LLMs' abilities to differentiate fact from deception. However, when juxtaposed against the performance of human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in the domain of fact-checking while underscoring the continued importance of human cognitive skills and the necessity for persistent advancements in AI capabilities. Finally, the experimental data produced from the simulation of this work is openly available on Kaggle.
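The scoring described in this abstract can be read as plain classification accuracy scaled to 100. The following is a minimal sketch under that assumption; the function names, model names, and toy labels are hypothetical and are not taken from the study or its Kaggle data.

```python
# Assumption: a model's "score" is the share of items whose predicted label
# matches the fact-checkers' verified label, scaled to 100.
LABELS = ("True", "False", "Partially True/False")


def accuracy_score(predicted, verified):
    """Return the percentage of predictions that match the verified labels."""
    if not verified or len(predicted) != len(verified):
        raise ValueError("predicted and verified must be non-empty and equal in length")
    for label in list(predicted) + list(verified):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    correct = sum(p == v for p, v in zip(predicted, verified))
    return 100.0 * correct / len(verified)


def mean_score(scores):
    """Average the per-model scores."""
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: four news items and two made-up models.
verified = ["True", "False", "Partially True/False", "False"]
model_outputs = {
    "model_a": ["True", "False", "False", "False"],
    "model_b": ["True", "True", "Partially True/False", "True"],
}

scores = {name: accuracy_score(preds, verified) for name, preds in model_outputs.items()}
average = mean_score(scores)
```

With 100 items per model, as in the study, the score would equal the raw count of correct classifications.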


UniSG^GA: A 3D scenegraph powered by Geometric Algebra unifying geometry, behavior and GNNs towards generative AI

arXiv.org Artificial Intelligence

This work introduces UniSG^GA, a novel integrated scenegraph structure that incorporates behavior and geometry data on a 3D scene. It is specifically designed to seamlessly integrate Graph Neural Networks (GNNs) and address the challenges associated with transforming a 3D scenegraph (3D-SG) during generative tasks. To effectively capture and preserve the topological relationships between objects in a simplified way within the graph representation, we propose UniSG^GA, which seamlessly integrates Geometric Algebra (GA) forms. This novel approach enhances the overall performance and capability of GNNs in handling generative and predictive tasks, opening up new possibilities and aiming to lay the foundation for further exploration and development of graph-based generative AI models that can effectively incorporate behavior data for enhanced scene generation and synthesis.


Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era

arXiv.org Artificial Intelligence

With various AI tools such as ChatGPT becoming increasingly popular, we are entering a true AI era. We can foresee that exceptional AI tools will soon reap considerable profits. A crucial question arises: should AI tools share revenue with their training data providers in addition to traditional stakeholders and shareholders? The answer is Yes. Large AI tools, such as large language models, always require more and better quality data to continuously improve, but current copyright laws limit their access to various types of data. Sharing revenue between AI tools and their data providers could transform the current hostile zero-sum game relationship between AI tools and a majority of copyrighted data owners into a collaborative and mutually beneficial one, which is necessary to facilitate the development of a virtuous cycle among AI tools, their users and data providers that drives forward AI technology and builds a healthy AI ecosystem. However, current revenue-sharing business models do not work for AI tools in the forthcoming AI era, since the most widely used metrics for website-based traffic and action, such as clicks, will be replaced by new metrics such as prompts and cost per prompt for generative AI tools. A completely new revenue-sharing business model, which must be almost independent of AI tools and be easily explained to data providers, needs to establish a prompt-based scoring system to measure data engagement of each data provider. This paper systematically discusses how to build such a scoring system for all data providers for AI tools based on classification and content similarity models, and outlines the requirements for AI tools or third parties to build it. Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program. This will be a utilitarian AI era where all parties benefit.
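One way to picture a prompt-based scoring system is to score each data provider by how similar its content is to incoming prompts, then split a revenue pool in proportion to those scores. The sketch below is an illustration only: it uses bag-of-words cosine similarity as a stand-in for the content similarity models mentioned in the abstract, omits the classification step, and every name, provider, and prompt in it is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def engagement_scores(prompts, provider_docs):
    """Sum, per provider, the similarity between its content and each prompt."""
    vectors = {name: bag_of_words(doc) for name, doc in provider_docs.items()}
    totals = {name: 0.0 for name in provider_docs}
    for prompt in prompts:
        prompt_vec = bag_of_words(prompt)
        for name, doc_vec in vectors.items():
            totals[name] += cosine_similarity(prompt_vec, doc_vec)
    return totals


def revenue_shares(scores, pool):
    """Split a revenue pool in proportion to the engagement scores."""
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: pool * score / total for name, score in scores.items()}


# Hypothetical providers and prompts, for illustration only.
providers = {
    "news_site": "election results and policy news",
    "recipe_site": "pasta recipes and cooking tips",
}
prompts = ["latest election news", "easy pasta recipes"]

scores = engagement_scores(prompts, providers)
shares = revenue_shares(scores, pool=100.0)
```

Because the scoring depends only on prompts and provider content, a third party could in principle compute it without access to the AI tool's internals, which matches the abstract's requirement that the model be almost independent of the tool.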


Congress is racing to regulate AI. Silicon Valley is eager to teach them how.

Washington Post - Technology News

Other industry leaders are taking a different tack, blitzing Congress with their vision for how Washington should regulate their companies. Altman in May had private meetings and a dinner with lawmakers, where he demonstrated -- to their amusement -- how ChatGPT could write a speech for them to deliver on the chamber floor. Smith has given legislators a lesson on the technical stack that underpins generative AI models like ChatGPT, including computing infrastructure and applications. And Smith recently unveiled his blueprint for AI regulation at a speech in Washington attended by half a dozen lawmakers.


Meta's Voicebox AI is a Dall-E for text-to-speech

Engadget

Today, we are one step closer to the immortal celebrity future we have long been promised (since April). Meta has unveiled Voicebox, its generative text-to-speech model that promises to do for the spoken word what ChatGPT and Dall-E, respectively, did for text and image generation. Essentially, it's a text-to-output generator just like GPT or Dall-E -- just instead of creating prose or pretty pictures, it spits out audio clips. Meta defines the system as "a non-autoregressive flow-matching model trained to infill speech, given audio context and text." It's been trained on more than 50,000 hours of unfiltered audio.


Anyone can Photoshop now, thanks to AI's latest leap

Washington Post - Technology News

But this AI leap means anyone can pull off at least a goofy Photoshop job now. The same AI tool can both transform pictures into joyful fun and be used to manipulate or even exploit. And that adds new urgency to a question first raised by Photoshop more than 30 years ago: How much longer will we be able to trust what we see? For a glimpse of what's coming to your photos -- both the ones you see and the ones you take -- we've been testing a beta version of Photoshop with its new AI function called "generative fill." It's a response by Photoshop's maker Adobe to a flurry of new AI image-creation tools that threaten to make it redundant, including DALL-E 2, Midjourney and Stable Diffusion.


Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback

arXiv.org Artificial Intelligence

Generative models capable of capturing nuanced clinical features in medical images hold great promise for facilitating clinical data sharing, enhancing rare disease datasets, and efficiently synthesizing annotated medical images at scale. Despite their potential, assessing the quality of synthetic medical images remains a challenge. While modern generative models can synthesize visually-realistic medical images, the clinical validity of these images may be called into question. Domain-agnostic scores, such as FID score, precision, and recall, cannot incorporate clinical knowledge and are, therefore, not suitable for assessing clinical sensibility. Additionally, there are numerous unpredictable ways in which generative models may fail to synthesize clinically plausible images, making it challenging to anticipate potential failures and manually design scores for their detection. To address these challenges, this paper introduces a pathologist-in-the-loop framework for generating clinically-plausible synthetic medical images. Starting with a diffusion model pretrained using real images, our framework comprises three steps: (1) evaluating the generated images by expert pathologists to assess whether they satisfy clinical desiderata, (2) training a reward model that predicts the pathologist feedback on new samples, and (3) incorporating expert knowledge into the diffusion model by using the reward model to inform a finetuning objective. We show that human feedback significantly improves the quality of synthetic images in terms of fidelity, diversity, utility in downstream applications, and plausibility as evaluated by experts.
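Steps (2) and (3) of the framework can be illustrated with a toy reward model and a reward-weighted loss. This is a sketch under strong assumptions, not the paper's implementation: the reward model here is a one-feature logistic regression, the reward-weighted average is just one possible way a reward could inform a finetuning objective, and all names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_reward_model(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression that predicts expert feedback (1 = plausible)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict_reward(model, x):
    """Predicted probability that an expert would accept the sample."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def reward_weighted_loss(per_sample_losses, rewards):
    """Assumed objective: average of per-sample losses weighted by predicted reward."""
    return sum(r * l for r, l in zip(rewards, per_sample_losses)) / len(per_sample_losses)


# Hypothetical data: one made-up feature per generated image, with
# hypothetical pathologist labels (1 = clinically plausible, 0 = not).
features = [[0.9], [0.8], [0.2], [0.1]]
labels = [1, 1, 0, 0]

reward_model = train_reward_model(features, labels)
high = predict_reward(reward_model, [0.85])
low = predict_reward(reward_model, [0.15])
```

In a real pipeline the features would come from the generated images themselves and the per-sample losses from the diffusion model's training objective; both are left abstract here.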


Structured Thoughts Automaton: First Formalized Execution Model for Auto-Regressive Language Models

arXiv.org Artificial Intelligence

In recent months, Language Models (LMs) have become a part of daily discourse, with a focus on OpenAI and the potential of Artificial General Intelligence (AGI). Furthermore, the leaking of LLaMA's weights to the public has led to an influx of innovations demonstrating the impressive capabilities of generative LMs. While we believe that AGI is still a distant goal, we recognize the potential of LMs in solving tasks such as searching complex documents, compiling reports with basic analysis, and providing assistance in problem-solving. In this paper, we propose formalizing the execution model of language models. We investigate current execution models, to find that this formalism has received little attention, and present our contribution: the first formalized execution model for LMs. We introduce a new algorithm for sampling the predictions of LMs, which we use to build a reliable and inspectable execution model. We introduce a low-level language to write "cognitive programs" for this execution model. We hope to shed light on the need for execution models for LMs and encourage further research in this area.


Drag-guided diffusion models for vehicle image generation

arXiv.org Artificial Intelligence

Denoising diffusion models trained at web-scale have revolutionized image generation. The application of these tools to engineering design is an intriguing possibility, but is currently limited by their inability to parse and enforce concrete engineering constraints. In this paper, we take a step towards this goal by proposing physics-based guidance, which enables optimization of a performance metric (as predicted by a surrogate model) during the generation process. As a proof-of-concept, we add drag guidance to Stable Diffusion, which allows this tool to generate images of novel vehicles while simultaneously minimizing their predicted drag coefficients.
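The idea of guidance from a surrogate model can be shown on a deliberately tiny example: at each generation step, the sample is nudged along the negative gradient of the predicted performance metric. The sketch below is a one-dimensional, noise-free toy and not Stable Diffusion; the surrogate drag function, the denoising rule, and all constants are hypothetical.

```python
def surrogate_drag(x):
    """Hypothetical surrogate: predicted drag coefficient of design x."""
    return 0.3 + 0.1 * (x - 2.0) ** 2


def surrogate_drag_grad(x):
    """Analytic gradient of the hypothetical surrogate."""
    return 0.2 * (x - 2.0)


def denoise_step(x):
    """Toy stand-in for a denoising update: pull x toward the data mean at 0."""
    return x + 0.1 * (0.0 - x)


def guided_step(x, guidance_scale):
    """Denoising update plus a step down the surrogate's drag gradient."""
    return denoise_step(x) - guidance_scale * surrogate_drag_grad(x)


def generate(x0, steps, guidance_scale):
    """Run the toy generation loop from the starting point x0."""
    x = x0
    for _ in range(steps):
        x = guided_step(x, guidance_scale)
    return x


unguided = generate(5.0, steps=50, guidance_scale=0.0)
guided = generate(5.0, steps=50, guidance_scale=0.5)
```

In this toy, the guided sample settles between the data mean and the surrogate's minimum, so it trades some fidelity to the data for lower predicted drag; the guidance scale controls that trade-off.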