 Generative AI


This Is the Next Smartphone Evolution

The Atlantic - Technology

Earlier today, OpenAI announced its newest product: GPT-4o, a faster, cheaper, more powerful version of its most advanced large language model, and one that the company has deliberately positioned as the next step in "natural human-computer interaction." Running on an iPhone in what was purportedly a live demo, the program appeared able to tell a bedtime story with dramatic intonation, understand what it was "seeing" through the device's camera, and interpret a conversation between Italian and English speakers. The model--which was powering an updated version of the ChatGPT app--even exhibited something like emotion: Shown the sentence I ♥ ChatGPT handwritten on a page, it responded, "That's so sweet of you!" Although such features are not exactly new to generative AI, seeing them bundled into a single app on an iPhone was striking. Watching the presentation, I felt that I was witnessing the murder of Siri, along with that entire generation of smartphone voice assistants, at the hands of a company most people had not heard of just two years ago.



New GPT-4o AI model is faster and free for all users, OpenAI announces

The Guardian

OpenAI announced on Monday that it was launching its new flagship artificial intelligence model, called GPT-4o, as well as updates that included a new desktop service and advances in its voice assistant capabilities. Chief technology officer, Mira Murati, appeared on stage to a cheering crowd in the OpenAI offices, touting the new model as a step forward in AI. The new model will bring the faster, more accurate GPT-4 AI model to free users, where it was previously reserved for paid customers. "We're looking at the future of interaction between ourselves and the machines," Murati said. "We think GPT-4o is really shifting that paradigm."


OpenAI claims that its free GPT-4o model can talk, laugh, sing and see like a human

Engadget

OpenAI on Monday announced GPT-4o, a brand new AI model that the company says is one step closer to "much more natural human-computer interaction." The new model accepts any combination of text, audio and images as input and can generate an output in all three formats. It's also capable of recognizing emotion, lets you interrupt it mid-speech, and responds nearly as fast as a human being during conversations. "The special thing about GPT-4o is it brings GPT-4 level intelligence to everyone, including our free users," said OpenAI CTO Mira Murati during a live-streamed presentation. "This is the first time we're making a huge step forward when it comes to ease of use."


OpenAI's GPT-4o Model Gives ChatGPT a Snappy, Flirty Upgrade

WIRED

Since it launched in late 2022, OpenAI's ChatGPT has generally fended off suggestions that it has emotions or desires by responding that it's just an artificial intelligence model. Upgrades announced by OpenAI Monday showed the company apparently trying to make the chatbot act more like a human. In demos, the new version of ChatGPT was capable of rapid-fire, natural voice conversations, picked up on emotional cues, and displayed simulated emotional reactions of its own. During a livestream from the company's headquarters in San Francisco Monday, Mira Murati, OpenAI's chief technology officer, announced that ChatGPT will be powered by a new, more powerful AI model called GPT-4o. The model will be available to both free and paid users of ChatGPT via a new desktop app as well as the existing mobile app and web version.


Generative AI Doesn't Make Hardware Less Hard

WIRED

After years of development, startup Humane launched a $700 wearable in early April that leans heavily on artificial intelligence. The original pitch for the Ai Pin was that you no longer need to juggle different apps; its operating system can "search for the right AI at the right moment," allowing it to play music, translate languages, and even tell you how much protein is in a palmful of almonds. And because it doesn't have a traditional display, the Ai Pin was supposed to be a tiny tincture for the disease of screentime; smartphones were on their way out. The pin has been panned. WIRED's Julian Chokkattu scored the Ai Pin a 4 out of 10. Popular YouTuber Marques Brownlee complimented the device's hardware design but still called it "The Worst Product I've Ever Reviewed … For Now." The company has since massaged the message that it's meant to replace your phone.


Divergent Creativity in Humans and Large Language Models

arXiv.org Artificial Intelligence

Creativity is a multifaceted construct at the crossroads of individual expression, problem solving, and innovation. Human creativity is pivotal in shaping cultures and has undergone continuous transformation across historical epochs. Our understanding of this ability is now influencing the landscape of artificial intelligence and cognitive systems (1-5). In the past few years, the advent of sophisticated Large Language Models (LLMs) has spurred considerable interest in evaluating their capabilities and apparent human-like traits (6), particularly in terms of their impacts on human creative processes (7, 8). However, the so-called creative abilities of modern LLMs have yet to be systematically evaluated and compared to humans on benchmarking tasks that are suitable for both. Although the ability to generate novel and aesthetically pleasing artifacts has long been considered a uniquely human attribute, this view has been challenged by the recent advances in generative AI. This technological progress has ignited discussions surrounding the creative capabilities of machines (9-12), ushering in the emerging field of computational creativity--a multidisciplinary domain that explores the potential of artificial systems to exhibit creativity in a manner analogous to human cognition. The release of GPT-4 was marked by an exceptional gain in performance across various standardized benchmarks (13). Demonstrating its versatility in language- and vision-based tasks, GPT-4 has successfully passed a uniform bar examination, the SAT, and multiple AP exams, transcending the boundaries of traditional AI capabilities.


Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data

arXiv.org Artificial Intelligence

The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data. However, FL struggles with a significant performance reduction and poor convergence when confronted with Non-Independent and Identically Distributed (Non-IID) data distributions among participating clients. While previous efforts, such as client drift mitigation and advanced server-side model fusion techniques, have shown some success in addressing this challenge, they often overlook the root cause of the performance reduction - the absence of identical data accurately mirroring the global data distribution among clients. In this paper, we introduce Gen-FedSD, a novel approach that harnesses the powerful capability of state-of-the-art text-to-image foundation models to bridge the significant Non-IID performance gaps in FL. In Gen-FedSD, each client constructs textual prompts for each class label and leverages an off-the-shelf state-of-the-art pre-trained Stable Diffusion model to synthesize high-quality data samples. The generated synthetic data is tailored to each client's unique local data gaps and distribution disparities, effectively making the final augmented local data IID. Through extensive experimentation, we demonstrate that Gen-FedSD achieves state-of-the-art performance and significant communication cost savings across various datasets and Non-IID settings.
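The per-client step described in this abstract (build a text prompt per class label, then generate enough samples to fill local gaps) might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the function name `synthetic_plan`, the prompt template, and the choice to top every class up to the size of the largest local class are all assumptions, and the actual Stable Diffusion call is left out.

```python
from collections import Counter


def synthetic_plan(local_labels, class_names, target_per_class=None):
    """Plan how many synthetic samples a client should generate per class.

    Returns {class_name: (prompt, n_to_generate)} for every class whose
    local count is below the target. The target defaults to the size of
    the client's largest class (an assumption, not from the paper).
    """
    counts = Counter(local_labels)  # missing classes count as 0
    if target_per_class is None:
        target_per_class = max((counts[c] for c in class_names), default=0)

    plan = {}
    for name in class_names:
        deficit = max(0, target_per_class - counts[name])
        if deficit:
            # Hypothetical prompt template; the abstract does not specify one.
            plan[name] = (f"a photo of a {name}", deficit)
    return plan


# Toy client holding 5 cats, 1 dog, and no birds.
plan = synthetic_plan(["cat"] * 5 + ["dog"], ["cat", "dog", "bird"])
for name, (prompt, n) in plan.items():
    print(f"{name}: generate {n} images with prompt {prompt!r}")
```

Each `(prompt, n)` pair would then be handed to whatever text-to-image pipeline the client runs locally.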


Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

arXiv.org Artificial Intelligence

With the rise of text-to-image (T2I) generative AI models reaching wide audiences, it is critical to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on "implicitly adversarial" prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover. To this end, we built the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing a diverse set of implicitly adversarial prompts. We have assembled a suite of state-of-the-art T2I models, employed a simple user interface to identify and annotate harms, and engaged diverse populations to capture long-tail safety issues that may be overlooked in standard testing. The challenge is run in consecutive rounds to enable a sustained discovery and analysis of safety pitfalls in T2I models. In this paper, we present an in-depth account of our methodology, a systematic study of novel attack strategies and discussion of safety failures revealed by challenge participants. We also release a companion visualization tool for easy exploration and derivation of insights from the dataset. The first challenge round resulted in over 10k prompt-image pairs with machine annotations for safety. A subset of 1.5k samples contains rich human annotations of harm types and attack styles. We find that 14% of images that humans consider harmful are mislabeled as "safe" by machines. We have identified new attack strategies that highlight the complexity of ensuring T2I model robustness. Our findings emphasize the necessity of continual auditing and adaptation as new vulnerabilities emerge. We are confident that this work will enable proactive, iterative safety assessments and promote responsible development of T2I models.
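The 14% figure is the share of human-flagged harmful images that the automated classifier passed as safe. A minimal sketch of how such a rate could be computed from paired labels follows; the function name and the toy labels are hypothetical and are not drawn from the Adversarial Nibbler dataset.

```python
def missed_harm_rate(human_harmful, machine_safe):
    """Fraction of human-flagged harmful items that the machine labeled safe.

    Both arguments are equal-length sequences of booleans, one entry per image.
    """
    if len(human_harmful) != len(machine_safe):
        raise ValueError("label lists must have the same length")

    # Keep the machine verdict only for images humans considered harmful.
    verdicts = [safe for harm, safe in zip(human_harmful, machine_safe) if harm]
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


# Toy labels (illustrative only): four harmful images, one slips through.
human = [True, True, True, True, False]
machine = [True, False, False, False, True]
print(missed_harm_rate(human, machine))
```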


SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

arXiv.org Artificial Intelligence

Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them. In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU) - a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.