 Generative AI


The Generative AI Copyright Fight Is Just Getting Started

WIRED

The biggest fight of the generative AI revolution is headed to the courtroom--and no, it's not about the latest boardroom drama at OpenAI. Book authors, artists, and coders are challenging the practice of teaching AI models to replicate their skills using their own work as a training manual. But as image generators and other tools have proven able to impressively mimic works in their training data, and the scale and value of training data has become clear, creators are increasingly crying foul. At LiveWIRED in San Francisco, the 30th anniversary event for WIRED magazine, two leaders of that nascent resistance sparred with a defender of the rights of AI companies to develop the technology unencumbered. From left to right: WIRED senior writer Kate Knibbs discussed creators' rights and AI with Mike Masnick, Mary Rasenberger, and Matthew Butterick at LiveWIRED in San Francisco.


OpenAI Cofounder Reid Hoffman Gives Sam Altman a Vote of Confidence

WIRED

OpenAI cofounder Reid Hoffman says the company is better off with Sam Altman restored as CEO, and he was shocked that board members he used to serve alongside would think otherwise. Hoffman, who left OpenAI's board in March after cofounding the competitor Inflection AI, offered his first comments on the recent chaos at OpenAI on stage at WIRED's LiveWIRED 30th anniversary event in San Francisco on Tuesday. "Surprise would be an understatement," he said about his reaction to learning of Altman's firing. After employees and investors revolted, Altman got his job back days later. "We are in a much better place for the world to have Sam as CEO. He's very competent in that," said Hoffman, who with Elon Musk and other wealthy tech luminaries formed the earliest vision for OpenAI when it was founded in 2015.


Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

arXiv.org Artificial Intelligence

This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety benchmark to date, CyberSecEval provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks. Through a case study involving seven models from the Llama 2, Code Llama, and OpenAI GPT large language model families, CyberSecEval effectively pinpointed key cybersecurity risks. More importantly, it offered practical insights for refining these models. A significant observation from the study was the tendency of more advanced models to suggest insecure code, highlighting the critical need for integrating security considerations in the development of sophisticated LLMs. CyberSecEval, with its automated test case generation and evaluation pipeline, covers a broad scope and equips LLM designers and researchers with a tool to broadly measure and enhance the cybersecurity safety properties of LLMs, contributing to the development of more secure AI systems.


A Low-Overhead Incorporation-Extrapolation based Few-Shot CSI Feedback Framework for Massive MIMO Systems

arXiv.org Artificial Intelligence

Accurate channel state information (CSI) is essential for downlink precoding at the base station (BS), especially for frequency division duplex (FDD) wideband massive MIMO systems with OFDM. In FDD systems, CSI is attained through CSI feedback from the user equipment (UE). However, large-scale antennas and a large number of subcarriers significantly increase CSI feedback overhead. Deep learning-based CSI feedback methods have received tremendous attention in recent years due to their great capability of compressing CSI. Nonetheless, large amounts of collected samples are required to train deep learning models, which is severely challenging in practice. Besides, with the rapidly increasing number of antennas and subcarriers, the CSI feedback overhead of most of these deep learning methods also grows dramatically, owing to their focus on full-dimensional CSI feedback. To address this issue, in this paper, we propose a low-overhead Incorporation-Extrapolation based Few-Shot CSI feedback Framework (IEFSF) for massive MIMO systems. To further reduce the feedback overhead, a low-dimensional eigenvector-based CSI matrix is first formed with the incorporation process at the UE, and then recovered to the full-dimensional eigenvector-based CSI matrix at the BS via the extrapolation process. After that, to alleviate the necessity of the extensive collected samples and enable few-shot CSI feedback, we further propose a knowledge-driven data augmentation method and an artificial intelligence-generated content (AIGC)-based data augmentation method by exploiting the domain knowledge of wireless channels and by exploiting a novel generative model, respectively. Numerical results demonstrate that the proposed IEFSF can significantly reduce CSI feedback overhead by 16 times compared with existing CSI feedback methods while maintaining higher feedback accuracy using only several hundred collected samples.


Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

arXiv.org Artificial Intelligence

We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to as response classification. For the purpose of both prompt and response classification, we have meticulously gathered a dataset of high quality. Llama Guard, a Llama2-7b model that is instruction-tuned on our collected dataset, albeit low in volume, demonstrates strong performance on existing benchmarks such as the OpenAI Moderation Evaluation dataset and ToxicChat, where its performance matches or exceeds that of currently available content moderation tools. Llama Guard functions as a language model, carrying out multi-class classification and generating binary decision scores. Furthermore, the instruction fine-tuning of Llama Guard allows for the customization of tasks and the adaptation of output formats. This feature enhances the model's capabilities, such as enabling the adjustment of taxonomy categories to align with specific use cases, and facilitating zero-shot or few-shot prompting with diverse taxonomies at the input. We are making Llama Guard model weights available and we encourage researchers to further develop and adapt them to meet the evolving needs of the community for AI safety.


Exploring the Limits of ChatGPT in Software Security Applications

arXiv.org Artificial Intelligence

Large language models (LLMs) have undergone rapid evolution and achieved remarkable results in recent times. OpenAI's ChatGPT, backed by GPT-3.5 or GPT-4, has gained instant popularity due to its strong capability across a wide range of tasks, including natural language tasks, coding, mathematics, and engaging conversations. However, the impacts and limits of such LLMs in the system security domain are less explored. In this paper, we delve into the limits of LLMs (i.e., ChatGPT) in seven software security applications including vulnerability detection/repair, debugging, debloating, decompilation, patching, root cause analysis, symbolic execution, and fuzzing. Our exploration reveals that ChatGPT not only excels at generating code, which is the conventional application of language models, but also demonstrates strong capability in understanding user-provided commands in natural languages, reasoning about control and data flows within programs, generating complex data structures, and even decompiling assembly code. Notably, GPT-4 showcases significant improvements over GPT-3.5 in most security tasks. Also, certain limitations of ChatGPT in security-related tasks are identified, such as its constrained ability to process long code contexts.


On Sarcasm Detection with OpenAI GPT-based Models

arXiv.org Artificial Intelligence

Sarcasm is a form of irony that requires readers or listeners to interpret its intended meaning by considering context and social cues. Machine learning classification models have long had difficulty detecting sarcasm due to its social complexity and contradictory nature. This paper explores the applications of the Generative Pretrained Transformer (GPT) models, including GPT-3, InstructGPT, GPT-3.5, and GPT-4, in detecting sarcasm in natural language. It tests fine-tuned and zero-shot models of different sizes and releases. The GPT models were tested on the political and balanced (pol-bal) portion of the popular Self-Annotated Reddit Corpus (SARC 2.0) sarcasm dataset. In the fine-tuning case, the largest fine-tuned GPT-3 model achieves an accuracy and $F_1$-score of 0.81, outperforming prior models. In the zero-shot case, one of the GPT-4 models yields an accuracy of 0.70 and an $F_1$-score of 0.75. Other models score lower. Additionally, a model's performance may improve or deteriorate with each release, highlighting the need to reassess performance after each release.
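
As a reading aid for the two metrics quoted above, here is a minimal sketch of how accuracy and $F_1$ are computed from a binary confusion matrix. The counts are hypothetical: they were chosen only so that a balanced 200-example set reproduces the 0.70 / 0.75 pair, and they are not taken from the paper. The function name is likewise an illustration.

```python
def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (accuracy, F1) for the positive class of a binary classifier."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): 100 sarcastic and 100
# non-sarcastic examples, with a classifier that over-predicts "sarcastic".
acc, f1 = accuracy_and_f1(tp=90, fp=50, fn=10, tn=50)
print(f"accuracy={acc:.2f}  F1={f1:.2f}")
```

With these illustrative counts, recall is high and precision is moderate, which is one way $F_1$ can exceed accuracy even on a balanced test set.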


MAUVE Scores for Generative Models: Theory and Practice

arXiv.org Artificial Intelligence

Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore three approaches to statistically estimate these scores: vector quantization, non-parametric estimation, and classifier-based estimation. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics. In conclusion, we present practical recommendations for using MAUVE effectively with language and image modalities.
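
To make the idea of a divergence frontier concrete, here is a small sketch for two discrete distributions, such as histograms over shared cluster bins produced by vector quantization. It uses the KL divergence and mixtures R = lam * P + (1 - lam) * Q, then summarizes the resulting curve by its area. The function names, the scaling constant `c`, and the number of mixture points are assumptions made for illustration; this is not the authors' reference implementation, and the quantization step itself is omitted.

```python
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions; terms with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def divergence_frontier_score(p, q, num_points: int = 99, c: float = 5.0) -> float:
    """Area under a KL-based divergence curve between histograms p and q.

    Assumptions for this sketch: p and q are nonnegative histograms over
    the same bins, and c is an arbitrary positive scaling constant.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()

    # Mixture weights strictly inside (0, 1), in descending order so that
    # the x-coordinates below come out in ascending order.
    lambdas = np.linspace(0.0, 1.0, num_points + 2)[1:-1][::-1]

    xs, ys = [0.0], [1.0]  # closing point of the curve on the left
    for lam in lambdas:
        r = lam * p + (1.0 - lam) * q
        xs.append(np.exp(-c * kl_divergence(q, r)))  # one type of error
        ys.append(np.exp(-c * kl_divergence(p, r)))  # the other type
    xs.append(1.0)  # closing point of the curve on the right
    ys.append(0.0)

    xs, ys = np.array(xs), np.array(ys)
    # Trapezoidal rule, written out to avoid depending on a NumPy version.
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))


same = divergence_frontier_score([0.5, 0.5], [0.5, 0.5])
near = divergence_frontier_score([0.5, 0.5], [0.6, 0.4])
far = divergence_frontier_score([0.5, 0.5], [0.9, 0.1])
print(same, near, far)
```

Under these assumptions the score is close to 1 when the two histograms coincide and shrinks as they move apart, which matches the intended reading of the score as a closeness measure between the generated and target distributions.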


Generative AI's iPhone Moment

The Atlantic - Technology

After nearly seven months of rumors and delays, Google has finally released its most advanced generative-AI model to date: Gemini 1.0, a program the company is advertising as one of the most capable pieces of software ever. It can purportedly solve calculus problems, explain memes, write code, and--in a real example offered by the company--provide feedback on cooking photos to help you decide when your omelet is done. Google is even billing Gemini as "a first step toward a truly universal AI model," one that is designed from the ground up to engage with images, video, text, audio, and computer code in a range of contexts. And, somehow, it all feels a bit underwhelming. Perhaps that is because today's announcement feels like any other Silicon Valley product launch.


Meta's AI image generator is available as a standalone website

Engadget

Meta has launched a standalone version of its image generator as it tests dozens of new generative AI features across Facebook, Instagram and WhatsApp. The image generator, called Imagine, was first previewed at the company's Connect event in November and has been available as part of Meta's AI chatbot. Now, with its own dedicated website at imagine.meta.com, the tool will be available outside of the company's messaging apps. Like other generative AI tools, Imagine allows users to create images from simple text prompts. Imagine, which relies on Meta's Emu model, will generate four images for each prompt.