Generative AI
Sarah Silverman's copyright infringement suit against OpenAI will advance in pared-down form
Sarah Silverman's lawsuit against OpenAI will advance with some of her legal team's claims dismissed. The comedian sued OpenAI and Meta in July 2023, claiming they trained their AI models on her books and other work without consent. US District Judge Araceli Martínez-Olguín threw out portions of the complaint from Silverman's legal team on Monday, including negligence, unjust enrichment, DMCA violations and accusations of vicarious infringement. Bloomberg reported on Tuesday that the unfair competition portion of the lawsuit will proceed. Martínez-Olguín gave the plaintiffs until March 13 to amend the suit.
ChatGPT is getting a digital memory to recall your past conversations
One of the big drawbacks of talking to an AI chatbot is that everything resets once the conversation is done. It won't remember who you are or what you previously queried. This is by design, for privacy reasons, but it really hampers the tech from growing into a true digital assistant that knows you well enough to actually help with stuff. OpenAI is trying to fix this issue and is finally adding a memory feature to ChatGPT. This will allow the bot to remember important personal details from prior conversations and apply that context to current queries.
OpenAI Gives ChatGPT a Memory
The promise and peril of the internet has always been a memory greater than our own, a permanent recall of information and events that our brains can't store. More recently, tech companies have promised that virtual assistants and chatbots could handle some of the mnemonic load, by both remembering and reminding. That's what OpenAI's latest release is supposed to provide. The company is starting to roll out long-term memory in ChatGPT, a function that maintains a memory of who you are, how you work, and what you like to chat about. Called simply Memory, it's an AI personalization feature that turbocharges the "custom instructions" tool OpenAI released last July.
Why Big Tech's watermarking plans are some welcome good news
On February 6, Meta said it was going to label AI-generated images on Facebook, Instagram, and Threads. When someone uses Meta's AI tools to create images, the company will add visible markers to the image, as well as invisible watermarks and metadata in the image file. The company says its standards are in line with best practices laid out by the Partnership on AI, an AI research nonprofit. Big Tech is also throwing its weight behind a promising technical standard that could add a "nutrition label" to images, video, and audio. Called C2PA, it's an open-source internet protocol that relies on cryptography to encode details about the origins of a piece of content, or what technologists refer to as "provenance" information.
Large Language Models for the Automated Analysis of Optimization Algorithms
Sartori, Camilo Chacón, Blum, Christian, Ochoa, Gabriela
The ability of Large Language Models (LLMs) to generate high-quality text and code has fuelled their rise in popularity. In this paper, we aim to demonstrate the potential of LLMs within the realm of optimization algorithms by integrating them into STNWeb. This is a web-based tool for the generation of Search Trajectory Networks (STNs), which are visualizations of optimization algorithm behavior. Although visualizations produced by STNWeb can be very informative for algorithm designers, they often require a certain level of prior knowledge to be interpreted. In an attempt to bridge this knowledge gap, we have incorporated LLMs, specifically GPT-4, into STNWeb to produce extensive written reports, complemented by automatically generated plots, thereby enhancing the user experience and reducing the barriers to the adoption of this tool by the research community. Moreover, our approach can be expanded to other tools from the optimization community, showcasing the versatility and potential of LLMs in this field.
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
The advent of generative artificial intelligence and the widespread adoption of it in society engendered intensive debates about its ethical implications and risks. These risks often differ from those associated with traditional discriminative machine learning. To synthesize the recent discourse and map its normative concepts, we conducted a scoping review on the ethics of generative artificial intelligence, including especially large language models and text-to-image models. Our analysis provides a taxonomy of 378 normative issues in 19 topic areas and ranks them according to their prevalence in the literature. The study offers a comprehensive overview for scholars, practitioners, or policymakers, condensing the ethical debates surrounding fairness, safety, harmful content, hallucinations, privacy, interaction risks, security, alignment, societal impacts, and others. We discuss the results, evaluate imbalances in the literature, and explore unsubstantiated risk scenarios.
Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy
Barabucci, Gioele, Shia, Victor, Chu, Eugene, Harack, Benjamin, Fu, Nathan
Background: Large language models (LLMs) such as OpenAI's GPT-4 or Google's PaLM 2 are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults". However, even LLMs specifically trained on medical topics may lack sufficient diagnostic accuracy for real-life applications. Methods: Using collective intelligence methods and a dataset of 200 clinical vignettes of real-life cases, we assessed and compared the accuracy of differential diagnoses obtained by asking individual commercial LLMs (OpenAI GPT-4, Google PaLM 2, Cohere Command, Meta Llama 2) against the accuracy of differential diagnoses synthesized by aggregating responses from combinations of the same LLMs. Results: We find that aggregating responses from multiple, various LLMs leads to more accurate differential diagnoses (average accuracy for 3 LLMs: $75.3\% \pm 1.6$ pp) compared to the differential diagnoses produced by single LLMs (average accuracy for single LLMs: $59.0\% \pm 6.1$ pp). Discussion: The use of collective intelligence methods to synthesize differential diagnoses combining the responses of different LLMs achieves two of the necessary steps towards advancing acceptance of LLMs as a diagnostic support tool: (1) demonstrate high diagnostic accuracy and (2) eliminate dependence on a single commercial vendor.
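The abstract does not say which aggregation rule the authors used, so the following is only a minimal sketch of one common collective-intelligence approach: each model returns a ranked differential, and diagnoses are re-ranked by summed reciprocal-rank weight. The function name, the 1/rank weighting, and the example diagnoses are all assumptions made for illustration, not the paper's method or data.

```python
from collections import defaultdict


def aggregate_differentials(ranked_lists, top_k=5):
    """Merge several ranked diagnosis lists into one ranked list.

    Assumption (not from the paper): a diagnosis at 0-based position r in a
    model's list contributes a weight of 1 / (r + 1) to its total score.
    """
    scores = defaultdict(float)
    for diagnoses in ranked_lists:
        seen = set()
        for rank, diagnosis in enumerate(diagnoses):
            key = diagnosis.strip().lower()  # crude normalization of labels
            if key in seen:
                continue  # count each diagnosis at most once per model
            seen.add(key)
            scores[key] += 1.0 / (rank + 1)
    # Highest score first; ties broken alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [diagnosis for diagnosis, _ in ordered[:top_k]]


# Hypothetical outputs from three models for a single vignette.
model_outputs = [
    ["pneumonia", "bronchitis", "asthma"],
    ["bronchitis", "pneumonia", "pulmonary embolism"],
    ["pneumonia", "pulmonary embolism", "bronchitis"],
]
combined = aggregate_differentials(model_outputs, top_k=3)
print(combined)
```

A diagnosis that several models rank highly rises to the top, while one that a single model proposes at a low rank tends to drop out, which is the intuition behind pooling responses from different models.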
ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions
Da Silva, Leuson, Samhi, Jordan, Khomh, Foutse
Since its release in November 2022, ChatGPT has shaken up Stack Overflow, the premier platform for developers' queries on programming and software development. Demonstrating an ability to generate instant, human-like responses to technical questions, ChatGPT has ignited debates within the developer community about the evolving role of human-driven platforms in the age of generative AI. Two months after ChatGPT's release, Meta released its answer with its own Large Language Model (LLM) called LLaMA: the race was on. We conducted an empirical study analyzing questions from Stack Overflow and using these LLMs to address them. This way, we aim to (i) measure user engagement evolution with Stack Overflow over time; (ii) quantify the reliability of LLMs' answers and their potential to replace Stack Overflow in the long term; (iii) identify and understand why LLMs fail; and (iv) compare the LLMs with each other. Our empirical results are unequivocal: ChatGPT and LLaMA challenge human expertise, yet do not outperform it for some domains, while a significant decline in user posting activity has been observed. Furthermore, we also discuss the impact of our findings regarding the usage and development of new LLMs.
A Survey of Generative AI for De Novo Drug Design: New Frontiers in Molecule and Protein Generation
Tang, Xiangru, Dai, Howard, Knight, Elizabeth, Wu, Fang, Li, Yunyang, Li, Tianxiao, Gerstein, Mark
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot
Fagadau, Ionut Daniel, Mariani, Leonardo, Micucci, Daniela, Riganelli, Oliviero
Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content, crafted to satisfy the actual needs of developers. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond to prompts by providing ready-to-use code snippets. Formulating the prompt appropriately, and incorporating the useful information while avoiding any information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate the influence of eight prompt features, covering the style and the content of prompts, on the correctness, complexity, size, and similarity to the developers' code of the generated code. We specifically consider the task of using Copilot with 124,800 prompts obtained by systematically combining the eight considered prompt features to generate the implementation of 200 Java methods. Results show how some prompt features, such as the presence of examples and the summary of the purpose of the method, can significantly influence the quality of the result.
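To make "systematically combining prompt features" concrete, here is a minimal sketch that enumerates every on/off combination of a set of prompt fragments. The feature names and fragment texts are hypothetical, and treating each feature as a simple on/off switch is an assumption: eight binary features give 256 variants per method, whereas 124,800 prompts spread evenly over 200 methods would be 624 per method, so the study's features presumably take a different set of values.

```python
from itertools import product


def build_prompts(fragments):
    """Return one prompt per on/off combination of the given fragments.

    `fragments` maps a feature name to the text that feature contributes.
    Assumption: every feature is binary (present or absent).
    """
    names = list(fragments)
    prompts = []
    for mask in product([False, True], repeat=len(names)):
        parts = [fragments[name] for name, enabled in zip(names, mask) if enabled]
        prompts.append("\n".join(parts))
    return prompts


# Hypothetical fragments for one Java method; none of these come from the paper.
EXAMPLE_FRAGMENTS = {
    "summary": "Returns the largest value in the array.",
    "examples": "Example: max([1, 5, 3]) -> 5",
    "param_docs": "@param values the input array, never null",
    "return_docs": "@return the maximum element",
    "boundary_cases": "Throws IllegalArgumentException on an empty array.",
    "signature": "public static int max(int[] values)",
    "style_imperative": "Write the method body only.",
    "style_concise": "Keep the implementation short.",
}

prompts = build_prompts(EXAMPLE_FRAGMENTS)
print(len(prompts))  # 2 ** 8 combinations
```

Generating the full grid this way lets each feature's effect be compared while the others are held fixed, which is the point of a systematic design over hand-picked prompts.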