 Generative AI


OpenAI will block people in China from using its services

Engadget

OpenAI plans to block people from using ChatGPT in China, a country where its services aren't officially available, but where users and developers access it via the company's API anyway. Securities Times, a Chinese state-owned newspaper, reported on Tuesday that OpenAI had started sending emails to users in China outlining its plans to block access starting July 9, according to Reuters. "We are taking additional steps to block API traffic from regions where we do not support access to OpenAI's services," an OpenAI spokesperson told the publication. The move could impact several Chinese startups which have built applications using OpenAI's large language models. Although OpenAI's services are available in more than 160 countries, China isn't one of them.


Claude 3.5 suggests AI's looming ubiquity could be a good thing

The Guardian

The frontier of AI just got pushed a little further forward. On Friday, Anthropic, the AI lab set up by a team of disgruntled OpenAI staffers, released the latest version of its Claude LLM. The company said Thursday that the new model – the technology that underpins its popular chatbot Claude – is twice as fast as its most powerful previous version. Anthropic said in its evaluations, the model outperforms leading competitors like OpenAI on several key intelligence capabilities, such as coding and text-based reasoning. Anthropic only released the previous version of Claude, 3.0, in March.


Generative AI Systems: A Systems-based Perspective on Generative AI

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized AI systems by enabling communication with machines using natural language. Recent developments in Generative AI (GenAI) like Vision-Language Models (GPT-4V) and Gemini have shown great promise in using LLMs as multimodal systems. This new research line results in building Generative AI systems, GenAISys for short, that are capable of multimodal processing and content creation, as well as decision-making. GenAISys use natural language as a communication means and modality encoders as I/O interfaces for processing various data sources. They are also equipped with databases and external specialized tools, communicating with the system through a module for information retrieval and storage. This paper aims to explore and state new research directions in Generative AI Systems, including how to design GenAISys (compositionality, reliability, verifiability), build and train them, and what can be learned from the system-based perspective. Cross-disciplinary approaches are needed to answer open questions about the inner workings of GenAI systems.


Transforming Software Development: Evaluating the Efficiency and Challenges of GitHub Copilot in Real-World Projects

arXiv.org Artificial Intelligence

Generative AI technologies promise to transform the product development lifecycle. This study evaluates the efficiency gains, areas for improvement, and emerging challenges of using GitHub Copilot, an AI-powered coding assistant. We identified 15 software development tasks and assessed Copilot's benefits through real-world projects on large proprietary code bases. Our findings indicate significant reductions in developer toil, with up to 50% time saved in code documentation and autocompletion, and 30-40% in repetitive coding tasks, unit test generation, debugging, and pair programming. However, Copilot struggles with complex tasks, large functions, multiple files, and proprietary contexts, particularly with C/C++ code. We project a 33-36% time reduction for coding-related tasks in a cloud-first software development lifecycle. This study aims to quantify productivity improvements, identify underperforming scenarios, examine practical benefits and challenges, investigate performance variations across programming languages, and discuss emerging issues related to code quality, security, and developer experience.


Accelerating Clinical Evidence Synthesis with Large Language Models

arXiv.org Artificial Intelligence

Automatic medical discovery by AI is a dream of many. One step toward that goal is to create an AI model to understand clinical studies and synthesize clinical evidence from the literature. Clinical evidence synthesis currently relies on systematic reviews of clinical trials and retrospective analyses from medical literature. However, the rapid expansion of publications presents challenges in efficiently identifying, summarizing, and updating evidence. We introduce TrialMind, a generative AI-based pipeline for conducting medical systematic reviews, encompassing study search, screening, and data extraction phases. We utilize large language models (LLMs) to drive each pipeline component while incorporating human expert oversight to minimize errors. To facilitate evaluation, we also create a benchmark dataset TrialReviewBench, a custom dataset with 870 annotated clinical studies from 25 meta-analysis papers across various medical treatments. Our results demonstrate that TrialMind significantly improves the literature review process, achieving high recall rates (0.897-1.000) in study searching from over 20 million PubMed studies and outperforming traditional language model embeddings-based methods in screening (Recall@20 of 0.227-0.246 vs. 0.000-0.102). Furthermore, our approach surpasses direct GPT-4 performance in result extraction, with accuracy ranging from 0.65 to 0.84. We also support clinical evidence synthesis in forest plots, as validated by eight human annotators who preferred TrialMind over the GPT-4 baseline with a winning rate of 62.5%-100% across the involved reviews. Our findings suggest that an LLM-based clinical evidence synthesis approach, such as TrialMind, can enable reliable and high-quality clinical evidence synthesis to improve clinical research efficiency.


AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

arXiv.org Artificial Intelligence

We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories, organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses System & Operational Risks, Content Safety Risks, Societal Risks, and Legal & Rights Risks. The taxonomy establishes connections between various descriptions and approaches to risk, highlighting the overlaps and discrepancies between public and private sector conceptions of risk. By providing this unified framework, we aim to advance AI safety through information sharing across sectors and the promotion of best practices in risk mitigation for generative AI models and systems.


Roundtables: The Future of AI Games

MIT Technology Review

Watch the on-demand video of the Roundtables session: The Future of AI Games. Available only to MIT Alumni and subscribers. Featured speakers are Niall Firth, executive editor, and Allison Arieff, editorial director. Learn how generative AI is opening up new possibilities in gaming.


Synthesia's hyperrealistic deepfakes will soon have full bodies

MIT Technology Review

"No one else is able to do that," says Jack Saunders, a researcher at the University of Bath, who was not involved in Synthesia's work. The full-body avatars he previewed are very good, he says, despite small errors such as hands "slicing" into each other at times. But "chances are you're not really going to be looking that close to notice it," Saunders says. Synthesia launched its first version of hyperrealistic AI avatars, also known as deepfakes, in April. These avatars use large language models to match expressions and tone of voice to the sentiment of spoken text.


Towards a Science Exocortex

arXiv.org Artificial Intelligence

Artificial intelligence (AI) methods are poised to revolutionize intellectual work, with generative AI enabling automation of text analysis, text generation, and simple decision making or reasoning. The impact on science is only just beginning, but the opportunity is significant since scientific research relies fundamentally on extended chains of cognitive work. Here, we review the state of the art in agentic AI systems, and discuss how these methods could be extended to have even greater impact on science. We propose the development of an exocortex, a synthetic extension of a person's cognition. A science exocortex could be designed as a swarm of AI agents, with each agent individually streamlining specific researcher tasks, and whose inter-communication leads to emergent behavior that greatly extends the researcher's cognition and volition.


Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models

arXiv.org Artificial Intelligence

The rapid advancement of Text-to-Image (T2I) generative models has enabled the synthesis of high-quality images guided by textual descriptions. Despite this significant progress, these models are often susceptible to generating content that contradicts the input text, which poses a challenge to their reliability and practical deployment. To address this problem, we introduce a novel diffusion-based framework to significantly enhance the alignment of generated images with their corresponding descriptions, addressing the inconsistency between visual output and textual input. Our framework is built upon a comprehensive analysis of inconsistency phenomena, categorizing them based on their manifestation in the image. Leveraging a state-of-the-art large language module, we first extract objects and construct a knowledge graph to predict the locations of these objects in potentially generated images. We then integrate a state-of-the-art controllable image generation model with a visual text generation module to generate an image that is consistent with the original prompt, guided by the predicted object locations. Through extensive experiments on an advanced multimodal hallucination benchmark, we demonstrate the efficacy of our approach in generating images that are consistent with the original prompt.