Generative AI


The Critique of Critique

arXiv.org Artificial Intelligence

Critique, as a natural language description for assessing the quality of model-generated content, has been shown to play an essential role in the training, evaluation, and refinement of Large Language Models (LLMs). However, there is a lack of principled understanding of how to evaluate the quality of the critique itself. In this paper, we pioneer the critique of critique, termed MetaCritique, a framework that evaluates a critique from two aspects: factuality as a precision score and comprehensiveness as a recall score. We calculate the harmonic mean of precision and recall as the overall rating, called the F1 score. To obtain a reliable evaluation outcome, we propose Atomic Information Units (AIUs), which describe the critique in a more fine-grained manner. MetaCritique takes each AIU into account and aggregates the AIU-level judgments into the overall score. Moreover, because the evaluation process involves intricate reasoning, MetaCritique provides a natural language rationale to support each judgment. We construct a meta-evaluation dataset containing 300 critiques (2653 AIUs) across four tasks (question answering, reasoning, entailment, and summarization), and we conduct a comparative study to demonstrate the feasibility and effectiveness of MetaCritique. Experiments also show that critiques judged superior by MetaCritique lead to better refinement, indicating that generative artificial intelligence indeed has the potential to be significantly advanced with MetaCritique. We will release relevant code and meta-evaluation datasets at https://github.com/GAIR-NLP/MetaCritique.
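The scoring scheme described in this abstract (precision from factuality, recall from comprehensiveness, F1 as their harmonic mean, all aggregated over AIUs) can be sketched as below. This is a minimal illustration, not the released MetaCritique code. It assumes each AIU judgment has already been reduced to a binary verdict, that aggregation is a simple average, and that recall is measured against the AIUs of a reference critique. The function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetaCritiqueScores:
    precision: float
    recall: float
    f1: float


def harmonic_mean(p: float, r: float) -> float:
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def score_critique(factual_judgments: list[bool],
                   coverage_judgments: list[bool]) -> MetaCritiqueScores:
    """Aggregate per-AIU binary verdicts into precision, recall and F1.

    factual_judgments: one verdict per AIU extracted from the critique under
        evaluation (True = the AIU is judged factually correct).
    coverage_judgments: one verdict per AIU of a reference critique
        (True = the evaluated critique covers that AIU). Using a reference
        critique for recall is an assumption of this sketch.
    """
    if not factual_judgments or not coverage_judgments:
        raise ValueError("need at least one AIU judgment on each side")
    precision = sum(factual_judgments) / len(factual_judgments)
    recall = sum(coverage_judgments) / len(coverage_judgments)
    return MetaCritiqueScores(precision, recall, harmonic_mean(precision, recall))


if __name__ == "__main__":
    # Hypothetical verdicts: 3 of 4 critique AIUs factual, 2 of 4 reference AIUs covered.
    print(score_critique([True, True, True, False], [True, False, True, False]))
```

With those hypothetical verdicts, precision is 0.75 and recall is 0.5, so the harmonic mean gives an F1 of 0.6. The harmonic mean is pulled toward the weaker of the two scores, so a critique cannot score well by being accurate but thin, or thorough but unreliable.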


Large-scale Generative AI Models Lack Visual Number Sense

arXiv.org Artificial Intelligence

Humans can readily judge the number of objects in a visual scene, even without counting, and such a skill has been documented in a variety of animal species and in babies prior to language development and formal schooling. Numerical judgments are error-free for small sets, while for larger collections responses become approximate, with variability increasing proportionally to the target number. This response pattern is observed for items of all kinds, despite variation in object features (such as color or shape), suggesting that our visual number sense relies on abstract representations of numerosity. Here, we investigated whether generative Artificial Intelligence (AI) models based on large-scale transformer architectures can reliably name the number of objects in simple visual stimuli or generate images containing a target number of items in the 1-10 range. Surprisingly, none of the foundation models considered performed in a human-like way: They all made striking errors even with small numbers, the response variability often did not increase in a systematic way, and the pattern of errors varied with object category. Our findings demonstrate that advanced AI systems still lack a basic ability that supports an intuitive understanding of numbers, which in humans is foundational for numeracy and mathematical development.
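The human signature described above (error-free responses for small sets, then variability growing in proportion to the target number) can be checked with a per-target summary of responses. The sketch below is illustrative, not the authors' analysis pipeline. The function name and the use of the coefficient of variation (SD divided by mean) as the test statistic are assumptions. Under scalar variability, SD grows with the target, so the coefficient of variation stays roughly constant for larger targets.

```python
import statistics


def variability_profile(responses: dict[int, list[int]]) -> dict[int, dict[str, float]]:
    """Summarize reported numbers per target numerosity.

    responses maps each target number (e.g. 1..10) to the numbers a model
    or participant reported across trials. For each target, return the
    mean response, error rate, population standard deviation, and
    coefficient of variation (SD / mean).
    """
    profile = {}
    for target, answers in sorted(responses.items()):
        mean = statistics.fmean(answers)
        sd = statistics.pstdev(answers)
        profile[target] = {
            "mean": mean,
            "error_rate": sum(a != target for a in answers) / len(answers),
            "sd": sd,
            "cv": sd / mean if mean else float("nan"),
        }
    return profile


if __name__ == "__main__":
    # Hypothetical toy trials for illustration only, not data from the study.
    toy = {2: [2, 2, 2, 2], 8: [7, 8, 9, 8]}
    for target, stats in variability_profile(toy).items():
        print(target, stats)
```

A human-like profile would show zero error rate and zero SD for the smallest targets. A roughly flat coefficient of variation would appear across the larger ones. The abstract reports that the models tested deviated from both parts of this pattern.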


Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

arXiv.org Machine Learning

Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases, such as education, healthcare, banking, and human resources, spanning different data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguard trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with "TrustFormers" across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. This transparent reporting should become a standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees.
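This abstract does not give the formula for its trustworthiness index. As a purely hypothetical sketch, one way to rank synthetic datasets by their safeguard trade-offs is a weighted average of normalized audit scores. In that scheme, the weights express how much a stakeholder values privacy, utility, fairness, and so on. The `AuditReport` fields, the weighting scheme, and the function names below are assumptions for illustration. They are not the paper's index or the TrustFormers implementation.

```python
from dataclasses import dataclass


@dataclass
class AuditReport:
    """Hypothetical audit scores for one synthetic dataset, each in [0, 1] (higher is better)."""
    name: str
    fairness: float
    fidelity: float
    utility: float
    robustness: float
    privacy: float


DIMENSIONS = ("fairness", "fidelity", "utility", "robustness", "privacy")


def trust_index(report: AuditReport, weights: dict[str, float]) -> float:
    # Weighted average over audit dimensions; dimensions missing from
    # `weights` get weight 0. The weights encode the chosen trade-off.
    total = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights.get(d, 0.0) * getattr(report, d) for d in DIMENSIONS) / total


def rank_datasets(reports: list[AuditReport],
                  weights: dict[str, float]) -> list[tuple[str, float]]:
    # Highest index first. The same ranking could be applied to training
    # checkpoints to mimic trustworthiness-driven model selection.
    scored = [(r.name, trust_index(r, weights)) for r in reports]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    a = AuditReport("synthetic_A", fairness=0.8, fidelity=0.9, utility=0.9, robustness=0.7, privacy=0.4)
    b = AuditReport("synthetic_B", fairness=0.8, fidelity=0.7, utility=0.6, robustness=0.7, privacy=0.9)
    print(rank_datasets([a, b], {"privacy": 3, "utility": 1}))  # privacy-first stakeholder
    print(rank_datasets([a, b], {"privacy": 1, "utility": 3}))  # utility-first stakeholder
```

Shifting weight from utility to privacy flips the ranking between the two hypothetical datasets. This illustrates what a "controllable" trade-off could look like in practice. A real index would also need to define how each raw audit metric is normalized to a common scale.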


California wants to reduce traffic. The Newsom administration thinks AI can help

Los Angeles Times

Being stuck in traffic is a familiar problem for many Californians, but state officials want to harness the power of artificial intelligence to discover new solutions. The California Department of Transportation, teaming up with other state agencies, is asking technology companies by Jan. 25 to propose generative AI tools that could help California reduce traffic and make roads safer, especially for pedestrians, cyclists and scooter riders. Generative AI tools such as ChatGPT can quickly produce text, images and other content, but the technology can also help workers brainstorm ideas. The request shows how California is trying to tap into AI to improve government services at a time when lawmakers seek to safeguard against the technology's potential risks. California politicians set the stage for more AI regulation in 2024, but they'll also face challenges as they try to place more guardrails around AI's impact on jobs, safety and discrimination.


'Impossible' to create AI tools like ChatGPT without copyrighted material, OpenAI says

The Guardian

Last month, the New York Times sued OpenAI and Microsoft, which is a leading investor in OpenAI and uses its tools in its products, accusing them of "unlawful use" of its work to create their products. Responding to the NYT lawsuit last month, OpenAI had said it respected "the rights of content creators and owners". The NYT lawsuit has followed numerous other legal complaints against OpenAI. John Grisham, Jodi Picoult and George RR Martin were among 17 authors who sued OpenAI in September alleging "systematic theft on a mass scale". Elsewhere in its House of Lords submission, in response to a question about AI safety, OpenAI said it supported independent analysis of its security measures.


Staying One Step Ahead of Hackers When It Comes to AI

WIRED

If you've been creeping around underground tech forums lately, you might have seen advertisements for a new program called WormGPT. The program is an AI-powered tool for cybercriminals to automate the creation of personalized phishing emails; although it sounds a bit like ChatGPT, WormGPT is not your friendly neighborhood AI. ChatGPT launched in November 2022 and, since then, generative AI has taken the world by storm. But few consider how its sudden rise will shape the future of cybersecurity. In 2024, generative AI is poised to facilitate new kinds of transnational--and translingual--cybercrime.


Synthetic Data Is a Dangerous Teacher

WIRED

In April 2022, when Dall-E, a text-to-image visio-linguistic model, was released, it purportedly attracted over a million users within the first three months. This was followed by ChatGPT, which launched in November 2022 and apparently reached 100 million monthly active users by January 2023, just two months after launch. Both mark notable moments in the development of generative AI, which in turn has brought forth an explosion of AI-generated content on the web. The bad news is that, in 2024, this means we will also see an explosion of fabricated, nonsensical information, mis- and disinformation, and the exacerbation of negative social stereotypes encoded in these AI models. The AI revolution wasn't spurred by any recent theoretical breakthrough--indeed, most of the foundational work underlying artificial neural networks has been around for decades--but by the "availability" of massive data sets.


AI for everything: 10 Breakthrough Technologies 2024

MIT Technology Review

Microsoft and Google have since moved beyond search to put chatbot-based assistants into the hands of billions of people via their office software. The tech promises to summarize emails and meetings; draft reports and replies; generate whole slide decks--titles, bullet points, and pictures--in seconds. Microsoft and Meta released image-making models that let users generate shareable images of anything with a click. Google's new phones now use AI to let you edit photos to a degree never seen before, exchanging sad faces for happy ones and overcast afternoons for perfect sunsets. Never has such radical new technology gone from experimental prototype to consumer product so fast and at such scale.


Catalyzing Equity in STEM Teams: Harnessing Generative AI for Inclusion and Diversity

arXiv.org Artificial Intelligence

Yiwen Lin, University of California, Irvine; Lauren Snow, University of California, Irvine

Acknowledgments: This work was partially supported by the National Science Foundation (Grant Number 1535300) and the National Institutes of Health (Grant Number 5UC2NS128361-02).

Abstract: Collaboration is key to STEM, where multidisciplinary team research can solve complex problems. However, inequality in STEM fields hinders their full potential because of persistent psychological barriers in underrepresented students' experiences. This paper documents teamwork in STEM and explores the transformative potential of computational modeling and generative AI in promoting STEM-team diversity and inclusion. Leveraging generative AI, this paper outlines two primary areas for advancing diversity, equity, and inclusion. First, formalizing collaboration assessment with inclusive analytics can capture fine-grained learner behavior. Second, adaptive, personalized AI systems can support diversity and inclusion in STEM teams. Four policy recommendations highlight AI's capacity: formalized collaborative skill assessment, inclusive analytics, funding for socio-cognitive research, and human-AI teaming for inclusion training.


AI and Generative AI for Research Discovery and Summarization

arXiv.org Artificial Intelligence

AI and generative AI tools, including chatbots like ChatGPT that rely on large language models (LLMs), have burst onto the scene this year, creating incredible opportunities to increase work productivity and improve our lives. Statisticians and data scientists have begun experiencing the benefits of these tools in numerous ways, such as generating programming code from text prompts to analyze data or fit statistical models. One area in which these tools can make a substantial impact is research discovery and summarization. Standalone tools and chatbot plugins are being developed that allow researchers to find relevant literature more quickly than pre-2023 search tools. Furthermore, generative AI tools have improved to the point where they can summarize research articles and extract their key points in succinct language. Finally, chatbots based on highly parameterized LLMs can be used to simulate abductive reasoning, which lets researchers make connections among related technical topics and can also support research discovery. We review the developments in AI and generative AI for research discovery and summarization, and propose directions these types of tools are likely to take in the future that may be of interest to statisticians and data scientists.