 Large Language Model


ChatGPT maker OpenAI faces class action over how it used people's data

Washington Post - Technology News

The lawsuit goes to the heart of a major unresolved question hanging over the surge in "generative" AI tools such as chatbots and image generators. The technology works by ingesting billions of words from the open internet and learning to build inferences between them. After consuming enough data, the resulting "large language models" can predict what to say in response to a prompt, giving them the ability to write poetry, have complex conversations and pass professional exams. But the humans who wrote those billions of words never signed off on having a company such as OpenAI use them for its own profit.


Humans may be more likely to believe disinformation generated by AI

MIT Technology Review

That credibility gap, while small, is concerning given that the problem of AI-generated disinformation seems poised to grow significantly, says Giovanni Spitale, the researcher at the University of Zurich who led the study, which appeared in Science Advances today. "The fact that AI-generated disinformation is not only cheaper and faster, but also more effective, gives me nightmares," he says. He believes that if the team repeated the study with the latest large language model from OpenAI, GPT-4, the difference would be even bigger, given how much more powerful GPT-4 is. To test our susceptibility to different types of text, the researchers chose common disinformation topics, including climate change and covid. Then they asked OpenAI's large language model GPT-3 to generate 10 true tweets and 10 false ones, and collected a random sample of both true and false tweets from Twitter.


AI is filling up the internet with garbage spam sites

PCWorld

A new wave of artificial intelligence tools like ChatGPT and Google Bard may or may not change the way humans interact with technology forever. But before it does that, it's going to make the internet even more annoying. According to a new report, AI is being used to generate a huge number of websites filled with random, garbage strings of text targeted at search engines, then plastered with advertising to generate revenue. NewsGuard reports that AI text generation tools are being combined with software that auto-generates new sites, creating masses of domains filled with large volumes of text. The sites are then filled with programmatic advertising slots, which serve up real ads over the fake content.


How to Tackle AI--and Cheating--in the Classroom

WIRED

This past spring, as I closed out my 18th year of teaching, I felt anxiety that I'd never before felt at the end of a school year. By the time grades are submitted and signs of summer arrive, teachers are typically able to breathe for the first time in nine months. Instead of the relaxation, joy, and accomplishment that typically awaits the end of an academic year, I was consumed with worry that this might be the last time in a nearly two-decade career that I taught a class without having to worry about AI. I get it: AI has technically been around forever, and natural language processing tools such as OpenAI's ChatGPT are built on decades of research. Anyone who has used spellcheck or language translation apps or heard a spoken text message has used language processing tools driven by AI technology.


China lures billionaires into race to catch U.S. in AI

The Japan Times

China's tech sector has a new obsession: competing with U.S. titans like Google and Microsoft in the breakneck global artificial intelligence race. Billionaire entrepreneurs, midlevel engineers and veterans of foreign firms alike now harbor a remarkably consistent ambition: to outdo China's geopolitical rival in a technology that may determine the global power stakes. Among them is internet mogul Wang Xiaochuan, who entered the field after OpenAI's ChatGPT debuted to a social media firestorm in November. He joins the ranks of Chinese scientists, programmers and financiers -- including former employees of ByteDance, e-commerce platform JD.com and Google -- expected to propel some $15 billion of spending on AI technology this year. For Wang, who founded the search engine Sogou that Tencent bought out in a $3.5 billion deal less than two years ago, the opportunity came fast.


Rethinking Model Evaluation as Narrowing the Socio-Technical Gap

arXiv.org Artificial Intelligence

The recent development of generative and large language models (LLMs) poses new challenges for model evaluation that the research community and industry are grappling with. While the versatile capabilities of these models ignite excitement, they also inevitably make a leap toward homogenization: powering a wide range of applications with a single, often referred to as "general-purpose", model. In this position paper, we argue that model evaluation practices must take on a critical task to cope with the challenges and responsibilities brought by this homogenization: providing valid assessments for whether and how much human needs in downstream use cases can be satisfied by the given model (socio-technical gap). By drawing on lessons from the social sciences, human-computer interaction (HCI), and the interdisciplinary field of explainable AI (XAI), we urge the community to develop evaluation methods based on real-world socio-requirements and embrace diverse evaluation methods with an acknowledgment of trade-offs between realism to socio-requirements and pragmatic costs to conduct the evaluation. By mapping HCI and current NLG evaluation methods, we identify opportunities for evaluation methods for LLMs to narrow the socio-technical gap and pose open questions.


VisText: A Benchmark for Semantically Rich Chart Captioning

arXiv.org Artificial Intelligence

Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work.


ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles

arXiv.org Artificial Intelligence

Automatically generating textual content with desired attributes is an ambitious task that people have pursued long. Existing works have made a series of progress in incorporating unimodal controls into language models (LMs), whereas how to generate controllable sentences with multimodal signals and high efficiency remains an open question. To tackle the puzzle, we propose a new paradigm of zero-shot controllable text generation with multimodal signals (ZeroGen). Specifically, ZeroGen leverages controls of text and image successively from token-level to sentence-level and maps them into a unified probability space at decoding, which customizes the LM outputs by weighted addition without extra training. To achieve better inter-modal trade-offs, we further introduce an effective dynamic weighting mechanism to regulate all control weights. Moreover, we conduct substantial experiments to probe the relationship of being in-depth or in-width between signals from distinct modalities. Encouraging empirical results on three downstream tasks show that ZeroGen not only outperforms its counterparts on captioning tasks by a large margin but also shows great potential in multimodal news generation with a higher degree of control. Our code will be released at https://github.com/ImKeTT/ZeroGen.
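
The abstract's idea of steering a language model "by weighted addition" at decoding time can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`softmax`, `combine`), the three-word vocabulary, the scores, and the fixed weights are all assumptions made for illustration, and the dynamic weighting mechanism the abstract mentions is not modeled.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(lm_logits, control_scores, weights):
    """Add each weighted control score vector to the LM logits,
    then renormalize. No model parameters are trained or changed."""
    combined = list(lm_logits)
    for scores, weight in zip(control_scores, weights):
        for i, score in enumerate(scores):
            combined[i] += weight * score
    return softmax(combined)

# Hypothetical toy vocabulary and scores (not from the paper).
vocab = ["dog", "cat", "car"]
lm_logits = [1.0, 1.0, 1.0]      # the LM alone has no preference
text_control = [0.0, 2.0, 0.0]   # a textual keyword favors "cat"
image_control = [0.0, 1.0, 0.0]  # an image signal also favors "cat"

probs = combine(lm_logits, [text_control, image_control], [0.5, 1.0])
best = vocab[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

With both weights set to zero the distribution falls back to the unmodified LM, which is why this kind of control needs no extra training.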


Evaluating ChatGPT's Decimal Skills and Feedback Generation in a Digital Learning Game

arXiv.org Artificial Intelligence

While open-ended self-explanations have been shown to promote robust learning in multiple studies, they pose significant challenges to automated grading and feedback in technology-enhanced learning, due to the unconstrained nature of the students' input. Our work investigates whether recent advances in Large Language Models, and in particular ChatGPT, can address this issue. Using decimal exercises and student data from a prior study of the learning game Decimal Point, with more than 5,000 open-ended self-explanation responses, we investigate ChatGPT's capability in (1) solving the in-game exercises, (2) determining the correctness of students' answers, and (3) providing meaningful feedback to incorrect answers. Our results showed that ChatGPT can respond well to conceptual questions, but struggled with decimal place values and number line problems. In addition, it was able to accurately assess the correctness of 75% of the students' answers and generated generally high-quality feedback, similar to human instructors. We conclude with a discussion of ChatGPT's strengths and weaknesses and suggest several avenues for extending its use cases in digital teaching and learning.


A negation detection assessment of GPTs: analysis with the xNot360 dataset

arXiv.org Artificial Intelligence

Negation is a fundamental aspect of natural language, playing a critical role in communication and comprehension. Our study assesses the negation detection performance of Generative Pre-trained Transformer (GPT) models, specifically GPT-2, GPT-3, GPT-3.5, and GPT-4. We focus on the identification of negation in natural language using a zero-shot prediction approach applied to our custom xNot360 dataset. Our approach examines sentence pairs labeled to indicate whether the second sentence negates the first. Our findings expose a considerable performance disparity among the GPT models, with GPT-4 surpassing its counterparts and GPT-3.5 displaying a marked performance reduction. The overall proficiency of the GPT models in negation detection remains relatively modest, indicating that this task pushes the boundaries of their natural language understanding capabilities. We not only highlight the constraints of GPT models in handling negation but also emphasize the importance of logical reliability in high-stakes domains such as healthcare, science, and law.
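
The evaluation setup described above (sentence pairs labeled for whether the second negates the first, scored zero-shot) might be sketched roughly as follows. Everything here is an assumption for illustration: the prompt wording, the function names, and the three toy pairs are invented and do not come from xNot360, and `keyword_baseline` is a trivial stand-in where a real study would call a GPT model.

```python
from typing import Callable, List, Tuple

# (first sentence, second sentence, does the second negate the first?)
Pair = Tuple[str, str, bool]

def build_prompt(first: str, second: str) -> str:
    """Build a zero-shot prompt: an instruction only, no worked examples."""
    return (
        "Does the second sentence negate the first? Answer yes or no.\n"
        f"Sentence 1: {first}\n"
        f"Sentence 2: {second}\n"
    )

def keyword_baseline(prompt: str) -> bool:
    """Stand-in for a model call: predicts negation only when the
    word 'not' appears in the second sentence."""
    second = prompt.split("Sentence 2: ", 1)[1]
    return " not " in f" {second.lower()} "

def evaluate(pairs: List[Pair], predict: Callable[[str], bool]) -> float:
    """Return the fraction of pairs whose prediction matches the label."""
    correct = 0
    for first, second, label in pairs:
        if predict(build_prompt(first, second)) == label:
            correct += 1
    return correct / len(pairs)

# Invented toy pairs, not taken from the dataset.
toy_pairs: List[Pair] = [
    ("The door is open.", "The door is not open.", True),
    ("She likes tea.", "She likes coffee.", False),
    ("He never arrived.", "He arrived.", True),
]

accuracy = evaluate(toy_pairs, keyword_baseline)
print(f"accuracy: {accuracy:.2f}")
```

The third pair shows why surface cues fall short: the negation sits in the first sentence ("never"), so the keyword check misses that the second sentence reverses it.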