Large Language Model
Apple co-founder warns AI could make it harder to spot scams
Apple co-founder Steve Wozniak has warned that artificial intelligence could be used by "bad actors" to make scams and misinformation harder to spot. Wozniak, who co-founded Apple with the late Steve Jobs and invented the company's first computer, said AI content should be clearly labelled and called for regulation of the sector. The Silicon Valley entrepreneur was among more than 1,800 people who, alongside Tesla chief executive Elon Musk, signed a letter in March calling for a six-month pause in the development of powerful AI systems on the grounds that they posed profound risks to humanity. Some signatories to the letter were later revealed to be fake, and others withdrew their support. Wozniak, known in the tech world as Woz, discussed both the benefits and the dangers of AI.
Doctors are using AI to draft messages without telling patients
A small but growing number of people in the US are receiving messages from their doctors that were drafted with the help of artificial intelligence, and some may not even know it. It is the first step in a larger plan to use OpenAI's large language models (the line of technology powering chatbots such as ChatGPT) within one of the largest US electronic health records systems, operated by the company Epic.
How To Delete Your Data From ChatGPT
There's a chance that ChatGPT knows personal details about you, and if it doesn't, it might just make something up. As OpenAI's generative text chatbot has boomed in popularity over the past six months, the risks of training the system on data vacuumed up from the web have become clearer. Data regulators around the world are investigating how OpenAI gathered the data it uses to train its large language models, the accuracy of the answers it provides about people, and other legal concerns about the use of its generative text systems. Europe's data regulators have joined forces to look at OpenAI after Italy temporarily banned ChatGPT from the country, and Canada is also investigating the technology's potential privacy risks.
GPTZero app seeks to thwart AI plagiarism in schools and online media
Journalists, screenwriters and college professors are among the widening groups of people concerned about eventually losing their livelihoods to artificial intelligence programs like ChatGPT, which can produce copy faster than humans, and possibly better. But one entrepreneur is pursuing technology that makes it easier to distinguish text written by people from text composed by a machine. Edward Tian, a 22-year-old Princeton University student studying computer science and journalism, developed an app called GPTZero to deter misuse of the viral chatbot ChatGPT in classrooms. The app has racked up 1.2 million registered users since January. He is now launching a new program called Origin, aimed at "saving journalism" by distinguishing AI-generated disinformation from fact in online media.
On the Impossible Safety of Large AI Models
El-Mhamdi, El-Mahdi, Farhadkhani, Sadegh, Guerraoui, Rachid, Gupta, Nirupam, Hoang, Lê-Nguyên, Pinot, Rafael, Rouault, Sébastien, Stephan, John
Large AI Models (LAIMs), of which large language models are the most prominent recent example, show impressive performance. However, they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.
Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM
Bawden, Rachel, Yvon, François
The NLP community recently saw the release of a new large open-access multilingual language model, BLOOM (BigScience et al., 2022), covering 46 languages. We focus on BLOOM's multilingual ability by evaluating its machine translation performance across several datasets (WMT, Flores-101 and DiaBLa) and language pairs (high- and low-resourced). Our results show that 0-shot performance suffers from overgeneration and from generation in the wrong language, but this is greatly improved in the few-shot setting, with very good results for a number of language pairs. We study several aspects, including prompt design, model sizes, cross-lingual transfer and the use of discursive context.
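The difference between the 0-shot and few-shot settings comes down to whether the prompt contains worked translation pairs before the sentence to translate. The sketch below shows one way such a prompt could be assembled, plus a simple post-processing step that cuts the output at the first newline to limit overgeneration. The template ("French: ... / English: ..."), the function names and the truncation rule are hypothetical illustrations, not the prompts or post-processing used by Bawden and Yvon.

```python
from typing import List, Tuple


def build_translation_prompt(
    source_text: str,
    src_lang: str,
    tgt_lang: str,
    examples: List[Tuple[str, str]],
) -> str:
    """Hypothetical template: 0-shot when `examples` is empty, few-shot otherwise."""
    lines: List[str] = []
    for src, tgt in examples:
        # Each demonstration shows a source sentence and its translation.
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
        lines.append("")  # blank line separates demonstrations
    lines.append(f"{src_lang}: {source_text}")
    # End on the target-language label so the model continues in that language.
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


def truncate_output(generated: str) -> str:
    """Keep only the first line of the model output; anything after it
    is treated as overgeneration (an assumption, not the paper's rule)."""
    return generated.strip().split("\n", 1)[0].strip()
```

For example, `build_translation_prompt("Bonjour", "French", "English", [("Merci", "Thank you")])` produces a one-shot prompt that ends with `English:`, leaving the model to fill in the translation.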
The Case Records of ChatGPT: Language Models and Complex Clinical Questions
Poterucha, Timothy, Elias, Pierre, Haggerty, Christopher M.
Background: Artificial intelligence language models have shown promise in various applications, including assisting with clinical decision-making as demonstrated by strong performance of large language models on medical licensure exams. However, their ability to solve complex, open-ended cases, which may be representative of clinical practice, remains unexplored. Methods: In this study, the accuracy of large language AI models GPT4 and GPT3.5 in diagnosing complex clinical cases was investigated using published Case Records of the Massachusetts General Hospital. A total of 50 cases requiring a diagnosis and diagnostic test published from January 1, 2022 to April 16, 2022 were identified. For each case, models were given a prompt requesting the top three specific diagnoses and associated diagnostic tests, followed by case text, labs, and figure legends. Model outputs were assessed in comparison to the final clinical diagnosis and whether the model-predicted test would result in a correct diagnosis. Results: GPT4 and GPT3.5 accurately provided the correct diagnosis in 26% and 22% of cases in one attempt, and 46% and 42% within three attempts, respectively. GPT4 and GPT3.5 provided a correct essential diagnostic test in 28% and 24% of cases in one attempt, and 44% and 50% within three attempts, respectively. No significant differences were found between the two models, and multiple trials with identical prompts using the GPT3.5 model provided similar results. Conclusions: In summary, these models demonstrate potential usefulness in generating differential diagnoses but remain limited in their ability to provide a single unifying diagnosis in complex, open-ended cases. Future research should focus on evaluating model performance in larger datasets of open-ended clinical challenges and exploring potential human-AI collaboration strategies to enhance clinical decision-making.
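The "one attempt" versus "within three attempts" figures can be read as top-1 versus top-3 accuracy over the three ranked diagnoses the prompt requested. The sketch below computes that metric on hypothetical toy data. Matching by exact string equality is a simplifying assumption: in the study, free-text model outputs were assessed against the final clinical diagnosis, which a plain string comparison does not capture. As a scale check, with 50 cases a rate of 26% corresponds to 13 correct cases and 46% to 23.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose gold label appears among the first k
    ranked predictions (exact string match, a simplifying assumption)."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("need one ranked prediction list per gold label")
    hits = sum(1 for ranked, answer in zip(predictions, gold) if answer in ranked[:k])
    return hits / len(gold)
```

On four toy cases where the correct label is ranked first, second, absent and third, top-1 accuracy is 0.25 and top-3 accuracy is 0.75.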
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Chen, Lingjiao, Zaharia, Matei, Zou, James
There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation, 2) LLM approximation, and 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
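An LLM cascade sends a query to a cheaper model first and escalates to a more expensive one only when the cheaper answer looks unreliable, so cost is saved on every query that stops early. The sketch below shows that control flow. FrugalGPT learns which model combinations and acceptance thresholds to use; here the `Model` wrapper, its flat per-query price, the `scorer` function and the fixed `threshold` are all hypothetical stand-ins, not FrugalGPT's learned components or any real provider's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical wrapper around one priced LLM endpoint."""
    name: str
    cost_per_query: float         # hypothetical flat price per query
    answer: Callable[[str], str]  # stand-in for a real API call


def cascade(
    query: str,
    models: List[Model],
    scorer: Callable[[str, str], float],
    threshold: float,
) -> Tuple[str, str, float]:
    """Query models in the given order (cheapest first) and stop at the
    first answer whose score reaches the threshold. Returns the accepted
    answer, the model that produced it, and the total cost spent."""
    if not models:
        raise ValueError("cascade needs at least one model")
    total_cost = 0.0
    for model in models:
        answer = model.answer(query)
        total_cost += model.cost_per_query
        if scorer(query, answer) >= threshold:
            return answer, model.name, total_cost
    # No answer passed the threshold: return the last (most capable) model's answer.
    return answer, model.name, total_cost
```

Note that an escalated query pays for every model it passed through, so a cascade only saves money when the cheap models are accepted often enough; choosing that trade-off is what FrugalGPT learns from data.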
COLA: Contextualized Commonsense Causal Reasoning from the Causal Inference Perspective
Wang, Zhaowei, Do, Quyet V., Zhang, Hongming, Zhang, Jiayao, Wang, Weiqi, Fang, Tianqing, Song, Yangqiu, Wong, Ginny Y., See, Simon
Detecting commonsense causal relations (causation) between events has long been an essential yet challenging task. Because events are complicated, an event may have different causes in different contexts. Exploiting context therefore plays an essential role in detecting causal relations. Previous work on commonsense causation, however, considers only two events and ignores their context, simplifying the task formulation. This paper proposes a new task to detect commonsense causation between two events in an event sequence (i.e., context), called contextualized commonsense causal reasoning. We also design a zero-shot framework, COLA (Contextualized Commonsense Causality Reasoner), to solve the task from the causal inference perspective. This framework obtains rich incidental supervision from temporality and balances covariates from multiple timestamps to remove confounding effects. Our extensive experiments show that COLA can detect commonsense causality more accurately than baselines.
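Why balancing covariates matters can be seen on a toy example: a naive comparison of outcome rates between "treated" and "untreated" cases can show an effect that disappears once a confounder is held fixed. The sketch below contrasts the naive difference with the standard backdoor adjustment over a single discrete covariate. This is a generic causal-inference illustration with hypothetical data, not COLA's method, which according to the abstract balances covariates from multiple timestamps using temporal incidental supervision.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record: (confounder z, treatment t in {0, 1}, outcome y in {0, 1})
Record = Tuple[str, int, int]


def naive_effect(data: List[Record]) -> float:
    """P(y=1 | t=1) - P(y=1 | t=0), ignoring the confounder."""
    def rate(t: int) -> float:
        ys = [y for _, tt, y in data if tt == t]
        return sum(ys) / len(ys)
    return rate(1) - rate(0)


def adjusted_effect(data: List[Record]) -> float:
    """Backdoor adjustment: sum over z of P(z) * [P(y=1 | t=1, z) - P(y=1 | t=0, z)]."""
    strata: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for z, t, y in data:
        strata[z].append((t, y))
    effect = 0.0
    for rows in strata.values():
        treated = [y for t, y in rows if t == 1]
        control = [y for t, y in rows if t == 0]
        # Assumes every stratum contains both treated and control records (positivity).
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        effect += len(rows) / len(data) * diff
    return effect
```

In a toy dataset where the confounder drives both treatment and outcome but treatment has no effect within either stratum, `naive_effect` reports 0.5 while `adjusted_effect` reports 0.0.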
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Wen, Congcong, Hu, Yuan, Li, Xiang, Yuan, Zhenghang, Zhu, Xiao Xiang
The remarkable achievements of ChatGPT and GPT-4 have sparked a wave of interest and research in the field of large language models for Artificial General Intelligence (AGI). These models provide us with intelligent solutions that are more similar to human thinking, enabling us to use general artificial intelligence to solve problems in various applications. However, in the field of remote sensing, the scientific literature on the implementation of AGI remains relatively scant. Existing AI-related research primarily focuses on visual understanding tasks while neglecting the semantic understanding of the objects and their relationships. This is where vision-language models excel, as they enable reasoning about images and their associated textual descriptions, allowing for a deeper understanding of the underlying semantics. Vision-language models can go beyond recognizing the objects in an image and can infer the relationships between them, as well as generate natural language descriptions of the image. This makes them better suited for tasks that require both visual and textual understanding, such as image captioning, text-based image retrieval, and visual question answering. This paper provides a comprehensive review of the research on vision-language models in remote sensing, summarizing the latest progress, highlighting the current challenges, and identifying potential research opportunities. Specifically, we review the application of vision-language models in several mainstream remote sensing tasks, including image captioning, text-based image generation, text-based image retrieval, visual question answering, scene classification, semantic segmentation, and object detection. For each task, we briefly describe the task background and review some representative works. Finally, we summarize the limitations of existing work and provide some possible directions for future development.
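Text-based image retrieval, one of the tasks the review covers, is commonly framed as ranking images by the similarity between a text embedding and image embeddings in a shared space. The sketch below assumes such a dual-encoder model already exists and shows only the ranking step with cosine similarity. The embedding vectors and image IDs are hypothetical placeholders; the function names are illustrative and not taken from any of the reviewed works.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(
    text_emb: Sequence[float],
    image_embs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k image IDs whose embeddings are most similar to the query text."""
    scored = [(image_id, cosine(text_emb, emb)) for image_id, emb in image_embs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

With a query embedding of `[1, 0]` and image embeddings `river=[0.9, 0.1]`, `forest=[0.1, 0.9]` and `runway=[0.7, 0.7]`, the top two results are `river` and `runway`. In practice the embeddings would come from a vision-language model trained on paired images and captions.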