 Large Language Model


ChatGPT's web browser was too good, so its creators blocked it

PCWorld

It's a rare day when a software developer blocks a feature for being too good, but that's exactly what's happened to OpenAI's ChatGPT AI chatbot: its ability to browse the web with Bing was simply too effective at dodging paywalls. Recall that in March, OpenAI added support for the Bing browser to ChatGPT, specifically to give it knowledge of current events. Until then (and right now) the AI algorithm had only been trained up through the fall of 2021. If you asked it for the result of a recent sporting event, for example, ChatGPT would plead ignorance. When Bing support was added, ChatGPT could then ferret out those answers, providing results that were either current or just a day or so old. But OpenAI discovered that Bing was too good at its job: it was circumventing paywalls to provide the answers that users asked for.


Authors file a lawsuit against OpenAI for unlawfully 'ingesting' their books

The Guardian

Mona Awad, whose books include Bunny and 13 Ways of Looking at a Fat Girl, and Paul Tremblay, author of The Cabin at the End of the World, filed the class action complaint in a San Francisco federal court last week. ChatGPT allows users to ask questions and type commands into a chatbot and responds with text that resembles human language patterns. The model underlying ChatGPT is trained with data that is publicly available on the internet. Sample summaries are included in the lawsuit as exhibits. The lawsuit will explore the uncertain "borders of the legality" of actions within the generative AI space.


5 Uses for ChatGPT that Aren't Fan Fiction or Cheating at School

WIRED

AI is so powerful that it will inevitably destroy the world--at least, that's what the people who sell AI software keep saying, and I can't think of any reason why they might lie about how amazing they are. Still, I can't help but wonder: What is AI useful for right now, before it ends civilization? I've done some experimenting and talked to my friends on LinkedIn and Mastodon. Here are the best use cases I could personally find. I spend hours crafting an article but most people will only ever see the few words I choose to put at the top.


As AI cheating booms, so does the industry detecting it: 'We couldn't keep up with demand'

The Guardian

Since its release last November, ChatGPT has shaken the education world. The chatbot and other sophisticated AI tools are reportedly being used everywhere from college essays to high school art projects. This is a problem for schools, educators and students – but a boon for a small but growing cohort of companies in the AI-detection business. Players like Winston AI, Content at Scale and Turnitin are billing for their ability to detect AI involvement in student work, offering subscription services where teachers can run their students' work through a web dashboard and receive a probability score that grades how "human" or "AI" the text is. At this stage, most clients are teachers acting on their own initiative, although Winston AI says it is beginning talks with school administrators at the district level as the problem grows. And with only one full academic semester since ChatGPT was released, the disruption and headaches are only beginning.


AI Could Change How Blind People See the World

WIRED

For her 38th birthday, Chela Robles and her family made a trek to One House, her favorite bakery in Benicia, California, for a brisket sandwich and brownies. On the car ride home, she tapped a small touchscreen on her temple and asked for a description of the world outside. "A cloudy sky," the response came back through her Google Glass. Robles lost the ability to see in her left eye when she was 28, and in her right eye a year later. Blindness, she says, denies you small details that help people connect with one another, like facial cues and expressions.


How elite schools like Stanford became fixated on the AI apocalypse

Washington Post - Technology News

To prevent this theoretical but cataclysmic outcome, mission-driven labs like DeepMind, OpenAI and Anthropic are racing to build a good kind of AI programmed not to lie, deceive or kill us. Meanwhile, donors such as Tesla CEO Elon Musk, disgraced FTX founder Sam Bankman-Fried, Skype founder Jaan Tallinn and Ethereum co-founder Vitalik Buterin -- as well as institutions like Open Philanthropy, a charitable organization started by billionaire Facebook co-founder Dustin Moskovitz -- have worked to push doomsayers from the tech industry's margins into the mainstream.


Building Cooperative Embodied Agents Modularly with Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive planning abilities in single-agent embodied tasks across various domains. However, their capacity for planning and communication in multi-agent cooperation remains unclear, even though these are crucial skills for intelligent embodied agents. In this paper, we present a novel framework that utilizes LLMs for multi-agent cooperation and tests it in various embodied environments. Our framework enables embodied agents to plan, communicate, and cooperate with other embodied agents or humans to accomplish long-horizon tasks efficiently. We demonstrate that recent LLMs, such as GPT-4, can surpass strong planning-based methods and exhibit emergent effective communication using our framework without requiring fine-tuning or few-shot prompting. We also discover that LLM-based agents that communicate in natural language can earn more trust and cooperate more effectively with humans. Our research underscores the potential of LLMs for embodied AI and lays the foundation for future research in multi-agent cooperation. Videos can be found on the project website https://vis-www.cs.umass.edu/Co-LLM-Agents/.


Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise Given to Students in Synthetic Dialogues

arXiv.org Artificial Intelligence

Research suggests that providing specific and timely feedback to human tutors enhances their performance. However, providing such feedback is challenging because assessment of tutor performance by human evaluators is time-consuming. Large language models, such as the AI-chatbot ChatGPT, hold potential for offering constructive feedback to tutors in practical settings. Nevertheless, the accuracy of AI-generated feedback remains uncertain, with scant research investigating the ability of models like ChatGPT to deliver effective feedback. In this work-in-progress, we evaluate 30 dialogues generated by GPT-4 in a tutor-student setting. We use two different prompting approaches, the zero-shot chain of thought and the few-shot chain of thought, to identify specific components of effective praise based on five criteria. These approaches are then compared to the results of human graders for accuracy. Our goal is to assess the extent to which GPT-4 can accurately identify each praise criterion. We found that both zero-shot and few-shot chain of thought approaches yield comparable results. GPT-4 performs moderately well in identifying instances when the tutor offers specific and immediate praise. However, GPT-4 underperforms in identifying the tutor's ability to deliver sincere praise, particularly in the zero-shot prompting scenario where examples of sincere tutor praise statements were not provided. Future work will focus on enhancing prompt engineering, developing a more general tutoring rubric, and evaluating our method using real-life tutoring dialogues.
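
The comparison against human graders described in this abstract could be scored along the following lines. This is only a minimal sketch, not the authors' evaluation code: the function name `per_criterion_accuracy`, the boolean label format, and the criterion names used in the usage example are assumptions made for illustration.

```python
from typing import Dict, List


def per_criterion_accuracy(
    model_labels: List[Dict[str, bool]],
    human_labels: List[Dict[str, bool]],
) -> Dict[str, float]:
    """Fraction of dialogues where the model label matches the human label,
    computed separately for each praise criterion.

    Each list holds one dict per dialogue, mapping a criterion name
    (hypothetical, e.g. "specific") to whether that criterion was judged met.
    """
    if len(model_labels) != len(human_labels):
        raise ValueError("model and human label lists must be the same length")
    if not human_labels:
        return {}

    accuracy: Dict[str, float] = {}
    for criterion in sorted(human_labels[0]):
        # Count dialogues where both graders agree on this criterion.
        matches = sum(
            1
            for m, h in zip(model_labels, human_labels)
            if m[criterion] == h[criterion]
        )
        accuracy[criterion] = matches / len(human_labels)
    return accuracy


# Usage with made-up labels for two dialogues and two criteria.
human = [{"specific": True, "sincere": True}, {"specific": False, "sincere": True}]
model = [{"specific": True, "sincere": False}, {"specific": False, "sincere": True}]
scores = per_criterion_accuracy(model, human)
```

Running the same function once on zero-shot labels and once on few-shot labels would give the kind of side-by-side, per-criterion comparison the abstract reports.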


Improving Automatic Parallel Training via Balanced Memory Workload Optimization

arXiv.org Artificial Intelligence

Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.
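
To give a feel for a dynamic programming search under a memory constraint, here is a small sketch. It is not Galvatron-BMW's algorithm: it assumes each layer picks one strategy independently, that each strategy has a fixed (time, memory) cost, and that memory is measured in integer units. The function name `best_plan` and all numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One candidate strategy for a layer: (time_cost, memory_cost).
# Memory is an integer number of units; all values are illustrative.
Strategy = Tuple[float, int]


def best_plan(
    layers: List[List[Strategy]], budget: int
) -> Optional[Tuple[float, List[int]]]:
    """Pick one strategy per layer to minimise total time while keeping
    total memory within `budget`. Returns (total_time, chosen indices),
    or None if no combination fits."""
    # dp maps memory used so far -> (best total time, strategy index per layer)
    dp: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for candidates in layers:
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for used, (time_so_far, plan) in dp.items():
            for idx, (t, mem) in enumerate(candidates):
                total_mem = used + mem
                if total_mem > budget:
                    continue  # prune plans that exceed the memory budget
                cand = (time_so_far + t, plan + [idx])
                if total_mem not in nxt or cand[0] < nxt[total_mem][0]:
                    nxt[total_mem] = cand
        if not nxt:
            return None
        dp = nxt
    return min(dp.values(), key=lambda entry: entry[0])


# Two layers, each with a fast/memory-hungry and a slow/memory-light option.
layers = [[(1.0, 4), (3.0, 1)], [(2.0, 3), (5.0, 1)]]
tight = best_plan(layers, budget=5)
roomy = best_plan(layers, budget=7)
```

With the tighter budget the search is forced to trade time for memory on one layer; with the larger budget it can take the fastest option everywhere.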


LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

arXiv.org Artificial Intelligence

In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common approach, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective as it helps the model generate embeddings that are more distinguishable between classes, and it can also be more sample-efficient as the model learns from positive and negative examples simultaneously. One of the most important components of contrastive learning is data augmentation, but unlike in computer vision, effective data augmentation for NLP is still challenging. This paper proposes LM-CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models, which leverages prompt-based few-shot paraphrasing using generative language models, especially large language models such as GPT-3 and OPT-175B, for data augmentation. Our experiments on multiple text classification benchmarks show that this augmentation method outperforms other methods, such as easy data augmentation, back translation, and multiple templates.
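
The role a paraphrase plays as a positive example in contrastive learning can be illustrated with a generic InfoNCE-style loss. This is a sketch under assumptions, not LM-CPPF's objective: the abstract does not specify the loss, and the function names, the temperature value, and the toy two-dimensional embeddings are all made up for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss: low when the anchor is closer to its positive
    (here, the embedding of a paraphrase) than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings: the paraphrase sits near the anchor, the negative does not.
anchor = [1.0, 0.0]
paraphrase = [0.9, 0.1]
other_class = [0.0, 1.0]

good = info_nce(anchor, paraphrase, [other_class])
bad = info_nce(anchor, other_class, [paraphrase])
```

The loss is small when the paraphrase embedding lies close to the anchor and large when the roles are swapped, which is why the quality of the augmented positives matters for this kind of training.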