 Generative AI


Fine-tuning ChatGPT for Automatic Scoring

arXiv.org Artificial Intelligence

This study highlights the potential of fine-tuned ChatGPT (GPT-3.5) for automatically scoring students' written constructed responses, using example assessment tasks in science education. Recent studies of OpenAI's generative model GPT-3.5 have reported that it predicts natural language with high accuracy and produces human-like responses. GPT-3.5 was trained on enormous amounts of online text, such as journals and Wikipedia, whereas students write differently from that training material, so using the pre-trained model directly is not sufficient for automatic scoring. This suggests that a domain-specific model, fine-tuned on data for specific tasks, can improve performance. In this study, we fine-tuned GPT-3.5 on six assessment tasks with a diverse dataset of middle-school and high-school student responses and expert scores. The six tasks comprise two multi-label and four multi-class assessment tasks. We compare the performance of fine-tuned GPT-3.5 with that of fine-tuned BERT, Google's state-of-the-art language model. The results show that BERT, trained on in-domain corpora constructed from science questions and responses, achieved an average accuracy of 0.838 (SD = 0.069). GPT-3.5 shows a remarkable average increase (9.1%) in automatic scoring accuracy (mean = 0.915, SD = 0.042) across the six tasks (p = 0.001 < 0.05). Specifically, for the multi-label tasks (item 1 with 5 labels; item 2 with 10 labels), GPT-3.5 achieved significantly higher scoring accuracy than BERT across all labels, with the second item showing a 7.1% increase. For the four multi-class items, the average increase in scoring accuracy for GPT-3.5 over BERT was 10.6%. Our study confirms that fine-tuned GPT-3.5 can score student responses automatically and with high accuracy on domain-specific data in education. We have released the fine-tuned models for public use and community engagement.
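The abstract reports accuracy for two kinds of tasks, multi-class (one score per response) and multi-label (several labels per response, each scored separately). The following is a minimal sketch of how such accuracies could be computed, not the authors' evaluation code. The function names and the toy data are hypothetical, and it assumes each label of a multi-label item is treated as an independent binary decision.

```python
from typing import List, Sequence


def multiclass_accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of responses whose predicted class matches the expert score."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_label_accuracy(
    gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]
) -> List[float]:
    """For a multi-label item, accuracy computed separately for each label."""
    n_labels = len(gold[0])
    return [
        multiclass_accuracy([row[j] for row in gold], [row[j] for row in pred])
        for j in range(n_labels)
    ]


# Hypothetical toy data, not taken from the study.
gold_mc = [0, 1, 2, 1]  # expert scores for four responses
pred_mc = [0, 1, 1, 1]  # model scores for the same responses
mc_acc = multiclass_accuracy(gold_mc, pred_mc)

# Four responses, three binary labels each (1 = label applies).
gold_ml = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
pred_ml = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
ml_acc = per_label_accuracy(gold_ml, pred_ml)

print(mc_acc)  # 0.75
print(ml_acc)  # [1.0, 0.75, 0.75]
```

Reporting one accuracy per label, as sketched here, is what makes a statement such as "higher accuracy across all the labels" checkable label by label.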


AI Is Telling Bedtime Stories to Your Kids Now

WIRED

The problem with Bluey is there's not enough of it. Even with 151 seven-minute-long episodes of the popular children's animated show out there, parents of toddlers still desperately wait for Australia's Ludo Studio to release another season. The only way to get more Bluey more quickly is if they create their own stories starring the Brisbane-based family of blue heeler dogs. One London-based developer and father used OpenAI's latest tool, customizable bots called GPTs, to create a story generator for his young daughter. The bot, which he calls Bluey-GPT, begins each session by asking people their name, age, and a bit about their day, then churns out personalized tales starring Bluey and her sister Bingo.



An In-depth Look at Gemini's Language Abilities

arXiv.org Artificial Intelligence

The recently released Google Gemini class of models is the first to comprehensively report results that rival the OpenAI GPT series across a wide variety of tasks. In this paper, we do an in-depth exploration of Gemini's language abilities, making two contributions. First, we provide a third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models with reproducible code and fully transparent results. Second, we take a closer look at the results, identifying areas where one of the two model classes excels. We perform this analysis over 10 datasets testing a variety of language abilities, including reasoning, answering knowledge-based questions, solving math problems, translating between languages, generating code, and acting as instruction-following agents. From this analysis, we find that Gemini Pro achieves accuracy that is close but slightly inferior to that of the corresponding GPT 3.5 Turbo on all English-language tasks that we benchmarked, but that Gemini Pro excels at translation into other languages for the languages it supports. We further provide explanations for some of the under-performing tasks, including failures in mathematical reasoning with many digits, sensitivity to multiple-choice answer ordering, and others. We also identify areas where Gemini Pro demonstrates comparably high performance, such as handling longer and more complex reasoning chains.
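One of the failure modes named above is sensitivity to multiple-choice answer ordering. The following is a minimal sketch of how such sensitivity might be probed, not the paper's method. `order_sensitivity` and both stand-in answer functions are hypothetical, and a real probe would call a model where the stand-ins appear.

```python
from itertools import permutations
from typing import Callable, Sequence


def order_sensitivity(
    answer_fn: Callable[[str, Sequence[str]], int],
    question: str,
    options: Sequence[str],
) -> float:
    """Fraction of option orderings in which the chosen option text differs
    from the one chosen under the original ordering."""
    baseline = options[answer_fn(question, options)]
    orderings = list(permutations(options))
    changed = sum(
        1 for perm in orderings if perm[answer_fn(question, perm)] != baseline
    )
    return changed / len(orderings)


def always_first(question: str, options: Sequence[str]) -> int:
    """Hypothetical position-biased stand-in: always picks the first slot."""
    return 0


def pick_longest(question: str, options: Sequence[str]) -> int:
    """Hypothetical order-invariant stand-in: picks the longest option text."""
    return max(range(len(options)), key=lambda i: len(options[i]))


opts = ["4", "five", "twenty-two"]
biased = order_sensitivity(always_first, "2 + 2 = ?", opts)
stable = order_sensitivity(pick_longest, "2 + 2 = ?", opts)

print(round(biased, 3))  # 0.667: the choice changes in 4 of 6 orderings
print(stable)            # 0.0: the choice never depends on ordering
```

Comparing the chosen option text rather than the chosen index is the key design choice: an answerer that tracks content scores 0, while one that tracks position scores high.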


Elon Musk promised an anti-'woke' chatbot. It's not going as planned.

Washington Post - Technology News

Artificial intelligence systems of all kinds are prone to biases ingrained in their design or the data they've learned from. In the past year, the rise of OpenAI's ChatGPT and other AI chatbots and image generators has sparked debate over how they represent minority groups or respond to prompts about politics and culture-war issues such as race and gender identity. While many tech ethicists and AI experts warn that these systems can absorb and reinforce harmful stereotypes, efforts by tech firms to counter those tendencies have provoked a backlash from some on the right who see them as overly censorial.


"King of the cannibals": How Sam Altman took over Silicon Valley

Washington Post - Technology News

Altman and Elon Musk, the founder of Tesla and owner of what used to be Twitter, created OpenAI as a nonprofit with the aim of warning and protecting the world against a technology Musk believed could wipe out humanity by accident. Altman appeared to agree: "Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity," he wrote on his personal blog before the company's launch in 2015, adding that it "does not have to be the inherently evil sci-fi version to kill us all." But the technology's promise was too brilliant to pass up. It just needed the right regulation, and he wanted to set up a global governing board to erect boundaries for the tool's use.


OpenAI founder Sam Altman's sprawling network of investments

Washington Post - Technology News

Cruise, a self-driving car company, went through Y Combinator when Altman worked there, and he made a personal investment in 2015. The next year, General Motors acquired the start-up. Cruise is now one of the most prominent self-driving car companies, and it was the first to provide a driverless ride-hailing service in San Francisco. But the company is now in crisis. In October, a human driver hit a pedestrian, flinging her into the path of a Cruise car, which then rolled over her and dragged her for 20 feet. California authorities accused Cruise of trying to cover up the details of the accident.


Apple is reportedly looking to team up with news publishers to train its AI

Engadget

Apple has been noticeably missing in the list of companies with their own generative AI product, but based on a new report by The New York Times, it's looking to change that real soon. In recent weeks, Apple has reportedly started negotiating with major publishers and news organizations to ask for permission to use their content to train the generative AI system it's developing. The company doesn't expect to get its hands on their content for free, though, and The Times says it's offering them multi-year deals worth at least $50 million for access to their news archives. Apparently, some of the publishers it approached are concerned about the repercussions of letting Apple use their news articles throughout the years. They think a broad licensing deal for their archives could lead to legal issues along the way.


A Survey on Generative Diffusion Model

arXiv.org Artificial Intelligence

Deep generative models have unlocked another profound realm of human creativity. By capturing and generalizing patterns within data, we have entered the epoch of all-encompassing Artificial Intelligence for General Creativity (AIGC). Notably, diffusion models, recognized as one of the paramount generative models, materialize human ideation into tangible instances across diverse domains, encompassing imagery, text, speech, biology, and healthcare. To provide advanced and comprehensive insights into diffusion, this survey comprehensively elucidates its developmental trajectory and future directions from three distinct angles: the fundamental formulation of diffusion, algorithmic enhancements, and the manifold applications of diffusion. Each layer is meticulously explored to offer a profound comprehension of its evolution. Structured and summarized approaches are presented in https://github.com/chq1155/A-Survey-on-Generative-Diffusion-Model.


Dual Use Concerns of Generative AI and Large Language Models

arXiv.org Artificial Intelligence

We suggest the implementation of the Dual Use Research of Concern (DURC) framework, originally designed for life sciences, to the domain of generative AI, with a specific focus on Large Language Models (LLMs). With its demonstrated advantages and drawbacks in biological research, we believe the DURC criteria can be effectively redefined for LLMs, potentially contributing to improved AI governance. Acknowledging the balance that must be struck when employing the DURC framework, we highlight its crucial political role in enhancing societal awareness of the impact of generative AI. As a final point, we offer a series of specific recommendations for applying the DURC approach to LLM research.
Keywords: Dual Use Research of Concern (DURC), Generative AI, Large Language Models (LLMs), AI Ethics
Conflict of interest: No conflict of interest to report.
Funding: This research was supported through projects TechEthos (grant number 101006249) and MultiRATE (grant number 101073929) funded by the European Commission Horizon program.
Ethics approval: No human subjects were involved in the study.
Consent: No data needing consent has been used.
Data availability statement: In this article, we do not analyze or generate any datasets.
Author contribution: All authors contributed to the study conception and design. Sections 1 and 4 were written with equal contribution. Sections 2 and 3 were conceived by Adomaitis and later edited by Grinbaum.