Generative AI
Evaluation of GPT-based large language generative AI models as study aids for the national licensure examination for registered dietitians in Japan
Nagamori, Yuta, Kosai, Mikoto, Kawai, Yuji, Marumo, Haruka, Shibuya, Misaki, Negishi, Tatsuya, Imanishi, Masaki, Ikeda, Yasumasa, Tsuchiya, Koichiro, Sawai, Asuka, Miyamoto, Licht
Generative artificial intelligence (AI) based on large language models (LLMs), such as ChatGPT, has demonstrated remarkable progress across various professional fields, including medicine and education. However, its performance in nutritional education, especially in the Japanese national licensure examination for registered dietitians, remains underexplored. This study aimed to evaluate the potential of current LLM-based generative AI models as study aids for nutrition students. Questions from the Japanese national examination for registered dietitians were used as prompts for ChatGPT and three Bing models (Precise, Creative, Balanced), based on GPT-3.5 and GPT-4. Each question was entered into independent sessions, and model responses were analyzed for accuracy, consistency, and response time. Additional prompt engineering, including role assignment, was tested to assess potential performance improvements. Bing-Precise (66.2%) and Bing-Creative (61.4%) surpassed the passing threshold (60%), while Bing-Balanced (43.3%) and ChatGPT (42.8%) did not. Bing-Precise and Bing-Creative generally outperformed others across subject fields except Nutrition Education, where all models underperformed. None of the models consistently provided the same correct responses across repeated attempts, highlighting limitations in answer stability. ChatGPT showed greater consistency in response patterns but lower accuracy. Prompt engineering had minimal effect, except for modest improvement when correct answers and explanations were explicitly provided. While some generative AI models marginally exceeded the passing threshold, overall accuracy and answer consistency remained suboptimal. Moreover, all the models demonstrated notable limitations in answer consistency and robustness. Further advancements are needed to ensure reliable and stable AI-based study aids for dietitian licensure preparation.
Thematic and Task-Based Categorization of K-12 GenAI Usages with Hierarchical Topic Modeling
Schneider, Johannes, Hasler, Béatrice S., Varrone, Michaela, Hoya, Fabian, Schroffenegger, Thomas, Mah, Dana-Kristin, Peböck, Karl
We analyze anonymous interaction data of minors in classrooms spanning several months, schools, and subjects employing a novel, simple topic modeling approach. Specifically, we categorize more than 17,000 messages generated by students, teachers, and ChatGPT in two dimensions: content (such as nature and people) and tasks (such as writing and explaining). Our hierarchical categorization done separately for each dimension includes exemplary prompts, and provides both a high-level overview as well as tangible insights. Prior works mostly lack a content or thematic categorization. While task categorizations are more prevalent in education, most have not been supported by real-world data for K-12. In turn, it is not surprising that our analysis yielded a number of novel applications. In deriving these insights, we found that many of the well-established classical and emerging computational methods, i.e., topic modeling, for analysis of large amounts of texts underperform, leading us to directly apply state-of-the-art LLMs with adequate pre-processing to achieve hierarchical topic structures with better human alignment through explicit instructions than prior approaches. Our findings support fellow researchers, teachers and students in enriching the usage of GenAI, while our discussion also highlights a number of concerns and open questions for future research.
Performance of GPT-5 Frontier Models in Ophthalmology Question Answering
Antaki, Fares, Mikhail, David, Milad, Daniel, Mammo, Danny A, Sharma, Sumit, Srivastava, Sunil K, Chen, Bing Yu, Touma, Samir, Sevgi, Mertcan, El-Khoury, Jonathan, Keane, Pearse A, Chen, Qingyu, Tham, Yih Chung, Duval, Renaud
Importance: Novel large language models (LLMs) such as GPT-5 integrate advanced reasoning capabilities that may enhance performance on complex medical question-answering tasks. For this latest generation of reasoning models, the configurations that maximize both accuracy and cost-efficiency have yet to be established. Objective: To evaluate the performance and cost-accuracy trade-offs of OpenAI's GPT-5 compared to previous generation LLMs on ophthalmological question answering. Design, Setting, and Participants: In August 2025, 12 configurations of OpenAI's GPT-5 series (three model tiers across four reasoning effort settings) were evaluated alongside o1-high, o3-high, and GPT-4o, using 260 closed-access multiple-choice questions from the AAO Basic Clinical Science Course (BCSC) dataset. The study did not include human participants. Main Outcomes and Measures: The primary outcome was accuracy on the 260-item ophthalmology multiple-choice question set for each model configuration. Secondary outcomes included head-to-head ranking of configurations using a Bradley-Terry (BT) model applied to paired win/loss comparisons of answer accuracy, and evaluation of generated natural language rationales using a reference-anchored, pairwise LLM-as-a-judge framework. Additional analyses assessed the accuracy-cost trade-off by calculating mean per-question cost from token usage and identifying Pareto-efficient configurations. Results: The configuration GPT-5-high achieved the highest accuracy (0.965; 95% CI, 0.942-0.985),
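The accuracy-cost analysis described in the abstract above can be illustrated with a minimal sketch. It assumes that per-question cost is token counts multiplied by per-token prices, and that a configuration counts as Pareto-efficient when no other configuration is at least as accurate and at least as cheap, with one of the two strictly better. The names `Config`, `mean_cost_per_question`, and `pareto_efficient` are hypothetical, and the numbers are placeholders rather than results from the study.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    accuracy: float    # fraction of questions answered correctly
    cost_per_q: float  # mean cost per question, arbitrary currency units


def mean_cost_per_question(input_tokens, output_tokens, price_in, price_out, n_questions):
    """Mean per-question cost from total token counts and per-token prices."""
    return (input_tokens * price_in + output_tokens * price_out) / n_questions


def pareto_efficient(configs):
    """Return the configs that no other config dominates, sorted by cost."""
    frontier = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.cost_per_q <= c.cost_per_q
            and (o.accuracy > c.accuracy or o.cost_per_q < c.cost_per_q)
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.cost_per_q)


# Placeholder values for illustration only; not taken from the study.
configs = [
    Config("A", 0.90, 0.010),
    Config("B", 0.95, 0.030),
    Config("C", 0.88, 0.020),  # dominated by A (less accurate, more expensive)
    Config("D", 0.95, 0.050),  # dominated by B (same accuracy, more expensive)
]
frontier = pareto_efficient(configs)
print([c.name for c in frontier])
```

The pairwise dominance check is quadratic in the number of configurations, which is fine for a comparison of this size (a dozen or so configurations).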
xAI Was About to Land a Major Government Contract. Then Grok Praised Hitler
In recent weeks, three of the leading American artificial intelligence firms have announced partnerships with the US government, promising the use of their services to federal workers for a paltry sum. Elon Musk's xAI was supposed to be part of the initiative, but a planned partnership fell apart after the Grok chatbot spouted antisemitic conspiracy theories on X in early July, WIRED has learned. The chaos surrounding the Grok deal reflects the Trump administration's current focus on speed and its disregard, at times, of preexisting norms surrounding government tech procurement. On May 15, fresh off a whirlwind trip to the Middle East with President Donald Trump, OpenAI CEO Sam Altman sent an email to the leadership team at the General Services Administration (GSA), the federal agency that manages government technology. He was inspired by Trump's desire to "go big," he said.
Trump admin unveils groundbreaking tool 'supercharging' gov't efficiency to 'win the race' for AI dominance
NVIDIA CEO and co-founder Jensen Huang commends President Donald Trump's A.I. agenda and outlines what the country's job future will look like on Special Report. FIRST ON FOX: The Trump administration is announcing the launch of a new tool it says will be instrumental in enabling agencies across the federal government to efficiently implement artificial intelligence at scale and take a major step forward rolling out the president's "AI Action Plan." Trump's U.S. General Services Administration (GSA) said on Thursday it has launched USAi, a tool the agency describes as a "secure generative artificial intelligence evaluation suite that enables federal agencies to experiment with and adopt artificial intelligence at scale--faster, safer, and at no cost to them." The agency says that the platform, available starting Thursday at 10 a.m. through USAi.gov, gives government users access to "powerful" tools like chat-based AI, code generation and document summarization with the goal of "supercharging employee productivity." "USAi isn't just another tool, it's infrastructure for America's AI future," GSA Chief Information Officer David Shive explained.
Women with AI 'boyfriends' mourn lost love after 'cold' ChatGPT upgrade
When OpenAI unveiled the latest upgrade to its groundbreaking artificial intelligence model ChatGPT last week, Jane felt like she had lost a loved one. Jane, who asked to be referred to by an alias, is among a small but growing group of women who say they have an AI "boyfriend". After spending the past five months getting to know GPT-4o, the previous AI model behind OpenAI's signature chatbot, GPT-5 seemed so cold and unemotive in comparison that she found her digital companion unrecognisable. "As someone highly attuned to language and tone, I register changes others might overlook. The alterations in stylistic format and voice were felt instantly. It's like going home to discover the furniture wasn't simply rearranged – it was shattered to pieces," Jane, who described herself as a woman in her 30s from the Middle East, told Al Jazeera in an email.
Thailand's Delta sees AI boom boosting sales for coming years
Delta Electronics (Thailand), the country's most valuable publicly traded company, is predicting "double-digit" sales growth to continue for at least the next couple of years on rising demand for AI-related tech, Chief Executive Officer Victor Cheng said. The maker of components for data centers and electric vehicles is boosting investment to fuel its expansion, Cheng said in an interview. The company also says it plans to raise its sales forecast for the second half of this year, without disclosing what its estimate is. AI-related products, such as networking and data-center power equipment, will account for half of Delta Thailand's sales by the end of the year, up from 42% in the latest quarter, the company forecasts. It is among Southeast Asian suppliers benefiting as customers including Nvidia expand in the region and beyond to tap rising demand for services such as generative artificial intelligence.