Generative AI
Artificial Intelligence health advice accuracy varies across languages and contexts
Garg, Prashant, Fetzer, Thiemo
Using basic health statements authorized by UK and EU registers and ~9,100 journalist-vetted public-health assertions on topics such as abortion, COVID-19 and politics from sources ranging from peer-reviewed journals and government advisories to social media and news across the political spectrum, we benchmark six leading large language models in 21 languages, finding that -- despite high accuracy on English-centric textbook claims -- performance falls in multiple non-European languages and fluctuates by topic and source, highlighting the urgency of comprehensive multilingual, domain-aware validation before deploying AI in global health communication. Main Text: Recent evidence suggests that 17% of U.S. adults -- and a striking 25% of those aged 18-29 -- now consult AI chatbots for health questions at least once a month (1), while in Australia nearly 10% of adults did so in just the first half of 2024 (2). Beyond mere curiosity, these tools can substantially improve comprehension: running standard discharge notes through GPT-4 reduced the average reading grade level from 11th to 6th and boosted patient-understandability scores from 13% to 81% (3). Yet as fluently as large language models (LLMs) can rephrase medical text, they lack formal clinical vetting and still rely on statistical patterns in their training data. When generative AI echoes unverified or dangerous claims, it risks amplifying harm.
Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family
Langlais, Pierre-Carl, Chizhov, Pavel, Nee, Mattia, Hinostroza, Carlos Rosas, Delsart, Matthieu, Girard, Irène, Hicheur, Othman, Stasenko, Anastasia, Yamshchikov, Ivan P.
We introduce a new generation of small reasoning models for RAG, search, and source summarization. Pleias-RAG-350m and Pleias-RAG-1B are mid-trained on a large synthetic dataset emulating the retrieval of a wide variety of multilingual open sources from the Common Corpus. They provide native support for citation and grounding with literal quotes and reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking. Pleias-RAG-350m and Pleias-RAG-1B outperform SLMs below 4 billion parameters on standardized RAG benchmarks (HotPotQA, 2wiki) and are competitive with popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. They are the only SLMs to date maintaining consistent RAG performance across leading European languages and ensuring systematic reference grounding for statements. Due to their size and ease of deployment on constrained infrastructure and higher factuality by design, the models unlock a range of new use cases for generative AI.
Using customized GPT to develop prompting proficiency in architectural AI-generated images
Rodriguez, Juan David Salazar, Joyce, Sam Conrad, Julfendi
This research investigates the use of customized GPT models to enhance prompting proficiency among architecture students when generating AI-driven images. Prompt engineering is increasingly essential in architectural education due to the widespread adoption of generative AI tools. This study utilized a mixed-methods experimental design involving architecture students divided into three distinct groups: a control group receiving no structured support, a second group provided with structured prompting guides, and a third group supported by both structured guides and interactive AI personas. Students engaged in reverse engineering tasks, first guessing provided image prompts and then generating their own prompts, aiming to boost critical thinking and prompting skills. Variables examined included time spent prompting, word count, prompt similarity, and concreteness. Quantitative analysis involved correlation assessments between these variables and a one-way ANOVA to evaluate differences across groups. While several correlations showed meaningful relationships, not all were statistically significant. ANOVA results indicated statistically significant improvements in word count, similarity, and concreteness, especially in the group supported by AI personas and structured prompting guides. Qualitative feedback complemented these findings, revealing enhanced confidence and critical thinking skills in students. These results suggest tailored GPT interactions substantially improve students' ability to communicate architectural concepts clearly and effectively.
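As an illustration of the one-way ANOVA mentioned in the abstract, the following is a minimal sketch of how an F-statistic across three groups could be computed. The word counts are invented for illustration and are not data from the study; the function name `one_way_anova_f` and the group names are likewise assumptions.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical prompt word counts per group (NOT data from the study).
control = [10, 12, 14]
guides = [15, 17, 19]
guides_and_personas = [20, 22, 24]

f_stat, df_b, df_w = one_way_anova_f([control, guides, guides_and_personas])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F-distribution with the given degrees of freedom would indicate that at least one group mean differs; turning F into a p-value requires that distribution's tail probability, which this sketch leaves out.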
How to watch LlamaCon 2025, Meta's first generative AI developer conference
After a couple of years of having its open-source Llama AI model be just a part of its Connect conferences, Meta is breaking things out and hosting an entirely generative AI-focused developer conference called LlamaCon on April 29. The event is entirely virtual, and you'll be able to watch along live on the Meta for Developers Facebook page. LlamaCon kicks off at 1PM ET / 10AM PT with a keynote address from Meta's Chief Product Officer Chris Cox, Vice President of AI Manohar Paluri and research scientist Angela Fan. The keynote is supposed to cover developments in the company's open-source AI community, "the latest on the Llama collection of models and tools" and offer a glimpse at yet-to-be released AI features. The keynote address will be followed by a conversation at 1:45PM ET / 10:45AM PT between Meta CEO Mark Zuckerberg and Databricks CEO Ali Ghodsi on "building AI-powered applications," followed by a chat at 7PM ET / 4PM PT about "the latest trends in AI" between Zuckerberg and Microsoft CEO Satya Nadella. It doesn't seem like either conversation will be used to break news, but Microsoft and Meta have collaborated before, so anything is possible.
It seems like most Windows users don't care for Copilot
Copilot, Microsoft's AI assistant, appears to be struggling to match its competition in terms of popularity. The number of people using Copilot has remained around 20 million weekly users for the last year, according to tech newsletter Newcomer, while OpenAI's ChatGPT has hit as high as 400 million weekly users. The data was shared at an annual executive meeting in March by Microsoft's chief financial officer Amy Hood, Newcomer reports, and raises some concerns about the AI future Microsoft is pitching. Microsoft uses OpenAI's models to power Copilot, and the assistant offers similar features to ChatGPT, but it clearly doesn't draw the same interest from users. The company has also built Copilot into Windows 11, Microsoft 365 and the Edge browser, without apparently reaping the benefit of additional user growth.
AI Executives Promise Cancer Cures. Here's the Reality
To hear Silicon Valley tell it, the end of disease is well on its way. Demis Hassabis, a Nobel laureate for his AI research and the CEO of Google DeepMind, said on Sunday that he hopes that AI will be able to solve important scientific problems and help "cure all disease" within five to 10 years. Earlier this month, OpenAI released new models and touted their ability to "generate and critically evaluate novel hypotheses" in biology, among other disciplines. These are all executives marketing their products, obviously, but is there even a kernel of possibility in these predictions? If generative AI could contribute in the slightest to such discoveries--as has been promised since the start of the AI boom--where would the technology and scientists using it even begin?
OpenAI's Deep Research tool is coming to free accounts
OpenAI is giving free ChatGPT users limited access to its Deep Research tool. In addition, the company has expanded the tool's limits for all users by rolling out a lightweight version of it powered by its o4-mini model. It says the o4-mini Deep Research feature produces slightly shorter responses, but is "nearly as smart, more cost-efficient and delivers similarly high-quality results" as the original version. OpenAI previously released the tool for use by paying Pro, Plus, Team, Edu and Enterprise subscribers. But even they have a limited number of Deep Research queries per month.
Deep Generative Model-Based Generation of Synthetic Individual-Specific Brain MRI Segmentations
Wang, Ruijie, Rossetto, Luca, Mérillat, Susan, Röcke, Christina, Martin, Mike, Bernstein, Abraham
To the best of our knowledge, all existing methods that can generate synthetic brain magnetic resonance imaging (MRI) scans for a specific individual require detailed structural or volumetric information about the individual's brain. However, such brain information is often scarce, expensive, and difficult to obtain. In this paper, we propose the first approach capable of generating synthetic brain MRI segmentations -- specifically, 3D white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) segmentations -- for individuals using their easily obtainable and often readily available demographic, interview, and cognitive test information. Our approach features a novel deep generative model, CSegSynth, which outperforms existing prominent generative models, including conditional variational autoencoder (C-VAE), conditional generative adversarial network (C-GAN), and conditional latent diffusion model (C-LDM). We demonstrate the high quality of our synthetic segmentations through extensive evaluations. Also, in assessing the effectiveness of the individual-specific generation, we achieve superior volume prediction, with mean absolute errors of only 36.44mL, 29.20mL, and 35.51mL between the ground-truth WM, GM, and CSF volumes of test individuals and those volumes predicted based on generated individual-specific segmentations, respectively.
Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT
Tayal, Anuja, Salunke, Devika, Di Eugenio, Barbara, Allen-Meares, Paula, Abril, Eulalia Puig, Garcia, Olga, Dickens, Carolyn, Boyd, Andrew
Conversational assistants are becoming more and more popular, including in healthcare, partly because of the availability and capabilities of Large Language Models. There is a need for controlled, probing evaluations with real stakeholders which can highlight advantages and disadvantages of more traditional architectures and those based on generative AI. We present a within-group user study to compare two versions of a conversational assistant that allows heart failure patients to ask about salt content in food. One version of the system was developed in-house with a neurosymbolic architecture, and one is based on ChatGPT. The evaluation shows that the in-house system is more accurate, completes more tasks and is less verbose than the one based on ChatGPT; on the other hand, the one based on ChatGPT makes fewer speech errors and requires fewer clarifications to complete the task. Patients show no preference for one over the other.
Energy Considerations of Large Language Model Inference and Efficiency Optimizations
Fernandez, Jared, Na, Clara, Tiwari, Vashisth, Bisk, Yonatan, Luccioni, Sasha, Strubell, Emma
As large language models (LLMs) scale in size and adoption, their computational and environmental costs continue to rise. Prior benchmarking efforts have primarily focused on latency reduction in idealized settings, often overlooking the diverse real-world inference workloads that shape energy use. In this work, we systematically analyze the energy implications of common inference efficiency optimizations across diverse Natural Language Processing (NLP) and generative Artificial Intelligence (AI) workloads, including conversational AI and code generation. We introduce a modeling approach that approximates real-world LLM workflows through a binning strategy for input-output token distributions and batch size variations. Our empirical analysis spans software frameworks, decoding strategies, GPU architectures, online and offline serving settings, and model parallelism configurations. We show that the effectiveness of inference optimizations is highly sensitive to workload geometry, software stack, and hardware accelerators, demonstrating that naive energy estimates based on FLOPs or theoretical GPU utilization significantly underestimate real-world energy consumption. Our findings reveal that the proper application of relevant inference efficiency optimizations can reduce total energy use by up to 73% from unoptimized baselines. These insights provide a foundation for sustainable LLM deployment and inform energy-efficient design strategies for future AI infrastructure.
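The binning idea mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the bin edges, the function names `bin_index` and `bin_workload`, and the sample requests are all hypothetical. Each request is placed in a 2D cell by its input and output token counts, and the cell frequencies become weights that a workload-level energy estimate could use.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in tokens (assumed for illustration only).
INPUT_EDGES = [128, 512, 2048]
OUTPUT_EDGES = [64, 256, 1024]


def bin_index(value, edges):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(edges, value)


def bin_workload(requests):
    """Map (input_tokens, output_tokens) pairs to the fraction of requests per cell."""
    counts = Counter()
    for n_in, n_out in requests:
        cell = (bin_index(n_in, INPUT_EDGES), bin_index(n_out, OUTPUT_EDGES))
        counts[cell] += 1
    total = sum(counts.values())
    return {cell: count / total for cell, count in counts.items()}


# Hypothetical request trace: (input tokens, output tokens).
requests = [(100, 50), (120, 60), (600, 300), (3000, 40)]
weights = bin_workload(requests)
for cell, weight in sorted(weights.items()):
    print(cell, weight)
```

Given a measured energy cost per cell, a weighted sum over these fractions would approximate the energy of the whole workload without benchmarking every individual request shape.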