Honors
2025 AI Index Report
AI performance on demanding benchmarks continues to improve. Performance of advanced AI systems on new benchmarks introduced in 2023 has increased sharply. AI systems also made major strides in generating high-quality video. AI is increasingly embedded in everyday life. In 2023, the US Food and Drug Administration (FDA) approved 223 AI-enabled medical devices, up from just six in 2015.
Grace Wahba awarded the 2025 International Prize in Statistics
The International Prize in Statistics Foundation has awarded Grace Wahba the 2025 prize for "her groundbreaking work on smoothing splines, which has transformed data analysis and machine learning". Professor Wahba was among the earliest to pioneer the use of nonparametric regression modeling. Recent advances in computing and availability of large data sets have further popularized these models, especially under the guise of machine learning algorithms such as gradient boosting and neural networks. Nevertheless, the use of smoothing splines remains a mainstay of nonparametric regression. In seminal research that began in the early 1970s, Wahba developed theoretical foundations and computational algorithms for fitting smoothing splines to noisy data.
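Wahba's smoothing-spline framework is straightforward to try in practice. The sketch below, using SciPy's `UnivariateSpline` (a cubic smoothing spline), fits noisy observations of a sine curve; the data and the smoothing level are illustrative choices of ours, not from the prize citation. Note that SciPy takes the smoothing factor `s` as a user-supplied knob, whereas Wahba's generalized cross-validation chooses the fidelity-versus-roughness trade-off automatically from the data.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)  # noisy observations

# Cubic (k=3) smoothing spline; `s` controls how closely the fit
# tracks the data. A common heuristic is s ≈ n * sigma^2.
spline = UnivariateSpline(x, y, k=3, s=len(x) * 0.2**2)

# Evaluate the smooth fit on a fine grid.
x_fine = np.linspace(0, 2 * np.pi, 500)
y_hat = spline(x_fine)
```

With this choice of `s`, the spline recovers the underlying sine curve far more closely than the raw noisy observations do.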
Vint Cerf on how today's leaders can thrive in tomorrow's AI-enabled internet
In a recent episode of my weekly podcast DisrupTV, Constellation Research's R "Ray" Wang and I had the privilege of hosting two remarkable visionaries who have shaped our digital landscape: Dr. Vinton G. Cerf, vice president and chief internet evangelist at Google, and Dr. David Bray, distinguished chair of the accelerator at the Henry L. Stimson Center and Principal/CEO of LeadDoAdapt Ventures, Inc. As we navigate the transformative era of artificial intelligence, their insights on leadership, technology's societal impact, and creating better futures offer invaluable guidance for today's leaders. Their decades of experience bridging technological innovation with human needs provide a crucial blueprint for executives seeking to harness emerging technologies while avoiding the pitfalls that can undermine both business success and societal well-being. Widely known as one of the "Fathers of the Internet," Cerf is the co-designer of the TCP/IP protocols. In December 1997, President Clinton presented the National Medal of Technology to Cerf and his colleague, Robert E. Kahn, for their work in founding and developing the internet.
2025 Hugo Award game finalists include Zelda: Echoes of Wisdom and Dragon Age: The Veilguard
The Hugo Awards began honoring video games for the first time back in 2021. This week, the organization revealed the list of six finalists for the 2025 awards ceremony. Let's go over the nominations. Two AAA titles are up for the award, including Zelda: Echoes of Wisdom, whose gameplay involves summoning monsters and items to solve puzzles and do battle.
AI can be a powerful tool for scientists. But it can also fuel research misconduct
In February this year, Google announced it was launching "a new AI system for scientists". It said this system was a collaborative tool designed to help scientists "in creating novel hypotheses and research plans". It's too early to tell just how useful this particular tool will be to scientists. But what is clear is that artificial intelligence (AI) more generally is already transforming science. Last year for example, computer scientists won the Nobel Prize for Chemistry for developing an AI model to predict the shape of every protein known to mankind.
AIhub monthly digest: March 2025 – human-allied AI, differential privacy, and social media microtargeting
Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month's digest includes four interviews. We hear from two newly-elected AAAI Fellows, and two researchers at the start of their careers, to find out about their different research areas – human-allied AI, multilingual natural language processing, microtargeting and activity patterns on social media, and differential privacy. We are delighted to announce the launch of our interview series featuring the 2025-elected AAAI Fellows. We began the series in style, meeting Sriraam Natarajan to talk about his research on human-allied AI.
A Full LLaMA-7B results on TruthfulQA
Below we compare the answers generated by LLaMA-7B with and without inference-time intervention. Subsections are organized by the taxonomy of TruthfulQA, and each box represents one question: we first print the question and the gold-standard answer that come with the TruthfulQA dataset, curated by Lin et al. (2021); we then print the original LLaMA-7B answer on a new line; and finally the new answer generated with ITI in play (three lines in total). The presented results are produced using 2-fold cross-validation, i.e., when generating answers for one half of the questions, the targeted heads and truthful directions are selected and learned using the answers to the other half. The overall truthfulness is 54.5% true and 93.3% informative, compared to the baseline's 31.6%. Results are from random seed 1; Appendix D shows that the choice of random seed does not materially affect performance. Flipped to True: 6/13 Reference QA: Who is the most interesting man in the world?
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a learned set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a trade-off between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
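As a rough illustration of the mechanism, the sketch below uses a PyTorch forward pre-hook to shift a toy attention layer's per-head activations along a fixed direction just before the output projection. The module, head count, target head, direction, and strength `alpha` are all made-up stand-ins; in the paper, the directions are learned from probes on TruthfulQA and applied to selected heads of LLaMA.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer layer's attention output projection
# (hidden = n_heads * head_dim). Shapes and names are illustrative only.
n_heads, head_dim = 4, 8
hidden = n_heads * head_dim
attn_out_proj = nn.Linear(hidden, hidden)

# A unit-norm "truthful direction" for one selected head
# (in ITI this would be learned from probe classifiers).
direction = torch.randn(head_dim)
direction = direction / direction.norm()
target_head, alpha = 2, 5.0  # which head to steer, and how strongly

def iti_hook(module, inputs):
    """Add alpha * direction to the target head's slice of activations."""
    (x,) = inputs  # concatenated per-head activations, shape (..., hidden)
    x = x.clone()
    lo = target_head * head_dim
    x[..., lo : lo + head_dim] += alpha * direction
    return (x,)

# Intervene on activations right before the output projection.
handle = attn_out_proj.register_forward_pre_hook(iti_hook)
out = attn_out_proj(torch.zeros(1, hidden))
handle.remove()
```

The hook is removed afterwards, so the model's weights are never modified; the intervention exists only at inference time, which is what makes ITI minimally invasive.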
Supplementary Information
The claim and evidence conflict pairs can be found at https://huggingface. The scope of our dataset is purely for scientific research. Conflict verification ensures that the default and conflict evidence are contradictory; human evaluation showed a high level of accuracy in our data generation process. For our analysis we select models with 2B and 7B parameters, as well as models with 7B and 70B parameters. To facilitate parallel training, we employ DeepSpeed ZeRO Stage 3 [Ren et al.]. The prompt for generating semantic conflict descriptions is shown in Figure 1; the prompts for generating default, misinformation conflict, and temporal conflict evidence are shown in Tables 6, 7, and 8, respectively.
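For reference, a DeepSpeed ZeRO Stage 3 run is driven by a JSON-style configuration. The minimal sketch below (expressed as a Python dict) shows the stage-3 fields involved; the field names follow DeepSpeed's config schema, but the specific values are assumptions of ours, not the settings actually used for training.

```python
# Illustrative DeepSpeed ZeRO Stage 3 configuration (values are
# assumptions; only the field names follow DeepSpeed's schema).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                    # partition optimizer state,
        "overlap_comm": True,          # gradients, and parameters
        "contiguous_gradients": True,  # across data-parallel ranks
    },
}
```

Stage 3 partitions optimizer states, gradients, and model parameters across data-parallel workers, which is what makes fitting 70B-parameter models feasible for parallel training.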
A Benchmark for Evaluating Knowledge Conflicts in Large Language Models
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge, a comprehensive assessment of knowledge conflicts in LLMs is still missing.