Bayesian Evaluation of Large Language Model Behavior

arXiv.org Machine Learning

It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts provided to the LLM, where the output for each prompt may be assessed in a binary fashion (e.g., harmful/non-harmful or does not leak/leaks sensitive information), and the aggregation of binary scores is used to evaluate the LLM. However, existing approaches to evaluation often neglect statistical uncertainty quantification. With an applied statistics audience in mind, we provide background on LLM text generation and evaluation, and then describe a Bayesian approach for quantifying uncertainty in binary evaluation metrics. We focus in particular on uncertainty that is induced by the probabilistic text generation strategies typically deployed in LLM-based systems. We present two case studies applying this approach: 1) evaluating refusal rates on a benchmark of adversarial inputs designed to elicit harmful responses, and 2) evaluating pairwise preferences of one LLM over another on a benchmark of open-ended interactive dialogue examples. We demonstrate how the Bayesian approach can provide useful uncertainty quantification about the behavior of LLM-based systems.
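As a rough illustration of the kind of uncertainty quantification the abstract describes, the sketch below computes a posterior for a single binary metric (for example, a refusal rate) under a conjugate Beta-Binomial model. This is an assumption-laden simplification and not the paper's method: it treats every prompt outcome as an independent draw from one shared rate, uses a uniform Beta(1, 1) prior, and ignores the per-prompt variation from stochastic generation that the abstract highlights. The function name `beta_posterior_summary`, its parameters, and the counts in the usage note are all hypothetical.

```python
import random


def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0,
                           draws=20000, level=0.95, seed=0):
    """Summarize the posterior of a binary rate under a Beta prior.

    Assumes outcomes are independent with one shared rate (a simplification).
    The credible interval is estimated by Monte Carlo sampling, not in closed form.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    # Conjugate update: Beta(a, b) prior plus binomial counts gives a Beta posterior.
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)

    # Draw posterior samples with the standard library and sort them
    # so that empirical quantiles can be read off by index.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(post_a, post_b) for _ in range(draws))

    lower_idx = int((1 - level) / 2 * draws)
    upper_idx = int((1 + level) / 2 * draws) - 1

    return {
        "mean": post_a / (post_a + post_b),  # exact posterior mean
        "lower": samples[lower_idx],         # approximate lower credible bound
        "upper": samples[upper_idx],         # approximate upper credible bound
    }
```

For instance, `beta_posterior_summary(42, 50)` would summarize a hypothetical benchmark in which 42 of 50 adversarial prompts were refused, returning the posterior mean alongside an approximate 95% credible interval rather than a bare point estimate of 0.84.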


Niger fallout under Biden leaves US troops 'blind' in battle with terror groups

FOX News

Biden administration's diplomatic dispute led to U.S. expulsion from Niger, eliminating drone surveillance capabilities needed to combat Sahel region terrorism.


Zelenskyy says Ukraine working on new prisoner exchange with Russia

Al Jazeera

Ukraine is working to resume prisoner exchanges with Russia that could bring 1,200 Ukrainians home, President Volodymyr Zelenskyy says, a day after his national security chief announced progress in negotiations. "We are counting on the resumption of POW exchanges," Zelenskyy wrote on X on Sunday.


Nature is not a blocker to housing growth, MPs find

BBC News

Nature is not a blocker to housing growth and the government risks missing both its housing and nature targets if it views it as one, a cross-party group of MPs has warned in a new report. The Planning and Infrastructure Bill overrides existing habitat protections, which the government has suggested are a barrier to its target to build 1.5 million houses by the end of this parliament. But in a report published on Sunday, the Environmental Audit Committee (EAC) found the measures outlined in the bill are not enough to allow the government to meet its goals. Using nature as a scapegoat means that the government will be less effective at tackling some of the genuine challenges facing the planning system, the report said. A Ministry of Housing spokesperson said it was fixing a failing system with landmark reforms, which would deliver a win-win for the economy and the environment.


What's Grokipedia, Musk's AI-powered rival to Wikipedia?

Al Jazeera

Last month, tech billionaire Elon Musk launched Grokipedia, an AI-powered platform, to rival online encyclopedia Wikipedia. "Grokipedia will exceed Wikipedia by several orders of magnitude in breadth, depth and accuracy," Musk posted on X the day after his site went live on October 27.


How Google's DeepMind tool is 'more quickly' forecasting hurricane behavior

The Guardian

'Less expensive and time consuming' model helps with fast and accurate predictions, possibly saving lives and property. When then-Tropical Storm Melissa was churning south of Haiti, Philippe Papin, a National Hurricane Center (NHC) meteorologist, had confidence it was about to grow into a monster hurricane. As the lead forecaster on duty, he predicted that in just 24 hours the storm would become a category 4 hurricane and begin a turn towards the coast of Jamaica. No NHC forecaster had ever issued such a bold forecast for rapid strengthening. But Papin had an ace up his sleeve: artificial intelligence in the form of Google's new DeepMind hurricane model - released for the first time in June. And, as predicted, Melissa did become a storm of astonishing strength that tore through Jamaica.


How AI is making IVF more predictable

FOX News

Gaia Family uses AI technology to provide fixed-cost IVF treatment plans, removing financial uncertainty for couples through predictable pricing and built-in protections.


Skies at stake: Inside the U.S.–China race for air dominance

FOX News

Military experts warn that Chinese missile strikes on U.S. air bases could cripple American airpower in the Pacific, as both nations pursue different strategies for air superiority.


Indictment of ex-Newsom aide hints at feds' probe into state's earlier investigation of video game giant

Los Angeles Times

Dana Williamson, Gov. Gavin Newsom's former chief of staff, leaves the Robert T. Matsui United States Courthouse in Sacramento after being arrested in a federal public corruption probe involving multiple counts of bank and wire fraud on Wednesday. Newsom's former chief of staff and two political operatives face federal corruption charges for fraud, including misusing campaign funds for luxury purchases.


Zelensky vows energy sector overhaul after $100m corruption scandal

BBC News

Ukrainian President Volodymyr Zelensky has vowed to overhaul state-owned energy companies, after a major corruption scandal engulfed the country's energy sector. Around $100 million (£76m) has been embezzled, anti-graft investigators said, causing outrage in a country where Russian attacks have resulted in crippling power outages. Alongside a full audit of their financial activities, the management of these companies is to be renewed, Zelensky wrote in a post on X on Saturday. Energoatom, the state nuclear company at the heart of the scandal, will have a new supervisory board within a week, he added. Several of those implicated in the scandal have close links to the Ukrainian president.