Government
Bayesian Evaluation of Large Language Model Behavior
Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth
It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts provided to the LLM, where the output for each prompt may be assessed in a binary fashion (e.g., harmful/non-harmful or does not leak/leaks sensitive information), and the aggregation of binary scores is used to evaluate the LLM. However, existing approaches to evaluation often neglect statistical uncertainty quantification. With an applied statistics audience in mind, we provide background on LLM text generation and evaluation, and then describe a Bayesian approach for quantifying uncertainty in binary evaluation metrics. We focus in particular on uncertainty that is induced by the probabilistic text generation strategies typically deployed in LLM-based systems. We present two case studies applying this approach: 1) evaluating refusal rates on a benchmark of adversarial inputs designed to elicit harmful responses, and 2) evaluating pairwise preferences of one LLM over another on a benchmark of open-ended interactive dialogue examples. We demonstrate how the Bayesian approach can provide useful uncertainty quantification about the behavior of LLM-based systems.
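As a rough illustration of what Bayesian uncertainty quantification for a binary metric can look like, the sketch below fits a conjugate Beta-Binomial model to a refusal count. This is an assumption made for illustration, not the authors' model: the abstract does not specify the prior or the likelihood, and a pooled model like this one ignores the per-prompt variation from probabilistic text generation that the paper emphasizes. The function names, the uniform Beta(1, 1) prior, and the counts (42 refusals in 50 prompts) are all hypothetical.

```python
import random


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return the Beta posterior parameters for a binomial success rate.

    Assumes a Beta(prior_a, prior_b) prior; the default is uniform.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def posterior_summary(a, b, level=0.95, draws=20000, seed=0):
    """Summarize a Beta(a, b) posterior: exact mean, Monte Carlo interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[min(draws - 1, int((1.0 - tail) * draws))]
    return {"mean": a / (a + b), "lower": lower, "upper": upper}


# Hypothetical benchmark result: 42 refusals out of 50 adversarial prompts.
a, b = beta_posterior(successes=42, trials=50)
summary = posterior_summary(a, b)
print(f"posterior: Beta({a:g}, {b:g})")
print(f"mean refusal rate: {summary['mean']:.3f}")
print(f"95% credible interval: ({summary['lower']:.3f}, {summary['upper']:.3f})")
```

Reporting the interval alongside the point estimate shows how much a benchmark of this size can actually say about the underlying refusal rate. Capturing generation-induced uncertainty would need repeated samples per prompt and a richer (for example hierarchical) model.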
Zelenskyy says Ukraine working on new prisoner exchange with Russia
Ukraine is working to resume prisoner exchanges with Russia that could bring 1,200 Ukrainians home, President Volodymyr Zelenskyy says, a day after his national security chief announced progress in negotiations. "We are counting on the resumption of POW exchanges," Zelenskyy wrote on X on Sunday.
Nature is not a blocker to housing growth, MPs find
Nature is not a blocker to housing growth and the government risks missing both its housing and nature targets if it views it as one, a cross-party group of MPs has warned in a new report. The Planning and Infrastructure Bill overrides existing habitat protections, which the government has suggested is a barrier to its target to build 1.5 million houses by the end of this parliament. But in a report published on Sunday, the Environmental Audit Committee (EAC) found the measures outlined in the bill are not enough to allow the government to meet its goals. Using nature as a scapegoat means that the government will be less effective at tackling some of the genuine challenges facing the planning system, the report said. A Ministry of Housing spokesperson said it was fixing a failing system with landmark reforms, which would deliver a win-win for the economy and the environment.
What's Grokipedia, Musk's AI-powered rival to Wikipedia?
Last month, tech billionaire Elon Musk launched Grokipedia, an AI-powered platform, to rival online encyclopedia Wikipedia. "Grokipedia will exceed Wikipedia by several orders of magnitude in breadth, depth and accuracy," Musk posted on X the day after his site went live on October 27.
How Google's DeepMind tool is 'more quickly' forecasting hurricane behavior
'Less expensive and time consuming' model helps with fast and accurate predictions, possibly saving lives and property. When then-Tropical Storm Melissa was churning south of Haiti, Philippe Papin, a National Hurricane Center (NHC) meteorologist, had confidence it was about to grow into a monster hurricane. As the lead forecaster on duty, he predicted that in just 24 hours the storm would become a category 4 hurricane and begin a turn towards the coast of Jamaica. No NHC forecaster had ever issued such a bold forecast for rapid strengthening. But Papin had an ace up his sleeve: artificial intelligence in the form of Google's new DeepMind hurricane model, released for the first time in June. And, as predicted, Melissa did become a storm of astonishing strength that tore through Jamaica.
Indictment of ex-Newsom aide hints at feds' probe into state's earlier investigation of video game giant
Dana Williamson, Gov. Gavin Newsom's former chief of staff, leaves the Robert T. Matsui United States Courthouse in Sacramento after being arrested in a federal public corruption probe involving multiple counts of bank and wire fraud on Wednesday. Newsom's former chief of staff and two political operatives face federal corruption charges for fraud, including misusing campaign funds for luxury purchases.
Zelensky vows energy sector overhaul after $100m corruption scandal
Ukrainian President Volodymyr Zelensky has vowed to overhaul state-owned energy companies, after a major corruption scandal engulfed the country's energy sector. Around $100 million (£76m) has been embezzled, anti-graft investigators said, causing outrage in a country where Russian attacks have resulted in crippling power outages. Alongside a full audit of their financial activities, the management of these companies is to be renewed, Zelensky wrote in a post on X on Saturday. Energoatom, the state nuclear company at the heart of the scandal, will have a new supervisory board within a week, he added. Several of those implicated in the scandal have close links to the Ukrainian president.