Collaborating Authors

 berlin


Former member of German militant group jailed for armed robberies after 30 years on the run

BBC News

A former member of the German militant group Red Army Faction (RAF) has been jailed for 13 years for carrying out a string of armed robberies between 1999 and 2016. Daniela Klette, 67, was finally caught in a flat in Berlin in 2024 after more than 30 years on the run. She went on trial last year. Her defence had called for her acquittal but the court in Verden in Lower Saxony found her guilty on Wednesday of aggravated robbery, violating weapons laws and other offences over a 17-year period. Klette's RAF group, also known as the Baader-Meinhof gang, was eventually disbanded after a campaign of murder, kidnapping and bombing from the early 1970s to the early 1990s.


Champion ethical hacker warns AI tools like Mythos will make competing harder

BBC News

An ethical hacker who just won major prizes at a prestigious international competition says her days of competing could be numbered due to the rise of AI tools like Claude Mythos. Valentina Palmiotti - better known as Chompie - was the most successful individual at the annual Pwn2Own hacking competition in Berlin. She told BBC News that, for now, AI tools were helping her to win bug bounties - money given to hackers who spot vulnerabilities in online systems before they can be exploited by cyber-criminals. But she said systems like Mythos were so powerful that even champion hackers like her would soon struggle to compete with them. AI has shaken the cyber-security world, with concerns focussing on Mythos in particular.


5fc47800ee5b30b8777fdd30abcaaf3b-Supplemental-Conference.pdf

Neural Information Processing Systems

Having defined and validated the pairwise feedback simulator and evaluations in AlpacaFarm, we now turn our attention to studying methods that learn from pairwise feedback on AlpacaFarm. Unfortunately, the lack of existing benchmarks for learning from pairwise feedback for instruction following means that there has not been any open study of these methods in the instruction-following setting. In the remainder of this section, we will introduce our reference methods, which fall into two categories based on whether they fit a surrogate reward model as part of the learning process.

FeedME is a method proposed by OpenAI [45] that incorporates human feedback with supervised fine-tuning on model generations that are rated 7/7 by human labelers. We adapt this approach to the pairwise feedback setting and call this baseline binary FeedME. This approach fine-tunes the SFT model on the chosen response in each preference pair with supervised learning.

Motivated by controllable generation through conditioning [27, 34, 29, 21], we propose binary reward conditioning, a baseline method that fine-tunes the SFT model with the feedback data D_pairwise by conditioning instances with either a positive or negative control token. Specifically, for each instance (x, y_0, y_1, z) ∈ D_pairwise, the string concatenation of instruction x and response y_z, denoted as [x, y_z], is prepended with the positive token and used in supervised fine-tuning (similarly, [x, y_{1-z}] is prepended with the negative token). This process creates a modified demonstration dataset that is double the size of D_pairwise. At test time, we draw samples from the fine-tuned model conditioned on the positive token.

A.2 Methods that optimize a surrogate reward function

We now describe methods that incorporate feedback by first building a surrogate reward model with pairwise feedback data. To start, we describe the step of training the surrogate reward model. While this can be a powerful approach, we will see that it can also lead to over-optimization [19] where models learn to exploit the reward model rather than achieve high true reward. We now describe 4 methods that leverage the surrogate reward model.
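The data construction behind binary FeedME and binary reward conditioning, as described in the excerpt above, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the control-token strings, the function names, and the space-joined concatenation are all assumptions, and z is assumed to be the index (0 or 1) of the preferred response.

```python
from typing import List, Tuple

# Hypothetical control tokens; the excerpt does not say what strings are used.
POS_TOKEN = "<|positive|>"
NEG_TOKEN = "<|negative|>"

# One pairwise instance (x, y0, y1, z): instruction, two responses, and
# z in {0, 1}, assumed here to be the index of the preferred response.
PairwiseExample = Tuple[str, str, str, int]


def binary_feedme_data(pairs: List[PairwiseExample]) -> List[Tuple[str, str]]:
    """Keep only the chosen response of each pair as an SFT target."""
    return [(x, (y0, y1)[z]) for x, y0, y1, z in pairs]


def reward_conditioned_data(pairs: List[PairwiseExample]) -> List[str]:
    """Build two token-conditioned training strings per preference pair."""
    examples = []
    for x, y0, y1, z in pairs:
        responses = (y0, y1)
        # [x, y_z] is prepended with the positive token.
        examples.append(f"{POS_TOKEN} {x} {responses[z]}")
        # [x, y_{1-z}] is prepended with the negative token.
        examples.append(f"{NEG_TOKEN} {x} {responses[1 - z]}")
    return examples


if __name__ == "__main__":
    demo = [("Name a prime.", "Four.", "Seven.", 1)]
    print(binary_feedme_data(demo))
    print(reward_conditioned_data(demo))
```

The second function returns twice as many examples as there are pairs, matching the statement that the modified demonstration dataset is double the size of D_pairwise. At test time, a prompt would be prefixed with the positive token before sampling.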





Is Good Taste a Trap?

The New Yorker

The judgments we use to elevate our lives can also hem them in. In Belle Burden's memoir, "Strangers," she describes the end of her marriage. It happened suddenly: until learning of her husband's infidelity, through a voice mail from a stranger, she had no idea anything was wrong. Burden and her husband shared an apartment in Tribeca and a house on Martha's Vineyard.


Russia-Ukraine war: List of key events, day 1,391

Al Jazeera

A Russian drone attack killed a 62-year-old Ukrainian man as he was riding a bicycle in the Velyka Pysarivka community of Ukraine's Sumy region, Governor Oleh Hryhorov said in a post on the Telegram messaging app. Russian forces launched 850 attacks on Ukraine's Zaporizhia region in a single day, injuring 14 people and damaging houses, cars and infrastructure, Governor Ivan Fedorov said on Telegram.


Russia-Ukraine war: List of key events, day 1,389

Al Jazeera

Two people were killed in a Ukrainian drone strike on the Russian city of Saratov, regional Governor Roman Busargin said in a statement on Telegram. An unspecified number of people were also injured in the attack.


LingGym: How Far Are LLMs from Thinking Like Field Linguists?

arXiv.org Artificial Intelligence

This paper introduces LingGym, a new benchmark that evaluates LLMs' capacity for meta-linguistic reasoning using Interlinear Glossed Text (IGT) and grammatical descriptions extracted from 18 typologically diverse reference grammars. Unlike previous work that focuses on specific downstream tasks, we assess whether LLMs can generalize linguistic inference across low-resource languages and structures not seen during training. We present a controlled evaluation task: Word-Gloss Inference, in which the model must infer a missing word and gloss from context using varying levels of linguistic information (e.g., glosses, grammatical explanations, translations). Our results show that incorporating structured linguistic cues leads to consistent improvements in reasoning performance across all models. This work highlights both the promise and current limitations of using LLMs for typologically informed linguistic analysis and low-resource language documentation.