agreement
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance
Programming assistants powered by large language models have improved dramatically, yet existing benchmarks still evaluate them in narrow code-generation settings. Recent efforts such as InfiBench and StackEval rely on Stack Overflow questions and remain limited to single-turn interactions, manually curated data, and isolated snippets rather than full project environments. We introduce CodeAssistBench (CAB), the first benchmark for evaluating multi-turn, project-grounded programming assistance at scale. CAB automatically constructs datasets from GitHub issues tagged as questions, using an LLM-driven pipeline that filters noise, extracts runnable contexts, builds executable containers, and verifies environment correctness. This enables continuous, automated expansion across diverse repositories without manual intervention. Using CAB, we create a testbed of 3,286 real-world issues across 214 repositories, spanning seven languages. Evaluating state-of-the-art models reveals a substantial gap: while models achieve 70-83% accuracy on Stack Overflow-style questions, they solve only 7.22-16.49% of CAB issues from post-training-cutoff repositories. These results highlight a fundamental challenge: current LLMs struggle to provide assistance in realistic, project-specific contexts despite strong performance on traditional Q&A benchmarks. CAB provides a scalable, reproducible framework for advancing research in multi-turn, codebase-grounded programming agents.
The 60-Day Test: What Iran's Agreement with the United States Really Means
Israeli attacks on southern Lebanon kill three despite US-Iran deal
Israeli air attacks on southern Lebanon have killed at least three people, Lebanese state media has reported, a day after the United States and Iran signed an interim agreement that called for an end to their war on all fronts, including Lebanon. Lebanon's National News Agency (NNA) reported on Thursday that an Israeli drone attack hit a car near the town of Kfar Tebnit, killing two people. NNA also reported that a strike carried out by an Israeli drone in the town of Beit Yahoun in the Nabatieh governorate wounded two people.
Trump's Iran Agreement Draws More Alarm Than Relief From GOP
The D.C. Brief
Israeli air strikes on Lebanon continue despite US-Iran deal
Israeli air strikes have continued to target towns in southern Lebanon despite an agreement between the United States and Iran set to be formally signed on Friday to end the war on all fronts. Israeli drones carried out three attacks in Tyre that resulted in injuries while a drone also targeted the Bint Jbeil district in Nabatieh, Lebanon's state-run National News Agency said on Wednesday. Earlier on Wednesday, Al Jazeera correspondents on the ground reported that Israeli forces carried out an air strike on the outskirts of Kfar Tebnit, also in the Nabatieh district.
Israel launches fresh strikes on Lebanon despite Trump criticism
Israeli forces have carried out new strikes in southern Lebanon, state media say, despite renewed criticism from US President Donald Trump of Israel's actions in the country. Israeli drone strikes injured several people in Mansouri and Aaziyyeh on Wednesday, while jets attacked Nabatieh al-Fawqa and Kfar Tebnit, Lebanon's National News Agency reported. Israel's military has not commented, but it did say five soldiers were injured in a drone attack in Lebanon by the Iran-backed armed group Hezbollah. Mediator Pakistan has said the deal between the US and Iran to end the war includes Lebanon. On Tuesday, Trump said Israel's prime minister needed to be more responsible with respect to Lebanon.
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training and evaluation of AI agents, off-the-shelf LLMs often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue. We define three automatic metrics--prompt-to-line consistency, line-to-line consistency, and Q&A consistency--that capture different types of persona drift and validate each against human annotations. Using these metrics as reward signals, we apply multi-turn reinforcement learning to fine-tune LLMs for three user roles: a patient, a student, and a social chat partner. Our method reduces inconsistency by over 55%, resulting in more coherent, faithful, and trustworthy simulated users.