US strikes Iran targets for second time in three days
The US military has carried out new strikes on Iran, targeting a military site in Bandar Abbas, a strategic port city. US Central Command (Centcom) said its forces also shot down four Iranian one-way attack drones that posed a threat around the Strait of Hormuz. The site in Bandar Abbas was struck as it was about to launch a fifth drone, Centcom said. Iranian media reported that explosions were heard to the east of the city. The strikes come amid a fragile ceasefire between the US and Iran, and protracted negotiations to end the three-month war that has choked traffic in the Strait of Hormuz and driven up global energy prices.
The Baltics urgently need a de-escalation mechanism; Belarus can help
Recent weeks have seen a significant escalation of military tensions in and around the Baltics. Lithuania, Latvia and Estonia, which are all NATO members, now experience regular incursions into their airspace by Ukrainian drones. According to both Kyiv and the Baltic capitals, those drones, en route to hit targets in western Russia, get diverted by Russian electronic jamming and end up entering these countries' territories. In early May, several stray unmanned aircraft crashed in Latvia, one of them damaging an oil storage facility. Those developments triggered a political crisis in Latvia and led to the collapse of its government.
More than 1.5m foreign pilgrims begin Hajj despite Iran war fears
Muslims have begun the annual Hajj pilgrimage in Saudi Arabia against the backdrop of a region deeply shaken by the Iran war. Saudi authorities said last week that some 1.51 million pilgrims had arrived from outside the kingdom. That is 11,000 more than last year, despite concerns in the region about a resumption of the three-month-old conflict between the US, Israel and Iran. Before a fragile ceasefire took effect last month, Iran launched waves of missile and drone attacks on Saudi Arabia and its Gulf neighbours in retaliation for US and Israeli air strikes. Two civilians living in the central city of al-Kharj were killed in an Iranian attack on 8 March, along with a US service member stationed at the nearby Prince Sultan Air Base.
Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
Wang, Tong, Xu, Yiqing, Yang, Leo Yang
Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply. Existing discriminative representations often use anonymous embedding directions, while concept-bottleneck and LLM-assisted methods attach natural-language names to features without ensuring that those definitions are reproducible or distinct from the target label. We propose an operational criterion for interpretable discriminative text representations: each coordinate should satisfy conceptual clarity, measured by chance-adjusted agreement between independent annotators applying the feature definition, and label disentanglement, meaning the feature should not merely paraphrase the prediction target. We instantiate this criterion in LLM-assisted Feature Discovery (LFD), an iterative method that proposes lexical and semantic features from contrastive outcome-opposed text pairs, screens candidates using cross-LLM Cohen's $\kappa$, and selects features by residual held-out predictive gain. A stylized analysis connects the $\kappa$ screen to a per-feature annotation-noise bound, formalizing agreement as a reliability check. Across ten text-classification tasks spanning seven corpora, LFD matches the predictive performance of a strong text bottleneck baseline while producing substantially clearer and less label-entangled features. Human audits with 232 raters show that LFD features achieve higher human--human and human--LLM agreement than baseline concepts, and raters consistently judge them as less label-leaking. These results suggest that agreement-tested, label-disentangled coordinates provide a practical auditability standard for interpretable text classification.
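As a rough illustration of the agreement screen described in this abstract, the sketch below computes Cohen's kappa for two annotators and keeps only the features whose agreement clears a cutoff. The function names, the example feature names, and the 0.6 cutoff are assumptions made for illustration; the abstract does not state the threshold or the exact screening procedure used by LFD.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-adjusted agreement between two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently at their own base rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; returning 1.0 is a
        # convention assumed here, since kappa is undefined in this case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def screen_features(annotations, min_kappa=0.6):
    """Keep feature names whose two annotation lists agree at or above min_kappa.

    `annotations` maps a feature name to a pair of label lists. The 0.6 default
    is a hypothetical cutoff, not a value taken from the paper.
    """
    return [
        name
        for name, (labels_a, labels_b) in annotations.items()
        if cohens_kappa(labels_a, labels_b) >= min_kappa
    ]
```

With two annotators who agree on 6 of 8 balanced binary labels, observed agreement is 0.75, expected agreement is 0.5, and kappa is 0.5, so that feature would be dropped under the assumed 0.6 cutoff.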
Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?
Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in high-stakes applications. In this role, LLMs are used to generate judgments about the quality, appropriateness, or even safety of model outputs. This approach is motivated by practical constraints. Expert human ratings are costly and difficult to scale, whereas LLM ratings can be produced quickly at low cost. However, current approaches to deploying LLM evaluators are ad hoc, typically limited to reporting agreement metrics between human and LLM judges as a justification for substitution of human ratings, and lack a formal basis for study design. This paper (1) shifts the role of the LLM judge from substitutive to auxiliary, and (2) formulates the LLM-as-a-judge paradigm as one of augmenting human evaluation through a two-stage sampling design, where LLM evaluations are measured for all observations at the first stage and human ratings are partially observed for a subsample at the second stage. We propose to use a doubly robust estimator from the missing data literature, which takes advantage of the robustness property against the prediction model, since the missingness model is known by design. Using the asymptotic variance of this estimator, we propose how sample sizes of human and LLM ratings can be determined to achieve a targeted level of power. We also show that a study can be efficiently designed by allocating more human ratings for types of evaluations where the predictability of LLM ratings is not high. To the best of our knowledge, there is very little guidance on how much human oversight should be retained when validating benchmarks.
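A minimal sketch of a doubly robust (augmented inverse-probability-weighted) mean under the two-stage design described above may make the idea concrete. It assumes that the LLM score itself serves as the prediction of the human rating and that each item's probability of being sent for human review is known by design; the function name and both assumptions are illustrative, and the paper's estimator may use a fitted prediction model instead.

```python
def doubly_robust_mean(llm_scores, human_scores, sampling_probs):
    """Estimate the mean human rating from LLM scores on all items plus a human subsample.

    human_scores[i] is None when item i was not selected for human review.
    sampling_probs[i] is the known probability that item i was selected.
    """
    if not (len(llm_scores) == len(human_scores) == len(sampling_probs)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for prediction, human, prob in zip(llm_scores, human_scores, sampling_probs):
        # Stage 1: every item contributes its LLM-based prediction.
        total += prediction
        # Stage 2: human-reviewed items add an inverse-probability-weighted
        # correction for the prediction error.
        if human is not None:
            total += (human - prediction) / prob
    return total / len(llm_scores)
```

Because the sampling probabilities are fixed by the study design, the correction term offsets systematic error in the LLM scores on average, which is the robustness property the abstract refers to.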
Risk-Controlled Post-Processing of Decision Policies
Joshi, Sunay, Wang, Tao, Hassani, Hamed, Dobriban, Edgar
Predictive models are often deployed through existing decision policies that stakeholders are reluctant to change unless a risk constraint requires intervention. We study risk-controlled post-processing: given a deterministic baseline policy, choose a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss. At the population level, we show that the optimal policy has a threshold structure: it follows the baseline except on contexts where switching to the oracle fallback policy yields a large reduction in conditional violation risk. At the finite-sample level, given a fitted fallback policy and score, we develop a post-processing algorithm that uses calibration data to select a threshold. Leveraging tools from algorithmic stability and stochastic processes, we show that under regularity conditions, in the i.i.d. setting, the expected excess risk of the post-processed policy is $O(\log n/n)$. In the special case when an exact-safe fallback policy is available, the algorithm achieves precise expected risk control under exchangeability. In this setting, we also give high-probability near-optimality guarantees on the post-processed policy. Experiments on a COVID-19 radiograph diagnosis task, an LLM routing problem, and a synthetic multiclass decision task show that targeted post-processing can meet or nearly meet risk budgets while preserving substantially more agreement with the baseline than score-blind random mixing.
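The threshold structure can be sketched as follows, under the simplifying assumption of an exact-safe fallback that incurs zero loss, so that only contexts left with the baseline contribute risk. The function names and the plain empirical-risk selection rule are assumptions for illustration; the paper's calibration procedure and its guarantees may involve corrections not shown here.

```python
def select_threshold(scores, baseline_losses, alpha):
    """Pick the largest threshold whose empirical calibration risk stays within alpha.

    The post-processed policy follows the baseline when score <= threshold and
    switches to the fallback otherwise. Losses are assumed non-negative, and the
    fallback is assumed to incur zero loss (an illustrative assumption).
    """
    n = len(scores)
    pairs = sorted(zip(scores, baseline_losses))
    best = float("-inf")  # always switching: zero risk under the assumption above
    cumulative = 0.0
    for i, (score, loss) in enumerate(pairs):
        cumulative += loss
        # Evaluate only at the last item of a group of tied scores.
        if i + 1 < n and pairs[i + 1][0] == score:
            continue
        if cumulative / n > alpha:
            break  # risk is non-decreasing in the threshold, so stop here
        best = score
    return best


def post_processed_action(score, baseline_action, fallback_action, threshold):
    """Follow the baseline unless the score exceeds the calibrated threshold."""
    return baseline_action if score <= threshold else fallback_action
```

Choosing the largest admissible threshold keeps as many contexts as possible on the baseline action, which mirrors the agreement-maximizing objective in the abstract.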
Heterogeneous Ordinal Structure Learning with Bayesian Nonparametric Complexity Discovery
Public attitudes toward artificial intelligence are heterogeneous, ordinally measured, and poorly captured by any single dependency graph. Existing ordinal structure learners assume a shared directed acyclic graph (DAG) across all respondents; recent heterogeneous ordinal graphical-model approaches focus on subgroup discovery rather than confirmatory cluster-specific DAG estimation; and latent profile analyses discard dependency structure entirely. We introduce a heterogeneous ordinal structure-learning framework combining monotone Gaussian score embedding, Bayesian nonparametric (BNP) complexity discovery via a truncated stick-breaking prior, and confirmatory fixed-K estimation with cluster-specific sparse DAG learning. The key methodological insight is a discovery-to-confirmation workflow: the nonparametric stage calibrates plausible archetype complexity, while inner-validated confirmatory refitting yields stable, interpretable structural estimates. On the 2024 Pew American Trends Panel AI attitudes survey, Wave 152 (W152; N = 4,788, 8 ordinal items), the confirmatory K*=5 model reduces holdout transformed-score mean squared error (MSE) by 25.8% over a single-graph baseline and by 4.6% over mixture-only clustering. A controlled tiered semi-synthetic benchmark calibrated to W152 structure validates recovery across difficulty regimes and transparently reveals failure modes under stress conditions.
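For readers unfamiliar with the truncated stick-breaking prior mentioned above, the following sketch shows the standard construction of mixture weights: each Beta(1, concentration) draw breaks off a fraction of the remaining stick, and the final component takes whatever is left. The function names, the truncation level, and the concentration value are hypothetical; the abstract does not give the settings used in the paper.

```python
import random


def stick_breaking_weights(fractions):
    """Turn K-1 break fractions in [0, 1] into K mixture weights that sum to 1."""
    weights = []
    remaining = 1.0
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each fraction must lie in [0, 1]")
        weights.append(remaining * fraction)  # break off a piece of the stick
        remaining *= 1.0 - fraction           # what is left for later components
    weights.append(remaining)                 # truncation: last component takes the rest
    return weights


def sample_truncated_weights(num_components, concentration, rng=None):
    """Draw weights from a stick-breaking prior truncated at num_components."""
    rng = rng or random.Random()
    fractions = [
        rng.betavariate(1.0, concentration) for _ in range(num_components - 1)
    ]
    return stick_breaking_weights(fractions)
```

A smaller concentration value tends to put most of the mass on the first few components, which is how such a prior can favor a small number of occupied clusters without fixing K in advance.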
SpaceX backs Anthropic with data centre deal amidst Musk's OpenAI lawsuit
Anthropic has reached a deal to tap the computing resources of Elon Musk's SpaceX, marking a detente with its one-time critic and a boost for both companies in the high-stakes artificial intelligence race. Under the agreement announced on Wednesday, Anthropic will use the full computing power of SpaceX's Colossus 1 facility in Memphis, Tennessee, which houses more than 220,000 Nvidia processors and will give the Claude chatbot maker 300 megawatts of new capacity within a month. That's enough electricity to power more than 300,000 homes - as the Dario Amodei-led company seeks to boost the capacity of its Claude Pro and Claude Max AI assistants for subscribers. The tool allows AI systems to review work between sessions, spot patterns, and update files that store user preferences and other context. Available as a research preview, "dreaming" comes with software for managing agents, or AI programmes that perform tasks with little human involvement.
Microsoft, Google, xAI give US access to AI models for security testing
Tech giants Microsoft, Google and xAI say they will allow the United States federal government access to their new artificial intelligence models for national security testing. The Center for AI Standards and Innovation (CAISI) at the Department of Commerce announced the agreement on Tuesday amid increasing concerns about the capabilities that Anthropic's newly unveiled Mythos model could give hackers. The agreement fulfils a pledge the administration of US President Donald Trump made in July to partner with technology companies to vet their AI models for "national security risks". Microsoft will work with US government scientists to test AI systems "in ways that probe unexpected behaviors", the company said in a statement. Together they will develop shared data sets and workflows for testing the company's models, the company said.
Deadly Israeli strikes on southern Lebanon despite ceasefire
At least nine people, including two children, were killed in Israeli strikes in southern Lebanon on Thursday, the health ministry said, as violence continues despite a ceasefire now in its second week. The strikes - which Israel said were targeting Hezbollah infrastructure - also wounded 23 people, among them eight children and seven women, the ministry said. Separately, Hezbollah said it had carried out attacks on Israeli forces in the south, including a drone strike targeting soldiers in the Bint Jbeil district. The violence comes as Israel presses ahead with military operations in Lebanon despite the ceasefire announced on 16 April, after direct talks between Lebanese and Israeli ambassadors in Washington. Lebanese President Joseph Aoun criticised what he described as continuing Israeli violations of the truce, saying strikes and demolitions of homes and places of worship were ongoing despite the ceasefire.