BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection

arXiv.org Artificial Intelligence

One of the main challenges in mechanistic interpretability is circuit discovery: determining which parts of a model perform a given task. We build on the Mechanistic Interpretability Benchmark (MIB) and propose three key improvements to circuit discovery. First, we use bootstrapping to identify edges with consistent attribution scores. Second, we introduce a simple ratio-based selection strategy to prioritize strong positive-scoring edges, balancing performance and faithfulness. Third, we replace the standard greedy selection with an integer linear programming formulation. Our methods yield more faithful circuits and outperform prior approaches across multiple MIB tasks and models. Our code is available at: https://github.com/technion-cs-nlp/MIB-Shared-Task.
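The abstract does not say how the bootstrap is carried out. A minimal sketch, under stated assumptions: each edge has a list of per-example attribution scores, examples are resampled with replacement, and an edge counts as "consistent" if its mean score keeps the same sign in at least 95% of resamples. The function name `consistent_edges`, the input format and the 95% threshold are all hypothetical; this is an illustration of bootstrapped sign consistency, not the authors' implementation (their repository above holds that).

```python
import random
from statistics import mean

# Hypothetical sketch: NOT the MIB Shared Task authors' code.
# Assumed input: edge name -> per-example attribution scores (all lists equal length).
def consistent_edges(scores, n_boot=1000, threshold=0.95, seed=0):
    """Keep edges whose mean attribution has a stable sign across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(next(iter(scores.values())))
    pos = {edge: 0 for edge in scores}
    neg = {edge: 0 for edge in scores}
    for _ in range(n_boot):
        # One resample of example indices, drawn with replacement and shared by all edges.
        idx = [rng.randrange(n) for _ in range(n)]
        for edge, vals in scores.items():
            m = mean(vals[i] for i in idx)
            if m > 0:
                pos[edge] += 1
            elif m < 0:
                neg[edge] += 1
    kept = {}
    for edge in scores:
        # Assumed consistency criterion: same sign in >= threshold of resamples.
        if pos[edge] / n_boot >= threshold:
            kept[edge] = "positive"
        elif neg[edge] / n_boot >= threshold:
            kept[edge] = "negative"
    return kept


scores = {
    "a->b": [0.9, 1.1, 0.8, 1.0, 1.2],       # always positive
    "c->d": [2.0, -1.9, 1.8, -2.1, 0.1],     # sign depends on which examples are drawn
    "e->f": [-0.5, -0.7, -0.6, -0.4, -0.5],  # always negative
}
print(consistent_edges(scores))  # → {'a->b': 'positive', 'e->f': 'negative'}
```

Edge `c->d` has large scores of both signs, so its resampled mean flips sign often and it is dropped, even though a single pass over the data would give it a nonzero average.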


Hierarchical Graph Networks for Accurate Weather Forecasting via Lightweight Training

arXiv.org Artificial Intelligence

Climate events arise from intricate, multivariate dynamics governed by global-scale drivers, profoundly impacting food, energy, and infrastructure. Yet, accurate weather prediction remains elusive due to physical processes unfolding across diverse spatio-temporal scales, which fixed-resolution methods cannot capture. Hierarchical Graph Neural Networks (HGNNs) offer a multiscale representation, but nonlinear downward mappings often erase global trends, weakening the integration of physics into forecasts. We introduce HiFlowCast and its ensemble variant HiAntFlow, HGNNs that embed physics within a multiscale prediction framework. Two innovations underpin their design: a Latent-Memory-Retention mechanism that preserves global trends during downward traversal, and a Latent-to-Physics branch that integrates PDE solution fields across diverse scales. Our Flow models cut errors by over 5% at 13-day lead times and by 5-8% under 1st and 99th quantile extremes, improving reliability for rare events. Leveraging pretrained model weights, they converge within a single epoch, reducing both training cost and carbon footprint. Such efficiency is vital as the growing scale of machine learning challenges sustainability and limits research accessibility. Code and model weights are in the supplementary materials.


Wisdom and Delusion of LLM Ensembles for Code Generation and Repair

arXiv.org Artificial Intelligence

Today's pursuit of a single Large Language Model (LLM) for all software engineering tasks is resource-intensive and overlooks the potential benefits of complementarity, where different models contribute unique strengths. However, the degree to which coding LLMs complement each other and the best strategy for maximizing an ensemble's potential are unclear, leaving practitioners without a clear path to move beyond single-model systems. To address this gap, we empirically compare ten individual LLMs from five families and three ensembles of these LLMs across three software engineering benchmarks covering code generation and program repair. We assess the complementarity between models and the performance gap between the best individual model and the ensembles. Next, we evaluate various selection heuristics to identify correct solutions from an ensemble's candidate pool. We find that the theoretical upper bound for an ensemble's performance can be 83% above the best single model. Our results show that consensus-based strategies for selecting solutions fall into a "popularity trap," amplifying common but incorrect outputs. In contrast, a diversity-based strategy realizes up to 95% of this theoretical potential and proves effective even in small two-model ensembles, enabling a cost-efficient way to enhance performance by leveraging multiple LLMs.
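The abstract does not define its selection heuristics in detail. As a hedged illustration of the "popularity trap", this toy sketch assumes that a consensus strategy means plain majority voting over exact-match candidate strings, and that correctness is judged by a single unit test. `consensus_select` and `passes` are hypothetical names, not the paper's code. The oracle line mirrors the theoretical upper bound (a task counts as solved if any candidate passes); the diversity-based strategy is not reproduced because the abstract does not describe it.

```python
from collections import Counter

# Hypothetical sketch of consensus (majority-vote) selection; not the paper's code.
def consensus_select(candidates):
    """Return the most frequent candidate; ties go to the earliest one."""
    counts = Counter(candidates)
    return max(candidates, key=lambda c: counts[c])


def passes(src):
    """Toy test harness (assumption): run trusted candidate source, check add(2, 3) == 5."""
    namespace = {}
    exec(src, namespace)
    return namespace["add"](2, 3) == 5


# Four models' candidate solutions: three agree on the same bug.
candidates = [
    "def add(a, b): return a - b",
    "def add(a, b): return a - b",
    "def add(a, b): return a - b",
    "def add(a, b): return a + b",
]

oracle_solved = any(passes(c) for c in candidates)       # upper bound: some candidate is correct
consensus_solved = passes(consensus_select(candidates))  # majority vote picks the popular bug
print(oracle_solved, consensus_solved)  # → True False
```

The correct answer is present in the pool, but agreement among models rewards the shared mistake. Only call `exec` on trusted strings like these; a real evaluation harness would sandbox candidate code.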


Amazon reports strongest cloud growth since 2022 after major outage

The Guardian

An aerial view of an Amazon Web Services Data Center known as US East 1 in Ashburn, Virginia on 20 October 2025.

Thu 30 Oct 2025 16.50 EDT. Last modified on Fri 31 Oct 2025 05.25 EDT.

Amazon has made its first financial disclosures since the disastrous outage suffered by its cloud computing division that brought everything from smart beds to banks offline. In spite of the global outage, Amazon Web Services has continued to grow, and this quarter reported a 20% increase in revenue year over year. Wall Street estimated that AWS would bring in $32.42bn in net sales in the third quarter, with the company reporting actual revenue of $33bn.


18th century lead ammo found in Scottish Highlands

Popular Science

Archaeologists in Scotland have excavated over 100 weapon projectiles, including cannon shot and lead musket balls, from one of the country's most famous battlefields. With these new finds, experts say they can better contextualize the Battle of Culloden, as well as highlight some of the conflict's lesser-known participants. In July 1745, Charles Stuart arrived in Scotland seeking to return his father to the British throne. For the next nine months, Stuart proceeded to lead thousands of supporters, militiamen, and conscripted soldiers in a military campaign now known as the Jacobite rising of 1745.


Neanderthals used 'crayons' to color

Popular Science

Ancient ochre pigment fragments show that our cousins had an artistic flair. Neanderthals are getting a well-deserved scientific rewrite. A growing body of paleoarchaeological evidence indicates that our extinct cousins were far from the lumbering oafs we initially believed them to be. Recent discoveries show the one-time Homo sapiens competitors were creative enough to craft stone multitools and even collect small trinkets.


OpenAI thought to be preparing for $1tn stock market float

The Guardian

A float would support Sam Altman's ambitions to splash trillions of dollars on building datacentres. OpenAI is reportedly gearing up for a stock market listing valuing the company at $1tn (£760bn) as soon as next year, in what would be one of the biggest ever initial public offerings. The developer behind the hit AI chatbot ChatGPT is considering whether to file for an IPO as soon as the second half of 2026, according to Reuters, which cited people familiar with the matter. The company is thought to be looking to raise at least $60bn.


Beach bliss turns chaotic as shark lunges at snorkeler: 'He could have ripped my arm off'

FOX News

A snorkeler reportedly received 27 stitches after a shark bit their arm multiple times off Boca Chita Key. It is the latest of 10 shark attacks reported in Florida state waters this year.


Verdicts in as Liam Hemsworth takes over as The Witcher

BBC News

The latest season of Netflix's The Witcher has landed - with one big difference. Former lead actor Henry Cavill has been replaced as main character Geralt of Rivia by Liam Hemsworth. The Australian has stepped in for the final two seasons of the fantasy show, based on a popular series of novels and video games. Previously, British actor Cavill had portrayed the title character, a monster hunter with supernatural abilities known as the White Wolf. When he announced he was passing the torch to Hemsworth in October 2022, describing him as a fantastic actor, not all fans agreed.


Caught on camera: Rats hunting bats mid-flight

Popular Science

For the first time, a brown rat has been caught on camera actively hunting bats. The never-before-seen footage shows the rat grabbing a snack at hibernation sites in northern Germany. While it's undeniably impressive that rats can grab their supper mid-air, the new footage does not bode well for the bats. According to a study recently published in the journal, rat predation may cause enough damage to significantly threaten local bat populations.