Industry
PUO-Bench: A Panel Understanding and Operation Benchmark with A Privacy-Preserving Framework
Recent advancements in Vision-Language Models (VLMs) have enabled GUI agents to leverage visual features for interface understanding and operation in the digital world. However, limited research has addressed the interpretation of, and interaction with, control panels in real-world settings. To bridge this gap, we propose the Panel Understanding and Operation (PUO) benchmark, comprising annotated panel images from appliances and associated vision-language instruction pairs. Experimental results on the benchmark demonstrate significant performance disparities between zero-shot and fine-tuned VLMs, revealing the lack of PUO-specific capabilities in existing language models. Furthermore, we introduce a Privacy-Preserving Framework (PPF) to address privacy concerns in cloud-based panel parsing and reasoning. PPF employs a dual-stage architecture, performing panel understanding on edge devices while delegating complex reasoning to cloud-based LLMs. Although this design introduces a performance trade-off due to edge model limitations, it eliminates the transmission of raw visual data, thereby mitigating privacy risks. Overall, this work provides foundational resources and methodologies for advancing interactive human-machine systems and robotics in panel-centric applications.
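The dual-stage split described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `PanelElement` schema, the stubbed `edge_parse_panel` and `cloud_reason` functions, and the JSON payload format are all hypothetical. The point it shows is only that the cloud side receives a structured text description, never the image bytes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PanelElement:
    """Hypothetical structured description of one control on a panel."""
    element_id: int
    label: str   # text or icon meaning recognized on the edge device
    kind: str    # e.g. "button", "knob", "display"
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels


def edge_parse_panel(image_bytes: bytes) -> list:
    """Stand-in for the on-device panel-understanding model (hypothetical).

    A real system would run a small VLM or detector on image_bytes; this
    stub returns fixed elements so the example is runnable.
    """
    return [
        PanelElement(0, "Power", "button", (10, 10, 50, 50)),
        PanelElement(1, "Start", "button", (60, 10, 100, 50)),
        PanelElement(2, "Temperature", "knob", (110, 10, 150, 50)),
    ]


def build_cloud_payload(elements: list, instruction: str) -> str:
    """Serialize only the structured description; raw pixels stay on the device."""
    return json.dumps({
        "instruction": instruction,
        "elements": [asdict(e) for e in elements],
    })


def cloud_reason(payload: str) -> int:
    """Stand-in for the cloud LLM (hypothetical).

    Picks the first element whose label appears in the instruction and
    returns its id, or -1 if nothing matches.
    """
    request = json.loads(payload)
    text = request["instruction"].lower()
    for element in request["elements"]:
        if element["label"].lower() in text:
            return element["element_id"]
    return -1


image_bytes = b"\x89PNG...raw panel photo..."  # never serialized into the payload
elements = edge_parse_panel(image_bytes)
payload = build_cloud_payload(elements, "Press the start button")
target_id = cloud_reason(payload)
```

In this sketch the performance trade-off mentioned in the abstract would show up in `edge_parse_panel`: any element the edge model misses or mislabels is invisible to the cloud reasoner.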
The Download: soccer's data renaissance and China's big nuclear plans
Plus: Autonomous drones may have killed soldiers for the first time. Imagine tuning in to the opening kickoff of a World Cup match and seeing a player intentionally kick the ball out of bounds. You may question the logic of surrendering possession seconds into a game. If you were Jesse Davis, though, you'd know that this play could be a prime setup to score. Davis is a professor of computer science at KU Leuven in Belgium and head of its Sports Analytics Lab, which has been at the vanguard of a data awakening in soccer. Using AI and data analytics, his team has uncovered hidden tactical patterns and challenged long-held assumptions about how the game should be played.
OpenBox: Annotate Any Bounding Boxes in 3D
Unsupervised and open-vocabulary 3D object detection has recently gained attention, particularly in autonomous driving, where reducing annotation costs and recognizing unseen objects are critical for both safety and scalability. However, most existing approaches uniformly annotate 3D bounding boxes, ignore objects' physical states, and require multiple self-training iterations for annotation refinement, resulting in suboptimal quality and substantial computational overhead. To address these challenges, we propose OpenBox, a two-stage automatic annotation pipeline that leverages a 2D vision foundation model. In the first stage, OpenBox associates instance-level cues from 2D images processed by a vision foundation model with the corresponding 3D point clouds via context-aware refinement. In the second stage, it categorizes instances by rigidity and motion state, then generates adaptive bounding boxes with class-specific size statistics. As a result, OpenBox produces high-quality 3D bounding box annotations without requiring self-training. Experiments on the Waymo Open Dataset (WOD), the Lyft Level 5 Perception dataset, and the nuScenes dataset demonstrate improved accuracy and efficiency over baselines.
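As a rough illustration of the second stage, the sketch below classifies a tracked instance as static or moving and then sizes its box differently depending on rigidity. The rules are assumptions chosen for illustration, not the OpenBox method: the speed threshold, the `CLASS_SIZE_PRIOR` values, the axis-aligned boxes, and the symmetric growth to the class prior are all hypothetical simplifications.

```python
import numpy as np

# Hypothetical class-level size statistics (length, width, height) in metres.
CLASS_SIZE_PRIOR = {"car": np.array([4.5, 1.9, 1.6])}


def motion_state(centers, speed_threshold=0.5, dt=0.1):
    """Label a track as "moving" or "static" from its per-frame centers (T x 3).

    Uses the mean speed between consecutive frames; threshold and frame
    interval are illustrative assumptions.
    """
    centers = np.asarray(centers, dtype=float)
    speeds = np.linalg.norm(np.diff(centers, axis=0), axis=1) / dt
    return "moving" if speeds.size and speeds.mean() > speed_threshold else "static"


def adaptive_box(points, class_name, is_rigid):
    """Return (center, size) of an axis-aligned box around points (N x 3).

    Rigid objects are often only partially visible to LiDAR, so the observed
    extent is grown to at least the class prior. Non-rigid objects keep the
    tight observed extent.
    """
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    center, size = (lo + hi) / 2.0, hi - lo
    if is_rigid and class_name in CLASS_SIZE_PRIOR:
        size = np.maximum(size, CLASS_SIZE_PRIOR[class_name])
    return center, size


# A partially observed car: only a 2.0 x 1.0 x 1.5 m slice of points is visible.
visible_points = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 1.5]])
rigid_center, rigid_size = adaptive_box(visible_points, "car", is_rigid=True)
loose_center, loose_size = adaptive_box(visible_points, "car", is_rigid=False)
state = motion_state([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])
```

A fuller version would orient the box and anchor it to the visible surface; the sketch leaves both out to keep the adaptive-sizing idea visible.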
Best Smart Chess Boards (2026): Chessnut, Millennium
I played the ultimate game of strategy on a variety of smart chess boards to find the best for online and in-person matches. Playing chess can be challenging, fun, and at times frustrating. Garry Kasparov called the game "mental torture." With virtually limitless possibilities, chess offers unparalleled depth, and you could easily fill a library with books on how to play it. The internet has opened up a wealth of potential competitors, and smart chess boards enable you to play anyone online or off, not to mention dabble in a variety of chess programs.
Signal Alums Reveal 'Encrypted Spaces,' a System for Making Private Collaboration Apps
The new open-source project could serve as the basis for a future of apps with features as complex as Slack, Discord, or Google Docs--but with added protection against surveillance. End-to-end encryption, in which data is encoded so that only users on either "end" of a conversation can decrypt their communications--and not the server that relays that information or any other interloper--has become the standard for modern privacy on the internet. But its very name suggests a kind of simple pipe with two openings. The metaphor, and often the encryption technology that has enabled that model, doesn't fit neatly onto the world of Slack, Discord, Google Docs, and the other multiuser, complex, collaborative software where people now live and work. So one group of cryptographers has built what they describe as the foundation for a new generation of end-to-end encrypted apps, with a new metaphor: Instead of a mere pipe, they want to create "spaces" where users can hold group conversations, host information on a server, collectively make changes to it, invite in new collaborators or kick them out, all while maintaining the same strong encryption protections that prevent the server or network eavesdroppers from accessing their data.
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
Generating music with coherent structure and harmonious instrumental and vocal elements remains a significant challenge in song generation. Existing language models and diffusion-based methods often struggle to balance global coherence with local fidelity, resulting in outputs that lack musicality or suffer from incoherent progression and mismatched lyrics. This paper introduces SongBloom, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. SongBloom employs an autoregressive diffusion model that combines the high fidelity of diffusion models with the scalability of language models. Specifically, it gradually extends a musical sketch from short to long and refines the details from coarse to fine-grained. The interleaved generation paradigm effectively integrates prior semantic and acoustic context to guide the generation process. Experimental results demonstrate that SongBloom outperforms existing methods across both subjective and objective metrics and achieves performance comparable to state-of-the-art commercial music generation platforms.
Eulerian Neural Network Informed by Chemical Transport for Air Quality Forecasting
Air pollution remains one of the most critical environmental challenges globally, posing severe threats to public health, ecological sustainability, and climate governance. While existing physics-based and data-driven models have made progress in air quality forecasting, they often struggle to jointly capture the complex spatiotemporal dynamics and ensure spatial continuity of pollutant distributions. In this study, we introduce CTENet, a novel chemical transport deep learning model that embeds the Advection-Diffusion-Reaction equation into a Physics-Informed Neural Network (PINN) framework using an Eulerian representation to model the spatiotemporal evolution of pollutants. Extensive experiments on two real-world datasets demonstrate that CTENet consistently outperforms state-of-the-art (SOTA) baselines, achieving a remarkable RMSE improvement of 45.8% on the USA dataset and 21.0% on the China dataset.
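For reference, a common general form of the advection-diffusion-reaction equation in an Eulerian (fixed-grid) frame, together with the kind of physics residual a PINN typically penalizes, is sketched below. The notation is an assumption and is not taken from the paper: $c$ is pollutant concentration, $\mathbf{u}$ the wind field, $D$ the diffusion coefficient, $R$ the chemical reaction term, $S$ an emission source, and $\hat{c}$ the network prediction evaluated at $N$ collocation points.

```latex
% Advection-diffusion-reaction equation (general form, assumed notation)
\frac{\partial c}{\partial t} + \nabla \cdot (\mathbf{u}\, c)
  = \nabla \cdot (D \nabla c) + R(c) + S

% Mean squared PDE residual over collocation points (x_i, t_i)
\mathcal{L}_{\text{phys}}
  = \frac{1}{N} \sum_{i=1}^{N}
    \left|
      \frac{\partial \hat{c}}{\partial t}
      + \nabla \cdot (\mathbf{u}\, \hat{c})
      - \nabla \cdot (D \nabla \hat{c})
      - R(\hat{c}) - S
    \right|^{2}_{(\mathbf{x}_i,\, t_i)}
```

In a typical PINN setup this residual term is added to the data-fitting loss with a weighting factor; how CTENet combines or weights the terms is not stated in the abstract.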
Anthropic v. OpenAI: Behind the bitter battle for the future of AI
The tension between OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei is the driving force in today's biggest technological revolution. SAN FRANCISCO/NEW YORK - If not for the intense rivalry between Anthropic and OpenAI, the generative AI boom might not have arrived so quickly. In late 2022, OpenAI caught wind that Anthropic was working on an AI-powered chatbot. OpenAI CEO Sam Altman immediately directed employees to fast-track a competing product, four people familiar with the matter said. Two weeks later, the company released ChatGPT, sparking a technological revolution that promises to overhaul the global economy and the way humans interact.
What we know about US sea drone used in helicopter crew rescue mission
A sea drone was used to save two crew members of a downed US army helicopter off the coast of Oman earlier this week, according to the US military - making it the first publicly known instance of an unmanned vessel being used to conduct a rescue mission. President Donald Trump said the Apache helicopter was shot down by Iran near the Strait of Hormuz - the dangerous waterway which has been largely blocked off to shipping since the start of the Iran war. The two soldiers were safely rescued within approximately two hours and are in stable condition, US Central Command (Centcom) said. BBC Verify has examined what we know about the drone boat and how the mission took place. What is the US sea drone?
Memory-Enhanced Neural Solvers for Routing Problems
Routing Problems are central to many real-world applications, yet remain challenging due to their (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete. Existing learned methods still lack the ability to adapt to specific instances and fully leverage the available computational budget. Current best methods rely either on a collection of pre-trained policies or on RL fine-tuning, and hence fail to fully utilize newly available information within the constraints of the budget. In response, we present MEMENTO, an approach that leverages memory to improve the search of neural solvers at inference. MEMENTO updates the action distribution dynamically based on the outcome of previous decisions. We validate its effectiveness on Traveling Salesman and Capacitated Vehicle Routing problems, demonstrating its superiority over tree-search and policy-gradient fine-tuning, and showing that it can be zero-shot combined with diversity-based solvers. We successfully train all RL auto-regressive solvers on large instances, and verify MEMENTO's scalability and data-efficiency, pushing the state of the art on 11 out of 12 evaluated tasks.
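The idea of adjusting an action distribution from the outcomes of earlier decisions can be sketched as follows. The abstract does not specify the update rule, so this example swaps in a hand-written, advantage-weighted logit correction. The `ActionMemory` class, its methods, and the `step_size` parameter are hypothetical names introduced for illustration and are not MEMENTO's actual mechanism.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


class ActionMemory:
    """Stores (action, return) pairs from earlier attempts on the same instance."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.records = []

    def store(self, action, ret):
        self.records.append((action, ret))

    def logit_correction(self, step_size=1.0):
        """Raise logits of actions that beat the mean return, lower the others."""
        correction = np.zeros(self.num_actions)
        if not self.records:
            return correction
        baseline = np.mean([ret for _, ret in self.records])
        for action, ret in self.records:
            correction[action] += step_size * (ret - baseline)
        return correction


def adapted_policy(base_logits, memory):
    """Combine the frozen policy's logits with the memory-based correction."""
    return softmax(base_logits + memory.logit_correction())


base_logits = np.zeros(3)       # a frozen policy that is initially uniform
memory = ActionMemory(num_actions=3)
memory.store(action=2, ret=1.0)   # action 2 led to a good tour earlier
memory.store(action=0, ret=-1.0)  # action 0 led to a poor one
probs = adapted_policy(base_logits, memory)
```

The base policy's weights are never changed here; only the logits are shifted at inference time, which is what distinguishes this style of adaptation from policy-gradient fine-tuning.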