Technology
Object State Recognition
[Figure: initial state, transitioning state, and end state; LLM prompt: "Please provide the initial, transitioning, and end states for slicing a lemon"]
Recognizing the physical states of objects and their transformations within videos is crucial for structured video understanding and for enabling robust real-world applications, such as robotic manipulation. However, pretrained vision-language models often struggle to capture these nuanced dynamics and their temporal context, and specialized object state recognition frameworks may not generalize to unseen actions or objects. We introduce SAGE (State-Action Graph Embeddings), a novel framework that offers a unified model of physical state transitions by decomposing states into fine-grained, language-described visual concepts that are sharable across different objects and actions. SAGE initially leverages Large Language Models to construct a State-Action Graph, which is then multimodally refined using Vision-Language Models. Extensive experiments show that our method significantly outperforms baselines and generalizes effectively to unseen objects and actions in open-world settings. SAGE improves the prior state-of-the-art by as much as 14.6% on novel state recognition with less than 5% of its inference time.
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm but still struggle with challenges such as accurately estimating image probabilities due to the non-linear nature of the sigmoid function and the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates offline expert demonstrations with online policy-generated negative samples, enabling it to effectively capture human preferences while addressing the limitations of offline data. Comprehensive experiments show that Diffusion-DRO delivers improved generation quality across a range of challenging and unseen prompts, outperforming state-of-the-art baselines in both quantitative metrics and user studies.
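For context, the paired-comparison objective that the abstract contrasts itself with can be sketched as follows. This is the generic DPO loss for a single preference pair, not the Diffusion-DRO objective, which the abstract does not spell out. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: log-probabilities of the preferred and rejected
    samples under the model being trained.
    ref_logp_w / ref_logp_l: the same quantities under a frozen reference.
    All names and the default beta are hypothetical, for illustration only.
    """
    # Scaled difference of log-ratio improvements over the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A model that prefers the winner more than the reference does gets a
# loss below log(2); an indifferent model sits exactly at log(2).
indifferent = dpo_loss(-1.0, -1.0, -1.0, -1.0)
aligned = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The sigmoid inside this loss is the non-linearity the abstract refers to when it describes the estimation difficulties of prior methods.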
Israel launches fresh strikes on Lebanon despite Trump criticism
Israeli forces have carried out new strikes in southern Lebanon, state media say, despite renewed criticism from US President Donald Trump of Israel's actions in the country. Israeli drone strikes injured several people in Mansouri and Aaziyyeh on Wednesday, while jets attacked Nabatieh al-Fawqa and Kfar Tebnit, Lebanon's National News Agency reported. Israel's military has not commented, but it did say five soldiers were injured in a drone attack in Lebanon by the Iran-backed armed group Hezbollah. Mediator Pakistan has said the deal between the US and Iran to end the war includes Lebanon. On Tuesday, Trump said Israel's prime minister needed to be more responsible with respect to Lebanon.
Will it take a 'Chernobyl-scale disaster' for us to regulate cyber weapons of mass destruction? Stuart Russell
'The CEOs are telling us, "We're on track to create superhuman intelligence, which has a good chance of causing human extinction."' The AI company Anthropic has been making major headlines recently. Its trillion-dollar IPO plan and its blood feud with secretary of defense Pete Hegseth have attracted much attention, but two other events may be even more consequential.
Interactive. Violent. Gross. Inside Fishtank, the Unhinged Future of Reality TV
WIRED goes on location--and on camera--with the cult hit. On March 16, 2026, at 5:45 pm in a leafy suburb of Atlanta called Sandy Springs, police pound on the door of a neglected French Country-style mansion, rifles at the ready, bodycams rolling. Minutes earlier, a distress call came from someone claiming to be hiding from a gunman in the mansion's downstairs bathroom. The dispatcher heard a gunshot ring out in the distance, then the line disconnected. "Open the door!" an officer yells. A calm young man with a mullet and woolly eyebrows steps out, hands raised. The police ask him who else is in the house. "Just my friends," he replies, as seven other young people, men and women, silently file out behind him, less evidently relaxed. They remain outside while two officers search the house. Inside the mansion there are no immediate signs of a massacre, but the decor alone arouses suspicion. All of the windows are frosted over, so only a chilly light leaks in. The place is a mess, and the walls are adorned with lurid, seemingly AI-generated art: a frowning baby holding an assault rifle, a rubber ducky bobbing in a mug of what looks like black coffee, a lidless and levitating eyeball crying into a martini glass. The rooms are painted primary colors, grass green and cherry red, like a kindergarten class. A vape dangles from a doorframe by a chain, suspended at mouth level. The pantry is practically empty. The bedroom is a dormitory featuring seven identical twin beds. No one is hiding in the bathroom. The call, it seems, was a prank. The police return to the driveway and ask, "What is it that you guys are doing here?" "We're just livestreaming," says a man in a camo hat named Matt. "You guys don't have any firearms or anything inside the house?" There are guns in the house, Matt says, for self-defense. Fans of their livestream can be obsessive, he explains, and tend to have perverse ideas about jokes. The officer asks to see their weapons, and they go downstairs. 
The room is cluttered with ergonomic swivel chairs, desks strewn with takeout containers and energy drinks, two flatscreen TVs, and a dozen computer monitors.
Fair Matroid Selection
We investigate the problem of sequentially selecting elements of an unknown matroid in an online manner to form an independent set, with the goal of maximizing the minimum probability of acceptance across all elements, a property we define as f-fairness. Under adversarial arrival orders, we design an α(ln k + 1)-fair algorithm, where α is the arboricity of the matroid and k is the rank, a result that is nearly optimal. For laminar matroids, we develop a (2α - 1)-fair algorithm, which is optimal up to constant factors, achieved through a novel online coloring scheme. In the random arrival order setting, we achieve a (4+o(1))α-fair algorithm for graphic matroids, matching the optimal result up to constant factors, relying on a novel technique for learning a degeneracy ordering using a sampled subset of edges. We further generalize our result to p-matchoids, obtaining a β(p ln k + 1)-fair algorithm for the adversarial arrival model, where β is the optimal offline fairness. Notably, all our results can be extended to a setting with no prior knowledge of the matroid with only a logarithmic increase in the fairness factor.
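The role of arboricity can be illustrated with a small offline example. This is a sketch of the offline intuition only, not the online algorithms in the abstract, and it assumes that f-fair means every element is accepted with probability at least 1/f. In a graphic matroid, the arboricity α is the minimum number of forests needed to cover all edges, so picking one of those forests uniformly at random accepts every edge with probability 1/α. The hard-coded partition of K4 and all function names are illustrative assumptions.

```python
from fractions import Fraction


def is_forest(edges):
    """Return True if the edge list is acyclic (independent in the graphic matroid)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True


def acceptance_probabilities(forests):
    """Acceptance probability of each edge when one forest is drawn uniformly."""
    assert all(is_forest(f) for f in forests)
    share = Fraction(1, len(forests))
    probs = {}
    for forest in forests:
        for edge in forest:
            probs[edge] = probs.get(edge, 0) + share
    return probs


# K4 has 6 edges on 4 vertices; a forest holds at most 3 of them, so
# at least 2 forests are needed. This hand-picked partition uses exactly 2.
K4_FORESTS = [
    [(0, 1), (1, 2), (2, 3)],
    [(0, 2), (0, 3), (1, 3)],
]
probs = acceptance_probabilities(K4_FORESTS)
```

Here every edge of K4 ends up with acceptance probability 1/2, matching 1/α for α = 2. The online setting is harder because the partition cannot be computed in advance.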
Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry
Generating realistic brain connectivity matrices is key to analyzing population heterogeneity in brain organization, understanding disease, and augmenting data in challenging classification problems. Functional connectivity matrices lie in constrained spaces--such as the set of symmetric positive definite or correlation matrices--that can be modeled as Riemannian manifolds. However, using Riemannian tools typically requires redefining core operations (geodesics, norms, integration), making generative modeling computationally inefficient. In this work, we propose DIFFEOCFM, an approach that enables conditional flow matching (CFM) on matrix manifolds by exploiting pullback metrics induced by global diffeomorphisms on Euclidean spaces. We show that Riemannian CFM with such metrics is equivalent to applying standard CFM after data transformation. This equivalence allows efficient vector field learning, and fast sampling with standard ODE solvers.
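The stated equivalence can be sketched on a toy pair of matrices. The example assumes the matrix logarithm as the diffeomorphism from symmetric positive definite (SPD) matrices to symmetric matrices; the abstract does not say which diffeomorphisms DIFFEOCFM uses, so this choice, the function names, and the sample matrices are all illustrative assumptions. After the transform, the standard straight-line conditional flow matching path is used, and the result is mapped back to the manifold.

```python
import numpy as np


def spd_log(a):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(a)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def sym_exp(s):
    """Matrix exponential of a symmetric matrix (inverse of spd_log)."""
    eigvals, eigvecs = np.linalg.eigh(s)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T


def cfm_pair(x0, x1, t):
    """Straight-line interpolant and its constant target velocity (Euclidean CFM)."""
    return (1.0 - t) * x0 + t * x1, x1 - x0


# Two hypothetical SPD "connectivity" matrices.
a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, -0.3], [-0.3, 3.0]])

# Ordinary CFM quantities, computed in the transformed (flat) space.
z_t, u_t = cfm_pair(spd_log(a), spd_log(b), 0.5)

# Mapping back always yields a valid SPD matrix, with no manifold-specific solver.
midpoint = sym_exp(z_t)
```

Because the interpolation happens in a flat space, a learned vector field can be integrated with a standard ODE solver and only the final sample needs to be mapped back.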
Density Ratio-Free Doubly Robust Proxy Causal Learning
We study the problem of causal function estimation in the Proxy Causal Learning (PCL) framework, where confounders are not observed but proxies for the confounders are available. Two main approaches have been proposed: outcome bridge-based and treatment bridge-based methods. In this work, we propose two kernel-based doubly robust estimators that combine the strengths of both approaches, and naturally handle continuous and high-dimensional variables. Our identification strategy builds on a recent density ratio-free method for treatment bridge-based PCL; furthermore, in contrast to previous approaches, it does not require indicator functions or kernel smoothing over the treatment variable. These properties make it especially well-suited for continuous or high-dimensional treatments. By using kernel mean embeddings, we propose the first density ratio-free doubly robust estimators for proxy causal learning, which have closed-form solutions and strong uniform consistency guarantees. Our estimators outperform existing methods on PCL benchmarks, including a prior doubly robust method that requires both kernel smoothing and density ratio estimation.
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents, both supervised fine-tuning and reinforcement learning, depend on extensive human-annotated task-answer pairs and tool trajectories. However, for complex multimodal tasks, such annotations are prohibitively expensive or impractical to obtain. In this paper, we propose SPORT, an iterative tool usage exploration method for multimodal agents that requires no pre-collected data and uses step-wise preference optimization to refine tool usage trajectories. Our method enables multimodal agents to autonomously discover effective tool usage strategies through self-exploration and optimization, eliminating the bottleneck of human annotation.