Reward-oriented Causal Representation Learning
Causal representation learning (CRL) is the process of disentangling the latent low-dimensional causally-related generating factors underlying high-dimensional observable data. Extensive recent studies have characterized CRL identifiability and perfect recovery of the latent variables and their attendant causal graph. This paper introduces the notion of reward-oriented CRL, the purpose of which is to move away from perfectly learning the latent representation and instead learn it only to the extent needed for optimizing a desired downstream task (reward). In reward-oriented CRL, perfectly learning the latent representation can be excessive; instead, it must be learned at the coarsest level sufficient for optimizing the desired task. Reward-oriented CRL is formalized as the optimization of a desired function of the observable data over the space of all possible interventions and focuses on linear causal and transformation models. To sequentially identify the optimal subset of interventions, an adaptive exploration algorithm is designed that learns the latent causal graph and the variables needed to identify the best intervention. It is shown that for an n-dimensional latent space and a d-dimensional observation space, over a horizon T the algorithm's regret scales as O(d
Paramount Refused to Air an Ad Criticizing Its Merger With Warner Bros.
The commercial was submitted by the Freedom of the Press Foundation to run during Donald Trump's UFC event. It criticized the $111 billion merger as a threat to the First Amendment. Viewers who tuned in to the Paramount+ livestream of UFC Freedom 250 on Sunday night, held to mark President Trump's 80th birthday as well as the nation's semiquincentennial, were treated to the surreal spectacle of mixed martial artists beating each other bloody in a massive cage installed on the White House lawn. But there was one bruising blow they missed: an advertisement blasting the $111 billion merger agreement between Paramount Skydance and Warner Bros. Discovery. That's because Paramount refused to air the ad, according to Freedom of the Press Foundation, the nonprofit advocacy group that submitted it to run during the event.
Microsoft knows its new Surface PCs are expensive. That's the point
Microsoft launches Surface Pro 12 and Surface Laptop 8 with Snapdragon X2 processors, starting at $1,499 and $1,599 respectively, marking significant price increases from previous models. PCWorld reports Microsoft's strategy focuses on premium Windows-on-Arm devices rather than competing across all price points like other PC vendors. The new Surface models feature improved graphics performance, enhanced webcams, and long battery life, positioning Microsoft to compete directly with Apple's premium laptops. The Microsoft Surface premium: for years, laptop buyers have criticized Microsoft for charging more and delivering less. Now Microsoft is preparing to ship the Surface Laptop 8 as well as the Surface Pro 12 with Qualcomm Snapdragon X2 processors inside.
Elon Musk's unprecedented accumulation of wealth
IPO mints Musk as world's first trillionaire - now SpaceX is public, it will be harder than ever not to have a stake in its future. I'm filling in for your usual host Blake Montgomery, who is out this week on vacation. Today, we'll be talking about the historic SpaceX IPO and the US government's surprise order to limit the use of Anthropic's most advanced AI model over cybersecurity concerns. Elon Musk's SpaceX hit the market on Friday in the biggest IPO of all time, raising $85.7bn and easily shattering the previous record of $29.4bn set by the Saudi oil giant Aramco. The rocket, AI and satellite communications company ended the day at $160.95 per share, up from its IPO price of $135 and satisfying any Wall Street skepticism over the unorthodox rollout of the stock. SpaceX's successful market debut turned Musk into the world's first trillionaire, an unprecedented accumulation of wealth that supporters touted as a testament to his financial genius and critics denounced as a symbol of a broken economic system.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We train a proof-of-concept model from scratch with 3.5 billion parameters and 800 billion tokens. We show that this model can effortlessly use varying levels of compute, significantly improving with additional compute especially on reasoning tasks, such as math and coding. Further, this architecture naturally reduces compute costs via zero-shot per-token adaptive compute, KV-cache sharing and speculative decoding.
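The idea of iterating one shared block to an arbitrary depth, and of stopping early when extra iterations no longer change the latent state, can be illustrated with a toy sketch. This is not the paper's architecture: `recurrent_block`, `unroll`, `unroll_adaptive`, the `tanh` update, the `weight` value, and the `tolerance` exit rule are all hypothetical stand-ins chosen only so the example runs on plain Python lists.

```python
import math


def recurrent_block(state, embedded_input, weight=0.5):
    """Hypothetical shared block: mix the latent state with the input embedding."""
    return [math.tanh(weight * s + e) for s, e in zip(state, embedded_input)]


def unroll(embedded_input, num_iterations):
    """Apply the same block num_iterations times, starting from a zero state."""
    state = [0.0] * len(embedded_input)
    for _ in range(num_iterations):
        state = recurrent_block(state, embedded_input)
    return state


def unroll_adaptive(embedded_input, max_iterations=64, tolerance=1e-3):
    """Stop early once the latent state changes by less than tolerance.

    This exit rule is an assumption for illustration, loosely mirroring the
    per-token adaptive compute mentioned in the abstract.
    """
    state = [0.0] * len(embedded_input)
    for step in range(1, max_iterations + 1):
        new_state = recurrent_block(state, embedded_input)
        change = max(abs(n - s) for n, s in zip(new_state, state))
        state = new_state
        if change < tolerance:
            return state, step
    return state, max_iterations
```

Calling `unroll(x, k)` with a larger `k` spends more compute on the same parameters, while `unroll_adaptive(x)` returns both the final state and the number of iterations it actually used.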
GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection
Fine-grained open-vocabulary object detection (FG-OVD) aims to detect novel object categories described by attribute-rich texts. While existing open-vocabulary detectors show promise at the base-category level, they underperform in fine-grained settings due to the semantic entanglement of subjects and attributes in pretrained vision-language model (VLM) embeddings - leading to over-representation of attributes, mislocalization, and semantic drift in embedding space. We propose GUIDED, a decomposition framework specifically designed to address the semantic entanglement between subjects and attributes in fine-grained prompts. By separating object localization and fine-grained recognition into distinct pathways, GUIDED aligns each subtask with the module best suited to it. Specifically, given a fine-grained class name, we first use a language model to extract a coarse-grained subject and its descriptive attributes. Then the detector is guided solely by the subject embedding, ensuring stable localization unaffected by irrelevant or overrepresented attributes. To selectively retain helpful attributes, we introduce an attribute embedding fusion module that incorporates attribute information into detection queries in an attention-based manner.
Unified Transferability Metrics for Time Series Foundation Models
With the increasing number of time series pre-trained models, designing transferability evaluation metrics for time series has become an urgent problem to address. While transferability evaluation has been extensively studied in computer vision, we aim to address a critical gap by developing tailored metrics for time series analysis. In this paper, we introduce TEMPLATE, a transferability estimation framework specifically tailored for versatile time series analysis, comprising three complementary metrics: (1) Dependency Learning Score quantifies a model's capacity to capture temporal dependencies.
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions
Recent advances in generative artificial intelligence (GenAI) models have enabled the generation of personalized content that adapts to up-to-date user context. While personalized decision systems are often modeled using bandit formulations, the integration of GenAI introduces new structure into otherwise classical sequential learning problems. In GenAI-powered interventions, the agent selects a query, but the environment experiences a stochastic response drawn from the generative model. Standard bandit methods do not explicitly account for this structure, where actions influence rewards only through stochastic, observed treatments. We introduce generator-mediated bandit-Thompson sampling (GAMBITTS), a bandit approach designed for this action/treatment split, using mobile health interventions with large language model-generated text as a motivating case study. GAMBITTS explicitly models both the treatment and reward generation processes, using information in the delivered treatment to accelerate policy learning relative to standard methods. We establish regret bounds for GAMBITTS by decomposing sources of uncertainty in treatment and reward, identifying conditions where it achieves stronger guarantees than standard bandit approaches. In simulation studies, GAMBITTS consistently outperforms conventional algorithms by leveraging observed treatments to more accurately estimate expected rewards.
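The action/treatment split described above can be sketched with a deliberately simplified Thompson sampler. This is not GAMBITTS as specified in the paper: it assumes each delivered treatment is summarized by a single observed binary feature `z`, that rewards are binary and depend on the query only through `z`, and that Beta-Bernoulli posteriors suffice. The class name `MediatedThompsonSampler` and all of its fields are hypothetical.

```python
import random


class MediatedThompsonSampler:
    """Toy sampler that models P(z | query) and P(reward | z) separately."""

    def __init__(self, num_queries, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior on P(z = 1 | query), stored as [count z=1, count z=0].
        self.treatment_counts = [[1, 1] for _ in range(num_queries)]
        # Beta(1, 1) prior on P(reward = 1 | z), stored as [successes, failures].
        self.reward_counts = [[1, 1], [1, 1]]

    def select_query(self):
        # Draw one posterior sample of the reward model, shared across queries.
        reward_given_z = [self.rng.betavariate(a, b) for a, b in self.reward_counts]
        best_query, best_value = 0, -1.0
        for query, (a, b) in enumerate(self.treatment_counts):
            prob_z1 = self.rng.betavariate(a, b)
            # Sampled expected reward, marginalizing over the treatment feature.
            value = prob_z1 * reward_given_z[1] + (1 - prob_z1) * reward_given_z[0]
            if value > best_value:
                best_query, best_value = query, value
        return best_query

    def update(self, query, z, reward):
        # The observed treatment updates the generator model for this query...
        self.treatment_counts[query][0 if z == 1 else 1] += 1
        # ...and the reward model for that treatment, whichever query produced it.
        self.reward_counts[z][0 if reward == 1 else 1] += 1
```

Because the reward posterior is indexed by `z` rather than by query, an observation gathered under one query also informs the value estimate of every other query, which is the intuition behind using the delivered treatment to speed up learning.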