Industry
Google One charges forever -- Internxt's 10TB lifetime plan is $269.97 once during Deal Days
When you purchase through links in our articles, we may earn a small commission. Internxt takes your privacy seriously. Files are encrypted on your device before upload, then split into fragments and distributed across secure servers, meaning Internxt itself has zero ability to read your data. It runs on a zero-knowledge model, is fully open source, meets GDPR standards, and has been independently audited.
One $20 payment gets you the full Office suite and zero subscription stress during this Deal Days sale
Lifetime access to Microsoft Office Professional Plus 2019 for Windows is $19.97 (MSRP $229) during Deal Days -- one payment, lifetime access, no subscription required. Paying monthly for Office can feel unnecessary if all you really need are the core apps you use every day. Instead of renting software, you can own it outright with a one-time purchase. During Deal Days, our answer to Prime Day running through June 28, Microsoft Office Professional Plus 2019 for Windows is available for $19.97 (MSRP $229), giving you permanent access to Word, Excel, PowerPoint, Outlook, OneNote, Publisher, and Access with no recurring fees.
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
In-context learning, the ability of large language models to perform tasks using only examples provided in the prompt, has recently been adapted for time series forecasting. This paradigm enables zero-shot prediction, where past values serve as context for forecasting future values, making powerful forecasting tools accessible to non-experts and improving performance when training data are scarce. Most existing zero-shot forecasting approaches rely on transformer architectures, which, despite their success in language, often fall short of expectations in time series forecasting, where recurrent models like LSTMs frequently have the edge. Conversely, while LSTMs are well-suited for time series modeling due to their state-tracking capabilities, they lack strong in-context learning abilities. We introduce TiRex, which closes this gap by leveraging xLSTM, an enhanced LSTM with competitive in-context learning skills. Unlike transformers, state-space models, or parallelizable RNNs such as RWKV, TiRex retains state-tracking, a critical property for long-horizon forecasting. To further facilitate its state-tracking ability, we propose a training-time masking strategy called CPM. TiRex sets a new state of the art in zero-shot time series forecasting on the HuggingFace benchmarks GiftEval and Chronos-ZS, outperforming significantly larger models including TabPFN-TS (Prior Labs), Chronos Bolt (Amazon), TimesFM (Google), and Moirai (Salesforce) across both short- and long-term forecasts.
Efficient Allocation of Working Memory Resource for Utility Maximization in Humans and Recurrent Neural Networks
Working memory (WM) supports the temporary retention of task-relevant information. It is limited in capacity and inherently noisy. The ability to flexibly allocate WM resource is a hallmark of adaptive behavior. While it is well established that WM resource can be prioritized via selective attention, whether it can be allocated based on reward incentive alone remains under debate, raising open questions about whether humans can efficiently allocate WM resource based on utility. To address this, we conducted behavioral experiments using orientations as stimuli.
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
Private large language model (LLM) inference based on cryptographic primitives offers a promising path towards privacy-preserving deep learning. However, existing frameworks only support dense LLMs like LLaMA-1 and struggle to scale to mixture-of-experts (MoE) architectures. The key challenge comes from securely evaluating the dynamic routing mechanism in MoE layers, which may reveal sensitive input information if not fully protected. In this paper, we propose CryptoMoE, the first framework that enables private, efficient, and accurate inference for MoE-based models. CryptoMoE balances expert loads to protect expert routing information and proposes novel protocols for secure expert dispatch and combine. CryptoMoE also develops a confidence-aware token selection strategy and a batch matrix multiplication protocol to improve accuracy and efficiency further.
Continual Release Moment Estimation with Differential Privacy
We propose Joint Moment Estimation (JME), a method for continually and privately estimating both the first and second moments of a data stream with reduced noise compared to naive approaches. JME supports the matrix mechanism and exploits a joint sensitivity analysis to identify a privacy regime in which the second-moment estimation incurs no additional privacy cost, thereby improving accuracy while maintaining privacy. We demonstrate JME's effectiveness in two applications: estimating the running mean and covariance matrix for Gaussian density estimation and model training with DP-Adam.
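As a point of reference for what "continually estimating the first and second moments" means, the following is a minimal, non-private sketch of the running mean and covariance of a vector stream. It is not JME: it adds no noise, uses no matrix mechanism, and performs no sensitivity analysis, none of which are specified in this abstract. The function name `running_moments` and the toy stream are illustrative assumptions.

```python
def running_moments(stream):
    """Yield (mean, covariance) after each vector of the stream.

    Non-private illustration only: a differentially private method would
    release noisy versions of these quantities; how JME does this is not
    described in the abstract and is not modeled here.
    """
    sum_x = None   # running sum of x_i
    sum_xx = None  # running sum of outer products x_i x_i^T
    for t, x in enumerate(stream, start=1):
        d = len(x)
        if sum_x is None:
            sum_x = [0.0] * d
            sum_xx = [[0.0] * d for _ in range(d)]
        for i in range(d):
            sum_x[i] += x[i]
            for j in range(d):
                sum_xx[i][j] += x[i] * x[j]
        mean = [s / t for s in sum_x]
        # Population covariance: E[x x^T] - mean mean^T
        cov = [[sum_xx[i][j] / t - mean[i] * mean[j] for j in range(d)]
               for i in range(d)]
        yield mean, cov


# Toy two-dimensional stream (hypothetical data).
stream = [[1.0, 2.0], [3.0, 6.0]]
results = list(running_moments(stream))
final_mean, final_cov = results[-1]
```

Each step emits an updated estimate, which is the "continual release" setting: an output is produced after every new data point rather than once at the end.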
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
Long video understanding is inherently challenging for vision-language models (VLMs) because of the extensive number of frames. With each video frame typically expanding into tens or hundreds of tokens, the limited context length of large language models (LLMs) forces the VLMs to perceive the frames sparsely and lose temporal information. To address this, we explore extreme video token compression towards one token per frame at the final LLM layer. Our key insight is that heuristic-based compression, widely adopted by previous methods, is prone to information loss, and this necessitates supervising LLM layers into learnable and progressive modules for token-level compression (LP-Comp). Such compression enables our VLM to digest 2x-4x more frames with improved performance. To further increase the token efficiency, we investigate frame-level compression, which selects the frames most relevant to the queries via the internal attention scores of the LLM layers, named question-conditioned compression (QC-Comp). As a notable distinction from previous studies, we mitigate the position bias of LLM attention in long contexts, i.e., the over-concentration on the beginning and end of a sequence, by splitting long videos into short segments and employing local attention. Collectively, our combined token-level and frame-level compression leads to an extreme compression model for long video understanding, named XComp, achieving a significantly larger compression ratio and enabling denser frame sampling. Our XComp is finetuned from VideoChat-Flash with a data-efficient supervised compression tuning stage that only requires 2.5% of the supervised fine-tuning data, yet boosts the accuracy from 42.9% to 46.2% on LVBench and enhances multiple other long video benchmarks.
State-Covering Trajectory Stitching for Diffusion Planners
Diffusion-based generative models are emerging as powerful tools for long-horizon planning in reinforcement learning (RL), particularly with offline datasets. However, their performance is fundamentally limited by the quality and diversity of training data. This often restricts their generalization to tasks outside their training distribution or longer planning horizons. To overcome this challenge, we propose State-Covering Trajectory Stitching (SCoTS), a novel reward-free trajectory augmentation method that incrementally stitches together short trajectory segments, systematically generating diverse and extended trajectories. SCoTS first learns a temporal distance-preserving latent representation that captures the underlying temporal structure of the environment, then iteratively stitches trajectory segments guided by directional exploration and novelty to effectively cover and expand this latent space. We demonstrate that SCoTS significantly improves the performance and generalization capabilities of diffusion planners on offline goal-conditioned benchmarks requiring stitching and long-horizon reasoning. Furthermore, augmented trajectories generated by SCoTS significantly improve the performance of widely used offline goal-conditioned RL algorithms across diverse environments. Our code is available at https://github.com/leekwoon/scots/
On Evaluating Policies for Robust POMDPs
Robust partially observable Markov decision processes (RPOMDPs) model sequential decision-making problems under partial observability, where an agent must be robust against a range of dynamics. RPOMDPs can be viewed as a two-player game between an agent, who selects actions, and nature, who adversarially selects the dynamics. Evaluating an agent policy requires finding an adversarial nature policy, which is computationally challenging. In this paper, we advance the evaluation of agent policies for RPOMDPs in three ways. First, we discuss suitable benchmarks.
Appendix Table of Contents
Starting from Grobid's XML output, peS2o filters papers that are too short, have incorrect metadata, are in languages other than English, and contain OCR errors using a combination of heuristic- and model-based filtering steps. We refer the reader to the datasheet and code for more details on this processing pipeline. The subset of peS2o included in the Common Pile starts from v3 of the corpus, which contains documents from January 1, 1970 to October 6, 2024. We retain full-text papers with CC BY, CC BY-SA, or CC0 licenses, or that have been labeled as public domain; metadata is provided by the Semantic Scholar APIs [85]. After filtering, this set contains 6.3 million papers, or 35.7 billion whitespace-separated segments.
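The license filter and the "whitespace-separated segments" count described above can be sketched as follows. This is a minimal illustration, not the actual peS2o or Common Pile code: the `Paper` record, the license label strings, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical license labels; the real metadata values may be spelled differently.
ALLOWED_LICENSES = {"CC BY", "CC BY-SA", "CC0", "public domain"}


@dataclass
class Paper:
    paper_id: str
    license: str
    text: str


def is_openly_licensed(paper: Paper) -> bool:
    """Keep only papers whose license label is in the allowed set."""
    return paper.license in ALLOWED_LICENSES


def count_segments(text: str) -> int:
    """Count whitespace-separated segments; str.split() with no
    arguments splits on any run of whitespace."""
    return len(text.split())


# Toy records (hypothetical data).
papers = [
    Paper("a", "CC BY", "Open access paper text."),
    Paper("b", "All rights reserved", "Closed paper text here."),
    Paper("c", "CC0", "Public\tdomain  dedication\nexample"),
]

kept = [p for p in papers if is_openly_licensed(p)]
total_segments = sum(count_segments(p.text) for p in kept)
```

Counting by whitespace is tokenizer-independent, which is presumably why the corpus size is reported in segments rather than model tokens.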