Antarctica
ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs
Liu, Xin, Liu, Pei, Tang, Guoming
The linear growth of key-value (KV) cache memory and quadratic computational complexity pose significant bottlenecks for large language models (LLMs) in long-context processing. While existing KV cache optimization methods address these challenges through token pruning or feature merging, they often suffer from irreversible information loss or require costly parameter retraining. We propose ZeroMerge, a dynamic zero-shot compression framework that achieves efficient cache management through three key innovations: (1) Fine-grained memory allocation guided by multi-dimensional token importance metrics at head-level granularity, (2) A residual merging mechanism that preserves critical context through compensated attention scoring, and (3) Parameter-free adaptation compatible with diverse LLM architectures without retraining. Comprehensive evaluations across LLaMA-2 model demonstrate that ZeroMerge maintains full-cache performance at 5\% compression ratios while doubling inference throughput at 40K token lengths. The method effectively balances memory efficiency, generation quality, and deployment flexibility, advancing practical long-context LLM applications. The code is available at https://github.com/SusCom-Lab/ZeroMerge.
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling
Kim, Subin, Oh, Seoung Wug, Wang, Jui-Hsien, Lee, Joon-Young, Shin, Jinwoo
While recent advancements in text-to-video diffusion models enable high-quality short video generation from a single prompt, generating real-world long videos in a single pass remains challenging due to limited data and high computational costs. To address this, several works propose tuning-free approaches, i.e., extending existing models for long video generation, specifically using multiple prompts to allow for dynamic and controlled content changes. However, these methods primarily focus on ensuring smooth transitions between adjacent frames, often leading to content drift and a gradual loss of semantic coherence over longer sequences. To tackle such an issue, we propose Synchronized Coupled Sampling (SynCoS), a novel inference framework that synchronizes denoising paths across the entire video, ensuring long-range consistency across both adjacent and distant frames. Our approach combines two complementary sampling strategies: reverse and optimization-based sampling, which ensure seamless local transitions and enforce global coherence, respectively. However, directly alternating between these samplings misaligns denoising trajectories, disrupting prompt guidance and introducing unintended content changes as they operate independently. To resolve this, SynCoS synchronizes them through a grounded timestep and a fixed baseline noise, ensuring fully coupled sampling with aligned denoising paths. Extensive experiments show that SynCoS significantly improves multi-event long video generation, achieving smoother transitions and superior long-range coherence, outperforming previous approaches both quantitatively and qualitatively.
SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
Lu, Chen Yi, Tanjim, Md Mehrab, Dasgupta, Ishita, Sarkhel, Somdeb, Wu, Gang, Mitra, Saayan, Chaterji, Somali
We present SKALD, a multi-shot video assembly method that constructs coherent video sequences from candidate shots with minimal reliance on text. Central to our approach is the Learned Clip Assembly (LCA) score, a learning-based metric that measures temporal and semantic relationships between shots to quantify narrative coherence. We tackle the exponential complexity of combining multiple shots with an efficient beam-search algorithm guided by the LCA score. To train our model effectively with limited human annotations, we propose two tasks for the LCA encoder: Shot Coherence Learning, which uses contrastive learning to distinguish coherent and incoherent sequences, and Feature Regression, which converts these learned representations into a real-valued coherence score. We develop two variants: a base SKALD model that relies solely on visual coherence and SKALD-text, which integrates auxiliary text information when available. Experiments on the VSPD and our curated MSV3C datasets show that SKALD achieves an improvement of up to 48.6% in IoU and a 43% speedup over the state-of-the-art methods. A user study further validates our approach, with 45% of participants favoring SKALD-assembled videos, compared to 22% preferring text-based assembly methods.
Multi-Robot Collaboration through Reinforcement Learning and Abstract Simulation
Labiosa, Adam, Hanna, Josiah P.
Teams of people coordinate to perform complex tasks by forming abstract mental models of world and agent dynamics. The use of abstract models contrasts with much recent work in robot learning that uses a high-fidelity simulator and reinforcement learning (RL) to obtain policies for physical robots. Motivated by this difference, we investigate the extent to which so-called abstract simulators can be used for multi-agent reinforcement learning (MARL) and the resulting policies successfully deployed on teams of physical robots. An abstract simulator models the robot's target task at a high-level of abstraction and discards many details of the world that could impact optimal decision-making. Policies are trained in an abstract simulator then transferred to the physical robot by making use of separately-obtained low-level perception and motion control modules. We identify three key categories of modifications to the abstract simulator that enable policy transfer to physical robots: simulation fidelity enhancements, training optimizations and simulation stochasticity. We then run an empirical study with extensive ablations to determine the value of each modification category for enabling policy transfer in cooperative robot soccer tasks. We also compare the performance of policies produced by our method with a well-tuned non-learning-based behavior architecture from the annual RoboCup competition and find that our approach leads to a similar level of performance. Broadly we show that MARL can be use to train cooperative physical robot behaviors using highly abstract models of the world.
Lossy Neural Compression for Geospatial Analytics: A Review
Gomes, Carlos, Wittmann, Isabelle, Robert, Damien, Jakubik, Johannes, Reichelt, Tim, Martone, Michele, Maurogiovanni, Stefano, Vinge, Rikard, Hurst, Jonas, Scheurer, Erik, Sedona, Rocco, Brunschwiler, Thomas, Kesselheim, Stefan, Batic, Matej, Stier, Philip, Wegner, Jan Dirk, Cavallaro, Gabriele, Pebesma, Edzer, Marszalek, Michael, Belenguer-Plomer, Miguel A, Adriko, Kennedy, Fraccaro, Paolo, Kienzler, Romeo, Briq, Rania, Benassou, Sabrina, Lazzarini, Michele, Albrecht, Conrad M
Over the past decades, there has been an explosion in the amount of available Earth Observation (EO) data. The unprecedented coverage of the Earth's surface and atmosphere by satellite imagery has resulted in large volumes of data that must be transmitted to ground stations, stored in data centers, and distributed to end users. Modern Earth System Models (ESMs) face similar challenges, operating at high spatial and temporal resolutions, producing petabytes of data per simulated day. Data compression has gained relevance over the past decade, with neural compression (NC) emerging from deep learning and information theory, making EO data and ESM outputs ideal candidates due to their abundance of unlabeled data. In this review, we outline recent developments in NC applied to geospatial data. We introduce the fundamental concepts of NC including seminal works in its traditional applications to image and video compression domains with focus on lossy compression. We discuss the unique characteristics of EO and ESM data, contrasting them with "natural images", and explain the additional challenges and opportunities they present. Moreover, we review current applications of NC across various EO modalities and explore the limited efforts in ESM compression to date. The advent of self-supervised learning (SSL) and foundation models (FM) has advanced methods to efficiently distill representations from vast unlabeled data. We connect these developments to NC for EO, highlighting the similarities between the two fields and elaborate on the potential of transferring compressed feature representations for machine--to--machine communication. Based on insights drawn from this review, we devise future directions relevant to applications in EO and ESM.
On the Development of Binary Classification Algorithm Based on Principles of Geometry and Statistical Inference
The aim of this paper is to investigate an attempt to build a binary classification algorithm using principles of geometry such as vectors, planes, and vector algebra. The basic idea behind the proposed algorithm is that a hyperplane can be used to completely separate a given set of data points mapped to n dimensional space, if the given data points are linearly separable in the n dimensions. Since points are the foundational elements of any geometrical construct, by manipulating the position of points used for the construction of a given hyperplane, the position of the hyperplane itself can be manipulated. The paper includes testing data against other classifiers on a variety of standard machine learning datasets. With a focus on support vector machines, since they and our proposed classifier use the same geometrical construct of hyperplane, and the versatility of SVMs make them a good bench mark for comparison. Since the algorithm focuses on moving the points through the hyperspace to which the dataset has been mapped, it has been dubbed as moving points algorithm.
Haiti police raid gang leader's stronghold in capital
Haiti police raid gang leader's stronghold in capital 3 hours agoShareSaveLeonardo RochaBBC World Service Americas regional editor Jaroslav LukivBBC NewsShareSaveReutersGang control in Port-au-Prince has led to an almost complete breakdown of law and order The government of Haiti says police have launched a large-scale operation in a shantytown controlled by powerful gang leader Jimmy Chérizier, who is widely known as Barbecue. The authorities say several gang members have been killed in the Lower Delmas area of the capital Port-au-Prince. Local reports say military drones carrying explosives are being used in the operation. He said it was the work of a special task force created two days ago to tackle insecurity.Reuters Jimmy'Barbecue' Chérizier has become one of the most powerful gang leaders in Haiti Chérizier, aged 47, is the feared leader of Viv Ansam (Live Together), a coalition of gangs that control much of the city. It is not clear whether Kenyan police officers deployed in Haiti last year to help fight the gangs are involved in the security operation.
Societal Alignment Frameworks Can Improve LLM Alignment
Stańczak, Karolina, Meade, Nicholas, Bhatia, Mehar, Zhou, Hattie, Böttinger, Konstantin, Barnes, Jeremy, Stanley, Jason, Montgomery, Jessica, Zemel, Richard, Papernot, Nicolas, Chapados, Nicolas, Therien, Denis, Lillicrap, Timothy P., Marasović, Ana, Delacroix, Sylvie, Hadfield, Gillian K., Reddy, Siva
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts, the impracticality of specifying a contract between a model developer, and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.
Assassin's Creed maker confirms leaked game footage is real
Assassin's Creed maker confirms leaked game footage is real 38 minutes agoTom RichardsonBBC NewsbeatUbisoftAssassin's Creed Shadows is seen as a pivotal release for Ubisoft The makers of Assassin's Creed Shadows - the forthcoming entry in one of video gaming's biggest franchises - have confirmed footage leaked online is real. Some players managed to get their hands on the game - due to be released on 20 March - ahead of its official release. Developer Ubisoft said gameplay videos shared online "did not represent the final quality of the game". In a statement posted online, the company said it was "still working on patches" and urged fans not to share spoilers. Shadows will be the first Assassin's Creed instalment set in Japan - something fans have long been asking for.
Complex Networks for Pattern-Based Data Classification
Chire, Josimar, Mahmood, Khalid, Liang, Zhao
Data classification techniques partition the data or feature space into smaller sub-spaces, each corresponding to a specific class. To classify into subspaces, physical features e.g., distance and distributions are utilized. This approach is challenging for the characterization of complex patterns that are embedded in the dataset. However, complex networks remain a powerful technique for capturing internal relationships and class structures, enabling High-Level Classification. Although several complex network-based classification techniques have been proposed, high-level classification by leveraging pattern formation to classify data has not been utilized. In this work, we present two network-based classification techniques utilizing unique measures derived from the Minimum Spanning Tree and Single Source Shortest Path. These network measures are evaluated from the data patterns represented by the inherent network constructed from each class. We have applied our proposed techniques to several data classification scenarios including synthetic and real-world datasets. Compared to the existing classic high-level and machine-learning classification techniques, we have observed promising numerical results for our proposed approaches. Furthermore, the proposed models demonstrate the following distinguished features in comparison to the previous high-level classification techniques: (1) A single network measure is introduced to characterize the data pattern, eliminating the need to determine weight parameters among network measures. Therefore, the model is largely simplified, while obtaining better classification results. (2) The metrics proposed are sensitive and used for classification with competitive results.