Media
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Zhang, Yunan, Jiang, Shuoran, Zhao, Mengchen, Li, Yuefeng, Fan, Yang, Wu, Xiangping, Chen, Qingcai
Abstract: The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continually fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by 1) significant forgetting of their general capabilities, i.e., the fine-tuned model forgets its original world knowledge or basic instruction-following skills [1], [2], and 2) sharp performance declines on previously learned tasks. To address both issues simultaneously in a simple yet stable manner, we propose General Sample Replay (GeRe), a framework that uses ordinary pretraining texts for efficient anti-forgetting. Beyond revisiting the most prevalent replay-based practices under GeRe, we further leverage neural states to introduce an enhanced activation-state-constrained optimization method using a threshold-based margin (TM) loss, which maintains activation state consistency during replay learning. We are the first to validate that a small, fixed set of pre-collected general replay samples is sufficient to resolve both concerns: retaining general capabilities while promoting overall performance across sequential tasks. Indeed, the former can inherently facilitate the latter. Through controlled experiments, we systematically compare TM with different replay strategies under the GeRe framework, including vanilla label fitting, logit imitation via KL divergence, and feature imitation via L1/L2 losses. Results demonstrate that TM consistently improves performance and exhibits better robustness. Our work paves the way for efficient replay of LLMs in the future. Our code and data are available at https://github.com/Qznan/GeRe. Yunan Zhang, Shuoran Jiang, Yang Fan, Mengchen Zhao, Xiangping Wu and Qingcai Chen are with the Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. Qingcai Chen and Xiangping Wu are the corresponding authors.
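The replay objectives compared in the GeRe abstract can be sketched for a single token position. This is a minimal pure-Python illustration and not the authors' code. The names `kl_logit_imitation`, `l1_feature_imitation`, `l2_feature_imitation` and `threshold_margin_loss` are hypothetical. The abstract also does not give the exact TM formulation, so the margin loss below is an assumed two-state version. In that version, each neuron is labelled "active" or "inactive" by comparing the frozen base model's activation with a threshold. The fine-tuned model is then penalized whenever its activation fails to stay on the same side of that threshold by at least a margin.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_logit_imitation(teacher_logits, student_logits):
    """Logit imitation: KL(p_teacher || p_student) for one token position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def l1_feature_imitation(ref_feats, cur_feats):
    """Feature imitation: mean absolute difference of hidden activations."""
    return sum(abs(a - b) for a, b in zip(ref_feats, cur_feats)) / len(ref_feats)


def l2_feature_imitation(ref_feats, cur_feats):
    """Feature imitation: mean squared difference of hidden activations."""
    return sum((a - b) ** 2 for a, b in zip(ref_feats, cur_feats)) / len(ref_feats)


def threshold_margin_loss(ref_feats, cur_feats, threshold=0.0, margin=0.1):
    """HYPOTHETICAL two-state margin loss (assumed form, not GeRe's exact TM).

    A neuron whose base-model activation exceeds `threshold` is "active" and
    should stay at least `margin` above it; otherwise it is "inactive" and
    should stay at least `margin` below it. Only violations are penalized.
    """
    total = 0.0
    for a_ref, a_cur in zip(ref_feats, cur_feats):
        if a_ref > threshold:
            total += max(0.0, (threshold + margin) - a_cur)
        else:
            total += max(0.0, a_cur - (threshold - margin))
    return total / len(ref_feats)
```

Unlike L1/L2 imitation, a margin loss of this kind adds nothing while each activation remains in its original state. That leaves the model free to adapt activation magnitudes during replay.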
Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs
Ziems, Noah, Soylu, Dilara, Agrawal, Lakshya A, Miller, Isaac, Lai, Liheng, Qian, Chen, Song, Kaiqiang, Jiang, Meng, Klein, Dan, Zaharia, Matei, D'Oosterlinck, Karel, Potts, Christopher, Khattab, Omar
Group Relative Policy Optimization (GRPO) has proven to be an effective tool for post-training language models (LMs). However, AI systems are increasingly expressed as modular programs that mix together multiple LM calls with distinct prompt templates and other tools, and it is not clear how best to leverage GRPO to improve these systems. We begin to address this challenge by defining mmGRPO, a simple multi-module generalization of GRPO that groups LM calls by module across rollouts and handles variable-length and interrupted trajectories. We find that mmGRPO, composed with automatic prompt optimization, improves accuracy by 11% on average across classification, many-hop search, and privacy-preserving delegation tasks against the post-trained LM, and by 5% against prompt optimization on its own. We open-source mmGRPO in DSPy as the dspy.GRPO optimizer.
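The core bookkeeping step the abstract describes is grouping LM calls by module across rollouts and computing GRPO-style group-relative advantages. A minimal sketch follows, under several assumptions. All rollouts come from the same input example. Each call inherits its rollout's final reward. Advantages use the usual GRPO normalization: reward minus the group mean, divided by the group standard deviation. The rollout dictionary format and the function names are hypothetical, and this is not the `dspy.GRPO` API.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def mm_grpo_groups(rollouts):
    """Group LM calls by module across rollouts of one input example.

    Each rollout is a hypothetical dict:
        {"calls": [(module_name, prompt, completion), ...], "reward": float}
    An interrupted rollout simply lists fewer calls, so it only joins the
    groups of the modules it actually reached.
    """
    by_module = defaultdict(list)
    for rollout in rollouts:
        for module, prompt, completion in rollout["calls"]:
            by_module[module].append((prompt, completion, rollout["reward"]))

    groups = {}
    for module, items in by_module.items():
        advantages = group_relative_advantages([r for _, _, r in items])
        groups[module] = [
            (prompt, completion, adv)
            for (prompt, completion, _), adv in zip(items, advantages)
        ]
    return groups
```

Per-module grouping means that each module's calls are compared only against other calls to the same module, which fits programs whose modules have distinct prompt templates.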
Deep Learning-based Scalable Image-to-3D Facade Parser for Generating Thermal 3D Building Models
Yu, Yinan, Gonzalez-Caceres, Alex, Scheidegger, Samuel, Somanath, Sanjay, Hollberg, Alexander
Renovating existing buildings is essential for reducing climate impact. Early-phase renovation planning requires simulations based on thermal 3D models at Level of Detail (LoD) 3, which include features like windows. However, scalable and accurate identification of such features remains a challenge. This paper presents the Scalable Image-to-3D Facade Parser (SI3FP), a pipeline that generates LoD3 thermal models by extracting geometries from images using both computer vision and deep learning. Unlike existing methods that rely on segmentation and projection, SI3FP directly models geometric primitives in the orthographic image plane, providing a unified interface while reducing perspective distortions. SI3FP supports both sparse (e.g., Google Street View) and dense (e.g., hand-held camera) data sources. Tested on typical Swedish residential buildings, SI3FP achieved approximately 5% error in window-to-wall ratio estimates, demonstrating sufficient accuracy for early-stage renovation analysis. The pipeline facilitates large-scale energy renovation planning and has broader applications in urban development and planning.
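The reported accuracy metric, window-to-wall ratio (WWR), is the total window area divided by the gross facade area. When windows are modeled as rectangles in the orthographic image plane, WWR reduces to simple arithmetic. The `Rect` type and the `window_to_wall_ratio` function below are hypothetical. The sketch assumes axis-aligned, non-overlapping windows measured in metres.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle in the orthographic facade plane (metres)."""
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height

    def contains(self, other: "Rect") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.width <= self.x + self.width
                and other.y + other.height <= self.y + self.height)


def window_to_wall_ratio(facade: Rect, windows: list[Rect]) -> float:
    """WWR = total window area / gross facade area (windows assumed disjoint)."""
    for w in windows:
        if not facade.contains(w):
            raise ValueError(f"window {w} lies outside the facade")
    return sum(w.area for w in windows) / facade.area
```

For example, a 10 m × 6 m facade with four 1.5 m × 1.2 m windows has a WWR of 7.2 / 60 = 0.12.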
Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition
Hoedt, Katharina, Flexer, Arthur, Widmer, Gerhard
One of the key desired properties of deep learning models is the ability to generalise to unseen samples. When provided with new samples that are (perceptually) similar to one or more training samples, deep learning models are expected to produce correspondingly similar outputs. Models that succeed in predicting similar outputs for similar inputs are often called robust. However, deep learning models have been shown to be highly vulnerable to minor (adversarial) perturbations of the input, which can drastically change a model's output and simultaneously expose its reliance on spurious correlations. In this work, we investigate whether inherently interpretable deep models, i.e., deep models that were designed to focus more on meaningful and interpretable features, are more robust to irrelevant perturbations in the data than their black-box counterparts. We test our hypothesis by comparing the robustness of an interpretable and a black-box music emotion recognition (MER) model when challenged with adversarial examples. Furthermore, we include in the comparison an adversarially trained model, which is optimised to be more robust. Our results indicate that inherently more interpretable models can indeed be more robust than their black-box counterparts, and achieve similar levels of robustness as adversarially trained models, at lower computational cost.
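The abstract does not say which attack was used to craft the adversarial examples. As a generic illustration only, the sketch below applies the fast gradient sign method (FGSM) to a hypothetical two-feature logistic-regression model. FGSM is a standard attack, swapped in here and not taken from the paper. Each input feature moves by at most `eps` in the direction that increases the binary cross-entropy loss. For this toy model, that perturbation is enough to flip the prediction.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """FGSM: x_adv = x + eps * sign(dL/dx) for binary cross-entropy loss L.

    For logistic regression, dL/dz = p - y, so dL/dx = (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (label y = 1).
w, b = [2.0, -1.0], 0.0
x = [0.3, 0.2]
x_adv = fgsm(w, b, x, y=1, eps=0.2)
```

Here `predict(w, b, x)` is above 0.5, while `predict(w, b, x_adv)` falls below it, even though no feature moved by more than 0.2.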
Fox News Politics Newsletter: Mamdani's 'Radical Positions' Alarm New Yorkers, Says Expert
Welcome to the Fox News Politics newsletter, with the latest updates on the Trump administration, Capitol Hill and more Fox News politics content. New York City socialist mayoral candidate Zohran Mamdani's past stances on policing are a legitimate reason for New Yorkers to be concerned, despite his recent walkbacks, according to a New York City crime expert who spoke to Fox News Digital. "I think what scares a lot of New Yorkers about the policy positions taken by Zohran Mamdani over the years is that he has exhibited not just a lack of appreciation for the men and women that stand on that [police] line, but a visceral disdain for them, which has led him to push for things like defunding and dismantling the police," Rafael A. Mangual, senior fellow and head of research for policing and public safety at the Manhattan Institute, told Fox News Digital, shortly after a gunman killed four people in midtown Manhattan, including an NYPD police officer. "It's not so much as just that he said, well, I wanna allocate some of this money to other places. He has gone so far as to say that we should dismantle the entire department."
David Cronenberg's new sci-fi film is devastating and mysterious
Myrna (Jennifer Dale) must have had better blind dates. Her table for two is hemmed in by strange shrouds in tall vitrines. And as she makes small talk with her date Karsh (Vincent Cassel), the restaurant's owner, it becomes clear her surroundings are attached – architecturally, financially and intellectually – to a cemetery. And not just any cemetery: its headstones have screens. Because the bodies are swaddled in natty, camera-riddled, internet-enabled shrouds, you can come here to watch your loved ones decompose.
New tattoo sticker detects date rape drugs in 1 second
Checking your drink for drugs no longer needs to feel like a science experiment. Scientists in South Korea have created a new solution: a temporary tattoo sticker that instantly detects tampering. This simple sticker works fast, stays discreet, and offers surprisingly powerful protection. At first glance, it looks like ordinary skin art. The sticker detects GHB (gamma-hydroxybutyrate), a drug commonly used to spike drinks.
Arts and media groups demand Labor take a stand against 'rampant theft' of Australian content to train AI
Arts, creative and media groups have demanded the government rule out allowing big tech companies to take Australian content to train their artificial intelligence models, with concerns such a shift would "sell out" Australian workers and lead to "rampant theft" of intellectual property. "It is not appropriate for big tech to steal the work of Australian artists, musicians, creators, news media, journalism, and use it for their own ends without paying for it," Ley said on Wednesday. In an interim report on "harnessing data and digital technology", the Productivity Commission set out proposals for how tech, including AI, could be regulated and treated in Australia, suggesting it could boost productivity by between 0.5% and 13% over the next decade, adding up to $116bn to Australia's GDP. The commission suggested several possible remedies, including expanding licensing schemes, or an exemption for "text and data mining" and expanding the existing fair dealing rules, which it said existed in other countries. The latter suggestion prompted fierce pushback from arts, creative and media companies, which raised alarm that their work could be left open for massively wealthy tech companies to use, without compensation or payment, to train AI models.
Cross-lingual Opinions and Emotions Mining in Comparable Documents
Saad, Motaz, Langlois, David, Smaili, Kamel
Comparable texts are topic-aligned documents in multiple languages that are not direct translations. They are valuable for understanding how a topic is discussed across languages. This research studies differences in sentiments and emotions across English-Arabic comparable documents. First, texts are annotated with sentiment and emotion labels. We apply a cross-lingual method to label documents with opinion classes (subjective/objective), avoiding reliance on machine translation. To annotate with emotions (anger, disgust, fear, joy, sadness, surprise), we manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora. We then apply a statistical measure to assess the agreement of sentiments and emotions in each source-target document pair. This comparison is especially relevant when the documents originate from different sources. To our knowledge, this aspect has not been explored in prior literature. Our study includes English-Arabic document pairs from Euronews, BBC, and Al-Jazeera (JSC). Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones. The proposed method is language-independent and generalizable to other language pairs.
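Lexicon-based emotion annotation of a comparable document pair can be sketched as follows. The two tiny lexicons are hypothetical stand-ins for the bilingual WordNet-Affect lexicons. The abstract does not name its statistical agreement measure, so cosine similarity between emotion-count vectors is used here purely for illustration. Real Arabic text would also need clitic segmentation (for example, of the article ال) before lexicon lookup, which this sketch skips.

```python
import math
import re
from collections import Counter

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# Hypothetical toy lexicons (word -> emotion), not the translated WNA lexicon.
EN_LEXICON = {"anger": "anger", "angry": "anger", "fear": "fear",
              "afraid": "fear", "joy": "joy", "happy": "joy",
              "sad": "sadness", "grief": "sadness"}
AR_LEXICON = {"غضب": "anger", "خوف": "fear", "فرح": "joy", "حزن": "sadness"}


def emotion_vector(text, lexicon):
    """Count lexicon hits per emotion; \\w+ also matches Arabic letters."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return [counts[e] for e in EMOTIONS]


def cosine_agreement(u, v):
    """Illustrative agreement score between two emotion-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


en_doc = "The crowd felt anger and fear."
ar_doc = "ساد غضب كبير ثم خوف"  # "Great anger prevailed, then fear."
score = cosine_agreement(emotion_vector(en_doc, EN_LEXICON),
                         emotion_vector(ar_doc, AR_LEXICON))
```

Both documents express anger and fear once each, so their emotion vectors point in the same direction and the score is essentially 1. This mirrors the paper's finding that articles from the same agency tend to agree.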
Combolutional Neural Networks
Churchwell, Cameron, Kim, Minje, Smaragdis, Paris
Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
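The following is a rough time-domain sketch of the idea, not the authors' implementation. A feedback (IIR) comb filter, y[n] = x[n] + g * y[n - D], resonates when its input is periodic with period D samples. A full-wave rectifier followed by a one-pole low-pass filter then tracks the filter's envelope. Both are computed in the same loop to loosely mimic the "fused" detector. The sketch simplifies in three ways: the delay D is a fixed integer rather than a learned parameter; the name `combolution_sketch` is hypothetical; and the defaults for `gain` and `alpha` are illustrative choices.

```python
def combolution_sketch(x, delay, gain=0.9, alpha=0.5):
    """Feedback comb filter with a fused envelope detector (assumed form).

    y[n]   = x[n] + gain * y[n - delay]                (IIR comb filter)
    env[n] = alpha * env[n - 1] + (1 - alpha) * |y[n]|  (rectify + low-pass)
    """
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    if abs(gain) >= 1.0:
        raise ValueError("|gain| must be < 1 for a stable feedback comb")

    y = [0.0] * len(x)
    env = [0.0] * len(x)
    e = 0.0
    for n, xn in enumerate(x):
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = xn + gain * feedback
        e = alpha * e + (1.0 - alpha) * abs(y[n])
        env[n] = e
    return env


# Impulse train with a period of 8 samples.
signal = [1.0 if n % 8 == 0 else 0.0 for n in range(256)]
matched = combolution_sketch(signal, delay=8)
mismatched = combolution_sketch(signal, delay=5)
```

With `delay=8`, the comb's echoes line up with the incoming impulses and build up toward 1 / (1 - 0.9) = 10. With `delay=5`, they spread out and stay small. The envelope peak of `matched` is therefore much larger than that of `mismatched`, which illustrates the harmonic selectivity the layer exploits.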