rectifier


Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation

Neural Information Processing Systems

Existing deep learning based solutions typically restore the target layers individually, or with some concerns at the end of the output, barely taking into account the interaction across the two streams/branches. In order to utilize information more efficiently, this work presents a general yet simple interactive strategy, namely your trash is my treasure (YTMT), for constructing dual-stream decomposition networks.
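The abstract does not spell out the interaction mechanism, so the following is a minimal PyTorch sketch of one plausible reading (block and names hypothetical): features that ReLU deactivates in one stream, its "trash", are handed to the sibling stream as its "treasure", so neither branch discards information outright.

```python
import torch
import torch.nn as nn

class YTMTBlock(nn.Module):
    """One dual-stream interaction block: what ReLU deactivates in one
    branch (its "trash") is routed to the sibling branch (its "treasure"),
    so neither stream discards information outright."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat_a, feat_b):
        a = self.conv_a(feat_a)
        b = self.conv_b(feat_b)
        keep_a, trash_a = torch.relu(a), a - torch.relu(a)  # trash = negative part
        keep_b, trash_b = torch.relu(b), b - torch.relu(b)
        # Each stream absorbs what the other one threw away.
        return keep_a + trash_b, keep_b + trash_a
```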


Rectifying Distribution Shift in Cascaded Precipitation Nowcasting

Ju, Fanbo, Shi, Haiyuan, Ni, Qingjian

arXiv.org Artificial Intelligence

Precipitation nowcasting, which aims to provide high spatio-temporal resolution precipitation forecasts by leveraging current radar observations, is a core task in regional weather forecasting. Recently, the cascaded architecture has emerged as the mainstream paradigm for deep learning-based precipitation nowcasting. This paradigm pairs a deterministic model that predicts the posterior mean with a probabilistic model that generates local stochasticity. However, existing methods commonly overlook the conflation of the systematic distribution shift in deterministic predictions and the local stochasticity. As a result, the distribution shift of the deterministic component contaminates the predictions of the probabilistic component, leading to inaccuracies in precipitation patterns and intensity, particularly over longer lead times. To address this issue, we introduce RectiCast, a two-stage framework that explicitly decouples the rectification of mean-field shift from the generation of local stochasticity via a dual Flow Matching model. In the first stage, a deterministic model generates the posterior mean. In the second stage, we introduce a Rectifier to explicitly learn the distribution shift and produce a rectified mean. Subsequently, a Generator focuses on modeling the local stochasticity conditioned on the rectified mean. Experiments on two radar datasets demonstrate that RectiCast achieves significant performance improvements over existing state-of-the-art methods.
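A minimal sketch of how such a two-stage pipeline could look at inference time, assuming both flows expose a velocity field `model(x, t, cond=...)` integrated with plain Euler steps; all names and signatures here are hypothetical, not RectiCast's actual interfaces.

```python
import torch

@torch.no_grad()
def recticast_inference(radar_obs, det_model, rectifier, generator, steps=20):
    # Stage 1: deterministic posterior mean from current radar observations.
    mean = det_model(radar_obs)

    # Stage 2a: Rectifier flow. Integrate its velocity field from the biased
    # mean toward a rectified mean with plain Euler steps.
    x, dt = mean, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * rectifier(x, t, cond=radar_obs)
    rect_mean = x

    # Stage 2b: Generator flow. Sample local stochasticity from noise,
    # conditioned on the rectified mean rather than the biased one.
    z = torch.randn_like(rect_mean)
    for i in range(steps):
        t = torch.full((z.shape[0],), i * dt, device=z.device)
        z = z + dt * generator(z, t, cond=rect_mean)
    return z
```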



Prediction-Powered Inference with Inverse Probability Weighting

Datta, Jyotishka, Polson, Nicholas G.

arXiv.org Machine Learning

Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. We show that PPI can be extended to handle informative labeling by replacing its unweighted bias-correction term with an inverse probability weighted (IPW) version, using the classical Horvitz–Thompson or Hájek forms. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
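For a population mean, the unweighted PPI estimator adds an average bias correction computed on the labeled subset to the average prediction on the unlabeled set. Below is a minimal NumPy sketch of the IPW-adjusted point estimate described above, using the Hájek (self-normalized) form; the function name is illustrative.

```python
import numpy as np

def ppi_ipw_mean(preds_unlabeled, preds_labeled, y_labeled, pi_labeled):
    # pi_labeled: (possibly estimated) labeling probabilities of labeled units.
    w = 1.0 / np.asarray(pi_labeled)
    w /= w.sum()                       # Hajek form: self-normalized weights
    correction = np.sum(w * (np.asarray(y_labeled) - np.asarray(preds_labeled)))
    return np.mean(preds_unlabeled) + correction
```

With equal labeling probabilities the weights reduce to 1/n and the estimator collapses back to standard PPI; the Horvitz–Thompson variant would instead divide the unnormalized weighted sum by the known population size.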


Quantize-then-Rectify: Efficient VQ-VAE Training

Zhang, Borui, Rao, Qihang, Zheng, Wenzhao, Zhou, Jie, Lu, Jiwen

arXiv.org Artificial Intelligence

Visual tokenizers are pivotal in multimodal large models, acting as bridges between continuous inputs and discrete tokens. Nevertheless, training high-compression-rate VQ-VAEs remains computationally demanding, often necessitating thousands of GPU hours. This work demonstrates that a pre-trained VAE can be efficiently transformed into a VQ-VAE by controlling quantization noise within the VAE's tolerance threshold. We present Quantize-then-Rectify (ReVQ), a framework leveraging pre-trained VAEs to enable rapid VQ-VAE training with minimal computational overhead. By integrating channel multi-group quantization to enlarge codebook capacity and a post rectifier to mitigate quantization errors, ReVQ compresses ImageNet images into at most 512 tokens while sustaining competitive reconstruction quality (rFID = 1.06). Significantly, ReVQ reduces training costs by over two orders of magnitude relative to state-of-the-art approaches: ReVQ finishes full training on a single NVIDIA 4090 in approximately 22 hours, whereas comparable methods require 4.5 days on 32 A100 GPUs. Experimental results show that ReVQ achieves superior efficiency-reconstruction trade-offs.
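A minimal PyTorch sketch of channel multi-group quantization as described (class name and hyperparameters are illustrative, not ReVQ's actual configuration): the latent channels are split into groups, each with its own small codebook, so effective capacity scales combinatorially without any single codebook becoming unwieldy.

```python
import torch
import torch.nn as nn

class GroupQuantizer(nn.Module):
    """Channel multi-group quantization: latent channels are split into
    `groups` groups, each with its own small codebook, so effective capacity
    scales as codes_per_group ** groups."""

    def __init__(self, dim=256, groups=4, codes_per_group=512):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        self.codebooks = nn.Parameter(torch.randn(groups, codes_per_group, dim // groups))

    def forward(self, z):                       # z: (batch, tokens, dim)
        out = []
        for g, zg in enumerate(z.chunk(self.groups, dim=-1)):
            dists = torch.cdist(zg, self.codebooks[g].unsqueeze(0))
            out.append(self.codebooks[g][dists.argmin(dim=-1)])
        zq = torch.cat(out, dim=-1)
        return z + (zq - z).detach()            # straight-through gradient

# A post rectifier would then be a small network trained to undo the residual
# quantization error before the frozen decoder, roughly:
#   x_hat = decoder(rectifier(quantizer(encoder(x))))
```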


Graph in the Vault: Protecting Edge GNN Inference with Trusted Execution Environment

Ding, Ruyi, Xu, Tianhong, Ding, Aidong Adam, Fei, Yunsi

arXiv.org Artificial Intelligence

Wide deployment of machine learning models on edge devices has rendered model intellectual property (IP) and data privacy vulnerable. We propose GNNVault, the first secure Graph Neural Network (GNN) deployment strategy based on a Trusted Execution Environment (TEE). GNNVault follows a "partition-before-training" design and includes a private GNN rectifier that complements a public backbone model. This way, both critical GNN model parameters and the private graph used during inference are protected within secure TEE compartments. Real-world implementations with Intel SGX demonstrate that GNNVault safeguards GNN inference against state-of-the-art link stealing attacks with negligible accuracy degradation (<2%). On-device machine learning has emerged as an important paradigm for tasks requiring low latency and high privacy [1]. This trend has also extended to Graph Neural Networks (GNNs) [4], [5], ensuring the privacy of user data during inference for tasks such as community detection [6], e-commerce personalization [7], and recommender systems [8]. However, local GNN inference grants users significant privileges over local models and data, introducing additional security vulnerabilities [9].
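A minimal sketch of the "partition-before-training" split under the stated design, with hypothetical module shapes: the public backbone runs in the untrusted world, while the rectifier's parameters and the private adjacency never leave the enclave; only node features and final outputs cross the boundary.

```python
import torch.nn as nn

class PublicBackbone(nn.Module):
    """Untrusted world: feature extractor with no access to private edges."""
    def __init__(self, in_dim, hid):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())

    def forward(self, x):               # x: (num_nodes, in_dim) node features
        return self.mlp(x)

class PrivateRectifier(nn.Module):
    """Runs only inside the TEE: one GNN layer over the private adjacency,
    correcting public features with protected parameters."""
    def __init__(self, hid, n_classes):
        super().__init__()
        self.lin = nn.Linear(hid, n_classes)

    def forward(self, h, adj):          # adj: private normalized adjacency
        return self.lin(adj @ h)        # message passing never leaves the enclave
```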


Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers

Liao, Zhu, Hezbri, Nour, Quétu, Victor, Nguyen, Van-Tam, Tartaglione, Enzo

arXiv.org Artificial Intelligence

Today, deep neural networks are widely used since they can handle a variety of complex tasks. Their generality makes them very powerful tools in modern technology. However, deep neural networks are often overparameterized. The usage of these large models consumes a lot of computation resources. In this paper, we introduce a method called Till the Layers Collapse (TLC), which compresses deep neural networks through the lenses of batch normalization layers. By reducing the depth of these networks, our method decreases deep neural networks' computational requirements and overall latency. We validate our method on popular models such as Swin-T, MobileNet-V2, and RoBERTa, across both image classification and natural language processing (NLP) tasks.
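The abstract does not state the exact collapse criterion, so the sketch below uses one plausible BN-based proxy (hypothetical, not necessarily TLC's metric): score each batch-normalization layer by the fraction of near-zero scale parameters, then collapse the highest-scoring layers and fine-tune.

```python
import torch.nn as nn

def bn_collapse_scores(model, eps=1e-2):
    # Fraction of near-zero BN scales per layer: a layer whose gammas are
    # mostly dead passes little signal and is a candidate for collapsing.
    scores = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d) and m.affine:
            gamma = m.weight.detach().abs()
            scores[name] = (gamma < eps).float().mean().item()
    return scores  # collapse the highest-scoring layers, then fine-tune
```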


Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets

Wang, Yuxin, Schröder, Maresa, Frauen, Dennis, Schweisthal, Jonas, Hess, Konstantin, Feuerriegel, Stefan

arXiv.org Machine Learning

Constructing confidence intervals (CIs) for the average treatment effect (ATE) from patient records is crucial to assess the effectiveness and safety of drugs. However, patient records typically come from different hospitals, thus raising the question of how multiple observational datasets can be effectively combined for this purpose. In our paper, we propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice. The key idea of our method is that we leverage prediction-powered inference and thereby essentially 'shrink' the CIs so that we offer more precise uncertainty quantification as compared to naïve approaches. We further prove the unbiasedness of our method and the validity of our CIs. We confirm our theoretical results through various numerical experiments. Finally, we provide an extension of our method for constructing CIs from combinations of experimental and observational datasets. Estimating the average treatment effect (ATE) together with confidence intervals (CIs) is relevant in many fields, such as medicine, where the ATE is used to assess the effectiveness and safety of drugs (Glass et al., 2013; Feuerriegel et al., 2024). Nowadays, there is a growing interest in using observational datasets for this purpose, for example, electronic health records (EHRs) and clinical registries (Johnson et al., 2016; Corrigan-Curay et al., 2018; Hong, 2021). Importantly, such observational datasets typically originate from different hospitals, different health providers, or even different countries (Colnet et al., 2024), thus raising the question of how to construct CIs for ATE estimation from multiple observational datasets. Motivating example: During the COVID-19 pandemic, the effectiveness and safety of potential drugs and vaccines were often assessed from electronic health records that originated from different hospitals to rapidly generate new evidence with treatment guidelines (Tacconelli et al., 2022). For example, one study (Wong et al., 2024) estimated the effect of nirmatrelvir/ritonavir (also known under the commercial name "paxlovid") in patients with COVID-19 diagnosis on 28-day all-cause hospitalizations from data obtained through a retrospective, multi-center study.
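As a point of reference for what the paper improves on, here is a minimal sketch of one natural baseline (not the paper's estimator): pool per-dataset ATE estimates by inverse-variance weighting and form a normal CI; the proposed method instead uses prediction-powered inference to shrink such intervals.

```python
import numpy as np
from scipy import stats

def pooled_ate_ci(estimates, std_errors, alpha=0.05):
    # Inverse-variance weighted pooling of per-dataset ATE estimates.
    w = 1.0 / np.asarray(std_errors) ** 2
    ate = np.sum(w * np.asarray(estimates)) / w.sum()
    se = np.sqrt(1.0 / w.sum())
    z = stats.norm.ppf(1 - alpha / 2)
    return ate, (ate - z * se, ate + z * se)
```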


Federated Prediction-Powered Inference from Decentralized Data

Luo, Ping, Deng, Xiaoge, Wen, Ziqing, Sun, Tao, Li, Dongsheng

arXiv.org Artificial Intelligence

In various domains, the increasing application of machine learning allows researchers to access inexpensive predictive data, which can be utilized as auxiliary data for statistical inference. Although such data are often unreliable compared to gold-standard datasets, Prediction-Powered Inference (PPI) has been proposed to ensure statistical validity despite the unreliability. However, the challenge of 'data silos' arises when the private gold-standard datasets are non-shareable for model training, leading to less accurate predictive models and invalid inferences. In this paper, we introduce the Federated Prediction-Powered Inference (Fed-PPI) framework, which addresses this challenge by enabling decentralized experimental data to contribute to statistically valid conclusions without sharing private information. The Fed-PPI framework involves training local models on private data, aggregating them through Federated Learning (FL), and deriving confidence intervals using PPI computation. The proposed framework is evaluated through experiments, demonstrating its effectiveness in producing valid confidence intervals.
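A minimal NumPy sketch of the two computational pieces named above, with hypothetical function names: size-weighted FedAvg aggregation of local model parameters, and the standard PPI confidence interval for a mean under a Gaussian approximation, computed from the federated model's predictions.

```python
import numpy as np
from scipy import stats

def fedavg(local_params, local_sizes):
    # Size-weighted average of local model parameters (one array per site).
    total = sum(local_sizes)
    return sum(p * (n / total) for p, n in zip(local_params, local_sizes))

def ppi_mean_ci(preds_unlabeled, preds_labeled, y_labeled, alpha=0.05):
    # Standard PPI interval for a mean under a Gaussian approximation:
    # large-set prediction average plus a bias correction from labeled data.
    n, N = len(y_labeled), len(preds_unlabeled)
    rect = np.asarray(y_labeled) - np.asarray(preds_labeled)
    theta = np.mean(preds_unlabeled) + rect.mean()
    se = np.sqrt(np.var(preds_unlabeled, ddof=1) / N + rect.var(ddof=1) / n)
    z = stats.norm.ppf(1 - alpha / 2)
    return theta - z * se, theta + z * se
```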


Rectifier: Code Translation with Corrector via LLMs

Yin, Xin, Ni, Chao, Nguyen, Tien N., Wang, Shaohua, Yang, Xiaohu

arXiv.org Artificial Intelligence

Software migration is garnering increasing attention with the evolution of software and society. Early studies mainly relied on handcrafted translation rules to translate between two languages; this translation process is error-prone and time-consuming. In recent years, researchers have begun to explore the use of pre-trained large language models (LLMs) in code translation. However, code translation is a complex task on which LLMs still make mistakes: they produce certain types of errors when performing code translation, including (1) compilation errors, (2) runtime errors, (3) functional errors, and (4) non-terminating execution. We found that the root causes of these errors are very similar (e.g., failure to import packages, errors in loop boundaries, operator errors, and more). In this paper, we propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. It learns from errors generated by existing LLMs and can be widely applied to correct errors generated by any LLM. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability, and cross experiments also demonstrate the robustness of our method.
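A minimal sketch of how a translate-then-correct loop could be wired up (model interfaces hypothetical, and the executability check assumes a Python translation target): the corrector receives the source program, the failing candidate, and the concrete error message, covering the four error types listed above.

```python
import subprocess

def translate_and_repair(source_code, translator, corrector, max_rounds=3):
    candidate = translator(source_code)
    for _ in range(max_rounds):
        try:
            run = subprocess.run(["python", "-c", candidate],
                                 capture_output=True, text=True, timeout=30)
        except subprocess.TimeoutExpired:
            # Error type (4): non-terminating execution.
            candidate = corrector(source_code, candidate,
                                  "timeout: non-terminating execution")
            continue
        if run.returncode == 0:
            return candidate            # executes cleanly: accept translation
        # Error types (1)-(3) surface here as compiler/interpreter messages.
        candidate = corrector(source_code, candidate, run.stderr)
    return candidate
```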