Goto

Collaborating Authors

 Promising Solution


Motion Graph Unleashed: A Novel Approach to Video Prediction Bohan Tang

Neural Information Processing Systems

We introduce motion graph, a novel approach to the video prediction problem, which predicts future video frames from limited past data. The motion graph transforms patches of video frames into interconnected graph nodes, to comprehensively describe the spatial-temporal relationships among them. This representation overcomes the limitations of existing motion representations such as image differences, optical flow, and motion matrix that either fall short in capturing complex motion patterns or suffer from excessive memory consumption. We further present a video prediction pipeline empowered by motion graph, exhibiting substantial performance improvements and cost reductions. Experiments on various datasets, including UCF Sports, KITTI and Cityscapes, highlight the strong representative ability of motion graph. Especially on UCF Sports, our method matches and outperforms the SOTA methods with a significant reduction in model size by 78% and a substantial decrease in GPU memory utilization by 47%. Please refer to this link for the official code.


RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

Neural Information Processing Systems

Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling multiple tasks, typically support only a limited range and often produce overly smooth, low-fidelity outcomes due to their broad data distribution fitting. To address these challenges, we first define a new pipeline for restoring images with multiple degradations, and then introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models. RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration. Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts.


Quadratic Video Interpolation

Neural Information Processing Systems

Video interpolation is an important problem in computer vision, which helps overcome the temporal limitation of camera sensors. Existing video interpolation methods usually assume uniform motion between consecutive frames and use linear models for interpolation, which cannot well approximate the complex motion in the real world. To address these issues, we propose a quadratic video interpolation method which exploits the acceleration information in videos. This method allows prediction with curvilinear trajectory and variable velocity, and generates more accurate interpolation results. For high-quality frame synthesis, we develop a flow reversal layer to estimate flow fields starting from the unknown target frame to the source frame.


Conformal Classification with Equalized Coverage for Adaptively Selected Groups

Neural Information Processing Systems

This paper introduces a conformal inference method to evaluate uncertainty in classification by generating prediction sets with valid coverage conditional on adaptively chosen features. These features are carefully selected to reflect potential model limitations or biases. This can be useful to find a practical compromise between efficiency--by providing informative predictions--and algorithmic fairness-- by ensuring equalized coverage for the most sensitive groups. We demonstrate the validity and effectiveness of this method on simulated and real data sets.



Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification

Neural Information Processing Systems

We study the problem of fair binary classification using the notion of Equal Opportunity. It requires the true positive rate to distribute equally across the sensitive groups. Within this setting we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier by a group-dependent threshold. We provide a constructive expression for the threshold. This result motivates us to devise a plug-in classification procedure based on both unlabeled and labeled datasets. While the latter is used to learn the output conditional probability, the former is used for calibration. The overall procedure can be computed in polynomial time and it is shown to be statistically consistent both in terms of the classification error and fairness measure. Finally, we present numerical experiments which indicate that our method is often superior or competitive with the state-of-the-art methods on benchmark datasets.



Cross-sectional Learning of Extremal Dependence among Financial Assets

Neural Information Processing Systems

We propose a novel probabilistic model to facilitate the learning of multivariate tail dependence of multiple financial assets. Our method allows one to construct from known random vectors, e.g., standard normal, sophisticated joint heavytailed random vectors featuring not only distinct marginal tail heaviness, but also flexible tail dependence structure. The novelty lies in that pairwise tail dependence between any two dimensions is modeled separately from their correlation, and can vary respectively according to its own parameter rather than the correlation parameter, which is an essential advantage over many commonly used methods such as multivariate t or elliptical distribution. It is also intuitive to interpret, easy to track, and simple to sample comparing to the copula approach. We show its flexible tail dependence structure through simulation. Coupled with a GARCH model to eliminate serial dependence of each individual asset return series, we use this novel method to model and forecast multivariate conditional distribution of stock returns, and obtain notable performance improvements in multi-dimensional coverage tests. Besides, our empirical finding about the asymmetry of tails of the idiosyncratic component as well as the market component is interesting and worth to be well studied in the future.


Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations

Neural Information Processing Systems

In this paper, we propose one novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations. Specifically, we enrich each point representation by performing one novel gated fusion on the point itself and its contextual points. Afterwards, based on the enriched representation, we propose one novel graph pointnet module, relying on the graph attention block to dynamically compose and update each point representation within the local point cloud structure. Finally, we resort to the spatial-wise and channel-wise attention strategies to exploit the point cloud global structure and thereby yield the resulting semantic label for each point. Extensive results on the public point cloud databases, namely the S3DIS and ScanNet datasets, demonstrate the effectiveness of our proposed model, outperforming the state-of-the-art approaches. Our code for this paper is available at https://github.com/fly519/ELGS.


RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

Neural Information Processing Systems

Machine unlearning is a promising solution for efficiently removing specific knowledge by post hoc modifying models. In this paper, we propose a Real-World Knowledge Unlearning benchmark (RWKU) for LLM unlearning. RWKU is designed based on the following three key factors: (1) For the task setting, we consider a more practical and challenging unlearning setting, where neither the forget corpus nor the retain corpus is accessible.