Scaling up Remote Sensing Segmentation with Segment Anything Model

Neural Information Processing Systems

The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, because annotating Remote Sensing (RS) images is difficult and costly, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. In total, SAMRS contains 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude. It provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. Moreover, preliminary experiments highlight the importance of segmentation pre-training with SAMRS to address task discrepancies and alleviate the limitations posed by limited training data during fine-tuning. The code and dataset will be available at SAMRS.


NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem

Neural Information Processing Systems

We present NeuroLKH, a novel algorithm that combines deep learning with the strong traditional heuristic Lin-Kernighan-Helsgaun (LKH) for solving the Traveling Salesman Problem. Specifically, we train a Sparse Graph Network (SGN) with supervised learning for edge scores and unsupervised learning for node penalties, both of which are critical for improving the performance of LKH. Based on the output of SGN, NeuroLKH creates the edge candidate set and transforms edge distances to guide the search process of LKH. Extensive experiments demonstrate that, by training one model on a wide range of problem sizes, NeuroLKH significantly outperforms LKH and generalizes well to much larger sizes. We also show that NeuroLKH can be applied to other routing problems such as the Capacitated Vehicle Routing Problem (CVRP), the Pickup and Delivery Problem (PDP), and CVRP with Time Windows (CVRPTW).
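As an illustration of how per-edge scores can be turned into an edge candidate set, the following minimal sketch keeps the k highest-scoring neighbours of each node. It is not the authors' implementation: the function name `build_candidate_set`, the dense score matrix, the value of `k`, and the toy scores are all assumptions made for this example.

```python
from typing import List, Sequence


def build_candidate_set(scores: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each node, keep the k highest-scoring neighbours as candidate edges.

    `scores[i][j]` is a hypothetical learned score for edge (i, j).
    """
    n = len(scores)
    candidates = []
    for i in range(n):
        # Rank every other node by descending score; break ties by node index.
        neighbours = [j for j in range(n) if j != i]
        neighbours.sort(key=lambda j: (-scores[i][j], j))
        candidates.append(neighbours[:k])
    return candidates


# Toy 4-node score matrix (made-up values standing in for network output).
toy_scores = [
    [0.0, 0.9, 0.1, 0.4],
    [0.8, 0.0, 0.7, 0.2],
    [0.1, 0.6, 0.0, 0.9],
    [0.5, 0.2, 0.8, 0.0],
]
candidate_set = build_candidate_set(toy_scores, k=2)
print(candidate_set)
```

A local search restricted to such a candidate set only considers moves that use the listed edges, which is why the quality of the scores matters for the search.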


Spectral Co-Distillation for Personalized Federated Learning

Neural Information Processing Systems

Personalized federated learning (PFL) has been widely investigated to address the challenge of data heterogeneity, especially when a single generic model is inadequate in satisfying the diverse performance requirements of local clients simultaneously. Existing PFL methods are inherently based on the idea that the relations between the generic global and personalized local models are captured by the similarity of model weights. Such a similarity is primarily based on either partitioning the model architecture into generic versus personalized components, or modeling client relationships via model weights. To better capture similar (yet distinct) generic versus personalized model representations, we propose spectral distillation, a novel distillation method based on model spectrum information. Building upon spectral distillation, we also introduce a co-distillation framework that establishes a two-way bridge between generic and personalized model training. Moreover, to utilize the local idle time in conventional PFL, we propose a wait-free local training protocol. Through extensive experiments on multiple datasets over diverse heterogeneous data settings, we demonstrate the superior performance and efficacy of our proposed spectral co-distillation method, as well as our wait-free training protocol.



Supplementary for: "GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization"

Neural Information Processing Systems

We organize our supplementary document as follows:

1. Results on additional dataset
2. Results for limited data settings on YFCC26k and GWS15k datasets
3. Additional Ablations
   (a) Gallery Size
   (b) Queue Length
   (c) σ_η for Batch GPS noise
   (d) σ_η for Queue GPS noise
   (e) σ for Random Fourier Features
   (f) Number of hierarchies (M)
4. Different selection choices for GPS Gallery Construction
   (a) Evenly Spaced GPS Coordinates
   (b) Test Set GPS Coordinates
5. Analysis of Runtime and Memory Footprint
6. Motivations for using Pretrained CLIP as Image encoder Backbone
7. Qualitative Demonstration
   (a) Hierarchical learning in our location encoder L(·)
   (b) GeoCLIP with Image Query
   (c) Distribution of correct predictions of GeoCLIP on different datasets
   (d) GeoCLIP with Text Query
8. Discussion on Ethical Issues and Possible Mitigation

In section 4.1 of the main paper, we demonstrated the performance of our GeoCLIP method on the Im2GPS3k [2] and GWS15k [1] datasets and compared it with the state-of-the-art methods. Here, we perform experiments on another dataset, YFCC26k [6]. The results are provided in Table 1. This result highlights that GeoCLIP performs well across datasets and remains useful across different data distributions. GeoCLIP achieves decent performance across datasets even when the training data is significantly reduced.

We show the efficacy of GeoCLIP on limited training samples of Im2GPS3k in section 4.2 of the main paper. Now, we further investigate the performance of GeoCLIP for limited data settings on other datasets (YFCC26k and GWS15k).
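To make the role of σ in the Random Fourier Features ablation concrete, here is a minimal sketch of a standard random Fourier feature encoding applied to a 2-D coordinate at several scales. It is an illustration under stated assumptions, not the GeoCLIP code: the function names, the feature count, the coordinate scaling, and the σ values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def make_rff_matrix(num_features: int, in_dim: int, sigma: float, seed: int = 0) -> List[List[float]]:
    """Sample a projection matrix with entries drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)] for _ in range(num_features)]


def rff_encode(x: Sequence[float], proj_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Return [cos(2*pi*Bx), sin(2*pi*Bx)] for projection matrix B."""
    angles = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x)) for row in proj_matrix]
    return [math.cos(a) for a in angles] + [math.sin(a) for a in angles]


# Hypothetical (latitude, longitude), scaled to [-1, 1]; this scaling is an
# assumption for the example, not the preprocessing used in the paper.
lat, lon = 48.8566, 2.3522
x = [lat / 90.0, lon / 180.0]

# One encoding per scale; the sigma values are made up for illustration.
sigmas = [1.0, 4.0, 16.0]
encodings = [rff_encode(x, make_rff_matrix(8, 2, s, seed=i)) for i, s in enumerate(sigmas)]
print([len(e) for e in encodings])
```

A larger σ spreads the sampled frequencies over a wider range, so the encoding varies faster with small changes in the input coordinate; a smaller σ gives a smoother, coarser representation.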




Object-Centric Slot Diffusion

Neural Information Processing Systems

The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in image generation, their integration into object-centric learning remains largely unexplored. In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes: it is the first object-centric learning model to replace conventional slot decoders with a latent diffusion model conditioned on object slots, and it is also the first unsupervised compositional conditional diffusion model that operates without the need for supervised annotations like text. Through experiments on various object-centric tasks, including the first application of the FFHQ dataset in this field, we demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders, particularly in more complex scenes, and exhibits superior unsupervised compositional generation quality. In addition, we conduct a preliminary investigation into the integration of pre-trained diffusion models in LSD and demonstrate its effectiveness in real-world image segmentation and generation.



Towards Accelerated Model Training via Bayesian Data Selection

Neural Information Processing Systems

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.
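The online batch selection setting mentioned above can be sketched as follows: from a larger candidate batch, keep only the k samples with the highest selection score and train on those. The score used here (current training loss minus the loss of a reference predictor) is a simplified placeholder in the spirit of generalization-loss-based selection, not the Bayesian objective proposed in this work; the function name and the toy loss values are assumptions.

```python
from typing import List, Sequence


def select_batch(train_losses: Sequence[float],
                 reference_losses: Sequence[float],
                 k: int) -> List[int]:
    """Return the indices of the k samples with the largest score.

    score_i = train_loss_i - reference_loss_i (placeholder criterion).
    """
    scores = [t - r for t, r in zip(train_losses, reference_losses)]
    # Sort by descending score; break ties by index for determinism.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


# Toy candidate batch of four samples (made-up loss values).
train_losses = [2.0, 0.5, 1.5, 3.0]
reference_losses = [1.5, 0.25, 0.25, 3.0]
selected = select_batch(train_losses, reference_losses, k=2)
print(selected)
```

In this toy batch, the last sample has a high loss under both the current model and the reference predictor, which is the pattern one would expect from a mislabeled example, so it receives the lowest score and is skipped.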