Goto

Collaborating Authors

 Energy


Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing

Neural Information Processing Systems

This paper revisits the classical multi-scale representation learning problem but under the general framework of self-supervised learning for remote sensing image understanding. We present Cross-Scale MAE, a self-supervised model built upon the Masked Auto-Encoder (MAE). During pre-training, Cross-Scale MAE employs scale augmentation techniques and enforces cross-scale consistency constraints through both contrastive and generative losses to ensure consistent and meaningful representations well-suited for a wide range of downstream tasks. Further, our implementation leverages the xFormers library to accelerate network pre-training on a single GPU while maintaining the quality of learned representations. Experimental evaluations demonstrate that Cross-Scale MAE exhibits superior performance compared to standard MAE and other state-of-the-art remote sensing MAE methods.


Structured Energy Network As a Loss

Neural Information Processing Systems

Belanger & McCallum (2016) and Gygli et al. (2017) have shown that an energy network can capture arbitrary dependencies amongst the output variables in structured prediction; however, their reliance on gradient-based inference (GBI) makes the inference slow and unstable. In this work, we propose Structured Energy As Loss (SEAL) to take advantage of the expressivity of energy networks without incurring the high inference cost. This is a novel learning framework that uses an energy network as a trainable loss function (loss-net) to train a separate neural network (task-net), which is then used to perform the inference through a forward pass. We establish SEAL as a general framework wherein various learning strategies like margin-based, regression, and noise-contrastive, could be employed to learn the parameters of loss-net. Through extensive evaluation on multi-label classification, semantic role labeling, and image segmentation, we demonstrate that SEAL provides various useful design choices, is faster at inference than GBI, and leads to significant performance gains over the baselines.


Four bright spots in climate news in 2025

MIT Technology Review

Things aren't great, but there are a few positive signs, we promise. Climate news hasn't been great in 2025. Global greenhouse-gas emissions hit record highs (again). This year is set to be either the second or third warmest on record. Climate-fueled disasters like wildfires in California and flooding in Indonesia and Pakistan devastated communities and caused billions in damage. In addition to these worrying indicators of our continued contributions to climate change and their obvious effects, the world's largest economy has made a sharp U-turn on climate policy this year.


AlphaFold Changed Science. After 5 Years, It's Still Evolving

WIRED

WIRED spoke with DeepMind's Pushmeet Kohli about the recent past--and promising future--of the Nobel Prize-winning research project that changed biology and chemistry forever. Amino acids "folded" to form a protein. Over the past few years, we've periodically reported on its successes; last year, it won the Nobel Prize in Chemistry . Until AlphaFold's debut in November 2020, DeepMind had been best known for teaching an artificial intelligence to beat human champions at the ancient game of Go Its work culminated in the compilation of a database that now contains over 200 million predicted structures, essentially the entire known protein universe, and is used by nearly 3.5 million researchers in 190 countries around the world The Nature article published in 2021 describing the algorithm has been cited 40,000 times to date. Last year, AlphaFold 3 arrived, extending the capabilities of artificial intelligence to DNA, RNA, and drugs.


Learning Energy Networks with Generalized Fenchel-Young Losses

Neural Information Processing Systems

This allows one to capture potentially complex relationships between inputs andoutputs.To learn the parameters of the energy function, the solution to thatoptimization problem is typically fed into a loss function.The key challenge for training energy networks lies in computing loss gradients,as this typically requires argmin/argmax differentiation.In this paper, building upon a generalized notion of conjugate function,which replaces the usual bilinear pairing with a general energy function,we propose generalized Fenchel-Young losses, a natural loss construction forlearning energy networks. Our losses enjoy many desirable properties and theirgradients can be computed efficiently without argmin/argmax differentiation.We also prove the calibration of their excess risk in the case of linear-concaveenergies. We demonstrate our losses on multilabel classification and imitation learning tasks.


SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

Neural Information Processing Systems

The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, due to the difficulties and high costs associated with annotating Remote Sensing (RS) images, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS totally possesses 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude. It provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. Moreover, preliminary experiments highlight the importance of conducting segmentation pre-training with SAMRS to address task discrepancies and alleviate the limitations posed by limited training data during fine-tuning.


Model-Based Control with Sparse Neural Dynamics

Neural Information Processing Systems

Learning predictive models from observations using deep neural networks (DNNs) is a promising new approach to many real-world planning and control problems. However, common DNNs are too unstructured for effective planning, and current control methods typically rely on extensive sampling or local gradient descent. In this paper, we propose a new framework for integrated model learning and predictive control that is amenable to efficient optimization algorithms. Specifically, we start with a ReLU neural model of the system dynamics and, with minimal losses in prediction accuracy, we gradually sparsify it by removing redundant neurons. This discrete sparsification process is approximated as a continuous problem, enabling an end-to-end optimization of both the model architecture and the weight parameters. The sparsified model is subsequently used by a mixed-integer predictive controller, which represents the neuron activations as binary variables and employs efficient branch-and-bound algorithms. Our framework is applicable to a wide variety of DNNs, from simple multilayer perceptrons to complex graph neural dynamics. It can efficiently handle tasks involving complicated contact dynamics, such as object pushing, compositional object sorting, and manipulation of deformable objects. Numerical and hardware experiments show that, despite the aggressive sparsification, our framework can deliver better closed-loop performance than existing state-of-the-art methods.


CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders

Neural Information Processing Systems

A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled, spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable. We present CROMA: a framework that combines contrastive and reconstruction self-supervised objectives to learn rich unimodal and multimodal representations. Our method separately encodes masked-out multispectral optical and synthetic aperture radar samples--aligned in space and time--and performs cross-modal contrastive learning. Another encoder fuses these sensors, producing joint multimodal encodings that are used to predict the masked patches via a lightweight decoder. We show that these objectives are complementary when leveraged on spatially aligned multimodal data. We also introduce X-and 2D-ALiBi, which spatially biases our cross-and self-attention matrices. These strategies improve representations and allow our models to effectively extrapolate to images up to $17.6\times$ larger at test-time.


Improving *day-ahead* Solar Irradiance Time Series Forecasting by Leveraging Spatio-Temporal Context

Neural Information Processing Systems

Nonetheless, the inherent variability of solar irradiance poses a significant challenge for seamlessly integrating solar power into the electrical grid. While the majority of prior research has centered on employing purely time series-based methodologies for solar forecasting, only a limited number of studies have taken into account factors such as cloud cover or the surrounding physical context.In this paper, we put forth a deep learning architecture designed to harness spatio-temporal context using satellite data, to attain highly accurate day-ahead time-series forecasting for any given station, with a particular emphasis on forecasting Global Horizontal Irradiance (GHI). We also suggest a methodology to extract a distribution for each time step prediction, which can serve as a very valuable measure of uncertainty attached to the forecast. When evaluating models, we propose a testing scheme in which we separate particularly difficult examples from easy ones, in order to capture the model performances in crucial situations, which in the case of this study are the days suffering from varying cloudy conditions. Furthermore, we present a new multi-modal dataset gathering satellite imagery over a large zone and time series for solar irradiance and other related physical variables from multiple geographically diverse solar stations. Our approach exhibits robust performance in solar irradiance forecasting, including zero-shot generalization tests at unobserved solar stations, and holds great promise in promoting the effective integration of solar power into the grid.


5 incredible aerospace breakthroughs in 2025

Popular Science

The Vera C. Rubin Observatory won our Innovation of the Year honors. Breakthroughs, discoveries, and DIY tips sent every weekday. From the most detailed movie of the night sky ever made to the first commercial soft landing on the moon, this year has been an inflection point for exploring and understanding the vast expanse above our heads. We also saw breakthroughs in small changes to commercial airliners that improve efficiency, as well as a new type of rocket engine that might be the future of extremely high speed air travel, plus the closest view of Mercury we've ever seen! Vera C. Rubin Observatory by U.S. National Science Foundation & Department of Energy: World's largest digital camera to conduct 10-year survey of the night sky Prepare to see space like never before.