FOOGD: Federated Collaboration for Both Out-of-distribution Generalization and Detection

Neural Information Processing Systems

Federated learning (FL) is a promising machine learning paradigm that collaborates with client models to capture global knowledge. However, deploying FL models in real-world scenarios remains unreliable due to the coexistence of in-distribution data and unexpected out-of-distribution (OOD) data, such as covariate-shift and semantic-shift data. Current FL research typically addresses either covariate-shift data through OOD generalization or semantic-shift data via OOD detection, overlooking the simultaneous occurrence of various OOD shifts. In this work, we propose FOOGD, a method that estimates the probability density of each client and obtains a reliable global distribution as guidance for the subsequent FL process.


A Implementation details

Neural Information Processing Systems

A.1 Encoder network architecture

We adopt an encoder network similar to that of RDE (Figure S1) to transform the structural context of mutations at the interface into a conditional vector used by the generative process of side-chain conformations. We define the structural context as the 128 residues in closest proximity to the mutation sites. The input features can be grouped into single-residue node features and pairwise edge features. The node features include amino acid types, backbone torsion angles, and local atom coordinates for each amino acid, while the edge features include the pair distance and relative sequence position between two amino acids. The input features are first fed into MLP layers (denoted as Transition layer in Figure S1) and then combined with the spatial backbone frames to pass through the Invariant Point Attention Module (IPA), an SE(3)-invariant network proposed in AlphaFold2 [Jumper et al., 2021].
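The context selection and the two edge features described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it measures proximity by the distance between C-alpha coordinates, treats all residues as belonging to one indexed sequence, and the names `select_context` and `pair_features` are hypothetical.

```python
import math

def select_context(ca_coords, mutation_sites, k=128):
    """Return the indices of the k residues closest to any mutation site.

    Assumption: proximity is the C-alpha distance to the nearest mutation
    site; the source does not specify which atoms are used.
    """
    def min_dist(i):
        return min(math.dist(ca_coords[i], ca_coords[m]) for m in mutation_sites)

    # Rank by distance, breaking ties by residue index for determinism.
    ranked = sorted(range(len(ca_coords)), key=lambda i: (min_dist(i), i))
    return sorted(ranked[:k])

def pair_features(ca_coords, context):
    """Edge features per ordered residue pair: (distance, relative position).

    Assumption: relative sequence position is the plain index difference,
    which ignores chain boundaries.
    """
    return {
        (i, j): (math.dist(ca_coords[i], ca_coords[j]), j - i)
        for i in context
        for j in context
        if i != j
    }

if __name__ == "__main__":
    # Toy example: ten residues on a line, one mutation at index 5.
    coords = [(float(i), 0.0, 0.0) for i in range(10)]
    ctx = select_context(coords, mutation_sites=[5], k=3)
    print(ctx)
    print(pair_features(coords, ctx)[(4, 6)])
```

In a real pipeline these per-residue and per-pair values would be embedded by the Transition layers before reaching the IPA module; the sketch stops at raw feature extraction.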



Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model

Neural Information Processing Systems

Many crucial biological processes rely on networks of protein-protein interactions. Predicting the effect of amino acid mutations on protein-protein binding is vital in protein engineering and therapeutic discovery. However, the scarcity of annotated experimental data on binding energy poses a significant challenge for developing computational approaches, particularly deep learning-based methods. In this work, we propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures. SidechainDiff utilizes a Riemannian diffusion model to learn the generative process of side-chain conformations and can also give the structural context representations of mutations on the protein-protein interface. Leveraging the learned representations, we achieve state-of-the-art performance in predicting the mutational effects on protein-protein binding. Furthermore, SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.




The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes

Neural Information Processing Systems

Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside deforming ones in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. To validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric that does not require ground truth data.
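The idea of splitting optical flow into a rigid-body part and a non-rigid part can be illustrated geometrically for a single pixel. The sketch below is not the Drunkard's Odometry method itself, which the abstract does not detail; it assumes a pinhole camera with known intrinsics, a known depth, and a known camera motion `(R, t)`, and the function names are hypothetical.

```python
def rigid_flow(u, v, depth, K, R, t):
    """Flow that camera motion alone would induce at pixel (u, v).

    K is (fx, fy, cx, cy); R is a 3x3 rotation as nested lists; t is a
    translation 3-tuple. All are assumed known for this illustration.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in the first camera frame.
    X = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Move the point into the second camera frame with the rigid motion.
    Xp = tuple(sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))
    # Project back to the image plane of the second frame.
    u2 = fx * Xp[0] / Xp[2] + cx
    v2 = fy * Xp[1] / Xp[2] + cy
    return (u2 - u, v2 - v)

def nonrigid_residual(total_flow, u, v, depth, K, R, t):
    """Part of the observed flow not explained by camera motion."""
    ru, rv = rigid_flow(u, v, depth, K, R, t)
    return (total_flow[0] - ru, total_flow[1] - rv)

if __name__ == "__main__":
    K = (100.0, 100.0, 50.0, 50.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = (0.5, 0.0, 0.0)  # pure sideways translation
    print(rigid_flow(50.0, 50.0, 2.0, K, identity, t))
    print(nonrigid_residual((27.0, -1.0), 50.0, 50.0, 2.0, K, identity, t))
```

A residual near zero means the pixel moved as a static scene would under the given camera motion; a large residual points to scene deformation, or to errors in the depth or pose estimate.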


Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Neural Information Processing Systems

The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance.