Hierarchical Federated Learning with Multi-Timescale Gradient Correction
While traditional federated learning (FL) typically focuses on a star topology where clients are directly connected to a central server, real-world distributed systems often exhibit hierarchical architectures. Hierarchical FL (HFL) has emerged as a promising solution to bridge this gap, leveraging aggregation points at multiple levels of the system. However, existing algorithms for HFL encounter challenges in dealing with multi-timescale model drift, i.e., model drift occurring across hierarchical levels of data heterogeneity. In this paper, we propose a multi-timescale gradient correction (MTGC) methodology to resolve this issue. Our key idea is to introduce distinct control variables to (i) correct the client gradient towards the group gradient, i.e., to reduce client model drift caused by local updates based on individual datasets, and (ii) correct the group gradient towards the global gradient, i.e., to reduce group model drift caused by FL over clients within the group. We analytically characterize the convergence behavior of MTGC under general non-convex settings, overcoming challenges associated with couplings between correction terms. We show that our convergence bound is immune to the extent of data heterogeneity, confirming the stability of the proposed algorithm against multi-level non-i.i.d.
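The two correction terms described above can be illustrated with a small sketch. This is not the paper's update rule: the quadratic local losses, the helper names (`grad`, `corrected_gradients`), and the choice to compute both control variables at a single shared point are assumptions made for illustration. Under those assumptions, adding a client-to-group correction and a group-to-global correction to a client's gradient recovers the global gradient exactly.

```python
# Illustrative sketch only: the quadratic losses and the exact form of the
# correction terms are assumptions, not the paper's stated update rule.

def grad(x, target):
    """Gradient of the hypothetical local loss 0.5 * (x - target)**2."""
    return x - target

def mean(values):
    return sum(values) / len(values)

def corrected_gradients(x, groups):
    """Return every client's doubly corrected gradient at the point x.

    groups: list of groups, each a list of per-client targets (toy data).
    """
    group_grads = [mean([grad(x, t) for t in g]) for g in groups]
    global_grad = mean(group_grads)           # equal-sized groups assumed
    out = []
    for g, group_grad in zip(groups, group_grads):
        y = global_grad - group_grad          # group-to-global correction
        for t in g:
            z = group_grad - grad(x, t)       # client-to-group correction
            out.append(grad(x, t) + z + y)
    return out, global_grad

groups = [[0.0, 2.0], [10.0, 14.0]]           # two groups, two clients each
corrected, global_grad = corrected_gradients(1.0, groups)
```

In an actual hierarchical system the control variables would be estimated from stale models and refreshed at different timescales, so the cancellation would only be approximate; the sketch shows the idealized case in which both drifts vanish.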
Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes (Supplementary Material)
CRCV, University of Central Florida
The only difference is that video instance segmentation (VIS) involves instance classification, which can be solved by single-frame object detection alone, whereas action detection requires higher-level action classification, which in turn requires temporal reasoning. The per-frame localization task is fundamentally identical in both problems.
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
CRCV, University of Central Florida
This paper explores the impact of occlusions in video action detection. We facilitate this study by introducing five new benchmark datasets: O-UCF and O-JHMDB, consisting of synthetically controlled static/dynamic occlusions; OVIS-UCF and OVIS-JHMDB, consisting of occlusions with realistic motions; and Real-OUCF, for occlusions in real-world scenarios. We formally confirm an intuitive expectation: existing models degrade substantially as occlusion severity increases, and they behave differently when occluders are static than when they are moving.
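The distinction between static and dynamic synthetic occlusions can be made concrete with a toy sketch. Everything here is hypothetical: the frame representation (small 2D grids), the square occluder, and the function name `occlude` are illustrative assumptions and do not describe how the benchmark datasets were actually generated.

```python
# Hypothetical toy example: frames are small 2D grids of pixel values, and the
# occluder is a filled square. None of these names come from the datasets.

def occlude(frames, top, left, size, velocity=(0, 0), fill=0):
    """Paste a square occluder into every frame.

    velocity=(0, 0) gives a static occluder; a non-zero velocity shifts the
    occluder by (rows, cols) per frame, giving a dynamic one.
    """
    out = []
    for t, frame in enumerate(frames):
        new = [row[:] for row in frame]       # copy so the input is untouched
        r0 = top + t * velocity[0]
        c0 = left + t * velocity[1]
        for r in range(r0, r0 + size):
            for c in range(c0, c0 + size):
                if 0 <= r < len(new) and 0 <= c < len(new[0]):
                    new[r][c] = fill
        out.append(new)
    return out

frames = [[[1] * 6 for _ in range(4)] for _ in range(3)]   # 3 frames of 4x6
static = occlude(frames, top=1, left=1, size=2)
dynamic = occlude(frames, top=1, left=1, size=2, velocity=(0, 1))
```

With a static occluder the same pixels are hidden in every frame, while the dynamic occluder hides a different region at each time step, which is one plausible reason the two cases could affect temporal models differently.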
Controllable Text-to-Image Generation
Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip Torr
In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions. To achieve this, we introduce a word-level spatial and channel-wise attention-driven generator that can disentangle different visual attributes, and allow the model to focus on generating and manipulating subregions corresponding to the most relevant words. Also, a word-level discriminator is proposed to provide fine-grained supervisory feedback by correlating words with image regions, facilitating training an effective generator which is able to manipulate specific visual attributes without affecting the generation of other content. Furthermore, perceptual loss is adopted to reduce the randomness involved in the image generation, and to encourage the generator to manipulate specific attributes required in the modified text. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state of the art, and is able to effectively manipulate synthetic images using natural language descriptions.
Supplementary Material
In this section we first introduce notation and show how to express a region ω of the partition Ω as a polytope defined by a system of inequalities. We then leverage this formulation to show how to obtain Ω by recursively exploring neighboring regions, starting from a random point/region. From this, we see that the pre-activation signs and the regions are tied together.
Corollary 2. The H-representation of the polyhedral region ω is given by
From the above, it is clear that the signs determine on which side of each hyperplane the region lies. The search for all cells in a partition is known as the cell enumeration problem and has been extensively studied in the context of specific partitions such as hyperplane arrangements [54-56]. In fact, changing the activation state of a specific unit at layer l will alter the affine parameters from (10) and (11) due to the layer composition.
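The link between pre-activation signs and a polytope in inequality form can be sketched for a single affine layer. The weights `W`, bias `b`, and helper names below are hypothetical, and the sketch covers only one layer rather than the composed, multi-layer case discussed in the text. For a sign pattern q, each constraint q_k (w_k · x + b_k) ≥ 0 is rewritten as −q_k w_k · x ≤ q_k b_k, giving a system A x ≤ c.

```python
# Sketch for a single affine layer with hypothetical weights W and bias b.

def preactivation_signs(W, b, x):
    """Sign (+1 or -1) of each pre-activation w_k . x + b_k at the point x."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) + bk >= 0 else -1
            for row, bk in zip(W, b)]

def h_representation(W, b, signs):
    """Return (A, c) such that the region is {x : A x <= c}.

    Row k encodes q_k * (w_k . x + b_k) >= 0, i.e. -q_k w_k . x <= q_k b_k.
    """
    A = [[-q * w for w in row] for row, q in zip(W, signs)]
    c = [q * bk for bk, q in zip(b, signs)]
    return A, c

def contains(A, c, x):
    """Check whether x satisfies every inequality of the system A x <= c."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= ck
               for row, ck in zip(A, c))

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -1.0]
x0 = [0.25, 0.25]
signs = preactivation_signs(W, b, x0)
A, c = h_representation(W, b, signs)
```

Here `contains(A, c, x0)` holds by construction, while a point such as `[2.0, 2.0]`, which has a different sign pattern, falls outside the region. Flipping one entry of `signs` and rebuilding `(A, c)` gives the candidate neighboring region across the corresponding hyperplane.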
Learning Provably Robust Estimators for Inverse Problems via Jittering
Deep neural networks provide excellent performance for inverse problems such as denoising. However, neural networks can be sensitive to adversarial or worst-case perturbations. This raises the question of whether such networks can be trained efficiently to be worst-case robust. In this paper, we investigate whether jittering, a simple regularization technique that adds isotropic Gaussian noise during training, is effective for learning worst-case robust estimators for inverse problems. While well studied for prediction in classification tasks, the effectiveness of jittering for inverse problems has not been systematically investigated.
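As a rough illustration of jittering, the sketch below adds Gaussian noise to the training inputs of a toy denoising problem. The scalar linear estimator f(y) = a · y, the noise levels, and the function names are all assumptions chosen for brevity, and applying the noise to the estimator's input is likewise an assumption of this sketch; the paper's estimators and training setup may differ.

```python
import random

# Toy sketch: a scalar linear estimator f(y) = a * y and hypothetical noise
# levels. Jittering adds Gaussian noise w to the training input y.

def jitter(y, sigma_w, rng):
    """Add i.i.d. Gaussian noise with standard deviation sigma_w to each entry."""
    return [yi + rng.gauss(0.0, sigma_w) for yi in y]

def jittered_loss(a, xs, ys, sigma_w, rng):
    """Mean squared error of f(y + w) = a * (y + w) against the clean signal x."""
    ys_jit = jitter(ys, sigma_w, rng)
    return sum((a * yj - x) ** 2 for yj, x in zip(ys_jit, xs)) / len(xs)

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]          # clean signals
ys = [x + rng.gauss(0.0, 0.5) for x in xs]               # noisy measurements
plain = jittered_loss(0.8, xs, ys, 0.0, rng)             # no jitter
jit = jittered_loss(0.8, xs, ys, 0.3, rng)               # with jitter
```

Setting `sigma_w` to zero recovers the ordinary training loss, so the jitter level acts as a single tunable regularization parameter in this toy setting.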
Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators
In human-AI collaborative tasks, the distribution of human behavior, influenced by mental models, is non-stationary, manifesting in various levels of initiative and different collaborative strategies. A significant challenge in human-AI collaboration is determining how to collaborate effectively with humans exhibiting non-stationary dynamics. Current collaborative agents are typically trained by first running self-play (SP) multiple times to build a policy pool and then training the final adaptive policy against this pool. Each such agent is itself a single policy network, which is insufficient for handling non-stationary human dynamics. We discern that despite the inherent diversity in human behaviors, the underlying meta-tasks within specific collaborative contexts tend to be strikingly similar.
Transfer Learning for Latent Variable Network Models
We study transfer learning for estimation in latent variable network models. In our setting, the conditional edge probability matrices given the latent variables are represented by P for the source and Q for the target. We wish to estimate Q given two kinds of data: (1) edge data from a subgraph induced by an o(1) fraction of the nodes of Q, and (2) edge data from all of P. If the source P has no relation to the target Q, the estimation error must be Ω(1). However, we show that if the latent variables are shared, then vanishing error is possible. We give an efficient algorithm that utilizes the ordering of a suitably defined graph distance. Our algorithm achieves o(1) error and does not assume a parametric form on the source or target networks. Next, for the specific case of Stochastic Block Models we prove a minimax lower bound and show that a simple algorithm achieves this rate. Finally, we empirically demonstrate our algorithm's use on real-world and simulated network estimation problems.
IDEA: An Invariant Perspective for Efficient Domain Adaptive Image Retrieval
Haixin Wang, Hao Wu, Jinan Sun
In this paper, we study the problem of unsupervised domain adaptive retrieval, which transfers retrieval models from a label-rich source domain to a label-scarce target domain. Although there exist numerous approaches that incorporate transfer learning techniques into deep hashing frameworks, they often overlook the crucial invariance needed for adequate alignment between these two domains. Even worse, these methods fail to distinguish between causal and non-causal effects embedded in images, making cross-domain retrieval ineffective. To address these challenges, we propose an Invariance-acquired Domain Adaptive Hashing (IDEA) model.