zsl
HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning
Zero-shot learning (ZSL) tackles the problem of recognizing unseen classes by transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee effective knowledge transfer, a common (latent) space is adopted to associate the visual and semantic domains in ZSL. However, existing common space learning methods align the semantic and visual domains by merely mitigating distribution disagreement through one-step adaptation. This strategy is usually ineffective due to the heterogeneous nature of the feature representations in the two domains, which intrinsically contain both distribution and structure variations. To address this and advance ZSL, we propose a novel hierarchical semantic-visual adaptation (HSVA) framework.
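For intuition, below is a minimal PyTorch sketch of the one-step common-space baseline the abstract critiques: both domains are projected once into a shared space and aligned with a single distribution-matching loss. The dimensions, module names, and the linear-kernel MMD objective are illustrative assumptions, not HSVA's architecture (which instead performs structure and distribution adaptation in separate hierarchical steps).

```python
import torch
import torch.nn as nn

# Hypothetical one-step common-space baseline (NOT HSVA itself):
# each domain is projected once into a shared space, and alignment
# is attempted with a single distribution-matching loss.
class CommonSpaceBaseline(nn.Module):
    def __init__(self, visual_dim=2048, semantic_dim=312, common_dim=512):
        super().__init__()
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, common_dim), nn.ReLU())
        self.semantic_enc = nn.Sequential(nn.Linear(semantic_dim, common_dim), nn.ReLU())

    def forward(self, v, s):
        return self.visual_enc(v), self.semantic_enc(s)

def mmd_loss(x, y):
    # Linear-kernel MMD: match only the first moments of the two domains,
    # which ignores the structure variations the abstract points out.
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()

model = CommonSpaceBaseline()
v = torch.randn(32, 2048)   # visual features (e.g. CNN embeddings)
s = torch.randn(32, 312)    # class semantic vectors (e.g. attributes)
zv, zs = model(v, s)
loss = mmd_loss(zv, zs)     # one-step distribution alignment only
```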
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
Andrei Mircea, Supriyo Chakraborty, Nima Chitsazan, Milind Naphade, Sambit Sahu, Irina Rish, Ekaterina Lobacheva
This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl
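As a rough illustration of how one might probe zero-sum learning as described above, the sketch below computes per-example gradients for a standard PyTorch classifier and measures their mean pairwise cosine similarity; systematically negative values would indicate the destructive interference the abstract describes. This is an assumed diagnostic, not the paper's actual measurement code (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def per_example_grads(model, xs, ys):
    # Collect one flattened gradient vector per training example.
    # (A loop is slow but keeps the idea explicit.)
    grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        grads.append(g.clone())
    return torch.stack(grads)

def mean_pairwise_cosine(grads):
    # Average off-diagonal cosine similarity between per-example gradients.
    # Values near or below zero suggest per-example updates are opposed,
    # i.e. the zero-sum learning regime described in the abstract.
    g = F.normalize(grads, dim=1)
    sim = g @ g.T
    n = sim.shape[0]
    return (sim.sum() - n) / (n * (n - 1))
```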
Reviews: Semantic-Guided Multi-Attention Localization for Zero-Shot Learning
The problem is relevant, and the method is based on an interesting attention-based idea of looking at different regions of the image for the task of ZSL. The losses used focus on (i) making each attention map peaky while keeping different maps diverse, (ii) an embedding-based softmax for better prediction, and (iii) a class-center triplet loss that pulls features closer to their respective class centers relative to the other class centers. Line 190 mentions that the image and parts are sent to "separate backbone networks", which implies that the network parameters are not shared. If that is the case, then the method has 3x the parameters of competing methods, i.e., a significantly higher-capacity network overall. What happens when the CNN parameters are shared? And what happens when the image-only baseline has a higher-capacity backbone network (which is also then end-to-end finetuned)?
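To make loss (iii) concrete, here is a minimal PyTorch sketch of a class-center triplet loss of the kind the review describes; the margin value, the Euclidean distance metric, and the center parameterization are assumptions rather than the reviewed paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_center_triplet(features, labels, centers, margin=0.5):
    # features: (batch, dim), labels: (batch,), centers: (num_classes, dim).
    # Pull each feature toward its own class center and push it away
    # from the closest other center by at least `margin` (assumed value).
    d = torch.cdist(features, centers)                 # (batch, num_classes)
    pos = d.gather(1, labels.unsqueeze(1)).squeeze(1)  # distance to own center
    d_masked = d.scatter(1, labels.unsqueeze(1), float('inf'))
    neg = d_masked.min(dim=1).values                   # closest other center
    return F.relu(pos - neg + margin).mean()
```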