zhao
Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis
Synthesizing realistic profile faces is promising for more efficiently training deep pose-invariant models for large-scale unconstrained face recognition, by populating samples with extreme poses and avoiding tedious annotations. However, learning from synthetic faces may not achieve the desired performance due to the discrepancy between distributions of the synthetic and real face images. To narrow this gap, we propose a Dual-Agent Generative Adversarial Network (DA-GAN) model, which can improve the realism of a face simulator's output using unlabeled real faces, while preserving the identity information during the realism refinement. The dual agents are specifically designed for distinguishing real v.s.
Disentangling factors of variation in deep representation using adversarial training
We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e.g.
- Asia > China > Shanghai > Shanghai (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
ProbabilisticMissingValueImputation forMixedCategoricalandOrderedData
Social survey datasets, for example, are typically mixed because they include variables like age (continuous), demographic group (categorical), and Likert scales (ordinal) measuring how strongly a respondent agrees with certain stated opinions. Continuous variables are encoded as real numbers and sometimes called numeric. We refer to variables that admit a total order (e.g.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > France (0.04)
DeepStack: DeeplyStackingVisualTokens isSurprisinglySimpleandEffectiveforLMMs
This inevitably introduces a tremendous memory andcompute overheadintotheLLMs, whichisparticularly significant when it comes to high-resolution images and multi-frame videos. Several previous works attempt to mitigate this issue by proposing various token compression strategies. A straightforward way is to reduce the number of tokens with spatial grouping [70, 47]. Instead of pooling vision tokens, a few work instead to concatenate local tokens along the feature dimension to preserve visual information [11, 48]. Moreover, other works seek more sophisticated token resampling, such as Q-Former [43], Perceiver [4]and Abstractor [8],etc.
- North America > United States > Massachusetts (0.05)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (6 more...)
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)