
Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis

Neural Information Processing Systems

Synthesizing realistic profile faces is promising for more efficiently training deep pose-invariant models for large-scale unconstrained face recognition, by populating samples with extreme poses and avoiding tedious annotations. However, learning from synthetic faces may not achieve the desired performance due to the discrepancy between distributions of the synthetic and real face images. To narrow this gap, we propose a Dual-Agent Generative Adversarial Network (DA-GAN) model, which can improve the realism of a face simulator's output using unlabeled real faces, while preserving the identity information during the realism refinement. The dual agents are specifically designed for distinguishing real v.s.


Disentangling factors of variation in deep representation using adversarial training

Neural Information Processing Systems

We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker. In both of these instances, the factors of variation irrelevant to classification are implicitly expressed by intra-class variabilities, such as the relative position of an object in an image, or the linguistic content of an utterance. Most existing approaches for solving this problem rely heavily on having access to pairs of observations only sharing a single factor of variation, e.g.



Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data

Neural Information Processing Systems

Social survey datasets, for example, are typically mixed because they include variables like age (continuous), demographic group (categorical), and Likert scales (ordinal) measuring how strongly a respondent agrees with certain stated opinions. Continuous variables are encoded as real numbers and sometimes called numeric. We refer to variables that admit a total order (e.g.



DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Neural Information Processing Systems

This inevitably introduces a tremendous memory and compute overhead into the LLMs, which is particularly significant when it comes to high-resolution images and multi-frame videos. Several previous works attempt to mitigate this issue by proposing various token compression strategies. A straightforward way is to reduce the number of tokens with spatial grouping [70, 47]. Rather than pooling vision tokens, a few works concatenate local tokens along the feature dimension to preserve visual information [11, 48]. Moreover, other works seek more sophisticated token resampling, such as Q-Former [43], Perceiver [4], and Abstractor [8], etc.
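
To make the first two compression strategies concrete, here is a minimal NumPy sketch, not the implementation of any of the cited works. It assumes the vision tokens form an h x w grid flattened in row-major order, and the function names `pool_tokens` and `concat_tokens` are hypothetical. Both variants cut the token count by a factor of window^2; pooling averages each neighbourhood, while concatenation keeps every value by widening the feature dimension.

```python
import numpy as np


def _group(tokens, grid, window):
    """Reshape (h*w, d) row-major tokens into (h/window, window, w/window, window, d)."""
    h, w = grid
    n, d = tokens.shape
    if n != h * w or h % window or w % window:
        raise ValueError("grid must match token count and be divisible by window")
    return tokens.reshape(h // window, window, w // window, window, d)


def pool_tokens(tokens, grid, window=2):
    """Spatial grouping: average each window x window neighbourhood into one token."""
    x = _group(tokens, grid, window)
    d = tokens.shape[1]
    # Average over the two within-window axes.
    return x.mean(axis=(1, 3)).reshape(-1, d)


def concat_tokens(tokens, grid, window=2):
    """Concatenate each window x window neighbourhood along the feature dimension."""
    x = _group(tokens, grid, window)
    d = tokens.shape[1]
    # Bring the two within-window axes next to the feature axis, then merge them.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window * window * d)
```

With a 4 x 4 grid of 3-dimensional tokens and `window=2`, `pool_tokens` returns an array of shape (4, 3) and `concat_tokens` one of shape (4, 12). The resampling approaches mentioned last (Q-Former, Perceiver, Abstractor) use learned modules and are not covered by this sketch.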


Rapid Model Architecture Adaption for Meta-Learning

Neural Information Processing Systems

Most NAS methods today focus on a single task with a fixed hardware system, yet real-life model deployments covering multiple tasks and various hardware platforms will significantly prolong this process.


scaleVision

Neural Information Processing Systems

By making our data processing source code publicly available, we aim to engage the marine science community to enrich the data pool and inspire the machine learning community to develop more robust models.