 dagan



A Proofs

Neural Information Processing Systems

GANs, we need to rewrite the objective functions into forms whose derivatives are easy to calculate. Proposition 2. For any continuous and differentiable function f whose domain is X, we have: E Readers are encouraged to refer to the original proof in [57] for more details. Theorem 2. Given the optimal classifier Please see Appendix A.2 for details. Proposition 1. For any fixed generator, given a data Theorem 3. The objective function for the generator of SSGAN-LA, given the optimal label-augmented discriminator, boils down to: min Theorem 4. At the equilibrium point of DAGAN, the optimal generator implies We first prove the first sentence in this Theorem. We then prove the second sentence in this Theorem.


DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

arXiv.org Artificial Intelligence

Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise for generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this work, firstly, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters and 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Secondly, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, with new state-of-the-art performance established on these benchmarks. The code and trained models are publicly available on the GitHub project page at https://github.com/harlanhong/CVPR2022-DaGAN


IDF reveals its artificial intelligence war data 'factory'

#artificialintelligence

The IDF revealed its artificial intelligence (AI) war data "factory" and strategy on Tuesday as part of Tel Aviv University's Blavatnik Virtual AI Week. Aviad Dagan, who is the director of the IDF's Digital Transformation Administration, said that although the military has been using AI for some time, including during the May 2021 Gaza war, a new strategy for AI was approved by IDF Chief-of-Staff Lt. Gen. Aviv Kohavi only a few weeks ago. "Data and AI can actually win wars… not only arms, physical jets and submarines," Dagan said. "The speed at which we can create a new weapon is totally different from creating a physical weapon," he said. "It is dramatically more flexible and adaptive than any kind of AI network," including the long delays and resources needed for purchasing an F-35 and most other new weapons for troops.


Language Models for Lexical Inference in Context

arXiv.org Artificial Intelligence

Lexical inference (LI) denotes the task of deciding whether or not an entailment relation holds between two lexical items. It is therefore related to the detection of other lexical relations like hyponymy between nouns (Hearst, 1992), e.g., dog ⇒ animal, or troponymy between verbs (Fellbaum and Miller, 1990), e.g., to traipse ⇒ to walk. Lexical inference in context (LIiC) adds the problem of disambiguating the pair of lexical items in a given context before reasoning about the inference question.

Recently, transfer learning has become ubiquitous in NLP; Transformer (Vaswani et al., 2017) language models (LMs) pretrained on large amounts of textual data (Devlin et al., 2019a; Liu et al., 2019) form the basis of a lot of current state-of-the-art models. Besides zero- and few-shot capabilities (Radford et al., 2019; Brown et al., 2020), pretrained LMs have also been found to acquire factual and relational knowledge during pretraining


Dual Attention GANs for Semantic Image Synthesis

arXiv.org Artificial Intelligence

In this paper, we focus on the semantic image synthesis task that aims at transferring semantic label maps to photo-realistic images. Existing methods lack effective semantic constraints to preserve the semantic information and ignore the structural correlations in both spatial and channel dimensions, leading to unsatisfactory blurry and artifact-prone results. To address these limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images with fine details from the input layouts without imposing extra training overhead or modifying the network architectures of existing methods. We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM), to capture semantic structure attention in spatial and channel dimensions, respectively. Specifically, SAM selectively correlates the pixels at each position by a spatial attention map, leading to pixels with the same semantic label being related to each other regardless of their spatial distances. Meanwhile, CAM selectively emphasizes the scale-wise features at each channel by a channel attention map, which integrates associated features among all channel maps regardless of

Figure 1: Visualization of generated semantic maps compared with those from GauGAN [31] on Cityscapes (top) and ADE20K (bottom). Equipped with semantic attention modeling in both spatial and channel dimensions, the proposed DAGAN can achieve mutual gains within the regions with the same semantic label regardless of the distances, thus improving intra-class semantic consistency. Most improved regions are highlighted in the ground truths with white dash boxes.
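The idea of attending over positions and over channels can be illustrated with a small toy. The following is only a minimal sketch of plain dot-product self-attention, assuming a feature map flattened to N positions by C channels; it is not the paper's SAM or CAM (there are no learned projections or semantic labels here), and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def spatial_attention(x):
    """x is N positions x C channels; every position attends over all positions."""
    attn = [softmax(row) for row in matmul(x, transpose(x))]  # N x N weights
    return attn, matmul(attn, x)  # output keeps the N x C shape


def channel_attention(x):
    """Every channel attends over all channels, using the positions as features."""
    xt = transpose(x)  # C x N
    attn = [softmax(row) for row in matmul(xt, x)]  # C x C weights
    return attn, transpose(matmul(attn, xt))  # back to N x C


# Toy feature map (assumed data): positions 0 and 3 carry the same features
# even though they sit at opposite ends of the flattened map.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
spatial_weights, spatial_out = spatial_attention(features)
channel_weights, channel_out = channel_attention(features)
```

In this toy, position 0 weights position 3 exactly as much as itself, because the weights depend only on feature similarity and not on how far apart the positions are.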


Reality Engines offers a deep learning tour de force to challenge Amazon et al in Enterprise AI ZDNet

#artificialintelligence

Bindu Reddy, co-founder and chief executive of startup Reality Engines, unveiled a slew of enterprise apps based on cutting-edge deep learning techniques. "Our moat comes both from constantly innovating and in getting more and more practice on key enterprise use-cases," said Reddy, who was formerly head of "AI verticals" at Amazon's AWS cloud service. Barely a year old, Reality Engines of San Francisco emerged from stealth mode on Tuesday, announcing a slew of artificial intelligence offerings to perform corporate tasks such as budgeting for cloud services or monitoring corporate networks for break-ins. Most exciting of all is that the tiny 18-person team has some very novel takes on deep learning forms of AI, the product of seasoned vets in machine learning technology and products. This is no me-too chatbot service, it would appear.


SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference

arXiv.org Artificial Intelligence

We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. Each InfCand consists of one of these relations, expressed as a lemmatized dependency path, and two argument placeholders, each linked to one or more Freebase types. Due to our candidate selection process based on strong distributional evidence, SherLIiC is much harder than existing testbeds because distributional evidence is of little utility in the classification of InfCands. We also show that, due to its construction, many of SherLIiC's correct InfCands are novel and missing from existing rule bases. We evaluate a number of strong baselines on SherLIiC, ranging from semantic vector space models to state-of-the-art neural models of natural language inference (NLI). We show that SherLIiC poses a tough challenge to existing NLI systems.


Data Augmentation Generative Adversarial Networks

arXiv.org Machine Learning

Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation alleviates this by using existing data more effectively. However, standard data augmentation produces only limited plausible alternative data. Given there is potential to generate a much broader set of augmentations, we design and train a generative model to do data augmentation. The model, based on image conditional Generative Adversarial Networks, takes data from a source domain and learns to take any data item and generalise it to generate other within-class data items. As this generative process does not depend on the classes themselves, it can be applied to novel unseen classes of data. We show that a Data Augmentation Generative Adversarial Network (DAGAN) augments standard vanilla classifiers well. We also show a DAGAN can enhance few-shot learning systems such as Matching Networks. We demonstrate these approaches on Omniglot, on EMNIST having learnt the DAGAN on Omniglot, and on VGG-Face data. In our low-data regime experiments, we see an increase in accuracy of over 13% in Omniglot (from 69% to 82%), with gains also in EMNIST (73.9% to 76%) and VGG-Face (4.5% to 12%); in Matching Networks for Omniglot we observe an increase of 0.5% (from 96.9% to 97.4%) and an increase of 1.8% in EMNIST (from 59.5% to 61.3%).