Supervised Learning
Generalized Relation Learning with Semantic Correlation Awareness for Link Prediction
Zhang, Yao, Zhang, Xu, Wang, Jun, Liang, Hongru, Lei, Wenqiang, Sun, Zhe, Jatowt, Adam, Yang, Zhenglu
Developing link prediction models to automatically complete knowledge graphs has recently been the focus of significant research interest. The current methods for the link prediction taskhavetwonaturalproblems:1)the relation distributions in KGs are usually unbalanced, and 2) there are many unseen relations that occur in practical situations. These two problems limit the training effectiveness and practical applications of the existing link prediction models. We advocate a holistic understanding of KGs and we propose in this work a unified Generalized Relation Learning framework GRL to address the above two problems, which can be plugged into existing link prediction models. GRL conducts a generalized relation learning, which is aware of semantic correlations between relations that serve as a bridge to connect semantically similar relations. After training with GRL, the closeness of semantically similar relations in vector space and the discrimination of dissimilar relations are improved. We perform comprehensive experiments on six benchmarks to demonstrate the superior capability of GRL in the link prediction task. In particular, GRL is found to enhance the existing link prediction models making them insensitive to unbalanced relation distributions and capable of learning unseen relations.
Convergence dynamics of Generative Adversarial Networks: the dual metric flows
Fitting neural networks often resorts to stochastic (or similar) gradient descent which is a noise-tolerant (and efficient) resolution of a gradient descent dynamics. It outputs a sequence of networks parameters, which sequence evolves during the training steps. The gradient descent is the limit, when the learning rate is small and the batch size is infinite, of this set of increasingly optimal network parameters obtained during training. In this contribution, we investigate instead the convergence in the Generative Adversarial Networks used in machine learning. We study the limit of small learning rate, and show that, similar to single network training, the GAN learning dynamics tend, for vanishing learning rate to some limit dynamics. This leads us to consider evolution equations in metric spaces (which is the natural framework for evolving probability laws)that we call dual flows. We give formal definitions of solutions and prove the convergence. The theory is then applied to specific instances of GANs and we discuss how this insight helps understand and mitigate the mode collapse.
Intrinsic persistent homology via density-based metric learning
Borghini, Eugenio, Fernรกndez, Ximena, Groisman, Pablo, Mindlin, Gabriel
We address the problem of estimating intrinsic distances in a manifold from a finite sample. We prove that the metric space defined by the sample endowed with a computable metric known as sample Fermat distance converges a.s. in the sense of Gromov-Hausdorff. The limiting object is the manifold itself endowed with the population Fermat distance, an intrinsic metric that accounts for both the geometry of the manifold and the density that produces the sample. This result is applied to obtain sample persistence diagrams that converge towards an intrinsic persistence diagram. We show that this method outperforms more standard approaches based on Euclidean norm with theoretical results and computational experiments.
Japan COVID-19 cases set single-day record
Japan shattered the nationwide record for COVID-19 cases in a day on Wednesday, registering 2,746 cases of the deadly virus as of 6 p.m., public broadcaster NHK reported. The day saw a spate of records in several prefectures, including 245 cases in Aichi, 75 in Kyoto, 72 in Hiroshima, 49 in Gunma and 21 in Oita, and comes amid a recent surge in infections that have prompted concern among regional governments and health authorities. Tokyo, meanwhile, reported 572 new cases -- the second highest daily total ever -- while the number of serious cases dipped by one from a day earlier to 59. The news came a day after the nationwide death toll hit a single-day record of 47 and as serious cases also hit an all-time daily high of 536, according to the health ministry. The capital's daily figure on Wednesday, which was just shy of the record 584 cases recorded last Saturday, was based on 1,428 tests, the Tokyo Metropolitan Government said in a statement.
A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings
Cantelobre, Thรฉophile, Guedj, Benjamin, Pรฉrez-Ortiz, Marรญa, Shawe-Taylor, John
Many practical machine learning tasks can be framed as Structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast rates convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity of producing tight risk bounds for predictor distributions. This work proposes a novel PAC-Bayes perspective on the ILE Structured prediction framework. We present two generalization bounds, on the risk and excess risk, which yield insights into the behavior of ILE predictors. Two learning algorithms are derived from these bounds.
LCS Graph Kernel Based on Wasserstein Distance in Longest Common Subsequence Metric Space
Huang, Jianming, Fang, Zhongxi, Kasai, Hiroyuki
For graph classification tasks, many methods use a common strategy to aggregate information of vertex neighbors. Although this strategy provides an efficient means of extracting graph topological features, it brings excessive amounts of information that might greatly reduce its accuracy when dealing with large-scale neighborhoods. Learning graphs using paths or walks will not suffer from this difficulty, but many have low utilization of each path or walk, which might engender information loss and high computational costs. To solve this, we propose a graph kernel using a longest common subsequence (LCS kernel) to compute more comprehensive similarity between paths and walks, which resolves substructure isomorphism difficulties. We also combine it with optimal transport theory to extract more in-depth features of graphs. Furthermore, we propose an LCS metric space and apply an adjacent point merge operation to reduce its computational costs. Finally, we demonstrate that our proposed method outperforms many state-of-the-art graph kernel methods.
Unsupervised Word Translation Pairing using Refinement based Point Set Registration
Oprea, Silviu, Dutta, Sourav, Assem, Haytham
Cross-lingual alignment of word embeddings play an important role in knowledge transfer across languages, for improving machine translation and other multi-lingual applications. Current unsupervised approaches rely on similarities in geometric structure of word embedding spaces across languages, to learn structure-preserving linear transformations using adversarial networks and refinement strategies. However, such techniques, in practice, tend to suffer from instability and convergence issues, requiring tedious fine-tuning for precise parameter setting. This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space, by combining adversarial initialization and refinement procedure with point set registration algorithm used in image processing. We show that our framework alleviates the shortcomings of existing methodologies, and is relatively invariant to variable adversarial learning performance, depicting robustness in terms of parameter choices and training losses. Experimental evaluation on parallel dictionary induction task demonstrates state-of-the-art results for our framework on diverse language pairs.
AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries
Hu, Qianjiang, Wang, Xiao, Hu, Wei, Qi, Guo-Jun
Contrastive learning relies on constructing a collection of negative examples that are sufficiently hard to discriminate against positive queries when their representations are self-trained. Existing contrastive learning methods either maintain a queue of negative samples over minibatches while only a small portion of them are updated in an iteration, or only use the other examples from the current minibatch as negatives. They could not closely track the change of the learned representation over iterations by updating the entire queue as a whole, or discard the useful information from the past minibatches. Alternatively, we present to directly learn a set of negative adversaries playing against the self-trained representation. Two players, the representation network and negative adversaries, are alternately updated to obtain the most challenging negative examples against which the representation of positive queries will be trained to discriminate. We further show that the negative adversaries are updated towards a weighted combination of positive queries by maximizing the adversarial contrastive loss, thereby allowing them to closely track the change of representations over time. Experiment results demonstrate the proposed Adversarial Contrastive (AdCo) model not only achieves superior performances with little computational overhead to the state-of-the-art contrast models, but also can be pretrained more rapidly with fewer epochs.
On InstaHide, Phase Retrieval, and Sparse Matrix Factorization
Chen, Sitan, Song, Zhao, Zhuo, Danyang
In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible hardness assumption and assuming the distributions generating the public and private data satisfy certain properties. We show that the answer to this appears to be quite subtle and closely related to the average-case complexity of a new multi-task, missing-data version of the classic problem of phase retrieval. Motivated by this connection, we design a provable algorithm that can recover private vectors using only the public vectors and synthetic vectors generated by InstaHide, under the assumption that the private and public vectors are isotropic Gaussian.