Goto

Nara Institute of Science and Technology


Data-Dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion

AAAI Conferences

Embedding-based methods for knowledge base completion (KBC) learn representations of entities and relations in a vector space, along with a scoring function that estimates the likelihood of relations between entities. The learnable class of scoring functions is designed to be expressive enough to cover a variety of real-world relations, but this expressiveness comes at the cost of an increased number of parameters. In particular, parameters in these methods are superfluous for relations that are either symmetric or antisymmetric. To mitigate this problem, we propose a new L1 regularizer for Complex Embeddings, one of the state-of-the-art embedding-based methods for KBC. This regularizer promotes symmetry or antisymmetry of the scoring function on a relation-by-relation basis, in accordance with the observed data. Our empirical evaluation shows that the proposed method outperforms the original Complex Embeddings and other baseline methods on the FB15k dataset.
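As a concrete illustration, below is a minimal numpy sketch of the ComplEx scoring function together with a hypothetical per-relation L1 penalty in the spirit described above: a purely real relation vector makes the score symmetric in the two entities, a purely imaginary one makes it antisymmetric, so penalizing the smaller of the two component norms nudges each relation toward whichever degenerate form its data supports. The exact penalty form here is our assumption, not necessarily the paper's regularizer.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx score Re(<w_r, e_s, conj(e_o)>) for a triple (s, r, o).
    If Im(w_r) == 0 the score is symmetric in (s, o);
    if Re(w_r) == 0 it is antisymmetric."""
    return np.real(np.sum(w_r * e_s * np.conj(e_o)))

def sym_antisym_l1(w_r):
    """Hypothetical data-dependent L1 penalty: the cost of moving w_r
    to the nearer of the two degenerate forms (purely real => symmetric
    relation, purely imaginary => antisymmetric relation)."""
    return min(np.abs(w_r.real).sum(), np.abs(w_r.imag).sum())
```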


Temporal-Enhanced Convolutional Network for Person Re-Identification

AAAI Conferences

We propose a new neural network called Temporal-enhanced Convolutional Network (T-CN) for video-based person re-identification. For each video sequence of a person, a spatial convolutional subnet is first applied to each frame to represent appearance information, and a temporal convolutional subnet then links small ranges of continuous frames to extract local motion information. Together, these spatial and temporal convolutions construct our T-CN representation. Finally, a recurrent network is utilized to further explore global dynamics, followed by temporal pooling to generate an overall feature vector for the whole sequence. In the training stage, a Siamese network architecture is adopted to jointly optimize all the components with losses covering both identification and verification. In the testing stage, our network generates an overall discriminative feature representation for each input video sequence (whose length may vary considerably) in a feed-forward way, and even simple Euclidean-distance matching yields good re-identification results. Experiments on the most widely used benchmark datasets demonstrate the superiority of our proposal in comparison with the state of the art.
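For orientation, here is a heavily simplified PyTorch sketch of the pipeline just described (per-frame spatial CNN, temporal 1-D convolution over neighbouring frames, recurrent layer, temporal pooling); all layer shapes and sizes are our assumptions, and the Siamese identification/verification losses are omitted.

```python
import torch
import torch.nn as nn

class TCNSketch(nn.Module):
    """Simplified T-CN-style pipeline; layer sizes are assumptions."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # spatial subnet: per-frame appearance features
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        # temporal subnet: 1-D conv linking small ranges of frames
        self.temporal = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # recurrent layer for global dynamics
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)

    def forward(self, video):                  # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        f = self.spatial(video.flatten(0, 1))  # (B*T, D) per-frame features
        f = f.view(B, T, -1)
        f = self.temporal(f.transpose(1, 2)).transpose(1, 2)  # local motion
        h, _ = self.rnn(f)                     # global dynamics
        return h.mean(dim=1)                   # temporal pooling -> one vector
```

At test time, the Euclidean distance between two such sequence vectors would serve directly as the matching score.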


Artificial Intelligence and Expertise: The Two Faces of the Same Artificial Performance Coin

AAAI Conferences

The field of Artificial Intelligence (AI) is fertile: it is at the same time the root of the dreams and deceptions of many people, a common feature in science fiction, and various technical projects in many domains of application. Although we may appreciate the rich emotions and ideas brought by a concept such as AI, some people are seriously working on it in an attempt to produce autonomous agents able to meet the various needs of different users. These projects, however, have faced several troubles and unfulfilled promises in the history of the field, leading to shortenings of funding and years of research efforts lost (Franklin 2014). Despite the presence of "intrepid researchers" to advance the field, from an industrial point of view such projects were abandoned and considered as failures. To ensure we do not forget relevant aspects of AI, we present some key works which have already focused on defining (artificial) intelligence in Section 2. We then highlight the potential lack of cross-fertilisation they may be subject to in Section 3 and consider the definition of human expertise to draw a definition of human intelligence in Section 4. Next, we generalise these definitions to cover also artificial agents in Section 5 and provide more details about the domain-generic data and processes of our definition of intelligence in Section 6. We rely further on the expertise field in Section 7 by describing three kinds of measures of expertise, mapping them to existing measures of intelligence, and suggesting directions to investigate. Finally, Section 8 expands the discussion to a novel conception of the field of AI as a …


Non-Linear Similarity Learning for Compositionality

AAAI Conferences

Many NLP applications rely on the existence of similarity measures over text data. Although word vector space models provide good similarity measures between words, phrasal and sentential similarities derived from composition of individual words remain a difficult problem. In this paper, we propose a new method of non-linear similarity learning for semantic compositionality. In this method, word representations are learned through the similarity learning of sentences in a high-dimensional space with kernel functions. On the task of predicting the semantic similarity of two sentences (SemEval 2014, Task 1), our method outperforms linear baselines, feature engineering approaches, and recursive neural networks, and achieves competitive results with long short-term memory models.
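As a rough sketch of the idea (not the paper's exact model), the snippet below composes word vectors into sentence vectors by simple averaging, which is our simplifying assumption, and scores a sentence pair with a Gaussian (RBF) kernel; in the paper, the word representations themselves are learned so that such kernel similarities match gold similarity judgments.

```python
import numpy as np

def sentence_similarity(s1, s2, word_vecs, gamma=1.0):
    """Compose word vectors by averaging (our simplifying assumption)
    and compare the two sentence vectors with a Gaussian kernel."""
    v1 = np.mean([word_vecs[w] for w in s1], axis=0)
    v2 = np.mean([word_vecs[w] for w in s2], axis=0)
    return float(np.exp(-gamma * np.sum((v1 - v2) ** 2)))

# Usage: word_vecs maps tokens to vectors; training would adjust those
# vectors so the kernel scores match gold sentence-similarity ratings.
# sim = sentence_similarity("a dog runs".split(), "the dog ran".split(), word_vecs)
```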


Localized Centering: Reducing Hubness in Large-Sample Data

AAAI Conferences

Hubness has recently been identified as a problematic phenomenon occurring in high-dimensional spaces. In this paper, we address a different type of hubness that occurs when the number of samples is large. We investigate the difference between the hubness in high-dimensional data and that in large-sample data. One finding is that centering, which is known to reduce the former, does not work for the latter. We then propose a new hub-reduction method, called localized centering. It is an extension of centering, yet works effectively for both types of hubness. Using real-world datasets consisting of a large number of documents, we demonstrate that the proposed method improves the accuracy of k-nearest neighbor classification.
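The following numpy sketch shows one plausible reading of localized centering: each object's similarity scores are discounted by its "local affinity", its similarity to the centroid of its own k nearest neighbours, so objects sitting in locally dense regions (potential hubs) lose their advantage. The exact functional form, including the flattening exponent gamma, is an assumption on our part.

```python
import numpy as np

def localized_centering_sim(X, k=10, gamma=1.0):
    """Sketch of localized centering (exact form is our assumption).
    X: (n, d) row-normalized data. Returns an (n, n) matrix where
    S[q, x] = <x_q, x_x> - <x_x, c_k(x_x)>**gamma, i.e. raw similarity
    discounted by each object's affinity to its local centroid."""
    S = X @ X.T                                # raw inner-product similarity
    nn = np.argsort(-S, axis=1)[:, 1:k + 1]    # k nearest neighbours (skip self)
    c = X[nn].mean(axis=1)                     # local centroid c_k(x) per point
    affinity = np.sum(X * c, axis=1)           # sim(x, c_k(x))
    return S - affinity ** gamma               # discount column-wise
```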


Non-Linear Regression for Bag-of-Words Data via Gaussian Process Latent Variable Set Model

AAAI Conferences

Gaussian process (GP) regression is a widely used method for non-linear prediction. The performance of GP regression depends on whether it can properly capture the covariance structure of target variables, which is represented by kernels between input data. However, when the input is represented as a set of features, e.g., bag-of-words, it is difficult to calculate desirable kernel values because the co-occurrence of different but relevant words cannot be reflected in the kernel calculation. To overcome this problem, we propose a Gaussian process latent variable set model (GP-LVSM), which is a non-linear regression model effective for bag-of-words data. With the GP-LVSM, a latent vector is associated with each word, and each document is represented as a distribution of the latent vectors for words appearing in the document. We efficiently represent the distributions by using the framework of kernel embeddings of distributions, which can hold high-order moment information of distributions without the need for explicit density estimation. By learning latent vectors so as to maximize the posterior probability, kernels that reflect relations between words are obtained, and words are also visualized in a low-dimensional space. In experiments using 25 item review datasets, we demonstrate the effectiveness of the GP-LVSM in prediction and visualization.
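The central quantity is a kernel between documents computed as the inner product of their empirical kernel mean embeddings; with an RBF base kernel, this reduces to the average pairwise kernel value between the latent vectors of the two documents' words. A minimal sketch (variable names are ours):

```python
import numpy as np

def doc_kernel(words1, words2, latent, gamma=1.0):
    """Inner product of empirical kernel mean embeddings: the average
    pairwise RBF kernel between latent vectors of the two documents'
    words. `latent` is a (vocab_size, d) array of learned vectors,
    indexed by the word-id lists words1 and words2."""
    X, Y = latent[words1], latent[words2]               # (n1, d), (n2, d)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return float(np.exp(-gamma * sq).mean())
```

This document-level kernel would then serve as the GP covariance, with the latent word vectors learned by maximizing the posterior.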


Investigating the Effectiveness of Laplacian-Based Kernels in Hub Reduction

AAAI Conferences

A “hub” is an object closely surrounded by, or very similar to, many other objects in the dataset. Recent studies by Radovanović et al. indicate that in high-dimensional spaces, hubs almost always emerge, and objects close to the data centroid tend to become hubs. In this paper, we show that the family of kernels based on the graph Laplacian makes all objects in the dataset equally similar to the centroid, and these kernels are thus expected to produce fewer hubs when used as a similarity measure. We investigate this hypothesis using both synthetic and real-world data. It turns out that these kernels suppress hubs in some cases but not always, and the results seem to be affected by the size of the data, a factor not discussed previously. However, for the datasets in which hubs are indeed reduced by the Laplacian-based kernels, these kernels work well in ranking and classification tasks. This result suggests that the number of hubs, which can be readily computed in an unsupervised fashion, can serve as a yardstick of whether Laplacian-based kernels work effectively for a given dataset.
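To make the centroid property concrete, the sketch below builds one member of the family, the regularized Laplacian kernel K = (I + beta*L)^-1 (our choice of member and of beta): since L1 = 0, every row of K sums to the same value, so each object's feature-space similarity to the data centroid, which is the mean of its row of K, is identical.

```python
import numpy as np

def regularized_laplacian_kernel(W, beta=0.1):
    """One member of the Laplacian-based family: K = (I + beta*L)^-1,
    where L = D - W is the graph Laplacian of similarity matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(np.eye(len(W)) + beta * L)

# In the kernel's feature space, object i's similarity to the data
# centroid is the mean of row i of K.  Because L @ ones == 0, we have
# K @ ones == ones, so these row means are all equal and no object is
# privileged as a hub:
# centroid_sim = regularized_laplacian_kernel(W).mean(axis=1)
```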


Transfer Learning for Multiple-Domain Sentiment Analysis — Identifying Domain Dependent/Independent Word Polarity

AAAI Conferences

Sentiment analysis is the task of determining the attitude (positive or negative) of documents. While the polarity of words in the documents is informative for this task, the polarity of some words cannot be determined without domain knowledge. Detecting word polarity thus poses a challenge for multiple-domain sentiment analysis. Previous approaches tackle this problem with transfer learning techniques, but they cannot handle multiple source domains and multiple target domains. This paper proposes a novel Bayesian probabilistic model to handle multiple source and multiple target domains. In this model, each word is associated with three factors: a domain label, domain dependence/independence, and word polarity. We derive an efficient algorithm using Gibbs sampling for inferring the parameters of the model from both labeled and unlabeled texts. Using real data, we demonstrate the effectiveness of our model in a document polarity classification task, compared with a method that does not consider the differences between domains. Moreover, our method can also tell whether each word's polarity is domain-dependent or domain-independent. This feature allows us to construct a word polarity dictionary for each domain.
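The per-word Gibbs update suggested by this generative structure might look roughly as follows; the four-way state space and the count bookkeeping are our own simplification, not the paper's exact conditional distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_word_state(counts_indep, counts_dep, domain, prior=1.0):
    """Hypothetical collapsed-Gibbs update for one word occurrence:
    jointly resample (domain-independent vs. domain-dependent,
    positive vs. negative) in proportion to smoothed counts."""
    p = np.array([counts_indep[0] + prior,         # independent, positive
                  counts_indep[1] + prior,         # independent, negative
                  counts_dep[domain][0] + prior,   # dependent,   positive
                  counts_dep[domain][1] + prior])  # dependent,   negative
    p /= p.sum()
    state = rng.choice(4, p=p)
    return state // 2, state % 2    # (is_domain_dependent, polarity)
```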