AITopics

Predicting human activities is important for improving recommender systems or analyzing social relationships among users. Those human activities are usually repre- sented as multi-object relationships (e.g. user’s tagging activities for items or user’s tweeting activities at some locations). Since multi-object relationships are naturally represented as a tensor, tensor factorization is becom- ing more important for predicting users’ possible ac- tivities. However, its prediction accuracy is weak for ambiguous and/or sparsely observed objects. Our so- lution, Semantic data Representation for Tensor Fac- torization (SRTF), tackles these problems by incorpo- rating semantics into tensor factorization based on the following ideas: (1) It first links objects to vocabu- laries/taxonomies and resolves the ambiguity caused by objects that can be used for multiple purposes. (2) It next links objects to composite classes that merge classes in different kinds of vocabularies/taxonomies (e.g. classes in vocabularies for movie genres and those for directors) to avoid low prediction accuracy caused by rough-grained semantics. (3) It then lifts sparsely observed objects into their classes to solve the sparsity problem for rarely observed objects. To the best of our knowledge, this is the first study that leverages seman- tics to inject expert knowledge into tensor factorization. Experiments show that SRTF achieves up to 10% higher accuracy than state-of-the-art methods.

artificial intelligence, factorization, machine learning, (13 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Genre: Research Report (0.66)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Mixing-Time Regularized Policy Gradient

Morimura, Tetsuro (IBM Research - Tokyo) | Osogami, Takayuki (IBM Research - Tokyo) | Shirai, Tomoyuki (Kyushu University)

Policy gradient reinforcement learning (PGRL) has been receiving substantial attention as a mean for seeking stochastic policies that maximize cumulative reward. However, the learning speed of PGRL is known to decrease substantially when PGRL explores the policies that give the Markov chains having long mixing time. We study a new approach of regularizing how the PGRL explores the policies by the use of the hitting time of the Markov chains. The hitting time gives an upper bound on the mixing time, and the proposed approach improves the learning efficiency by keeping the mixing time of the Markov chains short. In particular, we propose a method of temporal-difference learning for estimating the gradient of the hitting time. Numerical experiments show that the proposed method outperforms conventional methods of PGRL.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.77)

Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network

Luo, Yong (Peking University) | Tang, Jian (Peking University) | Yan, Jun (Microsoft Research Asia) | Xu, Chao (Peking University) | Chen, Zheng (Microsoft Research Asia)

Word embedding aims to learn a continuous representation for each word. It attracts increasing attention due to its effectiveness in various tasks such as named entity recognition and language modeling. Most existing word embedding results are generally trained on one individual data source such as news pages or Wikipedia articles. However, when we apply them to other tasks such as web search, the performance suffers. To obtain a robust word embedding for different applications, multiple data sources could be leveraged. In this paper, we proposed a two-side multimodal neural network to learn a robust word embedding from multiple data sources including free text, user search queries and search click-through data. This framework takes the word embeddings learned from different data sources as pre-train, and then uses a two-side neural network to unify these embeddings. The pre-trained embeddings are obtained by adapting the recently proposed CBOW algorithm. Since the proposed neural network does not need to re-train word embeddings for a new task, it is highly scalable in real world problem solving. Besides, the network allows weighting different sources differently when applied to different application tasks. Experiments on two real-world applications including web search ranking and word similarity measuring show that our neural network with multiple sources outperforms state-of-the-art word embedding algorithm with each individual source. It also outperforms other competitive baselines using multiple sources.

artificial intelligence, machine learning, natural language, (21 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country: Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Liu, Xinwang (National University of Defense Technology, Changsha) | Wang, Lei (University of Wollongong) | Zhang, Jian (University of Technology Sydney) | Yin, Jianping (National University Defense Technology, Changsha)

Sample-adaptive Multiple Kernel Learning

Existing multiple kernel learning (MKL) algorithms \textit{indiscriminately} apply a same set of kernel combination weights to all samples. However, the utility of base kernels could vary across samples and a base kernel useful for one sample could become noisy for another. In this case, rigidly applying a same set of kernel combination weights could adversely affect the learning performance. To improve this situation, we propose a sample-adaptive MKL algorithm, in which base kernels are allowed to be adaptively switched on/off with respect to each sample. We achieve this goal by assigning a latent binary variable to each base kernel when it is applied to a sample. The kernel combination weights and the latent variables are jointly optimized via margin maximization principle. As demonstrated on five benchmark data sets, the proposed algorithm consistently outperforms the comparable ones in the literature.

artificial intelligence, machine learning, optimization problem, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
Oceania > Australia (0.04)
Asia > China > Hunan Province > Changsha (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Partial Multi-View Clustering

Li, Shao-Yuan (Nanjing University) | Jiang, Yuan (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)

Real data are often with multiple modalities or comingfrom multiple channels, while multi-view clusteringprovides a natural formulation for generating clustersfrom such data. Previous studies assumed that each exampleappears in all views, or at least there is one viewcontaining all examples. In real tasks, however, it is oftenthe case that every view suffers from the missing ofsome data and therefore results in many partial examples,i.e., examples with some views missing. In this paper,we present possibly the first study on partial multiviewclustering. Our proposed approach, PVC, worksby establishing a latent subspace where the instancescorresponding to the same example in different viewsare close to each other, and similar instances (belongingto different examples) in the same view should bewell grouped. Experiments on two-view data demonstratethe advantages of our proposed approach.

data mining, information, machine learning, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > Illinois (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Feature-Cost Sensitive Learning with Submodular Trees of Classifiers

Kusner, Matt (Washington University in St. Louis) | Chen, Wenlin (Washington University in St. Louis) | Zhou, Quan (Tsinghua University) | Xu, Zhixiang (Eddie) (Washington University in St. Louis) | Weinberger, Kilian (Washington University in St. Louis) | Chen, Yixin (Washington University in St. Louis)

During the past decade, machine learning algorithms have become commonplace in large-scale real-world industrial applications. In these settings, the computation time to train and test machine learning algorithms is a key consideration. At training-time the algorithms must scale to very large data set sizes.At testing-time, the cost of feature extraction can dominate the CPU runtime. Recently, a promising method was proposed to account for the feature extraction cost at testing time, called Cost-sensitive Tree of Classifiers (CSTC). Although the CSTC problem is NP-hard, the authors suggest an approximation through a mixed-norm relaxation across many classifiers. This relaxation is slow to train and requires involved optimization hyperparameter tuning. We propose a different relaxation using approximate submodularity, called Approximately Submodular Tree of Classifiers (ASTC). ASTC is much simpler to implement, yields equivalent results but requires no optimization hyperparameter tuning and is up to two orders of magnitude faster to train.

artificial intelligence, data mining, machine learning, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > Iowa > Dubuque County > Dubuque (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (0.54)
Summary/Review (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Data Science > Data Mining > Feature Extraction (0.54)

Constructing Symbolic Representations for High-Level Planning

Konidaris, George (MIT CSAIL) | Kaelbling, Leslie (MIT) | Lozano-Perez, Tomas (MIT)

We consider the problem of constructing a symbolic description of a continuous, low-level environment for use in planning. We show that symbols that can represent the preconditions and effects of an agent's actions are both necessary and sufficient for high-level planning. This eliminates the symbol design problem when a representation must be constructed in advance, and in principle enables an agent to autonomously learn its own symbolic representations. The resulting representation can be converted into PDDL, a canonical high-level planning representation that enables very fast planning.

artificial intelligence, machine learning, representation, (19 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Singapore (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Kong, Deguang (University of Texas Arlington) | Ding, Chris (University of Texas Arlington)

Non-Convex Feature Learning via Lp,inf Operator

We present a feature selection method for solving sparse regularization problem, which hasa composite regularization of $\ell_p$ norm and $\ell_{\infty}$ norm.We use proximal gradient method to solve this \L1inf operator problem, where a simple but efficient algorithm is designed to minimize a relatively simple objective function, which contains a vector of $\ell_2$ norm and $\ell_\infty$ norm. Proposed method brings some insight for solving sparsity-favoring norm, andextensive experiments are conducted to characterize the effect of varying $p$ and to compare with other approaches on real world multi-class and multi-label datasets.

artificial intelligence, machine learning, selection, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas > Tarrant County > Arlington (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Kocák, Tomáš (INRIA Lille - Nord Europe) | Valko, Michal (INRIA Lille - Nord Europe) | Munos, Rémi (INRIA Lille - Nord Europe and Microsoft Research, New England, USA) | Agrawal, Shipra (Microsoft Research, Bangalore)

Spectral Thompson Sampling

Thompson Sampling (TS) has surged a lot of interest due to its good empirical performance, in particular in the computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze SpectralTS algorithm for a bandit problem, where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of the neighboring nodes are assumed to be similar. Although the setting has application both in recommender systems and advertising, the traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver the analysis showing that the regret of SpectralTS scales as d\sqrt(T \ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d\sqrt(T \ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.

artificial intelligence, data mining, machine learning, (18 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)

Kanagawa, Motonobu (Graduate University for Advanced Studies) | Nishiyama, Yu (The Institute of Statistical Mathematics) | Gretton, Arthur (University College London) | Fukumizu, Kenji (The Institute of Statistical Mathematics)

Monte Carlo Filtering Using Kernel Embedding of Distributions

Recent advances of kernel methods have yielded a framework for representing probabilities using a reproducing kernel Hilbert space, called kernel embedding of distributions. In this paper, we propose a Monte Carlo filtering algorithm based on kernel embeddings. The proposed method is applied to state-space models where sampling from the transition model is possible, while the observation model is to be learned from training samples without assuming a parametric model. As a theoretical basis of the proposed method, we prove consistency of the Monte Carlo method combined with kernel embeddings. Experimental results on synthetic models and real vision-based robot localization confirm the effectiveness of the proposed approach.

artificial intelligence, bayesian inference, machine learning, (19 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Europe > United Kingdom (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)