Asia
Labeling Complicated Objects: Multi-View Multi-Instance Multi-Label Learning
Nguyen, Cam-Tu (Nanjing University) | Wang, Xiaoliang (Nanjing University) | Liu, Jing (Institute of Automation Chinese Academy of Sciences) | Zhou, Zhi-Hua (Nanjing University)
Multi-Instance Multi-Label (MIML) is a learning framework where an example is associated with multiple labels and represented by a set of feature vectors (multiple instances). In the formalization of MIML learning, instances come from a single source (single view). To leverage multiple information sources (multi-view), we develop a multi-view MIML framework based on hierarchical Bayesian Network, and derive an effective learning algorithm based on variational inference. The model can naturally deal with examples in which some views could be absent (partial examples). On multi-view datasets, it is shown that our method is better than other multi-view and single-view approaches particularly in the presence of partial examples. On single-view benchmarks, extensive evaluation shows that our method is highly competitive or better than other MIML approaches on labeling examples and instances. Moreover, our method can effectively handle datasets with a large number of labels.
Detecting Information-Dense Texts in Multiple News Domains
Yang, Yinfei (Amazon Inc.) | Nenkova, Ani (University of Pennsylvania)
We introduce the task of identifying information-dense texts,which report important factual information in direct, succinct manner. We describe a procedure that allows us to label automatically a large training corpus of New York Times texts.We train a classifier based on lexical, discourse and unlexicalized syntactic features and test its performance on a set of manually annotated articles from business, U.S. international relations, sports and science domains. Our results indicate that the task is feasible and that both syntactic and lexicalfeatures are highly predictive for the distinction. We observe considerable variation of prediction accuracy across domains and find that domain-specific models are more accurate.
Lifetime Lexical Variation in Social Media
Liao, Lizi (Beijing Institute of Technology) | Jiang, Jing (Singapore Management University) | Ding, Ying (Singapore Management University) | Huang, Heyan (Beijing Institute of Technology) | Lim, Ee-Peng (Singapore Management University)
As the rapid growth of online social media attracts a large number of Internet users, the large volume of content generated by these users also provides us with an opportunity to study the lexical variation of people of different ages. In this paper, we present a latent variable model that jointly models the lexical content of tweets and Twitter users’ ages. Our model inherently assumes that a topic has not only a word distribution but also an age distribution. We propose a Gibbs-EM algorithm to perform inference on our model. Empirical evaluation shows that our model can learn meaningful age-specific topics such as “school” for teenagers and “health” for older people. Our model can also be used for age prediction and performs better than a number of baseline methods.
On Dataless Hierarchical Text Classification
Song, Yangqiu (University of Illinois at Urbana-Champaign) | Roth, Dan (University of Illinois at Urbana-Champaign)
In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.
Learning Word Representation Considering Proximity and Ambiguity
Qiu, Lin (Shanghai Jiao Tong University) | Cao, Yong (Microsoft Research) | Nie, Zaiqing (Microsoft Research) | Yu, Yong (Shanghai Jiao Tong University) | Rui, Yong (Microsoft Research)
Distributed representations of words (aka word embedding) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, have produced particularly impressive results, significantly speeding up the training process to enable word representation learning from large-scale data. However, both CBOW and Skip-gram do not pay enough attention to word proximity in terms of model or word ambiguity in terms of linguistics. In this paper, we propose Proximity-Ambiguity Sensitive (PAS) models (i.e. PAS CBOW and PAS Skip-gram) to produce high quality distributed representations of words considering both word proximity and ambiguity. From the model perspective, we introduce proximity weights as parameters to be learned in PAS CBOW and used in PAS Skip-gram. By better modeling word proximity, we reveal the strength of pooling-structured neural networks in word representation learning. The proximity-sensitive pooling layer can also be applied to other neural network applications that employ pooling layers. From the linguistics perspective, we train multiple representation vectors per word. Each representation vector corresponds to a particular group of POS tags of the word. By using PAS models, we achieved a 16.9% increase in accuracy over state-of-the-art models.
SOML: Sparse Online Metric Learning with Application to Image Retrieval
Gao, Xingyu (Chinese Academy of Sciences and Nanyang Technological University) | Hoi, Steven C.H. (Nanyang Technological University) | Zhang, Yongdong (Chinese Academy of Sciences) | Wan, Ji (Chinese Academy of Sciences and Nanyang Technological University) | Li, Jintao (Chinese Academy of Sciences)
Image similarity search plays a key role in many multimediaapplications, where multimedia data (such as images and videos) areusually represented in high-dimensional feature space. In thispaper, we propose a novel Sparse Online Metric Learning (SOML)scheme for learning sparse distance functions from large-scalehigh-dimensional data and explore its application to imageretrieval. In contrast to many existing distance metric learningalgorithms that are often designed for low-dimensional data, theproposed algorithms are able to learn sparse distance metrics fromhigh-dimensional data in an efficient and scalable manner. Ourexperimental results show that the proposed method achieves betteror at least comparable accuracy performance than thestate-of-the-art non-sparse distance metric learning approaches, butenjoys a significant advantage in computational efficiency andsparsity, making it more practical for real-world applications.
Accurate Household Occupant Behavior Modeling Based on Data Mining Techniques
Baptista, Márcia L. (Universidade de Lisboa) | Fang, Anjie (National Institute of Informatics / University of Bristol) | Prendinger, Helmut (National Institute of Informatics) | Prada, Rui (Universidade de Lisboa) | Yamaguchi, Yohei (Osaka University)
An important requirement of household energy simulation models is their accuracy in estimating energy demand and its fluctuations. Occupant behavior has a major impact upon energy demand. However, Markov chains, the traditional approach to model occupant behavior, (1) has limitations in accurately capturing the coordinated behavior of occupants and (2) is prone to over-fitting. To address these issues, we propose a novel approach that relies on a combination of data mining techniques. The core idea of our model is to determine the behavior of occupants based on nearest neighbor comparison over a database of sample data. Importantly, the model takes into account features related to the coordination of occupants' activities. We use a customized distance function suited for mixed categorical and numerical data. Further, association rule learning allows us to capture the coordination between occupants. Using real data from four households in Japan we are able to show that our model outperforms the traditional Markov chain model with respect to occupant coordination and generalization of behavior patterns.
Computing General First-Order Parallel and Prioritized Circumscription
Wan, Hai (Sun Yat-sen University) | Xiao, Zhanhao (Sun Yat-sen University) | Yuan, Zhenfeng (Sun Yat-sen University) | Zhang, Heng (University of Western Sydney) | Zhang, Yan (University of Western Sydney)
This paper focuses on computing general first-order parallel and prioritized circumscription with varying constants. We propose linear translations from general first-order circumscription to first-order theories under stable model semantics over arbitrary structures, including Tr_v for parallel circumscription and Tr^s_v for conjunction of parallel circumscriptions (further for prioritized circumscription). To improve the efficiency, we give an optimization \Gamma_{\exists} to reduce logic programs in size when eliminating existential quantifiers during the translations. Based on these results, a general first-order circumscription solver, named cfo2lp, is developed by calling answer set programming (ASP) solvers. Using circuit diagnosis problem and extended stable marriage problem as benchmarks, we compare cfo2lp with a propositional circumscription solver circ2dlp and an ASP solver with complex optimization metasp on efficiency. Experimental results demonstrate that for problems represented by first-order circumscription naturally and intuitively, cfo2lp can compute all solutions over finite structures. We also apply our approach to description logics with circumscription and repairs in inconsistent databases, which can be handled effectively.
A Characterization of the Single-Peaked Single-Crossing Domain
Elkind, Edith (University of Oxford, UK) | Faliszewski, Piotr (AGH University) | Skowron, Piotr (University of Warsaw)
In other words, there is no perfect voting rule that one domains may admit efficient algorithms for social choice could always use, independently of the circumstances. However, problems that are hard for general preferences. This observation this result holds under the assumption that there are no has recently led to a new wave of interest in constraints on the voters' preferences. Thus, a common strategy restricted domains within the computational social choice to circumvent Arrow's theorem is to consider restricted community (Conitzer 2009; Walsh 2007; Faliszewski et al. preference domains, i.e., assume that the voters' preferences 2011; Brandt et al. 2010; Faliszewski, Hemaspaandra, and have additional structure. It may then be possible to show Hemaspaandra 2011; Betzler, Slinko, and Uhlmann 2013; that various negative consequences of Arrow's theorem do Cornaz, Galand, and Spanjaard 2012; 2013; Skowron et al. not hold.
The Fisher Market Game: Equilibrium and Welfare
Brânzei, Simina (Aarhus University) | Chen, Yiling (Harvard University) | Deng, Xiaotie (Shanghai Jiao Tong University) | Filos-Ratsikas, Aris (Aarhus University) | Frederiksen, Søren Kristoffer Stiil (Aarhus University) | Zhang, Jie (University of Oxford)
The Fisher market model is one of the most fundamental resource allocation models in economics. In a Fisher market, the prices and allocations of goods are determined according to the preferences and budgets of buyers to clear the market. In a Fisher market game, however, buyers are strategic and report their preferences over goods; the market-clearing prices and allocations are then determined based on their reported preferences rather than their real preferences. We show that the Fisher market game always has a pure Nash equilibrium, for buyers with linear, Leontief, and Cobb-Douglas utility functions, which are three representative classes of utility functions in the important Constant Elasticity of Substitution (CES) family. Furthermore, to quantify the social efficiency, we prove Price of Anarchy bounds for the game when the utility functions of buyers fall into these three classes respectively.