Asia
Cognitive Modelling for Predicting Examinee Performance
Wu, Runze (University of Science and Technology of China) | Liu, Qi (University of Science and Technology of China) | Liu, Yuping (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China) | Su, Yu (Anhui USTC iFLYTEK Co., Ltd.) | Chen, Zhigang (Anhui USTC iFLYTEK Co., Ltd., China) | Hu, Guoping (Anhui USTC iFLYTEK Co., Ltd., China)
Cognitive modelling can discover the latent characteristics of examinees for predicting their performance (i.e. scores) on each problem. As cognitive modelling is important for numerous applications, e.g. personalized remedy recommendation, some solutions have been designed in the literature. However, the problem of extracting information from both objective and subjective problems to get more precise and interpretable cognitive analysis is still underexplored. To this end, we propose a fuzzy cognitive diagnosis framework (FuzzyCDF) for examinees' cognitive modelling with both objective and subjective problems. Specifically, to handle the partially correct responses on subjective problems, we first fuzzify the skill proficiency of examinees. Then, we combine fuzzy set theory and educational hypotheses to model the examinees' mastery of the problems. Further, we simulate the generation of examination scores by considering both slip and guess factors. Extensive experiments on three real-world datasets demonstrate that FuzzyCDF can predict examinee performance more effectively, and that the output of FuzzyCDF is also interpretable.
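As a rough illustration of the last two steps, the sketch below combines fuzzified skill proficiencies into a problem-level mastery value and then applies slip and guess factors. The abstract does not give the formulas, so everything here is an assumption: the use of the standard fuzzy intersection (min) for objective problems and fuzzy union (max) for subjective problems, the mixing formula in `expected_score`, and all names and numbers are hypothetical.

```python
def mastery(proficiencies, required_skills, subjective):
    """Combine per-skill proficiencies (fuzzy memberships in [0, 1]).

    Assumption: objective problems need every skill (fuzzy intersection, min),
    subjective problems allow skills to compensate (fuzzy union, max).
    """
    values = [proficiencies[skill] for skill in required_skills]
    return max(values) if subjective else min(values)


def expected_score(eta, slip, guess):
    """Assumed score model: a mastered problem is missed with probability
    `slip`, an unmastered one is answered correctly with probability `guess`."""
    return (1.0 - slip) * eta + guess * (1.0 - eta)


# Hypothetical examinee and problem.
proficiencies = {"algebra": 0.9, "geometry": 0.4}
skills = ["algebra", "geometry"]

eta_objective = mastery(proficiencies, skills, subjective=False)
eta_subjective = mastery(proficiencies, skills, subjective=True)
score_objective = expected_score(eta_objective, slip=0.1, guess=0.2)
```

With these toy numbers the objective problem is limited by the weaker skill, while the subjective problem is credited with the stronger one, which is one way partially correct responses could be represented.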
Analysis of Sampling Algorithms for Twitter
Palguna, Deepan Subrahmanian (Purdue University) | Joshi, Vikas (IBM India Research Lab) | Chakaravarthy, Venkatesan (IBM India Research Lab) | Kothari, Ravi (IBM India Research Lab) | Subramaniam, LV (IBM India Research Lab)
The daily volume of Tweets on Twitter is around 500 million, and the impact of this data on applications such as public safety, opinion mining, and news broadcasting is increasing day by day. Analyzing large volumes of Tweets for various applications requires techniques that scale well with the number of Tweets. In this work we develop a theoretical formulation for sampling Twitter data. We introduce novel statistical metrics to quantify the statistical representativeness of the Tweet sample, and derive sufficient conditions on the number of samples needed for obtaining highly representative Tweet samples. These new statistical metrics quantify the representativeness or goodness of the sample in terms of frequent keyword identification and in terms of restoring public sentiments associated with these keywords. We use uniform random sampling with replacement as our algorithm; sampling could serve as a first step before using other, more sophisticated summarization methods to generate summaries for human use. We show that experiments conducted on real Twitter data agree with our bounds. In these experiments, we also compare different kinds of random sampling algorithms. Our bounds are attractive since they do not depend on the total number of Tweets in the universe. Although our ideas and techniques are specific to Twitter, they could find applications in other areas as well.
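The sampling step can be sketched as follows: draw Tweets uniformly at random with replacement, then check whether the keywords that are frequent in the sample match those that are frequent in the full collection. This is only a toy illustration. The data, the frequency threshold, the sample size, and the function names are all hypothetical, and the code does not implement the paper's metrics or bounds.

```python
import random
from collections import Counter


def sample_with_replacement(tweets, k, seed=0):
    """Draw k tweets uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(tweets) for _ in range(k)]


def keyword_frequencies(tweets):
    """Fraction of tweets containing each keyword (counted once per tweet)."""
    counts = Counter()
    for text in tweets:
        counts.update(set(text.lower().split()))
    return {word: c / len(tweets) for word, c in counts.items()}


def frequent_keywords(tweets, threshold):
    """Keywords appearing in at least `threshold` of the tweets."""
    freqs = keyword_frequencies(tweets)
    return {word for word, f in freqs.items() if f >= threshold}


# Toy universe of 1000 tweets (hypothetical data).
universe = (["flood warning downtown"] * 600
            + ["concert tonight downtown"] * 300
            + ["quiet day"] * 100)

sample = sample_with_replacement(universe, k=2000, seed=1)
frequent_in_universe = frequent_keywords(universe, threshold=0.2)
frequent_in_sample = frequent_keywords(sample, threshold=0.2)
```

Because draws are made with replacement, the sample size is chosen independently of the size of the universe, which mirrors the abstract's remark that the bounds do not depend on the total number of Tweets.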
Mobility Profiling for User Verification with Anonymized Location Data
Lin, Miao (Institute for Infocomm Research, A*STAR) | Cao, Hong (McLaren Applied Technologies, APAC) | Zheng, Vincent (Advanced Digital Sciences Center, University of Illinois at Urbana-Champaign) | Chang, Kevin Chen-Chuan (Advanced Digital Sciences Center, University of Illinois at Urbana-Champaign) | Krishnaswamy, Shonali (Institute for Infocomm Research, A*STAR, Singapore)
Mobile user verification aims to authenticate whether a given user is the legitimate user of a smartphone device. Unlike current methods that commonly require users' active cooperation, such as entering a short PIN or a one-stroke draw pattern, we propose a new passive verification method that imposes minimally on users by modelling users' subtle mobility patterns. Specifically, our method computes statistical ambience features on WiFi and cell tower data from location-anonymized data sets, and then we customize a Hidden Markov Model (HMM) to capture the spatial-temporal patterns of each user's mobility behaviors. Our learned model is subsequently validated and applied to verify a test user in a time-evolving manner through a sequential likelihood test. Experimentally, our method achieves 72% verification accuracy with less than a day's data and a detection rate of 94% for illegitimate users with only 2 hours of selected data. As the first verification method that models users' mobility patterns on location-anonymized smartphone data, our result is significant, showing the good possibility of leveraging such information for live user authentication.
Joint Learning of Constituency and Dependency Grammars by Decomposed Cross-Lingual Induction
Jiang, Wenbin (Chinese Academy of Sciences) | Liu, Qun (Chinese Academy of Sciences and Dublin City University) | Supnithi, Thepchai (National Electronics and Computer Technology Center)
Cross-lingual induction aims to acquire linguistic structures for one language by resorting to annotations from another language. It works well for simple structured prediction problems such as part-of-speech tagging and dependency parsing, but lacks significant progress for more complicated problems such as constituency parsing and deep semantic parsing, mainly due to the structural non-isomorphism between languages. We propose a decomposed projection strategy for cross-lingual induction, where cross-lingual projection is performed in units of the fundamental decisions of the structured prediction. Compared with structured projection, which projects the complete structures, decomposed projection achieves better adaptation to non-isomorphism between languages and efficiently acquires the structured information across languages, thus leading to better performance. For joint cross-lingual induction of constituency and dependency grammars, decomposed cross-lingual induction achieves significant improvements in both constituency and dependency grammar induction.
Multi-Label Active Learning: Query Type Matters
Huang, Sheng-Jun (Nanjing University of Aeronautics and Astronautics) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics) | Zhou, Zhi-Hua (Nanjing University)
Active learning reduces the labeling cost by selectively querying the most valuable information from the annotator. It is especially important for multi-label learning, where the labeling cost is rather high because each object may be associated with multiple labels. Existing multi-label active learning (MLAL) research mainly focuses on the task of selecting instances to be queried. In this paper, we show for the first time that the query type, which decides what information to query for the selected instance, is more important. Based on this observation, we propose a novel MLAL framework to query the relevance ordering of label pairs, which gets richer information from each query and requires less expertise of the annotator. By incorporating a simple selection strategy and a label ranking model into our framework, the proposed approach can reduce the labeling effort of annotators significantly. Experiments on 20 benchmark datasets and a manually labeled real dataset validate that our approach not only achieves superior performance on classification, but also provides accurate ranking for relevant labels.
On the Consistency of AUC Pairwise Optimization
Gao, Wei (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
AUC (Area Under ROC Curve) has been an important criterion widely used in diverse learning tasks. To optimize AUC, many learning approaches have been developed, most working with pairwise surrogate losses. Thus, it is important to study AUC consistency based on minimizing pairwise surrogate losses. In this paper, we introduce generalized calibration for AUC optimization, and prove that it is a necessary condition for AUC consistency. We then provide a new sufficient condition for AUC consistency, and show its usefulness in studying the consistency of various surrogate losses, as well as in the invention of new consistent losses. Further, we derive regret bounds for exponential and logistic losses, and present regret bounds for more general surrogate losses in the realizable setting. Finally, we prove regret bounds that disclose the equivalence between the pairwise exponential loss of AUC and the univariate exponential loss of accuracy.
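To make the pairwise view concrete, the sketch below computes AUC as the fraction of (positive, negative) pairs that a scorer ranks correctly, together with a pairwise exponential surrogate loss averaged over the same pairs. It illustrates the quantities the abstract names, not the paper's analysis. The function names and the toy scores are hypothetical, and ties are counted as half a correct pair by the usual convention.

```python
import math


def split_scores(scores, labels):
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def pairwise_auc(scores, labels):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos, neg = split_scores(scores, labels)
    correct = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def pairwise_exp_loss(scores, labels):
    """Mean of exp(-(s_pos - s_neg)) over all positive-negative pairs."""
    pos, neg = split_scores(scores, labels)
    total = sum(math.exp(-(sp - sn)) for sp in pos for sn in neg)
    return total / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
imperfect = [0.9, 0.8, 0.3, 0.1]   # one positive is ranked below a negative
perfect = [0.9, 0.1, 0.8, 0.3]     # every positive outranks every negative

auc_imperfect = pairwise_auc(imperfect, labels)   # 3 of 4 pairs correct
auc_perfect = pairwise_auc(perfect, labels)
```

The 0/1 pair count inside `pairwise_auc` is not differentiable, which is why learning methods typically minimize a smooth surrogate such as `pairwise_exp_loss` instead.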
Greedy Structure Search for Sum-Product Networks
Dennis, Aaron (Brigham Young University) | Ventura, Dan (Brigham Young University)
Sum-product networks (SPNs) are rooted, directed acyclic graphs (DAGs) of sum and product nodes with well-defined probabilistic semantics. Moreover, exact inference in the distribution represented by an SPN is guaranteed to take linear time in the size of the DAG. In this paper we introduce an algorithm that learns the structure of an SPN using a greedy search approach. It incorporates methods used in a previous SPN structure-learning algorithm, but, unlike the previous algorithm, is not limited to learning tree-structured SPNs. Several proven ideas from circuit complexity theory along with our experimental results provide evidence for the advantages of SPNs with less-restrictive, non-tree structures.
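The linear-time inference claim can be illustrated with a tiny hand-built SPN over two binary variables. The sketch evaluates the network bottom-up and caches each node's value, so every node is computed once even when it is shared between parents (the non-tree case). The structure, weights, and class names are hypothetical, and nothing here is learned by the greedy search the abstract describes.

```python
class Leaf:
    """Indicator for `var == value`; evaluates to 1 if var is unobserved."""
    def __init__(self, var, value):
        self.var, self.value = var, value


class Sum:
    """Weighted sum of children (weights assumed to sum to 1)."""
    def __init__(self, children, weights):
        self.children, self.weights = children, weights


class Product:
    """Product of children over disjoint sets of variables."""
    def __init__(self, children):
        self.children = children


def evaluate(node, evidence, cache=None):
    """Bottom-up evaluation; the cache ensures each node is computed once."""
    if cache is None:
        cache = {}
    if id(node) in cache:
        return cache[id(node)]
    if isinstance(node, Leaf):
        observed = evidence.get(node.var)
        value = 1.0 if observed is None or observed == node.value else 0.0
    elif isinstance(node, Sum):
        value = sum(w * evaluate(c, evidence, cache)
                    for c, w in zip(node.children, node.weights))
    else:
        value = 1.0
        for child in node.children:
            value *= evaluate(child, evidence, cache)
    cache[id(node)] = value
    return value


# The X2 leaves are shared by two sum nodes, so this is a DAG, not a tree.
x1, not_x1 = Leaf("X1", 1), Leaf("X1", 0)
x2, not_x2 = Leaf("X2", 1), Leaf("X2", 0)
mix_a = Sum([x2, not_x2], [0.3, 0.7])
mix_b = Sum([x2, not_x2], [0.8, 0.2])
root = Sum([Product([x1, mix_a]), Product([not_x1, mix_b])], [0.6, 0.4])

joint = evaluate(root, {"X1": 1, "X2": 1})   # P(X1=1, X2=1)
marginal = evaluate(root, {"X2": 1})         # P(X2=1), X1 summed out
```

Leaving a variable out of `evidence` sets both of its indicators to 1, which marginalizes it in the same single pass used for a full joint query.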
Dissecting German Grammar and Swiss Passports: Open-Domain Decomposition of Compositional Entries in Large-Scale Knowledge Repositories
Pasca, Marius (Google Inc.) | Buisman, Hylke (Google Inc.)
This paper presents a weakly supervised method that decomposes potentially compositional topics (Swiss passport) into zero or more constituent topics (Switzerland, Passport), where all topics are entries in a knowledge repository. The method increases the connectivity of the knowledge repository and, more importantly, identifies the constituent topics whose meaning can be later aggregated into the meaning of the compositional topics. By exploiting evidence within Wikipedia articles, the method acquires constituent topics of Freebase topics at precision and recall above 0.60, over multiple human-annotated evaluation sets.
The Complexity of MAP Inference in Bayesian Networks Specified Through Logical Languages
Maua, Denis Deratani (Universidade de Sao Paulo) | Campos, Cassio Polpo de (Queen's University Belfast) | Cozman, Fabio Gagliardi (Universidade de Sao Paulo)
We study the computational complexity of finding maximum a posteriori configurations in Bayesian networks whose probabilities are specified by logical formulas. This approach leads to a fine-grained study in which local information such as context-sensitive independence and determinism can be considered. It also allows us to characterize more precisely the jump from tractability to NP-hardness and beyond, and to consider the complexity introduced by evidence alone.
A Simple Probabilistic Extension of Modal Mu-calculus
Liu, Wanwei (National University of Defense Technology) | Song, Lei (University of Technology Sydney) | Wang, Ji (National University of Defense Technology) | Zhang, Lijun (Chinese Academy of Sciences)
Probabilistic systems are an important theme in the AI domain. As a specification language, PCTL is the most frequently used logic for reasoning about probabilistic properties. In this paper, we present a natural and succinct probabilistic extension of Mu-calculus, another prominent logic in concurrency theory. We study its relationship with PCTL. Surprisingly, its expressiveness is highly orthogonal to that of PCTL. The proposed logic captures some useful properties which cannot be expressed in PCTL. We investigate the model checking and satisfiability problems, and show that the model checking problem is in UP and co-UP, and that satisfiability checking can be decided by reduction to solving parity games. This is in contrast to PCTL as well, whose satisfiability checking is still an open problem.