Goto

Collaborating Authors

 Government


Sensing Subjective Well-being from Social Media

arXiv.org Artificial Intelligence

Subjective Well-being(SWB), which refers to how people experience the quality of their lives, is of great use to public policy-makers as well as economic, sociological research, etc. Traditionally, the measurement of SWB relies on time-consuming and costly self-report questionnaires. Nowadays, people are motivated to share their experiences and feelings on social media, so we propose to sense SWB from the vast user generated data on social media. By utilizing 1785 users' social media data with SWB labels, we train machine learning models that are able to "sense" individual SWB from users' social media. Our model, which attains the state-by-art prediction accuracy, can then be used to identify SWB of large population of social media users in time with very low cost.


Human Activity Learning and Segmentation using Partially Hidden Discriminative Models

arXiv.org Machine Learning

Learning and understanding the typical patterns in the daily activities and routines of people from low-level sensory data is an important problem in many application domains such as building smart environments, or providing intelligent assistance. Traditional approaches to this problem typically rely on supervised learning and generative models such as the hidden Markov models and its extensions. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support supervised training is often extremely expensive. In this paper, we propose a new approach based on semi-supervised training of partially hidden discriminative models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that these models allow us to incorporate both labeled and unlabeled data for learning, and at the same time, provide us with the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart, the partially hidden Markov model, even when a substantial amount of labels are unavailable.


Impact of regularization on Spectral Clustering

arXiv.org Machine Learning

The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et. al (2012). Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM), and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter $\tau$ is large we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than $\log n$) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a `bias-variance'-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigen gap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique \textit{DKest} (standing for estimated Davis-Kahan bounds) for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.


Propagating Regular Counting Constraints

AAAI Conferences

Constraints over finite sequences of variables are ubiquitous in sequencing and timetabling. This led to general modelling techniques and generic propagators, often based on deterministic finite automata (DFA) and their extensions. We consider counter-DFAs (cDFA), which provide concise models for regular counting constraints, that is constraints over the number of times a regular-language pattern occurs in a sequence. We show how to enforce domain consistency in polynomial time for at-most and at-least regular counting constraints based on the frequent case of a cDFA with only accepting states and a single counter that can be increased by transitions. We also show that the satisfaction of exact regular counting constraints is NP-hard and that an incomplete propagator for exact regular counting constraints is faster and provides more pruning than the existing propagator from (Beldiceanu, Carlsson, and Petit 2004). Finally, by avoiding the unrolling of the cDFA used by COSTREGULAR, the space complexity reduces from O(n · |Σ| · |Q|) to O(n · (|Σ| + |Q|)), where Σ is the alphabet and Q the state set of the cDFA.


Detecting Information-Dense Texts in Multiple News Domains

AAAI Conferences

We introduce the task of identifying information-dense texts,which report important factual information in direct, succinct manner. We describe a procedure that allows us to label automatically a large training corpus of New York Times texts.We train a classifier based on lexical, discourse and unlexicalized syntactic features and test its performance on a set of manually annotated articles from business, U.S. international relations, sports and science domains. Our results indicate that the task is feasible and that both syntactic and lexicalfeatures are highly predictive for the distinction. We observe considerable variation of prediction accuracy across domains and find that domain-specific models are more accurate.


On Dataless Hierarchical Text Classification

AAAI Conferences

In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.


A Control Dichotomy for Pure Scoring Rules

AAAI Conferences

Scoring systems are an extremely important class of election systems. A length-m (so-called) scoring vector applies only to m-candidate elections. To handle general elections, one must use a family of vectors, one per length. The most elegant approach to making sure such families are "family-like'' is the recently introduced notion of (polynomial-time uniform) pure scoring rules, where each scoring vector is obtained from its precursor by adding one new coefficient. We obtain the first dichotomy theorem for pure scoring rules for a control problem. In particular, for constructive control by adding voters (CCAV), we show that CCAV is solvable in polynomial time for k-approval with k<=3, k-veto with k<=2, every pure scoring rule in which only the two top-rated candidates gain nonzero scores, and a particular rule that is a "hybrid" of 1-approval and 1-veto. For all other pure scoring rules, CCAV is NP-complete. We also investigate the descriptive richness of different models for defining pure scoring rules, proving how more rule-generation time gives more rules, proving that rationals give more rules than do the natural numbers, and proving that some restrictions previously thought to be "w.l.o.g." in fact do lose generality.


A Characterization of the Single-Peaked Single-Crossing Domain

AAAI Conferences

In other words, there is no perfect voting rule that one domains may admit efficient algorithms for social choice could always use, independently of the circumstances. However, problems that are hard for general preferences. This observation this result holds under the assumption that there are no has recently led to a new wave of interest in constraints on the voters' preferences. Thus, a common strategy restricted domains within the computational social choice to circumvent Arrow's theorem is to consider restricted community (Conitzer 2009; Walsh 2007; Faliszewski et al. preference domains, i.e., assume that the voters' preferences 2011; Brandt et al. 2010; Faliszewski, Hemaspaandra, and have additional structure. It may then be possible to show Hemaspaandra 2011; Betzler, Slinko, and Uhlmann 2013; that various negative consequences of Arrow's theorem do Cornaz, Galand, and Spanjaard 2012; 2013; Skowron et al. not hold.


Modeling and Predicting Popularity Dynamics via Reinforced Poisson Processes

AAAI Conferences

Indeed, to the best of our knowledge, we lack forgotten over time (Wu and Humberman 2007). For example, a probabilistic framework to model and predict the popularity videos on YouTube or stories on Digg gain their popularity dynamics of individual items. The reason behind this is by striving for views or votes (Szabo and Huberman partly illustrated in Figure 1, suggesting that the dynamical 2010); papers increase their visibility by competing for citations processes governing individual items appear too noisy to be from new papers (Ren et al. 2010; Wang, Song, and amenable to quantification. Barabási 2013); tweets or Hashtags in Twitter become more In this paper, we model the stochastic popularity dynamics popular as being retweeted (Hong, Dan, and Davison 2011) using reinforced Poisson processes, capturing simultaneously and so do webpages as being attached by incoming hyperlinks three key ingredients: fitness of an item, characterizing (Ratkiewicz et al. 2010). An ability to predict the popularity its inherent competitiveness against other items; a general of individual items within a dynamically evolving system temporal relaxation function, corresponding to the aging not only probes our understanding of complex systems, in the ability to attract new attentions; and a reinforcement but also has important implications in a wide range of domains, mechanism, documenting the well-known "rich-get-richer" from marketing and traffic control to policy making phenomenon. The benefit of the proposed model is threefold: and risk management. Despite recent advances of empirical (1) It models the arrival process of individual attentions methods, we lack a general modeling framework to predict directly in contrast to relying on aggregated popularity the popularity of individual items within a complex evolving time series; (2) As a generative probabilistic model, it can be system.


Imputation, Social Choice, and Partial Preferences

AAAI Conferences

Within the field of Artificial Intelligence (AI) research, Vote (STV) select winners from sets of ballot. For example, the subfield of Computational Social Choice considers under plurality candidate X has won this election by virtue the application of AI techniques to problems in Social of having being at the top of the largest number of ballots. Under STV Y would win instead. Starting in the early 1990's, computer scientists began to In this work, we model voters as having partial orderings take an interest in social choice. Initial work was concerned over the candidates for their preferences, instead of linear with circumventing the impossibility results implied orderings.