AITopics

User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages. Our approach relies on a rich family of aggregate statistics of topically similar message clusters. Large-scale experiments over millions of Twitter messages show the effectiveness of our approach for surfacing real-world event content on Twitter.

artificial intelligence, classifier, machine learning

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Iran (0.04)
Asia > China (0.04)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Improving Text Clustering with Social Tagging

Ares, M. Eduardo (University of A Coruña) | Parapar, Javier (University of A Coruña) | Barreiro, Álvaro (University of A Coruña)

Lately several web-based tagging systems such as Technorati, Flickr or Delicious have become very popular. In this paper we will exploit the information created by the community in Delicious: a social bookmarking service where the users can save the URLs of their favourite webpages, offering also the possibility of associating tags to them. On the other hand the clustering methods are a very important data mining tool in order to exploit the knowledge present in data collections. In the last years a new family of ...

Another important question is the absoluteness of the constraints. Even if we use this approach to turn tags into constraints, a fair amount of them are bound to be inaccurate (i.e., linking documents which should not be in the same cluster) until a high value of the parameter t, due to the polysemy of the terms used as tags or to differences in the criteria of the taggers. Consequently, we have used soft positive constraints, meaning that the documents affected by one of them are likely to be in the same cluster, without forcing the clustering algorithm to actually put them so.

artificial intelligence, constraint, machine learning

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California (0.04)
Europe > Spain > Galicia > A Coruña Province > A Coruña (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Prominence Ranking in Graphs with Community Structure

Adali, Sibel (Rensselaer Polytechnic Institute) | Lu, Xiaohui (Rensselaer Polytechnic Institute) | Magdon-Ismail, Malik (Rensselaer Polytechnic Institute) | Purnell, Jonathan (Rensselaer Polytechnic Institute)

We consider prominence ranking in graphs involving actors, their artifacts and the artifact groups. When multiple actors contributing to an artifact constitutes a social tie, associations between the artifacts can be used to infer prominence among actors. This is because prominent actors will tend to collaborate on prominent artifacts, and prominent artifacts will be associated with other prominent artifacts. Our testbed example is the DBLP co-authorship graph: multiple authors (the actors) collaborate to publish research papers (the artifacts); collaboration is the social tie. Papers have prominence themselves (e.g., quality and impact of the work), and the prominence of the venues is tied to the prominence of the papers in them. We use our methods to infer prominence based on the venue-based associations of papers, and compare our rankings with external citation-based measures of prominence. We compare with numerous other ranking algorithms, and show that the ranking performance gain from using the venues is statistically significant. What if there are no natural artifact groups like venues? We develop a new algorithm which uses discovered artifact groups. Our approach consists of two steps. First, we find artifact groups by linking artifacts with common contributors. Note that instead of finding communities of actors, we consider communities of artifacts. We then use these grouped artifacts in the prominence ranking algorithm. We consider different methods for obtaining the artifact groups, in particular a very efficient embedding-based algorithm for graph clustering, and show the effectiveness of our method in improving the ranking of actors. The inferred groups are as good as or better than the natural conference venues for DBLP.

artifact, artificial intelligence, machine learning

Fifth International AAAI Conference on Weblogs and Social Media

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Government (0.47)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Scalable Event-Based Clustering of Social Media Via Record Linkage Techniques

Reuter, Timo (CITEC, University of Bielefeld) | Cimiano, Philipp (CITEC, University of Bielefeld) | Drumond, Lucas (University of Hildesheim) | Buza, Krisztian (University of Hildesheim) | Schmidt-Thieme, Lars (University of Hildesheim)

We tackle the problem of grouping content available in social media applications such as Flickr, YouTube, Panoramio, etc. into clusters of documents describing the same event. This task has been referred to as event identification before. We present a new formalization of the event identification task as a record linkage problem and show that this formulation leads to a principled and highly efficient solution to the problem. We present results on two datasets derived from Flickr (last.fm and upcoming), comparing the results in terms of Normalized Mutual Information and F-Measure with respect to several baselines, showing that a record linkage approach outperforms all baselines as well as a state-of-the-art system. We demonstrate that our approach can scale to large amounts of data, reducing the processing time considerably compared to a state-of-the-art approach. The scalability is achieved by applying an appropriate blocking strategy and relying on a Single Linkage clustering algorithm which avoids the exhaustive computation of pairwise similarities.
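
The combination of blocking and single-linkage clustering described above can be illustrated with a small sketch. This is not the authors' implementation: the blocking key (upload day), the Jaccard similarity over tags, the threshold, and the toy records are all assumptions chosen for illustration. Documents are only compared within a block, and any pair above the threshold is merged with a union-find structure, which yields the single-linkage clusters at that threshold.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x, shortening paths as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def cluster_events(docs, block_key, similarity, threshold):
    """Single-linkage clustering restricted to pairs that share a block."""
    parent = list(range(len(docs)))

    # Blocking: group document indices by a cheap key so that only
    # documents in the same block are ever compared.
    blocks = defaultdict(list)
    for idx, doc in enumerate(docs):
        blocks[block_key(doc)].append(idx)

    # Single linkage: one sufficiently similar pair is enough to merge.
    for members in blocks.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if similarity(docs[a], docs[b]) >= threshold:
                    root_a, root_b = find(parent, a), find(parent, b)
                    if root_a != root_b:
                        parent[root_b] = root_a

    clusters = defaultdict(list)
    for idx in range(len(docs)):
        clusters[find(parent, idx)].append(idx)
    return sorted(clusters.values())


def jaccard(doc_a, doc_b):
    """Tag overlap between two documents (hypothetical similarity)."""
    a, b = doc_a["tags"], doc_b["tags"]
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical toy records, not taken from the datasets in the paper.
docs = [
    {"day": "2010-05-01", "tags": {"concert", "berlin", "rock"}},
    {"day": "2010-05-01", "tags": {"concert", "berlin"}},
    {"day": "2010-05-01", "tags": {"marathon", "london"}},
    {"day": "2010-05-02", "tags": {"concert", "berlin", "rock"}},
]
clusters = cluster_events(docs, lambda d: d["day"], jaccard, threshold=0.5)
print(clusters)
```

In this sketch the fourth record is never compared with the first, even though their tags are identical, because the two fall into different blocks. That is the trade-off a blocking strategy makes to avoid exhaustive pairwise comparison.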

artificial intelligence, data mining, machine learning

Fifth International AAAI Conference on Weblogs and Social Media

Country:

Europe > Germany (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Latent Set Models for Two-Mode Network Data

DuBois, Christopher (University of California, Irvine) | Foulds, James (University of California, Irvine) | Smyth, Padhraic (University of California, Irvine)

Two-mode networks are a natural representation for many kinds of relational data. These networks are bipartite graphs consisting of two distinct sets ("modes") of entities. For example, one can model multiple recipient email data as a two-mode network of (a) individuals and (b) the emails that they send or receive. In this work we present a statistical model for two-mode network data which posits that individuals belong to latent sets and that the members of a particular set tend to co-appear. We show how to infer these latent sets from observed data using a Markov chain Monte Carlo inference algorithm. We apply the model to the Enron email corpus, using it to discover interpretable latent structure as well as evaluating its predictive accuracy on a missing data task. Extensions to the model are discussed that incorporate additional side information such as the email's sender or text content, further improving the accuracy of the model.

artificial intelligence, bayesian inference, machine learning

Fifth International AAAI Conference on Weblogs and Social Media

Country:

Asia > Middle East > Jordan (0.05)
Europe > France (0.05)
North America > United States > California > Orange County > Irvine (0.04)

Industry:

Telecommunications > Networks (0.61)
Information Technology > Networks (0.61)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Political Polarization on Twitter

In this study we investigate how social media shape the networked public sphere and facilitate communication between communities with different political orientations. We examine two networks of political communication on Twitter, comprising more than 250,000 tweets from the six weeks leading up to the 2010 U.S. congressional midterm elections. Using a combination of network clustering algorithms and manually annotated data, we demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Surprisingly, this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users in which ideologically opposed individuals interact at a much higher rate compared to the network of retweets. To explain the distinct topologies of the retweet and mention networks, we conjecture that politically motivated individuals provoke interaction by injecting partisan content into information streams whose primary audience consists of ideologically opposed users. We conclude with statistical evidence in support of this hypothesis.
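
One simple way to make the contrast between the two networks concrete is the share of edges that connect users with different political labels. The sketch below is only an illustration of that idea and is not the analysis used in the study, which relies on network clustering and manual annotation. The function name, the labels, and the toy edge lists are hypothetical.

```python
def cross_group_fraction(edges, label):
    """Fraction of edges whose two endpoints carry different labels.

    Edges with an unlabeled endpoint are ignored.
    """
    labeled = [(u, v) for u, v in edges if u in label and v in label]
    if not labeled:
        return 0.0
    cross = sum(1 for u, v in labeled if label[u] != label[v])
    return cross / len(labeled)


# Hypothetical toy data: users a, b, c lean left; x, y lean right.
label = {"a": "left", "b": "left", "c": "left", "x": "right", "y": "right"}
retweets = [("a", "b"), ("b", "c"), ("x", "y"), ("a", "x")]
mentions = [("a", "x"), ("b", "y"), ("a", "b"), ("c", "x")]

retweet_mix = cross_group_fraction(retweets, label)
mention_mix = cross_group_fraction(mentions, label)
print(retweet_mix, mention_mix)
```

A low value for the retweet edges together with a higher value for the mention edges would correspond to the pattern the abstract describes, although the toy numbers here carry no empirical meaning.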

artificial intelligence, machine learning, social media

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > Mexico (0.14)
North America > United States > Hawaii (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Voting & Elections (0.48)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Location3: How Users Share and Respond to Location-Based Data on Social

Chang, Jonathan (Facebook) | Sun, Eric (Facebook)

In August 2010 Facebook launched Places, a location-based service that allows users to check into points of interest and share their physical whereabouts with friends. The friends who see these events in their News Feed can then respond to these check-ins by liking or commenting on them. These data, consisting of the places people go and how their friends react to them, are a rich, novel dataset. In this paper we first analyze this dataset to understand the factors that influence where users check in, including previous check-ins, similarity to other places, where their friends check in, time of day, and demographics. We show how these factors can be used to build a predictive model of where users will check in next. Then we analyze how users respond to their friends' check-ins and which factors contribute to users liking or commenting on them. We show how this can be used to improve the ranking of check-in stories, ensuring that users see only the most relevant updates from their friends and ensuring that businesses derive maximum value from check-ins at their establishments. Finally, we construct a model to predict friendship based on check-in count and show that co-check-ins have a statistically significant effect on friendship.

actor, artificial intelligence, machine learning

Fifth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Reconstruction of Threaded Conversations in Online Discussion Forums

Aumayr, Erik (National University of Ireland, Galway) | Chan, Jeffrey (National University of Ireland, Galway) | Hayes, Conor (National University of Ireland, Galway)

Online discussion boards, or Internet forums, are a significant part of the Internet. People use Internet forums to post questions, provide advice and participate in discussions. These online conversations are represented as threads, and the conversation trees within these threads are important in understanding the behaviour of online users. Unfortunately, the reply structures of these threads are generally not publicly accessible or not maintained. Hence, in this paper, we introduce an efficient and simple approach to reconstruct the reply structure in threaded conversations. We contrast its accuracy against three baseline algorithms, and show that our algorithm can accurately recreate the in- and out-degree distributions of forum reply graphs built from the reconstructed reply structures.

data mining, machine learning, vertex

Fifth International AAAI Conference on Weblogs and Social Media

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Collaboration (0.70)
Information Technology > Data Science > Data Mining (0.70)

arXiv.org Machine Learning Jul-12-2011

BSVM: A Banded Support Vector Machine

Pendse, Gautam V.

We describe a novel binary classification technique called Banded SVM (B-SVM). In the standard C-SVM formulation of Cortes et al. (1995), the decision rule is encouraged to lie in the interval [1, \infty]. The new B-SVM objective function contains a penalty term that encourages the decision rule to lie in a user-specified range [\rho_1, \rho_2]. In addition to the standard set of support vectors (SVs) near the class boundaries, B-SVM results in a second set of SVs in the interior of each class.
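
For reference, the standard soft-margin C-SVM mentioned above is usually written as follows. This is the textbook formulation, given here only to show where the lower bound of 1 on the decision rule comes from; the symbols $w$, $b$, $\xi_i$, $C$ and the sample size $n$ are conventional notation and are not taken from the abstract. The B-SVM objective itself is not reproduced here.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Each slack variable $\xi_i$ is penalized, so the optimizer pushes $y_i (w^{\top} x_i + b)$ to be at least 1 and places no upper limit on it. According to the abstract, B-SVM adds a penalty term that instead encourages this quantity to stay within $[\rho_1, \rho_2]$.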

artificial intelligence, b-svm, machine learning

arXiv.org Machine Learning

1107.2347

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

arXiv.org Machine Learning Jul-12-2011

Online Identification and Tracking of Subspaces from Highly Incomplete Information

Balzano, Laura | Nowak, Robert | Recht, Benjamin

This work presents GROUSE (Grassmannian Rank-One Update Subspace Estimation), an efficient online algorithm for tracking subspaces from highly incomplete observations. GROUSE requires only basic linear algebraic manipulations at each iteration, and each subspace update can be performed in linear time in the dimension of the subspace. The algorithm is derived by analyzing incremental gradient descent on the Grassmannian manifold of subspaces. With a slight modification, GROUSE can also be used as an online incremental algorithm for the matrix completion problem of imputing missing entries of a low-rank matrix. GROUSE performs exceptionally well in practice both in tracking subspaces and as an online algorithm for matrix completion.
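
A single rank-one update of the kind the abstract describes can be sketched as follows. This follows the commonly cited form of the GROUSE step as I understand it: fit weights on the observed rows, compute the residual, then rotate the basis along a geodesic. It should be read as an illustration under that assumption and not as the authors' reference code. It uses plain Python lists to stay dependency-free; the helper `solve_least_squares`, the step size `eta`, and the toy inputs are hypothetical.

```python
import math


def solve_least_squares(A, b):
    """Solve min ||A x - b|| via normal equations (fine for small d)."""
    d = len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(d)]
         + [sum(row[i] * bi for row, bi in zip(A, b))] for i in range(d)]
    for col in range(d):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(d):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def grouse_step(U, omega, v_obs, eta):
    """One update of an n x d orthonormal basis U from a partial vector.

    omega lists the observed coordinates; v_obs holds their values.
    """
    n, d = len(U), len(U[0])
    w = solve_least_squares([U[i] for i in omega], v_obs)  # best-fit weights
    p = [sum(U[i][k] * w[k] for k in range(d)) for i in range(n)]
    r = [0.0] * n  # residual, zero on unobserved coordinates
    for pos, i in enumerate(omega):
        r[i] = v_obs[pos] - p[i]
    nr, np_, nw = norm(r), norm(p), norm(w)
    if nr < 1e-12 or nw < 1e-12:
        return [row[:] for row in U]  # nothing to correct
    t = eta * nr * np_  # rotation angle along the geodesic
    step = [(math.cos(t) - 1.0) * p[i] / np_ + math.sin(t) * r[i] / nr
            for i in range(n)]
    return [[U[i][k] + step[i] * w[k] / nw for k in range(d)]
            for i in range(n)]


# Toy example: a 2-dimensional subspace of R^5, three observed entries.
U0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
U1 = grouse_step(U0, omega=[0, 1, 3], v_obs=[1.0, 2.0, 0.5], eta=0.1)
```

Because the residual is orthogonal to the current basis and the step is a rotation, the columns of the updated matrix remain orthonormal, so no re-orthogonalization is needed after each step.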

artificial intelligence, machine learning, subspace

arXiv.org Machine Learning

1006.4046

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)