AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Summarizing Newspaper Comments

Llewellyn, Clare (University of Edinburgh) | Grover, Claire (University of Edinburgh) | Oberlander, Jon (University of Edinburgh)

AAAI ConferencesMar-23-2014

This work investigates summarizing the conversations that occur in the comments section of the UK newspaper the Guardian. In the comment summarization task comments are clustered and ranked within the cluster. The top comments from each cluster are used to give an overview of that cluster. It was found that topic model clustering gave the most agreement when evaluated against a human gold standard. This approach is compared to cosine distance clustering and k-means clustering. PageRank was found to be the prefered ranking system when compared with TF-IDF, Mutual Information gain and Maximal Marginal Relevance and evaluated against sets of comments summarized by a journalist for the Guardian letters page.

newspaper comment

AAAI Conferences

Eighth International AAAI Conference on Weblogs and Social Media

Country: Europe > United Kingdom (0.24)

Industry: Media > News (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)

Add feedback

VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text

Hutto, C. J. (Georgia Institute of Technology) | Gilbert, Eric (Georgia Institute of Technology)

AAAI ConferencesMar-23-2014

The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Maximum Entropy, and Support Vector Machine (SVM) algorithms. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Interestingly, using our parsimonious rule-based model to assess the sentiment of tweets, we find that VADER outperforms individual human raters (F1 Classification Accuracy = 0.96 and 0.84, respectively), and generalizes more favorably across contexts than any of our benchmarks.

artificial intelligence, machine learning, natural language, (5 more...)

AAAI Conferences

Eighth International AAAI Conference on Weblogs and Social Media

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.80)
(3 more...)

Add feedback

Firefly Monte Carlo: Exact MCMC with Subsets of Data

Maclaurin, Dougal, Adams, Ryan P.

arXiv.org Machine LearningMar-22-2014

Markov chain Monte Carlo (MCMC) is a popular and successful general-purpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC) an auxiliary variable MCMC algorithm that only queries the likelihoods of a potentially small subset of the data at each iteration yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms, and only requires a lower bound on the per-datum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1403.5693

Country: North America > United States (0.14)

Genre: Research Report (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Bayesian Optimization with Unknown Constraints

Gelbart, Michael A., Snoek, Jasper, Adams, Ryan P.

arXiv.org Machine LearningMar-21-2014

Bayesian optimization (Mockus et al., 1978) is a method for performing global optimization of unknown "black box" objectives that is particularly appropriate when objective function evaluations are expensive (in any sense, such as time or money). For example, consider a food company trying to design a low-calorie variant of a popular cookie. In this case, the design space is the space of possible recipes and might include several key parameters such as quantities of various ingredients and baking times. Each evaluation of a recipe entails computing (or perhaps actually measuring) the number of calories in the proposed cookie. Bayesian optimization can be used to propose new candidate recipes such that good results are found with few evaluations. Now suppose the company also wants to ensure the taste of the cookie is not compromised when calories are reduced. Therefore, for each proposed low-calorie recipe, they perform a taste test with sample customers. Because different people, or the same people at different times, have differing opinions about the taste of cookies, the company decides to require that at least 95% of test subjects must like the new cookie.

artificial intelligence, constraint, machine learning, (15 more...)

arXiv.org Machine Learning

1403.5607

Country: North America > Canada > Alberta (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Asymptotically Exact, Embarrassingly Parallel MCMC

Neiswanger, Willie, Wang, Chong, Xing, Eric

arXiv.org Machine LearningMar-21-2014

Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its ability to parallelize burn-in and sampling in several models.

artificial intelligence, machine learning, posterior, (16 more...)

arXiv.org Machine Learning

1311.478

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Text-Based Twitter User Geolocation Prediction

Han, B., Cook, P., Baldwin, T.

Journal of Artificial Intelligence ResearchMar-20-2014

Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its authors location. However, these references are often buried in informal, ungrammatical, and multilingual data, and are therefore non-trivial to identify and exploit. We present an integrated geolocation prediction framework and investigate what factors impact on prediction accuracy. First, we evaluate a range of feature selection methods to obtain location indicative words. We then evaluate the impact of non-geotagged tweets, language, and user-declared metadata on geolocation prediction. In addition, we evaluate the impact of temporal variance on model generalisation, and discuss how users differ in terms of their geolocatability. We achieve state-of-the-art results for the text-based Twitter user geolocation task, and also provide the most extensive exploration of the task to date. Our findings provide valuable insights into the design of robust, practical text-based geolocation prediction systems.

accuracy, prediction, tweet, (12 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4200

AI Access Foundation

10869

Journal of Artificial Intelligence Research

Country:

Europe > Austria > Vienna (0.14)
Asia > South Korea (0.14)
North America > United States > New York (0.04)
(42 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Sparse Learning over Infinite Subgraph Features

Takigawa, Ichigaku, Mamitsuka, Hiroshi

arXiv.org Machine LearningMar-20-2014

We present a supervised-learning algorithm from graph data (a set of graphs) for arbitrary twice-differentiable loss functions and sparse linear models over all possible subgraph features. To date, it has been shown that under all possible subgraph features, several types of sparse learning, such as Adaboost, LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particularly emphasis is placed on simultaneous learning of relevant features from an infinite set of candidates. We first generalize techniques used in all these preceding studies to derive an unifying bounding technique for arbitrary separable functions. We then carefully use this bounding to make block coordinate gradient descent feasible over infinite subgraph features, resulting in a fast converging algorithm that can solve a wider class of sparse learning problems over graph data. We also empirically study the differences from the existing approaches in convergence property, selected subgraph features, and search-space sizes. We further discuss several unnoticed issues in sparse learning over all possible subgraph features.

artificial intelligence, machine learning, subgraph feature, (14 more...)

arXiv.org Machine Learning

1403.5177

Country:

Europe (0.92)
North America > Canada (0.67)
Asia > Japan > Honshū (0.46)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Proximal Stochastic Gradient Method with Progressive Variance Reduction

Xiao, Lin, Zhang, Tong

arXiv.org Machine LearningMar-19-2014

We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, known as regularized empirical risk minimization. We propose and analyze a new proximal stochastic gradient method, which uses a multi-stage scheme to progressively reduce the variance of the stochastic gradient. While each iteration of this algorithm has similar cost as the classical stochastic gradient method (or incremental gradient method), we show that the expected objective value converges to the optimum at a geometric rate. The overall complexity of this method is much lower than both the proximal full gradient method and the standard proximal stochastic gradient method.

artificial intelligence, gradient, machine learning, (17 more...)

arXiv.org Machine Learning

1403.4699

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Bayesian Source Separation Applied to Identifying Complex Organic Molecules in Space

Knuth, Kevin H., Tse, Man Kit, Choinsky, Joshua, Maunu, Haley A., Carbon, Duane F.

arXiv.org Machine LearningMar-18-2014

Emission from a class of benzene-based molecules known as Polycyclic Aromatic Hydrocarbons (PAHs) dominates the infrared spectrum of star-forming regions. The observed emission appears to arise from the combined emission of numerous PAH species, each with its unique spectrum. Linear superposition of the PAH spectra identifies this problem as a source separation problem. It is, however, of a formidable class of source separation problems given that different PAH sources potentially number in the hundreds, even thousands, and there is only one measured spectral signal for a given astrophysical site. Fortunately, the source spectra of the PAHs are known, but the signal is also contaminated by other spectral sources. We describe our ongoing work in developing Bayesian source separation techniques relying on nested sampling in conjunction with an ON/OFF mechanism enabling simultaneous estimation of the probability that a particular PAH species is present and its contribution to the spectrum.

artificial intelligence, spectrum, us government, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/SSP.2007.4301277

1403.4626

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)
Europe (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.40)

Industry:

Energy > Oil & Gas (0.91)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.56)
Government > Regional Government > North America Government > United States Government (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Can Cascades be Predicted?

Cheng, Justin, Adamic, Lada A., Dow, P. Alex, Kleinberg, Jon, Leskovec, Jure

arXiv.org Machine LearningMar-18-2014

On many social networking web sites such as Facebook and Twitter, resharing or reposting functionality allows users to share others' content with their own friends or followers. As content is reshared from user to user, large cascades of reshares can form. While a growing body of research has focused on analyzing and characterizing such cascades, a recent, parallel line of work has argued that the future trajectory of a cascade may be inherently unpredictable. In this work, we develop a framework for addressing cascade prediction problems. On a large sample of photo reshare cascades on Facebook, we find strong performance in predicting whether a cascade will continue to grow in the future. We find that the relative growth of a cascade becomes more predictable as we observe more of its reshares, that temporal and structural features are key predictors of cascade size, and that initially, breadth, rather than depth in a cascade is a better indicator of larger cascades. This prediction performance is robust in the sense that multiple distinct classes of features all achieve similar performance. We also discover that temporal features are predictive of a cascade's eventual shape. Observing independent cascades of the same content, we find that while these cascades differ greatly in size, we are still able to predict which ends up the largest.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Machine Learning

doi: 10.1145/2566486.2567997

1403.4608

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback