A Survey on how Description Logic Ontologies Benefit from Formal Concept Analysis
Although the notion of a concept as a collection of objects sharing certain properties and the notion of a conceptual hierarchy are fundamental to both Formal Concept Analysis and Description Logics, the ways concepts are described and obtained differ significantly between these two research areas. Despite these differences, there have been several attempts to bridge the gap between the two formalisms and to apply methods from one field in the other. The present work aims to give an overview of the research done in combining Description Logics and Formal Concept Analysis.
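To make the Formal Concept Analysis notion of a concept concrete, the following is a minimal sketch that uses the standard derivation operators on a toy context. It is not taken from the survey: the context (duck, eagle, trout) and the function names `extent`, `intent` and `formal_concepts` are illustrative assumptions, and the brute-force enumeration is only practical for very small contexts.

```python
from itertools import combinations

def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for k in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), k):
            objs = extent(context, frozenset(subset))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts

# Hypothetical toy context: object -> set of attributes.
toy_context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}
concepts = formal_concepts(toy_context)
```

Ordering the resulting pairs by inclusion of their extents gives the conceptual hierarchy (the concept lattice) of the toy context.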
Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function
Richtárik, Peter, Takáč, Martin
In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an $\epsilon$-accurate solution with probability at least $1-\rho$ in at most $O(\tfrac{n}{\epsilon} \log \tfrac{1}{\rho})$ iterations, where $n$ is the number of blocks. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper #2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by the factor of 4 and removing $\epsilon$ from the logarithmic term. More importantly, in contrast with the aforementioned work in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale $\ell_1$-regularized least squares and support vector machine problems with a billion variables.
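As a rough illustration of the kind of method described, the sketch below runs randomized coordinate descent on an $\ell_1$-regularized least squares problem, $\tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1$. It is a simplified sketch, not the authors' implementation: it assumes blocks of size one, uniform sampling, step sizes given by the squared column norms, and dense Python lists. The names `randomized_cd_lasso`, `soft_threshold` and `objective` are hypothetical.

```python
import random

def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def objective(A, b, x, lam):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1 for a dense row-major A."""
    res = [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]
    return 0.5 * sum(v * v for v in res) + lam * sum(abs(v) for v in x)

def randomized_cd_lasso(A, b, lam, iters, seed=0):
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    res = [-bj for bj in b]  # residual Ax - b at x = 0
    # Coordinate-wise Lipschitz constants: squared column norms of A.
    lip = [sum(A[j][i] ** 2 for j in range(m)) for i in range(n)]
    for _ in range(iters):
        i = rng.randrange(n)  # pick one coordinate uniformly at random
        if lip[i] == 0.0:
            continue
        grad_i = sum(A[j][i] * res[j] for j in range(m))
        new_xi = soft_threshold(x[i] - grad_i / lip[i], lam / lip[i])
        delta = new_xi - x[i]
        if delta != 0.0:
            for j in range(m):  # keep the residual in sync with x
                res[j] += delta * A[j][i]
            x[i] = new_xi
    return x

# Tiny example with A = identity, where the minimizer is soft_threshold(b, lam).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_hat = randomized_cd_lasso(A, b, lam=1.0, iters=50)
```

Each iteration touches a single column of A, which is what makes this family of methods attractive when the number of variables is very large.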
Linear Latent Force Models using Gaussian Processes
Álvarez, Mauricio A., Luengo, David, Lawrence, Neil D.
Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics.
Fast Learning Rate of lp-MKL and its Minimax Optimality
In this paper, we give a new sharp generalization bound of lp-MKL which is a generalized framework of multiple kernel learning (MKL) and imposes lp-mixed-norm regularization instead of l1-mixed-norm regularization. We utilize localization techniques to obtain the sharp learning rate. The bound is characterized by the decay rate of the eigenvalues of the associated kernels. A larger decay rate gives a faster convergence rate. Furthermore, we give the minimax learning rate on the ball characterized by lp-mixed-norm in the product space. Then we show that our derived learning rate of lp-MKL achieves the minimax optimal rate on the lp-mixed-norm ball.
Fast Convergence Rate of Multiple Kernel Learning with Elastic-net Regularization
Suzuki, Taiji, Tomioka, Ryota, Sugiyama, Masashi
We investigate the learning rate of multiple kernel learning (MKL) with elastic-net regularization, which consists of an $\ell_1$-regularizer for inducing sparsity and an $\ell_2$-regularizer for controlling smoothness. We focus on a sparse setting where the total number of kernels is large but the number of non-zero components of the ground truth is relatively small, and prove that elastic-net MKL achieves the minimax learning rate on the $\ell_2$-mixed-norm ball. Our bound is sharper than previously shown convergence rates, and has the property that the smoother the truth is, the faster the convergence rate is.
The Party Is Over Here: Structure and Content in the 2010 Election
Livne, Avishay (The University of Michigan) | Simmons, Matthew (The University of Michigan) | Adar, Eytan (The University of Michigan) | Adamic, Lada (The University of Michigan)
In this work, we study the use of Twitter by House, Senate and gubernatorial candidates during the midterm (2010) elections in the U.S. Our data includes almost 700 candidates and over 690k documents that they produced and cited in the 3.5 years leading to the elections. We utilize graph and text mining techniques to analyze differences between Democrats, Republicans and Tea Party candidates, and suggest a novel use of language modeling for estimating content cohesiveness. Our findings show significant differences in the usage patterns of social media, and suggest conservative candidates used this medium more effectively, conveying a coherent message and maintaining a dense graph of connections. Despite the lack of party leadership, we find Tea Party members display both structural and language-based cohesiveness. Finally, we investigate the relation between network structure, content and election results by creating a proof-of-concept model that predicts candidate victory with an accuracy of 88.0%.
Why do People Retweet? Anti-Homophily Wins the Day!
Macskassy, Sofus A. (Fetch Technologies) | Michelson, Matthew (Fetch Technologies)
Twitter and other microblogs have rapidly become a significant means by which people communicate with the world and each other in near realtime. There have been a large number of studies surrounding these social media, focusing on areas such as information spread, various centrality measures, topic detection and more. However, one area which has not received much attention is trying to better understand what information is being spread and why it is being spread. This work looks to get a better understanding of what makes people spread information in tweets or microblogs through the use of retweeting. Several retweet behavior models are presented and evaluated on a Twitter data set consisting of over 768,000 tweets gathered from monitoring over 30,000 users for a period of one month. We evaluate the proposed models against each user and show how people use different retweet behavior models. For example, we find that although users in the majority of cases do not retweet information on topics that they themselves tweet about or from people who are "like them" (hence anti-homophily), we do find that models which take homophily, or similarity, into account fit the observed retweet behaviors much better than other more general models which do not take this into account. We further find that, not surprisingly, people's retweeting behavior is better explained through multiple different models rather than one model.
Online Identification and Tracking of Subspaces from Highly Incomplete Information
Balzano, Laura, Nowak, Robert, Recht, Benjamin
This work presents GROUSE (Grassmannian Rank-One Update Subspace Estimation), an efficient online algorithm for tracking subspaces from highly incomplete observations. GROUSE requires only basic linear algebraic manipulations at each iteration, and each subspace update can be performed in linear time in the dimension of the subspace. The algorithm is derived by analyzing incremental gradient descent on the Grassmannian manifold of subspaces. With a slight modification, GROUSE can also be used as an online incremental algorithm for the matrix completion problem of imputing missing entries of a low-rank matrix. GROUSE performs exceptionally well in practice both in tracking subspaces and as an online algorithm for matrix completion.
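The flavor of an incremental gradient step on the Grassmannian can be sketched for the simplest case of a one-dimensional subspace spanned by a unit vector `u`. This is an illustrative simplification under stated assumptions, not the authors' code: the general algorithm tracks a d-dimensional subspace and solves a small least-squares problem per step, whereas here the weight is a scalar. The function name `grouse_step_1d` and the step size `eta` are hypothetical.

```python
import math

def grouse_step_1d(u, idx, vals, eta):
    """One GROUSE-style update of the unit vector u from a partial observation.

    idx  -- indices of the observed coordinates
    vals -- observed entries at those coordinates
    eta  -- step size (assumed given)
    """
    # Least-squares weight fitted on the observed coordinates only.
    num = sum(u[i] * v for i, v in zip(idx, vals))
    den = sum(u[i] * u[i] for i in idx)
    if den == 0.0:
        return list(u)
    w = num / den
    # Residual of the fit, zero-filled on the unobserved coordinates.
    r = [0.0] * len(u)
    for i, v in zip(idx, vals):
        r[i] = v - w * u[i]
    r_norm = math.sqrt(sum(x * x for x in r))
    p_norm = abs(w)  # norm of the prediction u * w, since ||u|| = 1
    if r_norm == 0.0 or p_norm == 0.0:
        return list(u)  # observation already explained; nothing to update
    sigma = r_norm * p_norm
    c, s = math.cos(sigma * eta), math.sin(sigma * eta)
    sign = 1.0 if w > 0 else -1.0
    # Rotate u toward the residual direction along a geodesic.
    return [c * ui + s * sign * ri / r_norm for ui, ri in zip(u, r)]

u0 = [1.0, 0.0, 0.0]
u1 = grouse_step_1d(u0, idx=[0, 1], vals=[1.0, 1.0], eta=0.1)
```

Because the residual is orthogonal to `u` by construction, the rotation keeps the vector at unit length without any explicit re-normalization.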
Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter
Lee, Kyumin (Texas A&M University) | Eoff, Brian David (Texas A&M University) | Caverlee, James (Texas A&M University)
The rise in popularity of social networking sites such as Twitter and Facebook has been paralleled by the rise of unwanted, disruptive entities on these networks, including spammers, malware disseminators, and other content polluters. Inspired by sociologists working to ensure the success of commons and criminologists focused on deterring vandalism and preventing crime, we present the first long-term study of social honeypots for tempting, profiling, and filtering content polluters in social media. Concretely, we report on our experiences via a seven-month deployment of 60 honeypots on Twitter that resulted in the harvesting of 36,000 candidate content polluters. As part of our study, we (1) examine the harvested Twitter users, including an analysis of link payloads, user behavior over time, and followers/following network dynamics and (2) evaluate a wide range of features to investigate the effectiveness of automatic content polluter identification.
Creating Conversations: An Automated Dialog System
Gandy, Lisa (Northwestern University) | Hammond, Kristian (Northwestern University)
Online news sites often include a comments section where readers are allowed to leave their thoughts. These comments often contain interesting and insightful conversations between readers about the news article. However, the richness of these conversations is often lost among other meaningless comments, and moreover all comments are found at the bottom of the web page. In this article, we discuss how our system inserts reader conversations into the news article to create a multimedia presentation called Shout Out. Shout Out features two virtual news anchors: one anchor reads the news and, when appropriate, pauses to have a conversation about the news with the other anchor. The current iteration of Shout Out combines natural language techniques and reader conversations to create an engaging system.