vertex cover
Optimal Tagging with Markov Chain Optimization
Many information systems use tags and keywords to describe and annotate content. These allow for efficient organization and categorization of items and facilitate relevant search queries. As such, the set of tags selected for an item can have a considerable effect on the volume of traffic that eventually reaches it. In tagging systems where tags are chosen exclusively by an item's owner, who in turn is interested in maximizing traffic, a principled approach for assigning tags can prove valuable. In this paper we introduce the problem of optimal tagging, where the task is to choose a subset of tags for a new item such that the probability of browsing users reaching that item is maximized.
Learning Augmented Graph $k$-Clustering
Clustering is a cornerstone of unsupervised machine learning, widely applied in fields such as data organization, anomaly detection, and community detection in networks [Xu and Wunsch, 2005]. Among clustering problems, the k-means and k-median problems stand out as fundamental due to their simplicity and effectiveness. Traditional algorithms aim to partition data into k clusters, minimizing either the sum of squared distances (k-means) or the sum of absolute distances (k-median) to their respective cluster centers. The k-means algorithm has been central to clustering research for decades, tracing its roots to foundational works by [MacQueen, 1967] and [Lloyd, 1982], who introduced the iterative optimization approach still used today. Extensions by [Hartigan and Wong, 1979] improved convergence, while [Forgy, 1965] proposed widely used initialization techniques. The optimization principles underlying k-means were influenced by earlier algorithmic developments, such as Floyd's contributions to optimization [Floyd, 1962]. Improvements include k-means++ [Arthur and Vassilvitskii, 2007], which introduced a probabilistic seeding strategy to improve initialization quality and convergence, and Mini-Batch k-means [Sculley, 2010], which enabled clustering on massive datasets with reduced computational overhead.
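To make the two objectives and the seeding idea concrete, the following is a minimal Python sketch and not an implementation taken from any of the cited works. It assumes points are tuples of floats in Euclidean space and that the input contains at least k distinct points. The function names `kmeans_cost`, `kmedian_cost`, and `kmeans_pp_seed` are hypothetical, and the seeding routine follows the commonly described squared-distance sampling rule associated with k-means++.

```python
import math
import random


def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def kmedian_cost(points, centers):
    """Sum of (unsquared) Euclidean distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def kmeans_pp_seed(points, k, rng):
    """Squared-distance seeding (illustrative sketch).

    The first center is drawn uniformly at random. Each further center is
    drawn with probability proportional to its squared distance to the
    nearest center chosen so far. Assumes at least k distinct points;
    otherwise all weights become zero and rng.choices raises ValueError.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
centers = [(0.0, 0.5), (10.0, 0.5)]

print(kmeans_cost(points, centers))   # four points at distance 0.5: 4 * 0.25
print(kmedian_cost(points, centers))  # four points at distance 0.5: 4 * 0.5
print(kmeans_pp_seed(points, 2, random.Random(0)))
```

On this toy input every point lies at distance 0.5 from its nearest center, so the two costs differ only in whether that distance is squared. Because an already chosen point has weight zero, the seeding routine never selects the same point twice.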