Supervised Learning
Task-Specific Representation Learning for Web-Scale Entity Disambiguation
Kar, Rijula (IIT, Kharagpur) | Reddy, Susmija (IIT, Kharagpur) | Bhattacharya, Sourangshu (IIT, Kharagpur) | Dasgupta, Anirban (IIT, Gandhinagar) | Chakrabarti, Soumen (IIT, Bombay)
Named entity disambiguation (NED) is a central problem in information extraction. The goal is to link entities in a knowledge graph (KG) to their mention spans in unstructured text. Each distinct mention span (like "John Smith", "Jordan", or "Apache") represents a multi-class classification task. NED can therefore be modeled as a multitask problem with tens of millions of tasks for realistic KGs. We initiate an investigation into neural representations, network architectures, and training protocols for multitask NED. Specifically, we propose a task-sensitive representation learning framework that learns mention-dependent representations, followed by a common classifier. Parameter learning in our framework can be decomposed into solving multiple smaller problems involving overlapping groups of tasks. We prove bounds on the excess risk, which provide additional insight into the problem of multitask representation learning. While remaining practical in terms of training memory and time requirements, our approach outperforms recent strong baselines on four benchmark data sets.
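To make the architecture concrete, here is a minimal PyTorch sketch of one way to realize "mention-dependent representations followed by a common classifier". It is an illustration under assumed shapes and names, not the authors' model: a per-mention (per-task) embedding is fused with context features by a shared encoder, and a single classifier is shared across all tasks.

```python
import torch
import torch.nn as nn

class TaskSensitiveNED(nn.Module):
    """Sketch: per-task embedding + shared encoder + shared classifier."""
    def __init__(self, n_mentions, ctx_dim, hid_dim, n_entities):
        super().__init__()
        # task-specific parameters: one embedding per distinct mention string
        self.task_emb = nn.Embedding(n_mentions, hid_dim)
        # shared encoder fuses context features with the task embedding
        self.encoder = nn.Linear(ctx_dim + hid_dim, hid_dim)
        # one classifier over candidate entities, shared by all tasks
        self.classifier = nn.Linear(hid_dim, n_entities)

    def forward(self, ctx, mention_ids):
        t = self.task_emb(mention_ids)                             # (B, hid_dim)
        h = torch.tanh(self.encoder(torch.cat([ctx, t], dim=-1)))  # mention-dependent representation
        return self.classifier(h)                                  # logits over entities
```

Because only `task_emb` is task-specific, groups of tasks can be trained as smaller subproblems while the encoder and classifier stay shared, which loosely mirrors the decomposition the abstract mentions.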
Learning From Semi-Supervised Weak-Label Data
Dong, Hao-Chen (Nanjing University) | Li, Yu-Feng (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
Multi-label learning deals with data objects associated with multiple labels simultaneously. Previous studies typically assume that the full set of relevant labels is given for each training instance. In many applications such as image annotation, however, it is usually difficult to obtain the full label set, and only a partial or even empty set of relevant labels is available. We call this the "semi-supervised weak-label learning" problem. In this work we propose the SSWL (Semi-Supervised Weak-Label) method to address it. Both instance similarity and label similarity are considered to complete the missing labels. An ensemble of multiple models is utilized to improve robustness when label information is insufficient. We formulate the objective as a bi-convex optimization problem with an efficient block coordinate descent algorithm. Experiments validate the effectiveness of SSWL.
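To make the label-completion idea concrete, here is a minimal numpy sketch of propagating weak labels over an instance-similarity graph and a label-similarity graph. This is my illustration of the general idea, not the authors' SSWL solver; `alpha`, the update rule, and the clamping of observed entries are all assumptions.

```python
import numpy as np

def complete_labels(Y, S, L, observed, n_iters=50, alpha=0.5):
    """Y: (n, q) weak label matrix; S: (n, n) row-normalized instance similarity;
    L: (q, q) row-normalized label similarity; observed: (n, q) boolean mask."""
    F = Y.astype(float).copy()
    for _ in range(n_iters):
        # blend smoothing over similar instances with smoothing over similar labels
        F = alpha * (S @ F) + (1 - alpha) * (F @ L)
        F[observed] = Y[observed]   # clamp the labels we actually observed
    return F
```

Thresholding the returned scores then yields the completed label sets; the bi-convex formulation in the paper alternates between such label estimates and model parameters via block coordinate descent.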
Testing to Distinguish Measures on Metric Spaces
Blumberg, Andrew J., Bhaumik, Prithwish, Walker, Stephen G.
We study the problem of distinguishing between two distributions on a metric space; i.e., given metric measure spaces $({\mathbb X}, d, \mu_1)$ and $({\mathbb X}, d, \mu_2)$, we are interested in the problem of determining from finite data whether or not $\mu_1$ is $\mu_2$. The key is to use pairwise distances between observations and, employing a reconstruction theorem of Gromov, we can perform such a test using a two sample Kolmogorov--Smirnov test. A real analysis using phylogenetic trees and flu data is presented.
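A minimal sketch of the testing recipe, under assumed details: pool the within-sample pairwise distances from each sample and compare their empirical distributions with a two-sample Kolmogorov--Smirnov test. Since pairwise distances are not independent, the p-value here is heuristic; the paper's theory is what justifies distances as a sufficient summary.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import pdist

def two_sample_distance_ks(X1, X2, metric="euclidean"):
    """X1, X2: (n_i, d) arrays of observations from mu_1 and mu_2."""
    d1 = pdist(X1, metric=metric)   # all within-sample pairwise distances under mu_1
    d2 = pdist(X2, metric=metric)   # ... and under mu_2
    return ks_2samp(d1, d2)         # KS statistic and (heuristic) p-value

# rng = np.random.default_rng(0)
# print(two_sample_distance_ks(rng.normal(size=(100, 3)), rng.normal(1.0, 1.0, (100, 3))))
```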
On Structured Prediction Theory with Calibrated Convex Surrogate Losses
Osokin, Anton, Bach, Francis, Lacoste-Julien, Simon
We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent, and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes on the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for general structured prediction.
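For readers new to the term, one standard way to state the definition (a sketch in my own notation, consistent with the usual consistency literature rather than copied from the paper): writing $\delta R_L$ and $\delta R_\Phi$ for the conditional excess risks of the task loss and the surrogate, the calibration function is
$$
H(\varepsilon) \;=\; \inf_{v,\,q}\ \big\{\, \delta R_{\Phi}(v, q) \;:\; \delta R_{L}(\operatorname{pred}(v), q) \ge \varepsilon \,\big\},
$$
the least excess surrogate risk that can coexist with an excess task risk of at least $\varepsilon$. Driving the excess surrogate risk below $H(\varepsilon)$ therefore forces the excess task risk below $\varepsilon$, so a larger calibration function means the task loss is easier to learn through that surrogate.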
Multi-Armed Bandits with Metric Movement Costs
Koren, Tomer, Livni, Roi, Mansour, Yishay
We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{\Theta}(\max\{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}\})$, and show that this is the best possible. Our regret bound generalizes previously known regret bounds for some special cases: (i) the unit switching cost, with regret $\widetilde{\Theta}(\max\{k^{1/3}T^{2/3},\sqrt{kT}\})$, where $\mathcal{C}=\Theta(k)$; and (ii) the interval metric, with regret $\widetilde{\Theta}(\max\{T^{2/3},\sqrt{kT}\})$, where $\mathcal{C}=\Theta(1)$. For infinite metric spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$, where $d \ge 1$ is the Minkowski dimension of the space; this is known to be tight even when there are no switching costs.
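To illustrate how switching costs are typically handled, here is a minimal numpy sketch of the standard batching trick (not the algorithm from the paper): run adversarial EXP3 but commit to each sampled arm for a block of $B$ rounds, so at most $T/B$ switches occur; choosing $B \approx (T/k)^{1/3}$ recovers the $\widetilde{O}(k^{1/3}T^{2/3})$ rate for unit switching costs. `loss_fn`, `eta`, and `seed` are illustrative assumptions.

```python
import numpy as np

def batched_exp3(loss_fn, k, T, B, eta, seed=0):
    """loss_fn(t, arm) -> loss in [0, 1]; k arms; horizon T; block size B."""
    rng = np.random.default_rng(seed)
    w = np.zeros(k)                               # EXP3 log-weights
    total = 0.0
    for start in range(0, T, B):
        p = np.exp(w - w.max())                   # softmax over log-weights
        p /= p.sum()
        arm = int(rng.choice(k, p=p))             # commit: at most one switch per block
        block = range(start, min(start + B, T))
        block_loss = sum(loss_fn(t, arm) for t in block)
        total += block_loss
        # one importance-weighted EXP3 update per block, using the average block loss
        w[arm] -= eta * (block_loss / len(block)) / p[arm]
    return total
```

Larger blocks mean fewer switches but coarser feedback, which is exactly the $T^{2/3}$-vs-$\sqrt{T}$ trade-off the tight characterization above quantifies through $\mathcal{C}$.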