Supervised Learning


Task-Specific Representation Learning for Web-Scale Entity Disambiguation

AAAI Conferences

Named entity disambiguation (NED) is a central problem in information extraction. The goal is to link entities in a knowledge graph (KG) to their mention spans in unstructured text. Each distinct mention span (like John Smith, Jordan or Apache) represents a multi-class classification task. NED can therefore be modeled as a multitask problem with tens of millions of tasks for realistic KGs. We initiate an investigation into neural representations, network architectures, and training protocols for multitask NED. Specifically, we propose a task-sensitive representation learning framework that learns mention-dependent representations, followed by a common classifier. Parameter learning in our framework can be decomposed into solving multiple smaller problems involving overlapping groups of tasks. We prove bounds for excess risk, which provide additional insight into the problem of multitask representation learning. While remaining practical in terms of training memory and time requirements, our approach outperforms recent strong baselines on four benchmark data sets.
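The architecture sketched in the abstract, a mention-dependent representation feeding a classifier shared across all tasks, can be illustrated with a few lines of toy code. Everything below (the sizes, the tanh nonlinearity, and the names mention_matrix and shared_classifier) is a hypothetical stand-in, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d-dim context vectors, c candidate entities per mention.
d, c, num_mentions = 64, 16, 100

# One small transformation per mention span (the "task-sensitive" part) ...
mention_matrix = {m: rng.normal(scale=0.1, size=(d, d)) for m in range(num_mentions)}
# ... followed by a single classifier shared by every task.
shared_classifier = rng.normal(scale=0.1, size=(c, d))

def score_candidates(mention_id, context_vec):
    """Score the c candidate entities for one occurrence of a mention."""
    h = np.tanh(mention_matrix[mention_id] @ context_vec)  # mention-dependent representation
    return shared_classifier @ h                           # common classifier over candidates

scores = score_candidates(mention_id=42, context_vec=rng.normal(size=d))
predicted_entity = int(np.argmax(scores))
```

Because only the small per-mention transformation varies across tasks, parameters for different groups of mentions can be fit in separate, smaller subproblems, which is the decomposition the abstract alludes to.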


Learning From Semi-Supervised Weak-Label Data

AAAI Conferences

Multi-label learning deals with data objects associated with multiple labels simultaneously. Previous studies typically assume that the full set of relevant labels for each training instance is given. In many applications such as image annotation, however, it is usually difficult to obtain the full label set for each instance, and only a partial or even empty set of relevant labels is available. We call this kind of problem the "semi-supervised weak-label learning" problem. In this work we propose the SSWL (Semi-Supervised Weak-Label) method to address it. Both instance similarity and label similarity are considered for completing the missing labels. An ensemble of multiple models is utilized to improve robustness when label information is insufficient. We formulate the objective as a bi-convex optimization problem solved with an efficient block coordinate descent algorithm. Experiments validate the effectiveness of SSWL.
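The optimization recipe at the end of the abstract, a bi-convex objective minimized by block coordinate descent, follows a generic pattern: fix one block of variables, solve the now-convex problem in the other block, and alternate. The sketch below applies the pattern to a hypothetical factorization objective $\|Y - UV\|_F^2 + \lambda(\|U\|_F^2 + \|V\|_F^2)$; it is not the SSWL objective, only an illustration of the alternating scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, lam = 50, 20, 5, 0.1
Y = rng.normal(size=(n, m))      # toy stand-in for an observed weak-label matrix

U = rng.normal(size=(n, r))
V = rng.normal(size=(r, m))

for _ in range(100):
    # Block 1: with V fixed, the objective is convex (ridge regression) in U.
    U = Y @ V.T @ np.linalg.inv(V @ V.T + lam * np.eye(r))
    # Block 2: with U fixed, the objective is convex in V.
    V = np.linalg.inv(U.T @ U + lam * np.eye(r)) @ U.T @ Y

recon_error = np.linalg.norm(Y - U @ V)
```

Each block update is a closed-form convex solve, so the objective decreases monotonically, which is what makes block coordinate descent efficient for bi-convex problems.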


Police: Man Put Dismembered Wife in Suitcase, Set It Ablaze

U.S. News

Investigators say a homeless man accused of killing his wife, dismembering her body and riding with the remains in a suitcase aboard a light-rail train in Los Angeles didn't draw any attention from fellow passengers.


Police: Man put dismembered wife in suitcase, set it ablaze

FOX News

LOS ANGELES – Investigators believe a homeless man killed his wife in an abandoned restaurant, chopped up her body, stuffed it into a suitcase and then calmly rode with it aboard a train before he burned her remains in a parking lot, Los Angeles police said Tuesday.


Testing to distinguish measures on metric spaces

arXiv.org Machine Learning

We study the problem of distinguishing between two distributions on a metric space; i.e., given metric measure spaces $({\mathbb X}, d, \mu_1)$ and $({\mathbb X}, d, \mu_2)$, we are interested in determining from finite data whether or not $\mu_1 = \mu_2$. The key is to use pairwise distances between observations; employing a reconstruction theorem of Gromov, we can perform such a test using a two-sample Kolmogorov--Smirnov test. An analysis using phylogenetic trees and real flu data is presented.
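A minimal sketch of the testing pipeline described above, under simplifying assumptions (Euclidean samples standing in for general metric measure spaces; the Gromov reconstruction step is omitted): compare the within-sample pairwise-distance distributions with a two-sample Kolmogorov--Smirnov test.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Two hypothetical samples from metric measure spaces (R^3 with Euclidean d).
X = rng.normal(loc=0.0, size=(200, 3))   # draws from mu_1
Y = rng.normal(loc=0.5, size=(200, 3))   # draws from mu_2

# Empirical distributions of pairwise distances within each sample.
dX = pdist(X)   # all (200 choose 2) distances among the mu_1 sample
dY = pdist(Y)

# Two-sample Kolmogorov-Smirnov test on the two distance distributions.
stat, p_value = ks_2samp(dX, dY)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.3g}")
```

Since pairwise distances within a sample are dependent, the nominal p-value should be treated with caution; a permutation recalibration is a common remedy in practice.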


Southern California Temperature Records Set Amid Fire Danger

U.S. News

Fire officials have deployed additional resources to be able to respond quickly in case blazes break out. Firefighters made quick work of a small brush fire that briefly threatened homes before dawn in Malibu.


On Structured Prediction Theory with Calibrated Convex Surrogate Losses

arXiv.org Machine Learning

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent, and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes on the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for general structured prediction.
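For readers unfamiliar with the term, one standard formalization of the calibration function (the usual definition in the consistency literature; the paper's precise variant may differ in details) is the following, where $\delta L(f)$ and $\delta\Phi(f)$ denote the excess task risk and excess surrogate risk of a predictor $f$:

```latex
\[
  H(\varepsilon) \;=\; \inf_{f}\,\bigl\{\, \delta\Phi(f) \;:\; \delta L(f) \ge \varepsilon \,\bigr\},
  \qquad\text{so that}\qquad
  \delta\Phi(f) < H(\varepsilon) \;\Longrightarrow\; \delta L(f) < \varepsilon .
\]
```

A calibration function that is nearly flat at zero (for instance, exponentially small in the number of classes) means a tiny excess task risk can only be certified by a vanishingly small excess surrogate risk, which is the sense in which some task losses make learning harder.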


Information retrieval document search using vector space model in R

@machinelearnbot

Note that there are many variations in the way the term frequency (tf) and inverse document frequency (idf) are calculated; in this post we have seen one variation. The images below show other recommended variations of tf and idf, taken from Wikipedia.
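The original post works in R; the sketch below shows the same vector space model in Python with one common tf-idf variant (raw term frequency times $\log(N/\mathrm{df})$) and cosine similarity for ranking. The toy corpus and query are made up.

```python
import math
from collections import Counter

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats make good pets"]
query = "cat on a mat"

def tokenize(text):
    return text.lower().split()

# Document frequency of each term across the corpus.
N = len(docs)
df = Counter(term for d in docs for term in set(tokenize(d)))

def tfidf_vector(text):
    """One common variant: raw tf * log(N / df). Unseen terms get weight 0."""
    tf = Counter(tokenize(text))
    return {t: tf[t] * math.log(N / df[t]) for t in tf if t in df}

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

q = tfidf_vector(query)
for i in sorted(range(N), key=lambda i: cosine(q, tfidf_vector(docs[i])), reverse=True):
    print(f"{cosine(q, tfidf_vector(docs[i])):.3f}  {docs[i]}")
```

Swapping in another tf or idf variant (e.g., log-scaled tf, or smoothed idf $\log(1 + N/\mathrm{df})$) only changes the body of tfidf_vector; the ranking machinery stays the same.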


Precipitation Record Set in Northeast Nevada After Storm

U.S. News

A winter storm warning expired in the Lake Tahoe region Friday afternoon. The weather service said 11 inches (28 centimeters) of snow was recorded Thursday night and early Friday at the Northstar ski resort near Truckee, California, and about 10 inches (25 centimeters) at Mt. Rose southwest of Reno. Up to 7 inches (18 centimeters) was reported at Heavenly in South Lake Tahoe, California.


Multi-Armed Bandits with Metric Movement Costs

Neural Information Processing Systems

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{O}(\max\{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}\})$, and show that this is the best possible. Our regret bound generalizes previously known regret bounds for some special cases: (i) the unit switching cost regret $\widetilde{\Theta}(\max\{k^{1/3}T^{2/3},\sqrt{kT}\})$ where $\mathcal{C}=\Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max\{T^{2/3},\sqrt{kT}\})$ where $\mathcal{C}=\Theta(1)$. For infinite metric spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
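One standard way to see where the $T^{2/3}$ terms above come from (a generic illustration, not the paper's algorithm): run an adversarial-bandit learner such as Exp3 in blocks of $B$ rounds, so the number of switches, and hence the total movement cost in the unit-cost case, is at most $T/B$. The batched Exp3 below uses made-up uniform losses; the learning rate and block length are hypothetical tuning choices.

```python
import numpy as np

def batched_exp3(losses, block_len, eta, seed=0):
    """Exp3 with the chosen arm held fixed for block_len rounds, so the
    number of switches is at most T / block_len.  losses is a (T, k) array
    of per-round losses in [0, 1]; only the played arm's loss is observed."""
    rng = np.random.default_rng(seed)
    T, k = losses.shape
    log_w = np.zeros(k)                       # log-weights, numerically stable
    total, switches, prev = 0.0, 0, None
    t = 0
    while t < T:
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(k, p=p)
        if prev is not None and arm != prev:
            switches += 1                     # one movement cost per change
        prev = arm
        block = losses[t:t + block_len, arm]  # losses of the played arm only
        total += block.sum()
        # Importance-weighted estimate of the block's mean loss.
        log_w[arm] -= eta * (block.mean() / p[arm])
        t += len(block)
    return total, switches

T, k = 9_000, 5
rng = np.random.default_rng(1)
losses = rng.uniform(size=(T, k))
# Block length ~ T^{1/3} keeps switches ~ T^{2/3} (unit-cost regime).
total, switches = batched_exp3(losses, block_len=round(T ** (1 / 3)), eta=0.05)
print(f"total loss = {total:.0f}, switches = {switches}")
```

Choosing $B \approx T^{1/3}$ balances the extra in-block regret against the movement cost, which is exactly the trade-off behind the $\max\{T^{2/3}, \sqrt{kT}\}$-type bounds quoted in the abstract.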