HazyResearch/snorkel
Snorkel is a system for rapidly creating, modeling, and managing training data. Today's state-of-the-art machine learning models require massive labeled training sets, which usually do not exist for real-world applications. Snorkel is instead built around the data programming paradigm, in which the developer writes a set of labeling functions: scripts that programmatically label data. The resulting labels are noisy, but Snorkel automatically models the labeling process, in essence learning which labeling functions are more accurate than others, and then uses that model to train an end model (for example, a deep neural network in TensorFlow). Surprisingly, by modeling the noisy process that creates the training set in this way, we can take potentially low-quality labeling functions from the user and use them to train high-quality end models.
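To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use Snorkel's API. The function names, the sentiment task, and the label constants are hypothetical and chosen only for illustration. The labels are combined with a simple majority vote, which is a stand-in: Snorkel itself learns the accuracy of each labeling function rather than weighting them equally.

```python
from collections import Counter

# Hypothetical label constants for an illustrative sentiment task.
ABSTAIN, NEG, POS = -1, 0, 1


def lf_contains_great(text):
    """Vote POS if the text mentions 'great'; otherwise abstain."""
    return POS if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    """Vote NEG if the text mentions 'terrible'; otherwise abstain."""
    return NEG if "terrible" in text.lower() else ABSTAIN


def lf_exclamation(text):
    """Weak heuristic: vote POS if the text ends with '!'."""
    return POS if text.endswith("!") else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_great, lf_contains_terrible, lf_exclamation]


def apply_lfs(texts, lfs=LABELING_FUNCTIONS):
    """Build the label matrix: one row per text, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes):
    """Combine one row of votes; abstain when nothing fired or the top vote is tied.

    This is a simple baseline, not Snorkel's learned model of labeling
    function accuracies.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    texts = [
        "Great product!",
        "Terrible service",
        "It arrived on Tuesday",
        "Great start, terrible ending",
    ]
    for text, votes in zip(texts, apply_lfs(texts)):
        print(text, votes, majority_vote(votes))
```

Each labeling function may abstain, so the label matrix is sparse and its rows can conflict. A learned model of labeling function accuracies would replace `majority_vote` in this sketch, and the combined labels would then be used to train the end model.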
Mar-14-2019, 23:09:14 GMT