Combine reinforcement and unsupervised learning?

#artificialintelligence 

Something this reminds me of is the framework of generative adversarial networks. The generator is updated to try to trick the discriminator (by gradient descent, to minimize the discriminator's accuracy), and the discriminator is then updated again to deal with the new samples. This framework has been very popular in the past few years, but it is very tricky to use in practice, and there's a lot of ongoing research (e.g. this paper from earlier this year and this one just published two days ago) on getting GANs to work more reliably. You could imagine, maybe, having human annotators do some kind of label smoothing for the discriminator: true data set samples get the label 1, terrible generated samples get label -1, okay ones get -0.75, great ones get 0, and maybe the best ones get labeled as 0.5 or even 1. I haven't thought through the consequences of this too much, but it might help.
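To make the annotator-label idea a bit more concrete, here is a rough sketch of how graded human judgments could be turned into discriminator targets. The grade names, the `GRADE_TO_LABEL` table, and the choice of a squared-error loss on a tanh output are all my own assumptions for illustration; the only thing taken from the idea above is the label values themselves.

```python
import math

# Hypothetical grade names; the label values are the ones suggested above.
# "best" could also be 1.0, which would make it indistinguishable from real data.
GRADE_TO_LABEL = {
    "real": 1.0,       # true data set sample
    "terrible": -1.0,  # clearly bad generated sample
    "okay": -0.75,
    "great": 0.0,
    "best": 0.5,
}


def soft_targets(grades):
    """Turn annotator grades into discriminator targets in [-1, 1]."""
    return [GRADE_TO_LABEL[g] for g in grades]


def discriminator_loss(scores, targets):
    """Mean squared error between tanh-squashed scores and soft targets.

    Assumption: a least-squares loss on a tanh output, chosen only because
    the targets live in [-1, 1]; other losses would work too.
    """
    preds = [math.tanh(s) for s in scores]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# One real sample plus four generated samples graded by annotators.
grades = ["real", "terrible", "okay", "great", "best"]
targets = soft_targets(grades)

# Raw (pre-tanh) discriminator scores for the same five samples.
scores = [2.0, -2.0, -1.0, 0.0, 0.5]
loss = discriminator_loss(scores, targets)
```

In a real GAN the scores would come from the discriminator network and the loss would be backpropagated; the point here is only that the discriminator regresses toward graded targets rather than a hard real/fake label.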
