Bayesian
Everything that Works Works Because it's Bayesian: Why Deep Nets Generalize?
So far, we could not claim that deep networks trained with stochastic gradient descent are Bayesian. But they may generalize well because SGD biases learning towards flat minima rather than sharp ones. It turns out that Hochreiter and Schmidhuber (1997) motivated their work on seeking flat minima from a Bayesian, minimum description length perspective. Seeking flat minima makes sense from that perspective: in a flat minimum the weights can be specified with low precision without hurting the loss much, so the network takes fewer bits to describe.
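A toy illustration of the flat-versus-sharp intuition, as a minimal sketch rather than anything from the post itself: the two one-dimensional quadratic losses, the noise scale `sigma`, and the helper `expected_loss_under_noise` are all assumptions made up for this example. Both losses have their minimum at `w = 0`, but when the weight is only known up to some Gaussian noise (a stand-in for storing it at low precision), the sharp bowl pays a much larger penalty than the flat one.

```python
import random


def sharp_loss(w):
    # Hypothetical narrow quadratic bowl: high curvature around w = 0.
    return 50.0 * w * w


def flat_loss(w):
    # Hypothetical wide quadratic bowl: low curvature around the same minimum.
    return 0.5 * w * w


def expected_loss_under_noise(loss, w_star, sigma, n_samples=10_000, seed=0):
    """Monte Carlo average of the loss when w_star is perturbed by N(0, sigma^2) noise."""
    rng = random.Random(seed)  # fixed seed so both losses see the same noise
    total = 0.0
    for _ in range(n_samples):
        total += loss(w_star + rng.gauss(0.0, sigma))
    return total / n_samples


sigma = 0.1  # assumed imprecision in the stored weight
sharp_avg = expected_loss_under_noise(sharp_loss, 0.0, sigma)
flat_avg = expected_loss_under_noise(flat_loss, 0.0, sigma)

print(f"sharp minimum, average loss under noise: {sharp_avg:.4f}")
print(f"flat minimum,  average loss under noise: {flat_avg:.4f}")
```

Because both calls reuse the same seed and the curvatures differ by a factor of 100, the sharp bowl's average loss is about 100 times the flat one's. Read in description-length terms, the flat minimum tolerates a coarser (cheaper to encode) weight for the same loss.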
Here's how we fix the Tay problem
Microsoft's intelligent chatbot Tay behaved badly last week (and this week too), but that shouldn't have shocked any of us. Interestingly, Microsoft has been operating a similarly designed service in China called Xiaoice, meaning "little Bing," which has proved quite successful and is most likely a step towards replacing elements of customer service. Luckily, we have a new statistical learning paradigm (Bayesian statistical theory) at work, which we've been able to implement over the last few years thanks to recent advances in simulation theory. It forces human assumptions to be explicit in the mathematics, reducing the potential for the unintentional human bias that still occurs in scientific research today (p-values are an excellent example of this insanity).