 dietterich



Inference for the Generalization Error

Neural Information Processing Systems

In order to compare learning algorithms, experimental results reported in the machine learning literature often use statistical tests of significance. Unfortunately, most of these tests do not take into account the variability due to the choice of training set. We perform a theoretical investigation of the variance of the cross-validation estimate of the generalization error that takes into account the variability due to the choice of training sets. This allows us to propose two new ways to estimate this variance. We show, via simulations, that these new statistics perform well relative to the statistics considered by Dietterich (Dietterich, 1998).


Google's 'Sentient' Chatbot Is Our Self-Deceiving Future

The Atlantic - Technology

A Google engineer named Blake Lemoine became so enthralled by an AI chatbot that he may have sacrificed his job to defend it. "I know a person when I talk to it," he told The Washington Post for a story published last weekend. "It doesn't matter whether they have a brain made of meat in their head. Or if they have a billion lines of code." After discovering that he'd gone public with his claims, Google put Lemoine on administrative leave.


A Stanford Proposal Over AI's 'Foundations' Ignites Debate

WIRED

Last month, Stanford researchers declared that a new era of artificial intelligence had arrived, one built atop colossal neural networks and oceans of data. They said a new research center at Stanford would build--and study--these "foundational models" of AI. Critics of the idea surfaced quickly--including at the workshop organized to mark the launch of the new center. Some object to the limited capabilities and sometimes freakish behavior of these models; others warn of focusing too heavily on one way of making machines smarter. "I think the term 'foundation' is horribly wrong," Jitendra Malik, a professor at UC Berkeley who studies AI, told workshop attendees in a video discussion.


Can we rely on AI?

#artificialintelligence

As artificial intelligence (AI) systems get increasingly complex, they are being used to make forecasts – or rather generate predictive model results – in more and more areas of our lives. But at the same time, concerns are on the rise about reliability, amid widening margins of error in elaborate AI predictions. How can we address these concerns? Management science offers a set of tools that can make AI systems more trustworthy, according to Thomas G Dietterich, professor emeritus and director of intelligent systems research at Oregon State University. During a webinar on the AI for Good platform hosted by the International Telecommunication Union (ITU), Dietterich told the audience that the discipline that brings human decision-makers to the top of their game can also be applied to machines.


Can AI be made trustworthy?

#artificialintelligence

As artificial intelligence (AI) systems get increasingly complex, they are being used to make forecasts – or rather generate predictive model results – in more and more areas of our lives. At the same time, concerns are on the rise about reliability, amid widening margins of error in elaborate AI predictions. Management science offers a set of tools that can make AI systems more trustworthy. The discipline that brings human decision-makers to the top of their game can also be applied to machines, according to Thomas G Dietterich, Professor Emeritus and Director of Intelligent Systems Research at Oregon State University. Human intuition still beats AI hands down in making judgment calls in a crisis. People – and especially those working in their areas of experience and expertise – are simply more trustworthy.


In defense of skepticism about deep learning

#artificialintelligence

"Despite the promising results obtained with [representations developed from Web image], the experiments demonstrate that object classification with real-life robotic data is far from being solved."


Do We Need A Theory of AI?

#artificialintelligence

What would a theory of artificial intelligence look like, and how might it be achieved? When designing a new engine or airplane wing, engineers can apply theories that have withstood years of scientific scrutiny, such as the Laws of Thermodynamics or Newton's Laws of Motion. To what theories, if any, can artificial intelligence (AI) researchers and technology pioneers turn when designing neural networks or algorithms? We asked experts from the fields of computer science, theoretical physics, and philosophy for their insights. The Encyclopaedia Britannica defines a scientific theory as a "systematic ideational structure of broad scope, conceived by the human imagination, that encompasses a family of empirical (experiential) laws regarding regularities existing in objects and events, both observed and posited."



Model evaluation, model selection, and algorithm selection in machine learning

#artificialintelligence

A single-PDF version of Model Evaluation parts 1-4 is available on arXiv: https://arxiv.org/abs/1811.12808. This final article in the series Model evaluation, model selection, and algorithm selection in machine learning presents overviews of several statistical hypothesis testing approaches, with applications to machine learning model and algorithm comparisons. This includes statistical tests based on target predictions for independent test sets (the downsides of using a single test set for model comparisons were discussed in previous articles) as well as methods for algorithm comparisons by fitting and evaluating models via cross-validation. Lastly, this article will introduce nested cross-validation, which has become a common and recommended method of choice for algorithm comparisons on small to moderately sized datasets. Then, at the end of this article, I provide a list of my personal suggestions concerning model evaluation, model selection, and algorithm selection, summarizing the techniques covered in this series of articles. Several statistical hypothesis testing frameworks are used in practice to compare the performance of classification models. These include conventional methods such as the difference of two proportions (here, the proportions are the estimated generalization accuracies from a test set), for which we can construct 95% confidence intervals based on the Normal Approximation to the Binomial that was covered in Part I. Performing a z-score test for two population proportions is inarguably the most straightforward way to compare two models (but certainly not the best!): in a nutshell, if the 95% confidence intervals of the accuracies of two models do not overlap, we can reject the null hypothesis that the performance of both classifiers is equal at a significance level of α = 0.05 (or 5% probability).
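The interval-overlap check and the two-proportion z-score test described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own code: the function names, the example accuracies (0.90 and 0.80), and the test-set size of 1000 are all assumptions chosen for demonstration, and the z-test shown uses the common pooled standard error.

```python
import math


def accuracy_confidence_interval(acc, n, z=1.96):
    """Normal-approximation interval for an accuracy measured on n test examples.

    z=1.96 corresponds to a 95% confidence level.
    """
    se = math.sqrt(acc * (1.0 - acc) / n)
    return acc - z * se, acc + z * se


def intervals_overlap(a, b):
    """Return True if the closed intervals a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def two_proportion_z_test(acc1, n1, acc2, n2):
    """Two-sided z-test for the difference of two proportions (pooled variance).

    Returns the z statistic and the two-sided p-value.
    """
    pooled = (acc1 * n1 + acc2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (acc1 - acc2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical accuracies of two classifiers on independent test sets.
    ci_a = accuracy_confidence_interval(0.90, 1000)
    ci_b = accuracy_confidence_interval(0.80, 1000)
    z, p = two_proportion_z_test(0.90, 1000, 0.80, 1000)
    print("CI model A:", ci_a)
    print("CI model B:", ci_b)
    print("intervals overlap:", intervals_overlap(ci_a, ci_b))
    print("z statistic:", z, "p-value:", p)
```

Non-overlapping intervals are a conservative criterion: two intervals can overlap slightly while the z-test still rejects the null hypothesis at α = 0.05, so the overlap check is best read as a quick screen rather than a replacement for the test itself.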