We're Making Progress in Explainable AI, but Major Pitfalls Remain
Machine learning algorithms are starting to exceed human performance in many narrow and specific domains, such as image recognition and certain types of medical diagnoses. We increasingly rely on machine learning algorithms to make decisions on a wide range of topics, from what we collectively spend billions of hours watching to who gets the job. But machine learning algorithms cannot explain the decisions they make. How can we justify putting these systems in charge of decisions that affect people's lives if we don't understand how they're arriving at those decisions? This desire to get more than raw numbers from machine learning algorithms has led to a renewed focus on explainable AI: algorithms that can make a decision or take an action, and tell you the reasons behind it.
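To make "tell you the reasons behind it" concrete, here is a minimal sketch of one classic approach: use an inherently interpretable model, such as a shallow decision tree, and read the explanation straight off the decision path. The toy hiring-style data and feature names below are invented for illustration and do not come from the article.

```python
# A minimal sketch of explainable AI via an interpretable model:
# a decision tree whose prediction can be traced to explicit tests.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: [years_experience, test_score]
X = np.array([[1, 55], [2, 60], [5, 80], [7, 85], [3, 70], [8, 90]])
y = np.array([0, 0, 1, 1, 0, 1])  # 0 = reject, 1 = interview
feature_names = ["years_experience", "test_score"]

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def explain(sample):
    """Walk the fitted tree and print each threshold test the sample takes."""
    node = 0
    tree = clf.tree_
    while tree.children_left[node] != -1:  # -1 marks a leaf in sklearn
        f, t = tree.feature[node], tree.threshold[node]
        if sample[f] <= t:
            print(f"{feature_names[f]} = {sample[f]} <= {t:.1f}")
            node = tree.children_left[node]
        else:
            print(f"{feature_names[f]} = {sample[f]} > {t:.1f}")
            node = tree.children_right[node]
    print("decision:", clf.classes_[tree.value[node].argmax()])

explain([6, 82])
```

Running this prints the threshold tests the candidate satisfied and the resulting decision: exactly the kind of human-readable justification that a black-box model does not provide on its own.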
Reviewing Rebooting AI
First of all, apologies for not posting as frequently as I used to. As you might imagine, blogging is not my full-time job, and I'm currently deeply involved in a very exciting startup (something I'm going to write about soon). On weekends and evenings I'm busy helping to care for a seven-month-old infant, and altogether that leaves me with very little time. But I'll try to do better soon, since a lot is going on in the AI space and signs of cooling are now visible all over the place. In this post I'd like to focus on the recent book by Gary Marcus and Ernest Davis, Rebooting AI.
How should we evaluate progress in AI?
The evaluation question is inseparable from questions about what sort of thing AI is, and both are inseparable from questions about how best to do it. Most intellectual disciplines have standard, unquestioned criteria for what counts as progress. Artificial intelligence is an exception. This has always caused trouble. The diverse evaluation criteria are incommensurable. They suggest divergent directions for research. They produce sharp disagreements about what methods to apply, which results are important, and how well the field is progressing.

Can't AI make up its mind about what it is trying to do? Can't it just decide to be something respectable (science or engineering) and use a coherent set of evaluation criteria drawn from one of those disciplines? That doesn't seem to be possible. AI is unavoidably a wolpertinger, stitched together from bits of other disciplines. It's rarely possible to evaluate specific AI projects according to the criteria of a single one of them.

This post offers a framework for thinking about what makes the AI wolpertinger fly. The framework is, so to speak, parameterized: it accommodates differing perspectives on the relative value of criteria from the six disciplines, and their role in AI research. How they are best combined is a judgement call, differing according to the observer and the project observed. Nevertheless, one can make cogent arguments in favor of weighting particular criteria more or less heavily.

Choices about how to evaluate AI lead to choices about what problems to address, what approaches to take, and what methods to apply. I will advocate improving AI practice through greater use of scientific experimentation; pursuit particularly of philosophically interesting questions; better understanding of design practice; and greater care in creating spectacular demos. Follow-on posts will explain these points in more detail. This framework is meant mainly for AI practitioners.
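As a toy rendering of the "parameterized" idea (my own illustration, not the author's framework), one could encode each observer's stance as weights over per-discipline scores and combine them. The discipline names below follow hints in the text (science, engineering, philosophy, design, spectacle), with "mathematics" assumed as the sixth; the scores and weights are invented.

```python
# Hypothetical weighted combination of evaluation criteria from
# six contributing disciplines. Different observers weight the
# same project differently, so they reach different verdicts.
disciplines = ["science", "engineering", "mathematics",
               "philosophy", "design", "spectacle"]

def evaluate(project_scores, weights):
    """Weighted average of per-discipline scores in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[d] * project_scores.get(d, 0.0)
               for d in disciplines) / total

# A demo-heavy project, judged by two observers.
project = {"science": 0.2, "engineering": 0.7, "spectacle": 0.9}
scientist = {d: (3 if d == "science" else 1) for d in disciplines}
entrepreneur = {d: (3 if d in ("engineering", "spectacle") else 1)
                for d in disciplines}
print(evaluate(project, scientist))     # lower: weak science weighs heavily
print(evaluate(project, entrepreneur))  # higher: demo and engineering dominate
```

The point of the sketch is only that the weights are a judgement call: there is no privileged setting that makes the disciplines commensurable.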
Edge.org
I've changed my mind about how to handle the homunculus temptation: the almost irresistible urge to install a "little man in the brain" to be the Boss, the Central Meaner, the Enjoyer of pleasures and the Sufferer of pains. In Brainstorms (1978) I described and defended the classic GOFAI (Good Old Fashioned AI) strategy that came to be known as "homuncular functionalism," replacing the little man with a committee. The AI programmer begins with an intentionally characterized problem, and thus frankly views the computer anthropomorphically: if he solves the problem he will say he has designed a computer that can [e.g.,] understand questions in English. His first and highest level of design breaks the computer down into subsystems, each of which is given intentionally characterized tasks; he composes a flow chart of evaluators, rememberers, discriminators, overseers and the like. These are homunculi with a vengeance. . . . Each homunculus in turn is analyzed into smaller homunculi, but, more important, into less clever homunculi.
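A minimal sketch of this decomposition strategy in code (mine, not Dennett's): the top-level subsystem is intentionally characterized ("understands questions"), but it delegates to a committee of progressively less clever members, bottoming out in mechanisms dumb enough to need no little man at all. All function names and the lookup-table contents are invented for illustration.

```python
# Homuncular functionalism as a toy program: each layer is
# decomposed into smaller and, crucially, stupider subsystems.
def understand_question(question: str) -> str:
    """Top-level homunculus: delegates to a committee."""
    topic = discriminate_topic(question)   # a 'discriminator'
    fact = remember_fact(topic)            # a 'rememberer'
    return evaluate_answer(fact)           # an 'evaluator'

def discriminate_topic(question: str) -> str:
    # Less clever: bare keyword spotting, no understanding required.
    return "weather" if "weather" in question.lower() else "unknown"

def remember_fact(topic: str) -> str:
    # Even less clever: a fixed lookup table.
    return {"weather": "It is raining."}.get(topic, "I don't know.")

def evaluate_answer(fact: str) -> str:
    # Dumbest of all: pass the fact through unchanged.
    return fact

print(understand_question("What is the weather like?"))
```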