Specialist knowledge is useless and unhelpful

by Justin Esarey

The title of this blog post is the title of a recent article in Slate about predicting behavior in complex systems using data analysis… something, I think, that most quantitative social scientists are pretty interested in. While it might be hard to get such a title past peer reviewers in a professional journal article, I think it's brilliant. It makes the substantive importance of the topic and the position that the author will take immediately clear, and does so in a way that attracts reader attention. It forsakes the pretense of scholarly dispassion–we know this author has something to say and is going to make a case for it–but I've had too many drinks with too many professors to believe in that, anyway. In my mind, taking a strong position has never been the same as being “biased” in interpreting the evidence, or being inflexible once new information comes to light. Someday, I'd like to write a bit about the discipline of academic writing, and whether the socially-enforced veil of tentativeness is productive.

But I digress.

The title's great, but the discussion in the article is fascinating. Its basic argument is that Kaggle, a site for crowdsourcing predictive analytics with cash prizes for the winners, has demonstrated that theoretically-grounded expertise can't beat high-powered data mining when it comes to telling the future:

PA: That sounds very different from the traditional approach to building predictive models. How have experts reacted?

JH: The messages are uncomfortable for a lot of people. It's controversial because we're telling them: “Your decades of specialist knowledge are not only useless, they're actually unhelpful; your sophisticated techniques are worse than generic methods.” It's difficult for people who are used to that old type of science. They spend so much time discussing whether an idea makes sense. They check the visualizations and noodle over it. That is all actively unhelpful.

Wow! This isn't exactly a new point–Tetlock found a similar result in his celebrated book on expert political judgment–but it's certainly a very bold illustration of this general principle. And, most interesting to me, it shows that “hedgehog” and “fox” thinking aren't just general attitudes toward making decisions and interpreting evidence. We can also interpret them as methodological strategies.

My read of much of the classic literature in the philosophy of science is that it endorses what I might term “skeptical hedgehog” thinking. For example, Popperian falsificationism is all about building frameworks of theory that allow us to deduce many interesting implications… and then doing our best to tear down those predictions. Kuhn's view is that science is usually about building out and testing the details of overarching theoretical frameworks, until they collapse under the weight of contradictory findings. Lakatos' revision is that these frameworks never truly collapse, but compete with one another for adherents. And so on.

This is hedgehog-style thinking, in that it emphasizes the creation of a single logically consistent framework for the interpretation of the world (or the unavoidability and usefulness of these frameworks). But it doesn't necessarily come with the negative overtones of Tetlock's usage, because this isn't stubborn adherence to a framework in the face of contrary evidence: all of these views either explicitly advise the clear-eyed comparison of expectations to outcomes and the discarding of frameworks that aren't useful, or recognize that individuals can't necessarily reach this standard of objectivity but that the scientific community can, over time. And in Popper's case, scientific understanding is closely linked to successful prediction; there is no attempt to wriggle out of the predictive straitjacket by claiming that we can have explanation without prediction (or the converse).

It's worth mentioning that this is the view being promoted by the NSF-funded EITM (Empirical Implications of Theoretical Models) program, something that I participated in as a student and used as a basic justification for some of my methodological work.

Fox-style thinking means, in short, coming up with a suite of good tricks to achieve the objective (successful prediction) and using whatever combination of those tricks yields the best result, regardless of whether these tricks are logically consistent with each other or with the techniques we used for related problems. Many of the methods endorsed in the Slate article, such as CART (classification and regression trees), are just this sort of approach. CART tries to achieve prediction success by creating a set of successive binary classification decisions (Is the person female? Do they earn more than $45,000 a year? Have they attended college?). Each answer narrows down the categorization of the case further, until ultimately we can narrow it down well enough to make a good point prediction for each category. The trick is in finding the right questions to ask to get the best prediction possible given the information fed to the algorithm. The optimal classification tree is decided upon atheoretically, essentially by maximizing some kind of accuracy criterion. The random decision forest procedure goes even further, constructing hundreds or thousands of different decision trees and having them “vote” on the most likely outcome. The only theory involved comes in selecting the variables to feed to the algorithm.
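To make the atheoretical flavor of this concrete, here's a hand-rolled toy sketch in Python. The data and function names are invented for illustration, and real implementations grow far deeper trees than the single-split “stumps” below, but the logic is the same: the split is chosen purely by maximizing accuracy, and a miniature “forest” of stumps grown on bootstrap resamples votes on the prediction.

```python
import random
from collections import Counter

# Invented toy data: (income, attended_college, outcome) -- purely illustrative.
data = [
    (30_000, 0, 0), (42_000, 0, 0), (48_000, 1, 1), (55_000, 1, 1),
    (61_000, 0, 1), (27_000, 1, 0), (70_000, 1, 1), (33_000, 0, 0),
]

def best_stump(rows):
    """Pick the single yes/no question (feature, threshold) that classifies
    the most rows correctly -- chosen atheoretically, by accuracy alone."""
    best = None
    for feat in (0, 1):
        for thresh in sorted({r[feat] for r in rows}):
            correct = 0
            for side in ([r[2] for r in rows if r[feat] <= thresh],
                         [r[2] for r in rows if r[feat] > thresh]):
                if side:  # predict the majority outcome on each side
                    correct += Counter(side).most_common(1)[0][1]
            if best is None or correct > best[0]:
                best = (correct, feat, thresh)
    return best[1], best[2]

def stump_predict(rows, feat, thresh, x):
    """Majority outcome among training rows on the same side of the split as x."""
    side = [r[2] for r in rows if (r[feat] <= thresh) == (x[feat] <= thresh)]
    return Counter(side).most_common(1)[0][0] if side else 0

def forest_predict(rows, x, n_trees=101, seed=0):
    """Random-forest flavor: grow many stumps on bootstrap resamples
    of the data and have them vote on the most likely outcome."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        feat, thresh = best_stump(sample)
        votes.append(stump_predict(sample, feat, thresh, x))
    return Counter(votes).most_common(1)[0][0]

print(forest_predict(data, (65_000, 1)))  # high earner -> predicts 1
print(forest_predict(data, (25_000, 0)))  # low earner  -> predicts 0
```

Note that nothing in this procedure ever asks whether a chosen split makes substantive sense; the only criterion, at every step, is predictive accuracy on the data fed to it.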

There are so many interesting questions raised by this article. Suppose that I program a computer to look at all the information available on the internet (economic and demographic data, news reports, television programs, legislative output, and whatever else). It applies some combination of processing and classification algorithms to come up with a forecast for the probability of civil conflict in the next 90 days. Let's say this forecast is perfectly calibrated: if the probability forecast is k%, then k% of those cases result in civil conflict.
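That standard (k% forecasts coming true about k% of the time) is what forecasters call calibration, and it is straightforward to check empirically: bin the forecasts and compare each bin's average forecast to its observed event frequency. A minimal sketch in Python; the binning scheme and synthetic data are my own, purely illustrative:

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, n_bins=10):
    """Bin probability forecasts; compare each bin's mean forecast to the
    observed frequency of the event (0/1 outcomes) among cases in that bin."""
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    table = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(y for _, y in pairs) / len(pairs)
        table.append((round(mean_p, 2), round(freq, 2), len(pairs)))
    return table

# Synthetic, perfectly calibrated forecasts: the 20% forecasts come true
# 2 times in 10, and the 80% forecasts come true 8 times in 10.
forecasts = [0.2] * 10 + [0.8] * 10
outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0] * 2
print(calibration_table(forecasts, outcomes))
# -> [(0.2, 0.2, 10), (0.8, 0.8, 10)]
```

A well-calibrated forecaster's table has its first two columns roughly equal in every bin; systematic gaps between them indicate over- or under-confidence.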

Having written this successful program, do I understand civil conflict? Does my computer understand it? If prediction = understanding, as in the Popperian mode, I suppose the answer is “yes.” But the predictions of this program would presumably break down outside of the narrow purpose for which they were created; most likely, they would be terrible at predicting conceptually related phenomena, such as non-war civil disturbances. Shouldn't understanding imply some degree of generalization?

If I wanted to share this understanding with students, what would I tell them? Would I just teach them the algorithm, or maybe show them some kind of classification tree that it creates? Would I tell them the variables that were identified as important by the classification algorithm, and make up some story about why they matter?

And what should we teach our students? If prediction implies understanding, and data mining achieves better predictions than theoretically-structured models, maybe we should just teach data engineering?