Substantively significant discussion of quantitative social science.

## Category: Uncategorized

### Blogs and Academic Tenure

A recent article in the Chronicle of Higher Education caught my attention the other day with its argument that academic blogging should be credited toward a person’s scholarly record when considering the person for tenure.

Let me start with two stipulations that weren’t explicitly made in the Chronicle article. First, presumably the credit is restricted to blogging about professionally relevant issues (research controversies, teaching approaches, policy debates, and so on) and is proportional to impact (measured by readership and, when relevant, citation). Second, blogging must be a supplement to traditional research activity in peer-reviewed journals and books (that is, such publications are still a necessary component of a tenure case at a research institution).

With these stipulations made, I felt pretty good about including online work (like an active research blog) as part of a tenure portfolio. This kind of work can provide evidence of engagement by the academic community and the wider public with the scholar’s research, giving a clue to the scholar’s impact on both groups.

I was motivated to think again about that argument when I read Hans Noel’s response to the Chronicle article (posted to his own blog with what I hope was a tinge of intentional irony).

Here’s the gist of Hans’ case:

For tenure, the university compiles a comprehensive file on the candidate’s accomplishments, including most importantly, letters from outside experts, who can vouch for the candidate’s contribution. Tenure decisions are based on all that information about whether or not the candidate knows what they are talking about.

What does this say about what kinds of things should “count” for tenure? It says that what counts are those things that indicate expertise in the field. A blog does not indicate expertise.

It’s hard to argue with the claim that having a blog, even a well-read blog, is not a dispositive indicator of expertise (or of valuable contributions made to the field). And I agree with much of what Hans says about the virtues of peer reviewed research. But we don’t consider a stack of peer-reviewed work automatically dispositive of expertise or value, either.

Rather, and as Hans points out, most institutions ask a set of 6-12 tenured professors to confidentially render this assessment by reviewing the totality of the file, including reading the scholar’s work. Further, the candidate’s own department and university also convene committees to make the same judgment, again based on the reading of the file (and the external professors’ assessments).

So, again extending the Chronicle author’s original argument, I think that this review process would be aided by adding relevant information about online scholarly activity, including blog posts and their readership statistics. Insofar as the tenure file’s reviewers are able to read and interpret this information with an expert eye, I would think they would be able to make a judgment about whether it indicated the candidate’s expertise or value to the scholarly community.

There is no formula for concluding whether a scholar has expertise or makes contributions of value, and I don’t think the only contributions of value to the scholarly community are peer-reviewed publications. So, it seems to me that the criterion for inclusion in a tenure file should be that the information provides more signal than noise on those dimensions. And I think that some online work meets that criterion.

I’m still submitting to journals, though.

### Measuring Bias in Published Work

In a series of previous posts, I’ve spent some time looking at the idea that the review and publication process in political science (and specifically, the requirement that a result must be statistically significant in order to be scientifically notable or publishable) produces a very misleading scientific literature. In short, published estimates of a relationship will tend to be substantially exaggerated in magnitude. If we take the view that the “null hypothesis” of no relationship should not be a point at $\beta = 0$ but rather a set of substantively ignorable values at or near zero, as I argue in another paper and Justin Gross (an assistant professor at UNC-CH) also argues in a slightly different way, then this also means that the literature will tend to contain many false positive results, far more than the nominal $\alpha$ value of the significance test.

This opens an important question: is this just a problem in theory, or is it actually influencing the course of political science research in detectable ways?

To answer this question, I am working with Ahra Wu (one of our very talented graduate students studying International Relations and political methodology at Rice) to develop a way to measure the average level of bias in a published literature and then apply this method to recently published results in the prominent general interest journals in political science.

We presented our initial results on this front at the 2013 Methods Meetings in Charlottesville, and I’m sad to report that they are not good. Our poster summarizing the results is here. This is an ongoing project, so some of our findings may change or be refined as we continue our work; however, I do think this is a good time to summarize where we are now and seek suggestions.

First, how do you measure the bias? Well, the idea is to be able to get an estimate for $E[\beta \mid \hat{\beta} = \hat{\beta}_{0} \text{ and stat. sig.}]$. We believe that a conservative estimate of this quantity can be accomplished by simulating many draws of data sets with the structure of the target model but with varying values of $\beta$, where these $\beta$ values are drawn out of a prior distribution that is created to reflect a reasonable belief about the pattern of true relationships being studied in the field. Then, all of the $\hat{\beta}$ estimates can be recovered from properly specified models, then used to form an empirical estimate of $E[\beta \mid \hat{\beta} = \hat{\beta}_{0} \text{ and stat. sig.}]$. In essence, you simulate a world in which thousands of studies are conducted under a true and known distribution of $\beta$ and look at the resulting relationship between these $\beta$ and the statistically significant $\hat{\beta}$.
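A minimal sketch of that simulation idea in R follows. The specific choices here are assumptions made for illustration only, not the settings used for the poster: a $N(0, 1)$ prior on $\beta$, a bivariate regression with $x \sim U(0,1)$ and standard normal errors, $n = 100$, and a hypothetical target estimate `b0 = 1`.

```r
# Sketch: empirical E[beta | beta-hat near b0, statistically significant].
# Illustrative assumptions: N(0, 1) prior on beta, x ~ U(0, 1), n = 100.
set.seed(42)
n_studies <- 5000
n <- 100

# Simulate one study with a known true beta; return truth, estimate, t-ratio
sim_one <- function(beta) {
  x <- runif(n)
  y <- beta * x + rnorm(n)
  est <- summary(lm(y ~ x))$coefficients["x", ]
  c(beta = beta, beta_hat = est[["Estimate"]], t = est[["t value"]])
}

true_betas <- rnorm(n_studies, mean = 0, sd = 1)   # draws from the prior
sims <- as.data.frame(t(sapply(true_betas, sim_one)))

# Keep only "publishable" results (two-tailed test, alpha = 0.10)
published <- sims[abs(sims$t) > qt(0.95, df = n - 2), ]

# Average true beta among published studies whose estimate is close to b0
b0 <- 1
near_b0 <- published[abs(published$beta_hat - b0) < 0.1, ]
cond_mean_beta <- mean(near_b0$beta)
cond_mean_beta
```

Comparing `cond_mean_beta` with `b0` gives a rough sense of how far a published estimate of that size sits from the average true value that generated it, under the assumed prior.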

The relationship that you get between $E[\hat{\beta} \mid \text{stat. sig.}]$ and $\beta$ is shown in the picture below. To create this plot, we drew 10,000 samples (N = 100 each) from the normal distribution $k\sim\Phi(\mu=0,\,\sigma=\sigma_{0})$ for three values of $\sigma_{0}\in\{0.5,\,1,\,2\}$ (we erroneously report this as 200,000 samples in the poster, but in re-checking the code I see that it was only 10,000 samples). We then calculated the proportion of these samples for which the absolute value of $t=\frac{\beta+k}{\sigma_{0}}$ is greater than 1.645 (the cutoff for a two-tailed significance test, $\alpha=0.10$) for values of $\beta\in[-1,3]$.

As you can see, as $\hat{\beta}$ gets larger, its bias also grows, which is a bit counterintuitive, as we expect larger $\beta$ values to be less susceptible to significance bias: they are large enough such that both tails of the sampling distribution around $\beta$ will still be statistically significant. That’s true, but it’s offset by the fact that under many prior distributions extremely large values of $\beta$ are unlikely (less likely, in fact, than a small $\beta$ that happened to produce a very large $\hat{\beta}$)! Thus, the bias actually rises in the estimate.

With a plot like this in hand, determining $E[\beta \mid \hat{\beta} = \hat{\beta}_{0} \text{ and stat. sig.}]$ is a mere matter of reading the plot above. The only trick is that one must adjust the parameters of the simulation (e.g., the sample size) to match the target study before creating the matching bias plot.
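The calculation behind a plot of this kind can be sketched in a few lines of R. This is only an approximation of the procedure described above, assuming a single value $\sigma_{0} = 1$ and a coarse grid of $\beta$ values; the grid spacing and the seed are arbitrary choices of mine.

```r
# Sketch: share of significant results and mean significant estimate, by true beta.
# Illustrative assumptions: sigma0 = 1 and a grid of beta values in [-1, 3].
set.seed(7)
sigma0 <- 1
k <- rnorm(10000, mean = 0, sd = sigma0)      # sampling noise around beta
beta_grid <- seq(-1, 3, by = 0.25)

res <- t(sapply(beta_grid, function(beta) {
  beta_hat <- beta + k
  sig <- abs(beta_hat / sigma0) > 1.645       # two-tailed test, alpha = 0.10
  c(beta = beta,
    prop_sig = mean(sig),                     # share of draws that are significant
    mean_sig_beta_hat = mean(beta_hat[sig]))  # E[beta-hat | stat. sig.]
}))
res <- as.data.frame(res)
res
```

Plotting `mean_sig_beta_hat` against `beta` (with a 45-degree reference line) shows where significant estimates overstate the truth; repeating the exercise for other values of `sigma0` gives the other curves.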

Accordingly, we examined 177 quantitative articles published in the APSR (80 articles in volumes 102-107, from 2008-2013) and the AJPS (97 articles in volumes 54-57, from 2010-2013). Only articles with continuous and unbounded dependent variables are included in our data set. Each observation of the collected data set represents one article and contains the article’s main finding (viz., an estimated marginal effect); details of how we identified an article’s “main finding” are in the poster, but in short it was the one we thought that the author intended to be the centerpiece of their results.

Using this data set, we used the technique described above to estimate the average % absolute bias, $[|\hat{\beta}-\beta|/|\hat{\beta}|]$, excluding cases we visually identified as outliers. We used three different prior distributions (that is, assumptions about the distribution of true $\beta$ values in the data set) to create our bias estimates: a normal density centered on zero ($\Phi(\mu = 0, \sigma = 3)$), a diffuse uniform density between –1022 and 9288, and a spike-and-slab density with a 90% chance that $\beta = 0$ and a 10% chance of coming from the prior uniform density.

As shown in the Table below, our preliminary bias estimates for these prior densities fall in the 40-55% range, meaning that, on average, we estimate that a published estimate differs from its true value by $\approx$ 40-55% of the estimate’s own magnitude.

| Prior density | Avg. % absolute bias |
|---|---|
| normal | 41.77% |
| uniform | 40% |
| spike-and-slab | 55.44% |

Note: results are preliminary.

I think it is likely that these estimates will change before our final analysis is published; in particular, we did not adjust the range of the independent variable or the variance of the error term $\varepsilon$ to match the published studies (though we did adjust sample sizes). Probably what we will do by the end is examine standardized marginal effects (viz., t-ratios) instead of nominal coefficient/marginal effect values; this technique has the advantage of folding variation in $\hat{\beta}$ and $\hat{\sigma}$ into a single parameter and requiring less per-study standardization (as t-ratios are already standardized). So I’m not yet ready to say that these are reliable estimates of how much the typical result in the literature is biased. As a preliminary cut, though, I would say that the results are concerning.

We have much more to do in this research, including examining different evidence of the existence and prevalence of publication bias in political science and investigating possible solutions or corrective measures. We will have quite a bit to say in the latter regard; at the moment, using Bayesian shrinkage priors seems very promising while requiring a result to be large (“substantively significant”) as well as statistically significant seems not-at-all promising. I hope to post about these results in the future.

As a parting word on the former front, I can share one other bit of evidence for publication bias that casts a different light on some already published results. Gerber and Malhotra have published a study arguing that an excess of p-values near the 0.05 and 0.10 cutoffs, two-tailed, is evidence that researchers are making opportunistic choices for model specification and measurement that enable them to clear the statistical significance bar for publication. But the same pattern appears in a scenario in which totally honest researchers are studying a world with many null results and in which statistical significance is required for publication.

Specifically, we simulated 10,000 studies (each of sample size n=100) where the true DGP for each study j is $y=\beta_{j}x+\varepsilon$, $x\sim U(0,1)$, $\varepsilon\sim\Phi(\mu=0,\,\sigma=1)$. The true value of $\beta_{j}$ has a 90% chance of being set to zero and a 10% chance of being drawn from $\Phi(\mu=0,\,\sigma=3)$ (this is the spike-and-slab distribution above). Consequently, the vast majority of DGPs are null relationships. Correctly specified regression models $\hat{y}=\hat{\gamma}+\hat{\beta}x$ are estimated on each simulated sample. The observed (that is, published and statistically significant) and true, non-null distributions of standardized $\beta$ values (i.e., t-ratios) from this simulation are shown below.
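A minimal sketch of a simulation with this structure is below. It follows the DGP as stated in the paragraph above; the seed, the use of a fixed 1.645 cutoff for "publication," and the bin edges used to tabulate the t-ratios are my own illustrative assumptions, so the output will not reproduce the original figure exactly.

```r
# Sketch: honest researchers, spike-and-slab truth, significance filter.
set.seed(2013)
n_studies <- 10000
n <- 100

# Spike-and-slab: 90% exact zeros, 10% drawn from N(0, sd = 3)
is_null <- runif(n_studies) < 0.9
beta <- ifelse(is_null, 0, rnorm(n_studies, mean = 0, sd = 3))

# Fit the correctly specified regression for each study and keep the t-ratio
t_ratio <- vapply(beta, function(b) {
  x <- runif(n)
  y <- b * x + rnorm(n)
  summary(lm(y ~ x))$coefficients["x", "t value"]
}, numeric(1))

published <- abs(t_ratio) > 1.645            # assumed cutoff: alpha = 0.10, two-tailed
share_null_published <- mean(is_null[published])

# Tabulate published |t| values into coarse bins (bin edges are arbitrary)
bin_counts <- table(cut(abs(t_ratio[published]),
                        breaks = c(1.645, 2, 2.5, 3, Inf)))
share_null_published
bin_counts
```

The quantity `share_null_published` is the fraction of "published" results that come from a true null, and `bin_counts` gives a crude view of how published t-ratios pile up just above the cutoff.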

This is a very close match for a diagram of t-ratios published in the Gerber-Malhotra paper, which shows the distribution of z-statistics (a.k.a. large-sample t-scores) from their examination of published articles in AJPS and APSR.

So perhaps the fault, dear reader, is not in ourselves but in our stars—the stars that we use in published tables to identify statistically significant results as being scientifically important.

### Academic Impostor Syndrome

This is a little outside my usual blogging oeuvre, but I saw an article in the Chronicle that I really think is worth a read:

It’s something that strongly spoke to my experience as an academic.

Methodologists are often required to demonstrate the utility of our method by using it to critique existing research. But I think we should all try our best to assume that other researchers are smart, honest, and well-meaning people; that we are engaged in a collective enterprise to understand our world; and that when criticisms come, they come from a position of respect and with the goal of understanding, not to “one-up” somebody or win a competition.

I have no idea how empirically accurate that description is, but it’s the kind of science that I want to do and I’m sticking with it on the theory that one should embody what they wish to see in the world.

### Goin’ rogue on p-values

I think it’s fair to say that anyone who’s spent any time teaching statistics has spent a good deal of that time trying to explain to students how to interpret the p-value produced by some test statistic, like the t-statistic on a regression coefficient. Most students want to interpret the p-value as $\Pr(\beta = 0 | \hat{\beta} = \hat{\beta}_{0})$, which is natural since this is the sort of thing that an ordinary person wants to learn from an analysis and a p-value is a probability. And all these teachers, including me of course, have explained that $p = \Pr(\hat{\beta} \geq \hat{\beta}_{0} | \beta = 0)$ or equivalently $\Pr(\hat{\beta} = \hat{\beta}_{0} | \beta \leq 0)$ if you don’t like the somewhat unrealistic idea of point nulls.

There was a recent article in the New York Times that aroused the ire of the statistical blogosphere on this front. I’ll let Andrew Gelman explain:

Today’s column, by Nicholas Balakar, is in error. …I think there’s no excuse for this, later on:

By convention, a p-value higher than 0.05 usually indicates that the results of the study, however good or bad, were probably due only to chance.

This is the old, old error of confusing p(A|B) with p(B|A). I’m too rushed right now to explain this one, but it’s in just about every introductory statistics textbook ever written. For more on the topic, I recommend my recent paper, P Values and Statistical Practice, which begins:

The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations). . . .

Huh. Well, I’ve certainly heard and said something like this plenty of times, but…

You are now leaving the reservation.

Consider the null hypothesis that $\beta \leq 0$. If we’re going to be Bayesians, then the posterior probability $\Pr(\beta\leq0|\hat{\beta}=\hat{\beta}_{0})$ is $\left(\Pr(\hat{\beta}=\hat{\beta}_{0}|\beta\leq0)\Pr(\beta\leq0)\right)/\left(\Pr(\hat{\beta}=\hat{\beta}_{0})\right)$, or $\left(\intop_{-\infty}^{0}f(\hat{\beta}=\hat{\beta}_{0}|\beta)f(\beta)d\beta\right)/\left(\intop f(\hat{\beta}=\hat{\beta}_{0}|\beta)f(\beta)d\beta\right)$.

Suppose that we are ignorant of $\beta$ before this analysis, and thus specify an uninformative (and technically improper) prior $f(\beta)=\varepsilon$, the uniform distribution over the entire domain of $\beta$. Then the denominator is equal to $\varepsilon$, as this constant can be factored out and the remaining component integrates to 1 (for a location model such as the normal, the likelihood integrates to 1 over $\beta$ just as the density does over $\hat{\beta}$). We can also factor out the constant $\varepsilon$ from the top of this function, and so this cancels with the denominator.

We are left with $\intop_{-\infty}^{0}f(\hat{\beta}=\hat{\beta}_{0}|\beta)d\beta$, which is just the p-value (where we consider starting with the likelihood density conditional on $\beta = 0$ with a horizontal line at $\hat{\beta}$, and then sliding the entire distribution to the left adding up the area swept under the likelihood by that line).

So: the p-value is the rational belief that an analyst should hold that the null hypothesis is true, when we have no prior information about the parameter.
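The equivalence is easy to check numerically. The sketch below assumes a normal sampling distribution, $\hat{\beta} \mid \beta \sim N(\beta, se^{2})$, with made-up values for the estimate and its standard error; it compares the flat-prior posterior probability of $\beta \leq 0$ with the ordinary one-sided p-value.

```r
# Sketch: flat-prior posterior Pr(beta <= 0 | beta-hat) vs. one-sided p-value.
# Illustrative assumptions: normal likelihood, beta_hat0 = 1.2, se = 0.6.
beta_hat0 <- 1.2
se <- 0.6

# Likelihood of the observed estimate, viewed as a function of beta
likelihood <- function(beta) dnorm(beta_hat0, mean = beta, sd = se)

# Posterior probability of the null region; the constant prior cancels
posterior_null <- integrate(likelihood, -Inf, 0)$value /
                  integrate(likelihood, -Inf, Inf)$value

# One-sided p-value: Pr(beta-hat >= beta_hat0 | beta = 0)
p_value <- pnorm(beta_hat0, mean = 0, sd = se, lower.tail = FALSE)

all.equal(posterior_null, p_value, tolerance = 1e-4)
```

The two numbers agree up to numerical integration error, which is the "sliding the distribution" argument made above in computational form.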

This is by no means a novel result; I can recall learning something like it in one of my old classes. It is noted by Greenland and Poole’s 2013 article in Epidemiology (good luck getting access, though–I only knew about it through Andrew’s commentary). The only thing I’ve done here that’s just slightly different from some treatments that I’ve seen is that I’ve stated the null as an interval, $\beta \leq 0$, and the estimate information as a point. That avoids the criticism that point nulls are unrealistic, which seems to be one of Gelman’s objections in the aforementioned commentary; instead of integrating over the space of $\hat{\beta}$ as usual, sliding the value of $\hat{\beta}$ under its distribution to get the integral, I think of fixing $\hat{\beta}$ in place and sliding the entire distribution (i.e., $\beta$) to get the integral.

It’s still true that the p-value is not really the probability that the null hypothesis is true: that probability is zero or one (depending on the unknown truth). But the p-value is our optimal rational assessment about the chance that the null is true. That’s pretty easy to explain to lay people and pretty close to what they want. In the context of the article, I think it would be accurate to say that a p-value of 5% indicates that, if our model is true, the rational analyst would conclude that there is a 5% chance that these data were generated by a parameter in the range of the null hypothesis.

Accepting that the p-value really can have the interpretation that so many lay people wish to give it frees us up to focus on what I think the real problems are with focusing on p-values for inference. As Andrew notes on pp. 71-72 of his commentary, chief among these problems is that holding a 95% belief that the null is false after seeing just one study only incorporates the information and uncertainty embedded in this particular study, not our larger uncertainty about the nature and design of this study per se. That belief doesn’t encapsulate our doubts about measures used, whether the model is a good fit to the DGP, whether the results are the product of multiple comparisons inside of the sample, and just our general skepticism about all novel scientific results. If we embed all those sources of doubt into a prior, we are going to downweight both the size of the “signal” detected and the “signal-to-noise” ratio (e.g., our posterior beliefs about the possibility that the null hypothesis is true).
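To make the downweighting concrete, here is a small sketch using the standard normal-normal conjugate update. The skeptical prior $\beta \sim N(0, \tau^{2})$ and all of the numbers are hypothetical choices for illustration; how to translate real doubts about a study into a value of `tau` is exactly the hard part and is not addressed here.

```r
# Sketch: a skeptical prior shrinks the estimate and raises Pr(beta <= 0).
# Illustrative assumptions: beta-hat ~ N(beta, se^2), prior beta ~ N(0, tau^2).
beta_hat0 <- 1.2
se <- 0.6
tau <- 0.5   # smaller tau = more skepticism about large effects

shrink <- tau^2 / (tau^2 + se^2)     # weight placed on the data
post_mean <- shrink * beta_hat0      # posterior mean, pulled toward zero
post_sd <- sqrt(shrink) * se         # equals sqrt(tau^2 * se^2 / (tau^2 + se^2))

p_flat <- pnorm(0, mean = beta_hat0, sd = se)            # Pr(beta <= 0), flat prior
p_skeptical <- pnorm(0, mean = post_mean, sd = post_sd)  # Pr(beta <= 0), skeptical prior
c(p_flat = p_flat, p_skeptical = p_skeptical, post_mean = post_mean)
```

Under these assumptions the posterior mean is smaller than the raw estimate and the posterior probability of the null region is larger than the flat-prior value.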

Isn’t it more important to criticize the use of p-values for these reasons, all of which are understandable by a lay person, rather than try to inculcate journalists into the vagaries of sampling theory? I think so. It might even prompt us to think about how to make the unavoidable decisions about evidence that we have to make (publish or discard? follow up or ignore?) in a way that’s more robust than asking “Is p<0.05?” but more specific than saying “just look at the posterior.” Of course, embedded in my suggestion is the assumption that Bayesian interpretations of statistical results are at least as valid as frequentist interpretations, which might be controversial.

Am I wrong? Am I wrong?

### An open letter to Senators Cruz and Cornyn, re: cutting the NSF’s Political Science program

Dear Senators Cruz and Cornyn,

I’m an assistant professor of Political Science at Rice University, and I hope that you’ll oppose Senator Coburn’s amendment to de-fund the Political Science program at the National Science Foundation (the Coburn amendment to HR 933 currently before the Senate).

Political Science has evolved into a data-intensive, methodologically sophisticated STEM discipline over the last 40 years. Our work is ultimately focused on the understanding and forecasting of politically important phenomena. We model and predict civil war outbreaks, coups, regime changes, election outcomes, voting behavior, corruption, and many other scientifically important topics. Techniques that we develop are used by national security agencies like the CIA and DOD to forecast events of political importance to the United States, and many of our PhDs go on to work directly for the government or contracting firms in this capacity. Indeed, many political scientists consult for these and other agencies to supplement our normal teaching and research.

The basic scientific work that underlies these activities and enables them to improve in accuracy is funded by the National Science Foundation. As in any science, much of this work is technical or deals with smaller questions. The technology that allows for image enhancement in spy satellites and telescopes was built upon statistical work in image processing and machine learning that seemed just as technical and trivial at first (as I recall, much of this work focused on enhancing a picture of a Playboy centerfold!). The technology that allows for sifting and identification of important information in large databases (used in various surveillance programs) stems from work on machine learning that ultimately grew from (among many other things) simple mathematical models of a single neuron.

We buy the NSF Political Science program for far less than we pay for a single F-35 fighter jet (about \$11m vs. about \$200m).

My sense is that many politicians believe that funding Political Science research is frivolous because we are doing the same work that pundits (or politicians themselves) do. But as the examples above illustrate, our research is heavily data-driven and targeted at understanding and predicting political phenomena, not in providing commentary, promoting policy change, or representing a political agenda. To be sure, some political scientists do that, just like biologists and physicists—on their own time, and not with NSF money.

I hope that you will see that investment in Political Science research is as important as, and far cheaper than, the investments we make in the National Institutes of Health and physical science divisions of the NSF. Scientific advancement is not partisan and not ideological.

Dr. Justin Esarey
Assistant Professor of Political Science
Rice University (Houston, TX)

### Readin’ Up on Publication Bias

After last week’s post, I’ve been reading more of the literature out there on bias in the distribution of published effects. There’s a lot more out there than I thought! I thought it might be nice to have a little reading list put together and to think about where further development would be most useful.

I’ve already mentioned Ioannidis’ 2005 piece on “Why Most Published Research Findings Are False,” which is a great piece and a nice place to start (if you don’t want to go all the way back to the original publication of the “file drawer problem”). But I wasn’t aware of another piece he wrote in 2008, “Why Most Discovered True Associations Are Inflated,” which makes the same point about bias that I made in my post. It’s well worth a read! However, I’m not satisfied with the suggested correctives (as summarized by a contemporaneous post in Marginal Revolution that I now quote):

1. In evaluating any study try to take into account the amount of background noise.  That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.
2. Bigger samples are better.  (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).
3. Small effects are to be distrusted.
4. Multiple sources and types of evidence are desirable.
5. Evaluate literatures not individual papers.
6. Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.
7. As an editor or referee, don’t reject papers that fail to reject the null.

I think (1) and (6) tend to discourage creativity and unexpected discovery in science (a countervailing cost that should be considered before we force pre-registration on everyone), (2) and (3) don’t give a reader a good diagnostic way of evaluating whether a particular result is to be trusted or not (and don’t give the editor another way of screening papers, if they intend to follow suggestion (7)), and (4) and (5) are true but a little trivial (though point (5) could use repeating as often as possible IMO).

A similar point has been made in the fMRI literature by Tal Yarkoni (“Inflated fMRI Correlations Reflect Low Statistical Power”) which is good to know, especially if (like me) you’ve been interested in fMRI studies in political science. He didn’t know about Ioannidis’ paper, either! Of course, that was a few years ago, so he had a better excuse.

Gelman and Weakliem published a semi-related piece in the American Scientist which, in short, cautions people against trusting small studies that report large effect sizes where small effect sizes are expected. They also suggest performing a retrospective power analysis on published studies, which I think could be a good starting point for developing a more formal screening procedure.
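One way such a retrospective calculation might look is sketched below. This is my own simplified version, not a reproduction of anything in their article: it assumes a two-tailed z-test, takes a hypothesized true effect and the study's standard error as inputs, and the function name `retro_power` and all numbers are made up for illustration.

```r
# Sketch: retrospective power and expected exaggeration for a published result.
# Hypothetical inputs: a plausible true effect and the study's standard error.
retro_power <- function(true_effect, se, alpha = 0.05, n_sims = 100000) {
  z_crit <- qnorm(1 - alpha / 2)
  # Analytic power of a two-tailed z-test
  power <- pnorm(-z_crit - true_effect / se) +
           pnorm(-z_crit + true_effect / se)
  # Simulated exaggeration: mean |estimate| among significant results / |truth|
  est <- rnorm(n_sims, mean = true_effect, sd = se)
  sig <- abs(est / se) > z_crit
  c(power = power, exaggeration = mean(abs(est[sig])) / abs(true_effect))
}

set.seed(1)
small_study <- retro_power(true_effect = 0.1, se = 0.5)   # noisy study
large_study <- retro_power(true_effect = 0.1, se = 0.02)  # precise study
rbind(small_study, large_study)
```

A study with low power under a plausible true effect can only reach significance by overestimating that effect, which is why the exaggeration ratio is so much larger for the noisy study.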

One thing I like about a recent paper on “The Rules of the Game Called Psychological Science” is that it tries to use simulation to assess the impact of different publication strategies on the prevalence of false and biased results in the literature, which I think is a great idea. I also like the idea for testing for an excess of statistically significant results in a literature, an idea the paper attributes to Ioannidis and Trikalinos 2007, although again I am not crazy about the idea of simply yelling at authors and editors for failing to publish statistically insignificant findings without proposing a new diagnostic for assessing the noteworthiness of a scientific paper (presuming that we have criteria more specific than “I know a good paper when I see it” and more restrictive than “every well-designed study gets published”).
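The general logic of an excess-significance check can be sketched roughly as follows. This is a simplification for illustration and should not be read as the published procedure: it assumes each study's power has already been estimated somehow, uses a hypothetical ten-study literature, and compares the observed count of significant results with the count that power would predict using a one-sided binomial test.

```r
# Sketch: are there more significant results than the studies' power predicts?
# Hypothetical literature: assumed power of each study and whether it was significant.
power <- c(0.35, 0.50, 0.20, 0.45, 0.30, 0.60, 0.25, 0.40, 0.55, 0.30)
significant <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

observed <- sum(significant)   # number of significant results actually seen
expected <- sum(power)         # number expected if power estimates are right

# One-sided binomial test against the average power
excess_test <- binom.test(observed, length(power), p = mean(power),
                          alternative = "greater")
excess_test$p.value
```

A small p-value here says the literature contains more significant findings than its own power can comfortably explain, though it does not say which findings are the suspect ones.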

So, as far as I can tell right now, there is some value in communicating this message to applied political scientists but even more value in trying to develop diagnostic criteria for assessing published articles and more still in trying to propose a filtering/sorting criterion for publication that diminishes the frequency and magnitude of false results while still identifying the most noteworthy results and maintaining a high level of quality control.

### A brief reflection on stats blogging

The really great reaction I had to yesterday’s post about bias in published relationships got me thinking some “deep thoughts” about blogging as a statistical researcher.

Good news: the post I made yesterday got a lot of attention!

Bad news: there were a lot of (fortunately minor!) errors and bugs in the post that didn’t interfere with the overall point, but certainly were annoying!

Worse news: whenever I tried editing things to clean up these errors, I often created even more formatting hassles, such that I eventually strained my eye muscles from staring at the screen too hard!

I’ve been thinking about the blog as a written window into on-going research that I and my current graduate student(s) are working on. For me, it’s a way of setting out some ideas and thoughts in a systematic way that provides the initial structure for more formalized publication, with the added benefit of making that ongoing research available to the public and open for improvement and commentary by the scholarly community. It lets me gauge how important or interesting what I’m working on is to that community, and gets me suggestions on what to read and how to improve those ideas.

Accordingly, the things that I post are a lot more crystallized than an offhand conversation I might have at lunch with a colleague, but substantially less vetted and error-checked than they would be in a working paper or a publication.

So what happens when something I say catches the imagination and gets shared and re-posted? What, exactly, are the editorial standards for a blog post? Am I allowed to be a little wrong, or even totally wrong? Obviously any writer’s incentives are to be as precise and correct as possible in all things, so this is not a moral hazard issue.

I think that, on balance, I like the idea of blogging about research “in real time,” as it were, including some degree of mistakes and false starts that inevitably arise along the way. There are limits, of course–this isn’t Ulysses. But hearing people’s reactions to ideas and getting their suggestions as the project comes together is extremely helpful and also makes research a more social, enjoyable process for me.

Which leads me to issue #2: boy, I’m having a hard time finding desktop software that I really like! I’ve been using Windows Live Writer 2012 up til now, but I tried writing yesterday’s post with Word 2013’s blogging feature. It worked… except that all the MathType equations I used got blanked out, and so I had to go back and manually rewrite all the math equations using $\LaTeX$ notation. Which was delightful.

I also discovered the sourcecode feature of WordPress, which allows you to do stuff like:

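```r
set.seed(1239281)
# Draw 20,000 standard normal values (rnorm, not runif, takes mean and sd)
x <- rnorm(20000, mean = 0, sd = 1)
plot(density(x))  # kernel density estimate of the draws
```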


Which is great! Except that I’ve had a hard time making Windows Live Writer play nicely with that kind of thing (it appears to want to insert all the usual HTML tags and what not into the code, which of course messes it up). So I’ve had to post it with WLW, and then go back to the WordPress client to clean up the code later. Not cool.

I ultimately figured out that you have to edit the HTML source in WLW, add the <PRE> and </PRE> html tags around your source code, and type the code directly into the HTML. That seems to work. I did try a plugin that supposedly handles all this for you, but wasn’t satisfied with the results.  EDIT: Nope. That didn’t work either because WLW wants to escape a < character as its HTML equivalent, &lt;, and apparently that doesn’t get interpreted correctly. So I’m back to using the WordPress on-line editor, which I guess is where I’m going to be stuck for the foreseeable future.

So I’m still waiting for a math/code enabled WYSIWYM platform for WordPress that’s as good as LyX is for writing papers in $\LaTeX$. And I guess I’ll just have to go on waiting…

### Another cool aggregator site

There’s another site I’ve found that collects posts from a bunch of statistics/quantitative social science blogs:

http://www.statsblogs.com/

### Cool aggregator site for R

I’ve signed up my blog to be a part of the R-bloggers aggregator site, a “blog of blogs” specifically about statistical analysis in the R programming environment. I highly recommend subscribing to its RSS feed, if you have a reader. There are a lot of interesting posts that I find there each week!

### Kuhn, Scientific Revolutions, and the Social Sciences

The New Atlantis published a recent retrospective on Kuhn's Structure of Scientific Revolutions that's well worth a look. I think the article rightly notes that social scientists have eagerly embraced Kuhn's ideas, in the sense that the average political scientist is quite likely to have at least spent a graduate seminar meeting or two discussing the relationship between Kuhn's thesis and the practice of social science.

Yet I can't help but think that it gets several important points wrong. For instance:

Despite these criticisms, many social scientists embraced — or perhaps appropriated — Kuhn’s thesis. It enabled them to elevate the status of their work. The social sciences could never hope to meet the high standards of empirical experimentation and verifiability that the influential school of thought called positivism demanded of the sciences. But Kuhn proposed a different standard, by which science is actually defined by a shared commitment among scientists to a paradigm wherein they refine and apply their theories. Although Kuhn himself denied the social sciences the status of paradigmatic science because of their lack of consensus on a dominant paradigm, social scientists argued that his thesis could still apply to each of those competing paradigms individually. This allowed social scientists to claim that their work was scientific in much the way Kuhn described physics to be.

I'm willing to be persuaded otherwise, but my subjective impression is that Popper has been far more influential on (quantitative) political science than Kuhn. It's Popperian falsificationism that informs the scientific process that we teach to our undergrads in the standard research methods curriculum. Popperian falsificationism drove the AJPS's previous policy of requiring explicit statements of hypothesis, test, and result in the abstract. Popperian falsificationism informs criticisms of non-experimental empirical social science offered by Green, among others.

Some things the author says are kinda true, but I think are taken in a misleading direction:

A scientific way of thinking permeated the writings of Auguste Comte and Karl Marx, and by the end of the century, with the work of Max Weber and Émile Durkheim, the era of social science had begun in earnest. Many of the early social scientists came to view society in terms of contemporary physics; they adopted the Enlightenment belief in science as the source of progress, and considered physics the archetypical science. They understood society as a mechanism that could be engineered and adjusted. These early social scientists began to deem philosophical questions irrelevant or even inappropriate to their work, which instead became about how the mechanism of society operated and how it could be fixed. The preeminence of physics and mechanistic thinking was passed down through generations of social scientists, with qualitative characterization considered far less valuable and less “scientific” than quantitative investigations. Major social scientific theories, from behaviorism to functionalism to constructivism and beyond, tacitly think of man and society as machines and systems.

I'm not an expert on Weber and Durkheim, but I have read some of their methodological writings. Durkheim wrote an entire book about the philosophical underpinnings of his method. Weber's method of verstehen, interpretive understanding, launched a thousand essays about qualitative interpretationalism in political science, most of them decidedly opposed to positivism in the social sciences. That's not to say that many quantitative political scientists don't cling to a somewhat philosophically backward version of epistemological positivism; I suspect that many, maybe even most, do. But methodologists, the teachers and developers of quantitative methods in political science, are ceaselessly ragging on people to take a more sophisticated view (just take a look at this naturalistic but non-logical-positivist book just released by Kevin Clarke).

Still, some of the things the author says are pretty interesting. For instance:

A recent paper in the journal Theory in Biosciences perfectly encapsulates the desire for a more biological perspective in the social sciences, arguing for “Taking Evolution Seriously in Political Science.” The paper outlines the deterministic dangers in the view of social systems as Newtonian machines, as well as the problems posed by the reductionist belief that elements of social systems can be catalogued and analyzed. By contrast, the paper argues that approaching social sciences from an evolutionary perspective is more appropriate philosophically, as well as more effective for scientific explanation. This approach allows us to examine the dynamic nature of social changes and to explain more consistently which phenomena last, which disappear, and which are modified, while still confronting persistent questions, such as why particular institutions change.

Reading over a preprint of the article, it makes some interesting (if slightly superficial) observations about the difference between an explanandum that's amenable to repeatable experimentation and one which is data-driven and factual but largely based on uncontrolled observational evidence. The latter is not unscientific, but certainly not idiographic (a new word I learned from this month's Perspectives on Politics). But I'd like to hear a more thorough and more philosophically developed epistemological framework that relates the practice of evolutionary biologists to some foundational perspective on the world in such a way that we would expect one to lead to a better understanding of the other. That would, I think, be necessary before I could get behind using evolutionary biologists as a model for political science.

The Theory in Biosciences article does say:

Many political scientists today are searching for a better understanding of the mechanisms of political change. The problem analytically, is that most political science models are static. For rational choice, this is due to the theoretical argument that any given institutional setting will eventually reach an equilibrium in which “no one has the incentive to change his or her choice” (Levi 1997: 27). Consequently the only source of change is exogenous. As Levi argues, “it is obvious that choices change regularly and constantly. . . To understand these changes requires a set of hypotheses concerning what exogenous shocks or alterations to the independent variables will have what effects on the actions of the individuals under study” (Levi, 1997: 28). Given the foundational assumptions and logic of rational choice, “endogenous institutional change appears,” as Hall and Taylor observe, “to be a contradiction in terms.”

Now that's something I completely agree with… and yet, I see very little research being done on truly dynamic theoretical models in political science. Dynamic statistical models, sure. But they can't substitute for institutional political theories that seamlessly integrate change over time into their explanatory framework.