EurekAlert from AAAS
The English-language version of this article originally appeared in the journal Science, published by the American Association for the Advancement of Science (AAAS), on 1 October 2010, Volume 330, page 18. Reference: This Chinese-language version of the article is being made freely available as a service to Chinese-speaking reporters. Please cite the journal Science, and include the Web address, in any coverage of this article. Join the nonprofit AAAS, established in 1848, and subscribe to Science by logging on to this Web site:

U.S. graduate education: academy rankings tell you a lot, but not who's No. 1 in any field

By Jeffrey Mervis

Perhaps it should be called the Mr. Potato Head of graduate school rankings.

Remember how easy it was to alter the appearance of that toy's bland, tubular face by sticking an ear or an eye in an unexpected place? Well, the latest analysis of the quality of U.S. research doctoral programs by the National Academies' National Research Council (NRC) can be manipulated in much the same way. But the exercise is hardly child's play.

This week's release of the long-awaited assessment, the first since 1995 and 3 years behind schedule, disgorges a massive amount of information about 5100 doctoral programs in 62 fields at 212 U.S. universities. More than a decade in the making, the assessment is meant to reflect the collective wisdom of the U.S. research community on what defines a top-quality graduate program. In an era of increased accountability, it's also designed to address questions from students, faculty members, university administrators, elected officials, and the public about the quality of any particular graduate program.

Yet that strength, the ability to serve as many audiences as possible, may also be the assessment's most controversial feature. Those who simply want to know who's No. 1 in neuroscience, for example, or to read a list of the top 10 graduate programs in any particular field will walk away disappointed after massaging the report's Excel spreadsheets (available at … or …). That's because, like Mr. Potato Head, the NRC assessment can look quite different depending on your definition of "best."

To be sure, NRC does rank programs, but oh so carefully. Instead of assigning a single score to each program in a particular field, the assessment ranks the program on five different scales. Each score is also presented as a range of rankings, spanning the 5th to the 95th percentile of the scores it received. The scales themselves are built from 20 characteristics (see table, p. 19) that the NRC panel deemed appropriate for a quantitative assessment. Two of the scales are meant to portray the overall quality of the program: one derived from a reputational survey (the R scale), the other from a quantitative analysis (the S scale). The other three rely on subsets of the characteristics that address important dimensions of quality: research activity, student support and outcomes, and diversity. The report itself highlights the uncertainties generated by such an exercise by calling the results "illustrative rankings [that] are neither endorsed nor recommended by the NRC as an authoritative conclusion about the relative quality of doctoral programs."
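
The idea of reporting a range instead of a single rank can be illustrated with a short Python sketch. It rests on an assumption made purely for illustration: that the uncertainty in each program's score can be mimicked by adding Gaussian noise and re-ranking the programs 500 times. The function names, the noise model, and the example scores are hypothetical and do not reproduce the NRC's actual procedure; the sketch only shows how many repeated rankings collapse into a 5th-to-95th-percentile range.

```python
import random

def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list, p in [0, 100]."""
    idx = round(p / 100 * (len(sorted_vals) - 1))
    return sorted_vals[idx]

def rank_ranges(scores, noise_sd=1.0, n_draws=500, seed=0):
    """Perturb every score n_draws times, re-rank the programs on each draw
    (1 = best), and return each program's 5th-95th percentile rank range.

    The Gaussian noise model is an illustrative assumption, not NRC's method.
    """
    rng = random.Random(seed)
    names = list(scores)
    ranks = {name: [] for name in names}
    for _ in range(n_draws):
        noisy = {n: scores[n] + rng.gauss(0, noise_sd) for n in names}
        ordered = sorted(names, key=noisy.get, reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return {
        name: (percentile(sorted(r), 5), percentile(sorted(r), 95))
        for name, r in ranks.items()
    }

# Four hypothetical programs; B and C are nearly tied.
ranges = rank_ranges({"A": 100.0, "B": 50.0, "C": 49.5, "D": 0.0})
print(ranges)
```

With these made-up scores, the clear leader and the clear laggard each get a single rank, while the two nearly tied programs both come out somewhere between 2nd and 3rd: the kind of overlap that a single ordinal number would hide.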

Given all those caveats, some university administrators are taking the rankings with more than a grain of salt. "We're pleased with how well our own programs ranked," says Patricia Gumport, dean of the graduate school at Stanford University. "But we have concerns about the methodology. So we're not planning to use the range of rankings."

It's easy to see the source of Gumport's concern by looking at what the assessment says about Stanford's anthropology department, to pick just one example. The department is ranked between 13th and 47th on the R scale and between 3rd and 9th on the S scale. In addition, it falls between 3rd and 14th on research activity, between 1st and 43rd on student support and outcomes, and between 12th and 33rd on diversity.

"It's difficult to draw meaningful conclusions about the relative quality of programs from these ranges of rankings," says Gumport with impressive understatement. Instead, she and her deans plan to mine the database to compare the performance of the university's 47 programs on one or more characteristics, or to see how a particular Stanford program stacks up with its peers around the country on those characteristics.

Pieces of the whole. The NRC report asked faculty members to weigh the relative importance of these 20 characteristics in determining a quality graduate program.

That's exactly what NRC hopes will happen. "We wanted to give people the chance to create rankings based on variables that they thought were important," says Charlotte Kuh, the NRC staffer who has lived and breathed the $6.7 million assessment since it was launched in 2004 and who has made countless presentations to the graduate school community since then on its progress or lack thereof. That flexibility matters because different audiences care about different things, she adds: "While faculty have certain values, students may be worried about other things." For example, an undergraduate who's thinking of becoming a microbiologist can find out how long it takes students at University X to complete their Ph.D. degrees, or what share of graduates from University Y find academic jobs. Likewise, an engineering dean interested in increasing the number of women or minority faculty members and students can compare the gender and racial diversity of her programs with those of others.
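
A custom ranking of the kind Kuh describes can be sketched in a few lines of Python: put each characteristic on a common scale, multiply by weights the user chooses, and sort. The characteristic names, the three programs, and their values below are invented for illustration and are not drawn from the NRC database; treating a negative weight as "lower is better" (as for time to degree) is likewise a hypothetical convention of this sketch.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw values to z-scores so characteristics measured on
    different scales (years, percentages) can be added together."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def custom_ranking(programs, weights):
    """Rank programs (best first) by a weighted sum of standardized characteristics.

    programs: {name: {characteristic: raw value}}
    weights:  {characteristic: weight}; a negative weight means lower is better.
    """
    names = list(programs)
    scores = dict.fromkeys(names, 0.0)
    for char, w in weights.items():
        z = standardize([programs[n][char] for n in names])
        for n, zi in zip(names, z):
            scores[n] += w * zi
    return sorted(names, key=scores.get, reverse=True)

# Hypothetical data for three invented programs (not NRC figures).
programs = {
    "Program A": {"median_years_to_degree": 5.5, "pct_academic_jobs": 40},
    "Program B": {"median_years_to_degree": 6.5, "pct_academic_jobs": 60},
    "Program C": {"median_years_to_degree": 5.0, "pct_academic_jobs": 20},
}
print(custom_ranking(programs, {"median_years_to_degree": -1.0}))  # → ['Program C', 'Program A', 'Program B']
print(custom_ranking(programs, {"pct_academic_jobs": 1.0}))        # → ['Program B', 'Program A', 'Program C']
```

With these made-up numbers, a student who cares only about finishing quickly and one who cares only about academic placement get opposite orderings from the same data, which is the Mr. Potato Head effect in miniature.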

The absence of a single score separates NRC's rankings from those done by several other organizations, in particular U.S. News & World Report, whose influential annual assessments of the "best" universities emulate those for college sports teams by offering the type of ordinal ranking that readers seem to crave. "We felt it was more responsible to be accurate," says Jeremiah Ostriker, a professor of astrophysics and former provost at Princeton University who chaired the committee that carried out the assessment. "That's especially the case if a small change in a range of rankings could make one school seem better than another."

To understand why one number is never enough, you need to understand how the panel went about its business. Asked by the National Academies to prepare a data-based assessment, the committee first whipped up a batch of 20 program characteristics. Then it served up those characteristics in two different ways.

The first involved asking 87,000 faculty members to weigh the importance of each characteristic. The committee then applied those weights, in combination with data on those measures from institutions, faculty members, and students, to rate each of 4838 programs in 59 fields. (It collected data on, but did not rate, three fields that fell below a minimum size and frequency.) That's the S, or quantitative, ranking.
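
The S-style calculation can be sketched as two steps: average the importance weights reported by faculty respondents, then score each program as a weighted sum of its characteristics. This Python sketch is a simplification under stated assumptions, not the NRC's published method: each respondent's weights are normalized to sum to 1 before averaging, the program values are assumed to be already standardized, and all names and numbers are hypothetical.

```python
def average_weights(responses):
    """Average faculty importance weights. Each respondent's weights are first
    normalized to sum to 1 (an assumption) so no respondent counts more than another."""
    totals = {}
    for resp in responses:
        total = sum(resp.values())
        for char, w in resp.items():
            totals[char] = totals.get(char, 0.0) + w / total
    return {char: t / len(responses) for char, t in totals.items()}

def weighted_scores(standardized, weights):
    """Score each program as the weighted sum of its (already standardized) characteristics."""
    return {
        prog: sum(weights[c] * vals[c] for c in weights)
        for prog, vals in standardized.items()
    }

# Hypothetical survey answers from two respondents.
responses = [
    {"publications": 3, "citations": 1},
    {"publications": 1, "citations": 1},
]
weights = average_weights(responses)  # publications 0.625, citations 0.375

# Hypothetical standardized data for two invented programs.
programs = {
    "Program X": {"publications": 1.0, "citations": -1.0},
    "Program Y": {"publications": -1.0, "citations": 1.0},
}
scores = weighted_scores(programs, weights)
ranking = sorted(scores, key=scores.get, reverse=True)  # ['Program X', 'Program Y']
print(weights, scores, ranking)
```

Because the averaged weights favor publications, the program that is strong on publications but weak on citations comes out ahead; with different survey answers the same two programs could swap places.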

The second method asked 8000 faculty members to rate the overall quality of up to 15 programs in their field. NRC then used a regression analysis to infer the weights that those faculty members had implicitly placed on each characteristic in rating the programs. It did this 500 times, selecting a different set of raters each time. That's the R, or reputational, ranking.
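
The reputational approach runs in the opposite direction: start from overall ratings and infer the weights that best explain them. Below is a minimal Python sketch using NumPy least squares. It assumes, unlike the real survey, that every rater scored every program, and it draws a random half of the raters on each of 500 repetitions; the function names, the subset fraction, and the synthetic data are hypothetical.

```python
import numpy as np

def implied_weights(X, ratings):
    """Least-squares fit of ratings ~ X @ w + b; return w, one weight per characteristic."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, ratings, rcond=None)
    return coef[:-1]  # drop the intercept

def resampled_weights(X, rater_scores, n_reps=500, frac=0.5, seed=0):
    """Re-run the regression n_reps times, each time on the mean rating
    given by a different random subset of raters.

    X: (n_programs, n_characteristics) program data
    rater_scores: (n_raters, n_programs) overall-quality ratings
    """
    rng = np.random.default_rng(seed)
    n_raters = rater_scores.shape[0]
    k = max(1, int(frac * n_raters))
    draws = []
    for _ in range(n_reps):
        chosen = rng.choice(n_raters, size=k, replace=False)
        mean_rating = rater_scores[chosen].mean(axis=0)  # one rating per program
        draws.append(implied_weights(X, mean_rating))
    return np.array(draws)  # shape (n_reps, n_characteristics)

# Synthetic example: 20 programs, 2 characteristics, 40 raters whose ratings
# follow hidden weights [2.0, -1.0] plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
raters = X @ np.array([2.0, -1.0]) + 3.0 + rng.normal(scale=0.1, size=(40, 20))
W = resampled_weights(X, raters)
print(W.mean(axis=0), W.std(axis=0))  # central estimate and spread of the inferred weights
```

The spread of the inferred weights across repetitions is what would feed a range of rankings rather than a single number; with real, sparser rating data that spread would typically be much wider than in this clean synthetic case.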

That process differs markedly from the 1995 assessment, which ranked programs based directly on their reputations. "This time around, we were not interested in the reputations per se," explains Ostriker. "Reputations suffer from many flaws, including a halo effect, time lag, and so on."