Four great reasons to stop caring so much about the h-index

You’re surfing the research literature on your lunch break and find an unfamiliar author listed on a great new publication. How do you size them up in a snap?

Google Scholar is an obvious first step. You type their name in, find their profile, and–ah, there it is! Their h-index, right at the top. Now you know their quality as a scholar.

Or do you?

The h-index is an attempt to sum up a scholar in a single number that balances productivity and impact. Anna, our example, has an h-index of 25 because she has 25 papers that have each received at least 25 citations.
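
(If it helps to see that definition in action, here’s a minimal sketch, in Python, of how an h-index is computed from a list of per-paper citation counts. The counts below are invented purely for illustration; real tools pull them from a citation database.)

```python
def h_index(citations):
    """Largest h such that the author has h papers with at least h citations each."""
    # Rank papers from most- to least-cited; the h-index is the last rank
    # at which the citation count still meets or exceeds the rank itself.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Toy profile: five papers cited 10, 8, 5, 4, and 3 times.
# Four papers have at least 4 citations each, but not five with at least 5, so h = 4.
print(h_index([10, 8, 5, 4, 3]))  # -> 4
```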

Today, this number is used for both informal evaluation (like sizing up colleagues) and formal evaluation (like tenure and promotion).

We think that’s a problem.

The h-index is failing on the job, and here’s how:

1. Comparing h-indices is comparing apples and oranges.

Let’s revisit Anna Llobet, our example. Her h-index is 25. Is that good?

Well, “good” depends on several variables. First, what is her field of study? What’s considered “good” in Clinical Medicine (an h-index of 84) is different from what’s considered “good” in Mathematics (19). Some fields simply publish and cite more than others.

Next, how far along is Anna in her career? Junior researchers have an h-index disadvantage: their h-index can only be as high as the number of papers they have published, even if each paper is highly cited. If she is only 9 years into her career, Anna will not have published as many papers as someone who has been in the field for 35 years.

Furthermore, citations take years to accumulate. The consequence is that the h-index doesn’t have much discriminatory power for young scholars, and can’t be used to compare researchers at different stages of their careers. To compare Anna to a more senior researcher would be like comparing apples and oranges.

Did you know that Anna also has more than one h-index? Her h-index (and yours) depends on which database you are looking at, because citation counts differ from database to database. (Which one should she list on her CV? The highest one, of course. :))

2. The h-index ignores science that isn’t shaped like an article.

What if you work in a field that values patents over publications, like chemistry? Sorry, only articles count toward your h-index. Same thing goes for software, blog posts, or other types of “non-traditional” scholarly outputs (and even one you’d consider “traditional”: books).

Similarly, the h-index only counts citations to your work that come from journal articles written by other scholars. It can’t capture whether you’ve had tremendous influence on public policy or on improving global health outcomes. That doesn’t seem smart.

3. A scholar’s impact can’t be summed up with a single number.

We’ve seen from the journal impact factor that single-number impact indicators can encourage lazy evaluation. At the scariest times in your career–when you are going up for tenure or promotion, for instance–do you really want to encourage that? Of course not. You want your evaluators to see all of the ways you’ve made an impact in your field. Your contributions are too many and too varied to be summed up in a single number. Researchers in some fields are rejecting the h-index for this very reason.

So, why judge Anna by her h-index alone?

Questions of completeness aside, the h-index might not measure the right things for your needs. Its particular balance of quantity versus influence can miss the point of what you care about. For some people, that might be a single hit paper, popular with both other scholars and the public. (This article on the “Big Food” industry and its global health effects is a good example.) Others might care more about how often their many, rarely cited papers are used by practitioners (like those by CG Bremner, who studied Barrett’s syndrome, a lesser-known relative of gastroesophageal reflux disease). When evaluating others, the metrics you’re using should get at the root of what you’re trying to understand about their impact.

4. The h-index is dumb when it comes to authorship.

Some physicists are one of a thousand authors on a single paper. Should their fractional authorship weigh as much as your single-author paper? The h-index doesn’t take that into consideration.

What if you are first author on a paper? (Or last author, if that’s how your field indicates lead authorship.) Shouldn’t citations to that paper weigh more for you than they do for your co-authors, since you had a larger influence on the development of that publication?

The h-index doesn’t account for these nuances.

So, how should we use the h-index?

Many have attempted to fix the h-index’s weaknesses with various computational models that, for example, reward highly cited papers, correct for career length, rank authors’ papers against other papers published in the same year and source, or count just the average citations of the most high-impact “core” of an author’s work.
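
(The post doesn’t name these models, but one widely discussed career-length correction is the m-quotient: the h-index divided by the number of years since an author’s first publication. Here’s a minimal sketch in Python; Anna’s numbers come from the example above, while the 35-year veteran and their h-index of 60 are made up for comparison.)

```python
from datetime import date

def m_quotient(h, first_pub_year):
    """h-index divided by academic age (years since first publication):
    a rough correction for career length."""
    career_years = max(1, date.today().year - first_pub_year)
    return h / career_years

# Anna: 9 years into her career with h = 25; a hypothetical veteran: 35 years, h = 60.
# Per career year, Anna comes out ahead despite her lower raw h-index.
print(round(m_quotient(25, date.today().year - 9), 2))   # ~2.78
print(round(m_quotient(60, date.today().year - 35), 2))  # ~1.71
```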

None of these have been widely adopted, and all of them boil down a scientist’s career to a single number that only measures one type of impact.

What we need is more data.

Altmetrics–new measures of how scholarship is recommended, cited, saved, viewed, and discussed online–are just the solution. Altmetrics measure the influence of all of a researcher’s outputs, not just their papers. A variety of new altmetrics tools can help you get a more complete picture of others’ research impact, beyond the h-index. You can also use these tools to communicate your own, more complete impact story to others.

So what should you do when you run into an h-index? Have fun looking if you are curious, but don’t take the h-index too seriously.

19 thoughts on “Four great reasons to stop caring so much about the h-index”

  1. kbradnam says:

    Item number 2 is particularly relevant to those people who work on Model Organism Databases and other genomics portals. Such work involves a year-round effort to curate existing data and add new data, all while keeping databases and websites up and running.

    Infrastructure such as FlyBase, WormBase, Ensembl etc. is central to so much scientific research, but many people working on those resources (I was one such person) will often end up with a single publication a year, in Nucleic Acids Research’s annual ‘database’ issue.

  2. I would agree that there are many problems with the h-index, not least that it encourages scientists to form cabals to increase their h-index; i.e., you can be a co-author on my paper if I’m a co-author on yours. There is anecdotal evidence that this has happened in France, where the h-index has become an important measure of scientific worth. However, I’m not convinced that altmetrics will add anything but noise to the equation. Altmetrics, such as downloads, Tweets and Facebook likes, measure the instantaneous appeal of an article; this will be largely determined by the title, abstract and authors. Furthermore, as we have shown, article-level metrics can be subject to huge stochasticity (Eyre-Walker and Stoletzki 2013 – http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001675); we recently estimated that the number of citations a paper has accumulated 5 years after publication was largely determined by stochastic factors, not the underlying merit of the science. Altmetrics are likely to be no different – for example, if your paper just happens to get Tweeted by someone with lots of followers then you are likely to get a lot of re-Tweets, and this may lead to an increase in the number of downloads, etc. I think altmetrics might submerge us under a mountain of noisy data.

    • Hi Adam, thanks for your thoughts. Agreed there is a lot of noise, but there is also signal… As you said, what gets tweeted isn’t random; it is a measure of something. Sometimes it may well just be the instantaneous appeal of an article, agreed. We think that has its own worth as a thing to measure… science is better off if some of us publish papers that are instantaneously appealing to lots of people 🙂

      That isn’t all there is to it, though; that alone would only be so interesting. There is also evidence of other clusters of activity: papers with low tweet and citation counts but high Mendeley readership numbers, for example, maybe because the papers are good for teaching or as a reference for a method. We did some exploratory analysis that supports this idea in the paper here: http://arxiv.org/abs/1203.4745 (and a broader-appeal blog post here: http://researchremix.wordpress.com/2012/01/31/31-flavours/). More research is needed to see what these clusters actually are and how we can use them to tease out the different flavours of scholarship.

    • That is why what we need, even more than just more data, is better data. All reference managers should report their statistics (added, read, annotated, cited?). I’m a big Zotero fan, but if they don’t start doing this soon I will switch to Mendeley, even though I’m not a big fan of their owners. Next, I wish SHERPA/RoMEO would include policies on sharing a journal’s stats (readership, downloads, views, number of comments, …) and journals’ peer-review policies (open, semi-open, or closed/anonymous). These things would all be a great improvement to science metrics. Even better would be to get rid of “authorship in a line” altogether and use more specific author contribution descriptions, maybe some kind of preselected tags that can be used to specify who did what. I’m sure some innovative journal or service could come up with a way to do this eventually.

  3. Prof. C. F. Desai says:

    Only fools try to quantify quality. These two are distinct entities. People have been trying to invent newer and newer methods to assess quality because they do not know what ‘quality’ means. Quality speaks for itself; you don’t have to strive to somehow prove how much better one scientist’s quality is than others’. If the ‘assessment officer’ is of good quality, he/she can judge the quality of the assessees.

  4. The h-index doesn’t make any sense. Although there are many examples of this, only a few are given here. There is one scientist with an h-index of —- and —– citations even at the beginning of her career. But she hasn’t written a single research paper in her life. Because she got membership in many CERN groups around the world, her name appears among the thousand names given at the end of many research papers. She is not even a coauthor of any of those papers. Just because CERN members publish papers in so-called US or European journals, the Google Scholar search robot finds all those papers. Most of the CERN members got a higher h-index and a higher number of citations like that.
    On the other hand, there are some researchers who have written 70 to 100 papers. But they have an h-index below 10 and fewer citations, just because the Google search robot can’t even find many good journals. The Google search robot easily finds US journals, because it thinks that US journals are reputed journals. When I was doing my Ph.D. in the early nineties, I read several research papers. I found one good paper with original data on ferrite thin films published by some Japanese scientists in a Japanese journal. A few years after that, I found that some US researchers had deposited the same material on the same substrate using the same techniques. But the data from the US researchers were worse than the data published by the Japanese researchers, and yet the US researchers published their worse data, even a year later, in the US Journal of Applied Physics. So how can someone say that US journals are the best?
