How big does our text-mining training set need to be?

We got some great feedback from reviewers of our new Sloan grant, including a suggestion that we be more transparent about our process over the course of the grant. We love that idea, and you’re now reading part of our plan for how to do that: we’re going to be blogging a lot more about what we learn as we go.

A big part of the grant is using machine learning to automatically discover mentions of software use in the research literature. It’s going to be a really fun project because we’ll get to play around with some of the very latest in ML, which is currently The Hotness everywhere you look. And we’re learning a lot as we go. One of the first questions we’ve tackled (also in response to some good reviewer feedback) is: how big does our training set need to be? The machine learning system needs to be trained to recognize software mentions, and to do that we need to give it a set of annotated papers where we, as humans, have marked what a software mention looks like (and doesn’t look like). That training set is called the gold standard. It’s what the machine learning system learns from. Below is copied from one of our reviewer responses:

We came up with the number of articles to annotate through a combination of theory, experience, and intuition.  As usual in machine learning tasks, we considered the following aspects of the task at hand:

  • prevalence: the number of software mentions we expect in each article
  • task complexity: how much do software-mention words look like other words we don’t want to detect
  • number of features: how many different clues will we give our algorithm to help it decide whether each word is a software mention (eg is it a noun, is it in the Acknowledgements section, is it a mix of uppercase and lowercase, etc)
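To make the "clues" idea concrete, here is a minimal sketch of what per-word features could look like. The function name, the feature names, and the inputs are all hypothetical illustrations built from the examples in the list above; they are not the project's actual feature set.

```python
def token_features(token, section, is_noun):
    """Return a few illustrative clues for one word.

    All names here are assumptions made for this sketch. `is_noun` would
    come from a part-of-speech tagger, and `section` from the article's
    structure; both are simply passed in to keep the example self-contained.
    """
    has_upper = any(c.isupper() for c in token)
    has_lower = any(c.islower() for c in token)
    return {
        "is_noun": is_noun,
        "in_acknowledgements": section.lower() == "acknowledgements",
        "mixed_case": has_upper and has_lower,
    }


features = token_features("ImageJ", "Methods", True)
```

Each word in an article would get a dictionary like this, and the learning algorithm would weigh the clues against the annotated examples.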

None of these aspects are clearly understood for this task at this point (one outcome of the proposed project is that we will understand them better once we are done, for future work), but we do have rough estimates.  Software mention prevalence will be different in each domain, but we expect roughly 3 mentions per paper, very roughly, based on previous work by Howison et al. and others.  Our estimate is that the task is moderately complex, based on the moderate f-measures achieved by Pan et al. and Duck et al. with hand-crafted rules.  Finally, we are planning to give our machine learning algorithm about 100 features (50 automatically discovered/generated by word2vec, plus 50 standard and rule-based features, as we discuss in the full proposal).

We then used these estimates.  As is common in machine learning sample size estimation, we started by applying a rule-of-thumb for the number of articles we’d have to annotate if we were to use the most simple algorithm, a multiple linear regression.  A standard rule of thumb (see https://en.wikiversity.org/wiki/Multiple_linear_regression#Sample_size) is 10-20 datapoints are needed for each feature used by the algorithm, which implies we’d need 100 features * 10 datapoints = 1000 datapoints.  At 3 datapoints (software mentions) per article, this rule of thumb suggests we’d need 333 articles per domain.  
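The arithmetic behind that rule of thumb is short enough to write down. This is only a sketch of the back-of-envelope calculation described above; the function and parameter names are ours for illustration.

```python
def articles_needed(n_features, datapoints_per_feature, mentions_per_article):
    """Rule-of-thumb sample size: datapoints needed, and the articles that implies.

    Returns (total datapoints, articles to annotate). The article count is
    left unrounded so the reader can decide how to round it.
    """
    datapoints = n_features * datapoints_per_feature
    return datapoints, datapoints / mentions_per_article


# Low end of the 10-20 rule of thumb, with roughly 3 mentions per paper.
low_datapoints, low_articles = articles_needed(100, 10, 3)

# High end of the same rule of thumb.
high_datapoints, high_articles = articles_needed(100, 20, 3)
```

With 10 datapoints per feature this gives 1000 datapoints and about 333 articles, matching the estimate above; with 20 per feature the same calculation gives 2000 datapoints and about 667 articles.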

From there we modified our estimate based on our specific machine learning circumstance.  Conditional Random Fields (our intended algorithm) is a more complex algorithm than multiple linear regression, which might suggest we’d need more than 333 articles.  On the other hand, our algorithm will also use “negative” datapoints inherent in the article (all the words in the article that are *not* software mentions, annotated implicitly as not software mentions) to help learn information about what is predictive of being vs not being a software mention — the inclusion of this kind of data for this task means our estimate of 333 articles is probably conservative and safe.
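Here is a small sketch of what "annotated implicitly" means in practice: the annotator marks only the software mentions, and every other word becomes a negative example for free. The label names and the example sentence are assumptions for illustration, not the project's actual annotation scheme.

```python
def label_tokens(tokens, mention_indices):
    """Label every token: marked mentions are positive, the rest are negative.

    `mention_indices` is the set of token positions a human annotator marked
    as software mentions. "SOFTWARE" and "O" are hypothetical label names.
    """
    return [
        "SOFTWARE" if i in mention_indices else "O"
        for i in range(len(tokens))
    ]


tokens = "We fit the model using lme4 in R".split()
labels = label_tokens(tokens, {5, 7})  # the annotator marked "lme4" and "R"
```

In this sentence two marked mentions yield six implicit negatives, which is why each annotated article carries far more training signal than its mention count alone suggests.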

Based on this, as well as reviewing the literature for others who have done similar work (Pan et al. used a gold standard of 386 papers to learn their rules, Duck et al. used 1479 database and software mentions to train their rule weighting, etc), we determined that 300-500 articles per domain was appropriate. We also plan to experiment with combining the domains into one general model — in this approach, the domain would be added as an additional feature, which may prove more powerful overall. This would bring all 1000-1500 articles to the test set.

Finally, before proposing 300-500 articles per domain, we did a gut-check on whether the proposed annotation burden was a reasonable amount of work and cost for the value of the task, and we felt it was.

References

Duck, G., Nenadic, G., Filannino, M., Brass, A., Robertson, D. L., & Stevens, R. (2016). A Survey of Bioinformatics Database and Software Usage through Mining the Literature. PLOS ONE, 11(6), e0157989. http://doi.org/10.1371/journal.pone.0157989

Howison, J., & Bullard, J. (2015). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology (JASIST), Article first published online: 13 MAY 2015. http://doi.org/10.1002/asi.23538

Pan, X., Yan, E., Wang, Q., & Hua, W. (2015). Assessing the impact of software on science: A bootstrapped learning of software entities in full-text papers. Journal of Informetrics, 9(4), 860–871. http://doi.org/10.1016/j.joi.2015.07.012

Comparing Sci-Hub and oaDOI

Nature writer Richard Van Noorden recently asked us for our thoughts about Sci-Hub, since in many ways it’s quite similar to our newest project, oaDOI. We love the idea of comparing the two, and thought he had (as usual) good questions. His recent piece on Sci-Hub founder Alexandra Elbakyan quotes some of our responses to him; we’re sharing the rest below:

Like many OA advocates, we see lots to admire in Sci-Hub.

First, of course, Sci-Hub is making actual science available to actual people who otherwise couldn’t read it. Whatever else you can say about it, that is a Good Thing.

Second, SciHub helps illustrate the power of universal OA. Imagine a world where when you wanted to read science, you just…did? Sci-Hub gives us a glimpse of what that will look like, when universal, legal OA becomes a reality. And that glimpse is powerful, a picture that’s worth a thousand words.

Finally, we suspect and hope that SciHub is currently filling toll-access publishers with roaring, existential panic. Because in many cases that’s the only thing that’s going to make them actually do the right thing and move to OA models.

All this said, SciHub is not the future of scholarly communication, and I think you’d be hard pressed to find anyone who thinks it is. The future is universal open access.

And it’s not going to happen tomorrow. But it is going to happen. And we built oaDOI to be a step along that path. While we don’t have the same coverage as SciHub, we are sustainable and built to grow, along with the growing percentage of articles that have open access versions. And as you point out, we offer a simple, straightforward way to get fulltext.

That interface was not exactly inspired by SciHub, but rather I think an example of convergent evolution. The current workflow for getting scholarly articles is, in many cases, absolutely insane. Of course this is the legacy of a publishing system that is built on preventing people from reading scholarship, rather than helping them read it. It doesn’t have to be this hard. Our goal at oaDOI is to make it less miserable to find and read science, and in that we’re quite similar to SciHub. We just think we’re doing it in a way that’s more powerful and sustainable over the long term.

Collaborating on a $635k grant to improve credit for research software

We’re thrilled to announce Impactstory will be collaborating with James Howison at the University of Texas-Austin on a project to improve research software by helping its creators get proper credit for their work. The project will be funded by a three-year, $635k grant from the Alfred P. Sloan Foundation.

Research software is an essential component of modern science. But the tradition-bound scholarly credit system does not appropriately reward the academic unsung heroes who create research software, putting further development of software-intensive science in jeopardy. Even when software is mentioned, the mentions are often informal, such as URLs in footnotes or just names in text. Howison, working with doctoral student Julia Bullard, found that 63% of mentions in a random sample of 90 biology articles were informal (Howison and Bullard, 2014).

We’re going to help fix that.

We’ll be working with James and his lab to make a huge database of every research software project used in every paper in the biomedicine, astronomy, and economics literatures. This database will be filled in using a deep learning system that’ll automatically extract both formal and informal mentions of software, after being trained on a large, manually-coded gold standard dataset.

We’ll use this database to build and study three cool prototype tools:

  • CiteSuggest will analyze submitted text or code and make recommendations for normalized citations using the software author’s preferred citation,
  • CiteMeAs will help software producers make clear requests for their preferred citations, and
  • Software Impactstory will help software authors demonstrate the scholarly impact of their software in the literature.

We believe these tools will help transform the scholarly reward system into one where software is a first-class research product, and its authors get full academic credit for their work. This in turn will support the software-intensive open science system we need for the future.

The project will build on our experience creating Depsy, a platform to track the scholarly impact of Python and R packages with an emphasis on dependencies, and on James’ extensive experience researching development in open source software and software in science. For lots more detail on the whole thing, check out the submitted proposal (edit Nov 9, 2016:  note this document is not a complete representation of the proposal, since the application and approval process also involved confidential back and forth with reviewers.  The reviewers added great comments and insight that we’re incorporating into the work as we go forward.)

Thank you, Sloan.  Thanks to Program Director Josh Greenberg for his continued advice and encouragement, and to the grant reviewers for well-informed and helpful feedback. And thanks especially to James, who had this idea in the first place, brought us on board, and has been a patient, good-natured, and ingenious collaborator in a lot of hard work already. We can’t wait to get started!

What’s your #OAscore?

We’re all obsessed with self-measurement.

We measure how much we’re Liked online. We measure how many steps we take in a day. And as academics, we measure our success using publication counts, h-indices, and even Impact Factors.

But we’re missing something.

As academics, our fundamental job is not to amass citations, but to increase the collective wisdom of our species. It’s an important job. Maybe even a sacred one. It matters. And it’s one we profoundly fail at when we lock our work behind paywalls.

Given this, there’s a measurement that must outweigh all the others we use (and misuse) as researchers: how much of our work can be read?

This Open Access Week, we’re rolling out this measurement on Impactstory. It’s a simple number: what percentage of your work is free to read online? We’d argue that it’s perhaps the most important number associated with your professional life (unless maybe it’s the percentage of your work published with a robust license that allows reuse beyond reading…we’re calculating that too). We’re calling it your Open Access Score.
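The number itself is simple to compute. Here is a minimal sketch, assuming each work is a record with a hypothetical `is_free_to_read` flag; it illustrates the percentage described above and is not Impactstory's actual implementation.

```python
def open_access_score(works):
    """Percentage of works that are free to read online.

    `works` is a list of dicts with a hypothetical boolean field
    "is_free_to_read". An empty list scores 0.0 by convention here.
    """
    if not works:
        return 0.0
    free = sum(1 for work in works if work["is_free_to_read"])
    return 100.0 * free / len(works)


my_works = [
    {"title": "Paper A", "is_free_to_read": True},
    {"title": "Paper B", "is_free_to_read": False},
    {"title": "Dataset C", "is_free_to_read": True},
    {"title": "Paper D", "is_free_to_read": True},
]
score = open_access_score(my_works)  # three of four works are free to read
```

Depositing "Paper B" in a repository would move this example from 75 to 100.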

We’d like to issue a challenge to every researcher: find out your open access score, do one thing to raise it, and tell someone you did. It takes ten minutes, and it’s a concrete thing you can do to be proud of yourself as a scholar.

Here’s how to do it:

  1. Make an Impactstory profile. You’ll need a Twitter account and nothing more…it’s free, nonprofit, and takes less than five minutes. Plus along the way you’ll learn cool stuff about how often your research has been tweeted, blogged, and discussed online.
  2. Deposit just one of your papers into an Open Access repository. Again: it’s easy. Here are instructions.
  3. Once you’re done, update your Impactstory, and see your improved score.
  4. Tweet it. Let your community know you’ve made the world a richer, more beautiful place because you’ve increased the knowledge available to humanity. Just like that. Let’s spread that idea.

Measurement is controversial. It has pros and cons. But when you’re measuring the right things, it can be incredibly powerful. This OA Week, join us in measuring the right things. Find your #OAscore, make it better, tweet it out. If we’re going to measure steps, let’s make them steps that matter.

 

Crossposted on the Open Access Week blog.

Why researchers are loving the new Impactstory

We put our heart and soul into the new Impactstory and have been on pins and needles to hear what you think.  Well it’s been a week and the verdict is in — we’re hearing that the new version is awesome, fantastic, and truly excellent, a home run and must-have–an academic profile that’s exciting and relevant.

And so much more. So much more, in fact, that we wanted to take a little break from the frenzied responding, bugfixing, and feature-launching we’ve been doing this week and summarize a bit of what we’ve heard.

What do you like?

A lot of users have appreciated that it now takes seconds and is super easy to set up a profile that’s blazing fast and smooth to use: it’s instant insights about your research.

Unlike speed, beauty is in the eye of the beholder–but our beholders seem delightfully agreed that our new look is great, great, great.  Whether users are calling it fresh or beautifully crafted, or sleek or smooth or snazzy, everyone seems to agree that the new version looks awesome, it looks pretty damn awesome. And we are pretty thrilled to hear that.

They’re enjoying that it’s got some fun 🙂 And, we’re not surprised to hear that people like the new price point of Free, making it easier to recommend to others.  

What’s it good for?

Impactstory helps researchers find impacts of their work beyond just citations. People have found mentions they didn’t know about on Wikipedia, discussion in cool blog posts, and reviews on Faculty of 1000. And not just numbers, but impact across the globe. Not just numbers but connecting with people: for instance user Peter van Heusden tweeted, “Using @Impactstory I discovered someone who is consistently promoting work I’m involved in, but who I had no idea existed!”

All this amounts to more than just a lovely ego boost (although it’s that too!). People are telling us that it’s motivating them to adopt more Open Science practices like uploading research slides to a proper repository, getting an ORCID, adding works to their ORCID profile, and celebrating their non-paper publications.

How are you using it?

People are already sending their Impactstory profiles to their funders, and their funders are loving them.  Researchers have added their new profile to their CV, and are planning on using Impactstory data to define innovative ‘pathway to impact’ for UK grants and in tenure and promotion packets.

Folks are including it in workshops.  And even better, they’re building things with our open data! Check out the ferret.io plugin: it rolled out Impactstory support this week and it’s really cool 🙂

What have we been doing?

We’ve made a bunch of changes this week in response to your feedback:

  • imports all your publications, not just DOIs.  Everything on your ORCID profile now displays in your Impactstory profile, and we’re working on getting more openness and altmetrics data
  • twitter integration
    • connecting twitter updates your profile pic so you don’t have to fight with gravatar
    • you don’t have to enter email manually–even faster signup
    • we’ll be using your twitter feed for achievements in the future
  • there’s a new Open Sesame achievement
  • we changed the scores at the top of the profile beside your picture; they are now counts of your achievements
  • the achievements and the import process are better documented
  • we rolled out dozens of smaller features, usability enhancements, and bugfixes.

What’s next?

We’re on our way to the FORCE16 conference this week.  We’ll be rolling the feedback from the conference along with your continued feedback into continued improvement to the app.

And you?  Join in with everyone showing off their profile, spread the word (this is how we will grow), and if you don’t have a profile, get one, and tell us what you think!

Finally, thanks.

Finally, we’d like to thank the hundreds of passionate people who have helped us with money and with moral support along the way, from our early days till now. It’s safe to say the new Impactstory is a big hit.  It’s our hit, together.

 

The new Impactstory: Better. Freer.

We are releasing a new version of Impactstory!

https://impactstory.org/u/0000-0001-6728-7745

We baked what we’ve learned from hundreds of conversations with researchers into a sleeker, leaner, more useful Impactstory.

Our new Achievements showcase your meaningful accomplishments, not just counts. Our new three-part score helps you track your buzz, engagement, and openness. And next-generation notification emails are improved to tell you what you want to know reliably every week.

And of course we’ve got a slew of other new features as well, including Depsy integration, ORCID sync-on-demand, and full support for mobile.

What’s more, we’re simplifying and streamlining everywhere, eliminating little-used features and doubling down on what users have told us they love. Profile creation is now only via ORCID, we only deal in DOIs, and citation metrics are gone. As a result, creating a profile takes just seconds, our support for diverse research products (preprints, datasets, etc) is bulletproof, and metrics are now consistently clear and up-to-date. Along with a complete code rewrite, these changes make Impactstory faster and more reliable than it’s ever been.

Last but not least, not only are we making Impactstory better: we’re making it cheaper. As in, all the way cheaper. Free!

Why? We heard you love the idea, but not the price–largely because your disciplines or departments aren’t quite ready to use altmetrics for evaluation. We can see this is starting to change, and want to help that change happen as quickly as possible. That means letting as many researchers as possible engage with altmetrics, right now. Free helps that happen.

Alternative sustainability models (like freemium features and new grants) will allow us to continue to build and maintain tools like Impactstory and Depsy to help change how researchers think about understanding and measuring the influence of their work.

Sound good? It is. We think you’ll love it. Go make yourself a profile and see what you learn: https://impactstory.org (and if you’re a current impactstory subscriber check your email for migration details).

We think this new Impactstory is the best thing we’ve ever done, and it’s a big step towards creating the open science, altmetrics-powered future we believe in. Thanks for building that future with us. We’re looking forward to hearing what you think!

Let’s value the software that powers science: Introducing Depsy

Today we’re proud to officially launch Depsy, an open-source webapp that tracks research software impact.

We made Depsy to solve a problem:  in modern science, research software is often as important as traditional research papers–but it’s not treated that way when it comes to funding and tenure. There, the traditional publish-or-perish, show-me-the-Impact-Factor system still rules.

We need to fix that. We need to provide meaningful incentives for the scientist-developers who make important research software, so that we can keep doing important, software-driven science.

Lots of things have to happen to support this change. Depsy is a shot at making one of those things happen: a system that tracks the impact of software in software-native ways.

That means not just counting up citations to a hastily-written paper about the software, but actual mentions of the software itself in the literature. It means looking at how software gets reused by other software, even when it’s not cited at all. And it means understanding the full complexity of software authorship, where one project can involve hundreds of contributors in multiple roles that don’t map to traditional paper authorship.

Ok, this sounds great, but how about some specifics. Check out these examples:

  • GDAL is a geoscience library. Depsy finds this cool NASA-funded ice map paper that mentions GDAL without formally citing it. Also check out key author Even Rouault: the project commit history demonstrates he deserves 27% credit for GDAL, even though he’s overlooked in more traditional credit systems.
  • lubridate improves date handling for R. It’s not highly-cited, but we can see it’s making a different kind of impact: it’s got a very high dependency PageRank, because it’s reused by over 1000 different R projects on GitHub and CRAN.
  • BradleyTerry2 implements a probability technique in R. It’s only directly reused by 8 projects—but Depsy shows that one of those projects is itself highly reused, leading to huge indirect impacts. This indirect reuse gives BradleyTerry2 a very high dependency PageRank score, even though its direct reuse is small, and that makes for a better reflection of real-world impact.
  • Michael Droettboom makes small (under 20%) contributions to other people’s research software, contributions easy to overlook. But the contributions are meaningful, and they’re to high-impact projects, so in Depsy’s transitive credit system he ends up as a highly-ranked contributor. Depsy can help unsung heroes like Michael get rewarded.
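To show how indirect reuse can outweigh direct reuse, as in the BradleyTerry2 example, here is a minimal sketch of PageRank run over a toy dependency graph. This is a textbook power-iteration PageRank, not Depsy's actual code or data; the project names, the damping value, and the graph are all made up for illustration.

```python
def dependency_pagerank(depends_on, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dependency graph.

    `depends_on` maps each project to the list of projects it depends on,
    so rank flows from a project to its dependencies. Projects with no
    dependencies spread their rank evenly over every project.
    """
    nodes = set(depends_on)
    for deps in depends_on.values():
        nodes.update(deps)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            deps = depends_on.get(node, [])
            if deps:
                share = damping * rank[node] / len(deps)
                for dep in deps:
                    new_rank[dep] += share
            else:
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "small_lib" has one direct user, but that user is popular.
# "other_lib" has two direct users, neither of which is reused by anything.
graph = {
    "popular_lib": ["small_lib"],
    "app1": ["popular_lib"],
    "app2": ["popular_lib"],
    "app3": ["popular_lib"],
    "app4": ["popular_lib"],
    "app5": ["popular_lib"],
    "app6": ["other_lib"],
    "app7": ["other_lib"],
}
ranks = dependency_pagerank(graph)
```

In this toy graph `small_lib` ends up ranked above `other_lib` despite having fewer direct users, because the rank that `popular_lib` collects flows on to the library it depends on.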
     

Depsy doesn’t do a perfect job of finding citations, tracking dependencies, or crediting authors (see our in-progress paper for more details on limitations). It’s not supposed to. Instead, Depsy is a proof-of-concept to show that we can do them at all. The data and tools are there. We can measure and reward software impact, like we measure and reward the impact of papers.

Embed impact badges in your GitHub README

Given that, it’s not a question of if research software becomes a first-class scientific product, but when and how. Let’s start having the conversations about when and how (here are some great places for that). Let’s improve Depsy, let’s build systems better than Depsy, and let’s (most importantly) start building the cultural and political structures that can use these systems.

For lots more details about Depsy, check out the paper we’re writing (and contribute!), and of course Depsy itself. We’re still in the early stages of this project, and we’re excited to hear your feedback: hit us up on twitter, in the comments below, or in the Hacker News thread about this post.

Depsy is made possible by a grant from the National Science Foundation.
edit nov 15 2015: change embed image to match new badge

Better than a free Ferrari: Why the coming altmetrics revolution needs librarians

This post was originally published as the foreword to Meaningful Metrics: A 21st Century Librarian’s Guide to Bibliometrics, Altmetrics, and Research Impact [paywall, embargoed for 6mo]. It’s also persistently archived on figshare.

A few days ago, we were speaking with an ecologist from Simon Fraser University here in Vancouver, about an unsolicited job offer he’d recently received. The offer included an astonishing inducement: anyone from his to-be-created lab who could wangle a first or corresponding authorship of a Nature paper would receive a bonus of one hundred thousand dollars.

Are we seriously this obsessed with a single journal? Who does this benefit? (Not to mention, one imagines the unfortunate middle authors of such a paper, trudging to a rainy bus stop as their endian-authoring colleagues roar by in jewel-encrusted Ferraris.)  Although it’s an extreme case, it’s sadly not an isolated one. Across the world, A Certain Kind of administrator is doubling down on 20th-century, journal-centric metrics like the Impact Factor.

That’s particularly bad timing, because our research communication system is just beginning a transition to 21st-century communication tools and norms. We’re increasingly moving beyond the homogeneous, journal-based system that defined 20th century scholarship.

Today’s scholars increasingly disseminate web-native scholarship. For instance, Jason’s 2008 tweet coining the term “altmetrics” is now more cited than some of his peer-reviewed papers. Heather’s openly published datasets have gone on to fuel new articles written by other researchers. And like a growing number of other researchers, we’ve published research code, slides, videos, blog posts, and figures that have been viewed, reused, and built upon by thousands all over the world. Where we do publish traditional journal papers, we increasingly care about broader impacts, like citation in Wikipedia, bookmarking in reference managers, press coverage, blog mentions, and more. You know what’s not capturing any of this? The Impact Factor.

Many researchers and tenure committees are hungry for alternatives, for broader, more diverse, more nuanced metrics. Altmetrics are in high demand; we see examples at Impactstory (our altmetrics-focused non-profit) all the time. Many faculty share how they are including downloads, views, and other alternative metrics in their tenure and promotion dossiers, and how evaluators have enthused over these numbers. There’s tremendous drive from researchers to support us as a nonprofit, from faculty offering to pay hundreds of extra dollars for profiles, to a Senegalese postdoc refusing to accept a fee waiver. Other altmetrics startups like Plum Analytics and Altmetric.com can tell you similar stories.

At higher levels, forward-thinking policy makers and funders are also seeing the value of 21st-century impact metrics, and are keen to realize their full potential. We’ve been asked to present on 21st-century metrics at the NIH, NSF, the White House, and more. It’s not these folks who are driving the Impact Factor obsession; on the contrary, we find that many high-level policy-makers are deeply disappointed with 20th-century metrics as we’ve come to use them. They know there’s a better way.

But many working scholars and university administrators are wary of the growing momentum behind next-generation metrics. Researchers and administrators off the cutting edge are ill-informed, uncertain, afraid. They worry new metrics represent Taylorism, a loss of rigor, a loss of meaning. This is particularly true among the majority of faculty who are less comfortable with online and web-native environments and products. But even researchers who are excited about the emerging future of altmetrics and web-native scholarship have a lot of questions. It’s a new world out there, and one that most researchers are not well trained to negotiate.

We believe librarians are uniquely qualified to help. Academic librarians know the lay of the land, they keep up-to-date with research, and they’re experienced providing leadership to scholars and decision-makers on campus. That’s why we’re excited that Robin and Rachel have put this book together. To be most effective, librarians need to be familiar with the metrics research, which is currently advancing at breakneck speed. And they need to be familiar with the state of practice–not just now, but what’s coming down the pike over the next few years. This book, with its focus on integrating research with practical tips, gives librarians the tools they need.

It’s an intoxicating time to be involved in scholarly communication. We’ve begun to see the profound effect of the Web here, but we’re just at the beginning. Scholarship is on the brink of a Cambrian explosion, a breakneck flourishing of new scholarly products, norms, and audiences. In this new world, research metrics can be adaptive, subtle, multi-dimensional, responsible. We can leave the fatuous, ignorant use of Impact Factors and other misapplied metrics behind us. Forward-thinking librarians have an opportunity to help shape these changes, to take their place at the vanguard of the web-native scholarship revolution. We can make a better scholarship system, together. We think that’s even better than that free Ferrari.

Why Nature’s “SciShare” experiment is bad for altmetrics

Early last week, Nature Publishing Group announced that 49 titles on Nature.com will be made free to read for the next year. They’re calling this experiment “SciShare” on social media; we’ll use the term as a shorthand for their initiative throughout this post.

Some have credited Nature for their incremental step towards embracing Open Access. Other scientists criticise the company for diluting true Open Access and encouraging scientists to share DRM-crippled PDFs.

As staunch Open Access advocates ourselves, we agree with our board member John Wilbanks: this ain’t OA. “Open” means open to anyone, including laypeople searching Google, who don’t have access to Nature’s Magic URL. “Open” also means open for all types of reuse, including tools to mine and build next-generation value from the scholarly literature.

But there’s another interesting angle here, beyond the OA issue: this move has real implications for the altmetrics landscape. Since we live and breathe altmetrics here at Impactstory, we thought it’d be a great time to raise some of these issues.

Some smart people have asked, “Is SciShare an attempt by Nature to ‘game’ their altmetrics?” That is, is SciShare an attempt to force readers to view content on Nature.com, thereby increasing total pageview statistics for the company and their authors?

Postdoc Ross Mounce explains:

If [SciShare] converts some dark social sharing of PDFs into public, trackable, traceable sharing of research via non-dark social means (e.g. Twitter, Facebook, Google+ …) this will increase the altmetrics of Nature relative to other journals and that may in-turn be something that benefits Altmetric.com [a company in which Macmillan, Nature’s parent company, is an investor].

No matter Nature’s motivations, SciShare, as it’s implemented now, will have some unexpected negative effects on researchers’ ability to track altmetrics for their work. Below, we describe why, and point to some ways that Nature could improve their SciShare technology to better meet researchers’ needs.

How SciShare works

SciShare is powered by ReadCube, a reference manager and article rental platform that’s funded by Macmillan via their science start-up investment imprint, Digital Science.

Researchers with subscription access to an article on Nature.com copy and paste a special, shortened URL (e.g. http://rdcu.be/bKwJ) into email, Twitter, or anywhere else on the Web.

Readers who click on the link are directed to a version of the article that they can freely read and annotate in their browser, thanks to ReadCube. Readers cannot download, print, or copy from the ReadCube PDF.

The ReadCube-shortened URL resolves to a Nature-branded, hashed URL that looks like this:

[Screenshot: the resolved, Nature-branded hashed URL]

The resolved URL doesn’t include a DOI or other permanent identifier.
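To illustrate what a missing identifier means for anyone trying to match shared links to articles, here is a minimal sketch that looks for a DOI inside a URL. The regular expression is a commonly used approximation of DOI syntax rather than a complete specification, and the function name is ours; it does not describe how any particular altmetrics provider works.

```python
import re

# Approximate pattern for modern DOIs: "10.", a registrant code, "/", a suffix.
# This is an assumption for the sketch and will miss some unusual DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_doi(url):
    """Return the first DOI-like string found in `url`, or None if there is none."""
    match = DOI_PATTERN.search(url)
    return match.group(0) if match else None


doi_link = extract_doi("http://doi.org/10.1371/journal.pone.0157989")
short_link = extract_doi("http://rdcu.be/bKwJ")
```

The first call finds `10.1371/journal.pone.0157989`, while the shortened link from the example above yields `None`: there is nothing in the URL itself to tie it back to a specific article.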

In the ReadCube interface, users who click on the “Share” icon see a panel that includes a summary of Altmetric.com powered altmetrics (seen here in the lower left corner of the screen):

[Screenshot: the ReadCube “Share” panel, with the Altmetric.com summary in the lower left corner]

The ReadCube-based Altmetric.com metrics do not include pageview numbers. Because ReadCube doesn’t work with assistive technology like screen readers, it also rules out tracking the small portion of traffic that visually impaired readers might account for.

That said, the potential for tracking new, ReadCube-powered metrics is interesting. ReadCube allows annotations and highlighting of content, and could potentially report both raw numbers and also describe the contents of the annotations themselves.

The number of redirects from the ReadCube-branded, shortened URLs could also be illuminating, especially when reported alongside direct traffic to the Nature.com-hosted version of the article. (Such numbers could provide hard evidence as to the proportion of OA vs toll access use of Nature journal articles.) And sources of Web traffic give a lot of context to the raw pageview numbers, as we’ve seen from publishers like PeerJ:

[Screenshot: Web traffic sources as reported by PeerJ]

After all, referrals from Reddit usually mean something very different than referrals from PubMed.

Digital Science’s Timo Hannay hints that Nature will eventually report download metrics for their authors. There’s no indication as to whether Nature intends to disclose any of the potential altmetrics described above, however.

So, now that we know how SciShare works and the basics of how they’ve integrated altmetrics, let’s talk about the bigger picture. What does SciShare mean for researchers’ altmetrics?

How will SciShare affect researchers’ altmetrics?

Let’s start with the good stuff.

Nature authors will probably reap a big benefit thanks to SciShare: they’ll likely have higher pageview counts for the Nature.com-hosted version of their articles.

Another positive aspect of SciShare is that it provides easy access to Altmetric.com data. That’s a big win in a world where not all researchers are aware of altmetrics. Thanks to ReadCube’s integration of Altmetric.com, now more researchers can find their article’s impact metrics. (We’re also pleased that Altmetric.com will get a boost in visibility. We’re big fans of their platform, as well as customers–Impactstory’s Twitter data comes from Altmetric.com).

SciShare’s also been implemented in such a way that the ReadCube DRM technology doesn’t affect researchers’ ability to bookmark SciShare’d articles on reference managers like Mendeley. Quick tests of the Pocket and Delicious bookmarking services also seem to work well. That means that social bookmarking counts for an author’s work will likely not decline. (I point this out because when I attempted to bookmark a ReadCube.com-hosted article using my Mendeley browser bookmarklet on Thursday, Dec. 4th, I was prevented from doing so, and actually redirected to a ReadCube advertisement. I’m glad to say this no longer seems to be true.)

Those are the good things. But there are also a few issues to be concerned about.

SciShare makes your research metrics harder to track

The premise of SciShare is that you’ll no longer copy and paste an article’s URL when sharing content. Instead, they encourage you to share the ReadCube-shortened URL. That can be a problem.

In general, URLs are difficult to track: they contain weird characters that sometimes break altmetrics aggregators’ search systems, and they go dead often. In fact, there’s no guarantee that these links will be live past the next 12 months, when the SciShare pilot is set to end.

Moreover, neither the ReadCube URL nor the long, hashed, Nature.com-hosted URL that it resolves to contains the article’s DOI. DOIs are one of the main ways that altmetrics tracking services like ours at Impactstory can find mentions of your work online. They’re also preferable to use when sharing links because they’ll always resolve to the right place.

So what SciShare essentially does is introduce two new messy URLs that will be shared online, and that have a high likelihood of breaking in the future. That means there’s a bigger potential for messier data to appear in altmetrics reports.
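To make the DOI point concrete, here is a minimal Python sketch of how a tracking service might look for a DOI inside a shared link. It is an illustration under our own assumptions, not a description of how Impactstory or any other aggregator actually works: the pattern is a common heuristic rather than a complete DOI grammar, the function name find_doi is ours, and the DOI 10.1234/example.5678 is a made-up placeholder.

```python
import re
from urllib.parse import unquote

# Heuristic only: "10.", a 4-9 digit registrant prefix, a slash, then a suffix.
# Real DOI suffixes can be more varied than this pattern allows.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>?#]+")


def find_doi(url):
    """Return the first DOI-looking string in a URL, or None if there is none."""
    # Decode percent-escapes first, so "10.1234%2Fexample" is still recognized.
    match = DOI_PATTERN.search(unquote(url))
    if match is None:
        return None
    # Trim punctuation that often trails a link pasted into running text.
    return match.group(0).rstrip(".,;")


# A DOI-based link (placeholder DOI) carries its identifier with it.
print(find_doi("http://dx.doi.org/10.1234/example.5678"))  # → 10.1234/example.5678

# The SciShare short link from this post has nothing to match on.
print(find_doi("http://rdcu.be/bKwJ"))  # → None
```

The short link comes back empty because there is simply no DOI in it to find, so a tracker would have to follow the redirect and hope the landing page exposes an identifier some other way.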

SciShare’s metrics aren’t as detailed as they could be

The Altmetric.com-powered altmetrics that ReadCube exposes are fantastic, but they lack two important metrics that other data providers expose: citations and pageviews.

On a standard article page on Nature.com, there’s an Article Metrics tab. The Metrics page includes data not only from Altmetric.com, but also CrossRef, Web of Science, and Scopus’s citation counts, and also pageview counts. And on completely separate systems like Impactstory.org and PlumX, still more citation data is exposed, sourced from Wikipedia and PubMed. (We’d provide pageview data if we could. But that’s currently not possible. More on that in a minute.)

ReadCube’s deployment of Altmetric.com data also decontextualizes articles’ metrics. They have chosen only to show the summary view of the metrics, with a link out to the full Altmetric.com report:

[Screenshot: ReadCube’s summary view of Altmetric.com metrics]

Compare that to what’s available on Nature.com, where the Metrics page showcases the Altmetric.com summary metrics plus Altmetric.com-sourced Context statements (“This article is in the 98th percentile compared to articles published in the same journal”), snippets of news articles and blog posts that mention the article, a graph of the growth in pageviews over time, and a map that points to where your work was shared internationally:

[Screenshot: the Metrics page for an article on Nature.com]

More data and more context are very valuable to have when presenting metrics. So, we think this is a missed opportunity for the SciShare pilot.

SciShare isn’t interoperable with all altmetrics systems

Let’s assume that the SciShare experiment results in a boom in traffic to your article on Nature.com. What can you do with those pageview metrics?

Nature.com–like most publishers–doesn’t share their pageview metrics via API. That means you have to manually look up and copy and paste those numbers each time you want to record them. Not an insurmountable barrier to data reuse, but still–it’s a pain.

Compare that to PLOS. They freely share article view and download data via API, so you can easily import those numbers to your profile on Impactstory or PlumX, or export them to your lab website, or parse them into your CV, and so on. (Oh, the things you can do with open altmetrics data!)
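As a rough illustration of the “parse them into your CV” idea, here is a small Python sketch that turns a machine-readable metrics record into a line of CV text. Everything about the record is an assumption made for the example: the field names (title, views, downloads, retrieved) and the numbers are invented, and this is not the actual response format of the PLOS API or any other service.

```python
import json

# Hypothetical payload: the shape and the values are invented for illustration.
SAMPLE_PAYLOAD = json.dumps({
    "title": "A made-up article title",
    "views": 12345,
    "downloads": 678,
    "retrieved": "2014-12-05",
})


def cv_metrics_line(payload_json):
    """Format one article's usage numbers as a single line of CV text."""
    record = json.loads(payload_json)
    return "{title}. {views:,} views, {downloads:,} downloads (as of {retrieved}).".format(
        title=record["title"],
        views=record["views"],
        downloads=record["downloads"],
        retrieved=record["retrieved"],
    )


print(cv_metrics_line(SAMPLE_PAYLOAD))
# → A made-up article title. 12,345 views, 678 downloads (as of 2014-12-05).
```

The point is less the formatting than the workflow: once the numbers arrive as data rather than as text on a web page, updating a CV or lab website becomes a script you rerun instead of a copy-and-paste chore.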

You also cannot use the ReadCube or hashed URLs to embed the article full-text into your Impactstory profile or share it on ResearchGate, meaning that it’s as difficult as ever to share the publisher’s version of your paper in an automated fashion. It’s also unclear whether the “personal use” restriction on SciShare links means that researchers will be prohibited from saving links publicly on Delicious, posting them to their websites, and so on.

How to improve SciShare to benefit altmetrics

We want to reiterate that we think that SciShare’s great for our friends at Altmetric.com, due to their integration with ReadCube. And the greater visibility that their integration brings to altmetrics overall is important.

That said, there’s a lot that Nature can do to improve SciShare for altmetrics. The biggest and most obvious idea is to do away with SciShare altogether and simply make their entire catalogue Open Access. But it looks like Nature (discouragingly) is not ready to do this, and we’re realists. So, what can Nature do to improve matters?

  • Open up their pageview metrics via API to make it easier for researchers to reuse their impact metrics however they want
  • Release ReadCube resolution, referral traffic and annotation metrics via API, adding new metrics that can tell us more about how content is being shared and what readers have to say about articles
  • Add more context to the altmetrics data they display, so viewers have a better sense of what the numbers actually mean
  • Do away with hashed URLs and link shorteners, especially the latter, which make it difficult to track all mentions of an article on social media

We’re hopeful that SciShare overall is an incremental step towards full OA for Nature. And we’ll be watching how the SciShare pilot changes over time, especially with respect to altmetrics.

Update: Digital Science reports that the ReadCube implementation has been tested to ensure compatibility with most screen readers.

Impact Challenge Day 4: Connect with other researchers on Mendeley.com

Next up for our Impact Challenge is Mendeley.

Are you surprised? While there was pushback against Mendeley after it was unexpectedly bought by Elsevier a few years ago, and it is marketed more as a reference manager than a social network, Mendeley remains popular with many academics and librarians. It offers ways to connect with other researchers that you can’t find on other platforms.

Mendeley Web (the online counterpart to the desktop reference management software) is similar to Google Scholar in several ways. What’s distinctive about Mendeley is that it offers better opportunities to interact with other researchers and get your research in front of communities that might be interested in it, in a context where they’re largely interacting with scholarship they intend to actually read and cite.

Moreover, Mendeley’s Readership Statistics can tell you a lot about the demographics that have bookmarked your work–an important indicator of who’s reading your work and who might cite it in the future.

We’re also going to talk in this post about Zotero, which is quite similar to Mendeley. We’re big supporters of Zotero because it’s an open-source non-profit, and we see that as a killer feature for science tools. However, although it really shines as a reference manager, Zotero’s community features are less powerful–mostly because they have less activity. So we’ll provide links and information on how to do some of these steps in Zotero, but not in as much detail.

Step 1: Create a profile

Log on to Mendeley.com and click the “Create a free profile” button. Create a login and, on the next screen, enter your general field of study and your academic status (student, professor, postdoc, etc).

As you advance to the next screen, beware: Mendeley Desktop will automatically start downloading to your computer. (You’ll need it to make the next step a bit easier on yourself, but you can also make do without it. Your call.) Download it and install it if you plan to use it for the next step–importing your publications.

Zotero alternative: Log on to Zotero.org, click “Register” in the upper right-hand corner, and register for an account. Once you’ve validated your new account, click your username in the upper right-hand corner (where it says, “Welcome, username!”) and then click on the “Edit Profile” link on the next screen to head to the Profile section of your Zotero settings. There, you can create a profile.

Step 2: Import your publications

If you didn’t install Mendeley Desktop, here’s how to add your references manually using Mendeley Web:

  • Click the “My Library” tab, then the “Add Document” icon.

  • On the “Add New Document” dialog box that appears, select “My Publications” from the “Add to” drop-down menu, then use the “Type” drop-down menu to specify what type of document you’re adding to your “My Publications” list (article, book section, thesis, etc).

  • The dialog box will automatically expand, giving you many fields to fill out with descriptive information for that publication. Complete as many as possible, so others can find your publication more easily. If an Open Access link to the full-text of your publication exists, provide it in the URL box. And be sure to add a DOI, if you’ve got one. Click “Save” when finished.

  • Rinse and repeat as necessary, until all your articles are added to your profile.

If you’ve got Mendeley Desktop installed, your job is much easier. Export your publications in .bib format from Google Scholar (which we covered in yesterday’s challenge), and then:

  • Fire up Desktop and select “My Publications” from the “My Library” panel in the upper left corner of the screen.

  • Click File > Import > BibTeX (.bib) on the main menu.

  • On your computer, find the citations.bib file you exported from Google Scholar, select it, and click “Open”. Mendeley will begin to import these publications automagically.

  • In the dialog box that appears, confirm that you are the author of the documents that you’re importing, and that you have the rights to share them on Mendeley. Click “I agree.”

  • Click the “Sync” button at the top of the Desktop screen to Sync your local Mendeley library with your Mendeley Web library.

That’s it! You’ve just added all your publications to your Mendeley profile. And you know how to add any missing publications that didn’t auto-import, to boot.
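If you’d like to check your Google Scholar export for missing DOIs, here is a minimal Python sketch that lists the entries in a .bib file that have no doi field, so you know which records to touch up by hand. It is a rough check under simple assumptions, not a full BibTeX parser: it expects each field on its own line, the function name entries_missing_doi is ours, and the sample entries are made up.

```python
import re

# Matches the start of an entry, e.g. "@article{smith2014sharing,".
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches a line that begins a doi field, in any letter case.
DOI_FIELD = re.compile(r"^\s*doi\s*=", re.IGNORECASE | re.MULTILINE)


def entries_missing_doi(bibtex_text):
    """Return the citation keys of entries that have no doi field."""
    starts = list(ENTRY_START.finditer(bibtex_text))
    missing = []
    for i, match in enumerate(starts):
        # An entry's body runs from its header to the start of the next entry.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_text)
        body = bibtex_text[match.end():end]
        if not DOI_FIELD.search(body):
            missing.append(match.group(2))
    return missing


# Made-up sample in the general style of a citation export.
SAMPLE_BIB = """@article{smith2014sharing,
  title={A made-up article title},
  author={Smith, Jane},
  year={2014}
}

@article{doe2013metrics,
  title={Another made-up title},
  author={Doe, John},
  doi={10.1234/example.5678},
  year={2013}
}
"""

print(entries_missing_doi(SAMPLE_BIB))  # → ['smith2014sharing']
```

To run it on your own export, read the file into a string first, for example with open("citations.bib", encoding="utf-8").read(), and pass that to entries_missing_doi.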

Here’s what your profile page will look like, now that you’ve added publications to your My Publications library:

[Screenshot: a Mendeley profile page with publications added]

Zotero alternative: to auto-import your publications from a BibTeX file, follow these instructions. To manually add publications, follow these instructions.

Step 3: Follow other researchers

Now you’re ready to connect with other researchers. Consider this step akin to introducing yourself at a conference over coffee: informal, done in passing, and allowing others to put a face to a name.

First, you’ll need to find others to follow. Search for colleagues or well-known researchers in your field by name from the Mendeley search bar in the upper right-hand corner of Mendeley Web:

[Screenshot: the Mendeley Web search bar]

Be sure to select “People” from the drop-down menu, so you search for profiles and not for papers that they’ve authored.

When you find their profile, click on their name in the search results, and then click the “Follow” button on the right-hand side of the profile:

[Screenshot: the “Follow” button on a Mendeley profile]

That’s it! Now you’ll receive updates on your Mendeley homepage when they’ve added a new publication to their profile or done something else on the site, like join a group.

Zotero alternative: Zotero works in a very similar way. Search for your colleague, find their profile, and click the red “Follow” button at the top-right of their profile to begin following them.

Step 4: Join groups relevant to your research

If Step 3 was like introducing yourself during a conference coffee break, Step 4 is like joining a “Birds of a Feather” group over lunch, to talk about common interests and get to know each other a bit better.

[Screenshot: a Mendeley group]

Mendeley groups are places where researchers interested in a common topic can virtually congregate to post comments and share papers. It’s a good place to find researchers in your field who might be interested in your publications. And it’s also the single best place on the platform to learn about work that’s recently been published and is being talked about in your discipline.

To find a group, search for a subject using the search toolbar you used for Step 3, making sure to select “Groups” from the drop-down menu. Look through the search results and click through to group pages to determine if the group is still active (some groups were abandoned long ago).

If so, join it! And then sit back and enjoy all the new knowledge that your fellow group members will drop on you in the coming days, which you can view from either the group page or your Mendeley homescreen.

And you can feel free to drop some knowledge on them, too. Share your articles, if relevant to the group’s scope. Pose questions and answer others’ questions. Openly solicit collaborators if you’ve got an interesting project in the pot that you need help on, like Abbas here has:

[Screenshot: a Mendeley group post soliciting collaborators]

Use groups like you would any other professional networking opportunity: as a place to forge new connections with researchers you might not have a chance to meet otherwise.

Zotero alternative: Zotero works in a very similar way. Search for a group topic, find a group you want to join, and click the red “Join Group” button at the top of the page.

Step 5: Learn who’s bookmarking your work

Once your work is on Mendeley, you can learn some basic information about who’s saving it in their libraries via Mendeley’s Readership Statistics. And that’s interesting to know because Mendeley bookmarks are a leading indicator for later citations.

To see the readership demographics for your publications, head to the article’s page on Mendeley. On the right side of the screen, you’ll see a small Readership Statistics panel:

[Screenshot: the Readership Statistics panel on a Mendeley article page]

Readership Statistics can tell you how many readers you have on Mendeley (meaning, how many people have bookmarked your publication), what discipline they belong to, their academic status, and their country. Very basic information, to be sure, but it’s definitely more than you’d know about your readers if you were looking at the number of readers alone.

Zotero alternative: Zotero doesn’t yet offer readership statistics or any other altmetrics for publications on their site, but they will soon. Stay tuned!

Limitations

Perhaps the biggest limitation to Mendeley is their association with Elsevier. Mendeley was acquired by the publishing behemoth in early 2013, while the ghastly, Elsevier-backed Research Works Act fail was still fresh in many academics’ minds.

As danah boyd points out, even after Elsevier dropped support for the RWA and the “#mendelete” fracas ended, Elsevier was (and is) still doing a lot that’s not researcher-friendly. And yet, some of us continue to eat at McDonald’s knowing what goes into their chicken nuggets. Like any big organization, Elsevier does some stuff right and some stuff wrong, and it’s up to researchers to decide how it all balances out; there’s lots of room for reasonable folks to disagree. For what it’s worth: at Impactstory, one of us is a Zotero early adopter and code-contributor, one of us has switched from Mendeley to Zotero, and one of us uses both 🙂

Drawbacks to the platform itself? You can’t easily extract readership information for your publications unless you use Mendeley’s open API (too high a barrier for many of us to pass). So, you’ll need to cut-and-paste that information into your website, CV, or annual review, just as you would when using Google Scholar. (It’s relatively easy to extract readership numbers using third-party services like Impactstory, on the other hand. More on that in the days to come.)

A final drawback: if you want to add new publications, you’ll have to do it yourself. Mendeley doesn’t auto-add new publications to your profile like Google Scholar or other platforms can.

Homework

First, complete your profile by manually adding any works that the BibTeX import from Google Scholar didn’t catch.

Next, build your network by following at least five other researchers in your field, and joining at least two groups. On each of the groups you’ve joined, share at least one publication, whether it’s one you’ve authored or one written by someone else. Remember, make sure they’re relevant to the group, or else you’ll be pegged as a spammer.

Over the next few days, log onto Mendeley Web (or Zotero Web) at least one more time, and become acquainted with your homescreen timeline to stay abreast of new research that’s been added to groups or your colleagues’ profiles.

Finally, learn how to export your publications–and the rest of your library–from Mendeley, so you don’t have to reinvent the wheel attempting to set up a profile for your publications on another platform. Here’s how to get your library out of Mendeley in BibTeX format:

  • In Mendeley Desktop, select all publications you want to export.

  • From the main menu, click File > Export.

  • In the dialog box that appears, choose BibTeX from the drop-down menu, rename your bibliography if you want, and choose a safe place to store the .bib file. Click “Save” and you’re done!

Are you hangin’ in there?

You’ve now completed your Day 4 challenge, meaning you’re over halfway finished with Week 1, and over 10% finished with the entire month. That’s some free math, from us to you 🙂


Tomorrow, we’ll master LinkedIn. Get ready!