It’s time to insist on #openinfrastructure for #openscience

It’s time.  In the last month there’ve been three events that suggest now is the time to start insisting on open infrastructure for open science:

The first event was the publication of two separate recommendations/plans on open science, a report by the National Academies in the US, and Plan S by the EU on open access.  Notably, although comprehensive and bold in many other regards, neither report/plan called for open infrastructure to underpin the proposed open science initiatives.

Peter Suber put it well in his comments on Plan S:

the plan promises support for OA infrastructure, which is good. But it never commits to open infrastructure, that is, platforms running on open-source software, under open standards, with open APIs for interoperability, preferably owned or hosted by non-profit organizations. This omission invites the fate that befell bepress and SSRN, but this time for all European research.

The second event was the launch of Google’s Dataset Search — without an API.

Why do we care?  Because of opportunity cost.  Google Scholar doesn’t have an API, and Google has said it never will.  That means no one has been able to integrate Google Scholar results into their workflows or products, and that has carried a huge opportunity cost for scholarship.  It’s hard to measure, of course (opportunity costs always are), but we can get a sense of it: within two years of the launch of Unpaywall (a product that does a subset of the same task, but with an open API and an open bulk data dump), Unpaywall data had been built into 2000 library workflows, the three primary A&I indexes, competing commercial OA discovery services, many reports, and countless startup apps, with more integrations in the works.  All of that value-add was waiting for a solution that others could build on.

If we relax and consider the Dataset Search problem solved now that Google has it working, we’re forgoing these same integration possibilities for dataset search that we lost out on for so long with OA discovery.  We need to build open infrastructure: the open APIs and open source solutions that Peter Suber talks about above.
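To make "open API" concrete: here’s a minimal sketch of the kind of integration Unpaywall’s openness enables. The `/v2/{doi}` endpoint and the `best_oa_location` response field are part of Unpaywall’s public API; the helper functions themselves are just an illustration, not anyone’s production code:

```python
# Sketch of an integration against Unpaywall's open API (api.unpaywall.org).
# The endpoint and best_oa_location field are Unpaywall's public API;
# the helpers here are illustrative only.

def unpaywall_url(doi, email):
    """Build the request URL for Unpaywall's /v2 DOI endpoint."""
    return "https://api.unpaywall.org/v2/{}?email={}".format(doi, email)

def best_oa_pdf(record):
    """Pull a free-to-read PDF link out of an Unpaywall API response, if any."""
    loc = record.get("best_oa_location")
    return loc.get("url_for_pdf") if loc else None

# In a real workflow you'd fetch unpaywall_url(...) with any HTTP client
# and pass the parsed JSON to best_oa_pdf().
```

A few lines like this are all it takes to wire OA links into a link resolver, a reading list, or a discovery layer, which is exactly the kind of value-add a closed service forecloses.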

As Peter Kraker put it on Twitter the other day: #dontLeaveItToGoogle.

The third event was of a different sort: a gathering of 58 nonprofit projects working toward Open Science.  It was the first time we’ve gathered together explicitly like that, and the air of change was palpable.

It’s exciting.  We’re doing this.  We’re passionate about providing tools for the open science workflow that embody open infrastructure.

If you are a nonprofit but you weren’t at JROST last month, join in!  It’s just getting going.

 

So.  #openinfrastructure for #openscience.  Everybody in scholarly communication: start talking about it, requesting it, dreaming it, planning it, building it, requiring it, funding it.  It’s not too big a step.  We can do it.  It’s time.

 

ps More great reading on what open infrastructure means from Bilder, Lin, and Neylon (2015) here and from Hindawi here.

pps #openinfrastructure is too long and hard to spell for a rallying cry.  #openinfra??  help 🙂

Reposted from Heather’s personal Research Remix blog.

Impactstory is hiring a full-time developer

We’re looking for a great software developer!  Help us spread the word!  Thanks 🙂

 

ABOUT US

We’re building tools to bring about an open science revolution.  

Impactstory began life as a hackathon project. As the hackathon ended, a few of us migrated into the hotel hallway to continue working, completing the prototype as the hotel started waking up for breakfast. Months of spare-time development followed, then funding. That was five years ago — we’ve got the same excitement for Impactstory today.

We’ve also got great momentum.  The scientific journal Nature recently profiled our main product:  “Unpaywall has become indispensable to many academics, and tie-ins with established scientific search engines could broaden its reach.”  We’re making solid revenue, and it’s time to expand our team.

We’re passionate about open science, and we run our non-profit company openly too.  All of our code is open source, we make our data as open as possible, and we post our grant proposals so that everyone can see both our successful and our unsuccessful ones.  We try to be the change we want to see 🙂

ABOUT THE POSITION

The position is lead dev for Unpaywall, our index of all the free-to-read scholarly papers in the world. Because Unpaywall is surfacing millions of formerly inaccessible open-access scientific papers, it’s growing very quickly, both in terms of usage and revenue. We think it’s a really transformative piece of infrastructure that will enable entire new classes of tools to improve science communication. As a nonprofit, that’s our aim.

We’re looking for someone to take the lead on the tech parts of Unpaywall.  You should know Python and SQL (we use PostgreSQL) and have 5+ years of experience programming, including managing a production software system.  But more importantly, we’re looking for someone who is smart, dedicated, and gets things done! As an early team member you will play a key role in the company as we grow.

The position is remote, with flexible working hours, and plenty of vacation time.  We are a small team so tell us what benefits are important to you and we’ll make them happen.

OUR TEAM

We’re at about a million dollars of revenue (grants and earned income) with just two employees: the two co-founders.  We value kindness, honesty, grit, and smarts. We’re taking our time on this hire, holding out for just the right person.

HOW TO APPLY

Sound like you? Email team@impactstory.org with (1) what appeals to you about this specific job (this part is important to us), (2) a brief summary of your experience with directly maintaining and enhancing a production system, (3) a copy of your resume or LinkedIn profile, and (4) a link to your GitHub profile. Thanks!

 

Edited Sept 25, 2018 to add minimum experience and more details on how to apply.

Elsevier becomes newest customer of Unpaywall Data Feed

We’re pleased to announce that Elsevier has become the newest customer of Impactstory’s Unpaywall Data Feed, which provides a weekly feed of changes in Unpaywall, our open database of 20 million open access articles. Elsevier will use the Unpaywall database to make open access content easier to find on Scopus.

Elsevier joins Clarivate Analytics, Digital Science, Zotero, and many other organizations as paying subscribers to the Data Feed.  Paying subscribers provide sustainability for Unpaywall, and fund the many free ways to access Unpaywall data, including complete database snapshots as well as our open API, Simple Query Tool, and browser extension. We’re proud that thousands of academic libraries and other institutions, as well as over 150,000 individual extension users, are using these free tools.

Impactstory’s mission is to help all people access all research products. Adding Elsevier as a Data Feed customer helps us further that mission. Specifically, the new agreement injects OA from our index into the workflows of the many Scopus users worldwide, helping them find and use open research they may never have seen before. So, we’re happy to welcome Elsevier as our latest Data Feed customer.

How do we know Unpaywall won’t be acquired?

Reposted with minor editing from a response Jason gave on the Global Open Access mailing list, July 12 2018.

We’re often asked: How do we know Unpaywall won’t be acquired?  What makes Unpaywall (and the company behind it, Impactstory) different than Bepress, SSRN, Mendeley, Publons, Kopernio, etc?

How can we be sure you won’t be bought by someone whose values don’t align with open science?

There are no credible guarantees I can offer that this won’t happen, and no other organization can offer them either. However, I think stability in the values and governance of Impactstory is a relatively safe bet.  Here’s why (note: I’m not a lawyer and the below isn’t legal advice, obvs):

We’re incorporated as a 501(c)3 nonprofit. This was not true of recently-acquired open science platforms like Mendeley, SSRN, and Bepress, which were all for-profits. We think that’s fine…the world needs for-profits. But we sure weren’t surprised when any of them were acquired. These are for-profit companies, which means they are, er:

For: Profit.  

Legally, their purpose is profit. They may benefit the world in many additional ways,  but their officers and board have a fiduciary duty to deliver a return to investors.

Our officers and board, on the other hand, have a legal fiduciary duty to fulfill our nonprofit mission, even where this doesn’t make much money. I think instead of “nonprofit” it should be called for-mission. Mission is the goal. That can be a big difference.  Jefferson Pooley did a great job articulating the value of the nonprofit structure for scholcomm organizations in more detail in a much-discussed LSE Impact post last year.

All that said, I’m not going to sit here and tell you nonprofits can’t be acquired…cos although that may be technically true, nonprofits can still be, in all-but-name, acquired. It’s just less common and harder.

So we like to also emphasize that the source code for these projects we are doing is open. That means that for any given project, its main asset–the code that makes our project work–is available for free to anyone who wants it. This makes us much less of an acquisition target. Why buy the cow when the code is free, as it were.

As a 501(c)3 nonprofit, we have a board of directors that helps keep us accountable and helps provide leadership to the organization as well. Past board members have included  Cameron Neylon and John Wilbanks, with a current board of me, Heather, Ethan White and Heather Joseph.  Heather, Ethan, John, and Cameron have each contributed mightily to the Open cause, in ways that would take me much longer than I have to fully chronicle (and most of you probably know anyway). We’re incredibly proud to have (and have had) them tirelessly working to help Impactstory stay on the right course. We think they are people that can be trusted.

Finally, and y’all can make up your own minds about this, I like to think our team has built up some credibility in the space. Heather and I have both been working entirely on open-source, open science projects for the last ten years, and most of that work’s pretty easy to find if you want to check it out. In that time, it’s safe to assume we’ve turned down some better-paying projects that aligned less closely with the open science mission.

So, being acquired?  Not in our future.  But growth sure is, through grants and partnerships and customer relationships and lots of hard work… all in the service of making scholcomm more open.  Stay tuned 🙂

New partnership with Clarivate to help oaDOI find even more Open Access

We’re excited to announce a new partnership with Clarivate Analytics! 

This partnership between Impactstory and Clarivate will help fund better coverage of Open Access in the oaDOI database. The  improvements will grow our index of free-to-read fulltext copies, bringing the total number to more than 18 million, along with 86 million article records altogether. All this data will continue to be freely accessible to everyone via our open API.

The partnership with Clarivate Analytics will put oaDOI data in front of users at thousands of new institutions, by integrating our index into the popular Web of Science system.  The oaDOI API is already in use by more than 700 libraries via SFX, and delivers more than 500,000 fulltext articles to users worldwide every day.  It also powers the free Unpaywall browser extension, used by over seventy thousand people in 145 countries.  

You can read more about the partnership in Clarivate’s press release.  We’ll be sharing more details about improvements in the coming months.  Exciting!

Introducing Unpaywall: unlock paywalled research papers as you browse

Last Friday night we tweeted about a new Chrome extension we’ve been working on. It’s called Unpaywall, and it links you to free fulltext as you browse research articles. Hit a paywall? No problem: click the green tab and read it free.

Unpaywall is powered by an index of over ten million legally-uploaded, open-access resources, and it delivers. For example, in a set of 11k recent cancer research articles covered in mainstream media, Unpaywall users were able to read around half of them for free–even without any subscription, and even though most of them were paywalled.

So far the response to Friday’s tweet has been amazing — 500 retweets, and in just a few days we’ve gotten more than 1500 installations: Hockey stick growth!  🙂

 

And we’ve also gotten rave reviews, like this one from Sarah:

Why the excitement?  Finding free, legal, open access is now super easy — it happens automatically.  With the Unpaywall extension, links to open access are automatically available as you browse.

This is useful for researchers like Ethan.  It’s also really helpful for people outside academia, who don’t enjoy the expensive subscription benefits of institutional libraries. That’s especially true for nonprofits, and for folks working to communicate scholarship to a broader audience.

Go give it a try and see what you think! The official release is April 4th, but you can already install it, learn more, and follow @unpaywall. We’d love your help to spread the word about Unpaywall to your friends and colleagues. Together we can accelerate toward a future of full #openaccess for all!

behind the scenes: cleaning dirty data

Dirty Data.  It’s everywhere!  And that’s expected and ok and even frankly good imho — it happens when people are doing complicated things, in the real world, with lots of edge cases, and moving fast.  Perfect is the enemy of good.


Alas, it’s definitely behind-the-scenes work to find and fix dirty data problems, which means none of us learn from each other in the process.  So — here’s a quick post about a dirty data issue we recently dealt with 🙂  Hopefully it’ll help you feel some camaraderie, and maybe help some people using the BASE data.

We traced some oaDOI bugs to dirty records from PMC in the BASE open access aggregation database.

Most PMC records in BASE are really helpful — they include the title, author, and link to the full text resource in PMC.  For example, this record lists valid PMC and PubMed urls:

and this one lists the PMC and DOI urls:

The vast majority of PMC records in BASE look like this.  So until last week, to find PMC article links for oaDOI we looked up article titles in BASE and used the URL listed there to point to the free resource.

But!  We learned!  There is sometimes a bug!  This record has a broken PMC url — it lists http://www.ncbi.nlm.nih.gov/pmc/articles/PMC with no PMC id in it (see, look at the URL — there’s nothing about it that points to a specific article, right?).  To get the PMC link you’d have to follow the PubMed link and then click through to PMC from there.  (which does exist — here’s the PMC page which we wish the BASE record had pointed to).

That’s some dirty data.  And it gets worse.  Sometimes there is no pubmed link at all, like this one (correct PMC link exists):

and sometimes there is no valid URL, so there’s really no way to get there from here:

(pretty cool that PMC lists this article from 1899, eh?  Edge cases for papers published more than 100 years ago seem fair, I’ve gotta admit 🙂 )

Anyway.  We found this dirty PMC data in BASE is infrequent, but common enough to cause more bugs than we’re comfortable with.  To work around the dirty data we’ve added a step — oaDOI now uses the DOI->PMCID lookup file offered by PMC to find PMC articles we might otherwise miss.  It adds a bit more complexity, but it’s worth it in this case.
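In code, the workaround amounts to consulting a DOI->PMCID lookup table before (or instead of) trusting the URL in a BASE record. A sketch, with a simplified two-column CSV standing in for PMC’s actual mapping file (which has more columns):

```python
import csv
import io

def load_doi_to_pmcid(csv_text):
    """Parse a DOI->PMCID mapping (simplified stand-in for PMC's lookup file)."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["DOI"] and row["PMCID"]:  # skip rows missing either identifier
            mapping[row["DOI"].lower()] = row["PMCID"]
    return mapping

def pmc_url(doi, mapping):
    """Return a PMC article URL for a DOI, or None if it's not in PMC."""
    pmcid = mapping.get(doi.lower())
    if pmcid:
        return "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/".format(pmcid)
    return None
```

With a table like this loaded up front, a broken or missing URL in a harvested record no longer means a missed fulltext link.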

 

 

So, that’s This Week In Dirty Data from oaDOI!  🙂  Tune in next week for, um, something else 🙂

And don’t forget Open Data Day is Saturday March 4, 2017.   Perfect is the enemy of the good — make it open.

oaDOI integrated into the SFX link resolver

We’re thrilled to announce that oaDOI is now available for integration with the SFX link resolver. SFX, like other OpenURL link resolvers, makes sure that when library users click a link to a scholarly article, they are directed to a copy the library subscribes to, so they can read it.

But of course, sometimes the library doesn’t subscribe. This is where oaDOI comes to the rescue. We check our database of over 80 million articles to see if there’s a Green Open Access version of that article somewhere. If we find one, the user gets directed there so they can read. Adding oaDOI to SFX is like adding ten million open-access articles to a library’s holdings, and it results in a lot more happy users, and a lot more readers finding full text instead of paywalls. Which is kind of our thing.
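Under the hood this is just a fallback chain: try the library’s own holdings first, and only ask oaDOI when that comes up empty. A sketch of the flow (the holdings dict and oaDOI client here are stand-ins for illustration, not SFX’s actual plugin code):

```python
def resolve_fulltext(doi, holdings, oadoi_lookup):
    """Return a fulltext URL for a DOI.

    Prefer the library's subscribed copy; fall back to whatever
    Green OA copy the oaDOI index knows about; else None.
    """
    url = holdings.get(doi)       # the library's subscribed copy, if any
    if url:
        return url
    return oadoi_lookup(doi)      # fall back to oaDOI's open-access index
```

The design point is that oaDOI only ever adds reading options; subscribed access always wins when it exists.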

The best part is, it’s super easy to set up, and of course completely free. Since SFX is used today by over 2000 institutions, we’re really excited about how big a difference this can make.

Edited March 28, 2017. There are now over 600 libraries worldwide using the oaDOI integration, and we’re handling over a million requests for fulltext every day.

 

Introducing oaDOI: resolve a DOI straight to OA

Most papers that are free-to-read are available thanks to “green OA” copies posted in institutional or subject repositories.  The fact these copies are available for free is fantastic because anyone can read the research, but it does present a major challenge: given the DOI of a paper, how can we find the open version when there are so many different repositories?

The obvious answer is “Google Scholar” 🙂  And yup, that works great, and given the resources of Google will probably always be the most comprehensive solution.  But Google’s interface requires an extra search step, and its data isn’t open for others to build tools on top of.

We made a thing to fix that.  Introducing oaDOI:

We look for open copies of articles using the following data sources:

  • The Directory of Open Access Journals to see if it’s in their index of OA journals.
  • CrossRef’s license metadata field, to see if the publisher has reported an open license.
  • Our own custom list of DOI prefixes, to see if it’s in a known preprint repository.
  • DataCite, to see if it’s an open dataset.
  • The wonderful BASE OA search engine to see if there’s a Green OA copy of the article. BASE indexes 90mil+ open documents in 4000+ repositories by harvesting OAI-PMH metadata.
  • Repository pages directly, in cases where BASE was unable to determine openness.
  • Journal article pages directly, to see if there’s a free PDF link (this is great for detecting hybrid OA)
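The list above works as a waterfall: each source is consulted in priority order, and the first one that reports an open copy wins. A sketch of that control flow, with stub checkers standing in for the real DOAJ/CrossRef/BASE lookups:

```python
def find_open_copy(doi, checkers):
    """Run OA checkers in priority order; return the first hit, else None.

    Each checker takes a DOI and returns an OA location dict (or None).
    """
    for check in checkers:
        location = check(doi)
        if location:
            return location
    return None

# Two stub checkers standing in for the real lookups:
checkers = [
    lambda doi: None,                                    # e.g. DOAJ: not an OA journal
    lambda doi: {"url": "https://repo.example/" + doi},  # e.g. BASE: found a Green copy
]
```

Ordering the checkers from cheapest/most-authoritative to most expensive is what keeps a lookup service like this fast.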

oaDOI was inspired by the really cool DOAI.  oaDOI is a wrapper around the OA detection used by Impactstory. It’s open source of course, can be used as a lookup engine in Zotero, and has an easy and powerful API that returns license data and other good stuff.

Check it out at oadoi.org, let us know what you think (@oadoi_org), and help us spread the word!

Data-driven decisions with Net Promoter Score


Today we’re releasing some changes in the way users sign up for Impactstory profiles, based on research we’ve done to learn more about our users. It’s a great opportunity to share a little about what we learned, and to describe the process we used to do this research–both to add some transparency around our decision making, and to maybe help folks looking to do the same sorts of things. There’s lots to share, so let’s get to it:

Meet the Net Promoter Score

As part of our journey to find product-market fit for the Impactstory webapp, we’ve become big fans of the Net Promoter Score (NPS), an increasingly popular way to assess how much value users are getting from one’s product. It’s appealingly simple: we ask users to rank how likely they’d be to recommend Impactstory to a colleague, on a scale of 1-10, and why. Answers of 9-10 are Promoters, answers of 1-6 are Detractors (7s and 8s are Passives, and don’t count either way). You subtract %detractors from %promoters and there’s your score.
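For the record, the arithmetic really is that simple. A minimal sketch using the same cutoffs (not our production code):

```python
def net_promoter_score(ratings):
    """NPS = %promoters (9-10) minus %detractors (1-6), as a whole number.

    Ratings of 7-8 are passives: they count toward the total but
    neither add to nor subtract from the score.
    """
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))
```

So the score ranges from -100 (all detractors) to +100 (all promoters), and a passive-heavy user base lands near zero.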

It’s a useful score. It doesn’t measure how much users like you. It doesn’t measure how much they generally support the idea of what you’re doing. It measures how much you are solving real problems for real users, right now. Solving those problems so well that users will put their own reputation on the line and sing your praises to their friends.

Until we’re doing that, we don’t have product-market fit, we aren’t truly making something people want, and we don’t have a sustainable business. Same as any startup.

As a nonprofit, we’ve got lots of people who support what we’re doing and (correctly!) see that we’re solving a huge problem for academia as a whole. So they’ve got lots of good things to say to us. Which: yay. That’s fuel and we love it. But it can disguise the fact that we may not be solving their personal problems. We need to get at that signal, to help us find that all-important product-market fit.

Getting the data

We used Promoter.io to manage creating, sending, and collecting email surveys. It just works and it saved us a ton of time. We recommend it.  Our response rate was 28%, which we figure is pretty good for asking for help via email from people who don’t know you or owe you anything, and without pestering them with any reminders. We sliced and diced users along many dimensions but they all had about the same response rate, which improves the robustness of the findings. Since we assume users who have no altmetrics will hate the app, we only sent surveys to users with relatively complete profiles (at least three Impactstory badges).

Once we had responses, we followed up using Intercom, an app that nicely integrates most of our customer communication (feedback, support, etc). We got lots more qualitative feedback this way.

Once we had all our data, we exported the results into a spreadsheet and had us some Pivot Table Fun Time. Here’s the raw data in Google Docs (with identifying attributes removed to protect privacy) in case you’d like to dive into the data yourself.

Finally, we exported loads of user data from our Postgres app database hosted on Heroku. All that got added into the spreadsheet and pivot tables as well.

Here’s what we found

The overall NPS is 26, which is not amazing. But it is good. And encouragingly, it’s much better than we got when we surveyed users about our old, non-free version in March. Getting better is a good sign. We’ll take it.

Users who have made profiles in both versions (new and old) seem to agree. The overall NPS for these users was 58, which is quite strong. In fact, users of the old version were the group with the highest NPS overall in this survey. Since we made a lot of changes in the new app from the old, this wouldn’t have to have been true. It made us happy.

But we wanted more actionable results. So we sliced and diced everyone into subgroups along several dimensions, looking for features that can predict extra-high NPS in future sign-ups.

We found four of these predictive features. As it happens, each predictor changes the NPS of its group by the same amount: your NPS (on average) goes from 15 (ok) to 35 (good) if you

  1. have a Twitter account,
  2. have more than 20 online mentions of some kind (Tweets, Wikipedia, Pinterest, whatever) pointing to your publications,
  3. have made more than 55% of your publications green or gold open access, or
  4. have been awarded more than 6 Impactstory badges.
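The slicing and dicing itself boils down to grouping users by one feature at a time and computing NPS within each group. In plain Python (the feature and field names here are illustrative, not our actual schema):

```python
from collections import defaultdict

def nps(ratings):
    """Standard NPS: %promoters (9-10) minus %detractors (1-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))

def nps_by_group(users, feature):
    """Compute NPS separately for each value of one user feature."""
    groups = defaultdict(list)
    for u in users:
        groups[u[feature]].append(u["rating"])
    return {value: nps(ratings) for value, ratings in groups.items()}

# e.g. nps_by_group(users, "has_twitter") maps each subgroup
# (True / False) to that subgroup's NPS.
```

A spreadsheet pivot table does the same job; the point is just that each candidate predictor gets its own per-group score to compare.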

Of these, (4) is not super useful since it covaries a lot with numbers of mentions (2) and OA percentage (3); after all, we give out badges for both those things. A bit more surprisingly, users who have Twitter are likely to have more mentions per product, and less likely to have blank profiles, meaning Feature 1 accounts for some of the variance in Feature 2. So simply having a Twitter account is one of our best signals that you’ll love Impactstory.

Surprisingly, having a well-stocked ORCID profile with lots of your works in it doesn’t seem to predict a higher NPS score at all. This was unexpected because we figured the kind of scholcomm enthusiasts who keep their ORCID records scrupulously up-to-date would be more likely to dig the kind of thing we’re doing with Impactstory. Plus they’d have an easier and faster time setting up a profile since their data is super easy for us to import. Good to have the data.

About 60% of responses included qualitative feedback.  Analysing these, we found four themes:

  • It should include citations. Makes sense users would want this, given that citations are the currency of academia and all. Alas they ain’t gonna get it, not till someone comes out with an open and complete citation database. Our challenge is to help users be less bummed about this, hopefully by positioning Impactstory as a complement to indexes like Google Scholar rather than a competitor.
  • It’s pretty. That’s good to hear, especially since we want folks to share their profiles, make them part of their online identity. That’s way easier if you think it looks sharp.
  • It’s easy. Also great to hear, because the last version was not very easy, mostly as a result of feature bloat. It hurt to lose some features on this version, so it’s good to see the payoff was there.
  • It puts everything all in one place.  Presumably users were going to multiple places to gather all the altmetrics data that Impactstory puts in one spot. 

Here’s what we did

The most powerful takeaway from all this was that users who have Twitter get more out of Impactstory and like it more. And that makes sense…scholars with Twitter are more likely to be into this whole social media thing, and (in our experience talking with lots of researchers) more ready to believe altmetrics could be a useful tool.

So, we’ll redouble our focus on these users.

The way we’re doing that concretely right away is by changing the signup wizard to start with a “signup with Twitter” button. That’s a big deal because it means you’ll need a Twitter account to sign up, and therefore excludes some potential users. That’s a bummer.

But it’s excluding users who, statistically, are among the least likely to love the app. And it’s making it easier to sign up for the users that are loving Impactstory the most, and most keen to recommend us. That means better word of mouth, a better viral coefficient, and a chance to test a promising hypothesis for achieving product-market fit.

We’re also going to be looking at adding more Twitter-specific features like analysing users’ tweeted content and follower lists. More on that later.

To take advantage of our open-access predictor, we’ll be working hard to reach out to the open access community…we’re already having great informal talks with folks at SPARC and with the OA Button, and are reaching out in other ways as well. More on that later, too.

We’re excited about this approach to user-driven development. It’s something we’ve always valued, but often had a tough time implementing because it has seemed a bit daunting. And honestly, it is a bit daunting. It took a ton of time, and it takes a surprising amount of mental energy to be open-minded in a way that makes the feedback actionable. But overall we’re really pleased with the process, and we’re going to be doing it more, along with these kinds of blog posts to improve the transparency of our decision-making. Looking forward to hearing your thoughts!