[Science] Sci-Hub’s cache of pirated papers is so big, subscription journals are doomed, data analyst suggests
This is an interesting piece of news from Science, especially if you are an avid researcher who looks for alternative ways to find such materials. Below is a ScienceInsider chat with biodata scientist Daniel Himmelstein of the University of Pennsylvania. If you would like to read the full article, you can proceed to this site.
Q: What made you want to look at the size of Sci-Hub’s coverage?
A: It all started when Sci-Hub tweeted the list of all the articles that they had stored in their repositories on March 19. I thought: “Wow, we can learn so much about their operations and coverage that we couldn’t before.” Most people knew that Sci-Hub provided access to some of the scholarly literature, but the question was how much.
Q: How did you approach this calculation?
A: The main step was figuring out how many scholarly articles existed. For that, we used data from Crossref, which has a database of journal identifiers or DOIs [digital object identifiers]. It’s not the only one, but it’s by far the most common one for scholarly publishing. After making some exclusions, we compiled a list of 81.6 million articles. This step was important because it gave us the denominator for the equation. Previous people who’ve looked at Sci-Hub coverage didn’t really get this step right—to see what percent of the literature Sci-Hub has, you need to know the total amount.
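The calculation described in this answer comes down to a ratio: the articles Sci-Hub holds, counted against the full catalog of scholarly articles. Below is a minimal Python sketch of that idea, not the study's actual code. The function name `coverage` and the toy DOIs are hypothetical; the only domain fact it relies on is that DOIs are case-insensitive, so both lists are lowercased before comparison.

```python
def coverage(catalog_dois, library_dois):
    """Return the fraction of catalog DOIs that also appear in the library.

    catalog_dois: iterable of DOIs defining the denominator (all articles).
    library_dois: iterable of DOIs held by the repository being measured.
    """
    # DOIs are case-insensitive, so normalize before comparing.
    catalog = {doi.strip().lower() for doi in catalog_dois}
    library = {doi.strip().lower() for doi in library_dois}

    if not catalog:
        raise ValueError("catalog is empty; coverage is undefined")

    # Only count holdings that are part of the catalog (the denominator).
    held = catalog & library
    return len(held) / len(catalog)


# Toy data (hypothetical DOIs, for illustration only).
catalog = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"]
library = ["10.1000/A", "10.1000/b", "10.9999/not-in-catalog"]

print(f"Coverage: {coverage(catalog, library):.0%}")
```

The entry that is not in the catalog is ignored, which is why getting the denominator right matters: the result depends on which articles are counted as "all scholarly articles" in the first place.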
Q: What were the main findings of your study?
A: The most simple result was that Sci-Hub contains 69% of all scholarly articles. We also found that the site preferentially covers articles from closed-access publishers and high-impact journals. [Editor’s Note: A breakdown can be found here.] I think it’s interesting that Elsevier and the American Chemical Society had some of the highest coverage and those are the publishers that have sued Sci-Hub. Maybe they realized that basically, their entire corpus was in Sci-Hub. There were a lot of journals where Sci-Hub has every single article.
Q: What about the other 31%?
A: Just because an article isn’t in Sci-Hub’s database, that doesn’t mean it can’t get it for you. We estimated that Sci-Hub was able to fulfil requests 99% of the time—that suggests the 31% of articles that aren’t covered by Sci-Hub are things that people really aren’t requesting.
Q: Did you look at how coverage varied by academic discipline?
A: Yes. There was some variation between fields, but I think it’s probably less than people have speculated in the past. The top was chemistry with 93% coverage, and at the low end was computer science at 76%. The results could be linked to publishing practices in those fields—we found closed-access journals had more coverage than open access.
Q: Sci-Hub has faced a number of legal challenges—do you think these will stop it?
A: In our paper, we have a graph plotting the history of Sci-Hub against Google Trends—each legal challenge resulted in a spike in Google searches [for the site], which suggests the challenges are basically generating free advertising for Sci-Hub. I think the suits are not going to stop Sci-Hub.
Q: How do you think Sci-Hub will evolve in future?
A: In the paper, we mentioned that there are technologies coming that would allow you to host files without any central point of failure, so going forward Sci-Hub, or a service like it, could still provide access to all these papers, but there wouldn’t be any domain or one person behind it. Right now, if the servers for Sci-Hub were found they could be seized and destroyed.
Q: Do you really foresee a time when librarians would endorse Sci-Hub over paying for journal access?
A: I don’t think librarians would ever endorse it, given the legal issues of instructing someone to do something illegal. But in a way they already do. There are many libraries nowadays that can’t provide 100% access to the scholarly literature. Globally, it’s a pretty small percentage of universities that offer full access.
Q: Is there anything publishers could do to stop new papers being added to Sci-Hub’s repository?
A: There are things they could do but they can really backfire terribly. The issue is the more protective the publishers are, the more difficult they make legitimate access, and that could drive people to use Sci-Hub.
Q: What do you hope the impact of this study will be?
A: I think the larger picture of this study is that this is the beginning of the end for subscription scholarly publishing. I think it is at this point inevitable that the subscription model is going to fail and more open models will be necessitated. One motivation for doing the study is that I want to bring that eventuality into reality more quickly.