Viral Science

The internet has been abuzz over the last few days about a preprint in PeerJ Preprints, “Gender bias in open source: Pull request acceptance of women vs. men.” Scott Alexander delivers a summary of popular-media responses to this, among other good discussion, none of which I’ll recapitulate here. I’m more interested in opening a window on how the peer review process works — one that you don’t really get from viral reporting on science. This is something I can shed at least a little light on, because coincidentally, two years ago I coauthored a paper, “More ties than we thought,” that went viral not once but twice — first the preprint on arXiv, then the peer-reviewed version in, also coincidentally, PeerJ.

Just to make things even more interesting, in the preprint version, our result was wrong.

Briefly, we revisited Thomas Fink and Yong Mao’s The 85 Ways to Tie a Tie in light of YouTube sartorialist Alex Krasny’s reverse-engineering of the “Eldredge” knot that the Merovingian wears in the movie The Matrix Reloaded. Fink and Mao use a formal grammar to describe their 85 possible tie knots, but they included an extra, implicit constraint: all their knots start from the wide end of the tie. We relaxed this constraint and came up with a formal grammar for the process of tying a tie knot from either end of the tie, which includes the Eldredge, the Trinity, and other modern knots. It also turns out to include thousands and thousands of really derpy-looking knots. This in itself is interesting, because it suggests that there’s more math to explore about what makes a knot look nifty, and about processes in general. Many processes produce results, but not every sequence of steps produces an end you actually want, even when the process is completely deterministic. The “generating functions” parts of our work are about exploring that process space and its outcomes, which has all kinds of interesting implications for machining, robotics, textiles (we’ve got some work in progress on braiding and knitting), and other areas of materials science. (Though mostly it is because Mikael Vejdemo-Johansson noticed there was something we didn’t have a mathematical understanding of, and now we have corrected that oversight.)
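
If you want a feel for what “counting the outcomes of a process” looks like in practice, here is a minimal toy sketch in Python. It is not the grammar from our paper: it just counts sequences of winding moves drawn from a three-letter alphabet under a made-up “never repeat the previous move” constraint, once with the closed form you’d read off a simple generating function and once by brute-force enumeration, and checks that the two agree. The move names, the constraint, and every identifier below are illustrative assumptions, not anything from the published analysis.

    # Toy example: sequences over {L, R, C} with no move repeated twice in a row.
    # The counts satisfy a_1 = 3 and a_n = 2 * a_(n-1), so a_n = 3 * 2^(n-1),
    # which is what you would read off the generating function 3x / (1 - 2x).

    def count_sequences(n):
        """Count admissible move sequences of length n via the closed form."""
        if n == 0:
            return 1  # the empty sequence
        return 3 * 2 ** (n - 1)

    def enumerate_sequences(n, moves=("L", "R", "C")):
        """Enumerate the same sequences explicitly, for cross-checking."""
        if n == 0:
            yield ""
            return
        for prefix in enumerate_sequences(n - 1, moves):
            for move in moves:
                if not prefix or prefix[-1] != move:
                    yield prefix + move

    if __name__ == "__main__":
        for n in range(1, 8):
            assert count_sequences(n) == sum(1 for _ in enumerate_sequences(n))
            print(n, count_sequences(n))

The real knot grammar has more moves and more constraints than this, but the same “derive a formula, then cross-check it against brute-force enumeration” habit is the kind of sanity check that keeps a counting argument honest.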

And in the preprint, we had a boneheaded off-by-one error.

68 venues reported on the preprint, not counting NPR affiliates separately. Nobody caught the error. Frankly, most of the preprint coverage was of the “Ha ha, look at the goofy things that university boffins get paid to study” variety, which means they either didn’t read closely enough to notice that two of the authors have no university affiliation or didn’t think it worth pointing out. Yes, our motivation was trivial, indeed silly, but that jibes with a paper where half the authors worked on it in their Copious Free Time. It was “slow news day” coverage, but hey, there’s no such thing as bad publicity, right?

Anyway, we pushed on, because clearly we’d struck some kind of nerve and we figured it was worth publishing “for real” in academic terms, so we started shopping it around. Or, to be more specific, Mikael started shopping it around — credit where credit is due.

It got rejected a lot. After about a year of rejections, Mikael heard that PeerJ was starting up a computer science journal, we submitted it, and one of the peer reviewers found our off-by-one error. After correcting it, we had about a tenth as many ties as we’d claimed in the preprint, and we updated our in-progress draft accordingly. There was some additional back-and-forth, in large part because we lucked into one of the intellectual powerhouses of generating functions as a reviewer and got feedback that helped us improve our analysis significantly. As an author I was really pleased with PeerJ’s reviewing process; it’s far closer to the “shepherding” that some academic workshops offer accepted papers that could still use some work than to the “three anonymous reviews and not much of a feedback loop” process typical of top-tier conferences. The final revision was accepted for the first issue of PeerJ CS last May, and the paper went viral again.

56 venues covered the peer-reviewed paper. There is exactly one element in common between the sets of venues that covered it online before and after peer review: the KTH press office. The Swedish newspaper Dagens Nyheter (“Daily News”) was the only paper to report both figures: the preprint figure in its light-entertainment-and-silliness section, the corrected figure in a print-only Father’s Day factoid bubble. Other than that, exactly zero of the outlets that reported the incorrect figure also reported on the peer-reviewed paper or the correction. I can understand their reasoning, because for most of these venues, the paper is a curiosity, and a curiosity about a curiosity doesn’t really rise to the level of reporting in their view.

The thing is, though, nobody makes serious decisions that affect other people based on what they know about tie knots. People do make serious decisions that affect other people based on what they know about gender bias, or, more importantly, what they think they know. How Vice, CNN, and the other venues treating this preprint as if it were already scientific consensus report on it will affect some of those decisions. And after looking at the data from my own experience of viral science reporting, “are they going to follow up when the paper is actually complete?” is a question readers deserve an answer to from reporters who are treating this paper like a genie that’s been let out of its bottle.

Because that’s the thing: a preprint is a completed draft, but it is not a completed journal article until it passes peer review. A preprint is for eliciting early feedback from a peer community, like getting beta readers for your fanfic. arXiv, where we published our preprint, has been the physics and mathematics communities’ preprint archive of choice for years, and in recent years computer scientists have started using it too; PeerJ is now providing this service for the domains in which it publishes. The media are not drawing this distinction.

The gender bias article isn’t a completed journal article until Emerson Murphy-Hill and the rest of his team finish discussing the preprint with those PeerJ reviewers who decide to provide feedback, do whatever editing, additional data gathering/processing, and re-analysis they need to do in light of that feedback, resubmit the paper for peer review (to PeerJ, or to some other venue — remember, they’re using PeerJ’s preprint service, which is basically the equivalent of arXiv except operated by PeerJ, but this doesn’t necessarily mean they even intend to submit it to PeerJ, or that PeerJ will give it any special consideration if they do), handle any feedback from peer reviewers, and submit a camera-ready version that the editors sign off on. Then it’s a matter of waiting until the issue their paper will be in comes out. This process takes a couple of months, minimum, which is forever in Internet time, so it’s likely that we’ll see another round of viral coverage when the peer-reviewed paper eventually comes out. How will that coverage compare with the preprint coverage?

To be clear: From what I’ve seen on PeerJ Preprints, I think that Murphy-Hill and colleagues are engaging with their scientific peers in the spirit of genuine inquiry and communication. One of those peers happens to be none other than Mikael Vejdemo-Johansson, who is a stickler for both rigor and good presentation. The peer review process is behaving the way it is supposed to. I believe that a useful paper will come out of this process. Unfortunately, the popular press does not have a particularly good understanding of the peer review process, and is treating the data gathering and preliminary hypothesis testing that are the early steps of science as if they were already-established science. I am at a loss as to how to solve this problem in reporting.

6 Responses to Viral Science

  1. sniffnoy says:

    Note also that in some of the fields that use arXiv — in particular, mathematics — most of the things that go up on the arXiv (and that aren’t obviously crackpottery) are indeed right, so treating a preprint as probably correct, and not particularly distinguishing between preprints and published papers, is not necessarily so dangerous. But that’s a pretty bad habit to carry over to, say, sociology…

  2. Jason Hoyt says:

    Hi Meredith. Thanks for writing up your thoughts on preprints! One thing, the preprint was published in “PeerJ Preprints” rather than “PeerJ” (which is only peer-reviewed). If you have a moment, perhaps you could update the name/s where appropriate. Unfortunately, the names are so close that this could also be a problem with the media in determining the peer-review status without careful attention.

  3. Jason K. says:

    We make a judgment error when we think that mass media generally cares about the accuracy of what they report. They only care to the degree that being inaccurate could blow back on them. Joe six-pack doesn’t understand what peer review means, so reporting a non-peer reviewed study has little potential for harm to the reporter. ‘The media just needs to learn what preprinting is’ is just wishful thinking. At best, about 20% of the population will pay enough attention to even check. The other 80% will just hear ‘study said x’.

    There are two ways to stop this without legislative involvement, and both have to be done by the publisher. Either publishers of trial data have to find a way to hold distributors liable (and consistently do so), or you can’t allow unrestricted access to trial data. Expecting the media to stop of their own accord is like asking them to voluntarily take money out of their own pocket.
