Good science depends on good peer review

I argue that the quality of both scientific research and how it is communicated is maintained and improved through a process analogous to Darwinian evolution. Maintaining the status quo and achieving scientific advances are potentially threatened by 'costs', including the effort required to maintain current or attain higher scientific quality, and the financial costs of conducting high-level research. I describe through analogy with Darwinian evolution how, without peer review and editorial oversight, scientific quality is expected to decrease on average in the long run. Several mechanisms are presented which, taken together, can contribute to limiting or counteracting this effect; some of the most promising are reviewer rewards, journal peerage, and education. I conclude that the scientific community needs to be proactive in promoting peer review and the reviewer commons, and ultimately scientific quality, because the erosion effect may be gradual and barely noticeable in the short term, but have substantial effects over the long term.

I would like to develop the idea that science itself is subject to evolution (Hull 2001), and that peer review is one of the key processes maintaining and advancing scientific quality (Riisgård et al. 2001, Ware 2008, Ioannidis 2014); otherwise put, peer review extends well beyond the improvements made to individual manuscripts. Together with collaborators I have previously argued that high quality, external reviewers are increasingly at a premium, creating an effect analogous to the "tragedy of the commons" in social evolution theory ("the tragedy of the reviewer commons"; Hochberg et al. 2009, Hochberg 2010). The tragedy of the reviewer commons is the over-solicitation of individual reviewers, resulting from the cumulative effect of journals, independently from one another, seeking to assess manuscripts. This over-solicitation may result in some reviewers reducing the number of reports they conduct (Petchey et al. 2014) and/or the quality of their reports. Over the short term the commons can be managed either through top-down regulation or rewards, or through increased cognizance in the scientific community of common interests between scientists-as-researchers and scientists-as-reviewers. To the extent that the quality and reliability of peer review is in danger, so too is the very mechanism that maintains and augments scientific quality.
Scientific quality can diminish on average at the community level due to at least two causes. First, in these times of fast and furious publishing, scientists as individuals put reduced effort into the planning, execution and reporting of their research. Not sufficiently questioning oneself or obtaining critiques from peers can result in a routine of substandard science relative to what a given scientist is capable of. Second, at the level of the scientific community, the tragedy of the reviewer commons results in insufficient community critique on scientific quality and ultimately leads to the publication of lower quality work. It could be argued that obtaining a larger number of reports per manuscript can compensate for insufficient individual reviews; however, this just accelerates the tragedy (Hochberg et al. 2009).
This work is licensed under a Creative Commons Attribution 3.0 License. iee 7 (2014) 78
Thus, we have a looming problem: over the long term, a gradual increase in substandard work driven by the 'costs' of high standards, coupled with a peer-review system that is designed to maintain such standards but is threatened by overuse. The overall effect can be viewed as a phenomenon whereby short-term changes in standards are slight and do not elicit a pervasive selective reaction, and as a consequence scientific standards ratchet down over the long term to a new norm (Hochberg 2004).
The objective of this opinion piece is to promote consciousness of the central role and importance of good peer review in maintaining scientific quality, and by extension, promoting scientific progress. It is tempting to view published work as 'validated' for its quality, as ensured by a professionally operated journal. Unfortunately, there exists no independent form of journal quality control in manuscript assessment. By first understanding why peer review is important, we can then consider the challenge of conserving and promoting it as an institution.

An Evolutionary Analogy
The basic ingredient of evolution is heritable trait differences among individuals. When these trait differences are linked to differential reproduction based on some aspect of the environment, and variation in traits is sufficient, then evolution by selection can occur. When selection is reduced and costs come to dominate, traits will tend to evolve to less costly levels, whereas if both selection and costs are negligible, then traits will drift. Thus, any application of evolutionary thought should identify the mechanism of heritability, the nature of traits, and selection on (and costs associated with) traits.
Does evolution apply to changes in the quality of science through time? In my opinion: yes. To see this, first consider that science is communicated in units; that is, as studies published in journals, in books, or on the Internet, through blogs, podcasts, white papers, etc. These units typically include the context of why the problem at hand is interesting, what part of the puzzle is missing, how the missing part was investigated, what the findings were, and the significance of the findings. In article presentation, we usually call these the Introduction, Methods, Results, and Discussion, respectively. Below, I present my argument in the context of publication in scientific journals.
For science to evolve via selection, we need traits, trait variability, heritability, and the differential 'fitness' of alternative traits. Scientific traits are complex and, despite some arbitrariness in their definition, distinguishable features can ultimately be identified and measured. Traits may include hypothesis definition, experimental design, scholarliness, experimental methods, statistical methods, etc. For illustration, assume that the trait variants are the methodological alternatives that can be used to test a given hypothesis. Differences in trait variants from article to article translate into distributions of quantitative values over one or more measures (axes), within the population of published articles. Trait heritability is the employment of specific methods by readers (the vehicle of heritability) of the article in their own (to be executed and published) work. Thus, the method is both the heritable unit (similar to genetic material in biological evolution) and the expressed trait. Author-driven changes in existing methods are possible, and this is analogous to genetic mutation and recombination. Finally, a methodology has higher relative fitness when it results in more papers using that methodology; this necessitates more activity per vehicle (present readers-future authors) and/or more vehicles reading and/or citing the methodology. Note importantly that higher relative fitness does not necessarily mean that the method is more 'valid' (i.e., scientifically sound) than less fit alternatives, nor that the method does not increase in frequency due to correlations with other traits (e.g., novelty) that are under direct selection.
A central goal of every scientist is to communicate her/his study to other interested scientists. But, who or what determines the 'validity' of a method and by logical extension, the study? Can some components be invalid and others valid? Are there many valid alternative methods? Given some level of subjectivity in determining validity, what keeps scientists 'scientific' at all? Does the selection process occur before the study is presented for communication (e.g., submitted to a journal), and/or during the communication, and/or once communicated?
So as not to get bogged down in the semantics and complexity of scientific traits, let's rather treat them, for the sake of illustration, as simple one-locus heritable traits. Although a gross oversimplification, let's further assume that the trait has one of two alternatives: valid method or faulty method. How does a researcher decide which to employ? She may use some combination of (i) copying (past experience, published study, or advice from other scientists), (ii) discovery (analysis and ingenuity), and (iii) arbitrary decisions. We quickly see that if copying and discovery are not the principal influences on method, then the method will drift at the population level of scientists. Although in itself not ensuring that the valid method will always be adopted, a population of scientists using (i) and/or (ii) could result in selection of one method or the other. If information is refined in the population in a stepwise process based on discussion and consensus, then evolution to validity can occur.

Figure 1. Flow diagram of scientific selection. In this highly simplified diagram, an author submits a manuscript to a journal where the editorial board is both responsible for selection of reviewers and synthesizing, emphasizing, and de-emphasizing their remarks in making their publication decision. The manuscript, if viewed by editors as being potentially acceptable for publication, goes through one or more iterative revisions. The finished article is part of a community of articles, which influence readers as scientists and future authors. Scientists are also evidently influenced by other components of their environment (colleagues, seminars) and individual learning, but only independent reviewers and oversight by editorial boards ultimately exert selection (via their direct influence on manuscript revision and publication success) resulting in the maintenance and improvement of science over time at the community level.
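The two-variant picture in this section (valid versus faulty method, spread by copying) can be illustrated with a small simulation. What follows is only a minimal sketch under assumptions that are not part of the original argument: it borrows a Wright-Fisher-style resampling scheme from population genetics, and the function `simulate` and its parameters (`n_articles`, `generations`, `faulty_weight`, `seed`) are hypothetical names chosen for illustration. A `faulty_weight` of 1 stands for pure drift, a value below 1 for selection against the faulty method (for example, through review), and a value above 1 for a cost advantage of the faulty method when nothing selects against it.

```python
import random


def simulate(n_articles=200, generations=300, p_valid=0.5,
             faulty_weight=1.0, seed=0):
    """Return the final fraction of articles that use the valid method.

    Hypothetical toy model: in each generation every new article copies
    its method from the previous generation of articles. Articles using
    the valid method are copied with weight 1, articles using the faulty
    method with weight `faulty_weight`.
    """
    rng = random.Random(seed)
    for _ in range(generations):
        w_valid = p_valid
        w_faulty = (1.0 - p_valid) * faulty_weight
        total = w_valid + w_faulty
        if total == 0:
            break  # nothing left to copy from
        # Probability that a new article inherits the valid method.
        prob_valid = w_valid / total
        # Finite population: sampling noise is what produces drift.
        n_valid = sum(rng.random() < prob_valid for _ in range(n_articles))
        p_valid = n_valid / n_articles
    return p_valid
```

Calling `simulate(faulty_weight=1.0)` with different seeds gives outcomes that wander anywhere between 0 and 1, which is the drift case. With `faulty_weight=0.8` the valid method is expected to spread through the whole population, and with `faulty_weight=1.25` it is expected to be lost. The numbers are arbitrary; the sketch is meant only to show the qualitative difference between drift, selection for validity, and erosion driven by costs.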

Authors, editors, and reviewers
As alluded to above, it is hardly surprising that in the absence of insight and differential costs, one method or another would eventually come to prevail from drift alone (iii). More interesting are the mechanisms promoting the emergence of the valid method, its protection from less valid alternatives, and subsequent evolution and diversification of refinements, valid alternatives, and how this may enable addressing more challenging scientific questions and lead to new discoveries. The basic stepping stones in this process are the maintenance of existing valid methods and their occasional improvement, and this will happen if among variants there is some combination of positive selection on valid traits and negative selection against invalid ones.
To see how evolution may occur, consider three sources of selection: (1) authors, (2) editors, and (3) reviewers. The roles played by these three are in some ways analogous to a legal system, with the scientist's work being considered for publication by a journal (the court), based on the advice of reviewers (the jury) and the decision of the editors (the judges). Authors, editors and reviewers interact in complex ways in influencing the dynamics of scientific quality. Figure 1 shows a simplified portrayal of these interactions.
Authors. Independent of the journal, can a scientist ensure scientific status quo or even improvement? A qualified 'yes' for status quo, because a careful scientist will tend to select methods that are scientifically sound, but this process relies on individual learning and/or the existence of knowledge benchmarks in terms of quality education and quality publications. Quality improvement, a process akin to mutation or recombination, is a 'yes' because ameliorations often come from scientists themselves. Nevertheless and importantly, both maintenance and improvement rely on the correct identification of 'quality' by the scientist, and the decision to actually adopt perceived quality, even if it is more time consuming, expensive, etc. than alternatives of lower standard. Thus, some scientists may either employ or modify existing approaches to achieve cost effectiveness, possibly not realizing or not weighing the importance of a reduced scientific standard (e.g., viewing the method as 'valid' because of precedent). Moreover, authors may indeed cite past work of the highest scientific quality, but their own work may be, as described above, subject to costs and constraints. In sum, the author may or may not cut back on scientific quality and may or may not generate the seeds of scientific improvement, but as explained below, ultimately these variants are selected by the 'courts' and their independent advisors, that is, editorial boards and reviewers, respectively.

Editors. The independent judge is typically the role of journal editors. Few journals, if any, have external checks on the objectivity of their publication decisions. Specifically, given that many papers may be assessed as 'scientifically valid', journals with strong selection (i.e., high rejection rates) may use many criteria, only one of which is quality beyond baseline 'validity'. Indeed, selection on scientific quality may be weak if, in publication decisions, journals heavily weight features such as perceived excitement or the communication quality of the manuscript. Rather, the main role of the editor is oversight: to put reviewers' remarks into perspective, both for revisions and the publication decision. As an aside, there is a subtle, and in my view, important effect associated with journal independence, and multiple, arguably subjective, criteria used in editorial publication decisions: a paper rejected from one journal may be published in another due to nothing more than differing opinions/criteria between journal editorial boards (and their reviewers). However, authors generally choose to submit their papers first to those journals that they view as the best for broadcasting their work (readership, reputation, impact). That readers tend to preferentially emulate and cite science published in the most esteemed journals creates a selective effect on scientific quality associated with those published articles, and in my view this means that (regardless of multiple criteria and subjective assessments) the top journals have the greatest responsibility for ensuring that the papers they do publish are of the highest scientific standards. This highlights the complex nature of how individual author-individual journal interactions percolate at the scientific community level and, in turn, impact science dynamics.

Reviewers. The final mechanism presented here that could affect the evolution of scientific quality is the 'jury', that is, reviewers who are independent of both the authors and the journal. Peer reviewers serve two main functions. First, they provide an opinion to editors on whether a manuscript should be accepted, returned to the authors for revision, or rejected. In and of itself, this does not substantially differ from the publication decision role of editors. Second, the reviewer provides a report for the authors. The reviewer's report is usually anonymous and comments on different aspects of the study. Thus, insofar as the reviewer is scientifically qualified (Thurner and Hanel 2011), her critiques will either not affect the scientific quality of the manuscript or, through correction or suggestion, will increase it.

Independent assessment and quality control
My argument is that an independent jury (reviewers who have no vested interest in the scientific question, nor conflicts of interest with authors) is essential to the selection process for maintaining or improving scientific standards, and without them, science would ultimately suffer for the reasons explained above: lower scientific standards, based on costs and constraints, become the norm for some scientists, and at the community level this both increases variation in scientific quality and pushes the average level downward. Other scientists may perceive that this new norm is acceptable and emulate it, further contributing to stagnation or erosion of scientific quality.
Having an independent jury does not obviate conflicts of interest (positive or negative) and issues with reviewer quality (Rothwell et al. 2000), both of which can result in scientifically sound work being incorrectly assessed, or authors not being able to meet reviewer demands for (unnecessary) revisions. Moreover, reviewer opinions that go beyond essential assessments of scientific quality may result in the rejection of highly novel work, or of results that call into question current theories or paradigms. This highlights the importance of quality control by journal editors. To achieve this, editorial boards need to frequently assess their protocol and oversight of the review process, with the objective of reducing both reviewer error and bias. All too often, publication decisions and revision requirements are based on subjective reviewer opinions of, for example, citations to previous work, themes treated in introduction or discussion sections, and perceived scientific novelty. Uncritically basing publication decisions and revision requirements on reviewers' reports results in reviewers effectively replacing editors as journal "gatekeepers" (Statzner and Resh 2010). Unfortunately, most journals have little or no means for reviewer quality control, because reviewers are in high demand, usually anonymous to authors and readers, and are rarely compensated for their time in conducting reviews.

Recommendations
Despite shortcomings and daunting challenges, peer review with editorial oversight appears to be the best means we currently have of achieving scientific quality control and evaluating improvement. What can be done to conserve, augment and improve this institution? This is a very active area of debate (e.g., http://www.nature.com/nature/peerreview/debate/), and several mechanisms promise to change review and oversight:
1. Open peer review, online science, science only journals. One or more of these can be found at PeerJ (peerj.com), Biology Direct (biologydirect.com), arXiv (arxiv.org), PLoS ONE (plosone.org), bioRxiv (biorxiv.org), and F1000Research (f1000research.com). In open peer review, reviewer reports and identities are made public. Pros of open peer review include inciting a degree of reviewer responsibility (replacing to some extent the quality control role of journal editors) and fostering interactions between authors and reviewers (Byrnes et al. 2014). Moreover, open peer review tends to discourage unjustly positive or negative views and 'summary' reports. A disadvantage is that reviewers do not know with certainty how their signed review will be perceived by authors and readers, and may decline reviewing for fear that differing opinions, misunderstandings or unintentional mistakes will become associated with their names and, especially for younger scientists, affect their careers. As a result, some reviewers who would have correctly identified important scientific shortcomings do not conduct reports. In online science, the reviewers may either volunteer or be solicited by the journal. Reviewed manuscripts can then be subsequently updated or, should the authors choose, published elsewhere. A possible issue with online science is that, despite simplifying the publication process and often being free of charge, there is, at least for some of these journals, little or no editorial oversight. Science only journals aim to make methodological validity the main (and sometimes only) criterion for acceptance. The advantage of this approach is that reviewers focus on a single important criterion and in so doing are more effective in baseline selection.
2. Rewards and peerage. Rewards (e.g., Fox and Petchey 2010, Lortie 2011, Verissimo and Roberts 2013) and peerage (Axios http://axiosreview.org/; Peerage of Science http://www.peerageofscience.org/) both act to foster the reviewer commons. Whereas rewards incite reviewer responsibility for the quality of their report, peerage simplifies manuscript assessment, reducing pressure on the reviewer commons (Hochberg et al. 2009). Allesina (2012) has recently modelled the publication process, comparing scenarios in which either authors decide on journals, journals decide to review or reject initial submissions, or journals bid on manuscripts (a possible feature of Peerage). Specifically, he found that journal bidding led to more rapid publication, more publications per author, publication in better journals, and improved matches between journals and manuscripts.
3. Education. The transmission of good practice starts with mentors, be they supervisors, course professors or experienced scientists. Mentors provide priceless overviews of how to maintain scientific standards. This highlights the central role of mentors in transmitting the culture of science, be it conducting research, writing scientific articles, or reviewing the work of others (Donaldson et al. 2010, Hochberg 2010).

Conclusion
I would suggest that there is no single "magic bullet" and that each of the three mechanisms may contribute to maintaining quality status quo and promoting progress in basic science, including methodological improvements and otherwise unforeseen scientific discoveries (Courchamp et al. in press). In my view, open peer review will take time to be accepted in the scientific community and will have difficulty overcoming some of its shortcomings. I believe rather that mixed models of anonymous and signed reports are the most productive way forward, so that all potential reviewers can contribute within their own comfort zones. Regardless of how external review and author replies and revisions are conducted, some degree of dedicated, independent editorial oversight will be required to ensure quality and impartiality, which will be a challenge for online science. Finally, the science only approach has the merit of focussing judgement on scientific quality, but to be effective it needs to control the proliferation of least publishable units and nth scientific demonstrations of the same effect.
In sum, independent peer review overseen by independent journal editors appears necessary to maintain and improve scientific quality in the long run. Science is in many respects a culture, and cultures evolve (Danchin et al. 2011) and potentially erode (Hochberg 2004). Discovery may occur serendipitously, but its contribution to scientific progress ultimately depends on selection for quality. We need to proactively seek solutions to maintain and foster what generations of scientists have constructed, both through their own work and their improvement of the science of others.