Fixing Science

May 13, 2018 by Alex Freeman

What is wrong in science and what Octopus can do to help

Those working in science today recognise a whole host of problems. Octopus aims to tackle many of these through changing the way that science is published.

I trained as a biologist, but then spent my career in the media. When I came back to academia in October 2016 I was really shocked to see how much the incentive structure in science was driving researchers to become, essentially, 'journalists'. Scientists and their work are judged almost exclusively on how many papers they can get published in journals and how many people read them, just like journalists. But this isn't the way to reward good scientific work. In fact, it often rewards exactly the opposite. So I see a complete overhaul of the scientific publishing system as the only way to start rewarding good scientific practice, and to encourage a collaborative, efficient, and therefore more successful and meritocratic global endeavour.

Below I list some of the problems I see in science, and how I think Octopus might help alleviate them:

‘Publish or perish’ mentality driving authorship

Because scientific papers are seen as the main output of scientific research at the moment, they naturally become the main thing that researchers focus their efforts on. Recruitment, promotion and funding panels tend to look for both quantity and 'quality' of publications - but the easiest way to judge the quality of a paper without having to read and understand it is to look at the journal in which it is published. Journals come with a hierarchy of reputation, but are papers published in the journals perceived as most prestigious actually representative of 'the best science'? This is certainly an issue of great debate - traditional commercial journals are after (paying) readers and so are keen to publish 'newsworthy' papers.

Octopus aims to tackle this issue by providing much more transparent summary measures of a researcher's output. Firstly, by encouraging researchers to rate the publications of others - giving them a 1-5 score on predefined qualities (carefully chosen to reflect the things that qualify as 'good science') - it will be possible for anyone to look at an individual researcher's page and quickly see how their peers rate different aspects of their work. Secondly, by encouraging reviewing and rating of others' work - and by allowing those reviews also to be rated - researchers who give constructive, well-appreciated reviews will also be easy to spot. This should also allow employers to reward collaborative work. Of course the system will have to be well-designed to avoid people 'gaming' it, but I believe any moves towards creating more useful and fine-grained metrics will be appreciated by researchers and employers alike, and being able to define the aspects of research that we as a scientific community deem important will help drive scientific research in the directions we want it to take.
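
To make the idea concrete, here is a minimal sketch of what such a ratings model could look like. The criterion names and the data structures below are my own illustrative assumptions, not Octopus's actual design:

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical criteria - the real list would be chosen by the community.
CRITERIA = ("well reported", "scientifically sound", "useful to the field")

@dataclass
class Publication:
    author: str
    title: str
    # each predefined criterion maps to a list of 1-5 peer ratings
    ratings: dict = field(default_factory=lambda: {c: [] for c in CRITERIA})

    def rate(self, criterion: str, score: int) -> None:
        if criterion not in self.ratings or not 1 <= score <= 5:
            raise ValueError("unknown criterion or score out of range")
        self.ratings[criterion].append(score)

def researcher_summary(pubs):
    """Average each criterion across all of a researcher's publications."""
    return {
        c: round(mean(s for p in pubs for s in p.ratings[c]), 2)
        for c in CRITERIA
        if any(p.ratings[c] for p in pubs)
    }

pub = Publication("A. Researcher", "Growth-rate dataset")
pub.rate("well reported", 4)
pub.rate("scientifically sound", 5)
print(researcher_summary([pub]))  # per-criterion averages for this researcher
```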

Low sample sizes, p-hacking and other practices that bias results towards a ‘significant’ result

Given the drive for researchers to get papers published by journals, and the drive for journals to gain paying readers, there is an inevitable pressure for researchers to produce papers with 'statistically significant' results. This has led to a number of practices which make such results more likely even when they do not reflect the underlying data. This often goes undetected (and is sometimes undetectable) in the completed papers submitted to journals, as it involves manipulations of the data that are not described (sometimes due to space constraints imposed by journals).
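
To illustrate why this matters, here is a small simulation (my own example, not anything from Octopus) of one common such practice: measuring many outcomes and reporting whichever one crosses the p < 0.05 threshold. Even with no real effect anywhere, most simulated 'studies' can report a significant result:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_per_group = 2000, 20, 15

studies_with_hit = 0
for _ in range(n_studies):
    for _ in range(n_outcomes):
        a = rng.normal(size=n_per_group)   # both groups drawn from the
        b = rng.normal(size=n_per_group)   # SAME distribution: no true effect
        if ttest_ind(a, b).pvalue < 0.05:  # uncorrected test on one outcome
            studies_with_hit += 1
            break

print(f"studies reporting at least one p < 0.05: "
      f"{studies_with_hit / n_studies:.0%}")   # roughly 1 - 0.95**20, i.e. ~64%
```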

With Octopus, there is no drive for readership and no advantage to manipulating data to get a significant result - all contributions are equally welcome, and publishing the raw, unmanipulated data is very much encouraged. There is no limitation on the space or length of contributions. Contributions will be reviewed and rated on pre-defined criteria which will not include whether the results are 'statistically significant' or not - what matters is how carefully and thoughtfully they have been collected and recorded (and, later, interpreted). This should remove the unhelpful pressures to make all results 'positive' and 'significant'.

Over-enthusiastic interpretation of results

As above, researchers are currently under pressure to interpret their results in ways that make a paper sound more exciting and novel, and hence more likely to bring in readers.

Again, by removing this pressure and instead setting our own criteria for judging an interpretation (such as 'supported fully by all data available'), researchers can be freed to interpret their results honestly. In fact, by removing the need to publish only complete 'papers', researchers will be able to publish small datasets (which would not themselves be sensibly interpretable) and will instead be encouraged to publish interpretations or discussions which take into account all published data in a field. Publishing a discussion which doesn't link to all published data will likely earn poor ratings from peers. This should make a huge difference to the sorts of interpretations and discussions had in science: no longer will there be a constant stream of seemingly contradictory interpretations every time a paper is published, encouraged by the need to be 'novel' in order to get published.

Publication bias

Related to the above problems, 'negative' results (those which do not reach statistical significance, those which are not 'novel' and do not lead to new interpretations, or those which do not support an author's stated favourite hypothesis) are currently difficult to get published in traditional journals. This means that huge amounts of data can go unpublished - and that missing data may be extremely important: if published, it might change the overall conclusions and perceptions within a field. There have been a number of innovations to encourage the publishing of such results, but the most 'prestigious' (and hence influential) journals still rarely publish them.

Because Octopus asks users to rate published data as an individual item (rather than as part of a paper), and on pre-defined measures, the quality of the data collection and recording itself will be separated from any consideration of what that data might show. This removes the conflation of 'quality of data' with 'what the data might mean', and with it the main driver of publication bias.

A ‘replication crisis’

Because traditional journal publishing puts so much emphasis on novelty of conclusions (in order to gain readers), researchers are very much discouraged from repeating the work of others ('replication'), when in fact this is a necessary part of scientific research for a number of reasons. The fact that many published and well-accepted studies turn out not only to be very difficult to replicate (since not enough detail is given about the methods used throughout the process) but also to give very different results when they are replicated has been dubbed the 'replication crisis'. As above, initiatives have been started to try to encourage replication studies, but these are not mainstream.

Octopus aims to avoid these problems and encourage replication, firstly by making the 'methods' or 'protocol' a publishable - and rateable and reviewable - piece of work in its own right. Given that methods will be rated on pre-defined criteria (such as how well they lend themselves to replication), this should encourage and reward careful and accurate recording of protocols. Secondly, because publication of the method or protocol in Octopus is separated from the publication of any data, multiple datasets collected to the same protocol are each individually rewarded, encouraging such replication efforts.

Unnecessary repetition of research

The opposite of the lack of proper replication is the current unnecessary repetition of work that results from work NOT being published (eg. negative results). If work is not published, others cannot know that it has been done, and therefore end up doing it again (and again, if it is still 'unsuccessful' and so again not published).

Octopus will avoid this by making it advantageous for researchers to publish all their work - even 'just' hypotheses.

Poor methodological/statistical expertise within groups (eg failure to deal with confounders) and lack of opportunities for researchers not in large groups.

A traditional scientific 'paper' in many fields requires expertise in a number of different areas. For instance, statistical analysis of data is a specialised skill, and many research groups in the experimental sciences struggle to find or nurture such expertise within their teams. The same can be said for experimental protocol development, data collection, or any of the other specialisms required in experimental science. However, it is not possible to publish a traditional paper without bringing all such skills together in one team. This has two effects. One is that many papers are published in which one or more sections have been poorly conducted - threatening the validity of the entire paper. The other is that researchers working alone (perhaps in a small institution, or in a country without an established presence in a particular scientific field) can find it very difficult indeed to publish any work single-handedly, and hence are excluded from the scientific establishment.

Because Octopus breaks up the concept of a traditional paper, it allows individual researchers to publish work entirely within their own skillset (eg. to collect data without having to analyse and interpret it, or to analyse data across any range of disciplines, bringing expertise gained in one type of analysis to another field). This will form a much more collaborative network: researchers across the globe can effectively 'collaborate' on research, handing the baton from one to another down the chain of scientific research, from hypothesis to interpretation and finally to real-world applications.
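
As a rough sketch, that chain might be modelled as linked publication units, like this. The stage names follow the examples in this post, but the exact set of stages and the data structure are my own assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ordering of stages, following the post's examples.
STAGES = ["problem", "hypothesis", "method", "data",
          "analysis", "interpretation", "application"]

@dataclass
class Unit:
    stage: str
    authors: list
    parent: Optional["Unit"] = None                # the work this unit builds on
    children: list = field(default_factory=list)   # work that builds on this unit

    def link(self, child: "Unit") -> "Unit":
        # A new unit must sit at the same or a later stage than the one it
        # builds on, so every contribution traces back to a top-level problem.
        assert STAGES.index(child.stage) >= STAGES.index(self.stage)
        child.parent = self
        self.children.append(child)
        return child

problem = Unit("problem", ["group A"])
hypothesis = problem.link(Unit("hypothesis", ["group A"]))
protocol = hypothesis.link(Unit("method", ["group B"]))   # another group's protocol
dataset_1 = protocol.link(Unit("data", ["group C"]))      # two independent datasets
dataset_2 = protocol.link(Unit("data", ["group D"]))      # collected to one protocol
```

Note how the two datasets hang off the same protocol: each is individually credited, which is exactly what makes replication rewarding rather than redundant.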

Data hoarding

Currently many research groups hold onto their raw data - partly because it allows them to gain as many publications out of it as possible (at the expense of rival groups), and partly because it is difficult to publish and share raw data. This means that analyses cannot be checked or re-run, and meta-analyses are near impossible.

Octopus does not aim to be a data repository itself. It does, though, aim to reward those who publish, alongside their data summary, a link to their full raw data (in another repository). Making this something that is explicitly rewarded in the ratings system will make it overtly encouraged. As mentioned above, making it unnecessary to publish an analysis and discussion alongside every publication of data - while strongly discouraging discussions which do not take into account all previously published datasets and analyses - will disrupt the practice of publishing conclusions based only on one dataset (and ignoring the work of others).

Unhelpful conflicts and competition between research groups/individuals

Science is an extremely competitive arena, with large amounts of funding at stake. All the factors already mentioned add up to a fairly toxic and competitive environment with little reward for collaboration.

Octopus hopes to break that system up and overtly encourage collaborative working. As already mentioned, those who write constructive and highly-appreciated reviews will see that reflected in their own metrics. Funding will no longer need to be given in such big chunks in order to sustain large groups working together to complete all stages of a traditional 'paper'. A single researcher can work in collaboration with the rest of the scientific community, contributing their own skills to the work of others (eg. analysing the data of any other group). Groups can specialise in data collection, protocol development, or any other stage of the scientific process without needing to complete all the others as well to gain rewards ('publication').

Poor targeting of funding

Currently, applying for grants is a long and wasteful process, involving the submission of large amounts of proposal work to a committee of peers for review, and funding comes in relatively large pots (at least in part to make the process cost-effective). Funders also, like employers, have to rely on 'publication records' to assess the quality of researchers and their teams.

Since hypotheses, methods and protocols, and data are all independently published in Octopus, and peer-reviewed there, it would no longer be necessary for funders to repeat all these steps themselves. Instead, they could easily identify well-rated protocols and offer funding to several groups to collect data according to them. This will make funding not only more efficient but also more effective (encouraging replication across different laboratories, for example) and more meritocratic. Funders will also be able to use the more subtle and detailed metrics available in Octopus to assess the work of applicants.

Slowness of research progress (at all stages: funding allocation, publication etc)

Scientific research is painfully slow. This is not just because of the necessarily slow and careful nature of the work itself - it is because aspects such as funding (see above) and publication of results are incredibly inefficient processes. Currently, a paper takes months if not years to prepare, because it has to contain multiple sections (introduction, methods, results, discussion etc) and carefully formatted references. When the authors submit it to a journal, it is sent for review by several peers - a process taking many weeks - then sent back to the authors for alterations, with the process repeated until a journal editor is satisfied; the paper is then typeset, reviewed by the authors again, and finally published (or it is rejected, in which case the authors start again with another journal). This usually takes months and often years. Meanwhile, patients could be dying!

In Octopus, because the unit of publication is a small section of the scientific process rather than an entire paper from start to finish, preparing work for publication could be very quick indeed. It is likely to involve fewer authors. It does not require the (repetitive) writing of sections of a paper - linking to the top 'Problem' in the chain will automatically give readers access to all necessary introductory text. References will be inline digital links and so will not need tedious formatting. Once all authors have agreed on the text and each clicked 'publish', the work will be instantly and globally available for others to read and review. Any problems discovered in the work will be clearly visible there for other readers. Anyone can then take that work on to another stage, or collaborate by making constructive reviews (which may lead to the original authors publishing a new version, acknowledging the helpful input of others). This will speed up the whole process of scientific research a great deal.

Lack of clarity over responsibilities of authors on papers (required to identify good or bad practice)

Because most traditional papers require a lot of collaboration between researchers with different skills (in many cases one author will know little about the work done by another author on the same paper), the list of authors does not reflect who did what on the paper. If misconduct is proven on a paper, all authors could be held responsible. Conversely, a junior researcher who has done the majority of the work on an outstanding paper may not get the credit they deserve, being named amongst a number of other authors. Traditional papers have recently begun to include a section on 'author contributions', but these are not universal.

Because Octopus breaks down publishing into smaller units of work, authors will be much more clearly associated with the work they actually contributed.

Repeat publication of same work in different journals (self-plagiarism) & plagiarism

The drive to accumulate publications creates pressure on researchers to try to publish the same work in different journals - something which is difficult to spot, given that traditional publications are distributed across so many different journals and usually published as PDFs, which are not easily machine-read.

'Digital-first' platforms like Octopus allow easy machine-reading to compare text and spot plagiarism or self-plagiarism. Additionally, if ALL scientific publishing is done in Octopus, it will be even easier for the platform itself to have built-in plagiarism detection.
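
As a toy illustration of why machine-readable text makes such checks cheap (this is a textbook technique, not a planned Octopus feature), here is a similarity measure based on overlapping word sequences ('shingles'):

```python
import re

def shingles(text: str, n: int = 3) -> set:
    """All overlapping n-word sequences in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str) -> float:
    """Overlap between two shingle sets: 0 = no shared phrases, 1 = identical."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

text_1 = "We measured the growth rate of the cultures at 37 degrees."
text_2 = "We measured the growth rate of the cultures at 30 degrees."
# A high score flags a pair of publications for human review.
print(f"similarity: {jaccard(text_1, text_2):.2f}")
```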

Lack of accessibility of research in expensive journals

Traditional journals are commercial entities. Subscription to them is expensive, and hence, outside of top institutions, most academics do not have access to much of the research that is published. This lack of access to knowledge is obviously discriminatory and harmful to a collaborative and inclusive global scientific research environment. Additionally, traditional publications are now almost exclusively in English. Researchers who do not speak good English are therefore much less likely to be able to read, let alone publish in, the traditional scientific journals. This again leads to a waste of talent and makes science effectively closed to many across the globe.

Octopus would be freely available for all to read, and for all registered users to publish in. This does raise two important issues: how can it be a sustainable infrastructure (can we design it to have minimal ongoing costs, and can the scientific community pull together to be able to afford those?), and how can registered users be authenticated in order to avoid anonymous logins (which could lead to spam or abusive comments) whilst still being open to any researcher to publish in, regardless of their institution (which may not be an academic institution)? These are issues that it would be great to discuss! In terms of translation, there are now very impressive automatic translation tools, and these are only likely to improve in quality. Although researchers can already use them to read digital content, I plan for Octopus to have built-in translation for those writing in a non-English language as well. This will make it genuinely language-agnostic.

Predatory journals

Because there are now so many researchers keen to publish their work (in order to gain credit for it), there is a thriving market in 'predatory journals' - essentially 'vanity publishing' for science. Authors pay to publish in these journals (this in itself is accepted practice, as many journals aim to be free for readers by instead charging authors). However, work in these journals is not effectively peer-reviewed; they publish anything. The journals have very 'normal-sounding' names, making it difficult for potential employers to recognise, in a list of publications, which appeared in peer-reviewed journals and which in 'predatory' ones.

Octopus, by replacing all journals (in experimental science), will remove the market for predatory journals. Instead, although it will be possible for anyone to 'publish' in it, the quality of the work will be immediately obvious from the reviews and ratings.

Conflicts of interest

It's important that any conflicts of interest in science are transparently declared. Much research is commercially paid-for, and that should be declared. Researchers may have personal financial interests, or personal relationships that need to be declared. Although many traditional journals ask authors to declare any relevant conflicts of interest, researchers may not know what counts as 'relevant' and different journals have different definitions and standards.

In Octopus, there would be a standard form for declaring all potential conflicts of interest, completed once and then visible to everyone at any time. This would make it easier for researchers to declare potential conflicts transparently.

Lack of use of research results in actual practice (due to difficulty in finding relevant research by those who might use it, lack of incentives for academics to encourage practical usage etc.)

Such large numbers of papers are now published daily that it is impossible for practitioners such as doctors, politicians or teachers to keep up to date on the latest findings. This requires 'evidence synthesis', which can be a difficult process as it involves bringing together research from across hundreds of journals and drawing conclusions from those different papers. Non-academics often don't even have full access to the original research papers (since access requires paid subscriptions). And there are currently few incentives for academics to ensure that their findings lead to changes in practice.

Because the final stage in Octopus' chain of the scientific process is 'translation to practice', this stage will be valued equally with every other stage of the scientific process. The chain structure of Octopus will also mean that all research relating to a particular 'problem' will be linked to it, making 'evidence synthesis' much, much easier - it will all be there in that one chain of links. And everything published in Octopus will be free for everyone to read.

The problems with Octopus

Although Octopus could have so many advantages, there are two issues that need solving. The biggest is how to ensure that users cannot create anonymous logins. I would love to use the ORCID system, developed to give academics a unique ID, but currently anyone can create an ORCID - it only requires an email address, meaning that any number of ORCIDs could be created by one person (or even a bot). How could we authenticate users of Octopus?

Another issue is costs. I strongly feel that we can all pull together to create Octopus with minimal running costs. By creating a self-regulating community with a distributed underlying database (which academic establishments could help host), many costs could be minimised - but there will still be some. I don't want anyone to have to pay to be a member of Octopus or to read it - how else might its running costs be covered?

The other big issue I hear raised is how people will be encouraged to use Octopus rather than carry on publishing in journals, given the incentive system currently in place. I have a lot of thoughts on that, which I will post in my next blog!

Contributing to Octopus: We are just starting out. Please feel free to browse and contribute on GitHub!