The problem affects the names of genes, according to a paper published on Tuesday by Mark Ziemann, Yotam Eren and Assam El-Osta.
Gene names, according to the paper, are converted to dates and floating-point numbers when entered into an Excel spreadsheet. The examples cited were gene symbols such as SEPT2 (Septin 2) and MARCH1 (Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase), which are converted by default to "2-Sep" and "1-Mar", respectively.
This problem was initially made public in 2004.
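The conversion described above can be anticipated before a gene list ever reaches a spreadsheet. The following is a minimal Python sketch, not the authors' method: the function name `is_date_like` and the list of month-like prefixes are assumptions made for illustration, and the list is not claimed to match everything Excel recognises as a date.

```python
import re

# Assumed month-like prefixes; illustrative only, not an exhaustive
# list of the text that Excel treats as a date.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month-like prefix, an optional hyphen, then a one- or two-digit number.
DATE_LIKE = re.compile(
    r"^(?:%s)-?(\d{1,2})$" % "|".join(MONTH_PREFIXES),
    re.IGNORECASE,
)


def is_date_like(symbol):
    """Return True if a gene symbol resembles a month followed by a day."""
    match = DATE_LIKE.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 31


if __name__ == "__main__":
    for name in ["SEPT2", "MARCH1", "TP53", "BRCA1"]:
        print(name, is_date_like(name))
```

Symbols flagged this way are candidates for extra care, for example by importing the column as text rather than relying on the default cell format.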
The authors said that they had since found gene names converted to dates by Excel in the supplementary data of recently published papers; for example, "SEPT2" had been converted to "2006/09/02".
They used scripts to examine supplementary files in Excel format from papers published in 18 journals between 2005 and 2015.
"In total, we screened 35,175 supplementary Excel files, finding 7467 gene lists attached to 3597 published papers," the three authors wrote.
"We downloaded and opened each file with putative gene name errors. Ten false-positive cases were identified. We confirmed gene name errors in 987 supplementary files from 704 published articles. Of the selected journals, the proportion of published articles with Excel files containing gene lists that are affected by gene name errors is 19.6%."
They said errors of this kind could be spotted by copying a column of gene names, pasting it into a new sheet and then sorting the column; any gene symbols that had been converted to dates would then appear as numbers at the top of the column.
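The same sorting check can be imitated outside Excel. The Python sketch below is an illustration under stated assumptions, not the authors' script: it assumes the column has already been read into a list with cell types preserved (text as `str`, converted cells as dates or numbers), and the helper names `sort_like_spreadsheet` and `find_converted_cells` are hypothetical.

```python
from datetime import date


def sort_like_spreadsheet(column):
    """Sort so that non-text cells (dates, numbers) come before text."""
    def key(value):
        if isinstance(value, str):
            return (1, value.upper())
        # Non-text cells are grouped first; str() only gives a stable order.
        return (0, str(value))
    return sorted(column, key=key)


def find_converted_cells(column):
    """Return (row_index, value) pairs for cells that are no longer text."""
    return [(i, v) for i, v in enumerate(column) if not isinstance(v, str)]


if __name__ == "__main__":
    # Hypothetical column: two symbols survived as text, two were converted.
    column = ["TP53", date(2006, 9, 2), "BRCA1", 2.31e13]
    print(sort_like_spreadsheet(column))
    print(find_converted_cells(column))
```

Any cell reported by `find_converted_cells` is one where the original gene symbol has been lost and needs to be recovered from the source data.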
