Thursday, 25 August 2016 08:37

Excel causing problems in gene names for a decade

By Sam Varghese

Microsoft Excel has been giving scientists headaches for more than a decade, but nothing has been done to fix the issues that have been identified.

The problem is seen with the names of genes, according to a paper published by Mark Ziemann, Yotam Eren and Assam El-Osta on Tuesday.

Gene names, according to the paper, get converted to dates and floating-point numbers when entered in an Excel spreadsheet. The examples cited were gene symbols such as SEPT2 (Septin 2) and MARCH1 (Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase), which get converted by default to "2-Sep" and "1-Mar", respectively.

This problem was initially made public in 2004.

The paper also said that Riken identifiers were being automatically converted to floating-point numbers, e.g., from accession "2310009E13" to "2.31E+13".
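A minimal Python sketch of how such at-risk identifiers might be flagged before they are typed or pasted into a spreadsheet. The month-prefix list, the two patterns and the function name `conversion_risk` are assumptions made for illustration; they are not taken from the paper and do not reproduce Excel's actual parsing rules.

```python
import re

# Month-like prefixes that a spreadsheet may read as a date when digits follow.
# Assumed list for illustration only; not taken from the paper.
MONTH_PREFIXES = ("JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN", "JUL",
                  "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")

# A month-like prefix followed by one or two digits, e.g. SEPT2 or MARCH1.
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, the letter E, then digits, e.g. 2310009E13, which reads as
# scientific notation.
SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def conversion_risk(identifier):
    """Return "date", "float" or None for a single identifier (hypothetical helper)."""
    text = identifier.strip()
    if DATE_LIKE.match(text):
        return "date"
    if SCI_LIKE.match(text):
        return "float"
    return None


if __name__ == "__main__":
    for name in ("SEPT2", "MARCH1", "2310009E13", "TP53"):
        print(name, conversion_risk(name))
```

Identifiers flagged this way could then be entered into cells formatted as text, or kept out of spreadsheets altogether.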

The authors said they had since found that Excel also converted gene names to full dates in the supplementary data of recently published papers, e.g., "SEPT2" to "2006/09/02".

They used scripts to examine supplementary files in Excel format that accompanied papers published in 18 journals from 2005 to 2015.

"In total, we screened 35,175 supplementary Excel files, finding 7467 gene lists attached to 3597 published papers," the three authors wrote.

"We downloaded and opened each file with putative gene name errors. Ten false-positive cases were identified. We confirmed gene name errors in 987 supplementary files from 704 published articles. Of the selected journals, the proportion of published articles with Excel files containing gene lists that are affected by gene name errors is 19.6%."
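The article does not show the authors' scripts. As a rough sketch of the general idea only, the following Python checks a column of cell values, already read into a list of strings, for the converted forms quoted above ("2-Sep", "2006/09/02", "2.31E+13"). The pattern set and the name `find_suspect_cells` are assumptions for illustration, and any hits would still need manual confirmation, as the authors did.

```python
import re

# Patterns for values that look like the product of an automatic conversion.
# Assumed for illustration; they cover only the forms quoted in the article.
CONVERTED_PATTERNS = {
    "day-month": re.compile(
        r"^\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
        re.IGNORECASE,
    ),
    "full date": re.compile(r"^\d{4}/\d{2}/\d{2}$"),
    "scientific": re.compile(r"^\d\.\d+E\+\d+$"),
}


def find_suspect_cells(column):
    """Return (row, value, label) for each cell matching a converted form."""
    suspects = []
    for row, value in enumerate(column, start=1):
        text = str(value).strip()
        for label, pattern in CONVERTED_PATTERNS.items():
            if pattern.match(text):
                suspects.append((row, text, label))
                break  # one label per cell is enough
    return suspects


if __name__ == "__main__":
    gene_column = ["TP53", "2-Sep", "BRCA1", "2006/09/02", "2.31E+13", "1-Mar"]
    for row, value, label in find_suspect_cells(gene_column):
        print(f"row {row}: {value!r} looks like a converted {label} value")
```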

They said errors of this kind could be spotted by copying a column of gene names, pasting it into a new sheet and then sorting the column, whereupon any gene symbols that had been converted to dates would appear as numbers at the top of the column.
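The sorting check can be imitated outside Excel. The sketch below assumes the column has been loaded so that converted cells arrive as `date` objects while intact gene symbols remain strings; the sort then places non-text values first, an approximation of an ascending spreadsheet sort rather than a reproduction of it. The function names are hypothetical.

```python
from datetime import date
from itertools import takewhile


def sort_like_spreadsheet(column):
    """Sort with non-text values (dates, numbers) ahead of text values."""
    def key(value):
        # False sorts before True, so non-text cells come first.
        return (isinstance(value, str), str(value))
    return sorted(column, key=key)


def leading_non_text(column):
    """Return the non-text values that rise to the top after sorting."""
    ordered = sort_like_spreadsheet(column)
    return list(takewhile(lambda v: not isinstance(v, str), ordered))


if __name__ == "__main__":
    # Two cells stand in for gene symbols that were converted to dates.
    column = ["TP53", date(2006, 9, 2), "BRCA1", date(2006, 3, 1)]
    print(sort_like_spreadsheet(column))
    print(leading_non_text(column))
```

Anything returned by `leading_non_text` is a candidate for a gene symbol that was silently converted.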


Sam Varghese


Sam Varghese has been writing for iTWire since 2006, a year after the site came into existence. For nearly a decade thereafter, he wrote mostly about free and open source software, based on his own use of this genre of software. Since May 2016, he has been writing across many areas of technology. He has been a journalist for nearly 40 years in India (Indian Express and Deccan Herald), the UAE (Khaleej Times) and Australia (Daily Commercial News (now defunct) and The Age). His personal blog is titled Irregular Expression.
