A BIG DATA APPROACH IN MUTATION ANALYSIS AND PREDICTION

Authors

Silvana ALBERT

DOI:

https://doi.org/10.24193/subbi.2017.1.06

Keywords:

Big data, genetics, software, machine learning.

Abstract

Although technology has advanced rapidly in the last few years, many medical problems still lack an accessible solution. One of them arises in genetics: there is no solution for inspecting previously reported genetic mutations. To confirm a mutation, specialists must narrow it down based on their experience and, where available, the few documented precedent cases. This paper presents a solution for analyzing large amounts of historical genetic data in an efficient, fast and user-friendly way. As a proof of concept, it demonstrates the significant role that Big Data plays in aggregating genetic mutations, and it can serve as a starting point for similar solutions that aim to keep innovating in genetics. The effectiveness of our proposal is highlighted by comparing it with similar existing solutions.

Author Biography

Silvana ALBERT, Department of Computer Science, Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania. Email: albert.silvana@cs.ubbcluj.ro

Published

2017-06-01

How to Cite

ALBERT, S. (2017). A BIG DATA APPROACH IN MUTATION ANALYSIS AND PREDICTION. Studia Universitatis Babeș-Bolyai Informatica, 62(1), 75–89. https://doi.org/10.24193/subbi.2017.1.06

Section

Articles
