Big Data: an exploration of research, technologies and application cases
Big Data has become a worldwide trend. Although it still lacks a consensual scientific or academic definition, its market and its associated research areas grow every day. This paper reports a systematic review of the literature on Big Data, covering the state of the art of the associated techniques and technologies, which include data capture, processing, analysis, and visualization. It explores the characteristics, strengths, weaknesses, and opportunities of some Big Data applications and models, mainly those that support modeling, analysis, and data mining. Likewise, some future trends for the development of Big Data are introduced, outlining the basic aspects, scope, and importance of each one. The exploration methodology applies two strategies: the first is a scientometric analysis, and the second is a categorization of documents through a web tool that supports the literature review process. As a result, a summary and conclusions about the subject are produced, and possible scenarios for research work in the field are identified.