Optimizing the processing of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, splits a large data set into blocks distributed across several machines. Data compression reduces data size and the transfer time between disks and memory, but requires additional processing. Finding an optimal tradeoff is therefore a challenge: a high compression factor may relieve the input/output (I/O) load but overload the processor. The paper presents a system that selects the compression tool and tunes the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
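The tradeoff described in the abstract can be illustrated with a minimal sketch. It is not the system presented in the paper: it assumes the standard-library zlib codec as a stand-in for the compression tools, and a hypothetical cost model in which total time is the CPU time spent compressing plus the time to move the compressed bytes at an assumed I/O bandwidth (`io_bytes_per_sec`). The function names `profile_levels` and `best_level` are illustrative only.

```python
import time
import zlib


def profile_levels(data: bytes, levels=range(0, 10)):
    """Return (level, compressed_size, cpu_seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(compressed), elapsed))
    return results


def best_level(profile, io_bytes_per_sec: float):
    """Pick the level minimizing CPU time plus transfer time.

    Assumed cost model (not from the paper):
    cost = cpu_seconds + compressed_size / io_bytes_per_sec
    """
    return min(profile, key=lambda r: r[2] + r[1] / io_bytes_per_sec)[0]


if __name__ == "__main__":
    # Synthetic, highly repetitive log-like sample (hypothetical data).
    sample = b"timestamp=2021-04-23 level=INFO msg=block replicated\n" * 20000
    profile = profile_levels(sample)
    for level, size, cpu in profile:
        ratio = len(sample) / size
        print(f"level={level} ratio={ratio:.1f} cpu={cpu * 1e3:.2f} ms")
    # A slow link tends to favor stronger compression, a fast one weaker.
    print("slow link:", best_level(profile, 1e6))
    print("fast link:", best_level(profile, 1e10))
```

Under this model, a low assumed bandwidth makes transfer time dominate, so stronger compression tends to win, while a very high bandwidth shifts the choice toward cheaper levels. Real Hadoop or Spark jobs would also have to account for decompression cost and codec splittability, which this sketch ignores.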
oai:noad.sci.am:136201
Cybernetics and Information Technologies
Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia ; Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France ; National Polytechnic University of Armenia
Apr 23, 2021
67
https://noad.sci.am/publication/149766
Publication name | Date |
---|---|
Astsatryan Hrachya, Performance Optimization System for Hadoop and Spark Frameworks | Apr 23, 2021 |
Kocharyan Aram Ekane Brice Teabe Boris Son Tran Giang Astsatryan Hrachya Hagimont Daniel
Astsatryan Hrachya Hayrapetyan Andranik Narsisian Wahi Asmaryan Shushanik Saghatelyan Armen Muradyan Vahagn Giuliani Gregory Guigoz Yaniss Ray Nicolas
Astsatryan Hrachya Narsisian Wahi Gyulgyulyan Eliza Baghdasaryan Vardan Poghosyan Armen Mamasakhlisov Yevgeni Wittenburg Peter
Astsatryan Hrachya Grogoryan Hayk Gyulgyulyan Eliza Hakobyan Anush Kocharyan Aram Narsisian Wahi Sahakyan Vladimir Yuri Shoukourian Rita Abrahamyan Zarmandukht Petrosyan Julien Aligon
Astsatryan Hrachya Narsisian Wahi Georges da Costa
Astsatryan Hrachya Gichunts Edita
Astsatryan Hrachya Asmaryan Shushanik Saghatelyan Armen Bigagli Lorenzo Mazzetti Paolo Nativi Stefano Guigoz Yaniss Pierre Lacroix Gregory Giuliani Nicolas Ray
Astsatryan Hrachya Abajyan H. Narsisian Wahi Georges da Costa Gurout Tom