Performance Optimization System for Hadoop and Spark Frameworks - NOAD

Performance Optimization System for Hadoop and Spark Frameworks

Description
Information

Title: Performance Optimization System for Hadoop and Spark Frameworks

Author:

Astsatryan Hrachya ; Kocharyan Aram ; Hagimont Daniel ; Lalayan Arthur

Type:

Abstract:

The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed on several machines. Data compression reduces data size and transfer time between disks and memory but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper aims to present a system enabling the selection of the compression tools and tuning the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures based on simulation analyses.
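The CPU/IO tradeoff described in the abstract can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as a stand-in codec (the abstract does not name the compression tools evaluated in the paper), and the function name `profile_levels` and the sample input are hypothetical. The sketch measures how the compression ratio and the processing time change with the compression level.

```python
import time
import zlib


def profile_levels(data: bytes, levels=(1, 6, 9)):
    """Return (level, compression_ratio, seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Ratio > 1 means the compressed output is smaller than the input.
        results.append((level, len(data) / len(compressed), elapsed))
    return results


# Hypothetical sample input: repetitive log-like text, roughly 2.5 MB.
sample = b"2021-04-23 INFO task finished in 42 ms\n" * 65536

for level, ratio, seconds in profile_levels(sample):
    print(f"level={level} ratio={ratio:.1f} cpu_time={seconds:.4f}s")
```

On real workloads a higher level usually yields a smaller output at the cost of more processor time, so the best setting depends on whether the job is limited by disk and network transfer or by the processor.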

Publisher:

Date submitted:

Date accepted:

Date modified:

Identifier:

oai:noad.sci.am:136201

DOI:

10.2478/cait-2020-0056

Language:

Journal or publication title:

Cybernetics and Information Technologies

Volume:

Issue:

URL:

Additional information:

The paper is supported by the European Union’s Horizon 2020 research infrastructures programme under grant agreement No 857645, project NI4OS Europe (National Initiatives for Open Science in Europe).

Organization name:

Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia ; Université Fédérale Toulouse Midi-Pyrénées, Toulouse, France ; National Polytechnic University of Armenia

Year:

Indexing:

ACM Digital Library ; Baidu Scholar ; Cabell's Whitelist ; CNKI Scholar (China National Knowledge Infrastructure) ; CNPIEC - cnpLINKer ; Dimensions ; DOAJ (Directory of Open Access Journals) ; EBSCO (relevant databases) ; EBSCO Discovery Service ; Engineering Village ; Genamics JournalSeek ; Google Scholar ; Inspec ; Japan Science and Technology Agency (JST) ; J-Gate ; JournalGuide ; JournalTOCs ; KESLI-NDSL (Korean National Discovery for Science Leaders) ; Mathematical Reviews (MathSciNet) ; Microsoft Academic ; MyScienceWork ; Naver Academic ; Naviga (Softweco) ; Primo Central (ExLibris) ; ProQuest (relevant databases) ; Publons ; QOAM (Quality Open Access Market) ; ReadCube ; SCImago (SJR) ; Scopus ; Semantic Scholar ; Sherpa/RoMEO ; Summon (ProQuest) ; TDNet ; Ulrich's Periodicals Directory/ulrichsweb ; WanFang Data ; Web of Science - Emerging Sources Citation Index ; WorldCat (OCLC)

Subject and keywords:

Hadoop ; Spark ; data compression ; CPU/IO tradeoff ; performance optimization

Object collections:

Digital Library > Academic Institutes > Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia > Publications

Last modified:

Apr 23, 2021

In our library since:

Apr 23, 2021

Number of object content hits:

305

All available versions of the object:

https://noad.sci.am/publication/149766

Edition name	Date
Astsatryan Hrachya, Performance Optimization System for Hadoop and Spark Frameworks	Apr 23, 2021

Objects

Similar

A remote memory sharing system for virtualized computing infrastructures

Kocharyan Aram Ekane Brice Teabe Boris Son Tran Giang Astsatryan Hrachya Hagimont Daniel

article

Paving the way toward an environmental National Spatial Data Infrastructure in Armenia

Astsatryan Hrachya Asmaryan Shushanik Saghatelyan Armen Bigagli Lorenzo Mazzetti Paolo Nativi Stefano Guigoz Yaniss Pierre Lacroix Gregory Giuliani Nicolas Ray

Article

An Integrated Web-based Interactive Data Platform for Molecular Dynamics Simulations

Astsatryan Hrachya Narsisian Wahi Gyulgyulyan Eliza Baghdasaryan Vardan Poghosyan Armen Mamasakhlisov Yevgeni Wittenburg Peter

Article

Weather Data Visualization and Analytical Platform

Astsatryan Hrachya Grogoryan Hayk Gyulgyulyan Eliza Hakobyan Anush Kocharyan Aram Narsisian Wahi Sahakyan Vladimir Yuri Shoukourian Rita Abrahamyan Zarmandukht Petrosyan Julien Aligon

Article

Prize for “Achievements in the fields of Computer Science and Applied Mathematics”

Astsatryan Hrachya

Award

SaaS for Energy Efficient Utilization of HPC Resources of Linear Algebra Calculations

Astsatryan Hrachya Narsisian Wahi Georges da Costa

Article

Performance of Complex Hermitian Matrix Factorizations on GPU Accelerator Architecture

Astsatryan Hrachya Gichunts Edita

Article

Computational Astrophysics Towards Exascale Computing and Big Data

Astsatryan Hrachya Knyazyan Aram Mickaelian Areg

Conference

Contact

Address

Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia
1, P. Sevak str., 0014 Yerevan

Phone

(+374) 10 282030

E-mail

noadadmin@sci.am

Visit us:

http://iiap.sci.am/index.php?langcode=2
