Object

Title: Using Apache Spark for Biological Data Processing

Co-author(s):

Berberyan Levon

Abstract:

Apache Spark is an open-source, general-purpose distributed cluster-computing framework and one of the most efficient technologies for processing massive amounts of distributed data. Spark provides the Dataset and GraphX APIs, which offer high-level functions for manipulating data distributed among the nodes of a cluster, along with powerful optimizations that take the distributed nature of the data into account. Huge amounts of biological data are now publicly available. One such source is UniProt, a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information. UniProt provides its semantic data in RDF format, with a SPARQL interface for querying it. In this work, we store UniProt’s datasets in our cluster’s distributed storage, deployed in the cloud. We implement basic functionality in Spark, using the Dataset and GraphX APIs for Scala, to query UniProt’s data.

Identifier:

oai:noad.sci.am:136224

Language:

English

Affiliation:

Institute for Informatics and Automation Problems of NAS RA

Country:

Armenia

Year:

2019

Time period:

September 23-27

Conference title:

CSIT Conference 2019

Place:

Yerevan, Armenia

Last modified:

May 3, 2021

In our library since:

May 3, 2021

All available object's versions:

https://noad.sci.am/publication/149789
