
IAETSD Journal for Advanced Research in Applied Sciences, Volume 4, Issue 1, Jan-June 2017

ISSN (Online): 2394-8442

REST based API Engine for Exome-Seq Genomic Data Analysis


[1] R Hamsini, [2] Dr N K Cauvery, [3] Santhosh Gandham
[1] Dept. of Information Technology, RVCE, [2] HOD, Dept. of Information Technology, RVCE,
[3] Product Manager, InterpretOmics India Pvt. Ltd.
ramesh.hamsini@gmail.com

ABSTRACT
Genomics can be considered a discipline within genetics. Genomics APIs are a collection of specialized
protocols that assist developers in handling multiple genomic data sources when building seamless, practical
applications, thereby advancing both genomic and clinical analysis. This paper presents APIs that a client can
use to access the services provided by the Exome-Seq data analysis workflow. The APIs are responsible for
handling each request that the client sends to the Exome-Seq data analysis workflow.

Keywords: API; Exome-Seq; REST API.

I. INTRODUCTION
Genomics has the potential to revolutionize biological and healthcare research, and solve some of the world's burning problems. However, many
challenges are involved in genomics data analysis. For example, by 2030, the world cancer burden is likely to double, and while genomics
research is expected to bring about an improved understanding of the disease, currently only about 15% of cancer research results are
reproducible. In order to solve such challenges, researchers across the globe need to collaborate, and require affordable, repeatable, and testable
powerful analysis pipelines.

The advent of genomics has led to huge amounts of biological data. Each type of data produced has its own set of properties describing an
experiment's conditions and results. Biological data varies in its characteristics: experiments are diverse and yield different types of data
depending on the technology from which the data is derived.

Data derived from biological laboratories differs greatly from that seen in medical records. Medical records hold relatively little data in
comparison with biological experiments. Medical information is most often discrete, making it possible to split it into smaller parts for both
transfer and storage [1]. The data in medical records is simple, often taking the form of low-tech text and images. Some types of biological data
differ in these respects: they are large, come in distinct and varied formats, and are complex and unwieldy. Biological data therefore proves hard
to manage.

An application may also be complex, integrating data from multiple sources. The goal of such complex applications is to provide many
sources of data on which the end user can base an analysis. Different data types integrated together provide multiple lines of
evidence on which to base conclusions. APIs are developed to make this easier for end users.

II. WHY APIS WITH GENETICS


The large volume of genomics data, its diversity, the need for data sharing, and its complexity have led to the creation of APIs (Application
Programming Interfaces). This approach provides modular, secure and interoperable access to genomic data across different platforms,
applications and organizations. Genomics APIs are a collection of specialized protocols that assist developers in handling multiple
genomic data sources when building seamless [2], practical applications, thereby advancing both genomic and clinical analysis.

API is a mechanism that allows one organization to share data or other resources with the public or a controlled list of individual users. Software
developers can make use of APIs to incorporate data into more complex applications. Users benefit from APIs by having access to third-party
applications created by the software developers, which provide data from multiple sources.

Thus, developers interact directly with the API to apply its rules within their applications; users, on the other hand, run the analysis using the
additional data that the developers extract using the APIs. So, in this way, APIs can be thought of as a tool that can promote reuse of the same
resources by different applications. Additionally, it is possible to have one common app accessing different applications through their respective
APIs to help integrate data from multiple sources. By facilitating data integration functions, APIs help to satisfy the need for data sharing in
research.

To Cite This Article: R Hamsini, Dr N K Cauvery and Santhosh Gandham. REST based API Engine for Exome-Seq
Genomic Data Analysis. Journal for Advanced Research in Applied Sciences; Pages: 124-126

III. GENOMICS API


A Genomics API works mostly with complex information generated by high-throughput experiments. One of the biggest challenges
in working with genomics data is the extreme variety of data and an even greater variety of file formats. Every organization, it seems, uses a
different constellation of file formats, with further subtle differences arising from the different technologies and platforms that generated the
data. As a result [3], there is no single universally usable tool to access all genomics data across organizations. Development of Genomics APIs
is an important milestone toward interoperability in the genomics world. By building a common framework to model the different entities in the
genomics and clinical world, the Genomics API can enable effective sharing and integration of data for the advancement of human health.

Though the data integration problem is well defined, the process of putting together a robust, well-defined Genomics API has been a
multifaceted challenge. Genomic data are extremely heterogeneous and come in myriad formats. Genomic data include information held in gene
expression databases such as NCBI GEO and EBI ArrayExpress, as well as sequencing archives such as RefSeq and the NCBI Sequence Read Archive.
A variety of formats and data types are associated with sequencing data. Currently, sequencing platforms output genomic data in FASTQ, SAM,
BAM, or VCF formats, as opposed to the traditional FASTA format alone [4]. Some databases contain only raw read information, whereas others
contain detailed information down to the variant level.

These data structure inconsistencies are growing along with the emergence of novel genomics technologies, thus making the process of data
integration more cumbersome over time. After data integration, another potential problem associated with genomic data sharing is
confidentiality. The genome itself can be considered personally identifiable information and simple anonymization techniques are insufficient.
Ethical considerations may require special precautions be put in place when dealing with genomic data sharing. This data privacy concern adds
an additional layer of complexity when designing a Genomics API to ensure adequate privacy is attained.

The API Engine for the Exome-Seq workflow provides standard methods such as create, delete, search, and update for accessing each of the
resources through the standard HTTP request methods POST, DELETE, GET, and PUT. Aligned and unaligned reads can be imported into the API
in BAM and FASTQ formats, respectively. Variant data can be imported directly from VCF files. The API extracts slices of genomic data (either
alignments or variants) and outputs them as a JSON dictionary to the user. The fields contained in the dictionary can be formatted according to
the requirements of the user.
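As a rough illustration of the paragraph above, the following Python sketch maps the four standard operations to HTTP methods and returns a slice of variant data as a JSON dictionary with user-selected fields. The variant records, the `slice_variants` function, and the field names are hypothetical and are not taken from the API engine itself.

```python
import json

# Mapping of the standard operations named in the text to HTTP methods.
# "search" is read-only, so it maps to GET.
OPERATION_TO_METHOD = {
    "create": "POST",
    "delete": "DELETE",
    "search": "GET",
    "update": "PUT",
}

# Hypothetical in-memory variants, as if they had been imported from a VCF file.
VARIANTS = [
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "qual": 50.0},
    {"chrom": "1", "pos": 2500, "ref": "C", "alt": "T", "qual": 99.0},
    {"chrom": "2", "pos": 1200, "ref": "G", "alt": "A", "qual": 30.0},
]


def slice_variants(chrom, start, end, fields=None):
    """Return variants on `chrom` with start <= pos < end as a JSON string.

    `fields` optionally restricts which keys appear in each record,
    mimicking user-defined formatting of the output dictionary.
    """
    selected = [
        v for v in VARIANTS
        if v["chrom"] == chrom and start <= v["pos"] < end
    ]
    if fields is not None:
        selected = [{k: v[k] for k in fields} for v in selected]
    return json.dumps({"variants": selected})
```

For example, `slice_variants("1", 0, 2000, fields=["pos", "alt"])` would return only the position and alternate allele of the variants in that window.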

IV. EXOME SEQ ANALYSIS


The exome is the expressed, protein-coding portion of a genome. The technique of sequencing the entire exome is known as whole exome
sequencing (WXS or WES), or simply exome sequencing. This approach identifies genetic variants that alter protein sequences, and does so at a
lower cost than whole genome sequencing [5]. Whole exome sequencing is widely used as a targeted sequencing method, which makes it popular in
basic and clinical research.

The API engine is built on the Exome-Seq analysis workflow. The APIs developed for the Exome-Seq application classify and annotate variants and
compute population-based statistics on them. The workflow is aimed at selective enrichment of exonic regions to identify novel, rare and common
variants that might affect gene structure and function. The usage scenario of the Exome-Seq application is as follows:

1. The user logs in using his/her credentials.
2. After successful login, the user is presented with a dashboard where he/she can select a previously created project or create a
new one.
3. The user chooses the project to be run or analyzed.
4. The user is presented with the different stages that have to be run and analyzed.
5. Once the analysis is finished, he/she can visualize the results.

Security Risk

To limit security risks and comply with security guidelines, user data such as login and password information is not stored, which prevents
unauthorized clients from gaining access to the data. The user is given access to his/her account after logging in with his/her
credentials on the client application. The credentials are sent to the server through an application programming interface (API). After successful
authentication, the server responds with a security token, which the client reuses for every subsequent request to the server through the API. Every
request related to the Exome-Seq application is sent from the client to the server via an API call.
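The token exchange described above can be sketched as follows. This is a minimal illustration only: the paper does not specify the token format, so the HMAC-signed `username:expiry:signature` layout, the `SERVER_SECRET` key, and the one-hour lifetime are all assumptions.

```python
import hashlib
import hmac
import secrets
import time

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical server-side signing key
TOKEN_TTL_SECONDS = 3600                 # assumed token lifetime


def _sign(payload):
    """Return the hex HMAC-SHA256 signature of `payload`."""
    return hmac.new(SERVER_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(username, now=None):
    """Return 'username:expiry:signature', issued after a successful login."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{username}:{issued_at + TOKEN_TTL_SECONDS}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token, now=None):
    """Return the username if the token is authentic and unexpired, else None."""
    try:
        username, expiry, signature = token.rsplit(":", 2)
    except ValueError:
        return None  # token does not have three parts
    if not hmac.compare_digest(signature, _sign(f"{username}:{expiry}")):
        return None  # token was tampered with or not issued by this server
    current = time.time() if now is None else now
    if int(expiry) < current:
        return None  # token has expired
    return username
```

The client would hold only the returned token and attach it to each later request; the server re-checks it with `verify_token` on every call, which keeps the requests independent of one another.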

V. REST API
REST is an architectural style for building distributed systems based on hypermedia. A primary advantage of the REST model is that it is based
on open standards and does not bind the implementation of the model or the client applications that access it to any specific implementation.

The API follows the principles of REST. This approach minimizes what the client needs to know in advance, i.e. the structure of the API. The
server provides the information the client requires to interact with the service. An HTML form is a useful example of such a service:
the server specifies the location of the resource along with the required fields. The browser has no prior knowledge of what to submit
or where to submit it; this information is provided by the server.
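The HTML form analogy can be sketched in Python as below. The `SERVER_FORM` description and the `build_request` helper are hypothetical; they only illustrate a client that builds its request from what the server describes rather than from hard-coded knowledge.

```python
# Hypothetical form description that a server might return. The client does
# not hard-code the target location, the method, or the field names.
SERVER_FORM = {
    "action": "/authenticate",
    "method": "POST",
    "fields": ["username", "password"],
}


def build_request(form, values):
    """Build a (method, url, body) triple using only what the server described."""
    missing = [name for name in form["fields"] if name not in values]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Send only the fields the server asked for; ignore anything extra.
    body = {name: values[name] for name in form["fields"]}
    return form["method"], form["action"], body
```

If the server later changes the action URL or adds a required field, a client written this way picks up the change from the form description without being rewritten.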

VI. API DESIGN AND DEVELOPMENT


REST is used to build dynamic and unique URLs. The client application sends HTTP requests and receives JSON (JavaScript Object Notation) as
the response. JSON is chosen over XML because it is a simpler data format and performs better. The Jackson JSON/Java library is used to marshal
Java objects to JSON and to unmarshal JSON back to Java objects. The integration of the API layer with the backend data source, as well
as with other services such as configuration, is done using the Play framework. Apache Maven 2 is used to define the libraries, modules, and API
dependencies. Continuous integration is used throughout the cycle along with unit tests. The overall project has been managed through an
agile/SCRUM process, with iterative, incremental development sprints.
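To illustrate what marshalling and unmarshalling mean here, the sketch below performs the same round trip in Python with the standard library. It is an analogy only: the implementation described above uses Jackson in Java, and the `Project` class and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Project:
    # Hypothetical fields, chosen for illustration only.
    pname: str
    filetype: str


def marshal(project):
    """Convert a Project object to a JSON string (object -> JSON)."""
    return json.dumps(asdict(project), sort_keys=True)


def unmarshal(text):
    """Convert a JSON string back to a Project object (JSON -> object)."""
    return Project(**json.loads(text))
```

A round trip such as `unmarshal(marshal(p))` should yield an object equal to `p`, which is the property the API layer relies on when exchanging objects with the client.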

VII. API OPERATIONS


Identification of a resource by the client is straightforward. Most of the time the client accesses the server in read-only mode, i.e. with the GET
method; in certain cases where the data is sensitive, the POST method is used. All request and response headers have the content type
application/json, which means that complex queries and responses take the form of JSON arrays or JSON objects.

The body of the HTTP request contains the data required to perform the operation. REST defines a stateless request model. HTTP requests
should be independent and may occur in any order, so attempting to retain transient state information between requests is not feasible. The REST
model implements a finite state machine in which a request transitions a resource from one well-defined non-transient state to another.
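The finite state machine view can be sketched as a transition table. The states (`created`, `running`, `finished`) and actions below are hypothetical examples for an analysis resource and are not taken from the Exome-Seq API.

```python
# Hypothetical states and transitions for an analysis resource.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "complete"): "finished",
    ("running", "cancel"): "created",
}


def apply_request(state, action):
    """Return the next resource state, or raise if the transition is undefined.

    The function keeps no memory between calls: everything needed to decide
    the outcome arrives with the request, as in a stateless request model.
    """
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not allowed in state {state!r}"
        ) from None
```

Because the state lives in the resource and not in the server's session memory, any server instance can handle any request.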

Examples of requests sent by the client application:

POST /authenticate
GET /listproject
GET /getSequencingPlatform
POST /analyzeexome-seq

Example of a JSON response from the server application:

{
  "Object": {
    "appname": "Exome-Seq",
    "analysesStages": ["Quality Metrics", "Reference Assembly", "Variant Detection", "Variant Annotation", "Population Genetics"],
    "filetype": "fastq",
    "pname": "RelatedExomeSeqTest"
  }
}

The above JSON shows the response for the analyze module of the Exome-Seq application. The client application extracts the data present in the
JSON and then proceeds with further rendering and subsequent requests.
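A client-side extraction step might look like the following sketch. It assumes a well-formed version of the response, and the key spellings `analysesStages` and `filetype` are assumptions, since the exact key names are not recoverable from the printed example.

```python
import json

# Assumed well-formed version of the server response for the analyze module.
RESPONSE_TEXT = """
{
  "Object": {
    "appname": "Exome-Seq",
    "analysesStages": ["Quality Metrics", "Reference Assembly",
                       "Variant Detection", "Variant Annotation",
                       "Population Genetics"],
    "filetype": "fastq",
    "pname": "RelatedExomeSeqTest"
  }
}
"""


def extract_analysis(text):
    """Parse the response and return (project name, list of analysis stages)."""
    payload = json.loads(text)["Object"]
    return payload["pname"], payload["analysesStages"]


project_name, stages = extract_analysis(RESPONSE_TEXT)
```

The client could then render one dashboard entry per element of `stages` for the project named in `project_name`.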

VIII. CONCLUSION
This paper presents the advantages of using the REST architecture to build lightweight, scalable APIs for genomic data analysis applications.
The APIs were created for the entire Exome-Seq analysis workflow and provide a fully functional set of services.

REFERENCES
[1] A Review on Genomics APIs, Rajeswari Swaminathan, Yungui Huang, Soheil Moosavinasab, Ronald Buckley, Christopher W.
Bartlett, Simon M. Lin, Computational and Structural Biotechnology Journal, Volume 14, 2016, Pages 8-15.

[2] Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor, William McLaren, Bethan
Pritchard, Daniel Rios, Yuan Chen, Paul Flicek, Fiona Cunningham, Bioinformatics (2010) 26 (16): 2069-2070.

[3] The Ensembl REST API: Ensembl Data for Any Language, Andrew Yates, Kathryn Beal, Stephen Keenan, William McLaren, Miguel
Pignatelli, Graham R. S. Ritchie, Magali Ruffier, Kieron Taylor, Alessandro Vullo, Paul Flicek, Bioinformatics (2015) 31 (1): 143-145.

[4] The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational
State Transfer (REST) principles, Shyue Ping Ong, Shreyas Cholia, Anubhav Jain, Miriam Brafman, Dan Gunter, Gerbrand Ceder, Kristin A.
Persson, Computational Materials Science, Volume 97, 1 February 2015, Pages 209-215.
