Universidad Nacional de Chimborazo
NOVASINERGIA, 2018, Vol. 1, No. 2, junio-noviembre, (70-82)
ISSN: 2631-2654
https://doi.org/10.37135/unach.ns.001.02.08
Research Article
A Fog-Assisted Cloud Paradigm for Accessibility and Collaboration in Genomic Data Analysis
Paradigma Cloud asistido por Fog para la accesibilidad y colaboración en el Análisis de Datos Genómicos
Paola G. Vinueza Naranjo 1,*, Navinkumar J. Patil 2
1 DIET, Sapienza Università di Roma, Roma (RM), Italy, 00184
2 Dipartimento di Fisica, Università della Calabria, Rende (CS), Italy, 87036; navinkumar.patil@iit.it
* Correspondence: paola.vinueza@uniroma1.it
Recibido 30 octubre 2018; Aceptado 03 diciembre 2018; Publicado 10 diciembre 2018
Abstract:
Next-generation sequencing is growing rapidly and requires large-scale computing resources to handle the huge amount of data produced. The Cloud computing paradigm readily handles huge data volumes, but its core issue is the transfer of enormous amounts of data to and from cloud computers over limited bandwidth, a consequence of the centralized nature of a Cloud computing architecture located far away from users. An architecture in which computing power is distributed more evenly throughout the network is the way to combat this problem: it should drive processing capacity towards the edge of the network, closer to the source of the data. For this purpose, Fog computing offers a promising solution that moves computational capabilities closer to where the data are generated and is poised to gain traction in genomics research. We propose a novel Collaborative-Fog (Co-Fog) model that adopts the Fog and Cloud computing paradigms to manage huge genomic data sets and to enable an understanding of how key stakeholders can manage interaction and collaboration. The present work describes the Co-Fog model, which promises increased performance, energy efficiency, reduced latency, faster response time, scalability, and better localized accuracy for future large-scale collaborations in genomics.
Keywords:
Big data, Distributed resource management, Cloud computing, Fog computing,
Next-generation sequencing (NGS)
Resumen:
La secuenciación de próxima generación crece cada vez más y requiere recursos informáticos a gran escala para manejar la enorme cantidad de datos producidos. El paradigma Cloud computing maneja fácilmente datos enormes, pero el problema central con este paradigma es la transferencia de datos enormes hacia y desde las computadoras en cloud debido al ancho de banda limitado, que radica en la naturaleza centralizada de la arquitectura Cloud computing, la cual está localizada lejos de los usuarios. Una arquitectura donde la potencia de computación se distribuya de manera más uniforme en toda la red es una forma de combatir este problema. La arquitectura debe llevar la capacidad de procesamiento hacia el borde de la red, más cerca de la fuente de los datos. Para este propósito, Fog computing ofrece una solución prometedora para acercar las capacidades computacionales a los datos generados y será la solución para ganar fuerza en la investigación genómica. Proponemos un nuevo modelo llamado Collaborative-Fog (Co-Fog) que adopta los paradigmas Fog y Cloud computing para administrar grandes conjuntos de datos genómicos y para permitir la comprensión de cómo las partes interesadas pueden gestionar la interacción y la colaboración. El presente trabajo describe el modelo Co-Fog, que promete un mayor rendimiento, eficiencia energética, menor latencia, tiempo de respuesta más rápido, escalabilidad y una mejor precisión localizada para futuras colaboraciones a gran escala en la genómica.
Palabras clave:
Macrodatos, Administración de recursos distribuidos, Paradigma Cloud, Paradigma Fog, Secuenciación de nueva generación (NGS)
1 Introduction
Genomic medicine is an emerging medical
discipline that uses the human genomic data
of an individual as a part of their clinical care.
Besides healthcare, genomic data are used in various domains, including genome-wide association studies, ancestry determination, legal cases (e.g., paternity cases), and forensic and criminal investigations. Genomics could revolutionize clinical care by providing targeted diagnostics and treatment based on a patient's genetic makeup, identifying an individual's genetic predisposition to serious disease, and determining, from the parents' genomic data, whether potential offspring may develop a rare genetic disease. Thus the analysis of genomic data, and the development and implementation of genomic medicine based on that analysis, has tremendous economic potential, which explains the success of many enterprises such as 23andMe and Color Genomics, and of recent services from Google Genomics, IBM Watson, Microsoft Genomics, Amazon AWS Genomics and Apple ResearchKit.
Thus the recent advances in genomic research are
leading to a new era in medicine. In the next few
years, the use of genomic data in healthcare will
rapidly increase. In the future, decisions regarding
the prevention and treatment of diseases will
be increasingly based on an individual’s genetic
makeup. This major change in medicine requires
careful preparation. Even though the promise
of personalized diagnoses and treatment based
on genomic data analysis and genomic medicine
seems just around the corner, the study of genomics
is becoming a field that is dominated by the growth
of data.
Below, we identify useful collections of publicly available next-generation sequencing (NGS) datasets and show how new advances in NGS technologies are greatly expanding the current volume and range of existing data. The
Sequence Read Archive (SRA) is an international
public archival resource for NGS data established
under the guidance of the International Nucleotide
Sequence Database Collaboration (INSDC).
Instances of the SRA are operated by the National
Center for Biotechnology Information (NCBI),
the European Bioinformatics Institute (EBI)
and the DNA Data Bank of Japan (DDBJ). The
mission of INSDC is to preserve public-domain
sequencing data and to provide free, unrestricted
and permanent access to the data.

Figure 1: An overview of the publicly available data at the Sequence Read Archive (SRA) based on user-submitted metadata. The SRA's growth rate is increasing greatly over time. Half of the SRA submissions have been generated by genomic library strategies, mostly whole genome sequencing. The second half is composed of library strategies from transcriptome, epigenomic, and other applications. Figure adapted from (NCBI, 2018).

By the start of 2012, approximately 75 000 genomic, 15 000 transcriptome and 15 000 epigenomic submissions had been contributed to the SRA (figure 1). However, that volume of data represents only the tip of the iceberg, as the number of genomic submissions has been steadily increasing, particularly in recent years.
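To make the scale of such archives concrete, the short sketch below shows one way the SRA can be queried programmatically through NCBI's Entrez E-utilities using Biopython; the search term, contact e-mail and printed fields are illustrative assumptions and are not taken from the article.

```python
# A minimal sketch (not from the article): counting publicly listed
# human whole-genome sequencing runs in the SRA via NCBI Entrez.
from Bio import Entrez

Entrez.email = "researcher@example.org"  # hypothetical contact address required by NCBI

# Illustrative search term; real studies would refine organism, strategy and dates.
term = '"Homo sapiens"[Organism] AND "wgs"[Strategy]'

handle = Entrez.esearch(db="sra", term=term, retmax=5)
record = Entrez.read(handle)
handle.close()

print("Total matching SRA records:", record["Count"])
print("First few record IDs:", record["IdList"])
```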
Huge amounts of genomic data require high-performance computing and data storage infrastructure that is often beyond the capabilities of a single institution. For this reason, Cloud computing is becoming the preferred solution for medical research centers and healthcare providers to deal with the increasing amount of genomic data in a flexible and cost-effective way. Cloud computing facilitates the storage and management of large amounts of data and acts as a final destination for heavy-weight processing, long-term storage and analysis. It eliminates the expenses of computerization and framework support and is a flexible and cost-effective approach to genomic data management. Cloud service providers offer the infrastructure, software, and programming platforms to clients, and are accountable for the costs of development and maintenance. Challenges of using Cloud computing for genomic data include lengthy data transfers when uploading data to the cloud server, the perceived lack of information safety in Cloud computing, and the requirement for
developers with advanced programming skills.
Also, since the number of devices connecting to the cloud is growing, there is undue pressure on the cloud infrastructure. Due to the loosely controlled and non-homogeneous nature of the internet, several other issues related to Cloud computing are still unresolved. One such issue is the network delay, or lag, between a client request and the cloud response in real-time applications. This is explained by the fact that data centers are often located far away from major cities and population areas. This physical distance between data centers and end users has an impact on latency, which can affect users. The delay can be a major issue for applications that rely heavily on storage, streaming data and offline processing.
Fog computing is a natural extension of Cloud computing and is foreseen as a remedy for such issues. Cloud and Fog computing share overlapping features, but Fog computing has additional attributes such as location awareness, enhanced mobility features, edge deployment, and support for real-time processing (Botta et al., 2016). In contrast with centralized clouds, fog nodes are geographically distributed and deployed in large numbers near the wireless access points in areas that sustain the heaviest usage. Their close proximity to end-users offers mobile, low-latency, latency-sensitive analytics for mission-critical requirements and real-time interaction. Rather than a substitute, Fog computing often serves as a complement to Cloud computing (Baccarelli et al., 2018). The concepts of Cloud and Fog computing can be integrated into a single platform to achieve the best of both worlds: reduced latency, geographic awareness, improved data streaming, and access to commodity resource pools (Madsen et al., 2013). In table 1, we give a general overview of the existing frameworks for the Big Data (BD), Data Centers (DC) and Internet of Things (IoT) environments and highlight their advantages and disadvantages. Cloud computing's powerful technology for massive-scale/complex computing, combined with Fog computing's promising ability to move computational capabilities closer to where the data are generated, is the solution that is gaining traction in genomics research. Fog computing technology can provide a way for researchers to enhance their capability to store and share data, save time and reduce the costs of data sharing.
This paper presents the idea of a fog-assisted cloud paradigm for genomic data analysis. In the next sections we introduce the Cloud and Fog paradigms, the motivation for and application of a fog-assisted cloud paradigm to genomic data analysis, and the Collaborative-Fog (Co-Fog) paradigm. We close with conclusions and future research opportunities.
2 Cloud and Fog Computing
Paradigm
Figure 2: Elasticity of Fog computing is its ability to expand or contract its dedicated resources to meet the current demand. Elasticity is one of the features of Fog associated with scale-out solutions (horizontal scaling), which allow resources to be dynamically added or removed when needed. In virtualized environments, Fog elasticity can include the ability to dynamically deploy new virtual machines or shut down inactive virtual machines. The workload vs. time plot shows the elasticity of the Fog computing model, which has the ability to add and remove resources "on the fly" to handle load variation.
Cloud and Fog computing are two stand-alone technological paradigms under the realm of the Future Internet. The National Institute of Standards and Technology (NIST) (NIST, 2018a) formally defines the Cloud computing paradigm as a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing provides the tools and technologies to build data- and compute-intensive parallel applications at much more affordable prices than traditional parallel computing techniques.
Table 1: Overview of BD/DC/IoT frameworks and their advantages and disadvantages with Fog computing support.

Fog Computing Support (BD + DC + IoT)

BD (Bonomi, 2011; Bonomi et al., 2012, 2014; Kai et al., 2016; Saharan & Kumar, 2015)
Pros: BD in Fog is distinguished by volume, velocity, variety and geo-distribution; application data are passed through several layers; eliminates delays in data transfer; allows data to be kept close to users instead of being stored in far-away DCs; reduces the time-scale of real-time processing.
Cons: Needs policies to be defined for specific security, isolation and privacy during multi-tenancy.

+DC (Bonomi, 2011; Bonomi et al., 2012, 2014; Saharan & Kumar, 2015)
Pros: Degree of consistency between collection points.
Cons: Requires policies to define network, storage and compute services (such as a minimum delay rate).

+IoT (Bonomi, 2011; Bonomi et al., 2012, 2014; Firdhous et al., 2014; Byers & Wetterwald, 2015; Saharan & Kumar, 2015; Luan et al., 2015; Kai et al., 2016)
Pros: Permits connections among ordinary physical or daily-life objects; fast mobile apps; improves QoS through local fast-rate connections; distributed intelligence; scalability; network resource preservation; closed-loop control; resilience; clustering.
Cons: No clear definition of policies to specify thresholds for load balancing (such as minimum number of users, connections, CPU load) or policies to specify QoS requirements.

5G Technologies: Network Function Virtualization (NFV), Software Defined Networking (SDN) (Luan et al., 2015)
Pros and cons not detailed.
Cloud computing offers higher-level services that, unlike a local server or a personal computer, can be rapidly provisioned and released with minimal management effort, and it relies on self-establishment and self-management in order to guarantee scalability to large scale (Mell & Grance, 2011; Armbrust et al., 2010; Fox et al., 2009; Buyya et al., 2009). Cloud providers deliver mainly three types of service models to users: i) Infrastructure as a Service (IaaS): IaaS providers offer users a pool of computing, storage and network resources; ii) Platform as a Service (PaaS): PaaS providers enable users to access a platform to develop and deploy software, and; iii) Software as a Service (SaaS): SaaS users access software running on servers. Cloud computing facilitates the storage and management of large amounts of data and can serve as a possible instrument of surveillance. The clouds act as the final destination for heavy-weight processing, long-term storage and analysis.
In March 2018, NIST released a definition of Fog computing, adopting much of Cisco's commercial terminology, published as NIST special publication 500-325 (NIST, 2018b).
The Fog computing conceptual model defines Fog computing as a horizontal, physical or virtual resource paradigm that resides between end devices and Cloud computing data centers. This paradigm supports vertically-isolated, latency-sensitive applications by providing ubiquitous, scalable, layered, and federated distribution of communication, computation, and storage resources. Precisely, fog nodes (physical components (e.g., gateways, switches, routers, servers) or virtual components (e.g., virtualized switches, virtual machines, cloudlets)) are small-size, virtualized, inter-connected, resource-equipped data centers hosted by wireless access points at the edge of the network, building up a two-tier FOG-CLOUD hierarchical architecture. Fog computing natively supports inter-Fog resource pooling. Furthermore, Fog computing handles data at the network level, on smart devices and on the end-user client side, instead of sending data to a remote location for processing (Bonomi et al., 2012; Horne, 2018). In Fog computing, cloud elastic resources are extended to the edge of the network, to portable devices, smart objects, wireless sensors and other IoT devices, in order to decrease latency and network congestion (Tang et al., 2017). In virtualized environments, Fog elasticity (see figure 2) can include the ability to dynamically deploy new virtual machines or shut down inactive virtual machines.
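As a rough illustration of this elastic behaviour, the sketch below implements a toy scale-out/scale-in rule of the kind a fog node might apply; the capacity figure, thresholds and function names are assumptions made only for illustration.

```python
# Minimal sketch of the elasticity idea from figure 2: add or remove
# virtual instances "on the fly" as the measured workload varies.
# All names and thresholds are hypothetical.

REQUESTS_PER_INSTANCE = 100   # assumed capacity of one virtual machine/container
MIN_INSTANCES = 1

def desired_instances(current_load_rps: float) -> int:
    """Return how many instances are needed for the current request rate."""
    needed = -(-int(current_load_rps) // REQUESTS_PER_INSTANCE)  # ceiling division
    return max(MIN_INSTANCES, needed)

def reconcile(active: int, current_load_rps: float) -> int:
    """Scale out when overloaded, scale in when under-utilized."""
    target = desired_instances(current_load_rps)
    if target > active:
        print(f"scale out: launching {target - active} instance(s)")
    elif target < active:
        print(f"scale in: stopping {active - target} idle instance(s)")
    return target

# Example workload trace (requests per second over time).
active = 1
for load in [40, 180, 420, 390, 120, 60]:
    active = reconcile(active, load)
```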
Next-generation sequencing data sets are growing in number, and these data require large-scale computational resources. Cloud computing's powerful technology for massive-scale/complex computing, coupled with Fog computing's promising ability to move computational capabilities closer to the source of data generation, thereby enhancing the capability to store and share data while saving time and reducing the costs of data sharing, could provide an exceptional solution in the field of genomics research.
3 Motivation and Application to
Genomics
Cloud and Fog computing are considered by investigators as a way to manage and share the vast amounts of genomic data generated by NGS. It has been recognized that data storage and management is a growing problem which will require Cloud and Fog technology (Charlebois et al., 2016).
For science users, Fog computing has two main advantages: reproducibility and local/global access. Fog computing exhibits the right features for coping with the aforementioned technological issues. The Fog computing architecture consists of three components, namely: IoT nodes, fog nodes and the back-end cloud. Fog nodes are small-size virtualized inter-connected resource-equipped data centers that process tasks without third-party interference and collaboratively provide computational flexibility, better communication, storage capacity, and additional new smart services in a hierarchical environment for a rising number of end users in their close proximity. In addition, Fog computing natively supports three main services: 1) user virtualization; 2) User-to-Fog task offloading, and; 3) inter-Fog resource pooling. These services can be efficiently exploited to implement the Co-Fog collaboration network as an overlay network of user clones that relies entirely on the bandwidth/computing resources of the supporting fog nodes. Cloud computing has addressed many problems posed by data archives, and Fog elasticity allows users to scale computing resources in proportion to the amount of data being analyzed, sidestepping constraints imposed by local clusters. Input data can be downloaded directly to the fog nodes that will process them, and the field has produced an array of uniformly processed and summarized data-sets.
In doing so, the native resources of the physical things can be employed only for synchronization with the corresponding Fog-hosted clones, allowing users, perhaps on opposite ends of the globe, to create near-identical hardware and software setups (see figure 3). These reproducibility advantages are possible on non-cloud computers using Virtual Machine (VM)-based technology and the (emerging) container-based technology, which are tools that package software with all the necessary components to enable reproducible deployment in different computing environments. This is indeed the main idea behind the proposed Co-Fog paradigm. Also, Fog providers maintain data centers in a way that achieves economies of scale, and Fog users need not be concerned with outages, software patches, service contracts or damaged parts. Figure 3 sketches the reference architecture of the Big Data (BD) technological platform. It is composed of five blocks: 1) the IoT (data generation) layer; 2) the radio access network; 3) the proximate Fog layer; 4) the Internet backbone, and; 5) the remote Cloud layer.

Figure 3: Reference architecture for the Big Data scenario.
According to the reported architecture, BD streams are: i) gathered by a number of spatially heterogeneous devices (users) scattered over the environment of interest; ii) forwarded to proximate fog nodes over (single-hop) WiFi connections for local pre-processing (such as data compression, fusion and filtering), and; iii) routed to cloud-based remote data centers over (multi-hop) internet WANs for further post-processing.
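The three-step stream flow just described (edge gathering, fog-side pre-processing, cloud-side post-processing) can be sketched as follows; the quality filter, compression choice and upload endpoint are placeholders assumed here rather than components specified in the article.

```python
# Minimal sketch of the BD stream flow of figure 3: a fog node compresses
# and filters locally generated reads before forwarding them to the cloud.
# The quality threshold and the upload target are illustrative assumptions.
import gzip
import json

def prefilter(reads, min_quality=30):
    """Keep only reads whose (assumed) average quality passes a threshold."""
    return [r for r in reads if r["avg_quality"] >= min_quality]

def compress(reads) -> bytes:
    """Serialize and gzip the filtered batch to cut WAN transfer volume."""
    return gzip.compress(json.dumps(reads).encode("utf-8"))

def forward_to_cloud(payload: bytes, endpoint: str) -> None:
    """Placeholder for the multi-hop WAN upload to the remote data center."""
    print(f"uploading {len(payload)} bytes to {endpoint}")

# Example: raw reads arriving from IoT/sequencer devices at the edge.
raw_batch = [
    {"id": "r1", "avg_quality": 36, "seq": "ACGT..."},
    {"id": "r2", "avg_quality": 12, "seq": "TTGA..."},  # discarded by the fog filter
]
forward_to_cloud(compress(prefilter(raw_batch)), "https://cloud.example.org/ingest")
```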
The goal is to allow users to share information in real time by leveraging Fog-assisted data-sharing platforms (such as Globus Genomics, Dropbox and iCloud). These applications require massive sets of inter-stream cross-correlation analytics to quickly detect the occurrence of new trends and/or anomalies.
Fog services comprise three deployment models (public, private or hybrid) that vary depending on the extent to which they are freely accessible (Dove et al., 2015; Forer et al., 2012; Ruiter & Warnier, 2011; Shanker, 2012). Decisions over the type of services and deployment models influence the form of Fog computing adoption and reflect how varying ethical, legal and social challenges are managed.
3.1 An Introductory Example
The definition of cloud and fog computing applied
to the field of genomics research is ”a scalable
service where genetic sequence information
is stored and processed virtually, usually via
networked, large-scale data centers accessible
remotely through various users and platforms over
the internet” (Dove et al., 2015).
The following illustrative example (see figure
4) aims at giving some first insight into the
Fog-over-Users interplay and the roles played
by the Computer-to-Computer (C2C) interaction
model, in order to establish collaboration, links,
and ties.
Figure 4 shows the main building blocks and the involved users. The fog fosters reproducibility by enabling investigators (users) to publish data-sets (resp., User A (Data A) and User B (Data B) in figure 4) to the fog/cloud, including different versions thereof, without loss or modification of the previous data-sets. Moreover, the users can be situated near to or far from each other geographically (resp., User A (Data A), User B (Data B), User C (Data C), and User D (Data D) in figure 4) and can clone data-sets within the fog and apply customized software to perform their own analyses and derive new results. Independent investigators, possibly on opposite ends of the globe, can copy original/primary data-sets, software and published results within the fog/cloud to replicate published analyses and create near-identical hardware and software setups.
The Horizontal traffic Offloading (HO) capability offered by the nodes allows the implementation of the inter-clone Co-Fog network by sustaining the required C2C interactions (see the blue paths of figure 4), while the corresponding Vertical traffic Offloading (VO) capability makes User-to-Clone and Clone-to-Cloud synchronization feasible (see the red paths of figure 4). The fog nodes fully support data mining, Clone-to-User communication, and inter-clone communication (see the green paths of figure 4). In this way, the native resources of the physical things are expected to be saved. The Fog is also accessible globally, so that an investigator anywhere in the world can rent resources from a provider, regardless of whether the investigator is near a data center. Data can be secured and controlled by the collaborators without having to navigate several institutions' firewalls. Team members can use the same commands to run the same analysis on the same (virtualized) hardware and software. This makes Co-Fog an attractive venue for small/large genomics interactions/collaborations and also an important tool in the effort to promote robust sharing of genomics data (NIH, 2014; cli, 2017) (see figures 3 and 4).
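A toy decision rule contrasting the horizontal (clone-to-clone) and vertical (clone-to-cloud) offloading paths of figure 4 might look like the sketch below; the latency figures, capacity and deadline values are assumptions for illustration only.

```python
# Minimal sketch of choosing between Horizontal Offloading (HO, to a peer
# fog clone) and Vertical Offloading (VO, to the remote cloud), in the
# spirit of the blue and red paths of figure 4. Numbers are hypothetical.

FOG_PEER_LATENCY_MS = 10      # assumed one-hop inter-fog latency
CLOUD_LATENCY_MS = 120        # assumed multi-hop WAN latency
FOG_PEER_CAPACITY_GB = 50     # assumed free storage on the nearest peer clone

def choose_offloading(task_size_gb: float, deadline_ms: float) -> str:
    """Prefer a nearby fog clone when it can hold the data and meet the deadline."""
    if task_size_gb <= FOG_PEER_CAPACITY_GB and FOG_PEER_LATENCY_MS <= deadline_ms:
        return "HO: offload to peer fog clone"
    if CLOUD_LATENCY_MS <= deadline_ms:
        return "VO: offload to remote cloud"
    return "process locally: no path meets the deadline"

print(choose_offloading(task_size_gb=20, deadline_ms=50))    # -> HO
print(choose_offloading(task_size_gb=500, deadline_ms=200))  # -> VO
```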
Given the complexity of genomics studies and the need to enroll patients at geographically dispersed study sites, collaboration on large-scale genomics sequencing projects at multiple sites is fairly common.

Figure 4: An illustrative example of the Fog-over-Users interplay. Fc = Fog clone; VO = Vertical Offloading; HO = Horizontal Offloading; C2C = Computer-to-Computer.

Before the
computational analyses begin, all relevant data are collected at whichever site has the requisite computing capacity and experience. If more than one site is to analyse the complete data-set, the data must be copied; the larger and the more decentralized the project, the more copies must be made. The collaborators at the various sites can use computers located near the data. Overall, fog elasticity allows investigators to scale computing resources in proportion to the amount of data being analysed, avoiding restrictions imposed by local clusters. Input data can be downloaded directly to the fog nodes that will process them, without first going through a particular investigator's cluster.
In some cases, the data may already be preloaded into a fog (e.g., the International Cancer Genome Consortium (ICGC) (ICGC, 2018) data are available from the Cancer Genome Collaboratory). If data are protected (e.g., dbGaP), it is possible that existing protocols will make it possible to create a compliant fog-based computational setup (Nellore et al., 2016; Langmead & Nellore, 2018). The commands used to rent the cluster and run the software can be published or shared so that collaborators can do the same, avoiding inter-cluster compatibility issues. In the future, a series of studies could apply the Co-Fog architecture to study large collections of publicly archived data.
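One lightweight way to publish the provisioning-and-analysis commands mentioned above is as a small declarative specification that every collaborating site replays; the field names and values below are invented for illustration.

```python
# Minimal sketch of a shared, replayable analysis specification, so that
# collaborators at other sites can rent the same (virtualized) resources
# and run the same commands. All field values are hypothetical.
import json

analysis_spec = {
    "dataset": "ICGC-preloaded",            # data already resident in the fog
    "instances": {"count": 4, "cpu": 4, "ram_gb": 8},
    "container_image": "example/aligner:1.0",
    "command": ["align", "--reference", "GRCh38", "--threads", "4"],
}

# Publish the spec alongside the results; a collaborator re-runs it verbatim.
print(json.dumps(analysis_spec, indent=2))
```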
4 The Proposed Co-Fog Paradigm
The introduction of a model for distributed collaboration ties in the data realm modifies the way in which users and things utilize the C2C interaction model to establish collaboration links and ties. By design, under the C2C model, users limit themselves to setting general rules that define the community of collaborating smart devices and the services it provides; the computers and smart devices of the community are both producers and consumers of data and information.
4.1 Fog Interfaces with Cloud, other
Fog Nodes, and Users
As mentioned above, Fog computing extends the functionality of Cloud computing, with more elasticity, to the edge of the core network, sharing the same processing strategies and features (virtualization) and making nontrivial computation services extendable.
Figure 5: Envisioned architecture for the Co-Fog technological platform for distributed collaboration. Fc-A1, Fc-A2 and Fc-A3 (resp., Fc-B1 and Fc-B2) are clones; CNT = Container; F2U = Fog-to-User; U2F = User-to-Fog; F2F = Fog-to-Fog.

The Fog computing architecture allows processing, networking, and storage services to dynamically
transfer data across the fog node, cloud, and user continuum. However, the interfaces through which the fog interacts with the cloud, other fogs, and users must facilitate flexibility and the dynamic relocation of the computing, storage, and control functions among these different entities. This will enable a well-situated end-user assessment of Fog computing services and will also allow capable and effective Quality of Service management (Anawar et al., 2018).
Roughly speaking, the proposed integrated Co-Fog architecture is based on the following main building blocks (see figure 5).
Fog-to-Cloud: A fog-to-cloud interface can be considered compulsory to support fog-to-cloud (and vice-versa) collaboration, which provides back-to-back services. The fog-to-cloud interface also supports functionalities such as: i) functions at the fog being supervised/managed with Cloud computing capabilities; ii) the cloud and the fog transferring data to each other for processing and comparison; iii) the cloud deciding to distribute/schedule fog nodes for allocation of services on demand; iv) the cloud and the fog mutually differentiating for better management of computing services with each other, and; v) the cloud making its services available to users through the fog. It is essential to find out which information and services should be passed transversely between fog and cloud. The regularity and granularity of such data and information should determine how the fog or the cloud responds to those data.
In figure 5, we observe Fog-to-Cloud (multi-hop) Internet connections, allowing fog clones to import/export data from/to the remote network.
Fog-to-Fog: Fog nodes can pool resources to support each other's processing. For example, all deployed fog nodes share their data storage, computing, and processing tasks with a prioritized node functionality system for one or several users. Multiple fog nodes might also act as backups for each other.
In figure 5, we can see the inter-Fog backbone that provides inter-Fog connectivity and makes inter-Fog resource pooling feasible. The inter-Fog backbone allows the hosted clones to exchange data by establishing C2C collaboration over an inter-clone overlay network relying on TCP/IP end-to-end transport connections and a set of inter-connected fog nodes (e.g., Fog Node A and Fog Node B) that act as virtualized cluster headers. These virtualized cluster headers host the clones (e.g., Fc A and Fc B) of the involved physical devices (e.g., Users A1, A2 and A3 and Users B1 and B2). The clones exploit the Fog support in order to augment the computing-communication capabilities of the associated devices. A commodity wired Giga-Ethernet switch provides intra-Fog connectivity.
Fog-to-User: The Fog-to-User interface essentially needs to allow users to access fog services in a user-friendly environment and to provide resources efficiently and securely.
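The three interface roles described in this subsection (fog-to-cloud, fog-to-fog and fog-to-user) can be summarized in a small sketch; the class and method names are invented here and do not correspond to any concrete API from the article.

```python
# Minimal sketch of the three Co-Fog interfaces of section 4.1.
# All names are hypothetical; real deployments would use concrete
# transports (TCP/IP over WiFi, Ethernet, WAN) as described in the text.
from dataclasses import dataclass, field

@dataclass
class FogNode:
    name: str
    peers: list = field(default_factory=list)     # fog-to-fog resource pooling
    cloud_url: str = "https://cloud.example.org"  # fog-to-cloud back end

    def f2u_serve(self, user: str, request: dict) -> dict:
        """Fog-to-User: answer a request locally whenever possible."""
        print(f"{self.name}: serving {user} locally")
        return {"status": "ok", "served_by": self.name, "request": request}

    def f2f_pool(self, task: str) -> None:
        """Fog-to-Fog: share a task with peer fog nodes for backup/processing."""
        for peer in self.peers:
            print(f"{self.name}: pooling task '{task}' with {peer.name}")

    def f2c_sync(self, summary: dict) -> None:
        """Fog-to-Cloud: push aggregated results to the remote cloud."""
        print(f"{self.name}: syncing {summary} to {self.cloud_url}")

fog_a, fog_b = FogNode("Fog-A"), FogNode("Fog-B")
fog_a.peers.append(fog_b)
fog_a.f2u_serve("User A1", {"op": "align", "sample": "S1"})
fog_a.f2f_pool("variant-calling backup")
fog_a.f2c_sync({"samples_processed": 2})
```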
Figure 5 illustrates the interfaces of the fog with both the cloud and the users through a hierarchically distributed Fog computing structure along the continuum. It shows the evolution from traditional Cloud computing towards Fog computing deployed nearby, visualizes which type of interface is involved at each level (e.g., fog-to-cloud, fog-to-fog, and fog-to-users), and points out why these essential technologies are combined into a single platform (Fog computing).
According to figure 5, a fog node covers a spatial area Da (m) and serves a cluster of users. The fog node can comprise a number of homogeneous quad-core Dell PowerEdge-type physical servers, which are equipped, as an example, with a 3.06 GHz Intel Xeon CPU and 8 GB of RAM. Each server may host a maximum number of Docker-type containers (Bernstein, 2014); the size of a container is usually within tens of MB (Zhang et al., 2018). Each container clones a thing (e.g., a user) and, according to figure 6, is equipped with a virtual processor with a number of homogeneous virtual cores. Each thing (user) is associated with a software clone (e.g., a virtual avatar) that is hosted by the serving fog node.
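Using the example server figures above (8 GB of RAM and containers within tens of MB), a back-of-the-envelope estimate of the attainable virtualization density could look like the following; the per-container footprint and host reservation are assumed values, not measurements from the article.

```python
# Rough capacity estimate for one fog server from the example in the text:
# 8 GB of RAM, Docker-type containers of (assumed) ~50 MB each, with some
# memory reserved for the host operating system and the container engine.
SERVER_RAM_MB = 8 * 1024
HOST_RESERVED_MB = 1024          # assumed headroom for HOS + container engine
CONTAINER_FOOTPRINT_MB = 50      # assumed size of one user clone container

max_clones = (SERVER_RAM_MB - HOST_RESERVED_MB) // CONTAINER_FOOTPRINT_MB
print(f"Approximate clones per server: {max_clones}")   # ~143 under these assumptions
```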
In figure 5, the wireless access network supports F2U/U2F communication through TCP/IP connections running atop IEEE 802.11/15 single-hop links.
Virtualization layer: This layer allows each thing (user) to augment its limited resources by exploiting the computing capability of a corresponding virtual clone. The virtual clone runs atop a physical server of the fog node that currently serves the cloned user.
User-to-Clone: User-to-Clone connections allow the physical devices to synchronize with the corresponding clones by exploiting the support of single-hop wireless access links.
The service models supported by the Co-Fog platform call for two main remarks. First, since the fog nodes of figure 5 may play the two-fold role of offloading and aggregating points for the traffic generated by the underlying users, the Co-Fog paradigm is capable of supporting, by design, all of the Up/Down Offloading, Aggregation and Peer-to-Peer (P2P) service models. Second, we stress that a peculiar feature of the proposed Co-Fog paradigm is that the overlay network of figure 5 allows the implementation of the inter-user links to move from the device-based physical bottom layer to the clone-based virtual upper layer of figure 5.
The clone synchronizes with the corresponding thing and works on its behalf, in order to reduce the thing's energy consumption and to provide bandwidth/computing augmentation. U2F and F2U communications are guaranteed by transport (TCP/UDP-IP) connections that run atop IEEE 802.11/15 up/down single-hop links (see the red rays of figure 5). A (possibly wireless) broadband backbone interconnects all the fog nodes (see the blue rays of figure 5); its role is to allow inter-clone communication among different fog nodes. Thing clones are logically inter-connected by end-to-end transport (TCP/IP) connections, building up an overlay virtual network of inter-clones. For this purpose, intra-Fog wired Ethernet links (resp., inter-Fog backbone-supported wireless links) are used to instantiate transport-layer connections among clones hosted by the same fog node (resp., by different fog nodes). The corresponding clones establish a (bidirectional) collaboration. Afterwards, the clones exchange the data provided by their associated data generation and then perform the required (cumbersome) mining of all available data. For this purpose, the computing capability of the hosting fog nodes is exploited. Finally, each clone returns the mined data to its owner (user) by using the communication capability of the host fog node.
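The synchronize, exchange, mine and return cycle just described can be sketched per clone as follows; the data layout and the trivial "mining" step are placeholders assumed purely for illustration.

```python
# Minimal sketch of the clone collaboration cycle described in the text:
# 1) the user synchronizes its fog-hosted clone, 2) clones exchange data
# over the inter-clone overlay, 3) the clones mine the pooled data using
# the fog node's compute, 4) results are returned to the owning users.
# The "mining" step here is a trivial placeholder (counting shared items).

class Clone:
    def __init__(self, owner: str):
        self.owner = owner
        self.data: set = set()

    def synchronize(self, user_data: set) -> None:       # User-to-Fog (step 1)
        self.data |= user_data

    def exchange(self, peer: "Clone") -> set:             # C2C overlay (step 2)
        return self.data | peer.data

    def mine(self, pooled: set) -> dict:                   # fog-side compute (step 3)
        return {"owner": self.owner, "items_seen": len(pooled)}

clone_a, clone_b = Clone("User A"), Clone("User B")
clone_a.synchronize({"variant1", "variant2"})
clone_b.synchronize({"variant2", "variant3"})
pooled = clone_a.exchange(clone_b)
print(clone_a.mine(pooled))   # result returned to User A (step 4)
print(clone_b.mine(pooled))   # result returned to User B (step 4)
```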
In the data-generation layer, each site has some computational resources and generates data. Analyses that require the full data set are performed at multiple sites, requiring each of these sites to gather all portions of the data; as more fog nodes join the analysis, more copies must be made. Multiple fog nodes can organize themselves into a federated fog, in which each analysis of the data set is automatically coordinated to minimize data transfer: the computers located where a subset of the data is generated are used to analyse that subset. Fog nodes can also consolidate their data in a cloud-based data center where the analyses are performed.
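A federated fog that minimizes data transfer essentially schedules each analysis where its input already resides; a toy version of that placement rule, with invented node and data-set names, is sketched below.

```python
# Minimal sketch of federated-fog scheduling: run each analysis on the fog
# node that already holds the required subset, consolidating in the cloud
# only when no single node has the data. Names are hypothetical.

fog_catalog = {
    "fog-site-1": {"cohort_A"},
    "fog-site-2": {"cohort_B"},
}

def place_analysis(required: set) -> str:
    """Prefer a node holding all required data; otherwise fall back to the cloud."""
    for node, held in fog_catalog.items():
        if required <= held:
            return f"run on {node} (no data transfer needed)"
    return "consolidate data in the cloud data center and run there"

print(place_analysis({"cohort_A"}))               # local fog execution
print(place_analysis({"cohort_A", "cohort_B"}))   # cross-site analysis -> cloud
```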
4.2 Co-Fog in Genomics
Co-Fog will be a natural fit for sharing across borders; data may be housed in the originating jurisdiction, where control is maintained, while authorized access is available outside the jurisdiction through a common interface. Many collaborations already use the cloud to consolidate project data. ENCODE (ENCODE, 2018) uses the DNAnexus (DNAnexus, 2018) platform for cloud-based analysis and data sharing, and DNAnexus in turn uses the infrastructure of AWS (Genomics, 2018). modENCODE (modENCODE, 2018) and ICGC (ICGC, 2018) both host their data sets in the cloud through AWS.
With Co-Fog, collaboration can also be implemented for the example given above, which uses the cloud alone. By using the Co-Fog platform, which encourages another dimension of strength borrowing, it becomes easier to leverage public data. Investigators can use public data to boost the power available to analyse a locally generated data set, a paradigm that already prevails in microarray data analysis. We expect that Co-Fog can be used for running large-scale analyses across multiple fogs, even to the point that new sequencing data analyses can be performed in the Fog, with the benefit of being able to see across many studies with important variables in common.
As an example, RNASeq-er enables investigators who have submitted unpublished sequencing data to the ArrayExpress archive to automatically analyse those data, free of charge, using computational resources at the EMBL-EBI, in the context of other public data in the archive. When the study is published and the data become public, the summarized results can be joined with those of the other published, archived studies. Thus, the investigator benefits before publication and the community benefits after (Langmead & Nellore, 2018).
4.3 The Container-based
Virtualization Technology
Virtualization is employed in Fog-based data centers in order to (Baccarelli et al., 2017): i) dynamically multiplex the available physical computing, storage and networking resources over the spectrum of the served devices; ii) provide a homogeneous user interface atop (possibly) heterogeneous served devices, and; iii) isolate the applications running atop the same physical servers, in order to provide trustworthiness.
Figure 6: Container-based virtualization of a physical
server equipping a Fog node (Virtualized server
architecture); VP=Virtual Processor.
Roughly speaking, in virtualized data centers,
each served physical device is mapped into a
virtual clone that acts as a virtual processor and
executes the programs on behalf of the cloned
device (Portnoy, 2012). In principle, two main
virtualization technologies could be used to attain
device virtualization, namely, the (more traditional)
Virtual Machine (VM)-based technology and the
Container based technology (Bernstein, 2014;
Soltesz et al., 2007).
According to figure 6, a virtualized physical server running at a fog node is composed of: i) containers, where each container plays the role of virtual clone for the associated physical thing. Hence, the container acts as a virtual processor and executes the tasks offloaded by the thing on its behalf. For this purpose, it is equipped with a Virtual Processor (VP) that runs at a (scalable) processing frequency f (bit/s) and is controlled by a Task Manager. The VP executes the programs stored by the corresponding Application Library (see figure 6); all the application libraries stored by the instantiated containers must be compliant with the Host Operating System (HOS) equipping the host physical server; ii) the pool of computing (e.g., CPU cycles), networking (e.g., I/O bandwidth) and physical resources made available by the CPU and Network Interface Card (NIC) that equip the host server. The Container Engine dynamically multiplexes the resources of the fog server over the set of hosted containers, and; iii) a HOS which is shared by all hosted containers (see figure 6) (Baccarelli et al., 2017).
Due to the expected large number of devices (users) to be virtualized, resorting to container-based virtualization allows the number of virtual clones per physical server (i.e., the so-called virtualization density) to be increased (Xu et al., 2014).
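For completeness, the following sketch shows how a fog server might instantiate one container-based clone per served user through the Docker Engine's Python SDK; it assumes a reachable Docker daemon, and the image name, memory limit and label keys are illustrative choices rather than values from the article.

```python
# Minimal sketch (assumes a local Docker daemon and the `docker` SDK):
# instantiate one lightweight container clone per served user, in the spirit
# of the container-based virtualization of figure 6. Image, memory limit and
# label names are hypothetical choices, not values from the article.
import docker

client = docker.from_env()

def spawn_clone(user_id: str):
    """Launch a container acting as the virtual clone of one physical user."""
    return client.containers.run(
        image="alpine:3.19",            # placeholder clone image
        command="sleep infinity",       # keep the clone alive awaiting offloaded tasks
        name=f"clone-{user_id}",
        mem_limit="64m",                # small footprint keeps virtualization density high
        labels={"cofog.role": "clone", "cofog.user": user_id},
        detach=True,
    )

for uid in ["A1", "A2", "A3"]:
    clone = spawn_clone(uid)
    print("started clone container:", clone.name)
```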
5 Conclusion and Future Work
Cloud computing is changing how large computational resources are organized and acquired. It is also changing how scientists and researchers in genomics collaborate and deal with vast archived data sets. Fog computing provides certain advantages over Cloud computing, such as faster data processing with reduced latency and location-based customization. However, Fog computing is not a replacement for Cloud computing, since Cloud computing will still be desirable for high-end data storage, analysis and processing jobs in scientific fields like genomics.
We adopt Fog computing as a necessary tool for advancing genomic research, enabling investigators to collaborate on and deal with vast sets of archived data and to establish immediate collaborations.
As more archived data are housed in the fog, fog computers that are physically proximate to the generated data can access them rapidly, enabling new modes of analysis, interaction and collaboration.
Co-Fog is a way to address the crucial issues that arise as scientists increasingly use large-scale genomics data to improve our understanding of biology and disease. Also, since genomic data are user sensitive, genomics investigators will need to understand the fog and remain responsible for their data stored in the fog.
The proposed Co-Fog model has the advantages of higher speed and greater accessibility and collaboration over Cloud computing, which makes it a preferred model for genomic data analysis; moreover, Co-Fog can be applied in several other research fields involving Big Data storage and analysis. Since Co-Fog relies on a distributed networked computing architecture, innovative solutions are expected to successfully tackle the issues of distributed security, in order to allow the migration of the Co-Fog paradigm from theory to practice.
Interest Conflict
The authors declare that there is no conflict of
interests regarding the publication of this paper.
References
(2017). Sharing clinical and genomic data on cancer: the need for global solutions. New England Journal of Medicine, 376(21), 2006–2009. PMID: 28538124.
Anawar, M. R., Wang, S., Azam Zia, M., Jadoon, A. K.,
Akram, U., & Raza, S. (2018). Fog computing:
An overview of big iot data analytics. Wireless
Communications and Mobile Computing, 2018.
Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz,
R., Konwinski, A., Lee, G., Patterson, D., Rabkin,
A., & Stoica, I. (2010). A view of cloud computing.
Communications of the ACM, 53(4), 50–58.
Baccarelli, E., Naranjo, P. G. V., Scarpiniti, M., Shojafar,
M., & Abawajy, J. H. (2017). Fog of everything:
Energy-efficient networked computing architectures,
research challenges, and a case study. IEEE access,
5, 9882–9910.
Baccarelli, E., Scarpiniti, M., Naranjo, P. G. V., &
Vaca-Cardenas, L. (2018). Fog of social iot: When
the fog becomes social. IEEE Network, 32(4), 68–80.
Bernstein, D. (2014). Containers and cloud: From lxc to
docker to kubernetes. IEEE Cloud Computing, 1(3),
81–84.
Bonomi, F. Connected vehicles, the internet of things,
and fog computing. In The Eighth ACM International
Workshop on Vehicular Inter-Networking (VANET),
Las Vegas, USA, pp. 13–15.
Bonomi, F., Milito, R., Natarajan, P., & Zhu, J. (2014).
Fog computing: A platform for internet of things
and analytics. In Big Data and Internet of Things:
A Roadmap for Smart Environments, pp. 169–186.
Springer.
Bonomi, F., Milito, R., Zhu, J., & Addepalli, S. Fog
computing and its role in the internet of things. In
Proceedings of the first edition of the MCC workshop
on Mobile cloud computing, pp. 13–16. ACM.
Botta, A., De Donato, W., Persico, V., & Pescapé, A. (2016). Integration of cloud computing and internet of things: a survey. Future Generation Computer Systems, 56, 684–700.
Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., &
Brandic, I. (2009). Cloud computing and emerging
it platforms: Vision, hype, and reality for delivering
computing as the 5th utility. Future Generation
computer systems, 25(6), 599–616.
Byers, C. C. & Wetterwald, P. (2015). Fog computing
distributing data and intelligence for resiliency and
scale necessary for IoT: The internet of things
(ubiquity symposium). Ubiquity, 2015(November), 4.
Charlebois, K., Palmour, N., & Knoppers, B. M. (2016).
The adoption of cloud computing in the field of
genomics research: the influence of ethical and legal
issues. PloS one, 11(10), e0164347.
DNAnexus (2018). A Global Network for Genomics.
Retrieved from https://www.dnanexus.com/
company.
Dove, E. S., Joly, Y., Tassé, A.-M., in Genomics, P. P. P., Committee, S. P. I. S., Burton, P., Chisholm, R., Fortier, I., Goodwin, P., & Harris, J. (2015). Genomic cloud computing: legal and ethical points to consider. European Journal of Human Genetics, 23(10), 1271.
ENCODE (2018). Encyclopedia of DNA Elements.
Retrieved from https://www.encodeproject.org/.
Firdhous, M., Ghazali, O., & Hassan, S. Fog computing:
Will it be the future of cloud computing? In
The Third International Conference on Informatics &
Applications (ICIA2014).
Forer, L., Schönherr, S., Weißensteiner, H., Specht, G., Kronenberg, F., & Kloss-Brandstätter, A. (2012). Cloud computing computational medicine. Computational Medicine.
Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski,
A., Lee, G., Patterson, D., Rabkin, A., & Stoica,
I. (2009). Above the clouds: A berkeley view of
cloud computing. Dept. Electrical Eng. and Comput.
Sciences, University of California, Berkeley, Rep.
UCB/EECS, 28(13), 2009.
Genomics, E. (2018). Data driven innovation. Retrieved
from https://www.eaglegenomics.com/tag/aws/.
Horne, J. (2018). Social implications of big data and fog
computing. International Journal of Fog Computing
(IJFC), 1(2), 1–50.
ICGC (2018). International Cancer
Genome Consortium. Retrieved from
https://www.nhlbi.nih.gov/science/
trans-omics-precision-medicine-topmed-program.
Kai, K., Cong, W., & Tao, L. (2016). Fog computing for
vehicular ad-hoc networks: paradigms, scenarios, and
issues. The Journal of China Universities of Posts and
Telecommunications, 23(2), 56–96.
Langmead, B. & Nellore, A. (2018). Cloud computing
for genomic data analysis and collaboration. Nature
Reviews Genetics, 19(4), 208.
Luan, T. H., Gao, L., Li, Z., Xiang, Y., & Sun, L. (2015).
Fog computing: Focusing on mobile users at the edge.
arXiv preprint arXiv:1502.01815.
Madsen, H., Burtschy, B., Albeanu, G., &
Popentiu-Vladicescu, F. (2013). Reliability in
the utility computing era: Towards reliable fog
computing. 20th International Conference on Systems,
Signals and Image Processing.
Mell, P. & Grance, T. (2011). The nist definition of cloud
computing. NIST.
modENCODE (2018). The National Human Genome
Research Institute model organism ENCyclopedia
Of DNA Elements. Retrieved from http://www.
modencode.org/.
NCBI (2018). Sequence Read Archive. Retrieved
from https://trace.ncbi.nlm.nih.gov/Traces/
sra/sra.cgi?
Nellore, A., Jaffe, A. E., Fortin, J.-P., Alquicira-Hernández, J., Collado-Torres, L., Wang, S., Phillips III, R. A., Karbhari, N., Hansen, K. D., & Langmead, B. (2016). Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the sequence read archive. Genome Biology, 17(1), 266.
NIH (2014). Trans-Omics for Precision
Medicine (TOPMed) Program. Retrieved
from https://www.nhlbi.nih.gov/science/
trans-omics-precision-medicine-topmed-program.
NIST (2018a). Definition of Cloud Computing. Retrieved
from https://csrc.nist.gov/publications/
detail/sp/800-145/final.
NIST (2018b). Fog Computing Conceptual Model. Retrieved from https://www.nist.gov/news-events/news/2018/03/nist-releases-special-publication-500-325-fog-computing-conceptual-model.
Portnoy, M. (2012). Virtualization essentials, volume 19.
John Wiley & Sons.
Ruiter, J. & Warnier, M. (2011). Privacy regulations
for cloud computing: Compliance and implementation
in theory and practice. In Computers, privacy and
data protection: an element of choice, pp. 361–376.
Springer.
Saharan, K. & Kumar, A. (2015). Fog in comparison to
cloud: A survey. International Journal of Computer
Applications, 122(3).
Shanker, A. (2012). Genome research in the cloud.
Omics: a journal of integrative biology, 16(7-8),
422–428.
Soltesz, S., Pötzl, H., Fiuczynski, M. E., Bavier, A., & Peterson, L. Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. In ACM SIGOPS Operating Systems Review, pp. 275–287. ACM.
Tang, B., Chen, Z., Hefferman, G., Pei, S., Wei, T., He,
H., & Yang, Q. (2017). Incorporating intelligence in
fog computing for big data analysis in smart cities.
IEEE Transactions on Industrial informatics, 13(5),
2140–2150.
Xu, X., Yu, H., & Pei, X. A novel resource scheduling
approach in container based clouds. In Computational
Science and Engineering (CSE), 2014 IEEE 17th
International Conference on, pp. 257–264. IEEE.
Zhang, Q., Liu, L., Pu, C., Dou, Q., Wu, L., & Zhou, W.
(2018). A comparative study of containers and virtual
machines in big data environment. arXiv preprint
arXiv:1807.01842.