IJSRD - International Journal for Scientific Research & Development | Vol. 5, Issue 12, 2018 | ISSN (online): 2321-0613

Data Management in the Cloud Computing


Ashish Adinath Vankudre
Department of Computer Science & Engineering
Adarsh Institute of Technology Vita, Maharashtra, India
Abstract: The current era of cloud computing opens another avenue for organizations to consider moving their data management solutions to the cloud. Recent successful deployments demonstrate that data management applications are among the most suitable candidates for deployment in the cloud. Data management in cloud computing brings many challenges as well as advantages. This paper presents the current state of data management solutions in the cloud and the current direction of research in data management in cloud computing. It should be kept in mind that cloud-based data management is a very fluid field in which technologies change rapidly. Industry is also investing more in it than in other fields of data management, which makes it an interesting subject for detailed study.

Key words: Cloud, Data Management, Cloud Storage, Data Security, Data Privacy

I. INTRODUCTION

The start of the last decade clearly showed that the pace and volume of data being generated exceed the current data management capacity of institutions. Cloud-based data management is in turn helping to realize the potential of large-scale data management solutions by providing effective scaling of resources. In a cloud-based data management scenario, institutions rent storage and computing power to run their data management applications rather than making a considerable in-house capital investment in infrastructure.

The primary advocates of public cloud services are the public cloud providers themselves, but the adoption of cloud-based systems has been relatively modest despite a significant marketing push by major public cloud providers (Geczy et al, 2013). As Mann and Morton described in their respective papers, data management in the cloud initially looks very promising from an economic point of view but becomes expensive in the long term. Present-day public cloud services are often economical for less than two years (Mann, 2011; Morton and Alford, 2009). Security, control and legal protection of data and services are among the most important aspects for organizations (Anthes, 2010; Lanois, 2010).

In a 2010 article, Shi describes the cloud. Although there is no standard definition of cloud computing, he describes its substantial features as scalability, fault tolerance, high performance cost, pay-as-you-go, etc. (Shi et al, 2010). Cloud-based data management systems provide a flexible and economical solution to scale horizontally with commodity hardware and, more importantly, the scaling of server resources is transparent to applications. It has been noted since the beginning of this decade that more and more companies are moving their data management applications from expensive, high-end servers to cheaper, cloud-based solutions.

Data management is one of the most important research areas in cloud computing. Many cloud-based data management systems are now in operation, such as BigTable at Google, Cassandra and Hive at Facebook, HBase at Streamy, PNUTS at Yahoo!, and many other systems. Cloud computing has become a major influence in data management research and plays a key role.

This research study, based on the noted technological literature, indicates that research in cloud-based data management systems is still in its early stages and that there is ample potential for IS researchers to contribute to this area of study. This paper indicates that researchers share common research interests in the following areas: data management architecture, data security and privacy in the cloud.

The rest of this article is organized as follows: Section 2 summarizes the scope and boundaries of this literature review. Section 3 describes the methodology used to identify the relevant articles from various journals and other sources. Section 4 outlines findings and explanations. Section 5 explains future work and conclusions.

II. SCOPE

Since data management is a broad area of study with many subtopics for research, this review sets its scope and boundaries to keep the study focused. This paper considers only papers that address the following research issues: data management architectures in the cloud, data security in the cloud and data privacy in the cloud. Although the review may contain papers from other subject areas, the focus is always on the mentioned areas.

III. METHODOLOGY

This paper is based on an analysis of relevant articles published in major information systems journals and conference proceedings. Its organization is influenced by the suggestions of Webster & Watson (2002). Three types of article searches were conducted, as suggested by Webster & Watson (2002):

1) Start with journal databases like ABI/Inform (ProQuest).
2) Go backward by reviewing the citations for the articles identified to determine prior articles.
3) Go forward by using the Web of Science (the electronic version of the Social Sciences Citation Index) to identify articles citing the key articles identified in the previous steps. Determine which of these articles should be included in the review.

Since data management in the cloud is a relatively new topic and many technological changes have happened in recent years, only articles from the last 10 years were considered for the review. Articles are included in this review only if they address any of the aspects mentioned in the scope section.

IV. ARTICLE RELEVANCE

In this section, the articles identified using the above-mentioned methodology are categorized with respect to the

All rights reserved by www.ijsrd.com 227


Data Management in the Cloud Computing
(IJSRD/Vol. 5/Issue 12/2018/063)

data management characteristics. The following characteristics were derived from the relevant articles after analyzing the content:

Data management architecture in cloud: Articles considered in this section deal with cloud-based data management solution architecture. Data management in the cloud is still a broad area, but the articles are limited to the topics of large-scale data storage, massively parallel query execution, and facilities for analytical and query processing.

Data security in cloud: An article is included in this category if it adds any value to the existing knowledge related to data security in the cloud.

Data privacy in cloud: An article is included in this category if it adds any value to the existing knowledge related to data privacy in the cloud.

A. Findings

Technologically, the cloud computing concept is nothing new (Howie, 2010). Distributed systems have existed for many years. It is very clear that industry is moving data out of its own data centers and into the cloud.

Data management in the cloud addresses the challenges of managing large collections of data in the cloud computing environment. Huge volumes of data in cloud computing environments pose big infrastructure challenges, including data storage, massively parallel query execution, facilities for analytical processing, and online query processing. There is a high degree of complexity involved in ensuring that these systems can sustain consistent and reliable operations under peak loads.

It is very clear from the paper that cloud-based data management systems will not replace the traditional RDBMS in the near future; however, they supply another choice for applications that are suitable for deployment in the cloud (Shi et al, 2010).

Among the existing cloud-based data management systems, BigTable, HBase, HyperTable, Hive and HadoopDB are mostly used for analytical data management applications, while PNUTS and Cassandra are used for web data management. The table below lists the file system used by each of the data management technologies used in popular cloud solutions.

Project      File System
BigTable     GFS
HBase        HDFS
HyperTable   KFS, HDFS
Hive         HDFS
Cassandra    Local File System

B. Data Security in Cloud

The WikiLeaks case clearly exposed the risks of adopting public cloud computing models and services (Sternstein, 2011). As more and more organizations consider moving data to the cloud, and given the critical nature of the applications, it is important that clouds be secure. The major security challenge with clouds is that the owner of the data may not have control of where the data is placed. The virtualization paradigm in cloud computing results in several security concerns (Hamlen et al, 2010).

One of the main security issues authors point out is that users are unaware of cloud security. Cloud users may think they do not have to worry about the security of their software and data anymore because they are in expert hands (Anthes, 2010).

Many articles state that organizations utilizing public clouds lose control over their critical data and services. Once external entities gain control, the data is considered substantially less reliable than data held on organizational intranets. Even though existing solutions offer a data storage service with good dependability, accessibility and availability guarantees, and a geographically independent location, the adoption of these solutions can be more problematic. As more businesses move their data into cloud-based storage platforms, security concerns remain under-appreciated.

As Ferreira mentioned in a 2012 article, organizations should maintain control over their critical data, services and infrastructure. Non-critical elements can be outsourced to external providers, while critical organizational data, services, and the infrastructure to access them are kept in-house. Cryptographic Cloud Storage is another concept proposed by researchers; it offers virtual private storage with the security of a private cloud and the cost savings of a public cloud (Kamara & Lauter 2010).

C. Data Privacy in the Cloud

The issue of protecting confidential data is not new. There has been extensive research in the area of statistical databases. There are increasing concerns about invasions of, and potential threats to, the privacy of personal information by information technology. Other studies on privacy-preserving data management can be found in Estivill-Castro and Brankovic (1999), Atallah et al. (1999), and Verykios et al. (2004). But most of the studies in this stream of research tend to approach the privacy issue from a data miner’s standpoint.

Authors give several suggestions regarding data privacy. A few of them suggest using categorical data to mitigate privacy concerns. Merging categorical values can also reduce the proportion of identifiable records (Iyengar 2002). Another interesting research subject is limiting the disclosure of confidential data for identifiable records when the data is provided to analysts for classification. This can be done automatically or through a manual process. Various researchers have proposed different techniques. One technique is a data perturbation method that organizations can use to prevent disclosure of confidential information while providing the data to analysts for data mining (Li and Sarkar, 2006). Another approach to privacy protection for categorical data is data swapping, suggested by Schlörer (1981) (who used the term “data transformation”) and Dalenius and Reiss (1982). Fienberg et al. (1998) proposed a loglinear model-based perturbation method that generates sample data based on the empirical multivariate distribution.

There is also a legislative or governmental aspect to data privacy. For example, in the United States, the USA PATRIOT Act allows the government to demand access to the data stored on any computer.

V. FUTURE RESEARCH

Knowledge creation refers to the development of new tacit or explicit knowledge from data and information or from the synthesis of prior knowledge (Becerra-Fernandez et al., 2004). This is important because it enables researchers to
move toward new research frontiers. Even though increasing research interest is focused on this data management area, people still need to exchange their ideas and results. This review also aims to reflect top research progress in the cloud data management area.

Cloud data management presents many challenges, including problems of scale (storing petabytes of data, providing massively parallel query execution, facilities for analytical processing, and online query processing), security and privacy, and environmental concerns. This study also addresses the practical adoption of public, private and hybrid cloud architectures. They differ in accessibility, ownership and location of cloud-based environments. Since organizations can exercise full control over their data, services, resources and infrastructure, private clouds are the most beneficial for organizations (Orakwue, 2010). But research in this area is not yet widespread.

There is also a hybrid model for data management in the cloud, since public clouds are the riskiest and most disadvantageous for organizations (Hofmann and Woods, 2010). With public clouds, organizations lose control over their valuable data, services and infrastructure. Hybrid clouds represent a combination of private and public clouds (Sotomayor et al., 2009). Research in this area is limited in terms of accessibility, analytical processing and query processing, but there is still room for further study. Suitable strategies for effective management of hybrid clouds are also an area to be explored.

One disadvantage of grouping, swapping or merging is that they also reduce data quality. There is very little research addressing this issue, and this problem may need further investigation.

VI. CONCLUSION

After going through various cloud-based data management information, it is almost certain that large-scale data analysis tasks, decision support systems, and application-specific data marts are more likely to take advantage of cloud computing platforms than operational, transactional database systems. The current research indicates that most of the research in this area is happening in conjunction with the basic cloud principles, such as dependability, availability, security, and privacy. This paper provides an insight into past and present cloud-based data management issues and current research interests. This insight allows for identifying the gaps in the previously mentioned area of data management. Cloud-based data management study areas such as private and personal cloud and data privacy still require more detailed study and research.

REFERENCES

[1] Alina Dulipovici and Daniel Robey (2013). Strategic Alignment and Misalignment of Knowledge Management Systems: A Social Representation Perspective. Journal of Management Information Systems, Spring 2013, 29(4), 103–126.
[2] Butler, T., Murphy, C. (2007). Understanding the Design of Information Technologies for Knowledge Management in Organizations: A Pragmatic Perspective. Information Systems Journal, 17(2), 143–163.
[3] Hamlen, K., Kantarcioglu, M., Khan, L., Thuraisingham, B. (2010). Security Issues for Cloud Computing. International Journal of Information Security and Privacy, 4(2), 36–48.
[4] Levina, N., and Vaast, E. (2005). The emergence of boundary spanning competence in practice: Implications for implementation and use of information systems. MIS Quarterly, 29(2), 335–363.
[5] Ravishankar, M.N.; Pan, S.L.; and Leidner, D.E. (2011). Examining the strategic alignment and implementation success of a KMS: A subculture-based multilevel analysis. Information Systems Research, 22(1), 39–59.
[6] Tiwana, A. (2012). Novelty-knowledge alignment: A theory of design convergence in systems development. Journal of Management Information Systems, 29(1), 15–52.
[7] Rizwan Mian, Patrick Martin (2012). Executing data-intensive workloads in a Cloud. 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[8] Yingjie Shi, Xiaofeng Meng, Jing Zhao, Xiangmei Hu, Bingbing Liu and Haiping Wang (2010). Benchmarking Cloud-based Data Management Systems. CloudDB’10, Toronto, Ontario, Canada. ACM 978-1-4503-0380-4/10/10.
[9] Bernardo Ferreira, Henrique Domingos (2012). Management and Search of Private Data on Storage Clouds. Center for Informatics and Information Technologies. SDMCMM’12, December 3-4, 2012, Montreal, Quebec, Canada.
[10] Xiaofeng Meng, Adam Silberstein, Fusheng Wang (2012). Information and Knowledge Management. CIKM’12, October 29–November 2, 2012, Maui, HI, USA. ACM 978-1-4503-1156-4/12/10.
[11] Peter Géczy, Noriaki Izumi, Kôiti Hasida (2013). Hybrid cloud management: Foundations and strategies. Review of Business and Finance Studies, (4) 1.
[12] Hussam Abu-Libdeh, Lonnie Princehouse, Hakim Weatherspoon (2010). RACS: A Case for Cloud Storage Diversity. ACM 978-1-4503-0036-0/10/06.
[13] Anthes, G. (2010). Security in the Cloud: Cloud Computing Offers Many Advantages, but Also Involves Security Risks. Communications of the ACM, 53(11), 16–18.
[14] Xiao-Bai Li, Sumit Sarkar (2006). Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data. Information Systems Research, (17) 3, 254–270.
[15] Iyengar, V. S. (2002). Transforming data to satisfy privacy constraints. Knowledge Discovery Data Mining. ACM Press, New York, 279–288.
[16] Daniel J. Abadi (n.d.). Data Management in the Cloud: Limitations and Opportunities. IEEE Computer Society Technical Committee on Data Engineering.
[17] Gary Anthes (2010). Security in the Cloud. Communications of the ACM, (53) 11.
[18] S. Ghemawat, H. Gobioff, and S.-T. Leung (2003). The Google File System. In Proceedings of SOSP’03, New York, 29–43.