Abstract
This paper examines the impact of data migration in the National Enterprise Wide Statistical System (NEWSS) on the Department of Statistics Malaysia (DOSM). The data are drawn from NEWSS Phases I and II and include data from the Economic Census 2005 and 2010. The ETL (extract, transform, load) model is a comprehensive method for studying NEWSS data migration because it allows the effect on the data to be investigated across sectors of the department. The general finding of this paper is that the contribution of data migration activities to the operation of NEWSS increased significantly in 2014 compared to 2011. This is in line with the Department's mission and objective to produce national statistics with integrity and reliability through the use of the best technology, and to improve and strengthen statistical services and the delivery system.
Keywords: Database migration, Data Migration, ETL, Objective and Mission
1. Introduction
With the rapid growth of business requirements in DOSM and new enterprise-wide application integration, organizations reach a stage where they have to move from separate databases on multiple platforms to a single, integrated one. Migration also happens when an organization realizes that its existing systems have performance and scalability limitations that cannot cater to its ever-expanding business needs.
Data migration is the process of transferring data between storage types, formats, or computer systems. It is required when organizations or individuals change computer systems, upgrade to new systems, or merge systems. Data migration is usually performed programmatically to achieve an automated migration.
Figure 1: Data migration flow in DOSM
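The following is a minimal sketch of an automated migration between storage formats. It assumes a legacy CSV export and an SQLite target. The table name `establishment`, its columns and the sample rows are hypothetical and only illustrate the pattern: read the source format, convert each field to the target type, and load everything in one transaction.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export in CSV format (illustrative rows, not DOSM data).
LEGACY_CSV = """estab_id,state,revenue
1001,Selangor,2500.50
1002,Johor,1800
1003,Perak,
"""

def migrate_csv_to_sqlite(csv_text: str, conn: sqlite3.Connection) -> int:
    """Load every CSV record into the target table and return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS establishment ("
        "estab_id INTEGER PRIMARY KEY, state TEXT NOT NULL, revenue REAL)"
    )
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        # Convert text fields to target types; an empty revenue becomes NULL.
        revenue = float(rec["revenue"]) if rec["revenue"] else None
        rows.append((int(rec["estab_id"]), rec["state"], revenue))
    with conn:  # one transaction: commit on success, roll back on any error
        conn.executemany("INSERT INTO establishment VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
loaded = migrate_csv_to_sqlite(LEGACY_CSV, conn)
print(loaded, conn.execute("SELECT * FROM establishment ORDER BY estab_id").fetchall())
```

The load runs in a single transaction. If any row fails to insert, the target table is left empty rather than half-filled, so the migration can simply be rerun.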
There is a difference between data migration and database migration, though database migration also encompasses data migration. Database migration involves the movement of data and the conversion of various other structures and objects associated with the database, including the schema and the applications associated with the current system, to a different technology/platform. Database migration is one of the most common, but also one of the most substantial, tasks in any application migration. Examples of activities involved in database migration are
Data migration is simply the movement of data from one database (or file system)/platform to another. This may include extracting the data, cleansing it and loading it into the target database. For example, a newly developed application needs the existing data in order to operate. In this case only the data are moved, from the source database to the database used by the new application.
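The extract, cleanse and load steps described above can be sketched as follows. The sketch assumes two SQLite databases: a legacy source and the database of a newly developed application whose schema already exists. Only the data are moved. The table names (`resp_old`, `respondent`) and the cleansing rules (skip rows without an ID, trim text, upper-case state names) are hypothetical.

```python
import sqlite3

def extract(src: sqlite3.Connection) -> list:
    # Extract: read the raw rows from the hypothetical legacy table.
    return src.execute("SELECT id, name, state FROM resp_old").fetchall()

def cleanse(rows: list) -> list:
    # Cleanse: drop rows without an ID, trim text, normalise state names.
    clean = []
    for rid, name, state in rows:
        if rid is None:
            continue
        clean.append((rid, (name or "").strip(), (state or "").strip().upper()))
    return clean

def load(dst: sqlite3.Connection, rows: list) -> None:
    # Load: insert into the schema the new application already uses.
    with dst:
        dst.executemany("INSERT INTO respondent (id, name, state) VALUES (?, ?, ?)", rows)

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE resp_old (id INTEGER, name TEXT, state TEXT)")
src.executemany("INSERT INTO resp_old VALUES (?, ?, ?)",
                [(1, " Shop A ", "selangor"), (None, "orphan", "perak"), (2, "Company B", " johor ")])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE respondent (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")

load(dst, cleanse(extract(src)))
print(dst.execute("SELECT * FROM respondent ORDER BY id").fetchall())
```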
Put simply, database migration refers to a shift from one type of database system to an entirely new type of database system, or to a database system with entirely new features and functionality. Hence data migration is a subset of database migration activities, though data migration may also be carried out independently.
An interesting question is why it is necessary to move to another database while the existing systems are running with the current database. The reasons why data migration is important in DOSM are:
1. Avoiding business failure
2. Improving corporate performance and delivering competitive advantage
3. Efficient and effective business processes (centralized database)
4. A measurable and accurate view of the data
5. Better perceived value in the newer system in terms of standardization of operational fieldwork and data entry
2. Literature review
A number of studies have examined best practices for data migration. Examples include Data Migration: Methodologies for Assessing, Planning, Moving and Validating Data Migration by IBM Global Technology Services (October 2009) and the work of NetApp Global Services (January 2006). A study on Database Migration Approach & Planning was also carried out by Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath (2002).
In Introduction on Patterns for Data Migration Projects (March 17, 2011), Martin Wagner concludes that the quality constraints on the data in the old system may be lower than the constraints in the target system. Inconsistent or missing data entries that the legacy system somehow copes with (or ignores) might cause severe problems in the target system. In addition, the data migration itself might corrupt the data in a way that is not visible to the software developers but only to business users.
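Wagner's observation can be made concrete with a small, hypothetical example. Here the target table enforces rules that the legacy system never checked: a non-empty name and a non-negative employee count. Each legacy row is loaded in its own transaction. Rows that violate the target constraints are collected for correction instead of aborting the whole migration. The table, columns and rules are assumptions for illustration, not NEWSS structures.

```python
import sqlite3

# Hypothetical target table enforcing rules the legacy system did not.
TARGET_DDL = """
CREATE TABLE establishment (
    estab_id  INTEGER PRIMARY KEY,
    name      TEXT NOT NULL CHECK (length(trim(name)) > 0),
    employees INTEGER NOT NULL CHECK (employees >= 0)
)
"""

# Hypothetical legacy rows: the old system accepted missing names and negative counts.
legacy_rows = [
    (1, "Shop A", 12),
    (2, None, 5),          # missing name
    (3, "Factory C", -1),  # impossible employee count
    (4, "Firm D", 40),
]

def migrate_with_report(rows, conn):
    """Insert rows one by one; return (id, error message) for each row the target rejects."""
    rejected = []
    for row in rows:
        try:
            with conn:  # each row commits or rolls back on its own
                conn.execute("INSERT INTO establishment VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError as err:
            rejected.append((row[0], str(err)))
    return rejected

conn = sqlite3.connect(":memory:")
conn.execute(TARGET_DDL)
rejected = migrate_with_report(legacy_rows, conn)
print([rid for rid, _ in rejected])
```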
NetApp Global Services (January 2006), in Data Migration Best Practices, states that for IT managers, data migration has become one of the most routine, and challenging, facts of life. With the increase in the percentage of mission-critical data and the proportionate increase in data availability demands, downtime, with its huge impact on a company's financial bottom line, becomes unacceptable. In addition, business, technical and operational requirements impose challenging restrictions on the migration process itself. Resource demands (staff, CPU cycles and bandwidth) and risks (application downtime, performance impact on production environments, technical incompatibilities, and data corruption/loss) make migration one of IT's biggest challenges. Since the majority of storage systems purchased by customers are used to store existing, rather than new, data, getting these new systems production-ready requires that data be copied or moved from the old system being replaced to the new system being deployed. Whether the migration is performed by internal IT or an external services provider, the migration methodology is the same.
On the other hand, IBM Global Technology Services (October 2009) notes that when systems must be taken down for migration, business operations can be seriously affected. A key way to minimize the business impact of data migration is to use best practices that incorporate planning, technology implementation and validation. Any change in the storage infrastructure, whether it is a technology refresh, consolidation, relocation or storage optimization, requires an organization to migrate data.
There are a variety of software products that can be used to migrate data, including volume-management products, host- or array-based replication products and relocation utilities, as well as custom-developed scripts. Each of these has strengths and weaknesses in terms of performance, operating system support, storage-vendor platform support and whether or not application downtime is required to migrate the data. Some of these products enable online migration of data, so applications don't need to be taken offline during the migration process. A subset of these provides nondisruptive migration, which means that applications not only remain online, but also that application processing continues without interruption or significant performance delays. Therefore, IT organizations should carefully explore software options. Specific requirements can help determine the best software technology to use for each migration.
In addition, Keshav Tripathy, Pragjnyajeet Mohanty and Biraja Prasad Nath, in Database Migration Approach & Planning, state that database migration consists of three major components:
Schema Migration - This consists of mapping the source schema to the target schema and migrating it. For this, the schema needs to be extracted from the source system and its equivalent replicated in the target system.
Data Migration - This is the part where the data are extracted from the source database, checked for consistency and accuracy, and cleansed if necessary. Finally, the data are loaded into the target system.
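The schema migration and data migration components described above can be sketched in Python. The sketch assumes an SQLite source and target and a hand-written mapping from source columns to target columns and types. The table names, column names and type choices are hypothetical. A real schema migration would also have to carry over keys, indexes and other database objects.

```python
import sqlite3

# Hypothetical mapping: source column -> (target column, target type).
# MSIC codes are kept as TEXT so that leading zeros survive.
COLUMN_MAP = {
    "EST_NO": ("estab_id", "INTEGER"),
    "EST_NAME": ("name", "TEXT"),
    "MSIC": ("msic_code", "TEXT"),
}

def migrate_schema(dst: sqlite3.Connection, table: str = "establishment") -> None:
    """Schema migration: create the equivalent table in the target system."""
    cols = ", ".join(f"{tgt} {typ}" for tgt, typ in COLUMN_MAP.values())
    dst.execute(f"CREATE TABLE {table} ({cols})")

def migrate_data(src: sqlite3.Connection, dst: sqlite3.Connection,
                 src_table: str = "EST_MASTER", table: str = "establishment") -> int:
    """Data migration: extract mapped columns, check them, load them; return rows loaded."""
    src_cols = ", ".join(COLUMN_MAP)
    tgt_cols = ", ".join(tgt for tgt, _ in COLUMN_MAP.values())
    rows = src.execute(f"SELECT {src_cols} FROM {src_table}").fetchall()
    clean = [r for r in rows if r[0] is not None]  # consistency check: key must exist
    marks = ", ".join("?" for _ in COLUMN_MAP)
    with dst:
        dst.executemany(f"INSERT INTO {table} ({tgt_cols}) VALUES ({marks})", clean)
    return len(clean)

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE EST_MASTER (EST_NO INTEGER, EST_NAME TEXT, MSIC TEXT)")
src.executemany("INSERT INTO EST_MASTER VALUES (?, ?, ?)",
                [(10, "Shop A", "47111"), (None, "Unknown", "01111"), (11, "Mill B", "10612")])

dst = sqlite3.connect(":memory:")
migrate_schema(dst)
loaded = migrate_data(src, dst)
print(loaded)
```

Keeping the mapping in one place means a changed source layout only requires editing `COLUMN_MAP`. Both the generated schema and the copy step follow from it.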
3. Methodology
3.1 Source of Data
The migration data are drawn from NEWSS Phases I and II and include data from the Economic Census 2005 and 2010.
3.3 Analysis
The processes that occur during migration are analysis, mapping, planning, designing, testing, loading and verifying.
The analysis takes place in the source system. The data are then extracted and transformed into the staging area. The staging area is a workspace where the data are cleaned, rules are applied and the data are validated before being loaded into the target.
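The staging step works as follows. Extracted records are transformed into a staging workspace, where a set of validation rules is applied. Only records that pass every rule move on to the target; the rest are held back with the reasons they failed. The record fields and the three rules (non-empty ID, five-digit MSIC code, non-negative revenue) are hypothetical examples, not DOSM's actual rules.

```python
from dataclasses import dataclass, field

@dataclass
class Staging:
    """Hypothetical staging workspace: accepted records and rejected ones with reasons."""
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

def is_msic(value) -> bool:
    s = str(value)
    return len(s) == 5 and s.isdigit()

# Illustrative validation rules: (description, predicate that must hold).
RULES = [
    ("establishment ID is missing", lambda r: bool(r.get("estab_id"))),
    ("MSIC code is not 5 digits", lambda r: is_msic(r.get("msic", ""))),
    ("revenue is negative", lambda r: r.get("revenue", 0) >= 0),
]

def transform(raw: dict) -> dict:
    """Trim text fields and zero-pad MSIC codes that were stored as integers."""
    rec = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    if isinstance(rec.get("msic"), int):
        rec["msic"] = f"{rec['msic']:05d}"
    return rec

def stage(source_rows) -> Staging:
    """Transform each extracted record into staging, then validate it against RULES."""
    staging = Staging()
    for raw in source_rows:
        rec = transform(raw)
        failures = [desc for desc, holds in RULES if not holds(rec)]
        if failures:
            staging.rejected.append((rec, failures))
        else:
            staging.accepted.append(rec)
    return staging

rows = [
    {"estab_id": "E1 ", "msic": 1111, "revenue": 500.0},
    {"estab_id": "", "msic": "47111", "revenue": 20.0},
    {"estab_id": "E3", "msic": "4711", "revenue": -5.0},
]
staging = stage(rows)
target = list(staging.accepted)  # only validated records move on to the target
```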
3.5
d) Data Source - The data sources and their format are verified in this phase.
e) Cleansing and Loading Data - The data sources are verified with the migration script to check whether the source data are clean. If the verification process shows errors, SMD needs to carry out detailed checking and correction. If the verification is successful, the data go to the final verification and are prepared for the next steps.
f) Production Loaded - The data that passed the final verification in step (e) are loaded into the Production database. After loading, SMD performs a number of tests on the loaded data using the NEWSS system. After successful testing, SMD approves the migrated data.
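The post-load testing in step (f) can be supported by a reconciliation check. The check compares the finally verified data with what actually landed in production, by row count and by a checksum computed in key order. This is a hypothetical sketch using two SQLite databases and an invented `census` table. It complements, rather than replaces, the functional testing SMD performs through the NEWSS system.

```python
import hashlib
import sqlite3

def fingerprint(conn: sqlite3.Connection, table: str, key: str) -> tuple:
    """Row count and SHA-256 digest of a table, read in key order so load order does not matter."""
    # Table and key names are trusted constants here; SQL identifiers cannot be bound as parameters.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()

def reconcile(verified: sqlite3.Connection, production: sqlite3.Connection,
              table: str, key: str) -> bool:
    """True when production holds exactly the finally verified data."""
    return fingerprint(verified, table, key) == fingerprint(production, table, key)

DDL = "CREATE TABLE census (estab_id INTEGER PRIMARY KEY, output REAL)"
data = [(1, 10.5), (2, 7.0), (3, 3.25)]

verified_db = sqlite3.connect(":memory:")
production_db = sqlite3.connect(":memory:")
for db in (verified_db, production_db):
    db.execute(DDL)
verified_db.executemany("INSERT INTO census VALUES (?, ?)", data)
production_db.executemany("INSERT INTO census VALUES (?, ?)", reversed(data))  # different load order

before = reconcile(verified_db, production_db, "census", "estab_id")
production_db.execute("UPDATE census SET output = 0 WHERE estab_id = 2")  # simulated corruption
after = reconcile(verified_db, production_db, "census", "estab_id")
print(before, after)
```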
3.6
3.7 Migration Tools
There are many software tools on the market for ETL purposes, and each has its own strengths and weaknesses. The tools explored during the migration process in DOSM are listed below.
a) Talend
Really powerful, stable and customizable
Quite easily embeddable; it produces Java code
The drawback is the learning curve
b) Pentaho
Its ETL tool (named Kettle) is just one component of the Pentaho Business Intelligence open platform
It is Java-based
The major drawback is that Kettle is much harder to extend than Talend
c) CloverETL
Relatively young
Light, easily embeddable and easy to learn
But much less powerful than Talend, and even than Kettle
3.8 Challenges
The challenges that arose during the migration process in DOSM are:
a) Analyze - Difficulty in the data collection process, because the data come from different sources, and misunderstanding of user requirements.
b) Mapping - Uncontrolled mapping versions due to frequent changes to the survey form.
c) Design Migration Script - Usually a time constraint problem, because a large amount of data must be migrated in a short time frame, sometimes following ad hoc requests from SMD.
d) Cleansing Data - It is hard to obtain the real final data because of several revision releases. Data cleansing sometimes needs to be confirmed several times by SMD, especially if there is a data problem during the ETL process.
e) Testing - Checking the data involves many reference tables, i.e. Establishment Frame, MSIC Code, Household Frame, locality, etc.
f) Data Loaded - Some data patching activities affect the new version of the final data.
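The testing challenge in (e), checking migrated data against many reference tables, can be partly automated. The sketch below lists every record whose value is absent from the corresponding reference table. The field names and the tiny stand-in reference sets for MSIC codes and the Establishment Frame are hypothetical.

```python
# Hypothetical reference tables (tiny stand-ins for MSIC and the Establishment Frame).
REFERENCE = {
    "msic": {"01111", "10612", "47111"},
    "estab_id": {"E1", "E2", "E3"},
}

def check_references(records, reference=REFERENCE):
    """Return (record index, field, value) for every value missing from its reference set."""
    problems = []
    for i, rec in enumerate(records):
        for fld, allowed in reference.items():
            if rec.get(fld) not in allowed:
                problems.append((i, fld, rec.get(fld)))
    return problems

migrated = [
    {"estab_id": "E1", "msic": "47111"},
    {"estab_id": "E9", "msic": "10612"},
    {"estab_id": "E3", "msic": "99999"},
]
problems = check_references(migrated)
print(problems)
```

Adding a reference table, for example a locality list, only requires a new entry in `REFERENCE`. The check itself does not change.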
4.2 Contribution of Migration activities to the DOSM
Based on the migration process for NEWSS, the result is shown in Table 2 below.
CURRENT                       FUTURE
No profiling activity         Establish phase profiling
No centralized final data     StatsDW
Patching data which ...
Repetition of the migration process impacts the
various versions of the final data
4.3
The paired t-test rejected the null hypothesis (p-value = 6.0 x 10^-9). The mean unskilled labour multiplier in 2010 is 1.2897, smaller than the 1.4539 recorded in 2005. This shows that the contribution of unskilled labour decreased significantly between 2005 and 2010. This is again in line with the government policy of reducing dependency on unskilled labour. The result is shown in Table E in the Appendix.
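For reference, the standard paired t-test on the sector-level multipliers can be written as follows. Here m_i^(2005) and m_i^(2010) are the multipliers of sector i in the two years and n is the number of sectors; this notation is ours rather than the paper's. The mean of the paired differences equals the difference of the means, so the reported means give a mean difference of 1.2897 - 1.4539 = -0.1642. This negative mean difference, together with p = 6.0 x 10^-9, indicates a significant decrease.

```latex
\begin{aligned}
d_i &= m_i^{(2010)} - m_i^{(2005)}, \qquad i = 1, \dots, n, \\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} d_i = \bar{m}^{(2010)} - \bar{m}^{(2005)} = 1.2897 - 1.4539 = -0.1642, \\
s_d &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(d_i - \bar{d}\bigr)^2}, \\
t &= \frac{\bar{d}}{s_d/\sqrt{n}} \;\sim\; t_{n-1} \quad \text{under } H_0\colon \mu_d = 0.
\end{aligned}
```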
4.3
Further investigation needs to be conducted across sectors to help policy makers determine in which sector(s) of the economy to spend one additional unit. A comparison of output multipliers would show where spending would have the greatest impact on the output or employment generated throughout the economy (Hussain, 2011).
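The sketch below shows how such multipliers are typically derived from an Input-Output table, using the textbook Leontief formulation. Let A be the technical-coefficient matrix and e the employment coefficients (jobs per unit of output). The total employment generated per unit of final demand in sector j is then the j-th entry of e^T (I - A)^-1, and a Type I multiplier divides that by e_j. The two-sector numbers are invented, and the paper's own definition of its skilled and unskilled multipliers may differ from this one.

```python
def inverse(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def employment_multipliers(a, e):
    """Return (total jobs per unit of final demand, Type I multipliers) for each sector."""
    n = len(a)
    i_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    leontief = inverse(i_minus_a)  # L = (I - A)^-1
    # Entry j of e^T L: jobs created across all sectors per unit of final demand for sector j.
    total = [sum(e[i] * leontief[i][j] for i in range(n)) for j in range(n)]
    type1 = [total[j] / e[j] for j in range(n)]  # total effect relative to the direct effect
    return total, type1

# Invented two-sector example (not DOSM data).
A = [[0.2, 0.3],
     [0.1, 0.4]]
e = [0.05, 0.02]
total, type1 = employment_multipliers(A, e)
print([round(x, 4) for x in total], [round(x, 4) for x in type1])
```

Ranking sectors by these values shows where one additional unit of final demand would generate the most employment, which is the comparison the paragraph above refers to.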
Table 2 portrays the ten sectors with the highest increase in the skilled labour multiplier between 2005 and 2010. The list is led by Measuring, Checking & Industrial Process Equipment (0.448), followed by Other Mining and Quarrying (0.355) and Recycling (0.336).
4.4 Contribution of Migration activities to the IT Division
Table 2: Top Ten Sectors with Highest Multiplier Increase of Skilled Labour
RANK  SKILLED 2005  SKILLED 2010  DIFFERENCES  SECTOR
1     0.29317       0.74101       0.44783      Measuring, Checking & Industrial Process Equipment
2     0.10271       0.45795       0.35524      Other Mining and Quarrying
3     0.16663       0.50256       0.33593      Recycling
4     0.18408       0.51117       0.32709      Confectionery
5     0.29967       0.61203       0.31236      Sheet Glass and Glass Products
6     0.71280       0.98839       0.27559      Communication
7     0.40515       0.64597       0.24082      Medical, Surgical and Orthopaedic Appliances
8     0.11221       0.34934       0.23712      Forestry and Logging
9     1.08238       1.29591       0.21353      Computer Services
10    0.84319       1.04517       0.20198      Financial Institution
Table 3 identifies five sectors that run contrary to the government's intention to increase the number of skilled workers in Malaysia's labour market. The skilled labour multipliers of these five sectors are smaller in 2010, which shows that the additional skilled employment generated in 2010 is smaller than in 2005. The largest multiplier decreases were recorded by Fertilizers (-0.524), Publishing (-0.251), Water Transport (-0.235), Air Transport (-0.234) and Other Private Services (-0.227).
Table 3: Bottom Five Sectors with Highest Multiplier Decrease of Skilled Labour
RANK  SKILLED 2005  SKILLED 2010  DIFFERENCES  SECTOR
120   0.88617       0.36188       -0.52430     Fertilizers
119   0.67956       0.42833       -0.25123     Publishing
118   0.81389       0.57923       -0.23466     Water Transport
117   1.16909       0.93468       -0.23441     Air Transport
116   0.41426       0.18688       -0.22738     Other Private Services
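The DIFFERENCES column follows from a simple calculation: the 2010 multiplier minus the 2005 multiplier, with sectors then sorted by that difference. The sketch below recomputes it for the five Table 3 sectors from their published, rounded values. Because the inputs are rounded to five decimals, a recomputed difference can differ from the published one in the last digit: Fertilizers gives -0.52429 against the published -0.52430. The overall ranks (116 to 120) come from the full set of sectors and cannot be reproduced from these five rows alone.

```python
# Skilled-labour multipliers for the five Table 3 sectors: (2005, 2010).
TABLE3 = {
    "Fertilizers": (0.88617, 0.36188),
    "Publishing": (0.67956, 0.42833),
    "Water Transport": (0.81389, 0.57923),
    "Air Transport": (1.16909, 0.93468),
    "Other Private Services": (0.41426, 0.18688),
}

# Difference = 2010 value minus 2005 value, rounded to five decimals.
diffs = {s: round(v2010 - v2005, 5) for s, (v2005, v2010) in TABLE3.items()}

# Sort from the largest decrease (most negative) upwards.
order = [s for s, _ in sorted(diffs.items(), key=lambda kv: kv[1])]
for sector in order:
    print(f"{sector:<24}{diffs[sector]:+.5f}")
```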
With the emergence of many new technologies in various sectors of the Malaysian economy, dependency on unskilled labour is expected to decrease. However, Table C shows 32 sectors that run contrary to this expectation, led by the ten sectors presented in Table 4. The unskilled labour employment multiplier of these sectors is greater in 2010 than in 2005. Hence these sectors should be given more consideration when the government decides where to spend one additional unit of investment, because most of them are seen to require more skilled labour.
Table 4: Top Ten Sectors with Highest Multiplier Increase of Unskilled Labour
RANK  UNSKILLED 2005  UNSKILLED 2010  DIFFERENCES  SECTOR
1     0.97630         1.59857         0.62227      Recycling
2     1.15466         1.75142         0.59676      n.a.
3     1.66741         2.22918         0.56177      n.a.
4     1.62424         2.09614         0.47189      n.a.
5     0.98516         1.41814         0.43299      n.a.
6     1.71867         2.12325         0.40457      n.a.
7     1.77653         2.15729         0.38076      n.a.
8     1.79615         2.11297         0.31681      n.a.
9     1.17550         1.44179         0.26629      n.a.
10    1.96180         2.20841         0.24661      n.a.
5. Concluding Remarks
The contribution of data migration to the operation of NEWSS has been demonstrated by the increasing demand from SMD for migrated data.
Skilled labour is also crucial in generating gross domestic product (GDP), improving the image of the industry and creating a productive workforce (Mohamad Faiz, 2008). Shalemy and Ishak (2011) concluded that skilled workers contribute positively to productivity in the manufacturing sector.
To achieve its goal of becoming a competitive, developed and high-income nation by 2020, the government has set a target of 50 per cent skilled labour in various fields. Currently the country has only 28 per cent skilled labour, and this is targeted to increase to 33 per cent next year.
Hence, this study was conducted to investigate whether the current labour market is in line with the government's intention. The employment multiplier was calculated using the Input-Output Model. In addition, the contributions of skilled and unskilled labour were investigated using a paired t-test. The results are presented in Tables 1-3. The analysis across sectors was conducted to identify in which sector(s) employment changes occurred (Tables 4-6, Tables A and B). The detailed information provided by the employment multipliers across sectors may serve as valuable input to national planners in determining which sectors need to be improved.
References
Chinkook, L. and Gerald, S. (1999). Effect of Trade on the Demand for Skilled and Unskilled Workers. Economic Systems Research, Vol. 11, No. 1.
http://www.theborneopost.com/2014/06/05/50-per-cent-skilled-workers-targeted-to-beproduced-by-2020/
http://www.investopedia.com/terms/s/skilled-labor.asp
http://www.investopedia.com/terms/u/unskilled-labor.asp
Hussain Ali Bekhet (2011). Output, Income and Employment Multiplier in Malaysian Economy: Input-Output Approach. International Business Research, Vol. 4, No. 1, January 2011.
Lowell, L. and Batalova, J. (2005). International Migration of Highly Skilled Workers: Methodological and Public Policy Issues. Population Association of America 2005 Annual Meeting Program.
Madeline B.D. (2010). Input-Output Multiplier Analysis for Major Industries in the Philippines. 11th National Convention on Statistics (NCS), EDSA Shangri-La Hotel, October 4-5, 2010.
MASCO (2010). Kementerian Sumber Manusia Malaysia (Ministry of Human Resources Malaysia).
Mohd Shalemy and Ishak Yussof (2011). High Skilled Workers and Productivity in Manufacturing Sector in Malaysia. Prosiding PERKEM VI, Vol. 2 (2011), pp. 308-318.
Raouf, R. and Hafid, H. (2014). Relocation and Inequalities between Skilled and Unskilled in Northern Countries: Simulation Using a CGE Model. International Journal of Economics and Financial Issues, Vol. 4, No. 4, pp. 758-772.
Roberts, J. and Skoufias, E. (1997). The Long-Run Demand for Skilled and Unskilled Labour in Colombian Manufacturing Plants. The Review of Economics and Statistics, Vol. 79, No. 2 (May 1997), pp. 330-334.
Siti Rahmah and Nurul Naqiah (2013). Foreign Employment Multiplier in Malaysia: An Input-Output Analysis. Presented at Technical Paper Presentation 2013.
Appendix
Table A: Total Employment Multiplier
      YEAR 2005             YEAR 2010
NO    UNSKILLED  SKILLED    UNSKILLED  SKILLED    SECTOR DESCRIPTION
1     1.2540     0.1411     1.0953     0.0750     Paddy
2     1.3874     0.1103     1.1298     0.1061     Food Crops
3     1.2790     0.1352     1.2033     0.1057     Vegetables
4     1.4993     0.1501     1.1368     0.1018     Fruits
5     1.2840     0.1135     1.3409     0.2722     Rubber
6     1.3644     0.1806     1.1720     0.1632     Oil Palm
7     1.7222     0.2440     1.1383     0.1033     Flower Plants
8     1.5405     0.1698     1.0837     0.1162     Other Agriculture
9     1.7204     0.2957     1.5253     0.2580     Poultry Farming
10    1.5852     0.2139     1.4931     0.2887     Other Livestock
11    1.1547     0.1122     1.7514     0.3493     Forestry and Logging
12    1.5551     0.1264     1.5381     0.3065     Fishing
13    0.6459     0.6614     0.4885     0.6990     n.a.
14    1.8025     0.5852     0.9398     0.4113     n.a.
15    1.6960     0.3903     1.1200     0.1649     n.a.
16    0.9485     0.1027     1.0553     0.4580     Other Mining and Quarrying
17    2.3614     0.2774     1.8916     0.4132     n.a.
18    1.9618     0.2189     2.2084     0.3500     Preservation of Seafood
19    1.7055     0.2993     1.5840     0.3437     Preservation of Fruits and Vegetables
20    1.7429     0.3560     1.5831     0.4450     Dairy Production
21    2.4162     0.3372     2.2614     0.3983     n.a.
22    1.5473     0.3093     1.5576     0.2303     Grain Mills
23    1.9725     0.3432     1.5538     0.3059     Bakery Products
24    1.4408     0.1841     1.0232     0.5112     Confectionery
25    1.8880     0.3311     1.3778     0.3827     n.a.
26    1.3848     0.4221     1.4322     0.4257     Animal Feeds
27    1.5638     0.1998     1.1577     0.3179     n.a.
28    1.6551     0.3963     1.4586     0.4141     Soft Drink
29    1.6868     0.3074     0.8491     0.4958     Tobacco Products
30    1.4124     0.4253     1.4117     0.4538     n.a.
31    2.1364     0.3862     1.5177     0.4488     Finishing of Textiles
32    1.7151     0.3636     1.3360     0.3228     Other Textiles
33    1.4665     0.2498     1.3262     0.2610     Wearing Apparel
34    1.6718     0.1378     1.4726     0.2935     Leather Industries
35    1.6293     0.2457     1.4782     0.2645     Footwear
36    1.6242     0.1823     2.0961     0.3414     n.a.
37    1.7765     0.2451     2.1573     0.3553     n.a.
38    1.7962     0.2553     2.1130     0.3912     n.a.
39    1.6674     0.1800     2.2292     0.3707     n.a.
40    1.7080     0.2299     1.8879     0.3955     n.a.