
48 International Journal of Asian Business and Information Management, 5(4), 48-59, October-December 2014

Knowledge Discovery for Tourism Using Data Mining and Qualitative Analysis:
A Case Study at Johor Bahru, Malaysia
Atae Rezaei Aghdam, Department of Information Systems, Universiti Teknologi Malaysia,
Johor Bahru, Malaysia
Mostafa Kamalpour, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru,
Malaysia
Alex Tze Hiang Sim, Department of Information System, Universiti Teknologi Malaysia, Johor
Bahru, Malaysia

ABSTRACT
This paper aims to propose a new guideline for analyzing tourist profiles as found in www.tripadvisor.com.
These have been examined from two different aspects so as to gain conclusive results. Tourist data were
“crawled” from tripadvisor.com through a specific web crawler. Mining techniques using a combination of
visualization, clustering, and association rules were instrumental in discovering the first set of interesting
knowledge. This was followed by a qualitative analysis applied through Nvivo software via coding of the
tourist’s comments in order to define the design of the prospective model. A final set of results was obtained once
both results confirmed each other. In this study, results show that there are several types of tourists; with each
group having different preferences. For example: male Singaporean visitors to hotels tend to enjoy wine and
food in addition to outdoor activities; while local visitors to Legoland are not satisfied with certain aspects,
such as the price of food. International tourists, however, consider the affirmative points of Legoland. This
research can be very useful for tourist associations and hotel managers in Johor Bahru.

Keywords: Data Mining, Johor Bahru, Knowledge Discovery, Malaysia, Qualitative Analysis, Tourism

1. INTRODUCTION

According to the annual report of Tourism Malaysia in 2010 (Tourism Malaysia, 2010), the tourism
industry has played an important role in increasing the GDP with some 24.6 million tourists arriving
in Malaysia. In addition, the World Tourism Organization estimates that global tourism is expected
to grow faster than other economic sectors in the world (Witten & Frank 2011). Uncovering new,
interesting and useful information on tourist data can be

DOI: 10.4018/ijabim.2014100105

Copyright © 2014, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

helpful for tourism organizations in identifying tourists’ behavior patterns and their preferences.
This, then, is a great opportunity for the Malaysian tourism industry to boost tourist arrivals and
increase revenue. This paper focuses on the role of data mining in the Malaysian tourism industry,
in particular regarding places of interest in Johor Bahru and hotels in the Mersing area. In essence,
we analyze tourist behavior patterns in Johor Bahru so as to discover useful and hidden knowledge
with which to recommend appropriate places to visit. Moreover, quantitative and qualitative analyses
have been undertaken simultaneously in this paper. Quantitative analysis was carried out with the
Weka machine learning software, while qualitative analysis was performed with Nvivo software based
on tourists’ comments. Each group of analysis findings supports the other in order to obtain
significant results. This study can assist tourist associations in Johor Bahru and travel agencies
in the promotion of places in Johor Bahru which would be attractive to both local and international
visitors. This paper is arranged in the following manner: Section 2 reviews related work on data
mining in the tourism industry; Section 3 explains our research methodology and pre-processing
procedure; and Section 4 analyzes the data and illustrates the “hidden knowledge” used to make
recommendations to tourists.

2. LITERATURE REVIEW

Currently, data mining supports various kinds of application tasks, from data pre-processing to
association rules discovery, data classification, and cluster analysis (Witten & Frank, 2011). It is
part of the decision-making process, and the ability to analyze data automatically helps to
determine a potential model. It also assists in estimating customer behavior, enhancing decision
makers’ ability to both adjust marketing strategy and reduce risks (Li, 2012; Han & Kamber 2006).
In another study (Bose, 2009), concentration was given to three main aspects of using data mining
in the tourism industry: forecasting tourist expenditure, analyzing profiles of tourists, and
forecasting the number of tourist arrivals. The author reported various results along these three
dimensions. For instance, in forecasting tourist expenditure, artificial intelligence techniques
such as neural networks were used for estimating tourist expenditure in the Balearic Islands.
Further, Au & Law (2002) used data mining techniques to predict shopping expenditure by tourists
with an accuracy level of 94%. In relation to analyzing tourist profiles, a clustering technique was
used to categorize tourists into specific groups such as developmental support, prudent developers,
ambivalent, cautious and protectionist (Bose, 2009). In forecasting tourist arrivals, some studies
examined tourist arrivals to Hong Kong from six different countries and showed that an Artificial
Neural Network (ANN) outperformed statistical methods. Bose (2009) further stated that, to date,
only some AI techniques such as ANN and clustering have been used in tourism data mining. These
techniques are readily available in prepackaged software, which can be used to analyze data with
little training. However, the author believed that, in the future, more than one method will be
applied for analyzing data. In this paper we focus on analyzing tourist profiles from two different
perspectives, namely quantitative and qualitative. In the following sections, the related studies
illustrate how traditional data mining techniques have been used in the tourism industry.

2.1. General Research into the Tourism Industry

Data mining offers different approaches by which to gain knowledge and subsequently use this
knowledge for enhancing revenue, identifying customers’ behavior patterns, making recommendations
to customers, etc. In the tourism industry in particular, a great deal of work has been devoted so
far, but still
the domain of research in this area is vast. More research studies are needed in order to provide
better results compared to what we have now. For instance, using data mining techniques based on the
information as mentioned can assist hotels to stay competitive. Hotels can identify the most
profitable methods by which to develop a customer loyalty program. For example, tourism revenue in
South Korea has doubled in the last decade. The reason for this is that South Korea expanded its
capacity for hosting visitors by renewing its hotels and increasing the number of hotel rooms.
Subsequently, competition between hoteliers increased. A survey of 281 customers helped to edge
closer to the answers to some questions, such as which customers are likely to return to the same
hotel, or which service attribute is more important to customers. The survey took place among 281
guests in 11 hotels in Seoul, South Korea. Many samples of characteristic information were collected
and considered as sources of data; these included price range, location and service amenities.
Visitors also provided the authors with data related to their demographic profile, including age,
occupation, gender, nationality and so on. All participants had visited at least one of the 11
hotels (Kosala, 2000). A study by Poon (2005) researched the level of satisfaction with Malaysian
hotels among travelers. The research question aimed to find differences between Asian and Western
travelers in an attempt to evaluate customer satisfaction issues with Malaysian hotels. The goal of
the study was to suggest some ways to improve hotel services, as well as examining outlying factors
that provide specific services, which may encourage visitors to revisit the country. Data was
collected and analyzed by distributing a questionnaire at KLIA Airport among 200 tourists who were
leaving the country. The results showed that 43% of visitors were in the age group of 21-35 and 32%
of them were in the age group of 36-50. The questionnaire included attributes relating to factors
such as cleanliness, rooms, location, sleep quality, value for money and services. Results show that
two of these attributes are extremely important for travelers, specifically price (value) and food.
The attribute value is highly associated with customer loyalty. The first result shows that Asian
travelers pay more attention to the prices offered by hotels as compared with Western travelers. The
second result shows that having good food and hospitality is important to both Western and Asian
groups. Similar research has been carried out by Choi (2000) regarding Asian and Western travelers
in Hong Kong. The data in this research was collected by means of a questionnaire distributed at the
Hong Kong airport among those visitors who live in the country. In the questionnaire, travelers were
asked questions relating to demographic and personal issues, as well as questions relating to the
hotels where visitors resided. The feedback regarding hotels included cleanliness, location, room
rate, service and value for money. 540 people responded to the questionnaire and the collected data
was analyzed by using factor analysis. Results showed that Asian travelers generally spend less
money than Westerners. One of the reasons for this is that most Asian countries are among developing
nations; as a result, salaries there are normally lower than those of Westerners. Analysis also
shows that Westerners spend approximately 45% of their budget on accommodation, whereas Asians spend
less than 25% on this. Another result, which supports the previous one, is that 70% of Asians look
for midrange hotels. Results also identified that Asians are more interested in shopping than
Westerners. Asians generally like to spend more than 50% of their budget on shopping. They also tend
to place more emphasis on value for money, whereas Westerners are concerned with room quality. The
final result shows that Asians are very interested in spending their time on entertainment during
the trip.

2.2. Data Mining in the Tourism Industry

Discovering new, stimulating and valuable knowledge through the use of a variety of techniques such
as classification, clustering,
visualization and association rules is the main aim of data mining (Romero, 2009). The use of data
mining in tourism has therefore become a necessity. In this section, we investigate some related
works regarding data mining in the tourism industry. For instance, Guoxia (2009) focused on using a
decision tree algorithm to analyze tourist markets from two different dimensions: impact factors of
tourist spending and impact factors of tourists’ comprehensive evaluation of tourist destinations.
The first step focuses on total expenditure of tourists in the classification objects in the
decision tree model, while the second step focuses on attributes and values including cost, traffic,
shopping environment, dining etc. By using the decision tree algorithm and association rules mining
to analyze the data, it was found that, for example, poor shopping environments affect tourist
sentiment even if they are located in an area having the best attractions. Another result
illustrates that, if tourism products focus only on an attraction’s landscape in itself, the product
would certainly lose value. Another study proposed a new semantic association rule mining algorithm,
which introduced a genetic algorithm. This method deals with textual information and divides
characteristic words into various categories in an attempt to find association rules between them.
The authors categorized tourist emergency information into five categories, namely object,
environment, activity, event and result. The author believes that this technique is useful for
searching rules in a database. The results indicate, further, that the algorithm can obtain semantic
rules effectively with a higher extraction rate and correctness. In extracted association rules with
high confidence, the appearing words present the concepts of an emergency event. Hence, several
rules discovered from small association word sets have been used to search for emergency reports of
tourism. Furthermore, semantic association rules mining would obtain a better result from a more
sophisticated text set of tourism emergency information (Zhou, 2008). Data mining is present in
every part of tourism such as the hotel industry; for instance, by offering tourist information
kiosks to provide customer information during their stay and then analyzing their data. A research
study was carried out in South Korea regarding data mining; the hotel industry profiled visitors
based on their preferences and comments. The author attempted to examine the effect of a customer’s
demographic profile on hotel choice. Visitors were categorized by some factors including country of
origin, occupation, sex, age etc.; as a result, interesting relationships existing between items in
a dataset were identified. Finally, the study came up with some “if-then” rules, which show the
probability of existence of some facts by analyzing the data. For instance, one of the rules
indicates that if a customer stays at the hotel for a convention, then he/she is probably a manager.
From this finding that the customer is a manager, we can acquire other information based on findings
about managers. As another example, if a customer prefers either a room with a single bed or a
suite, pays for the hotel room by credit card, and stays at the hotel for travel/pleasure, the
customer is likely to be Korean. Further, the author found that some combinations of customer
characteristics have more influence on the customer’s hotel selection and patronage behavior than a
single attribute. Hotel management would be wise to take into account a multitude of attributes such
as the customer’s demographic profile, travel purpose, prior service experiences, as well as the
availability of certain amenities (Min Hokey, 2002). These are just some examples among the rules
that were defined.

2.3. Techniques in Tourism Data

The most important unsupervised learning technique is that of clustering. Accordingly, attention is
focused on ways to identify a structure in a group or collection of unlabeled data in order to
organize elements into groups (Isa, 2009). Clustering refers to a process of grouping together a
collection of abstract or physical objects into similar groups. In a cluster, objects of one group
are different from
the objects of other groups (Etemadi, 2010). In this paper, we intend to discover interesting
knowledge from the enormous amount of data on the web by using the clustering technique. A cluster
can be known as a collection of data elements which are similar to each other and also different
from other cluster elements. Clustering refers to grouping data into clusters with minimum
similarity existing between elements of two different clusters and maximum similarity between
elements of each cluster (Larose, 2005). Another useful technique for discovering rules between
elements is Association Rule Mining. This method is a data mining technique which was introduced by
Agrawal in 1993. He proposed the Apriori algorithm in 1994 (Agrawal, 1996). The main idea of Apriori
is to find a relationship between different items of a database. The association technique is
useful to find a relationship between two or more attributes and it attempts to find rules between
attributes. Association rules mining helps to extract this kind of knowledge from a database so that
it can be used by recommendation systems.

3. METHODOLOGY

Our methodology is categorized into five phases which include: (3.1.) defining required data, (3.2.)
data collections, (3.3.) pre-processing, (3.4.) data mining and (3.5.) conclusion.

3.1. Defining Required Data

The phase relating to defining required data focuses on distinguishing which type of data needs to
be analysed. Thus, the first step is detection of the tourist data. For identifying required data,
we found a target website which gathers significant data about tourists, including age, gender,
country of origin, goal of travel, travel partners etc. Therefore, www.tripadvisor.com was chosen as
a repository from which to collect the data. Tripadvisor is the world’s largest travel website, with
more than 200 million unique monthly visitors and over 100 million reviews and opinions. In
addition, it has a comprehensive storage of tourist data which could be used in our research.

3.2. Data Collections

Due to the formidable task of gathering data manually, the data were crawled from
www.tripadvisor.com by means of a task-specific web crawler, which was named “RK Crawler”. RK
crawler is able to obtain a specific place name, search for it in the tripadvisor database, and then
extract the tourist’s data concerning this place.
After the data have been collected by the crawler, they are exported automatically to Microsoft
Excel in CSV format.
The extracted dataset includes various attributes such as age, gender, country, rating, who the
tourist traveled with, purpose of travel, tourist type, tourist’s contribution, visited cities,
title, comments, places of interest, etc. Visualizing some significant attributes of the data can
demonstrate relationships between items. The results show that, if a visitor is interested in spas,
there is a 67% probability that he/she is also interested in going shopping.
Our results show that if a visitor is interested in theme and amusement parks, there is a 73% chance
that he/she is interested in shopping as well.

3.3. Pre-Processing

Pre-processing is the third step of our methodology. In this phase, the data needs to be cleaned,
normalized and transformed. Further, the collected data needs to be converted to a format suitable
for use in the Weka data mining software. For preparing data in ARFF format, we need to follow some
rules regarding standardization of this format so that it loads in Weka. Figure 1 demonstrates a
collection of data in ARFF format for loading in Weka software.

3.4. Data Mining

The most important part of this research is mining the data. Unlike traditional data mining
techniques, our research includes quantitative

Figure 1. Weka tags by ARFF format
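
Figure 1 itself is not reproduced in this text. As a minimal sketch of what such a file can look like, the Python snippet below builds an ARFF document from a few records. The relation name, attribute names, category lists and rows are hypothetical placeholders rather than the authors' data (the 50-64 age band in particular is an assumption); only the header layout (@relation, @attribute, @data), the nominal, string and numeric attribute types, and the use of ? for a missing value follow the ARFF format that Weka loads. The quoting helper is deliberately simplified.

```python
def quote(value):
    """Wrap a token in single quotes if it contains spaces or reserved characters."""
    text = str(value)
    if text == "" or any(ch in text for ch in " ,'\"%{}\t"):
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def to_arff(relation, attributes, rows):
    """Render an ARFF document.

    attributes: list of (name, kind); kind is "numeric", "string",
    or a list of allowed values for a nominal attribute.
    rows: sequences of values in attribute order; None marks a missing value.
    """
    lines = ["@relation " + quote(relation), ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            kind = "{" + ",".join(quote(v) for v in kind) + "}"
        lines.append("@attribute " + quote(name) + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical attributes and rows, for illustration only.
ATTRIBUTES = [
    ("age_group", ["18-24", "25-34", "35-49", "50-64", "65+"]),
    ("gender", ["male", "female"]),
    ("country", "string"),
    ("rating", "numeric"),
]
ROWS = [
    ("25-34", "female", "Singapore", 4),
    ("35-49", "male", "United Kingdom", None),  # rating missing
]

arff_text = to_arff("tourists", ATTRIBUTES, ROWS)
print(arff_text)
```

Declaring age group and gender as nominal attributes, rather than free strings, is a design choice that suits clustering and association rule mining, since many association rule learners expect nominal input.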

and qualitative analysis so as to obtain conclusive results. We selected Weka software to apply
visualization, clustering, and association rules mining to attempt to find stimulating knowledge and
rules existing between elements in a dataset. We focused on two main techniques, namely clustering
and association. Clustering places an emphasis on building models which are able to assign new
instances to one of a set of well-defined classes. By using clustering techniques, we can determine
the segmentation of customers. The K-Means and EM algorithms were applied for clustering the data.
Meanwhile, the Apriori algorithm was applied to discover the association rules existing between
items in our dataset. Likewise, for acquiring more detailed understanding and extraction of
knowledge from numeric and non-numeric information forms, Nvivo software is highly recommended
(Bazeley, 2013; Richards, 1999). Qualitative and text analysis is then carried out in Nvivo through
coding of the tourist’s comments in order to draw out the model. The model can be delivered via
significant points of places and tourists’ preferences. Therefore, the results from the Nvivo part
will support the findings of the Weka section.

3.5. Conclusion

Conclusion is the last phase of our methodology. In this phase, interesting knowledge will be
revealed from the data; also, outcomes of Weka and Nvivo will be compared with each other to gain
conclusive results. According to the results, places would then be recommended to specific groups of
tourists based on their preferences. In other words, the output of this phase is the output of the
entire methodology.

4. EXPERIMENTAL RESULTS

In this paper, several analyses have been carried out on data from different dimensions. We analyzed
the data of 587 profiles of visitors to interesting places and 610 hotel guests in Mersing in Johor
Bahru. As mentioned before, our
case study is located in Johor Bahru. Johor Bahru is the second-largest city in Malaysia, located in
the southern part of the country. The city is very close to Singapore; thus, Singaporeans can reach
it quite easily. Tourism has become one of the most significant key areas of economic growth for
Johor Bahru. There are several places of interest in Johor Bahru including Legoland, Danga Bay,
Johor Zoo, etc. Also, the Mersing area is located in Johor Bahru; this is an area which has a
considerable number of hotels and resorts for tourists. Thus, this research intends to conduct a
study of the Johor Bahru tourism industry based on interesting places and Mersing hotels. Analyzing
the tourist data of Johor Bahru yielded varied and useful information.

4.1. Analysis of Results

The process of analysing tourist data commenced by collecting data from the www.tripadvisor.com site
by means of the RK crawler. RK crawler is able to obtain the place name and search the tripadvisor
database for this name. It then crawls visitor data for information such as age, gender, country,
comments, etc. The length of time the crawler requires for gathering the data varies with the number
of visitors. In addition, RK crawler can load the extracted data in CSV format automatically.
Collected data needs to be cleaned and normalized in order to be analyzed by data mining software.
The format of gathered data is then changed to ARFF, which is compatible with Weka data mining
software. Before mining the data, some statistical information can be extracted, for example the
frequency of visitors based on country of origin or their distribution based on age and other
attributes. However, the hidden rules between their attributes are not always obvious until the data
have been mined; thus, data mining is the next step to finding the hidden rules between attributes
in the dataset through clustering and association rules techniques. Likewise, qualitative analysis
was carried out in Nvivo software by means of coding the tourist’s comments. The model illustrates
the relationship between impact factors of places and tourists’ preferences.

4.2. Statistical Standpoint of Tourists Information

In this part, the tourist data will be surveyed from the statistical information aspect. Figure 2
illustrates the frequency of visitors by their country of origin.
According to Figure 2, the largest number of visitors can be seen as coming from Singapore. Besides
Singaporeans, most of the visitors are from Malaysia. This is especially so for visitors to Mersing
hotels. Mersing hotels are almost empty of Middle Easterners but full of Singaporeans, due to the
short distance between the two places. Figure 3 displays the factors based on age groups. Mersing
Island is a place that potentially attracts visitors because of its natural beauty. Our result shows
that most visitors who visit this island are from Singapore, Malaysia, and the UK. There are high
levels of interest in different activities; results show that, apart from visiting beaches and
historical places, having good food and wine, as well as having a choice of outdoor and adventure
activities, are very attractive options for visitors. Our findings also show that there are some
differences between types of visitors. For instance, males have different hobbies compared with
females. In addition, some “hidden facts” revealed that if a visitor is interested in one particular
activity, he/she may have interest in some other similar activities. In the following sections, we
will use different kinds of data mining techniques to uncover these interesting rules.
Figure 3 shows the overall rate of reviews. In addition, it indicates three factors, namely
cleanliness, service and value, grouped by gender as well as age group. The figure distinguishes
satisfaction of different groups based on these factors. Results show that persons in the age group
of 18 to 24 display less satisfaction than all other groups according to the average review rates,
while the age groups of 35 to 49, 25 to 34, and 65+ respectively tend to have greater levels of
satisfaction. We discovered a

Figure 2. Frequency of visitors sorted by their country of origin

Figure 3. Factors by age and gender group

further fact about the 65+ age group, namely that visitors in this age group are not as satisfied
with the six factors as were the other age groups.
It was found that Malaysian and Singaporean tourists are more interested in shopping during their
travel. Even though there are many tourists from the United Kingdom, their interest level is still
very low. Another notable fact identified is that Belgians are very interested in casinos and
amusement parks. A further interesting result for us during this analysis was that Chinese visitors
to Mersing hotels are exclusively female, while French, German, Indonesian and Dutch tourists are
mostly male. Surprisingly, Dutch tourists were not found to be interested in outdoor and adventure
activities. Similarly, Malaysians and Singaporeans are more interested in shopping and spas. Within
the hotel visitors’ profiles, there were 12 activity interests mentioned. In addition, the results
show that a high percentage of visitors are interested in particular activities such as visiting the
beach, outdoor activities, having great food etc. Hence, we did not use them in the analysis because
it was obvious to us that they are favorite activities. However, some activities with a low interest
level may still have some hidden relationships with each other. This kind of information was
revealed after statistical analysis was carried out on the dataset; however, for the purposes of
acquiring extra interesting knowledge from our dataset, we needed to identify the important groups
of tourists based on their attributes. It was considered that the clustering technique would be
helpful in this research to detect the most important clusters. In the following section, clustering
techniques will be applied in Weka in order to recognize notable clusters of tourists.

4.3. Exploring Notable Clusters of Tourists

It was recognized that detection of significant clusters of tourist profile data would be helpful
for us to gain more knowledge in the realm of tourism. We executed the K-Means and EM algorithms in
order to determine the existence of some clusters from visitor data. After clustering our dataset,
we discovered that some clusters were especially important in gaining more knowledge.
As a result of the clustering section, we found that some clusters can be helpful for us in this
research in terms of recommending specific places for each of the clusters. Table 1 illustrates a
summary list of findings from the clustering section.

4.4. Interesting Rules Relating to Tourist Data

As the number of attributes increases, identifying the rules between them becomes an increasingly
difficult task. However, some rules

Table 1. A list of findings from clustering techniques

Cluster  Finding
1        Singaporean females aged between 25 to 34 and who travel for fun are usually interested in
         spas, shopping, good food and wine, outdoor activities, historical and cultural places as
         well as the beach and sun.
2        Malaysian females aged between 35 to 49 and who travel for work/fun tend to be interested in
         spas, shopping, good food and wine as well as the beach.
3        Singaporean male visitors aged between 35 to 49 and who travel for fun, are interested in
         good food and wine, outdoor and adventure activities, visiting museums, historical and
         cultural places, as well as the beach and sun.
4        British female visitors in the age group of 25 to 34 who travel for fun tend to be
         interested in good food and wine, visiting museums as well as historical and cultural
         places and the beach.
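
The clustering step behind Table 1 was run in Weka, whose K-Means and EM implementations are not reproduced here. Purely as an illustration of the idea, the sketch below runs a plain-Python K-Means (Lloyd's algorithm) on one-hot encoded categorical profiles. The six profiles are hypothetical and only loosely modeled on clusters 1 and 3 of Table 1, the function names are our own, and the deterministic farthest-first seeding is an assumption made for reproducibility rather than Weka's behavior.

```python
def one_hot(records, keys):
    """Turn categorical records into 0/1 vectors, one column per (key, value) pair."""
    columns = sorted({(key, rec[key]) for rec in records for key in keys})
    vectors = [[1.0 if rec[key] == value else 0.0 for key, value in columns]
               for rec in records]
    return vectors, columns


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm; seeding is farthest-first (an assumption, not Weka's default)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles, for illustration only.
profiles = [
    {"gender": "female", "age_group": "25-34", "interest": "spa"},
    {"gender": "female", "age_group": "25-34", "interest": "shopping"},
    {"gender": "female", "age_group": "25-34", "interest": "spa"},
    {"gender": "male", "age_group": "35-49", "interest": "outdoor"},
    {"gender": "male", "age_group": "35-49", "interest": "museums"},
    {"gender": "male", "age_group": "35-49", "interest": "outdoor"},
]

vectors, columns = one_hot(profiles, ["gender", "age_group", "interest"])
labels, centroids = k_means(vectors, k=2)
print(labels)
```

On this toy input the first three profiles fall into one cluster and the last three into the other. Each centroid coordinate is the share of cluster members having that attribute value, which is how a cluster summary of the kind shown in Table 1 can be read off.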

are not intuitive in a tourist’s dataset. In this instance, association rules mining can reveal hidden rules among items in the dataset. An Apriori technique is then applied to uncover hidden relations between elements in the dataset. Amongst numerous rules, we were able to discover several useful and valuable ones, some of which are listed in Table 2.

As can be seen from these rules, Australian female tourists enjoy visiting Legoland, while Singaporean male tourists tend to prefer to have an option of good food and wine, as well as outdoor activities. These kinds of findings were not detected through clustering techniques; association rules therefore lead us to gain more interesting knowledge from the tourists’ datasets. In the next step, we proceed to analyze the same dataset using qualitative analysis with Nvivo. Through the application of Nvivo, the comments of tourists are analyzed to capture their salient points. We hope that the two approaches will support each other’s findings.

4.5. Textual Analysis of Tourist’s Profile

The Nvivo qualitative analysis was carried out in order to analyze the comments tourists tend to make about various places. Using Nvivo helped us to make our qualitative research more transparent; it provided us with the ability to better address some of the key challenges in professional communication research, including efficiency, multiplicity and transparency (Hoover, 2011; Denardo, 2002). We coded tourists’ comments so as to extract a model based on some attributes. The figure below is the output model of Nvivo achieved after coding tourists’ comments based on the impact factors of Legoland and the country of origin of visitors.

The affirmative points highlight that Legoland is a suitable place to be enjoyed by the whole family, including children, while the negative points present liabilities of Legoland, such as high food prices and hot weather. A number of results can be extracted from this model. For example, Australian tourists concentrate on positive points (such as its being a fun place for children and great for the family), whereas Indonesians argue that food prices are high. An interesting point gleaned from this figure is that, in spite of there being negative points, some foreign visitors (notably Singaporeans and Australians) only considered the positive points. Likewise, Nvivo can compute word frequencies to identify the most frequently used words and their synonyms in tourists’ comments. In this case, the results indicate that the great bulk of visitors emphasize positive points of Legoland, which is the same result as in the model. The Nvivo analysis supports our result in the Weka analysis section; therefore, these two analysis techniques can be used together in the tourism industry to obtain more conclusive results from tourist data.

5. RECOMMENDATIONS FOR TOURISTS

According to the results provided in previous sections, we can recommend a list of appropriate places and services for specific groups of

Table 2. List of findings from association rules techniques

Rule Finding
1 If a tourist is from Australia, female, and aged 35 to 49, then she will most likely select Legoland as a fun travel destination.
2 If tourists are aged 25 to 34 and choose Sanrio Hello Kitty Town, they are most probably females.
3 If a hotel visitor is from Singapore, male, and interested in good food and wine, he is also likely to be interested in outdoor and adventure activities.
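
Each rule in Table 2 has the form "if antecedent, then consequent". Association rule miners such as Apriori typically rank rules of this kind by support (the share of all records that match both sides) and confidence (the share of records matching the antecedent that also match the consequent). The paper does not report the thresholds or the raw records it used, so the following is only a minimal sketch under stated assumptions: the attribute names, the helper functions `matches` and `rule_metrics`, and all profile values are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical tourist profiles. These values are invented for illustration
# and are NOT taken from the study's dataset.
PROFILES: List[Dict[str, str]] = [
    {"country": "Australia", "gender": "female", "age": "35-49", "place": "Legoland"},
    {"country": "Australia", "gender": "female", "age": "35-49", "place": "Legoland"},
    {"country": "Australia", "gender": "female", "age": "35-49", "place": "Hello Kitty Town"},
    {"country": "Singapore", "gender": "male", "age": "35-49", "place": "Legoland"},
    {"country": "Malaysia", "gender": "female", "age": "25-34", "place": "Hello Kitty Town"},
]


def matches(profile: Dict[str, str], conditions: Dict[str, str]) -> bool:
    """Return True if the profile satisfies every attribute=value condition."""
    return all(profile.get(key) == value for key, value in conditions.items())


def rule_metrics(
    profiles: List[Dict[str, str]],
    antecedent: Dict[str, str],
    consequent: Dict[str, str],
) -> Tuple[float, float]:
    """Compute (support, confidence) for the rule antecedent -> consequent."""
    n_antecedent = sum(matches(p, antecedent) for p in profiles)
    n_both = sum(
        matches(p, antecedent) and matches(p, consequent) for p in profiles
    )
    support = n_both / len(profiles) if profiles else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Rule 1 of Table 2, evaluated on the toy data above.
support, confidence = rule_metrics(
    PROFILES,
    antecedent={"country": "Australia", "gender": "female", "age": "35-49"},
    consequent={"place": "Legoland"},
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, two of the five profiles match both sides of the rule and three match the antecedent, so the support is 0.40 and the confidence is about 0.67. A full Apriori run would additionally enumerate candidate itemsets and prune those below a minimum support before scoring rules in this way.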


tourists based on their preferences. For places of interest, we can suggest the model below for visitors. This model shows suitable places for each group of tourists; for instance, Legoland is one of the best options for Malaysian, Singaporean and Australian travelers.

According to the findings from the association rules in Table 2, we conclude that Legoland can be included on our recommendation list for Australian female tourists, as well as Malaysian and Singaporean adults. We propose that this model would be very beneficial if used by hotel managers. The first rule shows that, since the spa area is of great interest for Asians, as well as female visitors, it should be recommended to this group of visitors. The second rule illustrates that packages of good food and wine can be offered to Singaporean and spa visitors.

6. CONCLUSION AND FUTURE WORK

This study has applied quantitative and qualitative analysis together in an attempt to discover hidden knowledge from tourist profiles in www.tripadvisor.com. Several data mining and textual analysis techniques (such as visualization, clustering, association rules mining and Nvivo qualitative analysis) have been carried out to show how data mining performs in the tourism industry. In essence, we have shown that Nvivo analysis can, in part, support the findings of the Weka analysis section of our research so as to gain conclusive results. Findings of this study can be used by tourist associations and hotel managers in Johor Bahru in order to identify tourist behavior patterns, improve their services, offer specific travel packages and, finally, increase the number of tourists arriving in Malaysia.

In the future, we would like to carry out further experiments, in particular enlarging the scope of the study to mine larger volumes of data. Accordingly, it would be very helpful to propose an automated process that collects the data and prepares it in a suitable format. The authors would then not have to pre-process and convert the format manually. Crawled data would be ready to be analyzed through just a few clicks in a short space of time. Hence, it would provide an opportunity to analyze the tourist profiles of all interesting spots and hotels in the world quickly and easily.

ACKNOWLEDGMENT

The author would like to express gratitude for the thoughtful suggestions and warm support received from the postgraduate students in the field of IT Management at Universiti Teknologi Malaysia (UTM). He would also like to thank his dear parents for their love and encouragement. UTM is gratefully acknowledged for its Research University Grant (vot 07J79).


REFERENCES

Agrawal, R. (1996). Fast discovery of association rules. Advances in knowledge discovery and data mining (Vol. 12, pp. 307-328).

Au, N., & Law, R. (2002). Categorical classification of tourism dining. Annals of Tourism Research, 29(3), 819–833. doi:10.1016/S0160-7383(01)00078-0

Bazeley, P. (2013). Qualitative data analysis with Nvivo. Sage Publications Limited.

Bose. (2009). Data mining in Tourism. Elsevier.

Choi, T. Y., & Chu, R. (2000). Levels of satisfaction among Asian and Western travellers. International Journal of Quality & Reliability Management, 17(2), 116–132. doi:10.1108/02656710010304537

Denardo, A. M. (2002). Using NVivo to analyze qualitative data.

Etemadi, R. (2010). An approach in web content mining for clustering web pages (pp. 279–284). IEEE. doi:10.1109/ICDIM.2010.5664660

Guoxia, Z. (2009). The application of data mining in tourism information. IEEE, 3, 689-692.

Han, J., & Kamber, M. (2006). Data mining: concepts and techniques. Elsevier.

Hokey, M. (2002). A data mining approach to developing the profiles of hotel customers. International Journal of Contemporary Hospitality Management, 14(6), 274–285. doi:10.1108/09596110210436814

Hoover, R. S. (2011). Using NVivo to Answer the Challenges of Qualitative Research in Professional Communication: Benefits and Best Practices Tutorial. IEEE Transactions on Professional Communication, 54(1), 68–82.

Isa, D., Kallimani, V. P., & Lee, L. H. (2009). Using the self organizing map for clustering of text documents. Expert Systems with Applications, 36(5), 9584–9591. doi:10.1016/j.eswa.2008.07.082

Kosala, R., & Blockeel, H. (2000). Web mining research: A survey. ACM SIGKDD Explorations Newsletter, 2(1), 1–15. doi:10.1145/360402.360406

Larose, D. T. (2005). Discovering knowledge in data: An introduction to data mining. Available at www.wiley.com

Li, J. (2012). Exploring the Destination Image of China through International Urban Tourism. University of Waterloo.

Poon, W.-C., & Lock-Teng Low, K. (2005). Are travellers satisfied with Malaysian hotels? International Journal of Contemporary Hospitality Management, 17(3), 217–227. doi:10.1108/09596110510591909

Richards, L. (1999). Using NVivo in qualitative research. Sage (Atlanta, Ga.).

Romero, C., Ventura, S., Zafra, A., & Bra, P. (2009). Applying Web usage mining for personalizing hyperlinks in Web-based adaptive educational systems. Computers & Education, 53(3), 828–840. doi:10.1016/j.compedu.2009.05.003

Witten, I. H., & Frank, E. (2011). Data mining: Practical machine learning tools and techniques. Elsevier.

Zhou, Y. (2008). Constructing tourism association words set based on association rule mining. IEEE, 1, 571-574.
