Introduction
The preceding chapter documented the development of the conceptual model for the study and presented evidence in support of the proposed relationships among the constructs examined. This chapter links the research model and related hypotheses developed in the previous chapter to the empirical results presented in the next two chapters (Chapters 6 and 7). It sets out the philosophical assumptions adopted as the method of enquiry in the current investigation and the research design chosen, and it describes the data collection methodology employed, the development of the research instrument and the blueprint used for evaluating and testing the hypothesised model.
In sum, this chapter is devoted to the design of the empirical methodology to be implemented in testing the hypothesised model developed previously. In other words, it addresses the what, why and how questions pertinent to the selection of the research methodology, and the consequences of those choices for the reliability, validity and generalisability of the research undertaken.
5.2
In addressing the research questions stated in Chapter 1 (see Table 1.2, p. 20), the thesis adopts a positivist paradigm. Positivism is a broad movement of thought that began in the second half of the nineteenth century. It is based on the application of the methods of the natural sciences to the social sciences, and its main objective is to discover empirical generalisations about social phenomena (Johnston 1986). Thinkers in this tradition give priority to positive, observable facts about the natural world. Positivists assume that knowledge is achieved by following a precise, predetermined approach to gathering data. In addition, they believe that a commitment to quantitative precision and the accumulation of facts is the only way to build an ever-closer estimation of a reality that exists independently of human perception (Rubin et al. 1996).
The research process in this study began with an exploratory phase comprising an exhaustive review of the existing literature and consultation with relevant people to obtain facts and figures about the industry examined (see Section 5.5). This took place before the development of the theoretical framework. The main stage of the research process then adopted the positivist paradigm, following the hypothetico-deductive approach (McNeill and Townley 1986). This approach assumes that the research process involves eleven steps (see Figure 5.1). A hypothesised conceptual model consisting of the relevant constructs was developed and tested. In line with the epistemological assumptions of the scientific approach, the hypotheses were tested using a large-scale survey of a representative sample. The data were analysed for descriptive statistics using the Statistical Package for the Social Sciences (SPSS, Version 11), and the hypothesised conceptual model was tested using structural equation modelling. Figure 5.1 illustrates, step by step, the research process to be followed when this approach is used.
Figure 5.1 [Hypothetico-deductive research process (after McNeill and Townley 1986); recoverable step labels include Observation/ideas, Hypothesis (testable) and Refute hypothesis]
5.3
The research design is the plan for a study that specifies the procedures researchers should follow in order to accomplish their research objectives or test the hypotheses formulated for their studies (McDaniel and Gates 1999). Churchill and Iacobucci (2002) describe it as the blueprint to be adhered to in completing a study. The function of a research design is to ensure that the evidence generated from the data enables the research question to be answered (or the theory to be tested) confidently and convincingly (de Vaus 2001).
In general, research designs are categorised into three basic types: exploratory, descriptive and causal (Churchill and Iacobucci 2002; Malhotra 1996). The primary purpose of an exploratory study is to acquire preliminary hunches or insights into a vaguely defined research problem, which can provide the foundation and direction for a fruitful investigation (Parasuraman 1991). Exploratory research methods include literature searches, experience surveys, focus groups and case analyses. Exploratory research is suitable for any research issue about which relatively little is known.
Descriptive research, as the name implies, is essentially intended to describe something (Parasuraman 1991); Sekaran (2000) specifies that it is undertaken to assess and describe the characteristics of the variables scrutinised in the study. Descriptive research can be classified into two basic types: cross-sectional and longitudinal studies. The cross-sectional study is the type most widely adopted in marketing research; it is a one-time study that involves drawing a sample from a specific population (Churchill and Iacobucci 2002; Malhotra 1996; Parasuraman 1991). A longitudinal study, on the other hand, uses repeated measurement: it entails selecting a panel sample from a population and gathering data from the same panel on several different occasions (Churchill and Iacobucci 2002; Parasuraman 1991). It is widely accepted that longitudinal studies are more informative and realistic than cross-sectional studies, just as a motion picture conveys more revealing messages than a still picture, but at the expense of higher costs and a longer time frame.
Causal research design, which is typically implemented by experimentation, is the best method for determining cause-and-effect relationships (Churchill and Iacobucci 2002). The main purpose of this research design is to isolate cause(s) and to judge the extent to which such cause(s) influence the effect(s). Simply put, causal studies examine whether one variable causes or determines the value of another variable (McDaniel and Gates 1999).
Nevertheless, these three research designs can be conceived as stages in a continuous research process (Churchill and Iacobucci 2002). Figure 5.2 illustrates the interrelationships between the three basic types of research design.
Figure 5.2 [Interrelationships among exploratory, descriptive and causal research designs]
Exploratory research is generally employed as an initial step to provide insight into and understanding of the specific phenomena investigated. Given that the primary objective of the present study is to assess the applicability of the Expectancy Disconfirmation Theory within the consumption system perspective, a descriptive research design was deemed the most appropriate for answering the research questions. It is important to highlight that, even though a descriptive research design is primarily used in this study, exploratory research was imperative in the early stage of the study in order to gather initial knowledge, particularly to identify the specific attributes, features and subsystems (components) that are distinctive to the research setting investigated.
At the exploratory stage, besides an extensive literature search, investigative personal interviews with key executives in the direct sales industry and an experience survey (key informant survey) were also carried out. Exploring both information sources (interviews and the relevant literature) simultaneously gave this study more profound knowledge by providing an overall picture (Churchill and Iacobucci 2002) of the consumption system under investigation. In concordance with the argument put forward by Parasuraman (1991), this study could still examine the hypothesised causal linkages among constructs, provided that the effects of uncontrollable variables are filtered through stringent measures to ensure validity, reliability and generalisability (see Section 5.7). In summary, given the nature of this study and the limitations under which it is implemented, a combination of exploratory and descriptive cross-sectional research designs was deemed most appropriate.
5.4
Figure 5.3 summarises diagrammatically the choices this study made at various decision levels with regard to the data collection method. It should be noted that, to provide an overall picture of the data collection process, the figure also incorporates the research design, which involves a series of rational decision-making choices, as discussed earlier (Section 5.3).
Once the research problem has been defined and clearly stated, the process turns to data collection. Data sources are generally classified into two main types: secondary and primary data. Churchill and Iacobucci (2002) suggest that data collection should begin with secondary data. Secondary data are statistics or information already collected by prior research for some other purpose, not specifically to solve the research problem at hand. Primary data are data collected by the researcher specifically to examine the current research problem (Churchill and Iacobucci 2002). Given the nature of the present research questions and the research objectives to be accomplished, primary data sources are clearly the most appropriate; nonetheless, secondary data are also used to add insight and support for some facets of the research decision process.
Tull and Hawkins (1987) assert that the choice of suitable data collection methods should be determined by the type of research problem examined. Accordingly, this section discusses the choices made at each decision level in light of the particular problem investigated. Figure 5.3 depicts the options selected by this study in the shaded boxes.
Figure 5.3 [Decision levels for data collection; recoverable node labels: research design (exploratory, descriptive, causal; Section 5.3); longitudinal study; observation; self-administered versus interviewer-administered (Section 5.4.1.1); on-line, postal, drop off and collect (Section 5.4.1.1.1); telephone; structured interview; mall-interception; in-office; in-home]
Source: Developed by author for the thesis from Churchill and Iacobucci (2002); De Wulf (1999) and Saunders et al. (2003)
5.4.1
Survey Research
Survey research tends to be the most popular method and is generally used in descriptive and causal research designs. One of its distinctive features is that it enables the researcher to collect large amounts of raw data using a question-and-answer format (Hair et al. 2003). Survey research emphasises collecting standardised raw data, which in turn permits the investigator to generate information that specifically addresses the key questions of how, who, what and when pertaining to market factors and the environment; its prime advantage is its capacity to accommodate large sample sizes at relatively reasonable cost (Hair et al. 2003). Given that the present study's core focus is on consumer consumption behaviour pertaining to satisfaction evaluations and future behavioural intentions, and that a target sample size of 400 sampling units has been set, the survey method is appropriate. For further discussion of the determination of sample size, please refer to Section 5.6.3.2.
5.4.1.1
Survey methods using questionnaires differ according to how the procedure is administered, which specifically relates to the amount of contact the researcher has with the respondents. As illustrated in Figure 5.3, a questionnaire can be administered by two main methods: self-administration and interviewer administration. In the self-administration mode, the research instruments are usually completed by the respondents without an interviewer present. Saunders et al. (2003) suggest that a self-administered survey can be administered in three alternative ways:
1) The research instrument is delivered and returned electronically, using either email or the Internet (on-line questionnaire);
2) The research instrument is posted to respondents and returned by post after completion (postal or mail questionnaire);
3) The research instrument is delivered by hand to each respondent and collected later by the interviewer (drop off questionnaire).
On the other hand, the questionnaire can be administered personally by an interviewer via:
1) Telephone (telephone survey), which enables the researcher to gather information rapidly;
2) Structured personal interview, in which the interviewer administers the questionnaire face to face.
The choice of survey method is influenced by various factors and depends directly on the research questions and objectives. Saunders et al. (2003) suggest that the choice of questionnaire administration depends largely on:
1) Characteristics of the respondents from whom the researcher wishes to collect data;
2) Importance of reaching a particular person as the respondent;
3) Importance of respondents' answers not being contaminated or distorted;
4) Size of sample the researcher requires for data analysis, taking into account the likely response rate;
5) Types of question the researcher needs to ask to collect the data;
6) Number of questions the researcher needs to ask to collect the data.
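Factor 4 above links the required sample size to the likely response rate. As a minimal arithmetic sketch, the number of questionnaires to hand out can be estimated by dividing the target number of usable responses (the 400 sampling units stated in Section 5.4.1) by an expected response rate. The function name and the response rates used below are hypothetical illustrations, not figures reported by the study.

```python
def questionnaires_to_distribute(target_usable: int, expected_response_pct: int) -> int:
    """Questionnaires to hand out so that, at the assumed response rate
    (in whole percent), roughly `target_usable` usable returns remain.

    Hypothetical helper for illustration only.
    """
    if not 0 < expected_response_pct <= 100:
        raise ValueError("response rate must be in (0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_usable * 100 // expected_response_pct)


# Assumed response rates; the study's actual rate is not given here.
for pct in (50, 70, 80):
    print(pct, questionnaires_to_distribute(400, pct))
```

Rounding up, rather than to the nearest integer, keeps the expected number of usable returns at or above the target.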
After examining the above factors in relation to the present study's research questions and objectives, it was decided that the best option for administering the survey instrument was the self-administered method, specifically the personal delivery and collection technique widely known as the drop off survey (Assael and Keon 1982; Lovelock et al. 1976; Hair et al. 2003; Webster 1997). This is also known as the delivery and collection questionnaire approach (Saunders et al. 2003). The drop off and collect¹ survey is a less familiar method of survey administration than telephone, personal and mail surveys. In this approach, a member of the field staff generally goes to the potential respondent's home or workplace to deliver the research instrument and returns on a specified date to collect it.
5.4.1.1.1
The drop off and collect technique is quite similar to the postal survey, the only difference being that the researcher or a member of the field staff delivers the questionnaire and calls to collect it once the respondent has completed it (Hair et al. 2003; Lovelock et al. 1976; Moutinho and Evans 1992). It is worth noting that the drop off and collect technique has distinct features that attempt to blend the advantages of both the mail survey and the personal interview. In particular, it shares a key characteristic with the postal survey, namely the absence of an interviewer, thereby eliminating response errors due to interviewer bias (Lovelock et al. 1976; Moutinho and Evans 1992). Past research has revealed that respondents are more likely to give false replies and socially desirable responses when a face-to-face technique is used (Blair et al. 1977).
Lovelock et al. (1976) suggest that drop off questionnaire delivery is suitable for extensive questionnaires and also helps in identifying and reaching an appropriate sample; most importantly, personal delivery by trained survey-takers appears to yield a higher response rate than mail surveys at a competitive cost. In addition, the drop off and collect technique has a number of advantages, including the availability of a survey-taker to screen potential respondents and stimulate interest in completing the questionnaire and, most importantly, to check for unanswered questions when collecting the questionnaire and to clarify respondents' doubts and queries, which eliminates or minimises non-response errors (Hair et al. 2003; Lovelock et al. 1976; Saunders et al. 2003).
¹ For this thesis, this term is specifically adopted to refer to the delivery and collection of the survey questionnaire by hand by the field staff.
One special feature of the personal delivery approach is that the researcher can communicate with potential respondents personally and motivate them to cooperate in the study, rather than merely relying on a covering letter, and can counter any objections (Webster 1997). Conversely, Lovelock et al. (1976) point out that a personal visit may be regarded as threatening and may lead the potential respondent to decline to participate. It is important to note that Assael and Keon (1982) empirically demonstrate that mail and drop off methods are more effective in minimising non-sampling errors (e.g. non-response errors and response errors) than personal and telephone surveys. It is speculated that the nature of mail and drop off methods, which do not demand immediate attention and response, contributes substantially to this favourable outcome.
Accordingly, after examining the questionnaire administration methods presented above and taking into account the characteristics of the target respondents, the type of questions (which require respondents to recall specific purchasing events), the relatively long time required to complete the questionnaire and the geographical coverage of questionnaire distribution, the drop off and collect technique is considered the most appropriate mode of questionnaire administration for this study. This is despite the fact that this method is fairly expensive in comparison with postal surveys, and the time needed for data collection increases markedly if the samples are geographically dispersed (Saunders et al. 2003).
In response to the above arguments, and in an endeavour to minimise cost and time, a strategy for questionnaire distribution was designed in which the questionnaire is delivered to workplaces (i.e. in-office) rather than to homes (in-home). By accessing potential respondents in public places (common places), specifically in the office context, it was anticipated that multiple respondents could be obtained within the same premises; this could reduce travelling costs and shorten the time involved in distributing and collecting the research instrument. Furthermore, it is speculated that in-home questionnaire distribution is likely to incur high non-response rates owing to refusals, inability to gain entry to potential respondents' homes, or potential respondents not being at home when the researcher calls. This is especially true of the target area for questionnaire distribution, which is made up of three districts situated within a radius of approximately 50 kilometres from Kuala Lumpur city centre. Most adult members of families residing within these targeted districts are at work or studying in colleges from 7.00 in the morning until 7.00 in the evening during the week, so it is practically impossible to distribute research instruments to homes during these hours.
Potential respondents are likely to feel safer and more willing to accept an invitation to participate in a study when it is conducted in the office environment or in a public place (e.g. a shopping mall) than in private dwellings, and this also helps to ensure the safety of the field staff. Therefore, the best option appears to be to obtain respondents at their workplaces (De Wulf 1999; Hair et al. 2003). Interestingly, the author's casual observation suggests that actual purchases generally take place on office premises.
5.5
Questionnaire Development
The guidelines employed for questionnaire construction in this study were based on the procedure suggested by Churchill and Iacobucci (2002), refined by the author of this thesis by incorporating a translation process (see Figure 5.4) that was not part of the original procedure. It is recognised that, although marketing research has advanced progressively, questionnaire design is still an art, not a science (Churchill and Iacobucci 2002, p. 314). Similarly, Malhotra (1996, p. 320) asserts that the great weakness of questionnaire design is its lack of theory. Additionally, Dillon et al. (1990, p. 376) assert that a vital facet of the survey approach is the development of an effectively structured questionnaire. Faulty questionnaires are major contributors to non-sampling errors, particularly response errors (Dillon et al. 1990). Figure 5.4 illustrates the step-by-step procedure of questionnaire development that the researcher used as a guideline for constructing an effective questionnaire.
Figure 5.4 [Questionnaire development procedure in ten steps (Step 1 to Step 10), including a Questionnaire Translation stage added to Churchill and Iacobucci's procedure]
Source: Developed by author for the thesis from Churchill and Iacobucci (2002) and De Wulf (1999)
5.5.1
The specification of what information will be collected depends primarily on the constructs stipulated by the researcher in the conceptual framework. In this regard, the questionnaire was designed to solicit responses for the fifteen constructs incorporated in the research framework of the present study (see Figure 4.2). The conceptualisation of all these constructs has been described in Chapters 3 and 4. Apart from the questions associated with the constructs investigated in this study, questions pertaining to consumers' consumption patterns and demographic characteristics were also included in order to provide a better understanding of the respondents' consumption behaviour and overall profiles.
5.5.2
After specifying the information pertinent to the research, the researcher must decide how the questions will be framed to secure responses with the required detail and, consequently, how to administer the questionnaire. Structured questions were judged the most appropriate for generating data, as the constructs examined have been conceptualised clearly in Chapter 4 and the development of the measures is documented in Section 5.6. It is important to note that three open-ended questions were incorporated in the questionnaire to give respondents the opportunity to express their opinions; this offers the researcher the further advantage of soliciting richer, more precise explanations and capturing insights into features or aspects of the phenomenon under investigation that the structured questions would probably not cover. Since the method of questionnaire administration has been discussed earlier in Section 5.4.1.1, this aspect of questionnaire development is not covered in this section.
5.5.3
Once the question type and method of administration have been established, the next step is to determine what to include in the individual questions. Churchill (1979) strongly recommends that each construct be measured with multiple items in order to minimise the high level of measurement error typically associated with single-item scales. In response to this recommendation, attention was devoted in the current study to ensuring that the measures developed would meet the required validity and reliability criteria, which are explained in Section 5.6.5. As such, all the constructs incorporated in the research framework were represented by multiple indicators (at least three items per construct), except for direct seller satisfaction, product satisfaction and direct sales company satisfaction, each of which employed a summary statement (i.e. a single indicator) pertinent to a specific aspect of the phenomenon examined. The full questionnaire is presented in Appendix 5.1(a).
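Churchill's argument for multiple items can be made concrete with the Spearman-Brown prophecy formula, a standard psychometric result that is not cited in this chapter. It predicts the reliability of a composite of k parallel items from the reliability of a single item, assuming equal item reliabilities and uncorrelated measurement errors. The single-item reliability of 0.5 below is an arbitrary illustration, not a value from this study.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a composite of n parallel items, given the
    reliability of one item (Spearman-Brown prophecy formula)."""
    r = single_item_reliability
    if not 0.0 <= r <= 1.0 or n_items < 1:
        raise ValueError("reliability must lie in [0, 1] and n_items >= 1")
    return n_items * r / (1 + (n_items - 1) * r)


# Illustrative single-item reliability (an assumption, not a study result).
for k in (1, 3, 5):
    print(k, round(spearman_brown(0.5, k), 3))
```

Under these assumptions, moving from one item to the three-item minimum used in this study raises the predicted reliability from 0.5 to 0.75, which is the intuition behind preferring multi-item measures.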
5.5.3.1
This measure was designed to capture the features of the direct sales channel. This alternative shopping channel has been conceptualised as comprising three interacting subsystems: the direct seller, the product and the direct sales company. Customer satisfaction studies have generally tended to use multi-item rather than single-item measures of customer satisfaction (Bearden and Teel 1983; Churchill and Surprenant 1982; Oliver 1980; Danaher and Haddrell 1996; Kassim 2001; Rust and Zahorik 1993). The reasons for this are twofold. Firstly, a single-item scale cannot provide explicit information about specific features or components, so the various dimensions of the phenomenon cannot be ascertained separately (Danaher and Haddrell 1996). In this regard, a single-item construct is unable to capture the complexity of customer satisfaction cohesively (Westbrook 1980). Secondly, the reliability of a single-item construct is difficult to assess, since it can normally be ascertained only through a test-retest format (Yi 1990). Moreover, prior research has reported that multi-item measures are more reliable than their single-item counterparts (Bearden and Teel 1983; Churchill and Surprenant 1982; Oliver 1980; Westbrook 1980). On the basis of these arguments, multi-item measures were used in this study. A five-point rating scale, ranging from (1) very dissatisfied, (2) dissatisfied, (3) neutral and (4) satisfied to (5) very satisfied, was employed to elicit participants' responses. A not applicable (NA) option was provided for respondents who had no knowledge of or experience with a specific item in the scale. In the present study, a rating scale running from very satisfied to very dissatisfied was used to assess performance instead of typical performance scales such as excellent to terrible (Churchill and Surprenant 1982).
The main reason for using this response scale was that prior research has speculated that, if respondents were to evaluate performance on a continuum from excellent to terrible, they might feel required to visualise a standard or benchmark against which to compare actual performance. A satisfaction measure may be easier for respondents to conceptualise, as it simply requires them to formalise their opinions, which result from their expectations and perceptions of quality. Most importantly, satisfaction is inherently an individual, internal evaluation (Naumann and Geil 1995).
This response scale was employed empirically by Mittal, Kumar and Tsiros (1999) in their study of the automobile consumption system, which is characterised by product and service subsystems. They measured attribute-level performance and subsystem satisfaction using the very satisfied to very dissatisfied response scale. Similarly, Allen and Rao (2000) treat satisfaction levels and performance interchangeably in an importance-performance model, positing that actual satisfaction levels are typically treated as performance in this approach. Most notably, Kassim (2001) concludes her study by suggesting that performance and satisfaction constructs might measure similar phenomena, meaning that they may not be discriminantly distinct. Therefore, on the basis of these arguments, attribute performance and satisfaction at the subsystem level are best measured on the very satisfied to very dissatisfied response scale.
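Multi-item scales such as these allow internal-consistency reliability to be estimated from a single administration, which a single item cannot offer without a test-retest design. The sketch below computes Cronbach's alpha, a common internal-consistency statistic, for responses coded 1 to 5 with a not applicable (NA) answer stored as None. Whether alpha is the statistic applied in Section 5.6.5, and the choice to drop any respondent with an NA answer (listwise deletion), are assumptions for illustration; the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Scores are coded 1-5 and None marks a 'not applicable' (NA) answer.
    Respondents with any NA answer are dropped (listwise deletion), which is
    an assumed policy rather than one stated in the chapter.
    """
    complete = [row for row in responses if None not in row]
    if len(complete) < 2:
        raise ValueError("need at least two complete respondents")
    k = len(complete[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in complete]) for i in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_var = variance([sum(row) for row in complete])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five respondents to a three-item satisfaction scale.
data = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, None, 4],  # contains an NA answer, so this respondent is dropped
]
print(round(cronbach_alpha(data), 3))  # prints 0.972
```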
5.5.3.1.1
The first scale is concerned with direct selling product performance and consists of eleven items, each rated on a five-point scale, assessing attributes of the tangible product that respondents last bought, which in this context is a beauty or healthcare product. Since no scale for this construct was available in the literature, the eleven items were generated after reviewing the relevant literature and carrying out exploratory interviews. The items used for the measure and their sources are shown in Table 5.1(a). This scale appears as Question 34 in the questionnaire (see Appendix 5.1(a)).
Note: * This item is operationalised as the Product Satisfaction construct in this study, consistent with Mittal, Kumar and Tsiros (1999); it is a single-item measure.
5.5.3.1.2
The same rating scale, response type and instructions as used for the product performance measure were used to assess performance judgements of the direct seller. Since no scale of direct seller performance was available for adoption in this study, items for this measure were generated from several existing scales, coupled with the exploratory interviews (see Section 5.6.1 and Section 5.6.2). Thirteen items were obtained. The measure assesses dimensions related to the direct seller's salesmanship quality, personality and the services offered by the direct seller to customers. Interestingly, Crosby and Stephens (1987) found that customers' appraisals of the insurance agents who handle their service requests have a significant direct impact on their satisfaction with the specific agent and indirectly influence their overall satisfaction judgements. Table 5.1(b) shows the items concerning direct sellers, along with their sources drawn from the extant literature and exploratory interviews.
Table 5.1(b)
Knowledge of products and services
Capable and competent
Being consistently courteous
Following through on his/her promise
Provide payment flexibility (instalment)
Trustworthy
Provide after sales service
Giving personal advice and attention
Continuity of contact
Availability of direct seller
Maintain a professional appearance
Have customer interest at heart
Effectiveness of sales demonstration/presentation
Overall direct seller*
Note: * This item is operationalised as the Direct Seller Satisfaction construct in this study, consistent with Mittal, Kumar and Tsiros (1999); it is a single-item measure.
5.5.3.1.3
Direct selling company performance was measured using items related to corporate image and customer service (Crosby and Stephens 1987). Previous research has provided substantial evidence of the role of the institution in influencing customer satisfaction. For example, in the retail store context, Westbrook (1981) revealed that retail satisfaction is associated with customers' experiences with the store and with the organisation, apart from experiences pertaining to product consumption and the services provided by the retailer. VanScoyoc (2000) empirically demonstrates a positive relationship between satisfaction with online shopping and the company's image. In addition, Crosby and Stephens (1987) maintain that a firm's advertising and direct communication efforts have a significant direct impact on customer satisfaction with the firm and indirectly affect overall satisfaction.
The same rating scale, response type and instructions employed in the assessment of product and direct seller performance were also used to assess direct selling company performance. Following a review of the relevant literature and exploratory interviews, ten items measuring company performance were obtained. The measure assesses dimensions related to the direct selling company's corporate image and the corporate customer services provided to customers. Table 5.1(c) shows the items in the direct selling company measure and their sources.
Note: * This item is operationalised as the Direct Sales Company Satisfaction construct in this study, consistent with Mittal, Kumar and Tsiros (1999); it is a single-item measure.
5.5.3.1.4
Prior literature suggests that a large number of customer satisfaction measurements have used three broad types of scale: performance scales, disconfirmation scales and satisfaction scales (cf. Danaher and Haddrell 1996). Rust et al. (1994) suggest that disconfirmation scales, rather than performance scales, should be used in customer satisfaction studies, for the following reasons: 1) they incorporate the widely used disconfirmation paradigm, condensing the SERVQUAL two-stage measurement of expectation and perception into one succinct scale; 2) their key advantage is that they correlate positively with satisfaction and customer retention, unlike the expectation construct, in both service quality and customer satisfaction studies; and 3) disconfirmation scales can reduce the asymmetry in perceptions of service performance, because customers who rate a particular service as excellent on a poor-to-excellent scale may also regard the same service as better than expected.
Disconfirmation is defined as the discrepancy between expectation and perceived performance. Churchill and Surprenant (1982) postulate that customer satisfaction is realised when perceived performance exceeds expectations (i.e. positive disconfirmation). Conversely, negative disconfirmation arises when perceived performance falls short of customers' expectations. In the current study, disconfirmation evaluations were operationalised at both the attribute and the dimension level (e.g. product disconfirmation, direct seller disconfirmation and direct selling company disconfirmation). Respondents were asked to rate each attribute, and to give an overall evaluation, of performance against their expectations. The disconfirmation measure appears as Question 36 in the questionnaire (see Appendix 5.1(a)).
Disconfirmation evaluation employed a five-point scale anchored with (1) much worse than expected, (2) worse than expected, (3) just as expected, (4) better than expected and (5) much better than expected. A "not applicable" (NA) option was included as an alternative response for respondents with no experience or knowledge of specific attributes. It should be noted that the disconfirmation measure differed from Question 34 (Performance measure) and Question 35 (Importance measure) only in its instructions to respondents and its response type.
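The following minimal sketch illustrates how responses on this scale might be coded. It maps the five verbal anchors to the scores 1 to 5 and treats NA as a missing value rather than as a score. It then labels each rating as positive disconfirmation (above 3), confirmation (3, "just as expected") or negative disconfirmation (below 3), following the definition above. The dictionary and function names are hypothetical; they do not describe the data-processing procedure actually used in the study.

```python
from typing import Optional

# Hypothetical numeric coding for the five anchors of the disconfirmation scale.
DISCONFIRMATION_CODES = {
    "much worse than expected": 1,
    "worse than expected": 2,
    "just as expected": 3,
    "better than expected": 4,
    "much better than expected": 5,
}


def code_response(answer: str) -> Optional[int]:
    """Map a verbal anchor to its 1-5 score; 'NA' (not applicable) becomes None."""
    cleaned = answer.strip().lower()
    if cleaned == "na":
        return None  # treated as missing, never as the midpoint
    return DISCONFIRMATION_CODES[cleaned]


def classify(score: Optional[int]) -> str:
    """Label a coded rating relative to the 'just as expected' midpoint (3)."""
    if score is None:
        return "not applicable"
    if score > 3:
        return "positive disconfirmation"
    if score < 3:
        return "negative disconfirmation"
    return "confirmation"


for answer in ["better than expected", "just as expected", "NA"]:
    score = code_response(answer)
    print(f"{answer!r}: score={score}, {classify(score)}")
```

Keeping NA as None, instead of any number, means later summaries can exclude those respondents explicitly.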
Importance measures based on the same items and aspects as mentioned above were also assessed in this study (Question 35; see Appendix 5.1(a)). The importance measure was not incorporated in the conceptual framework of this study; it was used only for descriptive analysis (i.e. importance-performance gap analysis). Importance was rated on a five-point scale anchored with (1) not at all important, (2) not important, (3) neutral (indifferent), (4) important and (5) very important.
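An importance-performance gap analysis of the kind mentioned above can be sketched as follows. For each item, the mean performance rating is compared with the mean importance rating, both on five-point scales, with NA responses excluded. Expressing the gap as mean performance minus mean importance is an assumption made here for illustration. The item names and ratings are invented and are not data from this study.

```python
from statistics import mean
from typing import Dict, List, Optional

# item name -> list of 1-5 ratings, with None recorded for 'NA'
Ratings = Dict[str, List[Optional[int]]]


def item_means(ratings: Ratings) -> Dict[str, float]:
    """Mean rating per item, ignoring NA (None) responses."""
    return {item: mean(r for r in scores if r is not None)
            for item, scores in ratings.items()}


def importance_performance_gaps(performance: Ratings, importance: Ratings) -> Dict[str, float]:
    """Gap = mean performance - mean importance for each item.

    A negative gap flags an item that performs below the importance
    customers attach to it.
    """
    perf = item_means(performance)
    imp = item_means(importance)
    return {item: round(perf[item] - imp[item], 2) for item in perf}


# Invented ratings for two hypothetical items, for illustration only.
performance = {"product quality": [4, 5, 4, None], "seller knowledge": [3, 2, 3, 3]}
importance = {"product quality": [5, 5, 4, 4], "seller knowledge": [4, 4, 3, None]}

for item, gap in sorted(importance_performance_gaps(performance, importance).items(),
                        key=lambda kv: kv[1]):
    print(f"{item}: gap {gap:+.2f}")
```

Sorting by gap lists the items that fall furthest short of their importance first, which is the usual starting point for descriptive interpretation.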
5.5.3.2
Other Constructs
The following section briefly describes the operationalisation of the other constructs integrated in the conceptual model, with regard to scale items, scale rating, response type and instructions to respondents. These constructs are Purchase Decision Involvement, Perceived Equity, Perceived Value, Relational Commitment, Overall Satisfaction and Behavioural Intentions.
5.5.3.2.1
Purchase Decision Involvement
Purchase decision involvement has been defined as the degree of interest and concern that a consumer brings to a specific purchase decision (Mittal 1989). The scale was adapted from Mittal (1989) and refined to suit the direct sales purchase context. Participants' responses were captured by a four-item measure based on five-point bipolar phrases. The items were presented as questions 15, 16, 17 and 18 in the questionnaire (see Appendix 5.1(a)).
Table 5.2 Purchase Decision Involvement
1. Selecting from many types and brands of this product available in the market
2. Various types and brands of this product available in the market are alike or all very different
3. Important to make a right choice of this product
4. Concerned about the outcome of your choice
5.5.3.2.2
Perceived Equity
Three items on a five-point Likert-type scale were used to gauge the degree to which respondents perceived the purchase transaction as fair, particularly in relation to the treatment he/she received from the direct seller from whom he/she purchased the product mentioned in question 7² (see Appendix 5.1(a)). The scale was adapted from Oliver and Swan (1989a, b) and was originally developed for exchanges involving a car dealer; some refinement was made to suit the context of the present study. The three items were selected from Oliver and Swan's two scales, which together consist of six items: items 1 and 3 were adapted from Oliver and Swan (1989b) and item 2 from Oliver and Swan (1989a). The items appear as questions 19, 20 and 21 in the questionnaire. Likert scales with scale steps of 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree and 5 = strongly agree were used to rate these items.
² Question 7 in the questionnaire asks participants to indicate specifically the beauty or healthcare product they last bought from a direct selling company.
5.5.3.2.3
Relational Commitment
In the context of this study, commitment is conceptualised as the direct sales customer's enduring desire to continue a relationship with the direct seller, accompanied by a willingness to make efforts to maintain it. A three-item measure based on a five-point Likert scale was used to capture respondents' reactions towards maintaining relationships with their direct sellers. The scale was adapted from Macintosh and Lockshin (1997), and the items were modified to suit the direct sales context. These items were presented as questions 22, 23 and 24 in the questionnaire. The questions were phrased as follows:
"Please indicate your reaction to the following statements concerning your relationship with the direct seller from whom you purchased the product mentioned in Question 7. Circle one appropriate number using the scale below:"
Table 5.4 Relational Commitment
Scale items
1. I am very committed to maintaining my relationship with my direct seller
2. I believe my direct seller and I will put some effort into maintaining our relationship
3. I plan to maintain my relationship with my direct seller
Source: Macintosh and Lockshin (1997), for all three items
5.5.3.2.4
Perceived Value
Perceived value was conceptualised as the result of the customer's trade-off between the benefits received and the costs incurred (monetary and non-monetary) in purchasing from a direct seller, as opposed to buying the product (beauty or healthcare) from a traditional retail store (Zeithaml 1988). Table 5.5 exhibits the items incorporated in the perceived value scale and their sources. Participants' responses were captured using a nine-item scale measured by five-point bipolar phrases. Respondents were asked to evaluate how their purchasing experience through the direct sales channel compared with that of a conventional retail store in terms of product quality, value for money, the direct seller's knowledge, convenience and enjoyment of purchasing. These items appear as questions 25 to 33 in the questionnaire (see Appendix 5.1(a)).
5.5.3.2.5
Overall Satisfaction
Overall satisfaction is conceptualised as a summary emotional evaluation of all aspects of the direct sales marketing channel (direct selling product, direct seller and direct selling company), and it is the focal construct of this study. Respondents were asked to evaluate their overall feeling towards buying beauty or healthcare products through the direct sales purchasing system. Overall satisfaction was measured by three semantic differential items that have been commonly used in satisfaction studies (Crosby and Stephens 1987; Oliver 1980; Oliver and Swan 1989; Spreng and Olshavsky 1993; Tse and Wilton 1988). However, item (c), "I didn't like it / I like it very much", was used instead of "unfavourable/favourable", because the latter phrase was found to be inappropriate in the study context when translated into Malay, the national language of Malaysia. Table 5.6 depicts the items used for the overall satisfaction construct, each rated on a five-point scale. This measure is presented as question 37 in the questionnaire.
Table 5.6 Overall Satisfaction
a. 1 2 3 4 5 Satisfied
b. 1 2 3 4 5 Pleasant
c. 1 2 3 4 5 I like it very much
5.5.3.2.6
Behavioural Intentions
A behavioural intentions measure was operationalised by asking respondents to respond to the sixteen statements exhibited in Table 5.7. They were asked to indicate how likely they were to engage in each of the activities described. It should be noted that, unlike previous customer satisfaction studies, which typically used few intention items (Cronin and Taylor 1992; Macintosh and Lockshin 1997; Parasuraman et al. 1988; Taylor and Baker 1994), the current study integrates an exhaustive list of behavioural intentions in the measure. Most of the items were adapted from Zeithaml et al. (1996), and two new items appropriate for the direct sales setting were incorporated. The behavioural intentions measure employed a five-point scale anchored with (1) definitely will not, (2) probably will not, (3) might or might not, (4) probably will and (5) definitely will. The measure was presented as questions 38 to 53 in the questionnaire (Appendix 5.1(a)).
Table 5.7 Behavioural Intentions
Scale items
1. Say positive things about my direct seller from this company to others
2. Encourage my friends to buy beauty/healthcare products from my direct seller
3. If someone asked my advice, I would recommend my direct seller
4. Say positive things about this direct selling company to other people
5. Buy other product/s from my direct seller from this company
6. Repurchase the same product from this direct seller when I need one in future
7. Continue to purchase this product even if my direct seller quit or moved to another location*
8. Continue to purchase this product from this direct selling company even if I could find an alternative with a lower price
9. Continue to purchase this product even if there was a slight increase in price
10. Say favourable things about this product to others
11. Switch to a different product if I experience a problem with my present one
12. Complain to my direct seller if I experience problems with the product
13. Complain to my friends and family if I experience a problem
14. Maintain the same amount of purchase with this company
15. Continue to use this direct selling company as the main provider of my beauty /healthcare products
16. Consider joining the direct selling business myself in future*
Source: The scale was adapted from Anderson (1996), Verhoef et al. (2001) and Zeithaml et al. (1996), with two new items (marked *) from the exploratory interviews.
5.5.4
It should be noted that the instructions on how to respond to questions, and the response form or type used to capture respondents' judgements of the constructs investigated, have been presented in Section 5.5.3. The majority of the questions were closed-ended, with predetermined response types accompanying each question or item; it is therefore reasonable to expect that respondents had little difficulty in answering them. Two main forms of scale response were adopted in this study, namely semantic differential scales and Likert-type scales. The Likert-type scale is a common response type used to elicit opinions and attitudes in social science research (Ryan and Garland 1999) and is particularly popular with customer satisfaction researchers (Allen and Rao 2000). The semantic differential scale, in contrast, is a form of attitude measure in which the object or phenomenon investigated is rated on scales anchored by bipolar adjectives (Dillon et al. 1990). To maintain uniformity, regardless of response type, a five-point scale was applied to all the items in the questionnaire.
Although the number of scale points is considered an important issue and has been a source of considerable disagreement among scholars, very little has been said about an optimal number of response categories, as there is no unequivocal answer. It has been recognised that the number of scale points has an impact on reliability estimates, and five- to seven-point rating scales are widely used. It has also been recognised that researchers prefer larger numbers of scale categories, which allow respondents to make finer judgements about the intensity of their attitudes (Dillon et al. 1990) and thus have greater discriminating power. Using a smaller number of response choices, however, has been reported to reduce the likelihood of item non-response (Leigh and Martin 1987). Moreover, Lehmann and Hulbert (1972) argue that the precision gained by using more scale points may be offset by increased respondent fatigue. Accordingly, Leigh and Martin (1987) and Allen and Rao (2000) posit that the choice of the number of scale points should also take into account respondents' characteristics, such as educational level, involvement and knowledge of the phenomenon under investigation.
In the current study, all the constructs integrated in the research framework used a five-point scale of the interval type. A five-point scale was deemed more suitable than a seven- or ten-point scale because of the characteristics of the target respondents (Allen and Rao 2000). For example, it was anticipated that the buying population for direct selling products generally belongs to the typical Malaysian consumer group, and it is therefore reasonable to believe that they would favour a fully anchored five-point scale, which is noted for its simplicity and precision. Additionally, a five-point scale is posited to be appropriate when taking into account the nature of the questionnaire, such as measure complexity, and, most importantly, the analysis technique to be employed (Leigh and Martin 1987). It was therefore assumed that a fully anchored five-point scale satisfied the respondent and questionnaire criteria, as well as the requirements of the statistical method employed in this study, structural equation modelling (see Section 5.7.3).
In the current research, the core constructs (performance and disconfirmation) were measured with unforced itemised rating scales, because it was anticipated that potential respondents might not have knowledge of every attribute in the exhaustive list stipulated in the questionnaire. It was therefore considered imperative to include a "not applicable" response category in these questions. Many scholars (Allen and Rao 2000; Churchill 1991; Dillon et al. 1990; Leigh and Martin 1987; Ryan and Garland 1999) have argued that respondents who lack experience might mark the midpoint of the scale, or fail to respond to the question, if the scale does not offer a "no opinion", "don't know" or "not applicable" response option. This distorts measures of central tendency and variance (Dillon et al. 1990), casting doubt on the scale's reliability. Most importantly, prior research has revealed that non-response rates decrease when "don't know", "unsure" or similar response options are provided (Ryan and Garland 1999). In essence, researchers are advised to include a non-response option in scale response categories in order to achieve greater reliability of measures (Allen and Rao 2000; Leigh and Martin 1987; Ryan and Garland 1999).
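The distortion described by Dillon et al. (1990) can be illustrated with a toy example. Suppose respondents who have no experience of an attribute are forced onto the scale midpoint instead of choosing "not applicable". The item's mean and variance then change even though no real opinion has been added. All ratings below are hypothetical and serve only to show the direction of the effect. The code assumes that, without an NA option, every inexperienced respondent picks the midpoint (3).

```python
from statistics import mean, pvariance

# Hypothetical ratings of one attribute from respondents who have experienced it.
informed = [5, 5, 4, 5, 4]
# Hypothetical number of respondents with no experience of the attribute.
uninformed_count = 3

# Scale WITH an NA option: inexperienced respondents tick NA and are excluded.
with_na = list(informed)
# Scale WITHOUT an NA option: assume they all fall back on the midpoint (3).
forced_midpoint = informed + [3] * uninformed_count

for label, data in [("with NA option", with_na), ("forced to midpoint", forced_midpoint)]:
    print(f"{label}: n={len(data)}, mean={mean(data):.2f}, variance={pvariance(data):.2f}")
```

In this toy case the mean falls from 4.6 to 4.0 and the population variance rises from 0.24 to 0.75, purely as an artefact of the missing NA option.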
5.5.5
Wording of Each Question
This step involves phrasing each question in a way that avoids item non-response, which could create problems when analysing the data. Prevalent wording problems in self-administered questionnaires include ambiguous, leading or uninteresting questions. It is recognised that question wording can affect the answers generated, yet there are very few basic principles that can be relied on when framing questions. The most notable principle was suggested by Dillon et al. (1990, p. 382), who convincingly assert that "a basic admonition is to keep the words simple". Accordingly, they recommend that, when there is a choice between difficult and simple wording, it is best to opt for simplicity. Similarly, Churchill (1991, p. 382) adds weight to this notion by stating that "most researchers experience a vocabulary problem because most of them are more highly educated than the typical questionnaire respondent, researchers are prone to use words familiar to them but not understood by many respondents".
Malaysian official statistics (Malaysian Department of Statistics 2001) indicate that the average person in Malaysia has a secondary school (high school) education but not a college education. To minimise potential problems in comprehending the questions, it was therefore decided that the questions must be pre-tested (first pilot test). This would ensure that any misleading questions, inappropriate abbreviations, ambiguity or potentially double-barrelled questions were detected before distribution to respondents in the second pilot study. In addition, definitions of the terms frequently used in the questionnaire, together with examples, were provided on its introductory page. The pilot tests are described in detail in Section 5.5.10. Payne's (1979) guidelines on wording were applied in the pre-testing phase: 1) Does it mean what we intend? 2) Does it have any other meanings? 3) If so, does the context make the intended meaning clear? 4) Is there any word of similar pronunciation that might be confused? 5) Is a simpler word or phrase suggested? Participants' comments and suggestions were taken into consideration and, as a consequence, some refinements were made to item statements and instructions. This procedure was intended to make the questions simple and precise, thereby minimising measurement error.
5.5.6
Question Sequence
Once the form of response and appropriate wording for each question had been determined, the next step was to assemble these questions into a questionnaire. In this regard, the sequence in which the questions are presented is essential to the success of the research effort (Churchill 1991). Although there are no definite rules for ordering questions, the rules of thumb postulated by Churchill (1991) were used as guidelines for ordering the questionnaire. These guidelines are as follows:
First, the opening questions should be simple, interesting and non-threatening, as this encourages respondents to relax and motivates them to respond freely. It is worth highlighting that the first few questions in this study (Section A) are screening questions, which ensure that respondents are qualified for the study.
Second, a funnel approach to question sequencing is recommended: the questionnaire should begin with broad questions before gradually narrowing down to a more specific scope. The questionnaire follows this logical sequence. The first section (Section A) contains questions that frame the respondent's experience with the direct sales channel; if he/she qualifies for inclusion in the study, he/she is asked to proceed to the next section (Section B). This section comprises questions about the respondent's recent purchases from the specific product category under study, and subsequent sections examine performance, satisfaction measures and, finally, future behavioural intentions.
Third, branching questions should be designed with care. Branching questions direct respondents to different questions depending on their answers. For example, in this study the first question asks respondents whether they have ever purchased any product from a direct selling company. If they reply "yes", they are asked to proceed to question 2; otherwise they are asked to stop at this point.
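The skip instruction attached to Question 1 amounts to a simple routing rule, sketched below as a function. The function name and the "yes"/"no" answer strings are illustrative assumptions. Only the rule itself comes from the questionnaire: continue to Question 2 on "yes", stop otherwise.

```python
from typing import Optional


def next_question(q1_answer: str) -> Optional[int]:
    """Skip rule for Question 1 (ever bought from a direct selling company?).

    'yes' routes the respondent to Question 2; any other answer ends the
    questionnaire, signalled here by returning None.
    """
    if q1_answer.strip().lower() == "yes":
        return 2
    return None


for answer in ["Yes", "No"]:
    destination = next_question(answer)
    print(answer, "->", f"Question {destination}" if destination else "stop")
```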
Fourthly, it is suggested that questions about classification information should be placed last. In this regard, questions pertaining to the personal profile of respondents in this study were positioned in the final section (Section C).
Finally, it is recommended that difficult or sensitive questions be placed late in the questionnaire. In line with this suggestion, the demographic questions, which are regarded as sensitive, were asked in the last section, as mentioned above. The open-ended questions, which were regarded as quite difficult, were presented as questions 54, 55 and 56, the last few questions before Section C. Overall, the ordering of the questions in this study satisfies the guidelines recommended by many scholars (e.g. Churchill 1991; Malhotra 1996; Parasuraman 1991).
5.5.7
Physical Questionnaire Characteristics
The physical characteristics of a questionnaire, particularly a self-administered one, can influence the perceived importance of the study in respondents' eyes. This may affect their cooperation or willingness to participate and, most importantly, can have detrimental effects on the accuracy of the information obtained (Churchill 1991; Malhotra 1996). For these reasons, in the first pre-test of the questionnaire, emphasis was placed on layout as well as on the wording of questions.
Effort was also devoted towards achieving a professional layout to reflect the credibility and the importance of the study. In this context, the questionnaire was bound as a booklet with a black backbone binder. The booklet format offered ease of reading and turning pages and reduced the likelihood of misplaced or lost pages. It has been recognised by previous researchers that a cover letter accompanying the questionnaire, which serves to introduce the study and motivate the respondents to cooperate, is crucial (Churchill 1991). In this case, a cover letter with the Cardiff Business School letterhead was used to communicate the credibility of the sponsoring institution. In addition, to motivate the respondents to cooperate in the study, information regarding a prize draw and assurance of confidentiality of their responses were clearly stipulated in the cover letter. The questionnaire was then inserted in a brown (8 x 10) envelope together with a complimentary Cardiff ballpoint pen as a souvenir for the respondents who qualified to participate in the study.
5.5.8
Re-examination and Revision of Questionnaire
This step involves re-appraising all the decisions made in the previous steps (1 to 7). Once it had been established that the questionnaire satisfied all the criteria of a good questionnaire, the next crucial consideration was translating it into the national language of the potential respondents.
5.5.9
Questionnaire Translation
As the questionnaire was administered in Malaysia, the original questionnaire had to be translated into Malay, the national language of Malaysia. In the final questionnaire, the English and Malay versions were bound together as a booklet, so that respondents could choose their preferred language; interestingly, the majority of respondents chose the Malay version. Ensuring equivalent translation proved to be an important and difficult task. Adler (1983) suggests two main procedures for achieving equivalent translations, under which the questions should be either:
1) Back-translated: translated and then back-translated into the original language using a bilingual target population; or
2) Translated by an expert: translated independently by excellent bilingual translators who are familiar with both languages and the subject matter.
An experienced, qualified translator who was highly proficient in both English and Malay was appointed to translate the English version of the questionnaire into Malay. Differences that emerged between the translator and the author of the thesis were reconciled in consultation with the translator. A committee of three panel members was then formed to further scrutinise the accuracy of the translation, with explicit emphasis on the content and context of the study. Subsequently, the Malay version was pre-tested with five participants. Upon collection of the completed questionnaires, a discussion was held with each individual to assess the respondent's ability to get through the translated questions.
5.5.10
Questionnaire Pre-Testing
Pre-testing is regarded as an indispensable aid in constructing a good questionnaire, as it provides a real test before full-scale data collection is implemented (Churchill 1991; Dillon et al. 1990). Many potential problems, such as item wording, sequencing and the overall flow of the questionnaire, can be detected and corrected at this stage. In fact, a thorough pre-test can detect a range of mistakes, from the merely irritating and inconvenient to potentially catastrophic situations in which the whole study could be ruined by one error. Hence, pre-testing is viewed as the best safety net (Riley et al. 2000).
A two-phase pre-test was designed for this study. In the first phase, a rough draft of the questionnaire was circulated among other PhD students for their input. This resulted in minor improvements to the sequencing and wording of items and to the overall appearance of the questionnaire. The revised draft was then pre-tested with twenty qualified participants. Care was taken to ensure that these participants were similar to the target population specified for the final data collection. Respondents were a convenience sample selected to represent the typical Malaysian consumer. An announcement was posted on the Malaysian-UK community's website to solicit potential participants, and the author personally invited Malaysian students and their families living in Cardiff. Upon collection of the questionnaires, personal discussions were held with the participants to find out whether they had been able to get through the questionnaire. Any problems they encountered while responding were noted, specifically with regard to comprehension of the phrases used in the questions, item sequencing and the layout of the questions. During this phase, the questionnaire was also subjected to critical evaluation by ten members of the academic staff of Cardiff Business School, particularly from the Marketing and Strategy section.
As an outcome of the first pre-test, a significant amendment was made to the original version of the questionnaire regarding questions 34 and 35. Originally, they were combined to form bipolar five-point Likert-type scales, anchored "Very dissatisfied" to "Very satisfied" and "Not at all important" to "Very important" respectively. The recommendations put forward by the respondents and academics regarding this question led to fine-tuning of its format, and as a result it was split into two questions. Other minor adjustments to item wording and sequencing were also made (see Appendix 5.1(a)). The modification is consistent with the suggestion of Martilla and James (1977) that separating the importance and performance measures into two questions or sections helps to minimise compounding and order effects. If the same item is used to assess two or more different measures positioned close to each other, the respondent's answer to the first measure may influence his/her answer to the second; a distinct separation helps to avoid this potential problem.
The second pre-test phase was carried out with the revised questionnaire. The questionnaires were distributed to fifty relevant respondents residing in Malaysia and the UK, and thirty-two usable responses were obtained. This pre-test stage resulted in only minor refinements. For instance, one item from the Perceived Equity scale was dropped because it was widely misunderstood by respondents. Additionally, to solicit expert opinion, the author of this thesis personally interviewed the marketing research manager of the world's leading direct selling company at its Malaysian head office. In this discussion, special attention was devoted to the content coverage of the questionnaire, and, on his suggestion, minor fine-tuning of item wording was carried out.
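The usable response rate of the second pre-test follows directly from the figures above: 32 usable responses out of 50 questionnaires distributed.

```latex
\text{usable response rate} = \frac{32}{50} = 0.64 = 64\%
```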
The researcher also conducted structured interviews with twenty-eight direct sellers, based on question 35 of the direct sales customers' questionnaire (see Appendix 5.1(a)). The question's instruction was revised to elicit the direct sellers' opinions on how important the items pertaining to the core constructs are in satisfying their customers (the questionnaire is shown in Appendix 5.1(c)). The results revealed that all the stipulated items were indeed viewed as important.
5.6
An exhaustive review of the existing literature revealed that very little research has addressed the dynamics of the direct sales retail channel. It was therefore not surprising that measurement scales pertinent to the present study (product performance/satisfaction, direct seller performance/satisfaction and direct sales company performance/satisfaction) were not available. Scales had to be developed for this study that explicitly identify the factors consumers consider important in judging the performance of three things: the products, the direct sellers from whom they purchased the products and the direct sales companies that manufactured them. It is anticipated that consumers first form satisfaction judgements at the dimension level and subsequently form an overall satisfaction with the direct sales channel. An initial pool of items for each construct was generated from the sparse literature on issues such as consumer perceptions of and attitudes towards the direct sales industry (see, for example, Barnowe and McNabb 1992; Kustin and Jones 1995; Peterson et al. 1989; Raymond and Tanner 1994). Measurement scales that were available for certain constructs had to be refined to suit the direct sales context. The procedure used in this study to develop the research constructs was based on the guidelines suggested by Churchill (1979). Figure 5.5 illustrates the five steps involved in developing and testing the constructs of interest in this study; the descriptions on the right side of each box indicate the method undertaken by the present study at each step.
Figure 5.5 Steps in developing measures
Step 1 Specify domain of construct
Step 2 Generate items
Step 3 Collect data
Step 4 Purify measure
Step 5 Validation of measure
Source: Adapted from Churchill (1979), cited in Churchill and Iacobucci (2002)
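In Churchill's (1979) paradigm, coefficient alpha is the recommended first check when purifying a measure (Step 4). Items with low item-to-total correlations are candidates for deletion. The sketch below computes Cronbach's alpha for a respondents-by-items matrix. It assumes complete responses (for example, after NA answers have been handled by listwise deletion, an assumption made here), and the three-item data are invented. This section does not state how, or whether, alpha was computed in this way for the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix with no missing values.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_scores = [sum(row) for row in responses]
    return k / (k - 1) * (1 - item_variance_sum / variance(total_scores))


# Invented answers from five respondents to a hypothetical three-item, five-point scale.
data = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
]
print(f"alpha = {cronbach_alpha(data):.3f}")
```

For these invented data, alpha is roughly 0.90. Sample variances are used throughout; because the same denominator appears in the numerator and the denominator of the ratio, using population variances would give the same result.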
5.6.1
The first step in developing better measures, as suggested by the procedure in Figure 5.5, involves identifying and specifying the domain of the constructs³ of interest. In addition to the literature reviewed in Chapter 3, trade literature and sales reports, including the material presented in Chapter 2, were considered. To support this, two expert interviews were conducted, followed by five key informant interviews with successful direct sellers.
³ A construct is defined as a specific type of concept that exists at a higher level of abstraction and is created specifically for theoretical use (McDaniel and Gates 1999).
5.6.2
Item Generation
The second step in the development of measures focuses on generating an initial list of items that capture the domain specified in the previous step. The main concern of this process is content validity, which is acknowledged as the minimum requirement for measurement adequacy, specifically in the validation of new, modified or untested measures (Schriesheim et al. 1993). At the start of this step, items relevant to the direct sales consumption setting were identified. The methods employed in the previous step to identify and specify the domain of the constructs were also applied at this stage. Two main sources of information were used at this stage, namely:
5.6.2.1
Literature Search
The literature search also served as a basis for identifying a range of factors that could contribute to customer satisfaction and its relationship with future behavioural intentions. At this stage of the measure development process, more emphasis was given to literature that could help in generating a comprehensive list of items for the constructs investigated in this study. Generally, this literature describes how a construct has been conceptualised by previous research and how many dimensions and attributes it has. Handbooks of marketing scales, which compile scales developed by previous researchers (e.g. Bearden, Netemeyer and Mobley 1993), were consulted to check the availability of measures for the constructs examined in this thesis.
5.6.2.2
In-depth Interviews
The in-depth interviews were concerned mainly with acquiring the viewpoints of direct sales customers. The participants were recruited through a non-probability sampling method, by selecting a convenience sample of individuals who could offer views and insights into the phenomenon investigated (Churchill and Iacobucci 2002). Respondents were carefully selected to have experience pertinent to the subject matter and the ability to articulate their experience and knowledge during the interview sessions. Individual interviews were held with ten experienced direct sales customers who had been purchasing products through the direct sales channel for at least the previous five years. The interviews were conducted personally by the author and lasted approximately 50 to 60 minutes per session.
The main purpose of the in-depth interviews was to elicit explicit descriptions from the participants of their consumption experience, specifically which attributes, features or aspects of the direct sales channel were important in influencing their satisfaction/dissatisfaction judgements. Relatively unstructured questions were posed to the participants, and they were encouraged to elaborate on and explain their answers. At times, participants were repeatedly probed to clarify important issues. With these emphases in mind, the interviews were designed to unearth the underlying reasons for, or determinants of, customer satisfaction/dissatisfaction, particularly in questions 1, 2 and 6. The remaining questions were used to understand customers' future behavioural intentions, which are consequences of overall satisfaction. The informal discussions with the interviewees were noted and recorded; emerging themes were then sorted into categories of attributes, such as product, direct seller and company aspects, and the list of items was expanded on the basis of these discussions. Appendix 5.1(e) presents the questions used to guide the in-depth interviews.
In conclusion, the review of relevant literature and the in-depth interviews used in the second stage of measure development served as a basis for drawing up a comprehensive list of items representing the phenomenon investigated in this study, specifically for generating pertinent measurement scales for the constructs examined. Effort was made to achieve adequate content validity, an essential precondition for instrument validity, as strongly suggested by Schriesheim et al. (1993). A total of 34 items for the multi-item performance scales were generated from previous measures, a review of related literature, interviews, and observations from involvement and interaction with direct sellers and direct sales customers.
5.6.3
Data Collection
Once an appropriate research design and data collection instrument have been developed, the next step in the research process is to select the elements (note 4) from which the information will be collected. The data collection method and questionnaire administration technique employed in the present study are described in detail in Section 5.4. The present step involves collecting data from a sample that matches the demographic profile of the research target population (note 5). Sampling design decisions are crucial aspects of research design and involve both the sampling plan and the determination of sample size (Sekaran 2000).
Initially, it is imperative to define the target population, the collection of elements about which the researcher is attempting to make an inference. The target population for this study comprises adult consumers (over 16 years of age) who had purchased beauty or healthcare products from a direct seller (note 6) within the twelve months prior to the data collection period (May-June 2002), and who lived or worked within the selected geographic locations, in this case three designated districts (Petaling, Kelang and the Federal Territory of Kuala Lumpur). Churchill and Iacobucci (2002) posit that the less complex the definition of the target population, the higher the incidence (note 7) and the easier and less costly it is to locate sample members. Because direct sales is a prevalent alternative shopping channel and is particularly popular among working women, the incidence is relatively high.
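Incidence is defined in note 7 as the percentage of the general population that satisfies the target-population criteria. As a rough illustration of why a high incidence makes sampling cheaper, the sketch below computes the expected number of people who must be screened to obtain a given number of qualified respondents. It ignores refusals and non-response, the function name is hypothetical, and the incidence rates are illustrative assumptions: the study does not report a numerical incidence.

```python
import math


def contacts_needed(target_completes: int, incidence: float) -> int:
    """Expected number of people to screen so that, on average,
    `target_completes` of them meet the target-population criteria.

    `incidence` is the qualifying proportion, 0 < incidence <= 1.
    Refusals and non-response are deliberately ignored.
    """
    if not 0 < incidence <= 1:
        raise ValueError("incidence must lie in (0, 1]")
    return math.ceil(target_completes / incidence)


if __name__ == "__main__":
    # Hypothetical incidence rates, for illustration only.
    for rate in (1.0, 0.5, 0.25):
        print(f"incidence {rate:.0%}: screen about {contacts_needed(400, rate)} people")
```

Halving the incidence doubles the screening effort, which is the cost effect Churchill and Iacobucci describe.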
5.6.3.1
Sampling Design
Sampling designs can be broadly categorised into two main types: probability and non-probability sampling (Churchill and Iacobucci 2002; Malhotra 1996; Parasuraman 1991; Tull and Hawkins 1987). In a probability sample, each element in the population has a known, nonzero chance of being selected (Churchill and Iacobucci 2002; Sekaran 2000; Tull and Hawkins 1987). In contrast, in a non-probability sample the elements in the population do not have a known chance of being selected; instead, they are deliberately selected with the aim of generating an appropriate cross-section of the population (Luck and Rubin 1987; Tull and Hawkins 1987).
As previously stated, to the best of our knowledge no prior information on the stipulated population units was available, which meant that no sampling frame could be consulted and ruled out the use of probability sampling techniques. Since probability sampling was not a feasible option, the quota sampling technique was considered the most appropriate sampling procedure for this study (De Wulf 1999; Parasuraman 1991).
Note 5: Population refers to the entire group of people, events or things of interest that the researcher wishes to investigate (Sekaran 2000).
Note 6: Direct sellers, sometimes referred to as distributors or direct salespeople, are independent representatives of a direct selling company who have the right to sell and facilitate the distribution of the product to end consumers.
Note 7: Incidence refers to the percentage of the general population that satisfies the criteria defining the target population (Churchill and Iacobucci 2002).
Parasuraman (1991) defined quota sampling as a non-probability sampling procedure in which (1) the population is divided into cells on the basis of relevant control characteristics, (2) a quota of sample units is established for each cell, and (3) interviewers are asked to fill the quotas assigned to the various cells. Given these features, quota sampling is considered the most refined form of non-probability sampling (Parasuraman 1991). It is widely employed in consumer marketing studies, particularly in commercial studies, in preference to random sampling plans (Dillon et al. 1990). It is also recognised as superior to both convenience and judgement sampling in terms of sample representativeness (Parasuraman 1991; Silver 1997).
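Parasuraman's three steps can be made concrete with a small quota tracker. The sketch below is hypothetical: the class and function names are illustrative, the age bands follow the six age groups used in this study (treating exactly 60 as "Over 60" is an assumption), and it models only the cell-filling logic of steps (2) and (3), not the screening questions or the organisational procedures described later in this section.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (gender, age band)


def age_band(age: int) -> str:
    """Map an age in years to one of the six quota age groups.
    Assumption: exactly 60 is counted as 'Over 60'."""
    if age < 20:
        return "Below 20"
    if age < 30:
        return "20-29"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    if age < 60:
        return "50-59"
    return "Over 60"


class QuotaTracker:
    """Step (2): hold a quota per cell; step (3): fill cells until full."""

    def __init__(self, quotas: Dict[Cell, int]) -> None:
        self.quotas = dict(quotas)
        self.filled = {cell: 0 for cell in self.quotas}

    def try_accept(self, gender: str, age: int) -> bool:
        """Accept a qualified respondent only if their cell still has room."""
        cell = (gender, age_band(age))
        if cell not in self.quotas or self.filled[cell] >= self.quotas[cell]:
            return False
        self.filled[cell] += 1
        return True

    def is_complete(self) -> bool:
        """True once every cell has reached its quota."""
        return all(self.filled[c] >= q for c, q in self.quotas.items())


if __name__ == "__main__":
    tracker = QuotaTracker({("Female", "20-29"): 2, ("Male", "30-39"): 1})
    for gender, age in [("Female", 25), ("Female", 28), ("Female", 22), ("Male", 35)]:
        print(gender, age, tracker.try_accept(gender, age))
    print("all quotas filled:", tracker.is_complete())
```

The third female respondent aged 20-29 is turned away because that cell is already full, which is exactly the constraint that keeps the sample proportions fixed.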
A quota sample is selected purposively, in the sense that the demographic characteristics of interest (see Table 5.9b) are represented in the sample in a similar ratio to that found in the population (Tull and Hawkins 1987). Additionally, Parasuraman (1991) posits that if control mechanisms based on demographic characteristics are appropriately formulated, the sample obtained will precisely reflect the population of interest. Furthermore, several scholars argue that under certain conditions quota sampling obtains results that closely resemble those of probability sampling designs such as stratified random sampling (Malhotra 1999; Parasuraman 1991; Sekaran 2000). Accordingly, Sudman and Blair (1999) claim that quota samples are as good as samples generated by more costly techniques, and that no major differences can be detected when they are compared with probability samples. In a similar vein, Luck and Rubin (1987, p. 222) posit that, on the surface, non-probability sampling seems to be more effective than trusting to chance, as in probability selection. Indeed, according to Moser and Stuart (1953), marketing practitioners acknowledge that the aim of surveys is typically to obtain maximum accuracy at a given cost, and in many cases quota sampling has been shown to achieve this aim. Chisnall (1997, pp. 100-101) makes a strong case for the use of quota sampling:
"In general, quota sampling is much more flexible than random sampling, and its advantages make it attractive, under certain conditions, to commercial researchers. Practical considerations may often be strong influences in survey work."
In brief, although quota sampling has been criticised for its theoretical weakness, researchers have defended it on the grounds of its economy, administrative convenience, shorter set-up time and relative ease of use compared with probability techniques (Chisnall 1997; Moser and Stuart 1953; Sudman and Blair 1999).
Although quota sampling offers the distinct advantages mentioned above, it may produce biases if no effort is made to minimise them. The potential drawbacks of the technique are discussed next, together with the steps taken in the current study to control and minimise them.
(1)
Quota sampling has been criticised because, if too many control characteristics are used to improve a sample's representativeness, the flexibility of the sampling procedure is drastically reduced and locating suitable respondents to fill the various cell quotas can become exorbitantly expensive (Parasuraman 1991; Tull and Hawkins 1987).
In response to this contention, this study used only two demographic variables, gender and age, as the mechanisms controlling the composition of the sample (see Table 5.9 (a), p. 234). Respondents were divided into cells based on six age groups and two gender groups, giving 6 x 2 = 12 sample cells.
(2)
Churchill and Iacobucci (2002) and Parasuraman (1991) argue that even though a sample might satisfy the specified controls, it cannot be generalised with a high degree of assurance if other demographic characteristics are not taken into account. In this regard, sample representativeness might decrease as a result of omitting relevant characteristics.
In order to minimise this potential bias, sample selection was not based solely on the predetermined main control characteristics (age and gender). An attempt was made to enhance the representativeness of the sample by taking into consideration the approximate proportions of other relevant population characteristics, such as ethnic background and socioeconomic variables. Prior research has suggested that demographic and socioeconomic characteristics influence consumer satisfaction judgements and repurchase intentions (see, for example, Kassim 2001; Mittal and Kamakura 2001). As a result, the sample collected in this study did not over-represent or under-represent specific demographic or socioeconomic groups.
(3)
… places and times of interviewing. In an effort to minimise these potential biases, questionnaires in this study were administered at several office locations within the stipulated target area. The survey was conducted mainly in office settings, with respondents working in various sectors (e.g. the private and public sectors), as well as with college students. In contrast to mall-intercept surveys, where the time of day and shopper traffic flow can be critical sources of respondent bias, an office survey can avoid these potential biases, because potential participants could be visited at their offices during specific office hours arranged by key personnel or coordinators of the organisations.
(4)
Quota sampling has been heavily criticised because it is impossible to verify that quotas are being accurately filled by field workers, and it is suspected that interviewers' judgement could cause selection bias (Parasuraman 1991). Similarly, Moser and Stuart (1953, p. 350) assert that human involvement in respondent selection is the crucial and most criticised aspect of quota sampling. Ideally, the quota controls devised by the researcher ensure that the sample conforms precisely to the specified target population, provided the field workers fill the quotas correctly.
Sampling control, an important aspect of supervising survey administration (Malhotra 1996), was carefully designed to ensure that the research assistants strictly followed the sampling plan rather than selecting sampling units on the basis of convenience or accessibility. To circumvent this critical bias, the author personally solicited potential respondents by contacting key personnel in several organisations to negotiate access to their staff, and explicitly specified the criteria for the respondents sought. Upon agreement from the key personnel, we requested a list of the names of those who volunteered to participate before an appointment was set up for delivery of the research instrument. Research assistants were therefore not in a position to decide to whom they would distribute the questionnaire.
These procedures were applied to every organisation from which survey participants were solicited. Research assistants were sent to a particular organisation to administer the questionnaire on the basis of the list of names given by its key personnel. Respondents were first briefed on the survey's purpose and then asked the screening questions to ensure that they qualified for inclusion in the sample. The questionnaire was then handed to qualified respondents and collected at a later date. The procedure adopted for filling the required cell quotas and the approach used to administer the questionnaire could reasonably minimise the potential bias closely associated with the quota sampling technique.
(5)
Quota sampling is regarded as inferior to probability sampling with respect to statistical precision and generalisability (Dillon et al. 1990; Sekaran 2000). Specifically, in this context, quota sampling suffers from pejorative connotations because of its association with selection bias and the inability to assess the seriousness of that bias for estimation and inference. Probability sampling was not feasible for this study: although the target population is well defined, no sampling frame for this population could be consulted. Indeed, Mason (1953), commenting on Moser and Stuart's (1953, p. 401) paper, convincingly pointed out that "Pure random sampling must remain idealistic; in many fields no suitable list can possibly be procured." In addition, Moser and Stuart (1953) provide empirical evidence in their study that quota samples were unbiased when compared with the random sample.
As noted by Dillon et al. (1990), all non-probability sampling procedures share a common characteristic: there is no way of determining exactly the chance of selecting any particular element into the sample; hence, estimates are not statistically projectable to the entire population. However, Smith (1983) (note 8) demonstrates that under certain conditions quota sampling can be justified. He further argues that a sample free from selection bias would be of far greater scientific importance than philosophical arguments about randomization inference. By contrast, Moser and Stuart (1953, p. 387) strongly affirm that "there is no theoretical basis for quota sampling surveys." Their contention was criticised by Durant (1953, p. 397), who argued that "The authors are writing as statisticians, pure and simple. If we turn our eyes to another discipline, sociology, there is, of course, a theoretical basis for quota sampling."
To address the crucial question of whether probability statistics can give valuable results for non-random samples, Smith (1983) notes that most dogmatic statisticians would strongly support the following position: "If sample selection is non-random, whether by design, missing values or non-response, no valid statistical inference can be made using a randomization approach to inference" (p. 398). He further asserts that a statistician who strictly adhered to this position would have to reject social surveys, because, in Smith's extensive practical experience, social survey data are always subject to non-response, so their analysis requires assumptions beyond randomization. Finney (1974), for his part, recommends that statisticians should assist researchers in analysing non-random samples, but must explicitly explain the limitations of the findings. Addressing the theoretical argument about using probability statistics to analyse quota samples, Smith (1984, p. 214) emphasises that:
"statistics must be applied and so statistical inference, which is constructed by statisticians, must be interpreted within the scientific areas to which statistical methods are applied. Scientific theories are never exact and so is it reasonable to expect that useful statistical theories can be exact in the sense of being assumption free? A model-based approach is never exact but tries to relate the probability distributions to the population being sampled. In that sense a model-based approach is closer to scientific inference."
In addition, Smith (1983) points out that although a model-based approach to inference permits the researcher to analyse non-random samples, the underlying assumptions should be stated explicitly. He further advocates that, if the proposed model is plausible, descriptive inference can be used as a statistical prediction of the values of units that are not in the sample (Smith 1993).
Thus, it could be argued that the descriptive analysis in Chapter 6 and the main analysis (i.e. SEM) in Chapter 7 for testing the hypotheses in the current research are justifiable, because the author has provided an explicit explanation of the underlying assumptions of the analytical techniques used. Furthermore, several papers in leading marketing and management journals have applied probability statistics to non-probability samples to test their conceptual models and hypotheses (see, for example, De Wulf 1999; De Wulf et al. 2001; Schroder et al. 2003; Patterson and Smith 2003; Tan and Chong 2003; Ting and Chen 2002; Wirtz and Lee 2003).
In conclusion, since maximal effort was made to avoid the potential biases of the quota sampling technique, the sample appears to be representative and, most importantly, the findings of this study can be generalised with some confidence. Table 6.1 presents the overall demographic profile of the sample.
The quota was determined after reviewing previous empirical research on the direct selling industry. Care was taken to ensure that younger, middle-aged and older purchasers were all included in the sample. The sample composition was not derived from the national population census, because the census does not reflect the buying population. The ratio of men to women in the sample was set at 1:3, for the following reasons:
Firstly, several empirical studies conducted in Malaysia as well as in other countries have reported that women make up a significantly larger percentage of purchasers through the direct sales channel than men (see, for example, Barnowe and McNabb 1992; Raymond and Tanner 1994; Chen et al. 1998; Endut 1999; Sargeant and Msweli 1999).
Secondly, the specific product categories under study (beauty care and healthcare) have more direct appeal to women. The resulting gender mix is therefore acceptable for the present purpose, since the sample is broadly in line with the buying population rather than the general population.
Previous research has reported that age is a significant discriminator of direct sales purchasing (see, for example, Peterson et al. 1989; Rehanstat 1999; Endut 1999). The present study set a higher proportion of respondents within the 20-39 age bracket, which accounts for over 60% of the total population; the proportion of respondents aged 40-49 years was set at 25 percent, as suggested by prior research. Table 5.8 (a) depicts the proposed sample composition, whilst Table 5.8 (b) shows the sample composition achieved after the data collection process, which diverges slightly from the planned quota proportions, specifically in the female sample.
Table 5.8 (a) Proposed sample composition by gender and age
Age group             Male (n)  Male (%)  Female (n)  Female (%)  Total (n)
Below 20 years old        5        5%         15          5%          20
20-29 years old          35       35%        105         35%         140
30-39 years old          25       25%         75         25%         100
40-49 years old          25       25%         75         25%         100
50-59 years old           9        9%         27          9%          36
Over 60 years old         1        1%          3          1%           4
Total                   100      100%        300        100%         400
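The cell quotas in Table 5.8 (a) follow from three inputs stated in the text: a target of 400 usable responses, a 1:3 male-to-female ratio and the age-group percentages. The sketch below reproduces that arithmetic. The function name is hypothetical, and it raises an error rather than rounding when a cell would not be a whole number, since the text does not say how fractional quotas would have been handled.

```python
from typing import Dict, Tuple

TOTAL = 400
GENDER_RATIO = {"Male": 1, "Female": 3}
AGE_PERCENT = {
    "Below 20": 5, "20-29": 35, "30-39": 25,
    "40-49": 25, "50-59": 9, "Over 60": 1,
}


def allocate_quotas(total: int, gender_ratio: Dict[str, int],
                    age_percent: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Split `total` across gender x age cells using integer arithmetic."""
    if sum(age_percent.values()) != 100:
        raise ValueError("age percentages must sum to 100")
    ratio_sum = sum(gender_ratio.values())
    cells = {}
    for gender, weight in gender_ratio.items():
        # Gender share first (e.g. 400 * 3 / 4 = 300 women) ...
        n_gender, rem = divmod(total * weight, ratio_sum)
        if rem:
            raise ValueError(f"{gender} share of {total} is not a whole number")
        # ... then the same age percentages within each gender.
        for band, pct in age_percent.items():
            n_cell, rem = divmod(n_gender * pct, 100)
            if rem:
                raise ValueError(f"cell ({gender}, {band}) is not a whole number")
            cells[(gender, band)] = n_cell
    return cells


if __name__ == "__main__":
    for (gender, band), n in allocate_quotas(TOTAL, GENDER_RATIO, AGE_PERCENT).items():
        print(f"{gender:<7}{band:<10}{n:>4}")
```

Because the same age percentages are applied within each gender, the 12 cells sum back to 100 men, 300 women and 400 respondents in total, matching the table.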
5.6.3.2
Sample Size Determination
After the population has been defined and a specific sampling technique has been selected, the next important consideration is sample size determination. In deciding on an appropriate sample size, the inevitable trade-off between added information and added cost, time and resources must be taken into account. In other words, the determination of sample size must consider both statistical accuracy and cost.
In general, important decisions require more information, and that information should be obtained precisely. This calls for larger samples, but as the sample size increases, each unit of information is obtained at greater cost. Malhotra (1996) postulates that the degree of precision may be measured in terms of the standard deviation of the sample mean (the standard error), which is inversely proportional to the square root of the sample size. This means that the larger the sample, the smaller the gain in precision from increasing the sample size by one unit. In addition, the cumulative effects of sampling error across variables are reduced in a large sample. However, most scholars (e.g. Malhotra 1996) agree that the sample size decision should be guided by resource constraints.
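The diminishing return described above follows from the standard error of the mean, SE = σ/√n. The sketch below assumes a population standard deviation of 1.0 purely for illustration; the function names are hypothetical. It shows that quadrupling n only halves the standard error, and that the gain from one extra respondent shrinks as n grows.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for population SD `sigma`."""
    return sigma / math.sqrt(n)


def gain_from_one_more(sigma: float, n: int) -> float:
    """Reduction in standard error from increasing n to n + 1."""
    return standard_error(sigma, n) - standard_error(sigma, n + 1)


if __name__ == "__main__":
    sigma = 1.0  # hypothetical population SD
    for n in (100, 200, 400):
        print(n, round(standard_error(sigma, n), 4), f"{gain_from_one_more(sigma, n):.2e}")
```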
Tull and Hawkins (1987, p. 396) identify at least six methods of determining sample size:
(1) Unaided judgement: this arbitrary approach to arriving at a sample size takes into account neither the precision of the sample results nor the cost of generating them.
(2) All-you-can-afford: this method focuses on the cost of obtaining the information without considering its value.
(3) Average size of samples in similar studies: this method determines sample size by referring to the sample sizes of other, similar studies.
(4) Required size per cell: this method is appropriate for stratified random and quota sampling techniques. It has been suggested that a sample size of 30 per cell is needed for statistical analyses to be adequately conducted. The present study, which comprises 12 sample cells, therefore requires at least 360 respondents overall.
(5) Use of a traditional statistical model: this method incorporates three variables: an estimate of the variance in the population from which the sample is to be drawn, the sampling error the researcher will allow, and the desired level of confidence.
(6) Use of a Bayesian statistical model: this method involves finding, for each potential sample size, the difference between the expected value of the information provided by the sample and the cost of taking the sample.
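Methods (4) and (5) reduce to simple arithmetic. The sketch below checks the per-cell figure quoted above (30 respondents x 12 cells) and uses the usual textbook formula for method (5) when estimating a mean, n = (zσ/E)², where σ is the estimated population standard deviation, E the allowable sampling error and z the normal value for the chosen confidence level. The function names are hypothetical, and the σ, E and 95% z-value in the example are illustrative assumptions, not values used in this study.

```python
import math


def size_per_cell(n_cells: int, per_cell: int = 30) -> int:
    """Method (4): minimum total sample when every cell needs `per_cell` cases."""
    return n_cells * per_cell


def size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Method (5): n = (z * sigma / margin) ** 2, rounded up.

    sigma  -- estimated population standard deviation
    margin -- allowable sampling error (same units as sigma)
    z      -- standard normal value for the desired confidence (1.96 ~ 95%)
    """
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    print(size_per_cell(12))        # 6 age groups x 2 genders -> 360
    print(size_for_mean(1.0, 0.1))  # hypothetical sigma and margin -> 385
```

Halving the allowable error roughly quadruples the required sample, which is the same square-root relationship behind the standard error.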
However, Luck and Rubin (1987) note that another method widely used by researchers in making sample size decisions is based on the intended data analysis, and is thus influenced by the a priori requirements or constraints of the mathematical technique employed for measuring statistical relationships. Generally, the more sophisticated the data analysis, the larger the sample size needed (Luck and Rubin 1987). Silver (1997) postulates that researchers need to think clearly about the types of analysis to be undertaken when considering questions of sample size. The sample size requirement for this study is therefore based on the data analysis technique adopted, structural equation modelling (SEM).
The question of sample size adequacy has long been debated and is a major concern in the application of structural equation modelling, because sample size plays a crucial role in obtaining stable, meaningful estimates and interpretations of results (Hair et al. 1998). Although several scholars have pointed out that sample size is crucial for this technique, no precise sample size guideline has been stipulated. Nevertheless, one rule of thumb recommends a minimum of five observations for each estimated parameter (Hair et al. 1998). If the observation/parameter ratio is less than 5:1, the statistical stability of the results may be doubtful (Baumgartner and Homburg 1996). This rule implies that models with more parameters require larger samples (Kline 1998). In addition, Hair et al. (1998, p. 637) consider a sample size of at least 200 and not exceeding 400 to be adequate. They further observe that when the sample becomes larger (exceeding 400 to 500), the SEM analysis becomes too sensitive: almost any difference is detected, so that goodness-of-fit measures indicate poor fit (Hair et al. 1998, p. 637). In line with these suggestions, the target number of usable responses for this study was set at 400. This sample size is considered appropriate for this study, whose model involves a large number of parameters. The fourth stage, the scale purification procedure adopted in this study, is described next.
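The two rules of thumb above can be expressed as simple checks. In this sketch the function names are hypothetical, and so is the parameter count in the example, since the number of free parameters in the study's structural model is not stated in this section.

```python
def max_parameters(n_obs: int, ratio: int = 5) -> int:
    """Largest number of free parameters satisfying n_obs / params >= ratio."""
    return n_obs // ratio


def meets_ratio(n_obs: int, n_params: int, ratio: float = 5.0) -> bool:
    """Observation/parameter rule of thumb (at least `ratio`:1)."""
    return n_obs / n_params >= ratio


def in_recommended_range(n_obs: int, low: int = 200, high: int = 400) -> bool:
    """Sample-size band suggested by Hair et al. (1998, p. 637)."""
    return low <= n_obs <= high


if __name__ == "__main__":
    n = 400
    print(max_parameters(n))        # 80
    print(meets_ratio(n, 75))       # hypothetical model with 75 free parameters
    print(in_recommended_range(n))
```

With 400 usable responses, the 5:1 rule allows at most 80 free parameters, which gives a concrete sense of what "a large number of parameters" can mean for this sample.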
5.6.4
Scale Purification
The purpose of this stage was to purify the original item pool generated in the previous stages. Items performing poorly in terms of item-to-total correlation or violating the predicted factor structure were to be investigated and discarded. Item analysis and exploratory factor analysis were employed to assess item performance and clarify the scales (Churchill 1979).
5.6.4.1
Internal Consistency
The use of item-total correlations in constructing unidimensional scales has long been recommended (Churchill 1979; Nunnally 1978). Nunnally (1978) strongly advocated that researchers ascertain item unidimensionality: items within a construct are useful only when they share a common core (the domain to be measured). Accordingly, items performing poorly in terms of item-to-total correlation are to be investigated and removed. The key criterion for discarding items is how each item correlates with the other items in the measurement scale. If all the items in the measure are derived from the domain of a single construct, they should be reasonably strongly associated. Conversely, if an item is not correlated with the others, it can be assumed not to fit the scale; it would introduce error and produce an unreliable measure, and should therefore be eliminated (Churchill 1979). Steenkamp and van Trijp (1991) advocated threshold values for item-total correlations ranging from 0.30 to 0.60.
Based on these suggestions, none of the items was deleted from the measurement, because the correlation analysis revealed that all the items were significantly correlated at the 0.01 level (2-tailed), with one exception, which was significant at the 0.05 level (2-tailed) (see Appendix 5.2a). Furthermore, the item-to-total correlations ranged from 0.35 to 0.67, which indicates reasonably good item performance. The item-total correlations, the resulting alpha if each item is deleted and the coefficient alpha of the 34 performance items are reported in Appendix 5.2b.
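As an illustration of the item-total screening described above, the sketch below computes corrected item-total correlations (each item against the sum of the remaining items, the variant most statistics packages report) and flags items below the 0.30 lower bound cited from Steenkamp and van Trijp (1991). It assumes NumPy is available; the function names and synthetic data are hypothetical, and whether Appendix 5.2b reports corrected or uncorrected correlations is not stated here.

```python
import numpy as np


def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlate each item (column) with the sum of all *other* items.

    `items` is a respondents x items array of scale scores.
    """
    x = np.asarray(items, dtype=float)
    total = x.sum(axis=1)
    r = np.empty(x.shape[1])
    for j in range(x.shape[1]):
        rest = total - x[:, j]  # exclude the item itself from the total
        r[j] = np.corrcoef(x[:, j], rest)[0, 1]
    return r


def flag_weak_items(items: np.ndarray, threshold: float = 0.30) -> list:
    """Indices of items whose corrected item-total correlation is below threshold."""
    return [j for j, r in enumerate(corrected_item_total(items)) if r < threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 1))
    good = latent + 0.5 * rng.normal(size=(500, 4))  # four items sharing a core
    noise = rng.normal(size=(500, 1))                # one unrelated item
    data = np.hstack([good, noise])
    print(np.round(corrected_item_total(data), 2))
    print("weak items:", flag_weak_items(data))
```

Excluding the item from the total avoids inflating its correlation with a total that partly contains itself, which matters most for short scales.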
Churchill (1979) strongly suggested that coefficient alpha should be the first analysis conducted in the scale purification process when assessing the quality of a measure, because coefficient alpha can indicate whether the items in the scale have successfully captured the domain of the construct. Coefficient alpha is the recommended measure of internal consistency for a set of items. A low coefficient alpha indicates that some items do not share the common core; the poorly performing items can then be identified and discarded before exploratory factor analysis is performed. Churchill (1979) further points out that internal consistency should be assessed before factor analysis because, if ineffective items are not detected and discarded early, they may confound the exploratory factor analysis and consequently produce a meaningless factor structure.
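For reference, the standard formula for coefficient alpha on a scale of k items (k = 34 for the performance scale) is given below, where σ²_{Y_i} is the variance of item i and σ²_X is the variance of the summed scale score. Since σ²_X equals the sum of the item variances plus twice the sum of the inter-item covariances, an item that does not share the common core adds variance without adding covariance, which lowers alpha.

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\qquad X = \sum_{i=1}^{k} Y_i
```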
Conversely, it has been strongly argued that heavy reliance on internal consistency in the early stage of measure evaluation could whitewash a complex factor structure by discarding items that seem weak but could potentially contribute a second dimension; as a consequence, a conceptually important factor may be eliminated because it fails to contribute to internal consistency at an acceptable level (Flynn and Pearcy 2001, p. 414). Hence it was suggested that researchers conduct exploratory factor analysis at the beginning of the scale development process and compare its results with those of the internal consistency analysis. Similarly, Flynn and Goldsmith (1999) advocate performing exploratory factor analysis and internal consistency computations together in order to reach the best decision about item deletion. The present study follows Churchill's (1979) suggestion to compute internal consistency first, then performs exploratory factor analysis, compares the results and only then decides which item(s) to delete from the measure (Flynn and Goldsmith 1999). This procedure can support the researcher's decisions in the item purification process and help generate a high-quality measure.
It should be noted that there are no absolute guidelines on the acceptable level of Cronbach's alpha. Nunnally (1967, p. 226) suggests that for early stages of basic research, reliabilities of 0.5 to 0.6 are adequate, whereas in applied research settings a reliability of 0.9 is the minimum that should be accepted and 0.95 is the desirable standard. Gerbing and Anderson (1988), on the other hand, consider alpha scores above 0.7 reliable. The coefficient alpha (α) for all 34 performance scale items was 0.94, which indicates a high degree of internal consistency (see Appendix 5.2b). In essence, the preliminary test demonstrates that the performance scale is highly reliable, and this favourable result provided substantial support for proceeding to the next level of analysis, exploratory factor analysis, as strongly recommended by Churchill (1979).
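A minimal NumPy sketch of coefficient alpha, α = k/(k-1) x (1 - sum of item variances / variance of the total score), together with the "alpha if item deleted" statistic reported in Appendix 5.2b. The function names and synthetic data are hypothetical; the sketch uses sample variances (ddof=1), and the comparison with 0.7 reflects the Gerbing and Anderson (1988) guideline cited above.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a respondents x items array."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


def alpha_if_deleted(items: np.ndarray) -> np.ndarray:
    """Alpha recomputed with each item removed in turn."""
    x = np.asarray(items, dtype=float)
    return np.array([cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(500, 1))
    data = np.hstack([latent + 0.5 * rng.normal(size=(500, 4)),  # coherent items
                      rng.normal(size=(500, 1))])                # unrelated item
    a = cronbach_alpha(data)
    print(f"alpha = {a:.2f}", "(above 0.7)" if a > 0.7 else "(below 0.7)")
    print(np.round(alpha_if_deleted(data), 2))
```

An item whose "alpha if deleted" exceeds the overall alpha (here the unrelated last item) is the kind of candidate that would be examined further before deletion.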
It was argued that even though coefficient alpha is important in the assessment of measure reliability, it does not delineate the dimensionality of the measure (Gerbing and Anderson 1988). Therefore, exploratory factor analysis is essential in order to ascertain the dimensionality of constructs. The results presented in Table 5.9(b) show the item-total correlation of each item and the overall reliability score of the performance scale; however, this table cannot indicate how the items group together into dimensions that form subscales or underlying factors. For this reason, the pool of items was subjected to exploratory factor analysis in order to delineate its underlying dimensions (Churchill 1979; Gerbing and Anderson 1988).
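The item-total correlations referred to above can be sketched in the same way. This version computes the corrected item-total correlation, correlating each item with the sum of the remaining items, which is the form SPSS reports as "Corrected Item-Total Correlation"; whether the thesis used the corrected or uncorrected form is an assumption, and the data are hypothetical. A low or negative value flags an item that shares little variance with the rest of the scale, but, as argued above, it does not show how the items group into dimensions.

```python
import numpy as np


def corrected_item_total(scores):
    """Correlate each item with the sum of all other items in the scale."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    correlations = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]  # scale score with item j removed
        correlations.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return np.array(correlations)


# Hypothetical responses; the fourth item does not fit with the other three
responses = np.array([
    [4, 5, 4, 2],
    [3, 3, 4, 5],
    [5, 5, 5, 3],
    [2, 3, 2, 4],
    [4, 4, 5, 1],
    [3, 2, 3, 3],
])
print(np.round(corrected_item_total(responses), 2))
```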
5.6.4.2
Exploratory factor analysis is the most commonly used analytical technique for reducing a large item pool to a more manageable set. In addition, it is recognised as a valuable preliminary analysis when there is insufficient theory to establish the underlying dimensions of a specific construct (Gerbing and Anderson 1988). Factor analysis is defined as a multivariate statistical technique that analyses data on a relatively large set of variables and produces a smaller set of factors, which are linear combinations of the original variables, so that the set of factors captures as much information as possible from the data set (Parasuraman 1991, p. 757).
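The idea of factors as linear combinations of the original variables can be made concrete with a short sketch of principal components extraction, the extraction technique used in this study. Assumptions: the items are standardised (so the analysis works on the correlation matrix), loadings are eigenvectors scaled by the square root of their eigenvalues, and the simulated two-factor data are purely illustrative.

```python
import numpy as np


def principal_components(scores, n_factors):
    """Eigenvalues of the item correlation matrix and unrotated component loadings."""
    R = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # loading of item i on component j = eigenvector entry * sqrt(eigenvalue j)
    loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])
    return eigenvalues, loadings


# Simulated data: six items driven by two hypothetical latent factors
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
noise = 0.6 * rng.normal(size=(300, 6))
items = np.repeat(latent, 3, axis=1) + noise  # items 1-3 follow factor 1, items 4-6 factor 2

eigenvalues, loadings = principal_components(items, n_factors=2)
print(np.round(eigenvalues, 2))
print(np.round(loadings, 2))
```

The squared loadings in each column sum to that component's eigenvalue, and all eigenvalues together sum to the number of items.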
The main purpose of applying factor analysis is to determine the most suitable items for each construct from a list of items whose internal consistency has already been analysed. The analysis groups the data set into a number of new categories, known as factors (Luck and Rubin 1987; Sharma 1996). Several techniques are available for extracting factors, and the most widely adopted is principal components analysis (Luck and Rubin 1987); this technique was used in this study. Exploratory factor analysis was performed on 34 items in order to explore the underlying dimensionality of the scale that captures customer satisfaction judgements with regard to the consumption experience via the direct sales channel. Accordingly, three important decisions were considered when utilising factor analysis:
(1) An orthogonal rotation (varimax) was employed because it facilitates interpretation of the underlying structure of the data (Hair et al. 1998). For completeness, an oblique rotation was also applied to the data set, but its results were unclear and the underlying factors were difficult to ascertain. Moreover, most applied customer satisfaction research has used orthogonal rotations (Allen and Rao 2000). Hence varimax rotation was adopted as the rotational method.
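(2) The most popular guidelines advanced for deciding how many factors to retain are the eigenvalue-greater-than-one rule and examination of the scree plot (Churchill 1991; Hair et al. 1998; Sharma 1996). The eigenvalue rule retains only factors whose eigenvalue exceeds 1.0, that is, factors that account for more variance than a single standardised item. The scree test plots the eigenvalues against the factor number; the point at which the curve begins to level off indicates the optimum number of factors that can be extracted from the data set.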
(3) The minimal level of item loadings and maximal level of item cross-loadings:
In this study, items that had high loadings (greater than 0.5) on a single factor and no cross-loadings greater than 0.3 on other factors (Rentz et al. 2002) were retained for further analysis. There is no commonly accepted standard as to which factor loading values can be regarded as low or high. For example, Gerbing and Anderson (1988) suggest that items with a factor loading of at least 0.4 that do not cross-load as highly on other factors are acceptable, whereas Churchill (1991) typically considered a cut-off of 0.3 to 0.35 sufficient. Hair et al. (1998, p. 111) note a frequently used rule of thumb: factor loadings greater than 0.30 meet the minimal level, loadings of 0.40 are considered more important, and loadings of 0.50 or greater are considered practically significant. Indeed, according to Tabachnick and Fidell (2001), the choice of cut-off level for factor loadings is a matter of researcher preference.
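Decision (1) above applied varimax rotation. The sketch below is one way to implement it, using Kaiser's original pair-by-pair angle formula together with Kaiser normalisation, the combination SPSS labels "Varimax with Kaiser Normalization". Many libraries instead use an SVD-based iteration that maximises the same criterion; this is an illustration, not the routine used in the thesis, and the loading matrix is hypothetical. Rotation leaves each item's communality unchanged and only redistributes variance across factors towards simple structure, which is what makes the loading and cross-loading cut-offs in decision (3) meaningful.

```python
import math

import numpy as np


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Orthogonal varimax rotation (Kaiser's pairwise method) with Kaiser normalisation."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    h = np.sqrt((L ** 2).sum(axis=1))    # square roots of the communalities
    A = L / h[:, None]                   # Kaiser normalisation: rows scaled to unit length
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = A[:, i].copy(), A[:, j].copy()
                u, v = x ** 2 - y ** 2, 2 * x * y
                num = 2 * (u * v).sum() - 2 * u.sum() * v.sum() / p
                den = (u ** 2 - v ** 2).sum() - (u.sum() ** 2 - v.sum() ** 2) / p
                phi = 0.25 * math.atan2(num, den)   # angle that maximises varimax for this pair
                c, s = math.cos(phi), math.sin(phi)
                A[:, i] = c * x + s * y
                A[:, j] = -s * x + c * y
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:                     # no pair needs further rotation
            break
    return A * h[:, None]                           # undo the normalisation


# Hypothetical simple-structure loadings, deliberately rotated 30 degrees away
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.75, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.75]])
theta = np.deg2rad(30)
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
unrotated = simple @ turn

rotated = varimax(unrotated)
print(np.round(rotated, 3))
```

In this example the rotation recovers the clean pattern in which each item loads on one factor only.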
5.6.4.2.1
Initially, to assess the structure of the performance measure, all 34 items were factor analysed; the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy was 0.914. Sharma (1996) postulates that the KMO statistic should be greater than 0.8, but that a value of 0.6 is tolerable. In addition, Bartlett's test of sphericity, which tests whether the variables are correlated (that is, whether the correlation matrix differs from an identity matrix), was statistically significant. Together, these results indicate that the data were suitable for factor analysis. A seven-factor solution was initially extracted, which accounted for 64.5% of the total variance (see Appendix 5.3). However, this initial purification exercise resulted in the deletion of ten items that failed to meet the criteria set out above.
Exploratory factor analysis was performed again on the remaining 24 performance items, and item-to-total correlations and Cronbach's alpha were computed for each factor extracted from this second run (Churchill 1991; Hair et al. 1998). Five factors were extracted, explaining 64.9% of the total variance. All items showed factor loadings greater than 0.55, and each factor yielded a reliability coefficient (Cronbach's alpha) between 0.73 and 0.90, above the recommended threshold of 0.70 (Nunnally 1978). Table 5.9 (a) exhibits the final item factor loadings, the variance explained by each factor and the total variance explained. The exploratory factor analysis used principal components extraction with varimax rotation, which converged in 7 iterations. The first factor was labelled Direct Seller Performance, the second Corporate Image, the third Product Quality, the fourth Product Offerings and Information, and the fifth Corporate Customer Service. All items loaded appropriately on the expected dimensions except those of Direct Seller Performance, which had been expected to produce at least two sub-factors. Table 5.9 (b) reports the item-total correlations and Cronbach's alpha coefficients for the constructs, based on the items retained after exploratory factor analysis.
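The percentages of variance reported here follow directly from the eigenvalues: for a correlation matrix the eigenvalues sum to the number of items, so each factor's share is its eigenvalue divided by the number of items. With 24 performance items, for example, an eigenvalue of 9.6 corresponds to roughly 40% of the variance, in line with the 40.24% reported for the first factor. The sketch below, using hypothetical eigenvalues, applies the eigenvalue-greater-than-one rule and accumulates the variance explained; plotting the sorted eigenvalues against factor number would give the scree plot.

```python
import numpy as np


def variance_summary(eigenvalues):
    """Apply the eigenvalue-greater-than-one rule and report variance explained.

    Expects the full set of eigenvalues of an item correlation matrix, whose
    sum equals the number of items.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    n_items = ev.sum()                   # trace of a correlation matrix
    retained = int((ev > 1.0).sum())     # Kaiser criterion
    pct = 100 * ev / n_items
    cumulative = np.cumsum(pct)
    return retained, pct, cumulative


# Hypothetical eigenvalues for a four-item scale (they sum to 4)
retained, pct, cumulative = variance_summary([2.2, 1.1, 0.4, 0.3])
print(retained)                             # number of factors with eigenvalue > 1
print(np.round(pct, 2))
print(round(cumulative[retained - 1], 2))   # total variance explained by retained factors
```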
Table 5.9 (a)
Factors: Direct Seller Performance; Corporate Image; Product Quality; Product Offerings and Information; Corporate Customer Service
Items: Product performance; Product function; Product effective; Product guarantee; Product availability; Product information; Product catalogue; Product innovative & unique; Knowledgeable direct seller; Courteous direct seller; Personal advice; Continuity contact; Availability of direct seller; Professional appearance; Customer interest; Demonstration; Company reputation; Promotion; Corporate information; Company's popularity; Handle complaint; Concern about customer; Services charge; Policy/product return
Factor loadings: .640 .719 .691 .619 .557 .651 .712 .785 .730 .749 .767 .591 .710 .751 .732 .729 .550 .557 .689 .660
Eigenvalues (percentage of variance explained): 9.6 (40.24%); 1.8 (7.60%); 1.3 (5.50%); 1.0 (4.39%)
Total variance explained: 64.9%

Table 5.9 (b)
Item-total correlations and Cronbach's alpha coefficients: 0.67 0.55 0.71 0.56 0.73 0.49 0.61 0.52 0.46 0.90 0.62 0.64 0.63 0.71 0.71 0.74 0.74 0.61 0.81 0.63 0.65 0.57 0.67 0.86 0.70 0.72 0.67 0.71
5.6.4.2.2
Exploratory factor analysis was conducted separately on the items measuring perceived value, perceived equity, purchase decision involvement, relational commitment, overall satisfaction and behavioural intentions. In the first run, which comprised 37 items, eight factors emerged, accounting for 61.6% of the total variance (see Appendix 5.4). However, this initial purification exercise resulted in the deletion of four items with high cross-loadings (greater than 0.30) on multiple factors. The exploratory factor analysis was therefore performed again on the remaining 33 items, and item-total correlations and Cronbach's alpha were computed for each factor extracted from this second run (Pallant 2001). A seven-factor solution was extracted, explaining 61.61% of the total variance, and all items displayed factor loadings greater than 0.54. The Kaiser-Meyer-Olkin value was 0.91, exceeding the recommended cut-off value of 0.8 (Sharma 1996). Additionally, Bartlett's test of sphericity was statistically significant, supporting the factorability of the correlation matrix (Pallant 2001). Table 5.10 (a) summarises the factor loadings of the remaining items, the variance explained by each factor and the total variance explained. The exploratory factor analysis used principal components extraction with varimax rotation, which converged in 6 iterations. The factors extracted from this analysis loaded onto the predicted dimensions, with the exception of the behavioural intentions scale, which had been anticipated to produce at least four sub-factors; after the second run only two sub-factors emerged for this scale. Accordingly, a seven-factor solution was delineated: the first factor was labelled Favourable Behavioural Intentions, the second Perceived Value, the third Relational Commitment, the fourth Purchase Decision Involvement, the fifth Perceived Equity, the sixth Unfavourable Behavioural Intentions and the seventh Overall Satisfaction. Table 5.10 (b) shows the item-total correlations and Cronbach's alpha coefficients for the constructs, based on the items retained after exploratory factor analysis.
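The retention criteria applied in both runs (a loading above 0.5 on one factor and no cross-loading above 0.3, following Rentz et al. 2002) can be expressed as a simple filter over the rotated loading matrix. The item names and loadings below are hypothetical, and comparing loadings by absolute value is an assumption.

```python
import numpy as np


def screen_items(loadings, names, primary_min=0.5, cross_max=0.3):
    """Split items into retained and deleted according to loading / cross-loading cut-offs."""
    L = np.abs(np.asarray(loadings, dtype=float))
    retained, deleted = [], []
    for name, row in zip(names, L):
        ordered = np.sort(row)[::-1]                  # largest loading first
        primary = ordered[0]
        cross = ordered[1] if ordered.size > 1 else 0.0
        if primary > primary_min and cross <= cross_max:
            retained.append(name)
        else:
            deleted.append(name)
    return retained, deleted


# Hypothetical rotated loadings (rows = items, columns = factors)
loadings = [[0.72, 0.10, 0.05],
            [0.45, 0.20, 0.12],   # primary loading too low
            [0.61, 0.38, 0.02],   # cross-loads above 0.3
            [0.08, 0.80, 0.15],
            [0.11, 0.04, 0.66]]
names = ["item_a", "item_b", "item_c", "item_d", "item_e"]
kept, dropped = screen_items(loadings, names)
print("retained:", kept)
print("deleted:", dropped)
```

Because the rule depends on the rotated solution, the analysis is re-run after deletions, as was done here, since removing items changes the loadings of those that remain.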
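Table 5.10 (a)
Factor loadings (33 items): .773 .826 .827 .791 .751 .643 .773 .789 .788 .570 .664 .585 .658 .566 .639 .622 .695 .712 .650 .623 .656 .729 .538 .711 .753 .652 .732 .736 .589 .710 .807 .713 .821
Eigenvalues (percentage of variance explained): 10.26 (31.1%); 1.98 (5.99%); 1.62 (4.91%); 1.48 (4.50%); 1.24 (3.74%); 1.03 (3.11%)
Total variance explained: 61.61%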
Table 5.10(b)
Constructs and retained items:
Purchase Decision Involvement: Involvement 1; Involvement 3; Involvement 4
Perceived Equity: Fairness 1; Fairness 2; Fairness 3
Relational Commitment: Commitment 1; Commitment 2; Commitment 3
Perceived Value: Quality value; Knowledgeable; Enjoyable; Product information; Convenient; Value for money; Time value
Overall Satisfaction: Overall satisfaction 1; Overall satisfaction 2; Overall satisfaction 3
Favourable Behavioural Intentions: Say positive things about direct seller; Encourage friends; Say positive things about company; Cross buy; Repurchase same product from d/s; Continue purchase; Price tolerance 1; Price tolerance 2; Say favourable things about product; Maintain same amount of purchase; Continue as main provider
Unfavourable Behavioural Intentions: Switch to other product; Complain to direct seller; Complain to friends
To summarise, the results of the exploratory factor analysis revealed that items generally loaded on their intended scales, apart from the minor divergences described previously. The factor loadings of each scale of interest achieved acceptable levels (see Tables 5.9a and 5.10a). In addition, Cronbach's alpha, which was used to test the internal consistency of all dimensions extracted from exploratory factor analysis, ranged from 0.73 to 0.90 for the performance scales (see Table 5.9b) and from 0.74 to 0.91 for the other scales (see Table 5.10b). All scales therefore exceeded the minimum acceptable score of 0.70 (Gerbing and Anderson 1988), indicating good internal consistency. It is important to note that although exploratory factor analysis has been suggested as a means of scale validation, there is a strong argument that additional evidence should be collected and analysed (Churchill 1979). It is widely recognised that more rigorous statistical techniques should be employed to confirm and verify the dimensions underlying the factors derived from exploratory factor analysis; a confirmatory factor analysis is therefore strongly recommended after the exploratory phase (Gerbing and Anderson 1988). In accordance with this suggestion, the 24 items of the Performance measure and the 33 items of the Other constructs measure derived from the exploratory phase were used in a confirmatory factor analysis model, estimated using structural equation modelling, to verify their underlying dimensions. The results of this analysis and the internal consistency of each scale are documented in Chapter 7. The next section discusses the final step of the scale development procedure, the assessment of the scales' validity.
5.6.5
Validation of Measures
The final step of measure development is the validation of the measures. This term is used to mean demonstration of the measures' validity and reliability (Olsen 2002). However, Ping (2004) maintains that validation implies demonstrating measure unidimensionality, consistency (i.e. model-to-data fit), reliability and validity. Steenkamp and van Trijp (1991) argue that construct validity is a fundamental condition for theory development and lies at the very heart of scientific advancement in marketing. It would be impossible to test the hypothesised relationships among constructs portrayed in the conceptual model formulated in Chapter 4 (Figure 4.2) without developing measures that fulfil the basic criteria of good measurement, such as validity. This step of the scale development process primarily involves determination of the scales' convergent, discriminant and construct validity and their reliability. It should be noted that the scales' content validity has been addressed indirectly in Steps 1 through 4 of the measurement development process; however, it is discussed again in this section.
It is imperative to ensure that the scales developed for the study measure what they are intended to measure; ideally, a measurement should generate a score that truly represents the characteristic one intends to measure. When a scale measures things other than what the researcher has attempted to measure, measurement error has occurred (Churchill 1991). Churchill distinguishes two types of measurement error: systematic error and random error. Systematic error is sometimes referred to as constant error because it affects the measurement in a constant manner; as a consequence, the observed score is stable but is an erroneous indicator of the characteristic being measured. Random error manifests as a lack of consistency, so that the score fluctuates each time the same object, person or phenomenon is measured. Random error is attributed to transient factors such as a respondent's mood during the administration of the survey instrument or the environmental setting in which the survey was conducted (Churchill 1979).
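Churchill (1979) summarises this distinction as a decomposition of an observed score, in which X_O is the observed score, X_T the true score, X_S the systematic error and X_R the random error:

```latex
X_O = X_T + X_S + X_R
```

A measure is valid when X_O = X_T, that is, when both error terms are zero, and it is perfectly reliable when X_R = 0. A measure can therefore be reliable while still carrying systematic error (X_S not equal to 0), which is why reliability is a necessary but not sufficient condition for validity.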
Validity is closely associated with the accuracy or correctness of the measurement scale. According to Selltiz et al. (1976, p. 169), validity of a measurement scale is the extent to which differences in scores on it reflect true differences among individuals on the characteristic we seek to measure, rather than constant or random errors. Kerlinger (1986) regards validity as an extremely complicated and controversial topic in social and behavioural research, whereas Sekaran (2000) offers a simpler definition: validity is the ability of a scale to measure the concept that it set out to measure. This definition places particular emphasis on what is being measured, or the use to which a measuring instrument is put, rather than on the instrument itself. Simply put, the validity of a measurement depends on the scientific or practical purposes of its user. Hence the researcher is responsible for establishing that the measure precisely captures the phenomenon under study.
As previously stated, a measurement is considered valid when it contains no error, so that the observed score equals the true score. Evidence of validity comprises content validity, convergent validity and discriminant validity. The next section describes these types of validity.
5.6.5.1
Validity Assessment
Validity is concerned with measurement error, specifically systematic rather than random error, which is the main concern of reliability assessment. Hayes (1998) conceptualises validity as the degree to which evidence supports the claim that the observed score represents the phenomenon it is intended to appraise. It was argued that in any construction of theory, an indicator can only be regarded as valid when it captures the domain it purports to estimate. In other words, the assessment of validity is predominantly concerned with the relationship between an indicator and the construct it is supposed to represent (Nunnally and Bernstein 1994). For the purposes of this study, three main types of validity are assessed, namely content, convergent and discriminant validity.
5.6.5.1.1
Content validity
Content validity is concerned with the relevance and representativeness of the scale items in capturing all aspects of the phenomenon investigated in the study, which in this case is the direct sales channel. Churchill (1995) suggests that the key to content validity lies largely in the procedures used to develop the measure. Specifically, by specifying the construct domain, generating an exhaustive list of items and then purifying the resulting measure, content or face validity should be satisfied (Churchill 1979). All these procedures were undertaken in this study and were described in detail in the preceding sections. In addition, the measurement scales were inspected by an experienced marketing research manager from the leading direct sales company in Malaysia, and were pre-tested with experts comprising customers of direct sellers and experienced, successful direct sellers. Their feedback and recommendations served as a valuable means of improving content validity.
5.6.5.1.2
Convergent Validity
In order to ascertain convergent validity, a robust statistical technique, structural equation modelling, will be employed. Anderson and Gerbing (1988) postulate that convergent validity can be assessed from a measurement model by determining whether each indicator's estimated coefficient on its posited underlying construct is statistically significant (greater than twice its standard error). The procedure involves constructing a measurement model in which latent constructs are represented by their indicators. Equivalently, an indicator that demonstrates a significant loading (t-statistic greater than 1.96) on its posited construct provides evidence of convergent validity (Anderson and Gerbing 1988). In line with these guidelines, the confirmatory factor analysis results are presented in Chapter 7; indicators that display significant loadings on their posited constructs provide evidence of convergent validity for the measures used in this study.
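The significance criterion for convergent validity amounts to a critical ratio: an estimated loading divided by its standard error, compared with 1.96, the two-tailed 5% value of the standard normal distribution. A minimal sketch, in which the indicator names, loadings and standard errors are hypothetical rather than results from Chapter 7:

```python
def critical_ratio(estimate, std_error):
    """t-value (critical ratio) of an estimated loading."""
    return estimate / std_error


def convergent_check(estimates, critical=1.96):
    """Map each indicator to (critical ratio, significant at the chosen level?)."""
    results = {}
    for name, (est, se) in estimates.items():
        cr = critical_ratio(est, se)
        results[name] = (cr, abs(cr) > critical)
    return results


# Hypothetical measurement-model output: indicator -> (loading, standard error)
hypothetical = {"commitment_1": (0.85, 0.06),
                "commitment_2": (0.92, 0.07),
                "commitment_3": (0.12, 0.09)}
for indicator, (cr, ok) in convergent_check(hypothetical).items():
    print(f"{indicator}: CR = {cr:.2f}, significant = {ok}")
```

A loading more than about twice its standard error passes, which is the "greater than twice its standard error" rule stated above.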
5.6.5.1.3
Discriminant Validity
Discriminant validity is established when the measure of interest does not correlate too highly with other measures from which it is supposed to differ (Churchill 1995). In other words, discriminant validity can be examined through correlation analysis: low to moderate correlations between constructs imply the existence of discriminant validity. It is the only type of validity estimate that is not enhanced by high reliability, because it is low correlations, rather than high ones, that provide evidence of discriminant validity (Peter and Churchill 1986).
Anderson and Gerbing (1988) suggest two approaches, both employing confirmatory factor analysis. The first is a chi-square difference test, in which the measurement model for each pair of constructs is estimated twice: first with the correlation between the two constructs constrained to unity, and then with the two constructs allowed to correlate freely. A chi-square test then examines whether the chi-square value of the unconstrained two-factor model is significantly lower than that of the constrained model (the difference has one degree of freedom). A statistically significant chi-square difference implies that the two constructs are not perfectly correlated (Anderson and Gerbing 1988), and thus discriminant validity is supported. All constructs examined in this study will be assessed for discriminant validity by repeating this procedure for each pair, and the results for the fifteen constructs incorporated in this study are presented in Chapter 7. The second, complementary approach is to determine whether the confidence interval (± two standard errors) around the correlation estimate between two factors includes 1.0; if it does not, discriminant validity is supported (Anderson and Gerbing 1988). In addition, Fornell and Larcker (1981) proposed a more rigorous test, under which the squared correlation between each pair of factors should be less than the average variance extracted for each construct. In this study, all three tests were employed to assess discriminant validity, and the correlation matrix of constructs and dimensions is also presented in Chapter 7 as further evidence of discriminant validity (Peter and Churchill 1986).
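The first two Anderson and Gerbing (1988) checks can be sketched as follows, assuming the chi-square values of the constrained and unconstrained models, and the correlation estimate with its standard error, have already been obtained from the measurement models; the numbers below are hypothetical, not results from Chapter 7. Because exactly one parameter is constrained, the difference is referred to a chi-square distribution with one degree of freedom, whose upper-tail probability can be written with the standard-library complementary error function.

```python
import math


def chi_square_difference(chi2_constrained, chi2_free):
    """Chi-square difference test for one constrained correlation (df difference = 1)."""
    delta = max(chi2_constrained - chi2_free, 0.0)
    # upper tail of a chi-square variable with 1 df: P(X > d) = erfc(sqrt(d / 2))
    p_value = math.erfc(math.sqrt(delta / 2))
    return delta, p_value


def interval_excludes_one(correlation, std_error):
    """Does the interval correlation +/- 2 standard errors exclude 1.0?"""
    lower, upper = correlation - 2 * std_error, correlation + 2 * std_error
    return not (lower <= 1.0 <= upper)


# Hypothetical results for one pair of constructs
delta, p = chi_square_difference(chi2_constrained=312.4, chi2_free=268.9)
print(f"delta chi-square = {delta:.1f}, p = {p:.3g}")
print(interval_excludes_one(correlation=0.64, std_error=0.05))
```

A significant difference, together with an interval that excludes 1.0, supports treating the two constructs as distinct.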
To summarise, Gerbing and Anderson (1988, p. 186) updated the widely known paradigm for measure development proposed by Churchill (1979) by incorporating confirmatory factor analysis for the assessment of scale unidimensionality. The rationale for this update is that confirmatory factor analysis offers a more stringent assessment of unidimensionality than traditional approaches such as coefficient alpha, item-total correlations and exploratory factor analysis. Coefficient alpha measures reliability, but it does not assess dimensionality (Gerbing and Anderson 1988).
They further argue that exploratory factor analysis cannot provide an explicit assessment of scale unidimensionality, which confirmatory factor analysis can; confirmatory factor analysis is therefore more rigorous, because it takes into account both internal and external consistency. Accordingly, they suggest that item-total correlations and exploratory factor analysis should be performed as preliminary analyses in scale development, and that confirmatory factor analysis should then be employed to confirm and refine the resulting scales. Consequently, they advocate that the reliability of a composite score should be assessed only after the unidimensionality of the scale has been established.
5.6.5.2
Reliability Assessment
The assessment of reliability can be considered part of the testing stage of a newly developed measure (Hinkin 1995). It should be noted that scale reliability was already assessed in the preceding step of the measure development process (see Section 5.6.4), when the scales of interest were purified; Cronbach's alpha and item-total correlations were used to measure internal consistency (Dillon et al. 1990). In ascertaining reliability, the researcher is concerned with how much of the variation in scores is due to inconsistencies in measurement (Peter 1984). In other words, reliability denotes the extent to which a measure is replicable and generates the same or nearly the same result (Allen and Rao 2000). Similarly, it is recognised as the degree to which a measure is free from random error; if random error is substantial, the measure becomes unreliable (Peter and Churchill 1986). In short, a measurement is considered reliable if it does not vary over time (stability) and if a similar measurement procedure employed in a different context yields essentially the same result (equivalence). Peter and Churchill (1986) also observed that higher reliability tends to produce higher correlations, because correlations depend largely on systematic variance; reliability is therefore closely associated with convergent and nomological validity.
Bagozzi (1984) suggests three main approaches to assessing the reliability of a scale. The first is internal consistency, which is assessed by Cronbach's alpha. The second is the test-retest approach, which involves giving the same test to the same respondents after a period of time. However, test-retest reliability is rather costly and impractical to implement, and its main problem is that respondents may remember their earlier answers and deliberately respond to an item in the same manner as in the first administration of the survey instrument. The third approach is the split-half method, which is more complex than the test-retest method and likewise costly and time-consuming. It requires the researcher to split the items of the research instrument into two subsets, and the correlation between the two halves depends greatly on how the items are split.
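The split-half method, and its sensitivity to how the items are divided, can be illustrated in a few lines. The sketch correlates the two half-scale scores and applies the Spearman-Brown correction, 2r/(1 + r), to estimate the reliability of the full-length scale; that correction is standard practice but is an addition not described in the text above, and the data are hypothetical.

```python
import numpy as np


def split_half_reliability(scores, first_half):
    """Spearman-Brown corrected correlation between two halves of a scale."""
    scores = np.asarray(scores, dtype=float)
    in_first = np.zeros(scores.shape[1], dtype=bool)
    in_first[list(first_half)] = True
    half_a = scores[:, in_first].sum(axis=1)
    half_b = scores[:, ~in_first].sum(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)  # step up from half length to full length


# Hypothetical responses on four items
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 2],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
    [3, 2, 3, 3],
])
# Two different splits of the same items can give different estimates
print(round(split_half_reliability(responses, [0, 1]), 3))  # items 1-2 vs 3-4
print(round(split_half_reliability(responses, [0, 2]), 3))  # items 1,3 vs 2,4
```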
Given the impracticality, complexity and relative expense of both the test-retest and split-half approaches, assessing reliability through internal consistency is the more attractive option; not surprisingly, it has been widely used by the academic research community. The scales' reliability was tested using the most commonly used index, coefficient alpha. Generally, scales that achieve alpha scores over 0.7 are considered reliable (Gerbing and Anderson 1988). As mentioned previously, the Cronbach's alpha scores of all the measurement scales are greater than 0.70 (see Tables 5.9 (b) and 5.10 (b)), and the majority of items achieved item-total correlations greater than 0.5, which demonstrates adequate internal consistency (Dillon et al. 1990; Nunnally and Bernstein 1994). However, with the advent of structural equation modelling packages such as LISREL (Jöreskog and Sörbom 1988), EQS (Bentler 1995) and AMOS (Arbuckle 1999), a more stringent criterion has been advocated: the variance extracted by a construct's measure should be equal to or greater than 0.50 (Fornell and Larcker 1981).
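Both the 0.50 variance-extracted criterion and the Fornell and Larcker (1981) discriminant validity test discussed in Section 5.6.5.1.3 can be computed from standardised loadings. With standardised indicators each error variance is 1 - λ², so the average variance extracted reduces to the mean squared loading; the composite (construct) reliability from the same paper is included for completeness. The loadings and correlation below are hypothetical.

```python
import numpy as np


def average_variance_extracted(std_loadings):
    """AVE = sum(lambda^2) / (sum(lambda^2) + sum(error variances)) with standardised loadings."""
    lam = np.asarray(std_loadings, dtype=float)
    return float(np.mean(lam ** 2))


def composite_reliability(std_loadings):
    """(sum lambda)^2 / ((sum lambda)^2 + sum of error variances)."""
    lam = np.asarray(std_loadings, dtype=float)
    errors = 1.0 - lam ** 2
    return float(lam.sum() ** 2 / (lam.sum() ** 2 + errors.sum()))


def fornell_larcker(ave_a, ave_b, correlation):
    """Discriminant validity holds if the squared correlation is below both AVEs."""
    return correlation ** 2 < min(ave_a, ave_b)


# Hypothetical standardised loadings for two constructs
ave_1 = average_variance_extracted([0.82, 0.78, 0.75])
ave_2 = average_variance_extracted([0.70, 0.74, 0.69, 0.72])
print(round(ave_1, 3), round(ave_2, 3))   # both meet the 0.50 criterion here
print(round(composite_reliability([0.82, 0.78, 0.75]), 3))
print(fornell_larcker(ave_1, ave_2, correlation=0.55))
```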
At this point, it is worth noting that reliability is a necessary but not sufficient condition for establishing that the scale developed and subsequently used accurately represents the domain of the construct examined. As Nunnally (1978) acknowledges, reliability does not by itself ensure validity.
5.7
One of the most difficult parts of the research process is the selection of appropriate data analysis methods. The critical question faced by most researchers is: which data analysis technique should be used? Kinnear and Taylor (1991) suggest three basic guidelines for identifying the appropriate statistical technique:
1) How many variables are to be analysed at the same time?
2) Does the researcher want to address descriptive or inferential questions?
3) What level of measurement (nominal, ordinal or interval) is available for the variables of interest?
5.7.1
Generally, analysis techniques fall into three main classifications based on the number of variables analysed: univariate, bivariate and multivariate. Analysing one variable at a time is known as univariate data analysis; investigating the relationship between two variables at a time is bivariate data analysis; and assessing the relationships among more than two variables at a time is multivariate data analysis. The current study employs a combination of selected statistical techniques. Table 5.11 summarises the statistical techniques used for the data analysis, the results of which are documented in Chapters 6 and 7.
Table 5.11
Classification Univariate
Inferential analysis
Bivariate
Chapter 5 Chapter 7
5.7.2
Descriptive statistics is the branch of statistics that summarises the data that have been collected. Typically, it provides estimates of the central tendency (mean), dispersion (standard deviation) and shape (skewness and kurtosis) of a distribution. Preliminary data analysis generally involves examining the response frequencies and other descriptive statistics for the variables included in the study. In the current study, Chapter 6 is devoted to documenting the results of the descriptive analysis.
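The four summary measures named above can be computed as follows. This sketch uses the simple moment-based formulas for skewness and excess kurtosis; packages such as SPSS typically report small-sample adjusted versions, so their values will differ slightly for small samples. The responses are hypothetical.

```python
import numpy as np


def describe(values):
    """Mean, standard deviation, skewness and excess kurtosis of one variable."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                       # sample standard deviation
    dev = x - mean
    m2, m3, m4 = (dev ** 2).mean(), (dev ** 3).mean(), (dev ** 4).mean()
    return {
        "n": x.size,
        "mean": mean,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,          # moment-based skewness
        "kurtosis": m4 / m2 ** 2 - 3.0,      # excess kurtosis: 0 for a normal distribution
    }


# Hypothetical five-point responses to one item
print(describe([4, 5, 3, 4, 4, 2, 5, 4, 3, 4]))
```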
Inferential statistics, on the other hand, permit the researcher to make judgements about the whole population based on the results obtained from samples. They involve a more complicated set of statistical tests, for example the paired-samples t-test employed in this study (see Chapter 6). This test was used in the Importance-Performance Analysis to determine whether the mean values of the two measures (Performance and Importance) differ significantly.
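For one attribute in the Importance-Performance Analysis, the paired-samples t-test reduces to a one-sample test on each respondent's importance minus performance gap. A minimal sketch with hypothetical ratings; it returns only the t statistic and degrees of freedom, whereas `scipy.stats.ttest_rel` would also supply the p-value.

```python
import math

import numpy as np


def paired_t(importance, performance):
    """Paired-samples t statistic and degrees of freedom for importance - performance gaps."""
    gaps = np.asarray(importance, dtype=float) - np.asarray(performance, dtype=float)
    n = gaps.size
    t = gaps.mean() / (gaps.std(ddof=1) / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings of one attribute by five respondents
importance = [5, 4, 5, 4, 5]
performance = [4, 4, 4, 3, 4]
t, df = paired_t(importance, performance)
print(f"t({df}) = {t:.2f}")
```

A positive t indicates that importance exceeds performance on average, that is, the attribute underperforms relative to what customers expect.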
5.7.3
Level of Measurement
It should be noted that all the constructs integrated in the conceptual framework of this study were uniformly measured using five-point semantic differential and Likert-type scales. These scales satisfy the interval measurement approximation required for the data analysis employed in this study (Allen and Rao 2000; Byrne 1994, 2001; Kline 1998; Naumann and Giel 1995). Interval scales permit valid inferences concerning distribution metrics such as means and standard deviations and, most importantly, are appropriate for multivariate statistical analysis (Allen and Rao 2000; Hair et al. 1998). By contrast, data collected on scales with fewer than five points are generally treated as ordinal in most multivariate statistical procedures (Allen and Rao 2000).
5.7.4
The following section focuses on the main statistical analysis technique used specifically for testing the hypotheses formulated for this thesis (see Sections 4.3 and 4.4).
5.7.4.1
Correlation Analysis
Correlation measures the degree to which two variables are associated, so conducting this analysis reveals the strength of association between them. The Pearson product-moment correlation coefficient (r) is widely used to assess this for interval and ratio scales. A value of r = +1.00 denotes a perfect positive linear relationship between the two variables, r = -1.00 denotes a perfect negative linear relationship, and r = 0 indicates no linear relationship between the variables of interest. Note that in this study correlation analysis is not used for hypothesis testing; rather, the correlation matrix illustrates the strength of association among construct dimensions. Additionally, the correlation values can indicate discriminant validity among the constructs incorporated in the structural equation modelling.
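A minimal standard-library sketch of the Pearson coefficient, with hypothetical data chosen to show the three reference values. The last pair illustrates why r = 0 means no linear relationship rather than no relationship at all: the values rise and then fall, yet r is exactly zero.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive linear relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative linear relationship
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))   # related, but not linearly: r = 0
```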
It is important to note that the scale reliability and validity analysis techniques, comprising item-total correlation, Cronbach's alpha coefficient and factor analysis, have been discussed in Sections 5.6.4 and 5.6.5 and are therefore not described in this section. Next, structural equation modelling (hereafter SEM), the main statistical technique employed to test the hypotheses formulated for the study, is described.
5.7.4.2
Building upon the research problems explicitly stated in Chapter 1 (Section 1.3), the purpose of the study is to develop a theoretical model that elucidates the determinants and outcomes of customer satisfaction within the direct sales channel. In this regard, the empirical research conducted for this study aims in particular to examine the relationships among the constructs incorporated in the conceptual framework (see Figure 4.2). The model hypothesises interrelationships among multiple independent and dependent variables. Many scholars (Byrne 1998, 2001; Hair et al. 1998; Tabachnick and Fidell 2001) suggest that when the simultaneous effects of multiple independent and dependent variables are to be examined, structural equation modelling is the best analytical strategy. SEM is widely acknowledged to be a confirmatory analytical tool whose usefulness lies in its ability to estimate the strength of the hypothesised relationships among constructs in a proposed cause-and-effect model (Maruyama 1998). Furthermore, building upon the scale development procedure initiated by Churchill (1979), Gerbing and Anderson (1988, p. 186) propose an updated paradigm, asserting that "confirmatory factor analysis affords a stricter interpretation of unidimensionality than can be provided by more traditional methods such as coefficient alpha, item-total correlations, and exploratory factor analysis and thus generally will provide different conclusions about acceptability of a scale".
They further advocate that exploratory factor analysis is valuable as a preliminary analysis for scale development, but that confirmatory factor analysis is required for further estimation and refinement of the resulting measure. Given the importance of developing a reliable and valid measure, a rigorous statistical tool such as structural equation modelling is highly recommended. Similarly, Howard (1977, p. 289) made a strong case for applying structural equation modelling to the investigation of consumer behaviour phenomena, stating that "Structural modelling sharply highlights the intimate, powerful, mutually reinforcing relationship between theory and measurement".
More recently, this notion has received strong support from Mackenzie (2001, p. 160), who maintains that consumer research could be improved by adopting SEM to estimate and test the complex systems of conceptual relationships often specified by the theories in this field. He laments that "the diffusion of these ideas into the field has been slow, and some of the most powerful capabilities of latent variable SEM are not being used in consumer research", and notes that even though SEM was first introduced to the field more than 20 years ago, only about 6 per cent of the papers published in the Journal of Consumer Research have tested research theories via SEM.
Researchers in the social sciences, and in this case the consumer behaviour discipline in particular, face two endemic problems: the first is the conceptualisation of the construct under investigation (an unobserved variable), and the second is the unavoidable measurement error in the data. SEM is a powerful tool for dealing with both difficulties, because it estimates a simultaneous-equation framework that links unobserved constructs to error-prone manifest indicators (Jedidi et al. 1997).
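Why measurement error matters can be seen in a small simulation. The sketch below is not an SEM estimation; it only illustrates the classical attenuation effect, whereby the correlation between two error-prone indicators is approximately the latent correlation multiplied by the square root of the product of their reliabilities. The latent correlation (0.6), the error variance (0.5) and the sample size are arbitrary assumptions chosen for the illustration.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 20000
true_r = 0.6      # assumed correlation between the two latent constructs
error_var = 0.5   # assumed measurement-error variance of each indicator

# Latent (error-free) scores with unit variance and correlation true_r
xi = [random.gauss(0, 1) for _ in range(n)]
eta = [true_r * a + math.sqrt(1 - true_r ** 2) * random.gauss(0, 1) for a in xi]

# Observed indicators = latent score + random measurement error
x_obs = [a + random.gauss(0, math.sqrt(error_var)) for a in xi]
y_obs = [b + random.gauss(0, math.sqrt(error_var)) for b in eta]

reliability = 1 / (1 + error_var)                   # true variance / observed variance
predicted_r = true_r * math.sqrt(reliability * reliability)

print(f"latent r = {corr(xi, eta):.2f}")
print(f"observed r = {corr(x_obs, y_obs):.2f} (attenuation formula predicts {predicted_r:.2f})")
```

Analyses that work directly with the observed scores inherit this downward bias; latent-variable SEM instead models the error terms explicitly, which is the advantage referred to above.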
On the basis of the above arguments, there is sufficient justification for the current study to use SEM as the main statistical technique for testing its hypotheses: SEM permits a more complete and rigorous test of complex phenomena, such as those investigated in the current study, than the traditional multivariate techniques.
5.7.4.2.1
Structural equation modelling (SEM) is a statistical methodology which has recently become a prominent component of the methodological arsenal of the social sciences (Bollen and Long 1993, p. 1). The term SEM does not refer to a single statistical technique but to a family of related procedures. For this reason, it is also known as covariance structure analysis, latent variable analysis, causal modelling, linear structural relations or LISREL (the name of one of the software packages for SEM). It is widely acknowledged as an analytical tool that improves upon and supersedes other statistical techniques such as multiple regression, path analysis, ANOVA, factor analysis and principal component analysis. Since its introduction more than two decades ago, this major breakthrough in multivariate analysis has gained attention in consumer research, spurred by the development of the first computer program to implement the general procedure (LISREL); the publication of Bagozzi's (1980) book Causal Models in Marketing further contributed to its substantive growth (Mackenzie 2001).
Byrne (2001) notes that SEM is typically characterised as a confirmatory statistical technique that portrays the causal relationships among multiple variables. It has two distinctive features: 1) the causal processes are represented by a series of structural (i.e. regression) equations, and 2) the structural relationships are depicted pictorially to enable clearer visualisation of the proposed theory. The entire system of variables in the hypothesised model can be estimated and tested simultaneously to establish how adequately the model fits the sample data. If the goodness of fit of the hypothesised model is satisfactory, the model is regarded as plausible, and the stipulated interrelationships among its variables are therefore tenable.
At its core, SEM can be conceived as a fusion of path analysis, confirmatory factor analysis and the evaluation of hybrid models that have features of both procedures (Kline 1998). The path-analytic element emphasises the structural relationships between the constructs integrated in the proposed model, whereas the factor-analytic facet focuses on reliability, validity and the quality of the items in representing each measure (Dillon et al. 1997). Alternatively, SEM can be viewed as an amalgam of multiple regression and factor analysis in a single statistical device. It includes one or more linear regression equations that describe how the endogenous constructs are influenced by the exogenous constructs (and by other endogenous constructs); the coefficients of these equations are called path coefficients or regression weights.
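To make the idea of path coefficients as regression weights concrete, the following sketch simulates a small hypothetical recursive model with observed variables only (quality to satisfaction to loyalty, plus a direct quality to loyalty path) and recovers the paths with one ordinary least squares regression per structural equation. This is a simplification for illustration: the variable names and true coefficients are assumptions, and a full SEM with latent variables would estimate all equations simultaneously (for example by maximum likelihood) while modelling measurement error.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical data-generating model (true paths: 0.5, 0.4 and 0.2)
quality = rng.normal(size=n)
satisfaction = 0.5 * quality + rng.normal(scale=0.8, size=n)
loyalty = 0.4 * satisfaction + 0.2 * quality + rng.normal(scale=0.7, size=n)

def ols_paths(y, *predictors):
    """Unstandardised regression weights (path coefficients) for one structural equation."""
    X = np.column_stack([np.ones(len(y)), *predictors])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]  # drop the intercept

b_sat = ols_paths(satisfaction, quality)            # quality -> satisfaction
b_loy = ols_paths(loyalty, satisfaction, quality)   # satisfaction, quality -> loyalty

print("quality -> satisfaction:", b_sat.round(2))
print("satisfaction, quality -> loyalty:", b_loy.round(2))
# Indirect effect of quality on loyalty via satisfaction = product of the two paths
print("indirect effect:", round(b_sat[0] * b_loy[0], 2))
```

Because satisfaction is both an outcome (of quality) and a predictor (of loyalty), no single multiple regression can represent the whole model, which is the limitation of multiple regression that SEM removes.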
SEM differs significantly from older-generation multivariate techniques such as multiple regression and exploratory factor analysis in several respects (Bollen and Long 1993; Fornell 1982): 1) Multiple regression does not allow the researcher to explore interrelationships among dependent variables, since only a single relationship between dependent and independent variables can be examined at any one time. Even though statistical techniques such as multivariate analysis of variance (MANOVA) and canonical correlation accommodate multiple dependent variables, they too permit only a single relationship between the dependent and independent variables to be examined at any one time (Hair et al. 1998); 2) SEM is an a priori technique and thus requires researchers to think in terms of a model, specifying the variables, the directions of the relationships and the effects among them. It has been argued that a priori does not mean exclusively confirmatory; rather, SEM is widely applied as a blend of exploratory and confirmatory analysis (Jöreskog 1993; Kline 1998); 3) SEM takes account of measurement error, which threatens the validity of research findings, in the estimation procedure. As such, it is viewed as a more powerful method than other traditional multivariate procedures (Byrne 1994; Maruyama 1998; Mackenzie 2001); 4) The system of structural equations comprises both unobserved⁹ (i.e. latent) and observed¹⁰ (i.e. manifest) variables and allows these two types of variable to be represented explicitly, making it possible to test a wide variety of hypotheses. Given all these desirable characteristics, SEM has become an indispensable analytical technique for testing and developing theories (Byrne 1994, 2001).
Two types of variable are distinguished in SEM analyses: latent and observed variables. Latent variables represent theoretical constructs (i.e. abstract concepts) which cannot be observed directly; they are sometimes referred to as factors. Latent variables can in turn be exogenous or endogenous. An exogenous variable is an independent latent variable which acts as a predictor of other variables, while endogenous (dependent) variables are determined by other variables within the model (Bollen 1989). Since latent variables are unobservable, they must be measured indirectly. This is accomplished by linking each unobserved variable to one or more observed (manifest) variables, which in the SEM context serve as indicators of the underlying construct they are expected to represent (Byrne 1994). For this reason, the credibility of a study depends critically on the observed variables that are presumed to represent the underlying latent constructs (Byrne 2001).
5.7.4.2.2 Important Issues Related to SEM
Three critical issues must be addressed: sample size considerations, the choice between a one-step and a two-step approach, and the theory-driven approach. Sample size is a very important issue in the application of SEM because it determines whether the data are sufficient to estimate the model with the given number of parameters (Baumgartner and Homburg 1996). Many measurement indices in SEM are directly or indirectly related to sample size, as are significance testing of parameter estimates, model misspecification, model complexity and the estimation procedure (Hair et al. 1998). Despite the widely recognised importance of sample size, no definite or absolute sample size rule has been stipulated by previous scholars. However, Hair et al. (1998) posit that 200 is the critical sample size. Sample size determination has been discussed previously in Section 5.6.3.2. Following examination of the related literature, and given the complexity of the conceptual model proposed in this study, a sample size of 400 was deemed appropriate.
⁹ The terms latent, unobserved and unmeasured variable are used synonymously to represent a hypothetical construct or factor.
¹⁰ The terms observed, manifest and measured variable are used interchangeably.
In the SEM literature, whether a one-step or a two-step approach is more appropriate has been hotly debated. The two-step approach first assesses the validity of the measurement model; once this has been established, the researcher proceeds to the second step, the estimation of the overall structural model (Anderson and Gerbing 1988). The one-step approach estimates the measurement and structural models simultaneously (Hair et al. 1998). This approach is considered appropriate when the model has a strong theoretical rationale and the measures used in the study are highly reliable (Hair et al. 1998). Although the one-step procedure has its supporters (Fornell and Yi 1992; Kumar and Dillon 1987), the majority of SEM researchers prefer the two-step approach (Anderson and Gerbing 1992; Byrne 2001; Koufteros 1999), arguing that it is difficult to achieve a good model fit in a single step.
In the application of SEM, theoretical justification is required for the specification of dependence relationships, modifications of parameters and other facets of the model. Hair et al. (1998, p. 624) describe theory as "a systematic set of relationships that provides a consistent and comprehensive explanation of a phenomenon". In this regard, theory need not be derived strictly from academic research; it can also stem from experience and practice gained through observation of actual behaviour.
5.7.4.2.3
"The true value of SEM comes from the benefits of using the structural and measurement models simultaneously, each playing distinct roles in the overall analysis" (Hair et al. 1998, p. 626). To ensure that the measurement and structural models are precisely specified and the results are valid, researchers have proposed several stages and steps in the SEM process. For example, Hair et al. (1998) suggest a seven-step process, Kline (1998) a six-step procedure, and Bollen and Long (1993) describe the SEM process in five stages. A review of the SEM literature suggests that the process proposed by Hair et al. (1998) is well structured and easy to comprehend; those new to this statistical technique in particular will find its step-by-step explanation a user-friendly SEM road map. Hair et al.'s (1998) seven-step structure is therefore used to describe the SEM analysis process (see Figure 5.6).
Figure 5.6 The seven-step SEM process (adapted from Hair et al. 1998) [figure not reproduced; recoverable labels: Step 4, choose the input matrix type (correlations/covariances) and estimate the proposed model (sample size, method of model estimation); Step 6, assess the assumptions of SEM, identify and correct offending estimates, and evaluate overall model fit, measurement/structural model fit and competing models; Step 7, interpret and modify the model (examine standardised residuals, consider modification indices, identify potential model changes), adopting modifications only if theoretical justification can be found, otherwise accepting the final model]
Step 1: Developing a Theoretically Based Model
As stated earlier, SEM is based on causal relationships among constructs which must be firmly grounded in theoretical rationale. The researcher should not assume that this analytical tool can prove causation without an underlying theory to support it; for this reason, SEM should not be used for purely exploratory research. In addition, in developing a theoretically based model, researchers must strike a careful balance, weighing the inclusion of all pertinent variables against the practical limitations of SEM. Researchers should not omit a concept solely because the model is becoming complicated, although parsimonious and concise theoretical models are recommended where possible. This consideration is particularly relevant to the present study, whose model incorporates fifteen constructs. Since the research model is based on the consumption system approach, which includes three subsystems, it inevitably becomes complicated, yet it is believed to represent actual consumer consumption decisions coherently (Howard 1977; Jedidi et al. 1997; Mackenzie 2001).
Step 2: Constructing a Path Diagram of Causal Relationships
A path diagram is constructed by depicting the constructs and connecting them with arrows. A straight arrow denotes a direct causal relationship, whereas a curved line indicates a correlation between constructs (see Figure 4.2, which illustrates the present study's conceptual model). Path diagrams rest on two assumptions: 1) all causal relationships portrayed in the diagram represent the theory postulated by the researcher, so that both the inclusion and the omission of relationships, and every causal path and correlation, can be theoretically justified; and 2) as in other multivariate techniques, the causal paths among constructs are assumed to be linear.
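The correspondence between straight arrows and linear equations can be sketched in a few lines of Python. The example below uses a hypothetical three-arrow diagram (the construct names are illustrative and are not those of the present study) and writes one linear equation, with an error term, for every variable that receives at least one arrow. Curved correlation arrows between exogenous variables do not generate equations; they correspond to covariances to be estimated and are omitted here.

```python
# Hypothetical path diagram: each straight arrow is a (cause, effect) pair.
# Construct names are illustrative only, not the present study's model.
ARROWS = [
    ("Quality", "Satisfaction"),
    ("Value", "Satisfaction"),
    ("Satisfaction", "Loyalty"),
]

def to_structural_equations(arrows):
    """Write one linear equation per endogenous variable (any variable an arrow points to)."""
    causes = {}
    for cause, effect in arrows:
        causes.setdefault(effect, []).append(cause)
    equations = []
    for i, (effect, preds) in enumerate(causes.items(), start=1):
        # b{i}_{j} is the path coefficient for predictor j in equation i
        terms = " + ".join(f"b{i}_{j}*{p}" for j, p in enumerate(preds, start=1))
        equations.append(f"{effect} = {terms} + e_{effect}")
    return equations

for eq in to_structural_equations(ARROWS):
    print(eq)
# Satisfaction = b1_1*Quality + b1_2*Value + e_Satisfaction
# Loyalty = b2_1*Satisfaction + e_Loyalty
```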
Step 3:
Converting the Path Diagram into a Set of Structural Equations and Specifying the Measurement Model
Once the conceptual model, depicted in a path diagram, has been constructed, the next step is to specify the model. Specifying the model means expressing the causal relationships among variables in the hypothesised model as a series of equations. The computer software package used to execute SEM for the current study (AMOS) automatically translates the symbols of the diagram into a series of equations. These equations define the model's parameters, which correspond to the stipulated relationships among observed or latent variables and which the program eventually estimates from the sample data. Generally, an SEM model can be decomposed into two sub-models: the measurement model and the structural model. The measurement model specifies the interrelationships between the observed and unobserved variables and is tested by confirmatory factor analysis (CFA; see Figure 7.1). The structural model, on the other hand, portrays the relationships among the unobserved (latent) variables; in other words, it explicitly defines the direct and indirect causal effects among the latent variables. Drawing upon the results of the exploratory factor analyses of the Performance measures (Table 5.9(a)) and the Other measures (Table 5.10(a)), the items were specified in the relevant measurement models. The measurement model was specified by transferring the latent variables derived from exploratory factor analysis, together with their items (indicators), to a confirmatory mode; that is, the researcher specifies which indicators define each construct (factor). Once the model has been completely specified, the next step is to select the type of input matrix (covariance or correlation) to be used for model estimation.
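For reference, the two sub-models are commonly written in the LISREL matrix notation of the SEM literature (e.g. Bollen 1989). This is a generic statement of the model family, not the specific equations of the present study, and AMOS works with path diagrams rather than displaying these matrices. Here x and y are the observed indicators of the exogenous constructs ξ and the endogenous constructs η, Λ contains the factor loadings, δ and ε are measurement errors, B and Γ hold the structural paths among endogenous constructs and from exogenous to endogenous constructs, ζ are the structural disturbances, Φ is the covariance matrix of ξ and Θ_δ that of δ (assumed uncorrelated with ξ).

```latex
\begin{aligned}
&\text{Measurement model:} && \mathbf{x} = \Lambda_x\,\boldsymbol{\xi} + \boldsymbol{\delta}, \qquad \mathbf{y} = \Lambda_y\,\boldsymbol{\eta} + \boldsymbol{\varepsilon}\\
&\text{Structural model:} && \boldsymbol{\eta} = B\,\boldsymbol{\eta} + \Gamma\,\boldsymbol{\xi} + \boldsymbol{\zeta}\\
&\text{Implied covariance of } \mathbf{x} \text{ (CFA):} && \Sigma_{xx}(\theta) = \Lambda_x\,\Phi\,\Lambda_x^{\top} + \Theta_\delta
\end{aligned}
```

Estimation then chooses the free parameters θ so that the model-implied matrix Σ(θ) reproduces the sample covariance matrix as closely as possible.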
Step 4:
Choosing the Input Matrix Type and Estimating the Proposed Model
SEM uses only the covariance or correlation matrix as its input data, because its focus is not on individual observations but on the pattern of relationships across respondents (Hair et al. 1998). If the researcher aims to test theory, the covariance matrix should be used; if only the pattern of relationships is being investigated, the correlation matrix is sufficient. The correlation matrix has been widely employed because its results are easier to interpret than those generated from the covariance matrix. However, the preferred mode of analysis usually uses raw data input, for example imported from an SPSS data file into the SEM program, which then computes the covariances as part of its analysis. Sample size must also be taken into account, because it plays a critical role in the estimation and interpretation of SEM findings. The arguments pertaining to sample size determination have been discussed in Sections 5.6.3.2 and 5.7.4.2.2.
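The relationship between the two input matrices can be shown with NumPy. The sketch below computes both matrices from a small set of hypothetical raw item scores (rows are respondents, columns are items) and confirms that the correlation matrix is simply the covariance matrix rescaled by the item standard deviations. This rescaling discards the original measurement scales, which is one reason covariances are preferred when testing theory.

```python
import numpy as np

# Hypothetical raw data: rows = respondents, columns = three questionnaire items
data = np.array([
    [5, 4, 6],
    [3, 3, 4],
    [6, 5, 7],
    [4, 4, 5],
    [2, 3, 3],
], dtype=float)

cov = np.cov(data, rowvar=False)        # sample covariance matrix (n - 1 denominator)
corr = np.corrcoef(data, rowvar=False)  # correlation matrix

# Correlation = covariance divided by the product of the two standard deviations
sd = np.sqrt(np.diag(cov))
print(np.allclose(corr, cov / np.outer(sd, sd)))  # True
print(cov.round(2))
print(corr.round(2))
```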
After the structural and measurement models have been specified and the input data type chosen, the computer program to be used for estimation must be determined. Many alternative programs are available for conducting SEM, including LISREL (Linear Structural Relations; Jöreskog and Sörbom 1993), EQS (Equations; Bentler 1995) and AMOS (Analysis of Moment Structures; Arbuckle 1997), among others. SEM was once seen as a cumbersome, esoteric method that was difficult to learn and use. This is no longer the case, because most of these programs are now available in versions that run on personal computers with sophisticated graphical user interfaces (Mackenzie 2001). Indeed, much of SEM's widespread application stems from the ease with which these programs allow non-statisticians to solve estimation and hypothesis-testing problems that would once have required the services of a specialist.
An attractive feature of AMOS is that it was developed for the Microsoft Windows interface, and it has gained increasing acceptance as an add-on to the SPSS statistical package (Arbuckle 1997). Executing SEM is easy with AMOS; for example, its graphical interface with drag-and-drop drawing features allows rapid model specification. It accepts path diagrams as model specifications and displays parameter estimates graphically on the path diagram. Most importantly, all modelling can be performed within the diagrams, and no programming skills are required. These user-friendly features make it particularly suitable for novice SEM users. For these reasons, this research implemented SEM with AMOS (version 4).
Step 5: Assessing the Identification of the Model
Model identification concerns whether or not there is a unique set of parameters consistent with the data (Byrne 2001); that is, whether the information provided by the data is adequate to enable parameter estimation. A statistical model is identified if the known information available implies that there is one best value for each parameter in the model whose value is not known. To be identified, a model must have at least as many observations as free parameters, where the observations are the distinct variances and covariances among the observed variables (p(p + 1)/2 for p observed variables), not the number of respondents. If a model fails to meet this requirement, attempts to estimate its parameters may not be fruitful.
There are three levels of model identification (Byrne 2001; Schumacker and Lomax 1996): a structural model may be just-identified, over-identified or under-identified. Models with an equal number of parameters and observations are described as just-identified or saturated. Such a model reproduces the data exactly, yielding a trivially perfect fit: the chi-square (χ²) statistic and the degrees of freedom both equal zero, and the model can therefore never be rejected (Byrne 2001; Tabachnick and Fidell 2001). Models that have more parameters than observations are described as under-identified. Models with fewer parameters than observations are referred to as over-identified; because they have positive degrees of freedom, they may not fit the data perfectly, and the discrepancy between model and data can therefore be tested (Kline 1998). None of the models estimated in this research were under-identified.
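The counting rule above can be expressed as a short function. The sketch below is a hypothetical helper, not part of AMOS: it compares the number of distinct sample moments, p(p + 1)/2, with the number of free parameters and labels the result. A non-negative number of degrees of freedom is a necessary but not a sufficient condition for identification, so passing this check does not by itself guarantee that a model is identified.

```python
from typing import Tuple

def identification_status(n_observed: int, n_free_parameters: int) -> Tuple[int, int, str]:
    """Apply the counting (t-) rule: compare distinct sample moments with free parameters."""
    p = n_observed
    moments = p * (p + 1) // 2      # distinct variances and covariances among p observed variables
    df = moments - n_free_parameters
    if df < 0:
        label = "under-identified"
    elif df == 0:
        label = "just-identified (saturated)"
    else:
        label = "over-identified"
    return moments, df, label

# One-factor CFA with 4 indicators: 3 free loadings (one fixed to 1 for scaling),
# 4 error variances and 1 factor variance = 8 free parameters
print(identification_status(4, 8))  # (10, 2, 'over-identified')
```

With three indicators the same one-factor model has 6 moments and 6 free parameters (df = 0, saturated), and with two indicators it has 3 moments and 4 free parameters (df = -1, under-identified).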
Step 6: Evaluating Goodness-of-Fit Criteria
Evaluation of model fit shows how well the a priori model fits the observed data (Kline 1998). This procedure takes into consideration features of the data, the model and the estimation method. There are three broad categories of SEM goodness-of-fit index. The first type, the absolute fit measure, assesses only the overall model fit; examples include the likelihood-ratio χ² statistic, the goodness-of-fit index (GFI), the root mean square residual (RMSR) and the root mean square error of approximation (RMSEA). The second type, the incremental fit measure, compares the proposed model with a baseline model, most often referred to as the null or independence model; it comprises the adjusted goodness-of-fit index (AGFI), the normed fit index (NFI), the incremental fit index (IFI) and the comparative fit index (CFI). The third type, the parsimonious fit measure, includes the parsimony normed fit index (PNFI), the parsimony comparative fit index (PCFI) and the parsimony goodness-of-fit index (PGFI). Table 5.12 provides descriptions of these fit indices.
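Several of these indices are simple functions of the chi-square statistics of the proposed model and of the baseline (independence) model. The sketch below uses the standard textbook formulas; the chi-square values, degrees of freedom and sample size are hypothetical and are not results of the present study. It uses N - 1 in the RMSEA denominator, whereas some programs use N, so values may differ slightly from AMOS output.

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Common fit indices from model (m) and baseline (b) chi-squares; assumes df_m > 0."""
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    nfi = (chi2_b - chi2_m) / chi2_b
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    d_m = max(chi2_m - df_m, 0.0)        # model non-centrality estimate
    d_b = max(chi2_b - df_b, d_m)        # baseline non-centrality estimate
    cfi = 1.0 - d_m / d_b if d_b > 0 else 1.0
    ifi = (chi2_b - chi2_m) / (chi2_b - df_m)
    return {"chi2/df": chi2_m / df_m, "RMSEA": rmsea, "NFI": nfi,
            "TLI": tli, "CFI": cfi, "IFI": ifi}

# Hypothetical values, for illustration only
for name, value in fit_indices(chi2_m=250.0, df_m=120, chi2_b=3000.0, df_b=153, n=400).items():
    print(f"{name}: {value:.3f}")
```

Note how NFI uses the chi-squares alone, while TLI, CFI and IFI also bring in degrees of freedom, which is the distinction drawn in Table 5.12.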
Model evaluation is one of the most unsettled and difficult issues in structural modelling (Arbuckle and Wothke 1999; Bollen and Long 1993). Nevertheless, several scholars (see Bollen and Long 1993, p. 7) have reached a consensus on the following principles for assessing model fit:
1) The best guide to assessing model fit is strong substantive theory. If a model makes little substantive sense, it is difficult to justify even if its statistical fit is excellent.
2) The chi-square test statistic should not be the primary basis for determining model fit. It is widely acknowledged that the null hypothesis of perfect fit is not credible for a study with a large sample, so drawing conclusions about model fit from the chi-square statistic alone is questionable.
3) Given the unsettled debate about overall fit, it is unwise to evaluate model fit using a single fit index rather than several. Specifically, many scholars advocate employing one or more measures of each type in order to reach a consensus across the various types of measure on the acceptability of the proposed model (Hair et al. 1998; Kline 1998).
4) The fit of the components of a model should not be ignored. Components of the model refer to specific aspects such as the R-square of equations and the magnitudes of coefficient estimates. For instance, the researcher should check for improper solutions or offending estimates (e.g. negative error variances); excellent fit indices are untenable if the results for the components of the model are non-admissible.
5) Where possible, competing models should be compared, because this allows the researcher to identify the model with the best fit rather than making a judgement based exclusively on the fit of a single model.
Additionally, Kline (1998, p. 130) acknowledges that whatever combination of indices the researcher uses, three caveats apply to all fit indices: 1) values of fit indices indicate only the overall fit of a model, so some parts of the model may fit the data poorly even if the value of the index seems adequate; 2) fit indices do not demonstrate whether the results are theoretically meaningful; and 3) good values of fit indices do not indicate that predictive validity is also high. On the question of what constitutes good fit, Kline (1998, p. 131) posits that "there is no single answer to the question about what is good fit" and further points out that the more criteria a model satisfies (see Table 5.12), presumably the better its fit.
Similarly, Bollen and Long (1993, p. 9) admit that "the test statistic and fit indices are very beneficial, but they are no replacement for sound judgement and substantive expertise".
These suggestions are intuitively and logically appealing, and the present study therefore adheres to them. Table 5.12 summarises the characteristics and acceptable levels of fit for each of the fit indices.
Table 5.12 Goodness-of-fit indices and acceptable levels of fit
Fit index | Description | Acceptable level of fit
1) Measures of absolute fit
Chi-square (χ²) statistic*, reported with degrees of freedom and p-value | Tests the null hypothesis that the model-implied variance-covariance matrix does not differ from the sample variance-covariance matrix. Greatly affected by sample size: the larger the sample, the more likely it is that the p-value will indicate a significant difference between model and data. | Non-significant χ², at least p > 0.05
Normed chi-square (χ²/df) | Chi-square statistics are only meaningful when the degrees of freedom are taken into account. Also regarded as a measure of both absolute fit and parsimony. Values close to 1 indicate good fit, but values less than 1 imply over-fit. | Smaller than 2; values as high as 5 acceptable
Standardised Root Mean Square Residual (SRMR)* | A standardised summary of the average covariance residuals, i.e. the differences between observed and model-implied covariances. | < 0.05 good fit; 0.05-0.10 adequate fit
Root Mean Square Error of Approximation (RMSEA)* | How well the fitted model approximates the population covariance matrix per degree of freedom. | 0.05-0.08 adequate fit
Goodness-of-Fit Index (GFI)* | A comparison of the squared residuals from prediction with the observed data. | > 0.95 good fit; 0.90-0.95 adequate fit
2) Incremental fit measures | | Close to 1 very good fit; > 0.95 good fit; 0.90-0.95 adequate fit
Adjusted Goodness-of-Fit Index (AGFI) | GFI adjusted for degrees of freedom. Less often used because it does not perform well in some applications; values can fall outside the 0-1 range. |
Bentler-Bonett Normed Fit Index (NFI) | A comparative index between the proposed model and a more restricted, nested baseline (null) model; not adjusted for degrees of freedom, so the effects of sample size are strong. |
Tucker-Lewis Index (TLI)*, also known as the Bentler-Bonett Non-Normed Fit Index (NNFI) | Comparative index between the proposed and null models adjusted for degrees of freedom. Can avoid extreme underestimation and overestimation and is robust against sample size. Highly recommended fit index of choice. |
Comparative Fit Index (CFI)*, identical to the Relative Noncentrality Index (RNI) | Comparative index between the proposed and null models adjusted for degrees of freedom. Interpreted similarly to NFI but may be less affected by sample size. Highly recommended as the index of choice. |
Bollen's Incremental Fit Index (IFI)* | Comparative index between the proposed and null models adjusted for degrees of freedom. |
3) Parsimonious fit measures (PNFI, PCFI) | Take into account both the model being evaluated and the baseline model. | Higher values indicate better fit; used for comparison between alternative models
Note: * fit index employed in the present study; a broad mix of indices is employed. Source: adapted from Arbuckle and Wothke 1999; Bollen and Long 1993; De Wulf 1999; Kline 1998.
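In the spirit of Kline's (1998) remark that the more criteria a model satisfies, presumably the better its fit, the cut-offs in Table 5.12 can be applied mechanically. The sketch below transcribes those cut-offs into a lookup table; treating RMSEA values below 0.05 as good fit is an assumption beyond the table, and the fit values in the example are hypothetical. Such a tally supplements, and does not replace, the substantive judgement stressed by Bollen and Long (1993).

```python
# Cut-offs transcribed from Table 5.12: (direction, good-fit bound, adequate-fit bound).
# "lower" means smaller values indicate better fit; "higher" means larger values do.
# Assumption: RMSEA below 0.05 is treated as good fit (the table states only 0.05-0.08 adequate).
CUTOFFS = {
    "RMSEA": ("lower", 0.05, 0.08),
    "SRMR": ("lower", 0.05, 0.10),
    "GFI": ("higher", 0.95, 0.90),
    "TLI": ("higher", 0.95, 0.90),
    "CFI": ("higher", 0.95, 0.90),
    "IFI": ("higher", 0.95, 0.90),
}

def classify(index, value):
    """Return 'good', 'adequate' or 'poor' for one fit index value."""
    direction, good, adequate = CUTOFFS[index]
    if direction == "lower":
        if value < good:
            return "good"
        if value <= adequate:
            return "adequate"
    else:
        if value > good:
            return "good"
        if value >= adequate:
            return "adequate"
    return "poor"

def criteria_met(values):
    """Count how many indices reach at least the adequate level."""
    return sum(classify(name, v) != "poor" for name, v in values.items())

# Hypothetical fit values, for illustration only
example = {"RMSEA": 0.06, "SRMR": 0.12, "CFI": 0.96, "TLI": 0.93}
print({k: classify(k, v) for k, v in example.items()}, criteria_met(example))
```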
Step 7: Interpreting and Modifying the Model
Once the model is deemed acceptable, the researcher may wish to examine possible model modifications to improve the theoretical explanation or the goodness of fit. Modifications can be derived from examination of the residuals of the predicted covariance or correlation matrix. Standardised residuals (also known as normalised residuals) with absolute values greater than 2.58 are considered statistically significant at the 0.01 level and signify substantial prediction error for a pair of indicators (Byrne 2001). Another way of assessing the fit of a specified model is the modification index: a value of 3.84 or greater suggests that a statistically significant reduction in the chi-square would be obtained if the coefficient were estimated (Hair et al. 1998). However, Bollen and Long (1993), Hair et al. (1998) and Byrne (2001) caution that researchers should not make model changes based solely on modification indices; theoretical justification should be available before any change is implemented.
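The two conventional cut-offs have a simple statistical basis: 2.58 is the two-tailed critical value of the standard normal distribution at the 0.01 level (1.96 at the 0.05 level), and 3.84 is the 0.05 critical value of a chi-square with one degree of freedom, which equals 1.96 squared. The sketch below derives both with Python's standard library and flags residuals and modification indices that exceed them. The indicator names, parameter labels and values are hypothetical stand-ins for SEM program output, and a flag only marks a candidate for review, not a justified change.

```python
from statistics import NormalDist

# Critical values behind the conventional cut-offs
z_01 = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-tailed 0.01 level, about 2.58
z_05 = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed 0.05 level, about 1.96
mi_05 = z_05 ** 2                          # chi-square (1 df) at 0.05, about 3.84

def flag_for_review(std_residuals, mod_indices, z_cut=z_01, mi_cut=mi_05):
    """Return the indicator pairs and parameters that exceed the cut-offs.

    Flags only point to candidates; any change still needs theoretical justification.
    """
    big_residuals = {pair: r for pair, r in std_residuals.items() if abs(r) > z_cut}
    big_mis = {param: mi for param, mi in mod_indices.items() if mi >= mi_cut}
    return big_residuals, big_mis

# Hypothetical output from an SEM program (names and values are illustrative)
residuals = {("SAT1", "LOY2"): 3.10, ("SAT2", "VAL1"): -1.20}
mis = {"cov(e3, e7)": 12.4, "QUAL -> LOY": 2.1}
print(flag_for_review(residuals, mis))  # ({('SAT1', 'LOY2'): 3.1}, {'cov(e3, e7)': 12.4})
```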
5.8
Concluding Remarks
This chapter has documented various themes pertinent to an empirical approach adopting a quantitative research methodology. These comprise a section on the research design employed, the data collection method adopted, and an explicit discussion of the development of the research measure, which follows a five-step procedure including a detailed description of the assessment of the scales' reliability and validity. In that discussion, the results of the item analysis and exploratory factor analysis of the core constructs and the other constructs incorporated in the study were presented. These items will be examined further in Chapter 7 via a more rigorous statistical method, namely structural equation modelling. The questionnaire development process advocated by Churchill and Iacobucci (2002) and De Wulf (1999) was then used as a guideline to describe the process. Finally, the data analysis methodology, which focuses on the choice of statistical techniques for testing the hypotheses of the present research, was discussed.
The following two chapters (6 and 7) will assess and discuss the empirical findings of the analyses. Chapter 6 will focus specifically on the results of the descriptive analysis, while Chapter 7 will document the results pertinent to the measurement model and the structural model.