
Session 2

“Getting Started”
Core Skills for Data Processing
ORSC 2004 - Internal Training

1
1 Core Skill Training Session Six: “Data Analysis”
Objective

At the end of the training program, participants should be able to:
• Understand data layouts
• Understand what tables will look like
• Define the data structure for various data formats
• Understand coding conventions
• Gain an appreciation of the basic elements

2
Various data formats
• Questionnaire data can be computerised in many ways
• Market research software mostly uses flat files
• Customised software is available for capturing market research (MR) data
• QINPUT, MERLIN and Surveycraft are some of the most popular packages

3
Single Card data
Slide callouts: Serial Number / Respondent ID; Record length. R1 to R5 label the first five records.
1000290022 00061860200310041324 040800100000000000 1.3979167 R1
1000390022 00061860200310041359 040800100000000000 0.6460563 R2
1001210022 00061860200310041249 040800100000000000 0.8865789 R3
1013240022 00061867200310051800 040800100000000000 0.6759740 R4
1013250022 00061867200310051831 040800100000000000 0.8857447 R5
1013260022 00061867200310051842 040800100000000000 1.3810526
1013300022 00061867200310051857 040800100000000000 1.5300000
1015240022 00062321200310041216 040800100000000000 1.4328262

The respondent ID is the unique ID for the record
Number of lines in the file = sample size
Maximum length of a record = 32,767 characters (the largest value of a 16-bit signed integer)
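The two rules above (one line per respondent, unique respondent ID) can be checked with a short script. This is a minimal Python sketch, not part of Quantum; the function name `check_single_card` and the assumption that the ID occupies the first 10 columns are illustrative only.

```python
def check_single_card(lines, id_width=10, max_len=32767):
    """Return (sample_size, duplicate_ids, overlong_ids) for single-card records.

    id_width is an assumption: the respondent ID is taken from the first
    id_width columns of each record.
    """
    seen, duplicates, overlong = set(), [], []
    records = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for rec in records:
        rid = rec[:id_width]
        if rid in seen:
            duplicates.append(rid)      # the ID should be unique per record
        seen.add(rid)
        if len(rec) > max_len:
            overlong.append(rid)        # longer than the stated maximum
    return len(records), duplicates, overlong


sample = [
    "1000290022 00061860200310041324 040800100000000000 1.3979167",
    "1000390022 00061860200310041359 040800100000000000 0.6460563",
    "1001210022 00061860200310041249 040800100000000000 0.8865789",
]
size, dups, too_long = check_single_card(sample)
print(size, dups, too_long)
```

The number of non-empty lines is reported as the sample size, so a duplicate ID or a blank line in the file shows up as a mismatch against the expected number of interviews.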
4
Multicard data
R1:
00048011 01 04070917213204070917374232570237550
000480202837525750 111020744t242-345235849862468-2486
0004803 1 111-4 208050505050810 245248609824096
0004804001010 55334333333433453145555413155 646890
0004805 2115245444433353443442343435514334333 425924
R2:
00070011 01 040709173010040709175624 245982496
000700201395277173 231019074646464060
0007003 1 112-7 105080803050308 426246
0007004030707 33543553245533535255452355555553
0007005 21113123322&2133222122431232323212313

Each respondent has more than one line of information; each line is called a “card”
In general a card is 99 characters long, but cards can also be longer
The unique identifier in this data format is Respondent ID + Card ID
Maximum length of a record = 32,767 characters (the largest value of a 16-bit signed integer). The record length in this case is the sum of the lengths of all the respondent's cards
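To illustrate the Respondent ID + Card ID key, here is a minimal Python sketch that groups card lines by respondent. It assumes the serial number sits in columns 1-5 and the card number in columns 6-7, which is how the sample lines above appear to be laid out; the function name `group_cards` is hypothetical.

```python
from collections import defaultdict


def group_cards(lines, ser=(1, 5), crd=(6, 7)):
    """Group card lines by respondent serial; columns are 1-based and inclusive."""
    respondents = defaultdict(dict)
    for line in lines:
        serial = line[ser[0] - 1:ser[1]]
        card = int(line[crd[0] - 1:crd[1]])
        respondents[serial][card] = line   # key: respondent ID + card ID
    return dict(respondents)


sample = [
    "00048011 01 04070917213204070917374232570237550",
    "000480202837525750 111020744t242-345235849862468-2486",
    "00070011 01 040709173010040709175624 245982496",
    "000700201395277173 231019074646464060",
]
grouped = group_cards(sample)
for serial, cards in grouped.items():
    # Total record length is the sum of the lengths of the respondent's cards.
    total = sum(len(text) for text in cards.values())
    print(serial, sorted(cards), total)
```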

5
Quantum data format
• Quantum can handle both single-card and multicard data formats
• In both formats, Quantum allows something called multi-punch
• In the multi-punch data format, each column can hold 12 values: the individual constants 0123456789-&
• Any combination of these 12 codes (punches) can exist in a single column
• The advantage of this format is that more data fits into the available maximum record length of 32,767 characters
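One way to picture a multi-punch column is as a set of 12 on/off flags, which gives 2^12 = 4,096 possible combinations per column. The Python sketch below models this with a bit mask; the bit order and the helper names are illustrative choices, not Quantum's internal storage format.

```python
CODES = "1234567890-&"  # the 12 punch codes; the bit order here is arbitrary


def to_mask(punches):
    """Encode a combination of punches, e.g. '135', as a 12-bit integer."""
    mask = 0
    for p in punches:
        mask |= 1 << CODES.index(p)
    return mask


def from_mask(mask):
    """Decode a 12-bit integer back into its punch codes."""
    return "".join(c for i, c in enumerate(CODES) if (mask >> i) & 1)


column = to_mask("135")            # one column holding punches 1, 3 and 5
print(bin(column), from_mask(column))
print(2 ** len(CODES))             # number of distinct combinations per column
```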

6
Introducing Quantum – What does it do?
• Check and validate the data
• Edit and correct the data
• Produce different types of lists and reports of data
• Produce new data files
• Recode data and produce new variables
• Generate tables
• Perform statistical calculations

7
Underlying concepts
Quantum consists of 2 phases, or sections:

Edit section (for each questionnaire):
• Check and correct data
• Modify/recode data

Tabulation section:
• Count questionnaires
• Produce tables
• Format tables

8
Underlying concepts

Edit section
• Data examination
• Data modification
• Data correction

Tables section
• Cross-tabulation of data
• Control statements to determine layout

9
Layout of a table
[Figure: annotated sample table labelling the table title, project heading, X-break, base title, base size, side headings, frequency, percentage and mean score]
10
Coding conventions
• A Quantum program is a plain-text file created with a text editor
• The tables section consists of statements of various types
• Each statement starts on a new line
• Each statement consists of parameters and options
• A statement may be up to 200 characters long
• The standard Quantum separator is the semicolon (;)
• Long statements may be continued on new lines with a + in the first position. In certain cases long statements may be continued with a ++ in the first position
• Comments are denoted by /* at the start of the line. You may also see Quantum programs that use C at the start of a line for comments

11
Coding conventions
A sample Quantum program:

/*
/* Here is a comment
/*
tab q5 brk1;c=c115'1';nz
+dsp
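The conventions for comments, continuation lines and the semicolon separator can be mimicked in a few lines of Python. This is a rough sketch under stated assumptions, not how Quantum itself parses a program: it assumes continuation text is appended directly to the previous line, treats ++ the same as +, and ignores the older C-style comment marker. The input statement is a hypothetical example.

```python
def logical_statements(lines):
    """Drop /* comment lines and fold +-continuation lines into the previous statement."""
    statements = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("/*"):
            continue                                  # blank line or comment
        if line.startswith("+") and statements:
            statements[-1] += line.lstrip("+")        # assumption: plain concatenation
        else:
            statements.append(line)
    return statements


program = [
    "/* hypothetical input for illustration",
    "struct;read=2;ser=c(1,5)",
    "+;crd=c(6,7);max=19",
]
for stmt in logical_statements(program):
    print(stmt.split(";"))   # the statement name followed by its options
```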

12
Fundamentals and Terminology

13
Fundamentals
Individual constants

These are ASCII characters, or multicodes, which are any combination of the codes 1234567890-&, or a blank on its own. They are enclosed in single quotes: '1' '2' '123' ' ' ...

• Punch codes are referenced in single quotes. Punches are listed individually, and a range of punches is written with a slash (/), meaning "through"

Examples:
'1'    Punch 1
'123'  Punches 1 or 2 or 3
'1/5'  Punches 1 or 2 or 3 or 4 or 5
' '    No punches (blank)

• The order of punches used for ranges is & - 0 1 2 3 4 5 6 7 8 9 0 - &
• '&/9' is the same as '1/&': both ranges cover all 12 codes
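The range notation can be made concrete with a small Python sketch that expands a punch specification into the set of codes it covers. The function name `expand` is hypothetical, and the rule for picking a run when a code appears twice in the order string (take the shortest matching run) is an assumption made for this illustration.

```python
ORDER = "&-01234567890-&"  # punch order for ranges, as given on the slide


def expand(spec):
    """Expand a punch spec such as '123' or '1/5' into a set of codes."""
    if spec.strip() == "":
        return set()                 # blank: no punches
    if "/" not in spec:
        return set(spec)             # codes listed individually
    first, last = spec.split("/")
    spans = []
    for i, ch in enumerate(ORDER):
        if ch == first:
            j = ORDER.find(last, i)
            if j != -1:
                spans.append(ORDER[i:j + 1])
    if not spans:
        raise ValueError(f"invalid range: {spec!r}")
    return set(min(spans, key=len))  # assumption: shortest matching run wins


print(sorted(expand("1/5")))
print(expand("&/9") == expand("1/&"))
```

Under this reading, '&/9' runs from the leading & through 9 and '1/&' runs from 1 through the trailing &, so both contain every one of the 12 codes.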

14
Fundamentals
Individual constants

The - punch is sometimes referred to as the 11th or X punch, and & is sometimes referred to as the 12th, Y or V punch.

Each code represents one answer to a question. For example, take the question 'What is your favorite color?', with the following response list coded into one column:

Red     1
Yellow  2
Blue    3
Green   4
Black   5
White   6

If my favorite color is green, this appears in the data file as a 4 in the appropriate column; if your favorite color is red, there will be a 1 in that column.
15
Fundamentals
Strings of Data Constants

Strings are lists of single ASCII characters, enclosed in dollar signs ($). They refer to more than one column of data.

Examples:
$1234$
$ABC$
$ $

16
Fundamentals
Numbers
- Whole numbers
- Real numbers

Variables: variables or arrays may be defined as data, integer or real types. Names may be up to 10 characters long.

Examples:
int unit 1
real weight 10s

Whenever the “s” suffix is used, varn is interpreted as var(n); for example, weight3 is read as weight(3).

17
Variables/ column referencing
• Columns are referred to by their actual position in the data: if you open the data file in an editor and place the cursor on a character, the column position is the cursor position
• In a single-card data file, the column position itself is used directly. For example, c12 refers to column 12 in a single-card data file
• In a multicard data file, the column is referred to in combination with the card number. The format is “cXNN” if there are 9 or fewer cards and “cXXNN” if there are 10 or more, where X is the card number and NN is the column position. One-digit column positions are written with a leading “0”

Examples:
c108 refers to card 1, column 8
c412 refers to card 4, column 12
c1009 refers to card 10, column 9
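The card-plus-column naming rule can be sketched in Python as a pair of helpers. The names `col_ref` and `parse_ref` are hypothetical, and the sketch assumes two-digit column positions (1 to 99) only.

```python
def col_ref(card, column):
    """Build a multicard column reference, e.g. card 1, column 8 -> 'c108'."""
    if not 1 <= column <= 99:
        raise ValueError("this sketch assumes two-digit column positions (1-99)")
    return f"c{card}{column:02d}"


def parse_ref(ref):
    """Split a reference such as 'c1009' into (card, column)."""
    digits = ref[1:]
    return int(digits[:-2]), int(digits[-2:])   # last two digits are the column


print(col_ref(1, 8), col_ref(4, 12), col_ref(10, 9))
print(parse_ref("c1009"))
```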

18
Variables/ column referencing
• A series of columns may be treated as either a string or a number and is referenced as c(m,n), where m is the start column position and n is the end column position

Examples:
c(12,15) refers to columns 12 to 15 in a single-card data file
c(106,110) refers to columns 6 to 10 of card 1 in a multicard data file
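For readers more used to zero-based string slicing, the c(m,n) notation corresponds to the following Python sketch. The helper name `field` is hypothetical; the record is the first line of the single-card sample, without its R1 label.

```python
def field(record, m, n):
    """Return columns m..n (1-based, inclusive) of a record as a string."""
    return record[m - 1:n]


record = "1000290022 00061860200310041324"
text = field(record, 12, 15)          # the same columns as c(12,15)
print(repr(text))                     # the field read as a string
print(int(text) if text.strip().isdigit() else None)   # the field read as a number
```

The same four columns can be read either as the string '0006' or as the number 6, which is the string/numeric distinction described above.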

19
Describing Data Structure

20
Data Structure
• By default Quantum reads one record (one line) from your data file at a time. Each record may be up to 100 columns long
• Most market research surveys consist of multi-card records
• Some surveys consist instead of long records with more than 100 columns of data
• These data structures must be described on the struct statement
• Format: struct;options
• The “struct” statement must be the first statement in your program

21
Data Structure – contd..
Specifying long records

struct;reclen=n

where n is the length of the record in columns. The maximum length of a record is approximately 32,000 columns.

Specifying multi-card data sets

This is the most common form of the struct statement:

struct;read=2;ser=c(m,n);crd=c(p,q)

where read=2 denotes a multi-card set, ser= defines the columns of the serial number, and crd= defines the columns of the card number.

Example: struct;read=2;ser=c(1,4);crd=c80

22
Data Structure – contd..
When a multi-card set is read, the cards are defined as follows: 
Card 1 Columns 101-200
Card 2 Columns 201-300
Card 3 Columns 301-400
Card 4 Columns 401-500
…..
Card 10 Columns 1001-1100

By default a maximum of 9 cards are permitted in a set.
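The card-to-column mapping above can be sketched in Python as a lookup from absolute column number to character. This is an illustration of the layout only; the function name `build_record` and the use of a dictionary are choices made for the sketch, not a description of how Quantum stores the record.

```python
def build_record(cards, max_cards=9, card_len=100):
    """Map each card's text to columns card*100+1 onward; returns {column: char}."""
    columns = {}
    for card, text in cards.items():
        if not 1 <= card <= max_cards:
            raise ValueError(f"card {card} exceeds max={max_cards}")
        for pos, ch in enumerate(text[:card_len], start=1):
            columns[card * 100 + pos] = ch     # card 1 -> 101.., card 2 -> 201..
    return columns


cards = {1: "00048011 01", 2: "000480202837525750"}
rec = build_record(cards)
print(rec[101], rec[207], rec.get(301, " "))   # columns with no data read as blank here
```

With the default of 9 cards, passing a card 10 raises an error in this sketch, mirroring the need to raise the limit for larger sets.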

Reading multi-card data sets with 10 or more cards

The option max=n is used to define the maximum number of cards in the set

Example:

struct;read=2;ser=c(1,5);crd=c(6,7);max=19

23
Data Structure – contd..
Checking the structure of multi-card data sets

• Quantum automatically checks for duplicate card types within a serial number and for adjacent duplicate serial numbers
• It is not mandatory for all cards to be present for every respondent in a multicard data file
• It is possible to check that specific cards are present using req=
• Example:
  struct;read=2;ser=c(1,5);crd=c(6,7);max=19;req=1,2
• In this example each record must have cards 1 and 2 present. If either or both are missing, the record will be rejected
• If you require a series of cards to be present, specify the first and last separated by a slash:
  struct;read=2;ser=c(1,5);crd=c(6,7);max=19;req=1/5
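The duplicate-card and required-card checks described above can be imitated outside Quantum with a short Python sketch. The function name `check_cards`, the input lines and the message wording are all hypothetical; the sketch assumes ser=c(1,5) and crd=c(6,7) as in the example, and it does not cover the check for adjacent duplicate serial numbers.

```python
def check_cards(card_lines, req=(), ser=(1, 5), crd=(6, 7)):
    """Report duplicate cards within a serial and serials missing required cards."""
    seen, problems = {}, []
    for line in card_lines:
        serial = line[ser[0] - 1:ser[1]]
        card = int(line[crd[0] - 1:crd[1]])
        cards = seen.setdefault(serial, set())
        if card in cards:
            problems.append((serial, f"duplicate card {card}"))
        cards.add(card)
    for serial, cards in seen.items():
        missing = sorted(set(req) - cards)
        if missing:
            problems.append((serial, f"missing required cards {missing}"))
    return problems


# Hypothetical card lines: serial 00048 has cards 1 and 2, serial 00070 has card 1 twice.
lines = ["0004801 A", "0004802 B", "0007001 C", "0007001 D"]
print(check_cards(lines, req=(1, 2)))          # like req=1,2
print(check_cards(lines, req=range(1, 6)))     # like req=1/5
```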

24
