WHAT IS QUANTUM AND WHAT DOES IT DO?
Stages in a Quantum run
Basic Elements In Quantum
Different Number types that can be used in Quantum
Variables and arrays
  Data variables
  Integer variables
  Real variables
  Subscription
Expressions
  Arithmetic expressions
  Combining arithmetic expressions
Counting the number of codes in a column
Generating a random number
Logical expressions
Comparing data variables and data constants
Fields of data variables
Checking the arithmetic value of a field of columns
Combining logical expressions
Comparing variables and arithmetic expressions to a list
Naming lists
Speeding up large programs
How Quantum reads data
  Types of record
  Ordinary records
  Multicard records
Multicard records with Trailer Cards
  Reading data into the C array
  Ordinary records
  Multicard records
  Ignoring card types
Processing the data
Changing the contents of a variable
Trailer Cards
  Allread
  firstread and lastread
  Reserved variables
  Describing the data structure for Multicard records
  Record type
  Ordinary Records
  Multicard Records
  Record length
  Serial number location
  Card type location
  Required card types
  Repeated card types
  Highest card type number
  Dealing with alphanumeric card types
Merging Data using Quantum
  Merge sequence for Trailer Cards
  Merging data files
  Merging complete cards
  Merging a field of data from an external file
Writing out data
  Print files
  Printing out individual records
  Writing Out Parts of Records
  Data files
  Creating new cards
Some General Instances for forcecoding, cleaning, etc.
  Writing to a report file
  Assignment statements
  Copying codes
  Assignment with and, or and xor
  Adding codes into a column
  Deleting codes from a column
  Forcing single-coded answers
  Setting a random code in a column
  Reading numeric codes into an array
  Clearing variables
  Flow control
  Statements of condition
  Examining records
  Holecounts
  Frequency distributions
  require
  Column and code validation
  Comments with require
  Checking codes in columns
  Exclusive codes
  Automatic error correction
  Validating logical expressions
  Testing the equivalence of logical expressions
  Actions when a require statement fails
  Data correction
  Forced editing (forced cleaning)
Introduction to the tabulation
  The hierarchy of the tabulation section
  Components of a tabulation program
  Run control statements
  Defining run conditions
  Table control statements
  Creating a table
  Commonly used options in tab section
  Axis control statements
  factors
  Miscellaneous n statements
  More commands to generate counts
  The col statement
  The val statement
  The fld statement
Weighting in Quantum
  Weighting methods
  Types of weighting
Descriptive statistics
Quanvert
Structure of Quantum Spec
A. First, the data is read onto a disk. Data on disk can come from a number of different sources, for example:
- It may be entered directly via a terminal by a telephone interviewer using Quancept CATI.
- It may be collected over the World Wide Web using software such as Quancept Web.
- It may be entered directly into a computer by an interviewer conducting a personal interview using Quancept CAPI.
- It may be entered by a data entry clerk using a data entry package.
B. Next, the tasks to be performed are defined using the Quantum language.
C. Then, Quantum translates these tasks into instructions that the computer can understand.
D. Finally, the computer itself uses this program to run your job.
Quantum comprises two sections: an edit section and a tabulation section. The edit section checks and validates the data, generates lists and reports, corrects data, produces new data files, recodes data and creates new variables. The tabulation section produces tables and performs statistical calculations. Quantum reads the records in the data file one at a time and passes each one through the various parts of the Quantum program. As long as there are records remaining in the data file, the read-edit-tabulate loop is repeated. Once the last record has been processed, the tables are ready for printing.
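The read, edit, tabulate cycle can be modelled in a few lines of Python. This is a minimal sketch, not Quantum code. The run function, the one-column records and the color lookup are all hypothetical. The only point is that each record passes through the edit step and the tabulation step before the next record is read.

```python
from collections import Counter


def run(records, edit, tabulate):
    """Model of a Quantum run: read a record, edit it, tabulate it, repeat."""
    counts = Counter()
    for record in records:              # read: one record at a time
        edited = edit(record)           # edit section: check, correct, recode
        counts[tabulate(edited)] += 1   # tabulation section: update the counts
    return counts                       # last record done: table ready to print


# Hypothetical data: each record holds one favorite-color code.
COLOR_NAMES = {"1": "Red", "2": "Yellow", "3": "Blue", "4": "Green"}
records = ["1", "4 ", "4", "3"]
table = run(records,
            edit=lambda r: r.strip(),                        # e.g. clean stray blanks
            tabulate=lambda r: COLOR_NAMES.get(r, "Other"))  # count by color name
```

Each record's counts are in the table before the next record is touched. This mirrors the statement that the tables are ready only once the last record has been processed.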
These are stored in variables:
- Data variables store data constants
- Integer variables store whole numbers
- Real variables store real numbers
Individual constants
An individual constant is one or more of the codes 1234567890-& or blank. The - is sometimes referred to as the 11 or X punch, and & is sometimes called the 12, V or Y punch. Each code represents one answer to a question. For example, let's take the question "What is your favorite color?", which has the response list:
Red      1
Yellow   2
Blue     3
Green    4
Black    5
White    6
These codes are coded into one column. If my favorite color is green, this will appear in the data file as a 4 in the appropriate column; if your favorite color is red, there will be a 1 in that column. To refer to these answers inside your Quantum program, type the code enclosed in single quotes: '3'. You might do this if, for example, we only want our table to include those respondents whose favorite color is blue. You will also have to tell Quantum which column to look in.
Several codes may be combined in the same column; these are called multicodes. Multicodes, or multicoding, means two or more codes in the same column. Suppose the next question asks me to choose three colors from the same list, and I pick yellow, black and white. If these answers were all coded in the same column (a multicoded column), they could be referred to as '256', '526', '652' or any other ordering of those three codes, because Quantum does not care what order the codes are entered in.
If you have a series of consecutive codes in the order &-0123456789, you may either type each code separately or enter the first and last codes separated by a slash (/), meaning "through", as shown below:
'1/7' means '1234567'
'&/4' means '&-01234'
'&/9' means '&-0123456789' (all 12 codes)
'1/&' means '1234567890-&' (all 12 codes)
As you can see, the last two examples mean exactly the same thing. However, the notations '0/&' and '0-&' are not the same: '0/&' means all 12 codes, whereas '0-&' is 0, - and & only.
Some combinations of codes represent ASCII characters; that is, they represent characters which you can type on your keyboard:
'&1' is the equivalent of 'A'
'&2' is the equivalent of 'B'
The only time you would use letters rather than codes (i.e., 'A' rather than '&1') is when the questionnaire tells you that a column should contain a letter.
Sometimes we may need to write a notation for no codes, for instance if my favorite color does not appear in the list of choices. To do this, we write ' ' (i.e., a blank enclosed in single quotes).
Strings of data constants
To refer to a string of codes in a field of columns, enclose it between two $ signs, e.g. $codes$.
When data constants are single-coded, or the multicodes correspond to ASCII characters (e.g. A, B), they may be strung together. Strings of data constants are sometimes called literals or column fields. Strings are enclosed in dollar signs, and the component single codes lose their single quotes. For example:
$12345$
$ABC$
$916 7&$
The first string is five columns long, with 1 in the first column, 2 in the second, 3 in the third, and so on. The third string is six columns wide, with the fourth column blank. Strings might be used:
- when we want to refer to a questionnaire serial number;
- when the answers to a question are represented by codes of more than one digit. For example, in a car ownership survey the make and model of car owned may be represented by a 3-digit code. To pick up respondents owning a particular type of car, you would need to check whether the relevant columns contained the code for that car. For instance, to look for owners of Ford Escorts you might ask Quantum to search for the string $132$ in a particular field of columns.
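The Python sketch below is a rough illustration of these ideas. It models a data column as a set of codes, which is why '256', '526' and '652' are interchangeable. It also splits a $...$ string into one single-coded column per character. The helper names (column, string_constant, field_matches) and the car field are hypothetical. Letters such as A are kept as characters rather than converted to their &1-style code equivalents.

```python
def column(codes):
    """One data column modelled as the set of codes in it; ' ' means no codes."""
    return frozenset(codes.replace(" ", ""))


def string_constant(literal):
    """Split a $...$ string into one single-coded column per character."""
    if not (len(literal) >= 2 and literal.startswith("$") and literal.endswith("$")):
        raise ValueError("strings are enclosed in dollar signs")
    return [column(ch) for ch in literal[1:-1]]


def field_matches(field, literal):
    """True if a field of columns holds exactly the codes in the string."""
    return list(field) == string_constant(literal)


# A multicoded column: '256', '526' and '652' are the same set of codes.
assert column("256") == column("526") == column("652")

# Hypothetical 3-column car-make field holding the code 132.
car_field = [column("1"), column("3"), column("2")]
print(field_matches(car_field, "$132$"))   # True
```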
Quantum can deal with whole numbers (integers) in the range -2,147,483,647 to 2,147,483,647.
Real numbers are numbers containing decimal points. To be valid, they must have at least one digit on either side of the decimal point:
- 0.1 and 1.0 are correct
- .1 and 1. are not
Quantum deals with real numbers of any size with an accuracy of up to six significant figures. In numbers with more than six significant figures, the sixth figure is rounded up or down depending on the value of the remaining figures:
- 96.82529 is rounded to 96.8253
- 189462.1 is rounded to 189462.0
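Both rules for reals can be checked with a short Python sketch. It assumes that Python's general-format rounding to six significant figures is an acceptable stand-in for Quantum's own rounding. VALID_REAL, is_valid_real and six_significant are hypothetical names, and negative signs are not handled.

```python
import re

# At least one digit on each side of the decimal point (signs not modelled).
VALID_REAL = re.compile(r"\d+\.\d+")


def is_valid_real(text):
    """True if text is written as a valid real constant."""
    return VALID_REAL.fullmatch(text) is not None


def six_significant(x):
    """Round x to six significant figures, as described for Quantum reals."""
    return float(f"{x:.6g}")


for text in ["0.1", "1.0", ".1", "1."]:
    print(text, is_valid_real(text))
```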
information stored (e.g., the variable called meals might contain a count of the number of meals eaten during the day) or you may use the ones offered automatically by Quantum. Sometimes it is useful for a series of variables to have the same name. Each variable may then be addressed by its position in the group. This arrangement is known as an array.
Data variables
To define a data variable, type: data var_name sizes
At the start of every job, Quantum provides you with an array of 1,000 data cells called C. This array is sometimes referred to as the C matrix, and the individual cells are called C-variables. Each C-variable stores one column of data. Quantum reads data from your data file into this array. Let's say we have a very small questionnaire which uses 43 columns to store the data. Quantum will read the data for each respondent into cells 1 to 43 of the C array, one respondent at a time. The codes from column 1 of the data are copied into cell 1 of the C array, the codes from column 2 of the data are copied into cell 2, and so on. When Quantum has finished with that respondent's data, it clears out the cells in the C matrix and reads the data for the next respondent, placing it in cells 1 to 43 of the array. We can access this data by defining the columns whose contents we wish to inspect or change. Let's take the question about color that we mentioned earlier. The printed questionnaire tells us that the respondent's favorite color will be coded into column 15. To look at this column we would write:
c15 or c(15)
C-variables are reset to blank before a new respondent's data is read. Thus, you can be certain that Quantum never muddles the contents of c10 for the first respondent with those of c10 for the second respondent. As we mentioned above, you may create your own data variables to store specific pieces of data. For instance, in a shopping survey we may want to store data about visits to Sainsbury's in an array called sains, and data about visits to Safeway in an array called safe. Before we can use these arrays, we must create them.
If each array is to contain 100 cells (columns) of data, we would write:
data sains 100s
data safe 100s
The s at the end of each statement causes Quantum to recognize that, for example, safe1 is the same as safe(1), just as it knows that c15 and c(15) refer to the same column of data. If you created the arrays without the s, Quantum would not recognize safe1 as being the same as safe(1).
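Here is a minimal Python model of the C array. It makes the simplifying assumption that each cell holds one character, so multicodes are not represented. The CArray class is hypothetical. It shows three things: the 1-based numbering, the copying of column n into cell n, and the reset to blank before the next respondent is read.

```python
class CArray:
    """Model of Quantum's C array: 1,000 cells, one data column per cell, numbered from 1.

    Each cell holds a single character here; multicoded columns are not modelled.
    """

    SIZE = 1000

    def __init__(self):
        self.cells = [" "] * (self.SIZE + 1)   # cell 0 unused so c(1) is cells[1]

    def read(self, record):
        """Reset every cell to blank, then copy column n of the record into cell n."""
        if len(record) > self.SIZE:
            raise ValueError("record is longer than the C array")
        self.cells = [" "] * (self.SIZE + 1)
        for col, code in enumerate(record, start=1):
            self.cells[col] = code

    def __call__(self, col):
        """c(col): the contents of one column of the current record."""
        return self.cells[col]


c = CArray()
c.read("1" * 14 + "4" + "2" * 28)   # hypothetical 43-column record, color code 4 in column 15
print(c(15))                        # 4
c.read("9" * 10)                    # next respondent's record has only 10 columns
print(repr(c(15)))                  # ' ' : column 15 was reset to blank
```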
Integer variables
To define an integer variable, type: int var_name sizes
To refer to an integer variable, type: name[cell_number]
Integer variables store whole numbers. Strings of integer variables are called integer arrays, and each cell in the array may store any whole number from -2,147,483,647 to 2,147,483,647. At the start of each run, Quantum provides an array of 200 integer variables called T. The first cell in this array is the integer variable t1, which may store any value within the given range. The second cell is the integer variable t2, which may also store any value within that range.
To illustrate the difference between a data variable and an integer variable, let's suppose that our data contains the value of the respondent's car to the nearest whole pound. If the value is 6,000, this will take up four columns in the data (assuming that we are only concerned with the digits). That is four data variables: the first will contain the 6 and the other three will all contain zeros. If we placed this same value in an integer variable, we would need only one variable to store the whole value, because each integer variable can store any value in the range -2,147,483,647 to 2,147,483,647.
We have already mentioned that Quantum provides an integer array of 200 integer variables. You may create your own arrays using statements similar to those shown above for data variables. Suppose you have a household survey in which you have collected the value of each car that the family owns. You want to set up an integer array in which to store each value, so you write:
int carval 10s
This creates an array called carval which contains ten separate integer variables called carval1 to carval10. Notice that we have followed the array size with the letter s so that we can omit the parentheses from the individual variable names. We can then copy the value of the first car into carval1, the value of the second car into carval2, and so on.
If a particular household owns three cars valued at 6,000, 2,500 and 500, then carval1 would have a value of 6,000, carval2 would be 2,500 and carval3 would be 500. If you create your own integer variables, it is recommended that you give them names that reflect their purpose in the run.
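The contrast between data columns and integer variables can be sketched in Python as follows. The helpers are hypothetical, and the range check uses the limits quoted above. The zero starting values of the carval list are an assumption of this model, not something the text states about Quantum.

```python
INT_MIN, INT_MAX = -2_147_483_647, 2_147_483_647   # range quoted in the text


def in_int_range(value):
    """True if value fits in one Quantum integer variable."""
    return INT_MIN <= value <= INT_MAX


def to_columns(value):
    """Digits only: one data column per digit, so 6000 needs four columns."""
    return list(str(value))


def store(array, cell, value):
    """Store value in cell 1..n of a 1-based integer array such as carval."""
    if not in_int_range(value):
        raise OverflowError(f"{value} is outside the integer range")
    array[cell - 1] = value


carval = [0] * 10          # roughly what 'int carval 10s' sets up (0 start values assumed)
for cell, value in enumerate([6000, 2500, 500], start=1):
    store(carval, cell, value)

print(to_columns(6000))    # ['6', '0', '0', '0']
print(carval[:3])          # [6000, 2500, 500]
```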
Real variables
To define a real variable, type: real var_name sizes
To refer to a real variable, type: name[cell_number]
You may define real variables and arrays to store real numbers with an accuracy of up to six significant figures. Values with more than six significant figures have the sixth figure rounded up or down according to the value of the extra figures. As with integer variables, the names of real variables should give some clue to the type of information they contain. Real arrays are created by statements of the form:
real liters 5s
This example creates a real array called liters which has five real variables named liters1 to liters5. It can store five real values, the first in liters1 and the fifth in liters5. Quantum also provides a set of 100 real variables named X which you may use.
As an example, let's say that the data contains information on how long, on average, each person in the household spent watching television during a given week. We want to manipulate these figures, so we create an array of real variables in which to store the average viewing figures:
real tvwatch 8s
This provides room for up to eight people's figures. If our household contains four people with viewing averages of 20.8 hours, 15.75 hours, 9.75 hours and 10.0 hours, then tvwatch1 will have a value of 20.8, tvwatch2 will have a value of 15.75, tvwatch3 will be 9.75 and tvwatch4 will be 10.0. The rest of the variables in the array have values of 0.0.
Reading real numbers from columns
To read real values from the C array, type: cx(start_col, end_col)
Data from the questionnaire is read into columns for use during the run. When the data contains real numbers, you will have to tell Quantum that the dot is to be treated as a decimal point rather than as a multicode representing a number of different answers. The way to do this is to refer to the field as cx:
cx(15,20)
cx(131,135)
Here we have two fields containing real numbers. The first is six columns wide including the decimal point, which means that the number itself contains five digits. The second is only five columns wide, with four digits. Notice that there is no need to tell Quantum where the decimal point is.
Subscription
As we have shown above, you may refer to specific variables in integer and real arrays, and to cells or columns in data arrays, by naming their position in the array. For example:
- c1 is the first column of the C array
- t5 is the fifth variable in the T array
- time3 is the third variable in the array called time
- seg(2) is variable 2 of the seg array
Variables within an array may also be referred to using any arithmetic expression. In this case, parentheses must be used. For example:
c(t1)
the column number depends on the value of t1. If t1 has a value of 10, then the variable is c10; if t1 is 67, the variable is c67.
c(t4,t5)
the field delimiters depend on the values of t4 and t5. If t4 has a value of 12 and t5 has a value of 19, the column field referred to is c(12,19).
t(c4)
the variable number depends on the value in c4. If c4 contains a single code in the range 1 to 9, the integer variable will be one of t1 to t9 depending on the exact value in c4. If c4 is multicoded, then the result is nonsense.
time(c4*23)
the variable number is the result of multiplying the value in c4 by 23. As in the previous example, c4 must be single-coded in the range 1 to 9 for this example to make sense. Thus, if c4 contains just a 4, the value of the expression is 92, so the variable referred to is time92.
When variables are referenced in this way, the value of the expression must be positive. The expression c(t1-5) is acceptable as long as t1 is greater than 5. If the expression has a zero or negative value, Quantum will issue an array dimension error when it comes to read the data during the datapass. Also, if the variable refers to columns, the value of the subscript must not exceed 32,767. These are called subscripted variables, and they greatly increase the flexibility with which you can write your edit.
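The subscript rules can be sketched in Python. The subscript function and the sample values of t1, t4, t5 and c4 are hypothetical. Note that this sketch raises an error as soon as a bad value is computed, whereas Quantum reports the array dimension error during the datapass.

```python
MAX_COLUMN = 32_767   # largest subscript allowed when the variable refers to columns


def subscript(value, columns=False):
    """Validate a computed subscript using the rules given in the text."""
    if value <= 0:
        raise IndexError("array dimension error: subscript must be positive")
    if columns and value > MAX_COLUMN:
        raise IndexError("column subscript exceeds 32,767")
    return value


# Hypothetical values for t1, t4, t5 and a single-coded c4.
t1, t4, t5, c4 = 10, 12, 19, 4

print(subscript(t1, columns=True))                                # c(t1) -> c10
print(subscript(t4, columns=True), subscript(t5, columns=True))   # c(t4,t5) -> c(12,19)
print(subscript(c4 * 23))                                         # time(c4*23) -> time92
```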
Expressions
Quantum recognizes two types of expression: arithmetic and logical. Arithmetic expressions produce numeric values; logical expressions, when evaluated, produce a value of true or false.
Arithmetic expressions
The simplest form of arithmetic expression is a single positive or negative number, such as 10 or 26.5, or an integer or real variable. Although the C array holds data, columns may also be used in arithmetic when the response coded into those columns is numeric, such as a respondent's age or the number of different shops he visited. For example, if columns 243 to 247 contain the codes 4, 7, 2, 6 and 0 respectively, the value in c(243,247) is read as 47,260. Similarly, if columns 45 to 48 contain 7, 8, a dot and 2 respectively, the value in cx(45,48) is 78.2. Blank columns in a field are ignored when the codes in those columns are evaluated. Thus, if columns 20 and 21 contain the codes 6 and 7 respectively, and column 22 is blank, the codes in c(20,22) will be evaluated as 67. A similar result is produced if the blank column appears anywhere else in the field, so c(20,22) has the arithmetic value 67 whether the blank is in column 20, 21 or 22.
The same applies to multicoded columns: if you use a multicoded column as part of an arithmetic expression, that column is ignored. The exception is a multicode of a digit and a minus sign, which creates a negative number: a minus sign anywhere in a numeric field negates the value of the field as a whole, not just the number it is multicoded with.
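To make these evaluation rules concrete, here is a small Python sketch that computes the arithmetic value of a field. It is a model, not Quantum's implementation: each column is represented as a string holding the codes present in it (an empty string for a blank column), and decimal points (cx) are not handled. The function name `field_value` is hypothetical, and returning 0 for a field with no usable digits is an assumption.

```python
def field_value(columns: list[str]) -> int:
    """Arithmetic value of a field, following the rules described above.

    Each item in `columns` holds the codes in one column, written as a
    string: "7" is a single 7, "15" is a multicode of 1 and 5, "" is blank.
    """
    digits = []
    negative = False
    for codes in columns:
        code_set = set(codes)
        if "-" in code_set:
            # A minus anywhere negates the whole field.
            negative = True
            code_set.discard("-")
        if len(code_set) == 1:
            (code,) = code_set
            if code.isdigit():
                digits.append(code)
        # Blank and multicoded columns contribute nothing.
    value = int("".join(digits)) if digits else 0
    return -value if negative else value


print(field_value(["4", "7", "2", "6", "0"]))  # 47260
print(field_value(["6", "", "7"]))             # 67 (blank ignored)
print(field_value(["-3", "5"]))                # -35 (minus multicoded with 3)
```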
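Combining arithmetic expressions
Arithmetic expressions may be combined using the operators + (addition), - (subtraction), * (multiplication) and / (division). For example:
t5 + c(134,136) / tot
c(150,152) * 10 + 2.5
Quantum evaluates such expressions in the following order:
1. Expressions in parentheses.
2. Multiplication and division.
3. Addition and subtraction.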
If you wish to change this order, enclose the expressions which go together in parentheses. The first expression in the example above is evaluated by dividing the value in columns 134 to 136 by tot and adding the result to t5. If you change the expression to:
(t5 + c(134,136)) / tot
Quantum adds the values of t5 and c(134,136) first and then divides the sum by tot. Let's substitute numbers and compare the results. If t5=10, tot=5 and the value in c(134,136) is 125, the two versions of the expression read as follows:
10 + 125 / 5 = 35
(10 + 125) / 5 = 27
Where two integer expressions are combined, the result is integer (any decimal places are ignored), but if an expression contains a real, the result is real. Therefore, if t1=5 and t2=3, then t1/t2 is 1 (the fractional part is discarded), whereas t1/3.0 is a real value of about 1.667.
If you use parentheses in expressions which contain both integer and real variables, you need to take extra care to ensure that your expression produces the correct results. Let's look at an example of how an expression can look correct but still produce unexpected results. If we assume that t40=2 and t41=70, the expression:
t40 * 100.0 / t41
yields about 2.857 (i.e., 200.0/70). The final value will be about 2.857 if the result is saved in a real variable, or 2 if it is saved in an integer variable. If we use parentheses:
(t40 / t41) * 100.0
the result is 0.0 (or 0 if saved in an integer variable). The reason for this is as follows: because Quantum evaluates expressions in parentheses before it deals with the rest of the expression, it treats t40 / t41 as integer arithmetic. The rules for integer arithmetic dictate that results are truncated at the decimal point, so the true result of about 0.0286 becomes 0. Any multiplication involving zero is zero, so the final result is zero. If you find that a run gives unexpected zero results, look for expressions of this type and check whether the parenthesized part of the expression has been truncated because an integer division produced a fractional result.
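The difference between integer and real arithmetic can be reproduced in Python, whose `/` operator always produces a real result. The sketch below assumes that Quantum's integer division truncates toward zero, as C does; `int_div` is a hypothetical helper written only for this illustration.

```python
def int_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed to match Quantum)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


t5, tot, c134_136 = 10, 5, 125
print(t5 + int_div(c134_136, tot))   # 35: division is done first
print(int_div(t5 + c134_136, tot))   # 27: parentheses force the addition first

t40, t41 = 2, 70
real_result = t40 * 100.0 / t41      # real arithmetic, about 2.857
print(int(real_result))              # 2 when stored in an integer variable
print(int_div(t40, t41) * 100.0)     # 0.0: 2/70 was truncated to 0 first
```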
Counting the number of codes in a column
To count the number of codes in a column or list of columns, type:
numb(cn1[codes], cn2[codes], ... )
If any columns are followed by a code reference, only those codes are counted for those columns. The function numb is an arithmetic expression which counts the number of codes in a column or list of columns. Its format is:
numb(cn1,cn2, ... cnn)
where cn1 to cnn are the columns whose codes are to be counted. So, to count the number of codes in columns 132 to 135 we would type:
numb(c132,c133,c134,c135)
Notice that even though the columns are consecutive, each one is entered separately, with each column number preceded by a c. It is incorrect to define only the start and end columns of a field when using numb, so it is wrong to write numb(c(132,135)); if you write a statement such as this, Quantum will flag it as an error. Sometimes you will only be interested in certain codes; for instance, you may want to know how many 1, 2 or 3 codes there are in a group of columns. In this case the function is entered as:
numb(cn1'p1',cn2'p2', ... cnn'pn')
where p1 to pn are the codes to be counted. Only the named codes are counted; any others appearing in the columns are ignored. Suppose that, in our data for card 1, column 115 contains four codes, column 121 contains the codes 2 to 6, and column 157 contains the codes 1 to 7, and we want to count the number of codes in column 115 and also the number of codes in the range 5/8 in columns 121 and 157. The expression would be entered as:
numb(c115,c121'5/8',c157'5/8')
When Quantum checks these columns and codes, it will tell us that there are 9 codes in these columns which are within the given ranges. These are all four codes in column 115 (we did not specify which codes to count in that column), codes 5 and 6 in column 121 (codes 2 to 4 are outside the given range), and codes 5 to 7 in column 157 (codes 1 to 4 are outside the given range).
Generating a random number
To generate a random number in the range 1 to n, type:
random(n)
where n is the maximum value the random number may take. So, to generate a random number in the range 1 to 100, the expression would read:
random(100)
The number produced may be saved for later use in an integer variable or column, thus:
rnum=random(32)
c(110,112)=random(156)
When using random with columns, always make sure that the number of columns allocated to the number is sufficient to store the highest possible number that can be generated. In our example, we need three columns in order to store numbers up to 156.
Logical expressions
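The Python sketch below models numb and the column-width check needed when saving the result of random. It assumes that code ranges such as 5/8 follow the code order 1 to 9, 0, -, &; the helper names (`expand_codes`, `numb`, `columns_needed`) are hypothetical, and the four codes shown for column 115 are invented for illustration because the text does not say which they are. Python's `random.randint(1, n)` is used as a stand-in for random(n), since both return a whole number from 1 to n inclusive.

```python
import random
from typing import Optional

CODE_ORDER = "1234567890-&"   # assumed order of codes within a column


def expand_codes(spec: str) -> set[str]:
    """Expand a code list such as "5/8" or "1/46" into individual codes."""
    codes: set[str] = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            start = CODE_ORDER.index(spec[i])
            end = CODE_ORDER.index(spec[i + 2])
            codes.update(CODE_ORDER[start:end + 1])
            i += 3
        else:
            codes.add(spec[i])
            i += 1
    return codes


def numb(record: dict[int, set[str]], *columns: tuple[int, Optional[str]]) -> int:
    """Count codes in the given columns; a spec of None counts every code."""
    total = 0
    for column, spec in columns:
        present = record.get(column, set())
        total += len(present) if spec is None else len(present & expand_codes(spec))
    return total


def columns_needed(n: int) -> int:
    """Columns required to hold any value from random(n), i.e. 1 to n."""
    return len(str(n))


record = {
    115: set("1357"),       # hypothetical: four codes in column 115
    121: set("23456"),      # codes 2 to 6
    157: set("1234567"),    # codes 1 to 7
}
print(numb(record, (115, None), (121, "5/8"), (157, "5/8")))  # 9

rnum = random.randint(1, 156)   # like random(156): 1 to 156 inclusive
print(columns_needed(156))      # 3
```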
Logical expressions are used for comparing values, codes and variables.
Comparing values
To compare the values of two arithmetic expressions, type:
<<arith_exp>> log_operator <<arith_exp>>
where log_operator is one of the operators .eq., .gt., .ge., .lt., .le. or .ne.
Values are compared when you need to check whether an expression has a given value: for example, did the respondent buy more than 10 pints of milk? Values are compared by placing arithmetic expressions on either side of one of the following operators:
Operator   Meaning
.eq.       equal to
.gt.       greater than
.ge.       greater than or equal to
.lt.       less than
.le.       less than or equal to
.ne.       not equal to (unequal to)
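As a rough model, each operator in the table behaves like an ordinary numeric comparison. The Python sketch below maps the Quantum operator names onto functions from the standard `operator` module; the `compare` helper and the `LOG_OPERATORS` table are hypothetical names. It also shows an integer used on its own as a logical value, which Quantum treats as true when the value is non-zero.

```python
import operator

# Quantum comparison operators mapped onto Python's operator functions.
LOG_OPERATORS = {
    ".eq.": operator.eq,
    ".gt.": operator.gt,
    ".ge.": operator.ge,
    ".lt.": operator.lt,
    ".le.": operator.le,
    ".ne.": operator.ne,
}


def compare(left: float, log_operator: str, right: float) -> bool:
    """Evaluate <<arith_exp>> log_operator <<arith_exp>>."""
    return LOG_OPERATORS[log_operator](left, right)


pints = 12                          # value read from c(114,115)
print(compare(pints, ".gt.", 10))   # True
fveg = 0                            # integer variable used as a logical value
print(bool(fveg), not fveg)         # False True
```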
If the number of pints of milk that the respondent bought is stored in columns 114 and 115, the expression to check whether he bought more than ten pints would be:
c(114,115) .gt. 10
If the number in these columns is greater than ten the expression is true; otherwise it is false.
Earlier we said that integer variables may take numeric values or the logical values true and false, depending upon whether or not the value is zero. To check whether the respondent bought any packets of frozen vegetables, we can either write:
fveg .gt. 0
to check the numeric value of the variable fveg, or we can simply write:
fveg
to check whether the logical value of fveg is true. To check whether fveg is false (i.e., zero), we would write:
.not. fveg
Comparing data variables and data constants
In virtually every Quantum run you will want to check which codes occur in which columns. This is easily done using logical expressions. There are several forms of expression, depending on whether you are checking a column or a field of columns.
Data variables
To test whether a data variable contains at least one of a list of codes, type:
var_name'codes'
To test whether a data variable contains none of the listed codes, type:
var_namen'codes'
To test whether a data variable contains exactly the given codes and nothing else, type:
var_name = 'codes'
To test whether two data variables contain identical codes, type:
var_name1 = var_name2
To test whether a data variable contains codes other than those listed, type:
var_nameu'codes'
To test whether two data variables do not contain identical codes, type:
var_name1uvar_name2
To check whether a column or data variable contains certain codes, place the codes, enclosed in single quotes, immediately after the name of the column or data variable: for example, c1'1', c156'23' or brand'5'. The expression:
cn'p'
checks whether column n contains a certain code or codes p. The expression is true as long as column n contains at least one of the given codes. It does not matter if other codes are present, since these are ignored. For example, to check whether column 6 contains any of the codes 1 through 4, we would type:
c6'1/4'
The expression is true if c6 contains any of the codes 1, 2, 3 or 4, or any combination of those codes, regardless of what other codes may also be present. For instance, if column 6 contains only the codes 5, 7 and 9, the expression is false. In our original example we chose the codes 1 through 4. You can, of course, use any codes you like, and they may be entered in any order.
The expression:
cnN'p'
checks that a column does not contain the given code or codes. The expression is true as long as the column does not contain any of the listed codes. For example:
c478n'5/7&'
is true as long as column 478 does not contain a 5, 6, 7 or & or any combination of them. A multicode of 189 returns the logical value true because it does not contain any of the codes 5/7&, whereas a multicode of 1589 makes the expression false because it contains a 5.
The = operator is used to check that the contents of a column are identical to the given codes. The expression:
c312='1/46'
is true as long as c312 contains all of the codes 1 through 4 and 6, and nothing else. The expression:
c142=' '
checks that column 142 is blank. The equals sign is optional when checking for blanks, so we could simply write:
c142' '
to check whether column 142 is blank.
The = operator may also be used to compare the contents of two data variables. For example:
c56=c79
checks whether c56 contains exactly the same codes as c79. If so, the expression is true; otherwise it is false. If columns 56 and 79 both contain the multicode 15, the expression is true. If column 56 contains 15 but column 79 contains 159, the expression yields the value false because column 79 contains a 9 when column 56 does not. If you have defined your own data variables, you could write a statement of the form:
brand1=c79
to check whether the data variable called brand1 contains the same codes as c79.
The opposite of = is U (unequal):
cnU'p'
This checks whether column n contains something other than just the code p. Suppose column 44 contains the multicode 147 in one record and the multicode 159 in another, and we write:
c44u'7'
The expression is true for both records. In the first, the 7 is multicoded with a 1 and a 4, while in the second, column 44 does not contain a 7 at all. The only time this expression is false is when column 44 contains a 7 and nothing else.
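The four single-column tests can be modelled with Python sets, one set of codes per column. This sketch assumes that ranges such as 5/7 follow the code order 1 to 9, 0, -, & and that a blank column is an empty set; the function names are hypothetical labels for the Quantum forms shown in the comments, not Quantum functions.

```python
CODE_ORDER = "1234567890-&"   # assumed order of codes within a column


def expand_codes(spec: str) -> set[str]:
    """Expand "1/4", "5/7&" or "1/46" into a set of codes; a space means blank."""
    spec = spec.replace(" ", "")
    codes: set[str] = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            start = CODE_ORDER.index(spec[i])
            end = CODE_ORDER.index(spec[i + 2])
            codes.update(CODE_ORDER[start:end + 1])
            i += 3
        else:
            codes.add(spec[i])
            i += 1
    return codes


def has_any(column: set[str], spec: str) -> bool:         # cn'p'
    return bool(column & expand_codes(spec))


def has_none(column: set[str], spec: str) -> bool:        # cnN'p'
    return not (column & expand_codes(spec))


def is_exactly(column: set[str], spec: str) -> bool:      # cn='p'
    return column == expand_codes(spec)


def is_not_exactly(column: set[str], spec: str) -> bool:  # cnU'p'
    return column != expand_codes(spec)


print(has_any({"5", "7", "9"}, "1/4"))         # False
print(has_none({"1", "8", "9"}, "5/7&"))       # True
print(has_none({"1", "5", "8", "9"}, "5/7&"))  # False
print(is_exactly(set(), " "))                  # True: blank column
print(is_not_exactly({"1", "4", "7"}, "7"))    # True
```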
Fields of data variables
To test whether a field contains a given list of codes, type:
var_name(start, end) = $codes$
To test whether two fields contain identical strings, type: var_name1(start1, end1) = var_name2(start2, end2)
To test whether the codes in one field differ from a given string, type:
var_name(start, end)u$codes$
To test whether the codes in one field differ from those in another, type:
var_name1(start1, end1)uvar_name2(start2, end2)
The contents of data fields must be enclosed in dollar signs, with each code in the string referring to a separate column in the field. For instance, to check whether columns 47 to 50 contain the codes -, 6, 4 and 9 respectively, we would type:
c(47,50)=$-649$
The only data for which this expression is true is a minus in column 47, a 6 in column 48, a 4 in column 49 and a 9 in column 50, with no other codes in those columns. If, instead, the field contained the codes - and 1 in column 47, 5 and 6 in column 48, 2 and 4 in column 49, and 9 and & in column 50, the expression would be false because all the columns are multicoded.
All our examples have used columns, but the same rules apply to data variables that you define yourself. For example:
rating(1,4)=$1234$
checks whether the field rating1 to rating4 contains the codes 1, 2, 3 and 4 in that order. That is, it checks whether rating1 contains a 1, whether rating2 contains a 2, and so on. When checking the contents of fields in this way, make sure that you enter as many columns as there are codes in the string (i.e., five codes require five columns). The exception to this rule occurs when you are checking for blanks, when the expression may be shortened to:
c(50,80)=$ $
This type of statement may also be used to compare two fields, to check whether the second field contains exactly the same codes as the first. When you compare one field with another, Quantum takes each column in the first field in turn and checks whether the corresponding column in the second field contains exactly the same codes. For example, if the first column of the first field contains a code 1 and a code 2 and nothing else, Quantum checks whether the first column of the second field also contains a code 1 and a code 2 and nothing else. If all columns of the second field are identical to their counterparts in the first field, the expression is true; otherwise it is false. Here is an example:
c(129,132)=c(356,359)
For this expression to be true, column 129 must contain exactly the same codes as column 356, column 130 must be exactly the same as column 357, and so on. Once again, the two expressions on either side of the equals sign must be the same length. Comparisons of one data variable against another are concerned with columns and codes: they are not concerned with the arithmetic values of the codes in the fields as a whole.
If we have a 0 in column 24, a 2 in column 25, a blank in column 34 and a 2 in column 35, the expression:
c(24,25)=c(34,35)
is false because the string $02$ is not the same as the string $ 2$. If you want to compare fields arithmetically (i.e., is 02 the same as 2?), you will need to use the .eq. operator:
c(24,25).eq.c(34,35)
to test whether the value in c(34,35) is equal to the value in c(24,25). The .eq. operator is described in the section entitled "Comparing values".
To check whether the codes in one field match a given string or the codes in another field, we can use the = (equals) operator:
c(m,n)=$codes$
cm=cn
c(m,n)=c(m1,n1)
If the codes in the field c(m,n) match the given string or the codes in c(m1,n1), the expression is true. If the two fields are not identical, the expression is false.
Let's look at an example of the unequals operator. The statement:
c(67,69)u$123$
is true at all times unless columns 67, 68 and 69 contain exactly a 1, a 2 and a 3 respectively, and nothing else.
The expression:
c(67,69)uc(77,79)
is true as long as columns 67 to 69 differ by at least one code from columns 77 to 79. It is true, for example, when each of columns 77 to 79 differs from its counterpart in columns 67 to 69. It is also true if columns 67 to 69 and 77 to 79 all contain 123 but column 77 is multicoded 15, because column 77 then differs from column 67. The only time the expression is false is when columns 67 to 69 are identical to columns 77 to 79.
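The difference between comparing fields code by code and comparing them arithmetically can be sketched in Python. In this model (an assumption, not Quantum's internals) a field is a list of sets, one set of codes per column, and a $...$ string is compared column by column, with a space standing for a blank column. All function names are hypothetical, and `arithmetic_value` handles only single-coded digits and blanks.

```python
def column(code: str) -> set[str]:
    """The set of codes one character of a $...$ string stands for."""
    return set() if code == " " else {code}


def field_matches(field: list[set[str]], string: str) -> bool:
    """c(m,n)=$codes$: every column must hold exactly the given code."""
    if string.strip() == "":
        return all(not col for col in field)       # $ $ checks for blanks
    if len(string) != len(field):
        raise ValueError("string and field must be the same length")
    return all(col == column(code) for col, code in zip(field, string))


def fields_equal(first: list[set[str]], second: list[set[str]]) -> bool:
    """c(m,n)=c(m1,n1): corresponding columns must be identical."""
    if len(first) != len(second):
        raise ValueError("fields must be the same length")
    return all(a == b for a, b in zip(first, second))


def arithmetic_value(field: list[set[str]]) -> int:
    """Value used by .eq.; blank and multicoded columns are skipped."""
    digits = ""
    for col in field:
        if len(col) == 1:
            (code,) = col
            if code.isdigit():
                digits += code
    return int(digits) if digits else 0


c24_25 = [{"0"}, {"2"}]
c34_35 = [set(), {"2"}]
print(fields_equal(c24_25, c34_35))                          # False
print(arithmetic_value(c24_25) == arithmetic_value(c34_35))  # True

c67_69 = [{"1"}, {"2"}, {"3"}]
c77_79 = [{"1", "5"}, {"2"}, {"3"}]
print(not fields_equal(c67_69, c77_79))   # True: the u (unequal) test
```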
A variation of range is rangeb, which allows columns at the left of the field to be blank if the number is right-justified in the field. In all other respects it is exactly the same as range. If columns 14 to 18 contain a 1, a 2, a 3, a blank and a 6 respectively, the expression:
rangeb(17,18,1,10)
will be true because the string $ 6$ is read as 6. With range the value would be false. However, the expression:
rangeb(15,18,2000,3000)
returns false because of the blank in c17.
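The effect of the leading-blank rule can be sketched in Python. The definition of range itself is not given in this part of the text, so the sketch assumes that range(start,end,min,max) is true only when every column of the field holds a digit and the resulting number lies between min and max, and that rangeb additionally accepts blanks to the left of the number. Fields are written as strings of single codes (multicodes are not modelled), and `range_check` and `rangeb_check` are hypothetical names.

```python
from typing import Optional


def _field_number(field: str, allow_leading_blanks: bool) -> Optional[int]:
    """Number held in the field, or None if the field is not a valid number."""
    text = field.lstrip(" ") if allow_leading_blanks else field
    if text == "" or not all(ch in "0123456789" for ch in text):
        return None
    return int(text)


def range_check(field: str, low: int, high: int) -> bool:
    """Sketch of range: every column must hold a digit."""
    value = _field_number(field, allow_leading_blanks=False)
    return value is not None and low <= value <= high


def rangeb_check(field: str, low: int, high: int) -> bool:
    """Sketch of rangeb: blanks are allowed to the left of the number."""
    value = _field_number(field, allow_leading_blanks=True)
    return value is not None and low <= value <= high


print(rangeb_check(" 6", 1, 10))          # True: $ 6$ is read as 6
print(range_check(" 6", 1, 10))           # False
print(rangeb_check("23 6", 2000, 3000))   # False: blank inside the number
```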
Combining logical expressions
The .and. operator requires that the expressions on both sides of the .and. be true for the whole expression to be true. Thus, the statement:
int1.eq.9 .and. c116'1'
is true if the integer variable int1 has a value of 9 and column 116 contains a 1. If either subexpression is false, the whole expression is false too. By comparison, the .or. operator requires that one expression or the other, or both, be true in order for the whole expression to be true:
c(249,251)=$159$ .or. numb(c132,c133,c134,c135) .gt. 4
For this expression to be true, columns 249 to 251 must contain nothing but a 1, 5 and 9 respectively, or the number of codes in columns 132 to 135 must be greater than 4. It is also true if both expressions are true. However, if both are false, the overall result is false.
The keyword .not. reverses (negates) the expression that follows it. Although it is not wrong to use .not. with a single variable, it is more generally used to reverse an expression containing .and. or .or. Thus, it is not wrong to write .not.c15'1/5', but it is much simpler to write this as c15n'1/5'.
Take care when using .not. with the .eq. operator. Statements of the form:
.not. c(1,3) .eq. 100
are incorrect and will not work. They should be written either as:
(.not.(c(1,3).eq.100))
with the expression to be reversed enclosed in parentheses, or as:
(c(1,3).ne.100)
Any of the operators .and., .or. and .not. may appear in a statement more than once, as long as you use parentheses to define the order of evaluation. For example:
(c15'1/47' .or. c16'3579') .and. c22'&'
causes Quantum to check whether the .or. condition is true before dealing with the .and. condition. Suppose column 15 contains a 1 and a 7, column 16 contains a 3 and a 9, and column 22 contains an ampersand.
The first expression (c15'1/47') is true because column 15 contains a 1 and a 7, and the second expression (c16'3579') is also true since the codes it contains are amongst those listed as acceptable. Thus, the .or. condition is true. Column 22 contains an ampersand, so the last expression is also true; therefore the expression as a whole is true. If both expressions in the parentheses were false, the whole expression would be false regardless of the contents of column 22.
.not. with .and. and .or.
When you use .not. with expressions in parentheses, be very careful that what you write is what you mean. Let's take the conditions Male and Married and forget about columns and codes for the moment. The condition:
(Male .and. Married)
refers only to married men. The opposite of this is:
.not. (Male .and. Married)
which refers to unmarried men and all women. This can also be written as:
.not. Male .or. .not. Married
The first .not. collects all the women, the second collects everyone who is not married (e.g., single, widowed, etc.), and together they collect everyone who is female or unmarried (or both). We use .or. instead of .and. here because .and. would gather only the unmarried women, ignoring the unmarried men and the married women.
Reversing .or. expressions works in exactly the same way. The expression:
(Male .or. Married)
means anyone who is Male, or anyone who is Married, or anyone who is both. The opposite of this is:
.not. (Male .or. Married)
which means anyone who is neither Male nor Married; that is, anyone who is a woman and is unmarried. This can be written as:
.not. Male .and. .not. Married
Positive        Negative              Is the same as
(A .and. B)     .not. (A .and. B)     .not. A .or. .not. B
(A .or. B)      .not. (A .or. B)      .not. A .and. .not. B
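The equivalences in this table are De Morgan's laws, and because each condition can only be true or false they can be checked exhaustively. The short Python sketch below does this for every combination of A and B, then lists who is selected by .not. (Male .and. Married); the function and variable names are illustrative only.

```python
from itertools import product


def de_morgan_holds() -> bool:
    """Check both rows of the table for every combination of A and B."""
    return all(
        (not (a and b)) == ((not a) or (not b))
        and (not (a or b)) == ((not a) and (not b))
        for a, b in product([False, True], repeat=2)
    )


print(de_morgan_holds())  # True

# Who does .not. (Male .and. Married) select?
for male, married in product([True, False], repeat=2):
    selected = not (male and married)
    print("male:", male, "married:", married, "selected:", selected)
```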
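Here is an example using columns and codes:
.not. (c(135,137)=$519$ .or. c160'6/0')
This is true only when columns 135 to 137 do not contain exactly 5, 1 and 9, and column 160 contains none of the codes 6, 7, 8, 9 or 0. By the rules above, it could also be written as:
c(135,137)u$519$ .and. c160n'6/0'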
Comparing variables and arithmetic expressions to a list
To check whether a variable or arithmetic expression has one of a list of values, type:
variable-name .in. (list)
or
arithmetic-exp .in. (list)
where variable-name is the name of the variable to be checked and list is a list of permissible values. The arithmetic expression is an expression consisting of data or integer variables, arithmetic operators and integer values, as described earlier in this chapter. If the variable or arithmetic expression has one of the listed values, the expression is true; if not, it is false. The left-hand side of the expression may contain integer variables, columns or data variables containing whole numbers, or expressions using these types of variables. If it is a data variable, the list may contain codes enclosed in dollar signs, and Quantum will then compare the codes in the data variable with the codes inside the dollar signs. We could therefore check that the frozen vegetables have been coded correctly with the statement:
c(145,147) .in. ($205$,$206$,$207$,$210$,$215$,$220$)
Quantum will flag any records in which c(145,147) does not contain exactly 205, 206, 207, 210, 215 or 220 (i.e., three single-coded columns) as incorrect. If the data variable contains a valid positive or negative whole number, the list may also contain such values. Ranges of values may be entered in the form min:max, where min is the lowest acceptable value and max is the highest. Since the frozen vegetables have numeric codes, we could write the expression as:
c(145,147) .in. (205:207,210,215,220)
Any columns in the field which contain non-numeric data (e.g., multicodes) will be flagged as incorrect, as will any which contain values that do not match the specification.
Sometimes, though, the codes and numbers will not be interchangeable. If you have 2-digit codes in a 3-column field, the statement:
c(206,208) .in. ($ 10$,$ 11$,$ 12$,$ 13$)
is not the same as:
c(206,208) .in. (10:13)
unless column 206 is always blank. If the 2-digit codes have been padded on the left with zeroes instead of blanks (i.e., 010, 011) or if they all start in column 206 (i.e., $10 $, $11 $), then the first expression will be false, even though the second one will still be true.
If the left-hand side of the expression is an integer variable or an arithmetic expression, the list may contain positive or negative whole numbers:
total .in. (100,200,500:1000)
Lists may contain up to 247 values or codes, which may be entered in any order; our examples use ascending order, but Quantum does not require it. The exception is numeric ranges, which must be entered in the form lowest:highest.
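A list check with ranges can be sketched in Python as follows. The parser accepts the comma-separated form used above, with min:max ranges; `parse_list` and `value_in` are hypothetical helpers, not Quantum's parser, and the sketch rejects a range whose lowest value comes second. The last lines contrast matching a 3-column field as a code string with matching it as a number, assuming blanks are ignored when the field is read as a number.

```python
def parse_list(spec: str) -> list[tuple[int, int]]:
    """Turn "205:207,210,215,220" into inclusive (low, high) pairs."""
    items = []
    for part in spec.split(","):
        if ":" in part:
            low_text, high_text = part.split(":")
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"range must be lowest:highest, got {part}")
        else:
            low = high = int(part)
        items.append((low, high))
    return items


def value_in(value: int, items: list[tuple[int, int]]) -> bool:
    """arithmetic-exp .in. (list)"""
    return any(low <= value <= high for low, high in items)


fveg = parse_list("205:207,210,215,220")          # frozen vegetable codes
print(value_in(206, fveg), value_in(208, fveg))   # True False

field = "010"                                     # 2-digit code padded with a zero
print(field in {" 10", " 11", " 12", " 13"})      # False: code strings differ
print(value_in(int(field.replace(" ", "")), parse_list("10:13")))  # True
```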
Naming lists
To assign a name to a list of values, type:
definelist name=(list)
where list is a comma-separated list of numbers, ranges or code strings enclosed in dollar signs. If you have a list that is used more than once, you may give it a name and refer to it by that name instead of typing in the complete list each time. For example:
definelist fveg=(205:207,210,215,220)
To use a defined list, simply replace the list with its name:
c(145,147) .in. fveg
Speeding up large programs
To speed up your Quantum program by converting expressions of the form c(1,4)=$1234$ into C in a more efficient way, type:
inline n
where n is the maximum field width to be converted in this manner. This statement must appear at the start of the edit.
If you have a large edit, you can reduce the time it takes to run by including the inline statement in your edit. This instructs the Quantum compiler to convert expressions of the form c(1,4)=$1234$ into C statements in a different way from the way it normally does. You need not worry about these different methods of conversion, apart from deciding whether or not to use them. If you want to speed your program up, place a statement of the form:
inline n
at the beginning of the edit section, where n is the maximum field width to be converted in the special way. For example:
inline 6
Here we are saying that fields of six columns or less should be converted in the special way rather than in the normal way.
How Quantum reads data
In order for the answered questionnaire to be processed, the information it contains must be read into a location where Quantum can access it. This is done by reading the data into the data variable array called C, which is supplied automatically with every Quantum run. You may then access the data by addressing this array. Different types of records are read into the C array in different ways.
Types of record
Quantum deals with three types of record: ordinary, multicard, and multicard with trailer cards.
Ordinary records
These are strings of codes and numbers, one per respondent, up to a maximum of 32,767 characters per respondent.
Multicard records
When data originates from punched cards and each questionnaire requires more than 80 columns, the data is spread over several cards. So that all cards belonging to a particular respondent may be easily identified, each questionnaire is assigned a serial number which is entered as part of the data for each card. Within this, each card has a unique card type or card number to distinguish it from the others in the group. It is important that both the serial number and card type be in the same relative positions on all cards in the file, since this is the only way that Quantum can tell which data belongs to which respondent. For example, if the questionnaire serial number is in columns 1 to 4 of each card and the card type is in column 5, and questionnaire 1005 has two cards, the first five columns of those cards will be 10051 and 10052 respectively. Quantum can deal with records that contain up to 327 cards per respondent. Occasionally you may have multicard records in which each card is longer than 80 columns. The notes that follow refer to multicard records of up to 100 columns per card.
Multicard records with Trailer Cards
Sometimes a record contains very repetitive data which is tabulated over and over again in the same way. For instance, a shopping survey may ask the respondent a series of identical questions for each store he visited. In this case, there may be a separate card for each store. Processing this type of data is often easier if we treat all cards containing the same questions as if they were, in fact, one card with one card number. These cards are called Trailer Cards. Thus, if the respondent visited five stores, and the questions about these stores are coded on a card 2, the record for that respondent would contain five cards of type 2. If demographic details were stored on a card 1, the whole record would be 6 cards in all. In Quantum, the demographic data would be described as the higher level and the stores as the lower level.
Ordinary records
Ordinary records are read into cell 1 onwards of the array. Therefore, for example, the 50th column is referenced as c50 and the 200th cell as c200.
Multicard records
Records are read into c101 to c200 for card 1, c201 to c300 for card 2, and so on. For example, 80-column cards are read into c101 to c180 for card 1 and c201 to c280 for card 2; columns 181-200, 281-300, etc. remain blank. In this case, the C array may be pictured as ten rows of 100 cells each. Column 50 of card 1 is then accessed by referring to it as c150, and column 67 of card 8 is referred to as c867.
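The layout just described reduces to a simple calculation, sketched below in Python. It assumes the 100-cell rows described above, so no card may be longer than 100 columns; `c_cell` is a hypothetical helper, not a Quantum function.

```python
from typing import Optional


def c_cell(column: int, card: Optional[int] = None) -> int:
    """C array cell holding `column` of an ordinary record or of a card.

    Ordinary records start at c1; card k of a multicard record occupies
    cells k*100+1 to k*100+100.
    """
    if card is None:
        return column
    if not 1 <= column <= 100:
        raise ValueError("multicard columns must be between 1 and 100")
    return card * 100 + column


print(c_cell(50))            # 50  (ordinary record)
print(c_cell(50, card=1))    # 150
print(c_cell(67, card=8))    # 867
```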
Processing the data
Each time an ordinary record or set of cards comprising a multicard record is read in, that data is processed first by the edit section and then by the tabulation section of your program. The complete record is edited and tabulated in one go. The exception to this is the trailer card record, where processing can take place a number of times within each record, for each lower level.
To ensure that only the part of the edit section applying to a particular level is used, the edit section is defined separately for each level. Similarly, the table instructions specify the level at which the table should be incremented.
The assignment, emit, delete and priority statements may all be used to alter the contents of a variable. Emit, delete and priority are used only with columns, whereas assignment statements can deal with character, integer and real variables. When we say that these statements change the contents of a column, we mean that they change the contents of that column as it exists during the run: at no time do they change the corresponding column in the data file.
Trailer Cards
By using the levels facility, the user need not know how Quantum deals with trailer card data internally. However, there are occasions when it may be necessary to edit or tabulate the data without using levels. To do this, you need to know more about how trailer cards are processed. Quantum deals with trailer cards in a number of reads. Cards are read into the appropriate rows of the C array until:
a) a card is located with a card type matching that of the previous card (e.g., two consecutive card 2s), or
b) a card is read with a type lower than its predecessor and matching one of the card types already read in during the current read (e.g., a card 2, a card 3, and then another card 2).
In order to produce useful tables, you will need to know which cards are currently in the C array. Quantum has four reserved variables, thisread, allread, firstread and lastread, which it uses to keep track of which cards it has read for each respondent.
thisread
The array called thisread is used to check which cards have been read in during the current read. thisread1 will be true (or 1) if a card type 1 has just been read in; thisread2 will be true if a card 2 has just been read, and so on.
There are nine such variables (thisread1 to thisread9) available unless extra card types have been specified using the max= option. In this case, these variables are numbered 1 to max; if there are 13 card types, we will have thisread1 to thisread13.
allread
allread notes which cards have been read in so far for this questionnaire. If cards 1, 2 and 3 have been read so far, allread1, allread2 and allread3 will all be true. Additionally, each cell of allread contains the number of cards of the given type read in; for instance, if two cards of type 3 have been read, allread3 will be true and will contain the number 2.
As with thisread, there are nine allread variables available unless extra card types have been specified with max=.
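The two rules that end a read, and the bookkeeping done by thisread and allread, can be simulated in Python. This is a sketch of the behavior described above, not Quantum's code: card types are plain integers in file order, `split_into_reads` and `track_reads` are hypothetical names, and index 0 of each list is unused so that thisread[2] corresponds to thisread2.

```python
def split_into_reads(card_types: list[int]) -> list[list[int]]:
    """Group one respondent's cards into reads using rules a) and b)."""
    reads: list[list[int]] = []
    current: list[int] = []
    for card in card_types:
        same_as_previous = bool(current) and card == current[-1]       # rule a)
        lower_and_repeated = (
            bool(current) and card < current[-1] and card in current    # rule b)
        )
        if same_as_previous or lower_and_repeated:
            reads.append(current)
            current = []
        current.append(card)
    if current:
        reads.append(current)
    return reads


def track_reads(card_types: list[int], max_type: int = 9):
    """Yield (thisread, allread) after each read; index 0 is unused."""
    allread = [0] * (max_type + 1)
    for read in split_into_reads(card_types):
        thisread = [False] * (max_type + 1)
        for card in read:
            thisread[card] = True
            allread[card] += 1           # count of cards of this type so far
        yield thisread, list(allread)


print(split_into_reads([1, 2, 2, 2, 3]))   # [[1, 2], [2], [2, 3]]
print(split_into_reads([1, 2, 3, 2, 3]))   # [[1, 2, 3], [2, 3]]
```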
Reserved variables
Other reserved variables associated with reading in data are:
lastrec: set to true when the last record in the file has been read or, in the case of trailer cards, when the last read of the last record has occurred.
rec_count: stores the number of records read in so far.
card_count: counts the number of cards read so far.
Describing the data structure for Multicard records
To describe the structure of the data, type:
struct;options
All programs dealing with multicard records must contain a struct statement, unless the data contains trailer cards which will be read and tabulated using the levels facility. In this case you may choose between using a struct statement and using a levels file. If the run has no struct statement and no levels file, Quantum assumes that the data contains ordinary records to be read into c1 onwards of the C array. The struct statement is used to define the type of records, the location of the serial number and card type in the record, and the number of the highest card type if it is greater than 9.
Record type
To define the record type, type:
struct; read=n
where n is 0 for ordinary records, 2 to read multicard records in sections according to the card type, or 3 to read all the cards of a multicard record in one go.
For the purposes of read=, Quantum recognizes two types of record: ordinary (single-card) and multicard. The type of record is defined by the keyword read= on the struct statement:
Ordinary Records
Ordinary records are defined using read=0. Each record is read into c1 onwards of the array. Since read=0 is the default, you need only use it when other options are required: for example, when the records contain serial numbers and you wish to have the serial number printed out as part of the record, or when you are working with long records of more than 100 columns.
Multicard Records
Multicard records are identified by the keyword read=2. Each card in the record is read into the row corresponding to its card type; that is, card 1 in c(101,200), card 2 in c(201,300), and so on. We mentioned briefly that it is possible to read in all the cards of a multicard record at once, ignoring the card type. The first card goes in c(101,200), the second in c(201,300), and so on. This is achieved with read=3.
Record length
To define the length of records longer than 100 columns, type:
struct; reclen=n
The keyword reclen=n defines the maximum number of characters to be read into the C array, the number of cells to be reset to blanks, and the number of cells to be written out by the write statement. With ordinary records reclen may take any value, but with multicard records the maximum is reclen=1000. In both cases, the default is reclen=100. When data is being read into the C array, any record which is longer than reclen characters is truncated to that length and a warning message is printed.
When ordinary records are written out with write or split, cells c1 to c(reclen) are copied, with any trailing blanks being ignored. For instance, if we have:
struct;read=0;reclen=200
and the current record is only 157 characters long, the record written out will be 157 characters long. This length can be overridden by an option on a filedef statement. When multicard records are written out, columns c101 to c(100+reclen), c201 to c(200+reclen), and so on are output. Thus, if we write:
struct;read=2;reclen=70
and we have 2 cards per record, Quantum will write out c(101,170) and c(201,270). Finally, with ordinary records cells c1 to c(reclen) are reset to blanks between records, but with multicard records cells c101 to c(100+reclen), c201 to c(200+reclen), and so on are reset.
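The reclen rules for writing records can be summarized in a short Python sketch. The helper names are hypothetical; an ordinary record is modelled as a plain string of C array cells starting at c1, and the multicard output ranges are computed only from reclen and the card types present, following the description above.

```python
def ordinary_output(cells: str, reclen: int = 100) -> str:
    """Text written for an ordinary record: c1 to c(reclen), trailing blanks dropped."""
    return cells[:reclen].rstrip(" ")


def multicard_output_ranges(card_types: list[int], reclen: int = 100) -> list[tuple[int, int]]:
    """Cells written for each card: c(k*100+1) to c(k*100+reclen)."""
    return [(100 * k + 1, 100 * k + reclen) for k in card_types]


record = "x" * 157 + " " * 43                      # a 157-character record in 200 cells
print(len(ordinary_output(record, reclen=200)))    # 157
print(multicard_output_ranges([1, 2], reclen=70))  # [(101, 170), (201, 270)]
```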
more than one digit. Once again, m and n are column numbers only, not card type and column number.
For example: struct;read=2;ser=c(1,4);crd=c5 tells us that we have a multicard record with serial numbers in columns 1 to 4 and the card type in column 5 of each card. Each card will be read into the row corresponding to its card number.
Sometimes some cards will be optional and others mandatory. You may define those cards which must appear in every record by using the keyword req= followed by the numbers of the cards that each respondent must have. For example:
req=1,2
tells us that cards 1 and 2 must be present in each record for that record to be accepted. Any other cards are optional. If a record is read without one of these cards, the error message Card Missing in Set and a note of the record's position in the file are printed, and the record is ignored. If you have ranges for required card types, you may type the numbers of the lowest and highest cards separated by a slash (/) or a colon (:) rather than listing each card type separately. For example, if cards 1 to 4 are all required, you may type:
req=1,2,3,4 or req=1/4 or req=1:4
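Here is a small Python model of the req= check. It expands the three equivalent notations and rejects a record that lacks a required card. The names are hypothetical. The model also accepts mixed lists such as 1,3/5, but this is an assumption: the text does not say whether Quantum allows them.

```python
# Hypothetical model of the req= check; not Quantum code.
from typing import Optional


def expand_cards(spec: str) -> list[int]:
    """Expand '1,2,3,4', '1/4' or '1:4' into a list of card types."""
    cards: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        for sep in ("/", ":"):
            if sep in item:
                low, high = (int(part) for part in item.split(sep))
                cards.extend(range(low, high + 1))
                break
        else:  # no range separator: a single card type
            cards.append(int(item))
    return cards


def check_required(card_types: list[int], req: str) -> Optional[str]:
    """Return the rejection message if a required card is missing, else None."""
    present = set(card_types)
    if any(card not in present for card in expand_cards(req)):
        return "Card Missing in Set"
    return None


print(expand_cards("1/4"))                 # [1, 2, 3, 4]
print(check_required([1, 2, 3], "1:4"))    # Card Missing in Set
```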
facility is not used, you must list their card types with the keyword rep=. For instance, if card 2 is a trailer card we would write rep=2. Where there is more than one trailer card, each card type is listed separated by a comma. If cards 2, 3 and 4 are all trailer cards we could write:
rep=2,3,4
If you have ranges of trailer card types, you may type the numbers of the lowest and highest cards separated by a slash (/) or a colon (:) rather than listing each card type separately. For example, if cards 2 to 4 are all trailer cards, you may type:
rep=2,3,4 or rep=2/4 or rep=2:4
If rep= is not used and a record is read with two or more cards of the same type, the last card of that type will be accepted. The message Identical duplicate or Non-identical duplicate is printed, together with a note of the record's position in the file. For example:
Record structure error: serial 026, card 234 in run, card 234 in dfile card type 2 non-identical duplicate
Because rep= refers to trailer cards only, it will be ignored if read=2 and crd= are not both present on the struct statement.
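To contrast trailer and non-trailer card types, the Python sketch below groups a record's cards by type. Types listed in rep= keep every card. For any other type, a later card replaces the earlier one and a duplicate message is recorded. The names and the message wording are hypothetical, loosely modelled on the example message above.

```python
# Hypothetical model of how cards of the same type are handled; not Quantum code.


def assemble(cards: list[tuple[int, str]], rep: set[int]) -> tuple[dict[int, list[str]], list[str]]:
    """Group (card_type, text) pairs by card type.

    Card types in rep (trailer cards) keep every card; for other types the
    last card wins and an identical/non-identical duplicate message is logged.
    """
    record: dict[int, list[str]] = {}
    messages: list[str] = []
    for card_type, text in cards:
        if card_type in rep:
            record.setdefault(card_type, []).append(text)
        elif card_type in record:
            kind = "identical" if record[card_type][0] == text else "non-identical"
            messages.append(f"card type {card_type} {kind} duplicate")
            record[card_type] = [text]  # the last card of that type is accepted
        else:
            record[card_type] = [text]
    return record, messages


print(assemble([(1, "a"), (2, "b"), (2, "c")], rep=set()))
print(assemble([(1, "a"), (2, "b"), (2, "c")], rep={2}))
```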
The only time you need to inform Quantum of the highest card type is when you have records with more than nine cards. This is so that Quantum can allocate sufficient cells in the C array to store the extra cards. The highest card type is defined with max=n, where n is the number of the highest card type. Cells 1 to max*reclen are then cleared between respondents. For example, to read a data set with 11 cards per respondent we might write:
struct;read=2;ser=c(1,4);crd=c5;req=1,2,3,4;max=11
If you forget max=, and a record is read with more than nine cards, the message Too many cards per record is printed and the record is rejected. On the other hand, if a card is read with a card type higher than that defined with max=, the record is rejected with the message Card number out of range.
Data from card A would be read into cells 1001 to 1100 of the C array.
When trailer card data is merged during a run with the merge facility, you may wish trailer cards to be merged in a specific order, according to a sequence number entered as part of the data. The location of this sequence number can be defined with the keyword seq=cn for a single-column code or seq=c(m,n) for a multicolumn code. For more information on merging data see the next section.
appropriate manufacturer's code from the external file into the main data in the C array. In this case, merging is based on finding matching keys in the main record and the records in the external file.
1 merge on serial number. Cards are read in from each data file according to their serial number only; the card type and sequence number, if any, are ignored. You might use this option when you have two files, dat01 containing cards of type 1 and dat02 containing cards of type 2, and you want the files to be merged so that card type 1 is read into the C-Array, followed by card type 2.
3 merge on serial number and card type (default). With this option, cards with the same serial number read from different data files are merged to form a single record by comparing the serial number and card type. Cards within a record are then sorted sequentially from 1 so that each card is read into the appropriate cells of the C-Array. For example, if dat01 contains cards 1 and 3, and dat02 contains cards of type 2, the merge will produce records containing cards 1, 2 and 3 in that order.
5 merge on serial number, card type and sequence number. This is similar to merge type 3, except that trailer cards are merged according to their sequence number. For example, suppose dat01 contains cards 1 and 2, where card 2 is a trailer card with a sequence number of 2. Suppose also that dat02 contains cards 2 and 3, where card 2 is a trailer card with a sequence number of 1. The merged record will then contain cards 1, 2/1, 2/2 and 3, in that order.
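The three merge types differ only in the key used to order the cards. In the Python sketch below they are modelled as a stable sort, with the main file's cards listed first. This is a simplification: a real run reads pre-sorted files sequentially, and treating ties in file order for merge type 1 is an assumption. Card, MERGE_KEYS and merge are hypothetical names.

```python
# Hypothetical model of merge types 1, 3 and 5; not Quantum code.
from typing import NamedTuple, Optional


class Card(NamedTuple):
    serial: int
    card_type: int
    seq: Optional[int] = None  # sequence number, used for trailer cards only


# Sort key used by each merge type.
MERGE_KEYS = {
    1: lambda card: card.serial,                                   # serial number only
    3: lambda card: (card.serial, card.card_type),                 # serial + card type
    5: lambda card: (card.serial, card.card_type, card.seq or 0),  # + sequence number
}


def merge(files: list[list[Card]], merge_type: int = 3) -> list[Card]:
    """Merge cards from several files, main file first.

    sorted() is stable, so cards with equal keys keep their file order.
    """
    cards = [card for file_cards in files for card in file_cards]
    return sorted(cards, key=MERGE_KEYS[merge_type])


dat01 = [Card(26, 1), Card(26, 2, seq=2)]
dat02 = [Card(26, 2, seq=1), Card(26, 3)]
print([(c.card_type, c.seq) for c in merge([dat01, dat02], merge_type=5)])
# [(1, None), (2, 1), (2, 2), (3, None)]
```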
This is the first item in the merges file, and is followed by the names of the files to be merged with the main data file named in the Quantum command line. Items may be entered on separate lines or all on the same line separated by semicolons. For example, if we want to merge data in files dat02 and dat03 with data in the main file, dat01, by serial number, card type and sequence number, the merges file would look like this: 5; dat02; dat03 Notice that we have not mentioned dat01 in the merges file because it will be named on the Quantum command line instead.
ex_file is the name of the file containing the extra data.
key_field is the location of the key in the main data file, entered using the standard Quantum notation for columns and fields.
key_start is the start column of the key in the external data file.
copy_to is the field in the main data record in which to place the external data. The field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied.
This statement returns in int_var_name a 1 if a match was found or 0 if not.
The mergedata statement merges a field of data from an external file with the main data at the datapass stage of the Quantum run. Merging is by means of a data key present in both the main records and the records in the external file. If a record in the external file has a key which matches that of a record in the main data file, the external data will be merged into a user-defined field of the main record when it is read into the C array.
In order for data to be merged correctly, both the main data file and the external file must be sorted in ascending order by key value. If the key is the record serial number then the data file will already be sorted in the correct order (assuming, of course, that the data is sorted by serial number). If you are using a key that is not the record serial number you must sort the data file so that it is ordered by key rather than by serial number.
signs. key_field is the location of the key in the main data file, entered using the standard Quantum notation for columns and fields.
key_start is the start column of the key in the external data file, for example, 1 if the key starts in column 1. The length of the key is taken from the length of key_field.
copy_to is the field in the main data record in which to place the external data. The field is defined using the standard Quantum notation for columns and fields. data_start is the start column of the data to be copied. Quantum copies as many columns as are defined by copy_to. For example: t1 = mergedata($manuf_codes$,c(178,180),15,c(168,175),1) tells Quantum to compare the key in columns 178 to 180 of the main record with the key which starts in column 15 of the external records in the file manuf_codes. Because the key field in the main record is 3 columns long, Quantum reads columns 15 to 17 of each external record to obtain its key. If the keys match, Quantum copies the data from the external record into columns 168 to 175 of the main record in the C array. The external data to be copied starts in column 1 and, since the destination field is 8 columns long, Quantum copies 8 columns starting at that column. This statement returns a value of 1 if a match was found (i.e., merging took place), or 0 if not. There is no limit on the number of mergedata statements in a specification, but you may only merge data from up to nine different files per record.
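Both files must be sorted by key because the external file is read forwards only. The Python sketch below models this. Keys are compared as fixed-width text, so numeric keys are assumed to be zero-padded, and columns are 1-based as in Quantum. The class and function names are hypothetical. The example call mirrors the manuf_codes statement above.

```python
# Hypothetical model of mergedata; not Quantum code. Columns are 1-based.
from typing import Optional


class ExternalFile:
    """An external file read forwards only, which is why it must be sorted by key."""

    def __init__(self, lines: list[str]) -> None:
        self.lines = lines
        self.pos = 0

    def find(self, key: str, key_start: int) -> Optional[str]:
        width = len(key)  # key length comes from key_field
        while self.pos < len(self.lines):
            line = self.lines[self.pos]
            ext_key = line[key_start - 1:key_start - 1 + width]
            if ext_key < key:
                self.pos += 1  # no later main record can match this key; skip it
            elif ext_key == key:
                return line
            else:
                return None  # passed the place where the key would be
        return None


def mergedata(record: list[str], ext: ExternalFile, key_field: tuple[int, int],
              key_start: int, copy_to: tuple[int, int], data_start: int) -> int:
    """Copy external data into the copy_to field when the keys match; return 1 or 0."""
    key = "".join(record[key_field[0] - 1:key_field[1]])
    line = ext.find(key, key_start)
    if line is None:
        return 0
    width = copy_to[1] - copy_to[0] + 1  # as many columns as copy_to defines
    data = line[data_start - 1:data_start - 1 + width].ljust(width)
    record[copy_to[0] - 1:copy_to[1]] = list(data)
    return 1


# Like t1 = mergedata($manuf_codes$,c(178,180),15,c(168,175),1)
manuf_codes = ExternalFile(["BBBBBBBB" + " " * 6 + "042"])
record = [" "] * 200
record[177:180] = list("042")  # key in columns 178-180
print(mergedata(record, manuf_codes, (178, 180), 15, (168, 175), 1))  # 1
```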
There are three ways of writing out your data once it has been read into the C-Array. You may:
a) create a new data file
b) copy records to a print file
c) write information to a report file
Data and print files are both accessed by the write statement, but the exact format of the statement varies according to the type of file and the information being written. Report files are written to with the report statement.
Print files
Print files are printouts of records or parts of records with headings, descriptive texts and page numbers. They cannot be used as data for subsequent Quantum runs.
The word write by itself prints out a whole record in the state it is in when the write statement is executed. The printout also includes a ruler showing which codes fall in which columns, the line number of the record in the data file and the message write, indicating that the record was generated by a write statement. Any multicodes in the record are shown as asterisks, but you may change this with an option on the filedef statement.
If the record contains more than one card, each card is listed separately beneath the ruler. For example, the statement:
write
produces output like this:
Quantum edit report
1 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |12345
write
2 in file
Each write statement will produce a line in the default print file, out2, telling you how many records were written out, as follows:
2 (1%) write
The example above was very simple. More often than not your program will contain several write statements, and you will want some way of identifying which records were printed by which statement and why. If the write depends on some other statement (for instance, if it is part of an if statement), the whole statement is printed underneath each record, thus:
67 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |0015263-16*735 *837361 ... 79&
if (c14n1/4) write
Here, as you can see, we are checking that column 14 contains a 1/4. This record has been printed out because it contains a 5 instead. Sometimes it is more helpful to have an explanatory text printed instead of the statement itself. In this case all that is necessary is to follow the word write with the text to be printed enclosed in dollar signs:
Record 17
51 in file
----+----1----+----2-- ... --9----+----0
column 101 - 200 are |00170116548986131*46*1 ...
column 201 - 300 are |0017026464515 875 ** ...
column 301 - 400 are |0017031929-5897231 ...
C308 incorrect
too many choices

Record 32
94 in file
----+----1----+----2-- ... --9----+----0
column 101 - 200 are |003201837021 **53798 ...
column 201 - 300 are |0032021353452 763736 ...
column 301 - 400 are |003203212 & ...
too many choices
Our first statement writes out all records in which column 308 does not contain any of the codes 1/5, and the second picks up all records having more than 3 codes in columns 117 to 119. Normally all output from write goes to the default print file, and whenever the current record is written to this file, the variable printed_ becomes true. You may change the output file by following the word write with the name of the file to write to. For example:
write pfile $First Print$
writes to the file pfile, whereas
write errors $Second Print$
writes to a file called errors. All files named on write statements must be defined on a filedef statement before they are used.
If two or more write statements apply to a single record, the record is printed out once in the state it was when the first applicable write was read, with all relevant write statements or texts listed below it. If a record satisfies two or more write statements which write to different files, Quantum will write the record out once for each statement, in the state it is when each write is executed.
if (c1102.and.c1192) write c(110,120) $Married woman$
checks that columns 110 and 119 both contain a 2 and, if so, prints out columns 110 to 120 in the print file, followed by the text Married woman. If you are writing out fewer than ten columns, Quantum does not print a ruler above the codes. If you are dealing with multicard records, you may prefer to use this form of write to have only the card containing the error printed, rather than all cards in the record. If we take our previous example where we were checking the contents of column 308:
prints only card 3. To write selected parts of a record to a particular file the notation is: write filename c(m,n) [$text$]
Data files
To write records or fields to a data file, type: write file_name [c(start_col, end_col)]
write may also be used to copy records to a data file. This is useful if you want to separate a particular card type from the rest of the data, or if you want to correct errors and save the corrected data in a new file for later tabulation.
to write the whole record to the named file, or write filename c(m,n) to write columns m to n only.
/* Copy the data into the new card
c(310,341)=c(148,179)
/* Delete it from its original place
c(148,179)=$ $
/* Give it a serial number and card type
c(301,304)=c(101,104); c3803
/* Set thisread true for card 3
thisread3=1
/* Define pfil as a data file
filedef pfil data
/* Copy cards 1, 2 and 3 to pfil
write pfil
where variable_names is a comma-separated list of the variables and texts to print. Use reportn rather than just report to start a new line each time the statement is executed.
A report file is a special type of print file in which you can print out records, fields or variables in the format of your choice. To write information in a report file, use the report statement, as follows: report filename parameters where filename is the name of the file to be written to, and parameters define exactly what is to be written.
Lines in a report may be up to 1024 characters long. Report does not start a new line automatically at the end of each write, but you may tell it to do so by following the keyword report with the letter n: reportn filename parameters In both cases, the named file must be identified as a report file using a filedef statement, as mentioned below. The parameter list defines what is to be printed in the report file. It may contain variables, texts, and special characters representing tabs and spaces.
Assignment statements
to copy codes from one column into another.
to replace certain codes in one column with those from a second column.
to assign the value of an arithmetic expression to a variable.
to copy codes from groups of columns into another column using the logical operators and, or and xor.
In spite of the diversity of these functions the basic format of any assignment statement is:
variable=item
where item defines what is to be copied into the variable. Remember that comments can be identified by a capital C in column 1. If the first variable in your statement starts with a C, make sure that you type it in lower case; otherwise the whole line will be read as a comment and ignored. For example, starting in column 1:
c(15,16)=$12$
is correct, but
C(15,16)=$12$
will be read as a comment even though the syntax is correct. Alternatively, you may precede assignment statements with the word set, thus:
set c(15,16)=$12$
Copying codes
To copy codes into a single data variable, overwriting the variable's original contents, type:
variable=codes
To copy a string of codes into a field, type:
var_name(start,end)=$codes$
To copy the contents of one variable or field into another, type:
variable1 = variable2
Assignment statements are most commonly used to copy codes into a column or to copy the contents of one variable into another. For instance:
c121=159
c121=c134
You can also copy strings of characters into fields of columns. Let's say we want to copy the code 59642 into columns 76 to 80 of card 3; we would write:
c(376,380)=$59642$
To replace a code or set of codes in one data variable with a code or set of codes in a second data variable, type:
variable1codes1=variable2codes2
codes1 and codes2 must contain the same number of codes, and the codes must be in superimposable order.
variable = expression
To copy a real value into a data variable, type:
10.22
The final type of assignment is copying codes from a set of columns. The codes copied depend upon the type of operator used:
and   Copy codes present in all columns
or    Copy codes present in one or more columns
xor   Copy codes present in one column only
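The three operators behave like set operations on the codes present in each column. The Python sketch below models them, with each column represented as a set of codes. The function name is hypothetical. xor follows the wording above, so a code is copied only if it appears in exactly one of the columns.

```python
# Hypothetical model of copying codes with and/or/xor; not Quantum code.
# Each column is modelled as the set of codes present in it.
from collections import Counter


def combine(op: str, *columns: set[str]) -> set[str]:
    if op == "and":  # codes present in all columns
        return set.intersection(*columns)
    if op == "or":   # codes present in one or more columns
        return set.union(*columns)
    if op == "xor":  # codes present in one column only
        counts = Counter(code for column in columns for code in column)
        return {code for code, n in counts.items() if n == 1}
    raise ValueError(f"unknown operator {op!r}")


a, b, c = {"1", "2", "3"}, {"2", "3", "4"}, {"3", "5"}
print(sorted(combine("and", a, b, c)))  # ['3']
print(sorted(combine("xor", a, b, c)))  # ['1', '4', '5']
```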
Emit inserts codes into a column leaving the original contents intact. Its format is:
emit cnp
More than one column may be entered on each line, provided that each one is separated by a comma. emit c5677, c1102
emit can only be used with single columns; string variables are not valid:
emit c(100,110)$99$
does not work.
The delete statement is the opposite of emit in that it deletes codes from a column leaving the remainder intact. Its format is: delete cnp
More than one deletion may be effected with the same delete statement as long as each column is separated by a comma. delete c1105, c17956
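emit and delete both leave the other codes in the column untouched. The Python sketch below models a single column as a set of codes, with a helper that expands ranges such as 1/5. The names are hypothetical. The code order 1234567890-& used for ranges is an assumption, consistent with the range 6/& used later in this section.

```python
# Hypothetical model of emit and delete on a single column; not Quantum code.
PUNCH_ORDER = "1234567890-&"  # assumed order of the twelve code positions


def codes(spec: str) -> set[str]:
    """Expand a code list such as '7', '25' or '1/5' into a set of codes."""
    result: set[str] = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            low, high = PUNCH_ORDER.index(spec[i]), PUNCH_ORDER.index(spec[i + 2])
            result.update(PUNCH_ORDER[low:high + 1])
            i += 3
        else:
            result.add(spec[i])
            i += 1
    return result


def emit(column: set[str], spec: str) -> set[str]:
    return column | codes(spec)  # add codes, keep the existing ones


def delete(column: set[str], spec: str) -> set[str]:
    return column - codes(spec)  # remove codes, keep the rest


print(sorted(emit({"1", "3"}, "7")))         # ['1', '3', '7']
print(sorted(delete({"1", "3", "5"}, "5")))  # ['1', '3']
```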
The statement used for this is:
priority cncode1, code2, code3, [cn2code1a, code2a, code3a, ... ]
where cn is the column whose codes are to be checked and code1, code2, code3, ... are the positions to check, entered in order of priority, the most important first.
priority checks only the listed positions; if any other codes are present they are ignored.
For example, the statement:
priority c2495, 4, 3, 2, 1
causes Quantum to scan column 249 to see first whether it contains a 5 and, if so, to delete all subsequent codes in the list. If c249 contains a 5 and nothing else, obviously there will be no extra codes to delete; this does not matter. If there is no 5 in c249, Quantum then checks whether it contains a 4; if so, any other codes in the range 1/3 are deleted; otherwise the program skips to the next code in the list and checks for that. If none of the listed codes are found, the column remains unchanged.
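The scan described above can be modelled in a few lines of Python, with a column represented as a set of codes. The function name is hypothetical. Because the list is scanned from the most important code, deleting every other listed code is the same as deleting the subsequent ones. Unlisted codes are left alone.

```python
# Hypothetical model of the priority statement on one column; not Quantum code.


def priority(column: set[str], order: list[str]) -> set[str]:
    """Keep the first listed code found and delete the other listed codes.

    Codes that are not in the list are left alone.
    """
    for code in order:
        if code in column:
            return column - (set(order) - {code})
    return set(column)  # none of the listed codes found: column unchanged


order = ["5", "4", "3", "2", "1"]  # like priority c2495, 4, 3, 2, 1
print(sorted(priority({"4", "2", "7"}, order)))  # ['4', '7']
```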
To choose a random code from a list of codes, type:
data_var_name=rpunch(codes)
To choose a random code from the codes present in a column, type:
data_var_name=rpunch(col_number)
For example: c115 = rpunch(1/5) will place one of the codes 1 through 5 in column 115.
Alternatively, you may use rpunch with another C-variable, thus: c115 = rpunch(c120) Once this statement has been executed, column 115 will contain one of the codes present in column 120.
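Both forms of rpunch pick one code from a candidate set and assign it to the target column. The Python sketch below models this. It assumes that an empty candidate set gives a blank column, which the text does not state. The function name is hypothetical, and a seeded random.Random makes a run repeatable.

```python
# Hypothetical model of rpunch; not Quantum code.
import random

PUNCH_ORDER = "1234567890-&"  # assumed order of the twelve code positions


def rpunch(candidates: set[str], rng: random.Random) -> set[str]:
    """Return a column holding one code chosen at random from candidates.

    Assumption: an empty candidate set gives a blank column.
    """
    if not candidates:
        return set()
    ordered = sorted(candidates, key=PUNCH_ORDER.index)  # fixed order, so a seeded rng repeats
    return {rng.choice(ordered)}


rng = random.Random(1)
c115 = rpunch(set("12345"), rng)   # like c115 = rpunch(1/5)
c120 = {"2", "4", "9"}
c115 = rpunch(c120, rng)           # like c115 = rpunch(c120)
print(c115 <= c120)                # True
```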
column_specs are references to the fields containing the numeric codes. code is a non-numeric code present in those fields and cell_number is the cell of the array which should be incremented whenever that code is encountered. Cells in the array are reset to zero at the start of each new record. To prevent this happening, enter the statement name as fieldadd rather than field. The rest of the statement is as shown.
The format of the field statement is:
field output_array = column_specs [,special_specs]
output_array is the name of the array in which you wish to store the counts of responses. You can use spare columns in the C array, but you may find your program is easier to read if you define an integer array of your own with a name which reflects the type of information it contains. For example, if you want an integer array called films, you might write:
int films 5s
ed
field films = .....
When you define the integer array, make sure that you request as many cells as there are codes in the data. In this example there are five films, so you define the array as having five cells. Quantum automatically creates an extra cell (cell 0) which it uses to count responses for which there is no cell allocated. If there were six films, for example, Quantum would increment cell 0 each time it found code 06 in the films columns. You might like to check the value of this cell as a means of reporting on invalid codes:
if (films0 .gt. 0) write c(1,20) $Bad film code$
Negative and zero values also cause cell zero to be incremented. Codes which are shorter than the field width are accepted as long as they are padded with blanks or zeroes. The column_specs part of the statement defines the columns to read. You have a number of choices here. First, you may list each column or field reference one after the other, separated by commas. The list must be enclosed in parentheses. In our example this would be:
field films = (c(12,13), c(14,15), c(16,17))
Second, if you have sequential fields as you do here, you can type the start columns of each field followed by the field length. The list of start columns is separated by commas and enclosed in parentheses, and the field length comes after the closing parenthesis and starts with a colon. If you use this notation for the film example you would write:
field films = (c12, c14, c16) :2
If you wish, you can abbreviate this further by typing just the start columns of the first and last fields, followed by the field length.
Third, if the fields are not sequential, you list the start columns and field width of each group of columns (as shown above) and separate each group with a slash. For example, to read data from columns 12 to 17 and 52 to 57, with each field being two columns wide, you would type:
This reads c(12,13), c(14,15), c(16,17), c(52,53), c(54,55) and c(56,57). You can also use this notation for single non-sequential fields. For example: field films = c23 / c36 / c71 :2 means c(23,24), c(36,37) and c(71,72).
The special_specs part of the statement is optional. You use it when a field contains non-numeric codes such as $&&$ for None of these films. If you want to count codings of this type, you must remember to allocate cells in the array for each code or group of codes you wish to count. You then include the notation:
code = cell_number
to count those codes. For example:
int films 6s
ed
field films = (c12, c14, c16) :2, $&&$=6
If you want to count more than one non-numeric code, list each one individually, separated by commas.
Quantum normally resets the cells of the integer array to zero at the start of each record. If you want counts to continue from one record to another, use a fieldadd statement instead of field. For example: fieldadd films = (c12, c14, c16) :2
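The counting rules for field and fieldadd can be summarised in a short Python model. Numeric codes go to their own cell. Zero, negative or too-large codes go to cell 0. Listed non-numeric codes go to their assigned cell. Passing the previous counts back in imitates fieldadd. Two behaviours are assumptions, not taken from the text: blank fields and unlisted non-numeric codes are simply ignored. The function name is hypothetical.

```python
# Hypothetical model of field and fieldadd; not Quantum code. Columns are 1-based.
from typing import Optional


def field(record: str, starts: list[int], width: int, cells: int,
          special: Optional[dict[str, int]] = None,
          counts: Optional[list[int]] = None) -> list[int]:
    """Count codes in fixed-width fields starting at the given columns.

    counts=None behaves like field (a fresh array for each record); passing the
    previous counts behaves like fieldadd (counts carry over between records).
    counts[0] catches codes with no cell: zero, negative or too large.
    """
    if counts is None:
        counts = [0] * (cells + 1)
    special = special or {}
    for start in starts:
        text = record[start - 1:start - 1 + width]
        if text in special:
            counts[special[text]] += 1
            continue
        if not text.strip():
            continue  # assumption: blank fields are not counted
        try:
            value = int(text.strip())  # ' 3' and '03' both read as 3
        except ValueError:
            continue  # assumption: unlisted non-numeric codes are ignored
        counts[value if 1 <= value <= cells else 0] += 1
    return counts


record = " " * 11 + "0106 3"  # c(12,13)=01, c(14,15)=06, c(16,17)=' 3'
print(field(record, [12, 14, 16], 2, cells=5))  # [1, 1, 0, 1, 0, 0]
```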
Clearing variables
To remove values from variables, type: clear var_name1, var_name2, var_name3
Variables of any type may be cleared using a clear statement:
clear var1, var2, .... varn
where var1 to varn are any valid Quantum variable or range of variables. For example:
clear c(109,180), t(1,200), myarray(29,33), myint, myreal
Data variables are reset to blank, integer variables are reset to 0 and real variables are reset to 0.0. Variables can also be cleared using assignment statements (e.g., t1=0), but there are advantages to using clear instead. Firstly, clear is much easier to write. Secondly, with clear the compiler checks that the subscripts are in the correct range (e.g., 1 to 33 if myarray has only 33 cells); this is not possible when clearing with assignment statements inside a loop, because the subscript is then a variable. However, if you use variables as subscripts with clear (e.g., clear c(t1,t1+5)), subscript checking once again cannot be done.
Flow control
Statements in the edit section are usually dealt with in the order in which they occur in the program. Quantum provides statements which may be used to alter this normal order of execution, for example, by missing out a statement or repeating a group of statements a number of times.
Statements of condition
1) Ed - Defines the start of the edit section of a Quantum run. This statement is essential if a Quantum run contains an edit section.
2) End - Defines the end of the edit section. This statement is required if the run contains an edit section.
2) Else - Defines statements to be executed if a given condition is not true. For example:
if (c1151); else; emit c1402
3) go to - Lets a Quantum program include statements which apply to certain respondents only. For example, the statement:
if (c121n1) go to 50
causes Quantum to go immediately to the statement labeled 50 if column 121 does not contain a 1. Any statements between this if statement and statement 50 are ignored whenever a record is read in which c121n1 is true. The statement labeled 50 may be any Quantum statement, but many people just write:
50 continue
4) continue - This statement is a dummy statement whose sole purpose is to join various bits of a program together. It is often used with a statement label as a destination for routing with go to, or to identify the end of a loop.
5) Loops - Define statements to be executed repeatedly. Loops are extremely important structures because they enable the same set of basic statements to be executed over and over again on a changing series of numbers, columns or codes. Their use can reduce the work involved in checking data. The statement which introduces a loop is do, which is formatted as follows:
if (c738) reject
if (c801) t5=t5+1
end
to reject records in which column 73 contains an 8 from the tabulations but not from the rest of the edit. Therefore, even if c738 is true, the record is still checked for a 1 in column 80 and, if one is found, t5 is incremented.
7) Return - Sends the record to the tabulation section. The word return in Quantum bears no relation to the same word in English. It does not mean go back to the start of the edit or anything like that; rather, it means terminate the edit immediately and jump to the tabulation section. Once the record is tabulated, Quantum reads in another record as usual. If there is no tabulation section, the next record is read in straight away.
Return is very often used with reject to reject a record without finishing the edit. For example:
if (c738) reject; return
if (c801) t5=t5+1
end
Here any records in which c738 is true are rejected from the tables. Because reject is followed by return, which sends records to the tabulation section, editing is also terminated immediately. Thus, only records in which c73n8 is true will be tested for a 1 in column 80.
8) Stop - Stops editing records and starts tabulating the records read so far. Stop tells Quantum to stop the run and print tables once editing has been completed on the current record. For example, we may want test tables for the first 100 people, so we set up a counter and terminate the run when it reaches 100:
if (rec_count.eq.100) stop
9) Process - Sends a record temporarily to the tabulation section. Process is an edit statement which is similar to return but must not be confused with it. When return is executed, the record is sent on to the tabulation section; after the tables are completed for that record, the program returns to the start of the edit section and the next record is read in.
When process is executed, the record is also sent immediately to the tabulation section, where it is used in table creation. However, after the record has been tabulated, control is passed back to the edit section, to the statement immediately following the word process. The record continues through the edit and any statements after process that apply to the record are executed. At the end of the edit the record is passed through the tabulation section again.
10) Split - Writes correct records out to a clean data file and incorrect records out to a dirty data file. Clean and dirty data files are the terms used to refer to files of correct and incorrect (rejected) records created automatically by the edit statement split.
Creating a holecount
To create a holecount, use the count statement:
count c(start_col,end_col) [$text$]
where text is the heading to be printed at the top of each page. This is optional; if it is omitted the holecount will simply be headed Holecount. Our example was created by the statement:
count c(1,16) $Demonstration Holecount$
Frequency distributions
A frequency distribution enables you to inspect the contents of a field of columns containing alphabetic or numeric data. For example, in a shopping survey the price the respondent paid for a bottle of mineral water may be stored in columns 112 to 114. A frequency distribution will tell you how many respondents bought mineral water at a particular price. This is very useful for determining how the values in these fields should be grouped for tabulation, as well as for rough estimates of medians. To create a frequency distribution sorted in both alphabetic and rank order, type:
list c(start_col, end_col) [$text$]
where text is the heading to be printed. To produce a frequency distribution sorted in alphabetic order only, type lista instead of list. For a distribution sorted in rank order only, type listr instead of list. Here are some examples:
listr c(107,108) $Contents of cols 7 and 8$
lista c(100,104) $First Set of Car Brands$
The first example produces a frequency distribution of the contents of c(107,108) sorted in rank order; the second example generates a list of car brands which will be sorted in alphabetic order.
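The two orderings can be modelled in Python as follows. Alphabetic order sorts the distinct values themselves, while rank order sorts them by how often they occur. Breaking ties in rank order alphabetically is an assumption made only for this sketch, and the function name is hypothetical.

```python
# Hypothetical model of list / lista / listr ordering; not Quantum code.
from collections import Counter


def frequencies(values: list[str]) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Return (alphabetic, rank) orderings of distinct values with their counts."""
    counts = Counter(values)
    alphabetic = sorted(counts.items())  # like lista
    # like listr: most frequent first; ties broken alphabetically (assumption)
    rank = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return alphabetic, rank


prices = ["050", "045", "050", "060", "045", "050"]  # e.g. contents of c(112,114)
alphabetic, rank = frequencies(prices)
print(alphabetic)  # [('045', 2), ('050', 3), ('060', 1)]
print(rank)        # [('050', 3), ('045', 2), ('060', 1)]
```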
Data validation
In an earlier section we discussed ways of examining the data for a set of records (with count) or for an individual record (with write). In general, however, we want to check the validity of the data for individual records. We do this by putting into the edit a set of testing statements which tell us not only whether a record contains an error but also what that error is. There are two types of checking statement.

The first involves checking whether a column contains the correct type of coding (single-coding or multicoding) and whether the codes in that column are valid. Take the question on a respondent's sex, which may be Male, coded c1061, or Female, coded c1062. c106 must be single-coded since no person can have two sexes, and the only codes which may appear in that column are 1 and 2. Any record in which c106 is not single-coded with a 1 or a 2 will be flagged as incorrect.

The second type of checking involves making sure that columns whose contents depend on the contents of other columns contain the correct codes. For instance, suppose the questionnaire asks whether the respondent has ever used a particular brand of washing-up liquid. The answer is coded into c125 as 1 for Yes and 2 for No. If the answer is Yes, the next questions, concerning price and quality, are asked. If c1252, indicating that the respondent has not used that brand of washing-up liquid, the following columns must be blank. Conversely, if c1251, the following columns must be coded according to the codes on the questionnaire.
require
Both tasks listed above can be carried out using if but sometimes they can become very complicated and repetitive. Therefore, Quantum has an additional testing statement, require, specifically designed to increase the efficiency of this checking process.
stated, that the error action code is the default Print and Reject (code 3) and will omit it from most of the examples accordingly. The most basic form of the require statement simply checks whether the column or field of columns contains the correct type of code; it does not check the individual codes themselves. Code types may be:
b     Blank
nb    Not blank (i.e., single-coded or multicoded)
sp    Single-coded (literally, single-punched)
spb   Single-coded or blank
One of these types must follow the word require since it tells Quantum what to check for. All that remains is to say which columns are to be inspected; just list each column or field of columns at the end of the statement. If more than one column or field is defined, each one must be separated by a comma. Here are some examples in which the record to be checked is:
----+----1----+----2----+----3----+----4----+
002411123481231&- *1927235537*&&
1 1 1
The statement:
require nb c10, c(25,35)
checks that columns 10 and 25 to 35 inclusive are not blank; they may contain any number of codes. This record satisfies both conditions, so it passes on to the next statement in the edit. The statement:
r sp c11, c15, c23, c41
looks to see whether columns 11, 15, 23 and 41 are single-coded. In our record they are, but if this were not the case (say c11123) the record would be printed out and rejected from any tables that may be produced. Additionally, Quantum would tell us Column 11 is 123.
end of the statement. This text will then be printed in place of the default text when errors are found. For example, if c329 is multicoded when it should be single-coded, the statement: r sp c329 will print the whole record and tell us which codes were found in that multicode: Column 329 is 13 Instead of being told which codes the column contains, you may prefer to see a message linking the error to a question on the questionnaire. In this case you will need to add your own error text as follows: r sp c329 $q21a not sp$ These texts may be as long or short as you like.
codes in this column are ignored. Thus, a record with c223'14' is incorrect because the column contains two of the listed codes, whereas a record with c223'27' is correct because the column contains only one code, a 2, from the range 1/5. Of course, any record which does not contain a 1, 2, 3, 4 or 5 at all is also incorrect, regardless of whether or not it is single-coded: c223'9' is just as wrong as c223'789&'.
Exclusive codes
To check that a column or field contains no codes other than those listed, type: r [/err_code] condition col1'codes1'o
If col1 contains any codes other than those given in codes1, the test is false. Now that you know how to check codes, the next thing to discuss is how to check that all other code positions are blank. We have said that statements of the form: r sp ca'p' accept all records containing only one of the codes p in column a, regardless of what other codes are also present. To check that a column contains only the listed codes and nothing else, follow the code specification with the letter O (for only) in upper or lower case. For example, to indicate that c356 must be single-coded in the range 1/5 and that all other positions (6/&) must be blank, you should type: r sp c356'1/5'o which is the same as: if (c356'6/&'.or.numb(c356).ne.1) write; reject Any of the following would cause the record to be printed and rejected: c356'34', c356'59', c356'8', or a blank c356.
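The interaction between the code list and the o suffix can be modelled the same way. This Python sketch assumes the code positions run 1 to 9, then 0, -, & (so 6/& covers 6, 7, 8, 9, 0, - and &), and again treats a column as a hypothetical set of code characters. It reproduces the c223 and c356 examples above but is not Quantum's own code.

```python
POSITIONS = "1234567890-&"  # assumed order of code positions in a column


def expand(spec: str) -> set:
    """Expand a code specification such as '1/5', '12-' or '1/70' into a set of codes."""
    codes = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            lo = POSITIONS.index(spec[i])
            hi = POSITIONS.index(spec[i + 2])
            codes.update(POSITIONS[lo:hi + 1])  # a range such as 1/5
            i += 3
        else:
            codes.add(spec[i])                  # a single code
            i += 1
    return codes


def require_sp(column: set, spec: str, only: bool = False) -> bool:
    """Model of `r sp col'spec'` (only=False) or `r sp col'spec'o` (only=True)."""
    listed = expand(spec)
    if len(column & listed) != 1:   # must hold exactly one of the listed codes
        return False
    if only and column - listed:    # with o, no codes outside the list are allowed
        return False
    return True


print(require_sp({"2", "7"}, "1/5"))             # True: the 7 is ignored
print(require_sp({"5", "9"}, "1/5", only=True))  # False: 9 is outside 1/5
```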
Require may define conditions for more than one column. Just follow each column with the code positions to be checked and separate each set with a comma: r sp c164'12-', c165'1/70', c166'1/3', c167'1/9-', c168'1/5' Here the columns to be checked are consecutive but have been listed separately because they each have different sets of valid codes. If all columns could be single-coded in the range 1 to 7 we might abbreviate this to: r sp c(164,168)'1/7' $q10a/e$ since this notation means that each column in the field must be single-coded within the given range, rather than that the field as a whole may contain only one of those codes.
new_code is the code or codes to be inserted in col1 if it fails the test condition. Any codes already in that column are overwritten.
As you know, records found to have errors are printed, coded and/or rejected according to the error action code. When the run is finished you will look at these records and, if possible, correct the errors by using the on-line edit or correction file facilities. Occasionally you will know in advance what to do with certain types of error; say, for instance, the respondent's sex has been miscoded. You may decide, or be told, to recode this person as a 3 in the appropriate column, indicating that the sex was not known. The way to do all this in one go is to write the normal require statement that checks columns and codes, and to follow the code specification with a colon (:) and the replacement code (in this case 3) enclosed in single quotes, thus: r /2/ sp c106'12' :'3' Any record in which c106 is not single-coded with either a 1 or a 2 will have the contents of c106 overwritten with a 3. The equivalent using if and an assignment statement would be written: if (numb(c106'12').ne.1) c106'3'; +write $c106 incorrect$ Once again, the require is shorter and quicker. When working with fields, it is not possible to define replacement strings for the field as a whole. You should, however, note that if a single replacement code is given for a field of columns, any incorrect columns in that field will be overwritten with the replacement code, and the correct columns will remain untouched. If we have:
+----4----+
  1927
and we write r sp c(237,240)'1/5' :'&' we will have:
+----4----+
  1&2&
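The field behaviour just described, where each failing column is overwritten and the others are left alone, can be sketched in Python as follows. It assumes the same hypothetical set-of-codes model of a column, with the valid codes passed in directly rather than parsed from a code specification.

```python
def replace_failures(field: list, valid: set, replacement: str) -> list:
    """Model of `r sp c(a,b)'codes' :'x'` on a field: each column must hold exactly
    one valid code; any column that fails is overwritten with the replacement code."""
    result = []
    for column in field:
        if len(column & valid) == 1:
            result.append(column)          # correct column: left untouched
        else:
            result.append({replacement})   # incorrect column: overwritten
    return result


field = [{"1"}, {"9"}, {"2"}, {"7"}]       # columns 237 to 240 hold 1, 9, 2 and 7
fixed = replace_failures(field, set("12345"), "&")
print("".join(min(column) for column in fixed))  # 1&2&
```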
For example: r /3/ (c133'4' .and. c140n'5') $Cols 33/40 incorrect$ says that c133 must contain a 4 and c140 must not contain a 5. If either or both expressions are false, Quantum prints the record out with the message Cols 33/40 incorrect and rejects it from the tables.
Require can evaluate groups of expressions and perform given tasks depending on whether all expressions are true or all are false. When all the expressions have the same value (i.e., all true or all false), Quantum continues with the next statement in the program, whereas if some are true and some are false, the record being tested will be dealt with according to the given (or default) error action code. This statement has five parts:
1. The word require or the letter r.
2. An equals sign, which must be preceded by a space.
3. An optional action code.
4. The expressions to be evaluated, each one enclosed in parentheses.
5. Optional error text enclosed in dollar signs.
This type of statement is generally used to check routing patterns. For example, if a 2 in c125 means that the respondent did not try Brand A washing powder, we would expect columns 126 to 145, which record his opinion of it, to be blank. On the other hand, if he tried the washing powder, we would expect to find his opinions about it coded in columns 126 to 145. This can be written: r = (c125'2') (c(126,145)=$ $) which says that to be accepted, a record must either have a 2 in column 125 and blanks in columns 126 to 145, or something other than a 2 in c125 with at least one code somewhere in c(126,145).
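The all-true-or-all-false rule of r = fits in a couple of lines. This Python sketch models the c125 routing example; the function names and the boolean standing in for "columns 126 to 145 are blank" are hypothetical stand-ins for the Quantum expressions.

```python
def require_equal(*results: bool) -> bool:
    """Model of `r = (expr1) (expr2) ...`: accept only if all are true or all are false."""
    return all(results) or not any(results)


def routing_ok(c125: set, c126_to_145_blank: bool) -> bool:
    """Model of r = (c125'2') (c(126,145)=$ $), with c125 as a set of codes."""
    return require_equal("2" in c125, c126_to_145_blank)


print(routing_ok({"2"}, True))   # True: did not try Brand A, opinions blank
print(routing_ok({"1"}, True))   # False: tried Brand A, but no opinions coded
```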
that the respondent's sex is coded as a 1 or a 2 only, you may wish to blank out the column if it contains any other code or codes. You could write this as: r sp c123'12' if (failed_) set c123 The test for failure is made on the last require statement executed for the current record. This may not always be the most recent require statement in the program, and it may not be the
Data correction
There are four ways to correct data:
o Correct the data in the original data file.
o Correct the data in the C array interactively.
o Replace the incorrect codes with specific codes using edit forcing statements.
o Write a file of corrections to be merged with the original data when it is read in by a Quantum program.
A record which generates too many error messages, or which is clearly incorrect, can be removed, as noted. Suppose its serial number is 2004. Then we have: if (c(101,104)=$2004$) reject; return This rejects the record from the rest of the edit and from the tabulation section as well. This statement should be at the beginning of the edit to avoid unnecessary editing of a useless record. Columns within a record can be removed by blanking them out or by setting them to a common reject code, often a minus or ampersand. For example: if (c125n'12') c125'&'; c(126,145)=$ $ All records in which c125 contains neither a 1 nor a 2 will have the contents of that column replaced with an ampersand, and whatever is in c(126,145) blanked out. As a real-life example, suppose a 1 in c125 means that the respondent visited the market, and a 2 in that column means he did not. Information about purchases made at the market is stored in c(126,145). If column 125 contains neither a 1 nor a 2, we cannot clearly establish whether or not the respondent visited the market, so we set c125 to a special code and blank out any information about purchases. Inserting correct data is generally more difficult than removing invalid data, because you very often don't know what the correct data is. However, if you do know, you can correct the data record by record, or make the same correction for any record which is incorrect. For instance: if (c(101,104)=$2222$) c112'2'; c(113,114)=$ $ corrects the record whose serial number is 2222 by setting a 2 into c112 and blanking out c(113,114). If you do not know what the correct data is, you may decide to replace the incorrect code or codes with a valid code chosen at random. For example: if (c(101,104)=$3625$) c145=rpunch(1/5) replaces whatever was in column 145 with one of the codes 1 through 5 for the record whose serial number is 3625.
colwid=n  Defines the width of columns in the printed tables where no p statements exist in the column.
csort  Sorts tables column-wise (i.e., horizontal sorting rather than vertical row-wise sorting).
date  By default, tables are printed without a date. Use of the keyword date causes the current date to be printed in the top right-hand corner of each table. The date is in the format dd mm yy.
dec=n  This determines the number of decimal places for absolute figures. If
decimal places are allowed, as long as you make each column wide enough to accommodate them.
dsp  This leaves one blank line between each row of data in a table. Without this, one line follows directly underneath another.
flt=name  Invokes the filter conditions and titles named on the flt= statement. If the filter defines conditions, the rules governing data options apply.
flush  Causes rows containing percentages to be printed with the percentages directly below the absolutes rather than one column to the right.
indent=n  Where a row text is longer than the space allocated to the row text in the table, Quantum breaks the line between words and continues the text on the next line. To have these continuation lines indented from the left margin, specify the amount of indentation required with indent=. Texts may be indented by between 0 and 15 spaces; the default is indent=0.
op=n  This keyword governs the type of output in the tables. Output types are:
  &  Total percentages. The value in the cell is percentaged against the number in the upper left-hand corner of the table (normally the base) rather than on the totals in the relevant column or row. If the table contains more than one base element, percentages are calculated using the leftmost figure in the most recent base element.
  -  Row rank figures are printed below each cell. Figures are ranked within rows, using 1 for the largest figure. Where two or more numbers have the same rank, they are all assigned the lowest rank possible. Thus, if the previous rank was 2 and the next value to be ranked occurs in the row three times, those numbers will all be ranked 5.
  0  Row percentages.
  1  Absolute figures (default).
  2  Column percentages.
  3  Column rank figures are printed below each cell. Figures are ranked within columns, using 1 for the largest figure. Where two or more numbers have the same rank, they are all assigned the lowest rank possible. Thus, if the previous rank was 2 and the next value to be ranked occurs in the column three times, those numbers will all be ranked 5.
  5  Prints the text 100% on each cell of the base row.
page  This option invokes automatic page numbering. Since this is the default (pages are numbered from 1 automatically), this option is generally used in its negative form, nopage, which suppresses automatic page numbering.
paglen=n  This determines the number of lines printed on each page. The default is paglen=60, but any value between 10 and 10,000 is valid.
pagwid=n  Normally tables can be up to 132 characters wide. pagwid= enables you to decrease the page width or to extend it to a maximum of 10,000 characters.
pc  This prints percent signs after percentage figures. This is the default, so this option is usually used in its negative form, nopc, to print percentage figures without percent signs.
sort  Creates sorted or ranked tables.
wm=n  This keyword names the weighting matrix to be used.
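The tie rule for the op=- and op=3 rank figures can be checked with a short Python sketch. It assumes ties are handled exactly as the text describes (every tied figure gets the numerically largest rank the group would occupy); it is an illustration, not code taken from Quantum.

```python
def rank_lowest(values: list) -> list:
    """Rank figures with 1 for the largest; tied figures all get the lowest rank possible."""
    # A figure's rank is the number of figures greater than or equal to it,
    # so a tie occupying ranks 3, 4 and 5 is ranked 5 throughout.
    return [sum(1 for other in values if other >= v) for v in values]


print(rank_lowest([40, 25, 10, 10, 10, 3]))  # [1, 2, 5, 5, 5, 6]
```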
Creating a table
To create a table, type: tab [axis1] [axis2] [axis3] [axis4] row_axis column_axis [;options] In order to create a table, Quantum needs to know which is the column axis and which is the row axis. If the table has more than two dimensions you will need to say which axes should be used for the extra dimensions. Each table must be created separately using a tab statement, as follows: tab row_axis column_axis Tab statements must precede the axis definitions in your program file.
Multidimensional tables
Multidimensional tables are ones created from more than two axes. They occur when a series of tables has the same rows and columns, but each table in the group has additional characteristics which are themselves the conditions of other axes. This sounds complicated, so let's take an example. Our basic table is of age by sex, created by the tab statement: tab age sex
We have been asked to produce a separate table of age by sex for each region of the country. Whereas before each cell had two conditions (age and sex), it now has three (region, age and sex). There are two ways of writing this specification. You may either: a) write as many tab statements as there are regions, and filter each table of age by sex to include only those respondents resident in a given region, or b) write a single tab statement to create a three-dimensional table. Both methods produce the same results; the main advantage of (b) over (a) is that (b) involves you in a lot less work. The tab statement to create the multidimensional table is:
tab region age sex
und region age;inc=c(35,38) will place the second table underneath the first one. To add tables, type a tab statement for the first table and follow it with: add[col_offset[,row_offset]] axis_names where axis_names is the same number of axis names as appears on the tab statement. For example: tab ax01 bk01 add ax02 bk02 Here we are creating the table ax02 by bk02 and adding it to the table ax01 by bk01. To divide one table by another, define the top table on a tab statement followed by: div axis_names [;options] where axis_names is a list of as many axis names as there are on the tab statement, and
options is any of the keywords anlev=, c=, inc=, maxim, means, median, minim or wm=. The statements:
tab ax06 brk1
div ax07 brk2
produce a table in which the ax06 by brk1 table is divided by the ax07 by brk2 table. The div statement defines the denominator of a table produced by dividing the table specified on the previous tab statement by that on the div line.
Types of elements within axes
There are four types of element in an axis:
o Text and condition elements
o Text elements
o Arithmetic elements
o Statistical elements
Text elements
These elements create nothing but text; no cells containing counts or values are created from these elements.
There are three statements which are used within an axis to create text-only elements. These are:
n03  create a text-only element
n23  create a subheading
n33  continue long element texts
If you would like subheadings to be underlined, place one of the options unl1, unl2 or unl3 on the n23. The hdlev= keyword allows you to define various levels of subheading, starting at level 1 for the top subheading down to level 9 for the lowest level. If you would prefer the text to be left-justified above the columns to which it refers, add the option hdpos=l to the n23. If you would prefer the text to be right-justified, use hdpos=r instead. (hdpos=c is also available for centered text, but since this is the default you are unlikely to need it.)
Arithmetic elements
These are elements which contain arithmetic values rather than counts. For example, one element may tell you the number of times a product was bought rather than the number of people who bought it.
Statistical elements
Part of Quantum's power lies in the fact that it offers you the ability to create various types of statistical output without having to know the formulae necessary to calculate them. These elements contain totals, subtotals or statistical functions such as means and standard deviations. Statements which perform statistical calculations are:
n07  average
n12  mean
n13  sum of factors
n17  standard deviation
n19  standard error of the mean
n20  error variance of the mean
n30  median
n04  total
n05  subtotal
To define incremental values for means, standard deviations, standard errors and error variances, type:
n25[element_text];inc=arith_expr [;c=log_expr] [;row] [;col]
The n25 does not normally print anything in the table. Use row and/or col to print these values as the rows and/or columns of the table.
factors
fac= defines factors when the numbers in the data are not to be used (e.g., the data may be multicoded), whereas inc=, also mentioned in the Data Options section, reads the data from the column and uses that as the factor for each row. When to use which is best illustrated by examples, although in general you should try to use fac= whenever possible since, in processing terms, it is more efficient than inc=.
The respondent has been asked to say how much he agrees or disagrees with a particular statement. If he agrees very much, he has a code 1 in, say, c210. If he agrees somewhat, he has a 2; if he neither agrees nor disagrees he is coded as 3; disagrees somewhat, a 4; and disagrees very much, a 5. People who refuse to answer are coded as c210'&'. We wish to obtain a numerical mean value of these opinions using factors of +2 for agrees very much down to -2 for disagrees very much. These are not the same as the codes representing these responses in the data, so we enter them with fac=. People who refused to answer will appear in the table but will not be included in the mean. So the axis will look like this:
l vers1
n01Agrees Very Much;c=c210'1';fac=2
n01Agrees Somewhat;c=c210'2';fac=1
n01Neither Agrees Nor Disagrees;c=c210'3';fac=0
n01Disagrees Somewhat;c=c210'4';fac=-1
n01Disagrees Very Much;c=c210'5';fac=-2
n01Refused;c=c210'&'
n12Mean;dec=2
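To see what the n12 element computes from this axis, here is a Python sketch of the arithmetic. It assumes c210 is single-coded and that the mean is the sum of each respondent's factor divided by the number of respondents in elements that carry a factor, so the Refused row is left out as the text describes. The sample responses are made up.

```python
# Factors from the vers1 axis: code -> factor. Refused (&) has no factor.
FACTORS = {"1": 2, "2": 1, "3": 0, "4": -1, "5": -2}


def factor_mean(responses: list) -> float:
    """Mean factor over respondents; codes without a factor are left out of the mean."""
    scored = [FACTORS[code] for code in responses if code in FACTORS]
    return sum(scored) / len(scored)


sample = ["1", "1", "2", "4", "&"]  # made-up c210 codes, one per respondent
print(factor_mean(sample))          # 1.0
```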
Miscellaneous n statements
To define a condition that applies to a group of consecutive elements, type: n00;c=logical_expression
An n00 defines a condition applicable to all subsequent rows until another n00 is read or until the end of the axis, whichever is sooner. Its format is: n00[;c=condition] where condition is any valid logical expression. To override the automatic page turnover within an axis, insert the statement: n09[Text] at the point at which a new page is required. Text is an optional text which will be printed beneath the table headings at the top of the next page.
If several consecutive statements in an axis have conditions defined by a code or codes in the same column, you can save yourself a lot of time and effort by replacing the individual n01 statements with a single col statement. One of the simplest col statements you can write is: col n;[base];Rtext1[=p1];Rtext2[=p2] where n is the column containing the codes for this question, base creates a base element, and Rtext1=p1, Rtext2=p2 and so on define the texts and conditions for the individual elements. To explain more clearly how the col statement works, let's take the axis mstat that we wrote earlier and rewrite it using a col statement. Originally it consisted of five statements:
n10Base
n01Single;c=c109'1'
n01Married;c=c109'2'
n01Divorced;c=c109'3'
n01Widowed;c=c109'4'
We can replace these with the line:
col 109;Base;Single;Married;Divorced;Widowed
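The equivalence between the col statement and the individual n10/n01 statements can be illustrated by a small Python helper that writes out the longhand form. The helper is hypothetical and covers only the simplest case, where element texts have no explicit =p codes and are given codes in order, using an assumed position order of 1 to 9, then 0, -, &.

```python
POSITIONS = "1234567890-&"  # assumed order of code positions in a column


def expand_col(column: int, texts: list, base: bool = True) -> list:
    """Write out the n10/n01 statements that `col column;Base;text1;text2;...` replaces.

    zip() stops after 12 texts, since a column has only 12 code positions.
    """
    lines = ["n10Base"] if base else []
    for position, text in zip(POSITIONS, texts):
        lines.append(f"n01{text};c=c{column}'{position}'")
    return lines


for line in expand_col(109, ["Single", "Married", "Divorced", "Widowed"]):
    print(line)
```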
Val can be used to test whether the value of a variable is equal to a given value. If it is equal, the cell count is incremented by 1. The format is: val variable;[Base];[hd=Text];=;[tx=Text];n1 [Text1]; ... ;nn [Textn] where variable is the data, integer or real variable whose value is to be tested, n1 to nn are the values against which the variable is to be compared, and Text1 to Textn are the row descriptions to be printed in the table. The equals sign indicates that the test is for arithmetic equality rather than ranges. Base, hd= and tx= are optional and create the base, subheading and text-only rows of the table as described for col statements. Let's work through an example to illustrate this. Suppose c(110,111) contains data on the number of people in the household, and we wish to set up a table showing how many respondents live in households containing 1, 2, 3, 4, 5 or 6 people, so we write: val c(110,111);Base;Hd=Number in Household;=;1 Person;2 People; +3 People;4 People;5 People;6 People
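The counting done by this val statement can be sketched in Python. This assumes every record counts towards the base while only exact matches are counted in the value rows, so a household of 7 appears in the base but in no row; the household sizes are made up.

```python
from collections import Counter


def val_counts(values: list, targets: list) -> dict:
    """Model of the `=` form of val: count records whose value equals each target."""
    counts = Counter(values)
    return {t: counts[t] for t in targets}  # Counter returns 0 for missing values


household_sizes = [2, 4, 1, 2, 3, 7]  # made-up values of c(110,111), one per record
print("Base", len(household_sizes))   # Base 6
print(val_counts(household_sizes, [1, 2, 3, 4, 5, 6]))
# {1: 1, 2: 2, 3: 1, 4: 1, 5: 0, 6: 0}
```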
The column specs on a fld statement define the columns to be read. There are three ways of entering them. First, you may list each column or field reference one after the other, separated by commas. The list must be enclosed in parentheses. In our example this would be: fld (c(12,13), c(14,15), c(16,17)) Second, if you have sequential fields as you do here, you can type the start columns of each field followed by the field length. The list of start columns is separated by commas and enclosed in parentheses, and the field length comes after the closing parenthesis and starts with a colon. If you use this notation for the film example you would write: fld (c12, c14, c16) :2 If you wish, you can abbreviate this further by typing just the start columns of the first and last fields, followed by the field length. This time you do not use parentheses: fld c12, c16 :2 Third, if the fields are not sequential, you may list the start columns and field width of each group of columns (as shown above) and separate each group with a slash. For example, to read data from columns 12 to 17 and 52 to 57, with each field being two columns wide, you would type: fld c12, c16 / c52, c56 :2 This reads c(12,13), c(14,15), c(16,17), c(52,53), c(54,55) and c(56,57). You can also use this notation for single non-sequential fields. For example: fld c23 / c36 / c71 :2 means c(23,24), c(36,37) and c(71,72).
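The start-column notation can be checked by expanding it mechanically. The Python sketch below uses a hypothetical helper that takes each slash-separated group as a (first start, last start) pair plus the shared field width, and reproduces the column ranges given above.

```python
def fld_fields(groups: list, width: int) -> list:
    """Expand `fld cA, cB / cC, cD :w` into (start, end) column pairs.

    Each group is (first_start, last_start); fields of `width` columns are laid
    end to end from first_start up to and including last_start.
    """
    fields = []
    for first, last in groups:
        for start in range(first, last + 1, width):
            fields.append((start, start + width - 1))
    return fields


print(fld_fields([(12, 16), (52, 56)], 2))
# [(12, 13), (14, 15), (16, 17), (52, 53), (54, 55), (56, 57)]
print(fld_fields([(23, 23), (36, 36), (71, 71)], 2))
# [(23, 24), (36, 37), (71, 72)]
```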
The element specs part of the statement defines the element texts and the codes which represent those responses. If you enter element texts by themselves, Quantum assumes that the first text is code 1, the second text is code 2, and so on. The codes apply to all fields named in the column specs part of the statement. Therefore, to define elements which will count the number of people who saw each film, you would write: fld c12,c16:2;Columbus;Aliens 3;Pretty Woman; +Green Card;Batman 2
Weighting in Quantum
Sometimes in surveys we treat the respondents as representatives of the total population of which they are a sample. Normally, tables reflect the attitudes of the people interviewed, but we may want the tables to reflect the attitudes of the total population instead, so that it seems as if we had interviewed everyone rather than just a sample of the population. This, of course, assumes that the people interviewed are a truly representative sample. If we take a sample of 380 from a population of 10,000 middle-aged housewives, and discover that 57 members of this sample buy cheddar cheese, we may want the number of middle-aged housewives who buy cheddar cheese to read 1,500 in our tables, not 57. Moving from 57 to 1,500 is the fine art of weighting. In this case, each middle-aged housewife has a weight of 10,000/380. Since 57 of them buy cheddar cheese, the number in the cell will be: 10000 / 380 * 57 = 1,500 Weighting is also used to correct biases that build up during a survey. For example, when conducting interviews by telephone you may find that 60% of the respondents were women. You may then want to correct this ratio of women to men to make the two groups more evenly balanced.
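The arithmetic in both examples can be written out in Python. The first part reproduces the cheddar cheese figure; the second part assumes, purely for illustration, that the 60:40 women-to-men sample is to be rebalanced to exactly 50:50, with each group's weight being its target share divided by its actual share.

```python
population = 10_000
sample_size = 380
buyers = 57

weight = population / sample_size  # weight applied to each respondent
print(round(weight * buyers))      # 1500

# Hypothetical rebalancing of a 60:40 women:men sample to 50:50.
share = {"women": 0.60, "men": 0.40}
target = {"women": 0.50, "men": 0.50}
group_weight = {g: target[g] / share[g] for g in share}
print({g: round(w, 3) for g, w in group_weight.items()})
```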
Weighting methods
Quantum is sufficiently flexible to allow more than one set of weights for a given set of respondents. Which set is applied is determined by options on the a, sectbeg, flt or tab statement, or on the statements which create the individual rows or columns of a table. Each set of weights, however, applies one weight to each respondent. There are two ways of calculating weights: a) The weight for each respondent may be part of the data for that respondent, or it may be calculated in the edit and passed to the tabulation section as a variable. b) The more common method of weighting is to define a set of characteristics and apply specific weights to respondents satisfying those characteristics.
Types of weighting
Quantum offers factor, target and rim weighting, preweights, postweights, weighting using proportions and weighting to a given total.
Factor weighting
With factor weighting, every record which satisfies a given set of conditions is assigned a specific weight. You would generally use it when the weights are calculated outside Quantum; for instance, you may be told that all unemployed people in London require a weight of 10.5, whereas unemployed people in the rest of the country need a weight of 7.3.
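A minimal Python sketch of factor weighting using the weights from this example. The respondent records are made up, and giving everyone who is not unemployed a weight of 1.0 is an assumption made only for the sketch.

```python
def factor_weight(respondent: dict) -> float:
    """Weight from the example: unemployed in London 10.5, unemployed elsewhere 7.3.

    Everyone else gets 1.0 here, which is an assumption made only for this sketch.
    """
    if respondent["unemployed"]:
        return 10.5 if respondent["region"] == "London" else 7.3
    return 1.0


respondents = [
    {"region": "London", "unemployed": True},
    {"region": "London", "unemployed": True},
    {"region": "Leeds", "unemployed": True},
    {"region": "Leeds", "unemployed": False},
]
# A weighted cell count is the sum of the weights of the records in the cell.
weighted_unemployed = sum(factor_weight(r) for r in respondents if r["unemployed"])
print(round(weighted_unemployed, 1))  # 28.3
```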
Target weighting
Target weights may be used when you know the exact number of respondents you want to appear in each cell of the weighted table. For example, in a table of age by sex, you may know the exact number of men under 21, women under 21, and so on, to appear in the table once it has been weighted. The weights that you define in your matrix are therefore the values to appear in the weighted table rather than the weights to be applied to each respondent of a given age and sex.
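Because the matrix holds target totals rather than weights, the weight each respondent receives follows from dividing the target by the unweighted count in the cell. The Python sketch below shows that calculation for the simple case without preweights; the cell labels, counts and targets are made up.

```python
from collections import Counter


def target_weights(cells: list, targets: dict) -> dict:
    """Per-respondent weight for each cell: target total / unweighted count in that cell."""
    counts = Counter(cells)
    return {cell: targets[cell] / counts[cell] for cell in counts}


cells = ["men under 21"] * 4 + ["women under 21"] * 5   # made-up sample
targets = {"men under 21": 100, "women under 21": 150}  # made-up targets
weights = target_weights(cells, targets)
print(weights)  # {'men under 21': 25.0, 'women under 21': 30.0}
```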
Rim weighting
Rim weighting is used when: a) you want to weight according to various characteristics, but do not know the relationship of the intersection of those characteristics, or b) you do not have enough respondents to fill all the possible cells of the table if you were to weight the data using the multidimensional technique described above. For example, you may want to weight by age, sex and marital status and may know the weights for each category of those characteristics (e.g. people aged 25 to 30; men; single people). However, you may not know the weights for, say, single men aged between 25 and 30, married women aged between 31 and 40, and so on.
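Rim weighting is generally computed by iterative proportional fitting (raking): the weights are adjusted to match one set of target totals at a time, repeatedly, until all sets are matched together. The Python sketch below shows that general technique; it is not Quantum's own implementation, it assumes the targets for each characteristic add up to the same grand total, and it assumes every target category has at least one respondent. The data and targets are made up.

```python
def rake(respondents: list, targets: dict, iterations: int = 100) -> list:
    """Return one weight per respondent so weighted totals match every set of targets.

    respondents: list of dicts mapping characteristic -> category.
    targets: dict mapping characteristic -> {category: target total}.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for var, cat_targets in targets.items():
            # Current weighted total for each category of this characteristic.
            totals = {cat: 0.0 for cat in cat_targets}
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            # Scale the weights so this characteristic hits its targets exactly.
            weights = [w * cat_targets[r[var]] / totals[r[var]]
                       for w, r in zip(weights, respondents)]
    return weights


def weighted_totals(respondents: list, weights: list, var: str) -> dict:
    totals = {}
    for w, r in zip(weights, respondents):
        totals[r[var]] = totals.get(r[var], 0.0) + w
    return totals


people = [
    {"sex": "m", "age": "young"},
    {"sex": "m", "age": "old"}, {"sex": "m", "age": "old"},
    {"sex": "f", "age": "young"}, {"sex": "f", "age": "young"}, {"sex": "f", "age": "young"},
    {"sex": "f", "age": "old"},
]
rim_targets = {"sex": {"m": 50, "f": 50}, "age": {"young": 40, "old": 60}}
w = rake(people, rim_targets)
print(weighted_totals(people, w, "sex"), weighted_totals(people, w, "age"))
```

Only the totals for each characteristic are fixed; the weighted figures for combinations such as young men are whatever the iteration settles on, which is exactly the situation the text describes.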
Preweights
Preweights, stored as part of each respondent's data or created during the edit, are applied to individual records before target or factor weighting is applied. When the characteristic weights are targets, the preweights are used in the calculation of the weight for each respondent.
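One common way of combining preweights with target weighting is to share each cell's target among its respondents in proportion to their preweights. The Python sketch below shows that approach under that assumption; the text does not give Quantum's exact formula, and the cell labels and figures are made up.

```python
def target_weights_with_preweights(cells: list, preweights: list, targets: dict) -> list:
    """Final weight = preweight * cell target / sum of preweights in that cell."""
    cell_totals = {}
    for cell, pw in zip(cells, preweights):
        cell_totals[cell] = cell_totals.get(cell, 0.0) + pw
    return [pw * targets[cell] / cell_totals[cell] for cell, pw in zip(cells, preweights)]


cells = ["a", "a", "b"]       # hypothetical cell labels, one per respondent
preweights = [1.0, 3.0, 2.0]  # made-up preweights
print(target_weights_with_preweights(cells, preweights, {"a": 100, "b": 50}))
# [25.0, 75.0, 50.0]
```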
Postweights
The opposite of preweights are postweights, which are applied after all other weights have been applied, and therefore have no effect on the way in which targets are reached. They are generally used to make a final adjustment to a specific item.
Descriptive statistics
Quantum provides facilities for calculating a set of basic statistics from the figures produced in Quantum tabulations. They include the statistics most commonly used for testing hypotheses about the values of proportions (percentages) and the locations (average values) of variables, and about differences in these between two or more subsets of the data. There are also chi-squared statistics for testing hypotheses about a single distribution or about differences between two or more distributions. The statistical tests available are:
o One-dimensional, two-dimensional and single classification chi-squared tests
o Four tests of differences between proportions (Z-tests)
o Two tests of differences between means (T-tests)
o Friedman's test of differences in location between a set of related samples (sometimes known as Friedman's two-way analysis of variance)
o Kolmogorov-Smirnov test of differences between two samples
o McNemar's test of the significance of changes
o F test for testing differences between a set of means (one-way analysis of variance, ANOVA)
Quanvert
Quanvert is the windowed version of the Quantum database; in other words, it is the GUI for Quantum. Quanvert can process surveys of any type, size or complexity. Whether it's a survey with hundreds of questions, or millions of respondents, or one that's been conducted on a regular basis for years, Quanvert can handle it fast. Quanvert has been specifically designed for the market researcher. You don't have to be a data processing or computer expert, or a statistician; you just have to be interested in your survey results! And you can investigate your data from your desktop, without having to search through volumes of printed reports. There is no need to predict what analyses you will require before you receive your data. Any table can be created based on any variable or question. You can test out any hypothesis, and dig as deep into the data as you wish. For instance, you may want to examine the age group of people who responded positively to an advertisement. You can then take this a stage further and produce a series of tables filtered on the women interviewed. Quanvert is especially powerful for analyzing individual responses to verbatim or "open" questions.
using Quantum
Before you can convert a Quantum spec and data file into a Quanvert database, there are several tasks you may need to carry out. These include checking the Quantum program to ensure that it will create the required information in the appropriate places, and setting up subdirectories if variables are not to be stored in the main project directory. If you have a large database from which you require only a few variables, you may use the raw Quantum data rather than creating a full Quanvert database. To create a Quanvert database, enter the following command at the command prompt: quantum v [pd dir_1] [td dir_2] [prog_file] [data_file] The v parameter tells Quantum not to produce tables but, when it reaches the output stage, to run the flip program instead. The pd and td parameters allow you to read files from, and create temporary files in, directories other than the directory in which you are running Quantum. All Quanvert projects originate from Quantum. Although Quanvert produces tables identical to those generated by Quantum, it does not normally use the raw data and Quantum program files. Instead, it uses a series of compressed data and axis files, one pair per axis, derived from the Quantum files. These individual databases are referred to as inverted or transposed databases, and the process which creates them is called flipping. In databases with simple axes it is possible to run Quanvert almost immediately on the raw Quantum data.
The sex axis, for instance, will have two files: sex.ax, containing the element texts, and sex.fli, containing the inverted data for that axis. To tidy a directory once the database has been created, type: flipclean [a] under Unix or: flipclea [a] under DOS. This deletes any temporary files created during the flip process but leaves intact any files which are needed for Quanvert.
Example
*include vars
External variables and arrays are declared in a file called vars, which is included before the edit section.
ed
*include edit
end
The edit section contains the calculations of counts and the column settings needed to obtain counts that are not straightforward.
wm1 wax1 wax2;rim;input;
+20;30;50;
+50;50;
+33;33;33
Weighting of the data in the output (if required).
*include tabs
This file contains the details of which axes are to be tabulated against which in order to produce each table.
*include axes
*include breaks