
Interview questions

1. What version of SAS are you currently using? SAS 9.2
2. What is the difference between combining two data sets using multiple SET statements and using a MERGE statement?
3. Describe your familiarity with SAS formats and informats.
4. Can you name some special input delimiters?
5. Is it possible to use the MERGE statement without a BY statement? Explain.
6. What is the purpose of the trailing @ and @@?
7. For what purposes do you use DATA _NULL_?
8. Identify statements whose placement in the DATA step is critical.
9. What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
10. How would you delete duplicate observations?
11. What is the Program Data Vector (PDV)? What are its functions?
12. What are SAS/ACCESS and SAS/CONNECT?
13. How would you determine the number of missing or nonmissing values in computations?
14. What is the difference between %LOCAL and %GLOBAL?
15. What is an autocall macro, how do you create one, and what is it used for?
16. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
17. Name at least five compile-time statements.
18. What is the difference between the SCAN and SUBSTR functions?
19. Describe the ways in which you can create macro variables.
20. State some differences between the DATA step and SQL.
21. What system options are you familiar with?
22. What is the COALESCE function?
23. Give some differences between PROC MEANS and PROC SUMMARY.
24. Have you used CALL SYMPUTX? What points need to be kept in mind when using it?
25. What option in PROC FORMAT allows you to create a format from an input control data set rather than a VALUE statement?
26. How would you code a macro statement to produce information in the SAS log?
27. Name four set operators.
28. What does %PUT do?
29. Which is applied first when the KEEP= and RENAME= data set options are used on the same data set?
30. Give examples of macro quoting functions.
31. What system option determines whether the macro facility searches a specific catalog for a stored, compiled macro?
32. Do you know about the SAS autoexec file? What is its significance?
33. What exactly is a SAS hash table?

34. What is PROC FREQ's default behavior for handling missing values?
35. I am trying to find outliers in my data. Which procedures will help me find them?
36. Have you used ODS statements? What are the benefits of ODS?
37. I want to make a quick backup of a data set along with any associated indexes. What procedure can I use?
38. What is a SAS catalog?
39. What does the statement format _all_; do?
40. State different ways of combining SAS data sets.
41. State different ways of getting data into SAS.
42. What is the SQL Procedure Pass-Through Facility?
43. How can you identify and resolve programming logic errors?
44. Can FORMAT, LABEL, DROP, KEEP, or LENGTH statements use array references?
45. What are SAS picture formats?

Question: What is the function of the OUTPUT statement?
Answer: By default, the DATA step writes an observation to the output data set at the end of each iteration. Placing an explicit OUTPUT statement in a DATA step overrides this automatic output, so that observations are added to a data set only when the explicit OUTPUT statement is executed.

Question: What is the function of the STOP statement?
Answer: The STOP statement causes SAS to stop processing the current DATA step immediately and to resume processing with the statements after the end of that DATA step.

Question: What is the difference between using the DROP= data set option in the DATA statement and in the SET statement?
Answer: If you do not want to process certain variables and do not want them to appear in the new data set, specify the DROP= data set option in the SET statement. If you want to process certain variables but do not want them to appear in the new data set, specify the DROP= data set option in the DATA statement.

Question: Given an unsorted data set, how do you read the last observation into a new data set?
Answer: Use the END= option of the SET statement. For example:

    data work.calculus;
      set work.comp end=last;
      if last;
    run;

Here Calculus is the new data set to be created, Comp is the existing data set, and last is a temporary variable (initialized to 0) that is set to 1 when the SET statement reads the last observation.
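The END= approach above still reads every observation. As an alternative, the following is a minimal sketch of direct access with the POINT= and NOBS= options of the SET statement. WORK.COMP is the data set from the example above; WORK.LASTOBS and the variable TOTAL are hypothetical names chosen for illustration, and the sketch assumes WORK.COMP has at least one observation.

```sas
/* Sketch: read only the last observation by direct access.         */
/* WORK.LASTOBS and TOTAL are hypothetical names for illustration.  */
data work.lastobs;
  /* NOBS= stores the observation count in TOTAL at compile time;   */
  /* POINT= then reads the observation whose number TOTAL holds.    */
  set work.comp nobs=total point=total;
  output;   /* write the observation that was just read             */
  stop;     /* POINT= never reaches end of file, so stop explicitly */
run;
```

Without the STOP statement the step would loop indefinitely, because direct access never triggers the end-of-file condition that normally ends a DATA step.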

Question: What is the difference between reading data from an external file and reading data from an existing data set?
Answer: The main difference is that while reading an existing data set with the SET statement, SAS retains the values of the variables from one observation to the next.

Question: What is the difference between SAS functions and procedures?
Answer: Functions expect their argument values to be supplied across an observation (from several variables in the same row), whereas procedures expect one value of a variable per observation. For example:

    data average ;
      set temp ;
      avgtemp = mean( of T1-T24 ) ;
    run ;

Here the arguments of the MEAN function are taken across an observation.

    proc sort ;
      by month ;
    run ;
    proc means ;
      by month ;
      var avgtemp ;
    run ;

PROC MEANS is used to calculate the average temperature by month (taking one value of the variable from each observation).

Question: What is the difference between the SUM function and the + operator?
Answer: The SUM function returns the sum of its non-missing arguments, whereas the + operator returns a missing value if any of its operands is missing. Example:

    data mydata;
      input x y z;
      cards;
    33 3 3
    24 3 4
    24 3 4
    . 3 2
    23 . 3
    54 4 .
    35 4 2
    ;
    run;
    data mydata2;
      set mydata;
      a=sum(x,y,z);
      p=x+y+z;
    run;

In the output, the value of p is missing for the 4th, 5th and 6th observations:

     a    p
    39   39
    31   31
    31   31
     5    .
    26    .
    58    .
    41   41

Question: What would be the result if all the arguments of the SUM function are missing?
Answer: A missing value.

Question: What denominator would the MEAN function use if two out of seven arguments are missing?
Answer: Five.

Question: Give an example where SAS fails to convert a character value to a numeric value automatically.
Answer: Suppose the value of a variable PayRate begins with a dollar sign ($). When SAS tries to convert the values of PayRate to numeric values automatically, the dollar sign blocks the process and the values cannot be converted. Therefore, it is best to use the INPUT and PUT functions explicitly in your programs whenever conversions occur.

Question: What numeric value results from automatic character-to-numeric conversion when the character value 1,735.00 is used in an arithmetic calculation?
Answer: A missing value.

Question: What numeric value results from automatic character-to-numeric conversion when the character value 1735.00 is used in an arithmetic calculation?
Answer: 1735

Question: Which SAS statement does not perform automatic conversions in comparisons?
Answer: The WHERE statement.

Question: Briefly explain the INPUT and PUT functions.
Answer:
INPUT function: character-to-numeric conversion, INPUT(source, informat).
PUT function: numeric-to-character conversion, PUT(source, format).

Question: What would be the result of the following SAS function calls (given that 31 Dec 2000 is a Sunday)?

    Weeks  = intck('week',  '31dec2000'd, '01jan2001'd);
    Years  = intck('year',  '31dec2000'd, '01jan2001'd);
    Months = intck('month', '31dec2000'd, '01jan2001'd);

Answer: Weeks=0, Years=1, Months=1. INTCK counts interval boundaries crossed: weeks begin on Sunday, so both dates fall in the same week, while a year boundary and a month boundary both lie between them.

Question: What are the parameters of the SCAN function?
Answer: scan(argument, n, delimiters)
argument specifies the character variable or expression to scan.
n specifies which word to read.
delimiters are the characters that separate words; they must be enclosed in quotation marks.

Question: Suppose the variable address stores the following value:
209 RADCLIFFE ROAD, CENTER CITY, NY, 92716
What would be the result returned by the SCAN function in the following cases?

    a=scan(address,3);
    b=scan(address,3,',');

Answer: a=ROAD; b=NY (with a leading blank, because the comma is the only delimiter in the second call).

Question: What length does the SCAN function assign to the target variable?
Answer: 200 in the SAS 9.2 release this list refers to, if the target variable has no previously defined length (later releases instead use the length of the first argument in a DATA step).

Question: Name a few SAS functions.
Answer: SCAN, SUBSTR, TRIM, CATX, INDEX, TRANWRD, FIND, SUM.

Question: What does the TRANWRD function do?
Answer: The TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.

Question: Consider the following SAS program:

    data finance.earnings;
      Amount=1000;
      Rate=.075/12;
      do month=1 to 12;
        Earned+(amount+earned)*(rate);
      end;
    run;

What would be the value of month at the end of DATA step execution, and how many observations would there be?
Answer: The value of month would be 13, and there would be 1 observation.

Question: Consider the following SAS program:

    data finance;
      Amount=1000;
      Rate=.075/12;
      do month=1 to 12;
        Earned+(amount+earned)*(rate);
        output;
      end;
    run;

How many observations would there be at the end of DATA step execution?
Answer: 12

Question: How do you use a DO loop if you do not know how many times it should execute?
Answer: Use DO UNTIL or DO WHILE to specify the condition.

Question: What is the difference between DO WHILE and DO UNTIL?
Answer: The DO WHILE expression is evaluated at the top of the loop, so if the expression is false the first time it is evaluated, the loop never executes. The DO UNTIL expression is evaluated at the bottom of the loop, so the loop always executes at least once.

Question: How do you specify both a number of iterations and a condition within a single DO loop?
Answer:

    data work;
      do i=1 to 20 until(Sum>=20000);
        Year+1;
        Sum+2000;
        Sum+Sum*.10;
      end;
    run;

This iterative DO statement executes the loop until Sum is greater than or equal to 20000 or until the loop has executed 20 times, whichever occurs first.

Question: How many data types are there in SAS?
Answer: Two: character and numeric.

Question: If a variable contains only numbers, can it be of character data type? Give an example.
Answer: Yes, it depends on how you use the variable. For example, ID and Zip contain only digits but can be stored as character variables.

Question: If a variable contains letters or special characters, can it be of numeric data type?
Answer: No, it must be of character data type.

Question: How large can a SAS data set be?
Answer: The number of observations is limited only by the computer's capacity to handle and store them.

Prior to SAS 9.1, SAS data sets could contain up to 32,767 variables. In SAS 9.1, the maximum number of variables in a SAS data set is limited by the resources available on your computer.

Question: Give some examples where PROC REPORT's defaults differ from PROC PRINT's defaults.
Answer:
No record numbers in PROC REPORT.
Labels (not variable names) are used as headers in PROC REPORT.
PROC REPORT needs the NOWINDOWS option.

Question: Give some examples where PROC REPORT's defaults are the same as PROC PRINT's defaults.
Answer: Variables/columns appear in position order. Rows are ordered as they appear in the data set.

Question: Highlight the major difference between the two programs below.

a.
    data mydat;
      input ID Age;
      cards;
    2 23
    4 45
    3 56
    9 43
    ;
    run;
    proc report data = mydat nowd;
      column ID Age;
    run;

b.
    data mydat1;
      input grade $ ID Age;
      cards;
    A 2 23
    B 4 45
    C 3 56
    D 9 43
    ;
    run;
    proc report data = mydat1 nowd;
      column Grade ID Age;
    run;

Answer: When all the variables in the input data set are numeric, PROC REPORT sums them by default. Thus the first program generates one record in the list report, whereas the second generates four records.
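The default differences listed above can be seen side by side with a short sketch. It reuses the mydat1 data from program b; the two variable labels are hypothetical and added only so that the label-versus-name difference is visible.

```sas
/* Sketch: compare PROC PRINT and PROC REPORT defaults.      */
/* The labels are hypothetical, added only for illustration. */
data work.mydat1;
  input grade $ id age;
  label id  = 'Student ID'
        age = 'Age in years';
  datalines;
A 2 23
B 4 45
C 3 56
D 9 43
;
run;

/* PROC PRINT defaults: an Obs column and variable names as headers */
proc print data=work.mydat1;
run;

/* NOOBS and LABEL bring PROC PRINT closer to PROC REPORT's defaults */
proc print data=work.mydat1 noobs label;
run;

/* PROC REPORT defaults: no Obs column, labels used as headers */
proc report data=work.mydat1 nowd;
  column grade id age;
run;
```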

Question: In the above program, how do you avoid summing the numeric variables?
Answer: To avoid summing the numeric variables, one or more of the input variables must be defined as DISPLAY. Thus we use:

    proc report data = mydat nowd;
      column ID Age;
      define ID/display;
    run;

Question: What is the difference between ORDER and GROUP variables in PROC REPORT?
Answer: If a variable is used as a GROUP variable, rows that have the same value are collapsed into one. GROUP variables therefore produce a summary report, whereas ORDER variables produce a list report.

Question: How can you define the variables to produce a summary report with PROC REPORT?
Answer: All of the variables in a summary report must be defined as GROUP, ANALYSIS, ACROSS, or COMPUTED variables.

Question: What are the default statistics for the MEANS procedure?
Answer: N (count), mean, standard deviation, minimum, and maximum.

Question: How do you limit the decimal places shown by PROC MEANS?
Answer: By using the MAXDEC= option.

Question: What is the difference between the CLASS statement and the BY statement in PROC MEANS?
Answer: Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables. BY-group results also have a different layout from CLASS-group results.

Question: What is the difference between PROC MEANS and PROC SUMMARY?
Answer: PROC MEANS produces a report by default. By contrast, to produce a report with PROC SUMMARY, you must include the PRINT option in the PROC SUMMARY statement.

Question: How do you specify the variables to be processed by the FREQ procedure?
Answer: By using the TABLES statement.
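The PROC MEANS points above (MAXDEC=, CLASS versus BY, and the PRINT option of PROC SUMMARY) can be tried with a small sketch. It assumes the SASHELP.CLASS sample data set that ships with SAS is available; WORK.CLASS_SORTED is a hypothetical name for the sorted copy.

```sas
/* Sketch, assuming the SASHELP.CLASS sample data set is installed. */

/* CLASS: no sorting required; MAXDEC= limits printed decimals */
proc means data=sashelp.class maxdec=1;
  class sex;
  var height weight;
run;

/* PROC SUMMARY prints nothing unless PRINT is specified */
proc summary data=sashelp.class print maxdec=1;
  class sex;
  var height weight;
run;

/* BY: the data must first be sorted (or indexed) by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

proc means data=work.class_sorted maxdec=1;
  by sex;
  var height weight;
run;
```

Comparing the first and last reports shows the layout difference: CLASS produces one table with a row group per class level, while BY produces a separate table for each BY group.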

Question: Describe the CROSSLIST option in the TABLES statement.
Answer: Adding the CROSSLIST option to the TABLES statement displays crosstabulation tables in ODS column format.

Question: How do you create list output for crosstabulations in PROC FREQ?
Answer: To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.

    TABLES variable-1*variable-2 <* variable-n> / LIST;

Question: PROC MEANS works for ________ variables and PROC FREQ works for ______ variables?
Answer: Numeric, categorical.

Question: How can you combine two data sets based on the relative position of rows in each data set; that is, the first observation in one data set is joined with the first observation in the other, and so on?
Answer: One-to-one reading.

Question:

    data concat;
      set a b;
    run;

The format of variable Revenue in data set a is dollar10.2 and the format of variable Revenue in data set b is dollar12.2. What would be the format of Revenue in the resulting data set (concat)?
Answer: dollar10.2

Question: You want to combine two data sets so that the observations in each BY group in each data set in the SET statement are read sequentially, in the order in which the data sets and BY variables are listed. Which method of combining data sets does this?
Answer: Interleaving.

Question: While match-merging two data sets, you cannot use the __________ option with indexed data sets because indexes are always stored in ascending order.
Answer: DESCENDING

Question: I have a data set concat with variables a, b and c. How do I rename a and b to e and f?
Answer:

    data concat(rename=(a=e b=f));
      set concat;
    run;

Question: What is the difference between a one-to-one merge and a match merge? Give an example.
Answer: If both data sets in the MERGE statement are sorted by id (as shown below) and each observation in one data set has a corresponding observation in the other data set, a one-to-one merge is suitable.

    data mydata1;
      input id class $;
      cards;
    1 Sa
    2 Sd
    3 Rd
    4 Uj
    ;
    data mydata2;
      input id class1 $;
      cards;
    1 Sac
    2 Sdf
    3 Rdd
    4 Lks
    ;
    data mymerge;
      merge mydata1 mydata2;
    run;

If the observations do not match, then match-merging is suitable:

    data mydata1;
      input id class $;
      cards;
    1 Sa
    2 Sd
    2 Sp
    3 Rd
    4 Uj
    ;
    data mydata2;
      input id class1 $;
      cards;
    1 Sac
    2 Sdf
    3 Rdd
    3 Lks
    5 Ujf
    ;
    data mymerge;
      merge mydata1 mydata2;
      by id;
    run;
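Building on the match-merge of mydata1 and mydata2, the sketch below shows one common way to separate matched from unmatched observations with the IN= data set option. The output data set names and the flag variables in1 and in2 are hypothetical, and the sketch assumes both inputs are already sorted by id, as they are in the example.

```sas
/* Sketch: route matches and non-matches to separate data sets.       */
/* WORK.MATCHED, WORK.ONLY1, WORK.ONLY2, IN1 and IN2 are hypothetical */
/* names; both inputs are assumed to be sorted by id.                 */
data work.matched work.only1 work.only2;
  merge mydata1(in=in1) mydata2(in=in2);
  by id;
  if in1 and in2 then output work.matched;  /* id present in both    */
  else if in1    then output work.only1;    /* id only in mydata1    */
  else                output work.only2;    /* id only in mydata2    */
run;
```

The IN= variables are temporary: they are set to 1 when the corresponding data set contributes to the current observation and are not written to the output data sets.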


Very Basic

- What SAS statements would you code to read an external raw data file into a DATA step?
- How do you read in the variables that you need?
- Are you familiar with special input delimiters? How are they used?
- If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
- What is the difference between an informat and a format? Name three informats or formats.
- Name and describe three SAS functions that you have used, if any.
- How would you code the criteria to restrict the output to be produced?
- What is the purpose of the trailing @? The @@? How would you use them?
- Under what circumstances would you code a SELECT construct instead of IF statements?
- What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file?
- If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
- If you do not want any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a data set?
- What is the one statement to set the criteria of data that can be coded in any step?
- Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
- How would you include common or reusable code to be processed along with your statements?
- When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
- If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
- Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.
- How would you delete duplicate observations?
- How would you delete observations with duplicate keys?
- How would you code a merge that will keep only the observations that have matches from both sets?
- How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to a second data set, and the non-matches of the right-most data set to a third data set?

Internals execution

- What is the Program Data Vector (PDV)? What are its functions?
- Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
- At compile time when a SAS data set is read, what items are created?
- Name statements that are recognized at compile time only.
- Identify statements whose placement in the DATA step is critical.
- Name statements that function at both compile and execution time.
- Name statements that are execution only.
- In the flow of DATA step processing, what is the first action in a typical DATA step?
- What is _n_?

Base SAS

- What is the effect of the OPTIONS statement ERRORS=1?
- What's the difference between VAR A1 - A4 and VAR A1 -- A4?
- What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
- Why is a STOP statement needed for the POINT= option on a SET statement?
- How do you control the number of observations and/or variables read or written?
- Approximately what date is represented by the SAS date value of 730?
- How would you remove a format that has been permanently associated with a variable?
- What does the RUN statement do?
- Why is SAS considered self-documenting?
- What areas of SAS are you most interested in?
- Briefly describe 5 ways to do a "table lookup" in SAS.
- What versions of SAS have you used (on which platforms)?
- What are some good SAS programming practices for processing very large data sets?
- What are some problems you might encounter in processing missing values? In DATA steps? Arithmetic? Comparisons? Functions? Classifying data?
- How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
- What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
- If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE.
- What are _numeric_ and _character_ and what do they do?
- How would you create multiple observations from a single observation?
- For what purpose would you use the RETAIN statement?
- What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
- What is the order of application for output data set options, input data set options and SAS statements?
- What is the order of evaluation of the arithmetic operators: + - * / ** ( )?
QUESTIONS ON TESTING AND DEBUGGING

- How could you generate test data with no input data?
- How do you debug and test your SAS programs?
- What can you learn from the SAS log when debugging?
- What is the purpose of _error_?
- How can you put a "trace" in your program?
- Are you sensitive to code walk-throughs, peer review, or QC review?
- Have you ever used the SAS Debugger?
- What other SAS features do you use for error trapping and data validation?

QUESTIONS ON MISSING VALUES

- How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs?
- How many missing values are available? When might you use them?
- How do you test for missing values?
- How are numeric and character missing values represented internally?
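As a starting point for exploring these questions, here is a minimal sketch of how missing values can be tested and counted in a DATA step. All variable names are hypothetical. SAS provides 28 numeric missing values (the ordinary `.`, the underscore `._`, and the special missing values `.A` through `.Z`), while a character missing value is a blank string.

```sas
/* Sketch: testing for and counting missing values.      */
/* All variable names are hypothetical.                  */
data _null_;
  length name $ 8;
  a = 10;
  b = .;        /* ordinary numeric missing value           */
  c = .A;       /* special numeric missing value            */
  name = ' ';   /* character missing value (blank)          */

  n_present = n(a, b, c);      /* number of nonmissing arguments */
  n_absent  = nmiss(a, b, c);  /* number of missing arguments    */
  total     = sum(a, b, c);    /* SUM ignores missing values     */
  plus      = a + b + c;       /* + propagates missing values    */

  /* MISSING works for numeric and character arguments alike */
  if missing(b)    then put 'b is missing';
  if missing(c)    then put 'c is missing (special missing value)';
  if missing(name) then put 'name is missing (blank)';

  put n_present= n_absent= total= plus=;
run;
```

The `a + b + c` expression also causes SAS to write a note to the log saying that missing values were generated, which is one of the things the log can tell you when debugging.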
SOME GENERAL NON-TECHNICAL QUESTIONS

- What has been your most common programming mistake?
- What is your favorite programming language and why?
- What is your favorite operating system? Why?
- Do you observe any coding standards? What is your opinion of them?
- What percent of your program code is usually original and what percent copied and modified?
- Have you ever had to follow SOPs or programming guidelines?
- Which is worse: not testing your programs or not commenting your programs?
- Name several ways to achieve efficiency in your program. Explain trade-offs.
- What other SAS products have you used and consider yourself proficient in using?
QUESTIONS ON FUNCTIONS

- How do you make use of functions?
- When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
- What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);?
- What do the PUT and INPUT functions do?
- Which date function advances a date, time or date/time value by a given interval?
- What do the MOD and INT functions do?
- How might you use MOD and INT on numerics to mimic SUBSTR on character strings?
- In ARRAY processing, what does the DIM function do?
- How would you determine the number of missing or nonmissing values in computations?
- What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
- There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985. How would you accomplish this in DATA step code? Using only PROC FORMAT?
- In the following DATA step, what is needed for 'fraction' to print to the log?

    data _null_;
      x=1/3;
      if x=.3333 then put 'fraction';
    run;

- What is the difference between calculating the 'mean' using the MEAN function and PROC MEANS?
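For the 'fraction' question, the comparison fails because 1/3 is stored with far more precision than the four-digit constant .3333. One common approach, sketched here rather than offered as the only answer, is to round the value to the same precision before comparing.

```sas
/* Sketch: round x to four decimal places before comparing it */
/* with the four-digit constant.                              */
data _null_;
  x = 1/3;
  if round(x, .0001) = .3333 then put 'fraction';
run;
```

The choice of rounding unit is the design decision here: it has to match the precision of the constant being compared against.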
QUESTIONS ON PROCS

- Have you ever used "Proc Merge"?
- If you were given several SAS data sets you were unfamiliar with, how would you find out the variable names and formats of each data set?
- What SAS PROCs have you used and consider yourself proficient in using?
- How would you keep SAS from overwriting a SAS data set with its sorted version?
- In PROC PRINT, can you print only variables that begin with the letter "A"?
- What are some differences between PROC SUMMARY and PROC MEANS?
- PROC FREQ:
  * Code the TABLES statement for a single-level (most common) frequency.
  * Code the TABLES statement to produce a multi-level frequency.
  * Name the option to produce frequency line items rather than a table.
  * Produce output from a frequency. Restrict the printing of the table.
- PROC MEANS:
  * Code a PROC MEANS that shows both summed and averaged output of the data.
  * Code the option that will allow MEANS to include missing numeric data in the report.
  * Code the MEANS to produce output to be used later.
- Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.
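The PROC FREQ and PROC MEANS coding items above can be practiced with a sketch like the following. It assumes the SASHELP.CLASS sample data set is available; WORK.SEXFREQ and WORK.STATS are hypothetical output names. It does not attempt the item about missing data, since the intended option is not clear from the question.

```sas
/* Sketch, assuming the SASHELP.CLASS sample data set is installed. */
/* WORK.SEXFREQ and WORK.STATS are hypothetical output names.       */
proc freq data=sashelp.class;
  tables sex;                             /* single-level frequency  */
  tables sex*age;                         /* multi-level (crosstab)  */
  tables sex*age / list;                  /* line items, not a table */
  tables sex / out=work.sexfreq noprint;  /* data set, no printing   */
run;

/* Summed and averaged output, shown in the report and saved */
proc means data=sashelp.class sum mean;
  var height weight;
  output out=work.stats sum= mean= / autoname;
run;
```

AUTONAME lets SAS build the output variable names (for example Height_Sum and Height_Mean), which avoids name collisions when several statistics are requested for the same variables.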

QUESTIONS :

The following SAS program is submitted: data test; set sasuser.employees; if 2 le years_service le 10 then amount = 1000; else if years_service gt 10 then amount = 2000; else amount = 0; amount_per_year = years_service / amount; run; Which one of the following values does the variable AMOUNT_PER_YEAR contain if an employee has been with the company for one year? A. 0 B. 1000 C. 2000 D. . (missing numeric value) The contents of the raw data file AMOUNT are listed below: --------10-------20-------30 $1,234 The following SAS program is submitted: data test; infile 'amount'; input @1 salary 6.; if _error_ then description = 'Problems';

else description = 'No Problems'; run; Which one of the following is the value of the DESCRIPTION variable? A. Problems B. No Problems C. ' ' (missing character value) D. The value can not be determined as the program fails to execute due to errors. The contents of the raw data file NAMENUM are listed below: --------10-------20-------30 Joe xx The following SAS program is submitted: data test; infile 'namenum'; input name $ number; run; Which one of the following is the value of the NUMBER variable? A. xx B. Joe C. . (missing numeric value) D. The value can not be determined as the program fails to execute due to errors. The contents of the raw data file AMOUNT are listed below: --------10-------20-------30 $1,234 The following SAS program is submitted: data test; infile 'amount'; input @1 salary 6.; run; Which one of the following is the value of the SALARY variable? A. 1234 B. 1,234 C. $1,234 D. . (missing numeric value) Which one of the following statements is true regarding the SAS automatic _ERROR_ variable? A. The _ERROR_ variable contains the values 'ON' or 'OFF'. B. The _ERROR_ variable contains the values 'TRUE' or 'FALSE'. C. The _ERROR_ variable is automatically stored in the resulting SAS data set. D. The _ERROR_ variable can be used in expressions or calculations in the DATA step. Which one of the following is true when SAS encounters a data error in a DATA step? A. The DATA step stops executing at the point of the error, and no SAS data set is created. B. A note is written to the SAS log explaining the error, and the DATA step continues to execute. C. A note appears in the SAS log that the incorrect data record was

saved to a separate SAS file for further examination. D. The DATA step stops executing at the point of the error, and the resulting DATA set contains observations up to that point. The following SAS program is submitted: data work.totalsales (keep = monthsales{12} ); set work.monthlysales (keep = year product sales); array monthsales {12} ; do i=1 to 12; monthsales{i} = sales; end; run; The data set named WORK.MONTHLYSALES has one observation per month for each of five years for a total of 60 observations. Which one of the following is the result of the above program? A. The program fails execution due to data errors. B. The program fails execution due to syntax errors. C. The program executes with warnings and creates the WORK.TOTALSALES data set. D. The program executes without errors or warnings and creates the WORK.TOTALSALES data set. The following SAS program is submitted: data work.totalsales; set work.monthlysales(keep = year product sales); retain monthsales {12} ; array monthsales {12} ; do i = 1 to 12; monthsales{i} = sales; end; cnt + 1; monthsales{cnt} = sales; run; The data set named WORK.MONTHLYSALES has one observation per month for each of five years for a total of 60 observations. Which one of the following is the result of the above program? A. The program fails execution due to data errors. B. The program fails execution due to syntax errors. C. The program runs with warnings and creates the WORK.TOTALSALES data set with 60 observations. D. The program runs without errors or warnings and creates the WORK.TOTALSALES data set with 60 observations. The following SAS program is submitted: data work.january; set work.allmonths (keep = product month num_sold cost); if month = 'Jan' then output work.january; sales = cost * num_sold; keep = product sales; run; Which variables does the WORK.JANUARY data set contain? A. PRODUCT and SALES only B. PRODUCT, MONTH, NUM_SOLD and COST only

C. PRODUCT, SALES, MONTH, NUM_SOLD and COST only D. An incomplete output data set is created due to syntax errors. The contents of the raw data file CALENDAR are listed below: --------10-------20-------30 01012000 The following SAS program is submitted: data test; infile 'calendar'; input @1 date mmddyy10.; if date = '01012000'd then event = 'January 1st'; run; Which one of the following is the value of the EVENT variable? A. 01012000 B. January 1st C. . (missing numeric value) D. The value can not be determined as the program fails to execute due to errors. A SAS program is submitted and the following SAS log is produced: 2 data gt100; 3 set ia.airplanes 4 if mpg gt 100 then output; 22 202 ERROR: File WORK.IF.DATA does not exist. ERROR: File WORK.MPG.DATA does not exist. ERROR: File WORK.GT.DATA does not exist. ERROR: File WORK.THEN.DATA does not exist. ERROR: File WORK.OUTPUT.DATA does not exist. ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, ;, END, KEY, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_. ERROR 202-322: The option or parameter is not recognized and will be ignored. 5 run; The IA libref was previously assigned in this SAS session. Which one of the following corrects the errors in the LOG? A. Delete the word THEN on the IF statement. B. Add a semicolon at the end of the SET statement. C. Place quotes around the value on the IF statement. D. Add an END statement to conclude the IF statement. The contents of the raw data file SIZE are listed below: --------10-------20-------30 72 95 The following SAS program is submitted: data test; infile 'size'; input @1 height 2. @4 weight 2; run; Which one of the following is the value of the variable WEIGHT in the output data set? A. 2

B. 72
C. 95
D. . (missing numeric value)

A SAS PRINT procedure output of the WORK.LEVELS data set is listed below:

    Obs  name   level
    1    Frank  1
    2    Joan   2
    3    Sui    2
    4    Jose   3
    5    Burt   4
    6    Kelly  .
    7    Juan   1

The following SAS program is submitted:

    data work.expertise;
        set work.levels;
        if level = . then expertise = 'Unknown';
        else if level = 1 then expertise = 'Low';
        else if level = 2 or 3 then expertise = 'Medium';
        else expertise = 'High';
    run;

Which of the following values does the variable EXPERTISE contain?
A. Low, Medium, and High only
B. Low, Medium, and Unknown only
C. Low, Medium, High, and Unknown only
D. Low, Medium, High, Unknown, and ' ' (missing character value)

The contents of the raw data file EMPLOYEE are listed below:

    --------10-------20-------30
    Ruth  39 11
    Jose  32 22
    Sue   30 33
    John  40 44

The following SAS program is submitted:

    data test;
        infile 'employee';
        input employee_name $ 1-4;
        if employee_name = 'Ruth' then input idnum 10-11;
        else input age 7-8;
    run;

Which one of the following values does the variable IDNUM contain when the name of the employee is "Ruth"?
A. 11
B. 22
C. 32
D. . (missing numeric value)

The contents of the raw data file EMPLOYEE are listed below:

    --------10-------20-------30
    Ruth  39 11
    Jose  32 22
    Sue   30 33
    John  40 44

The following SAS program is submitted:

    data test;
        infile 'employee';
        input employee_name $ 1-4;
        if employee_name = 'Sue' then input age 7-8;
        else input idnum 10-11;
    run;

Which one of the following values does the variable AGE contain when the name of the employee is "Sue"?
A. 30
B. 33
C. 40
D. . (missing numeric value)

The following SAS program is submitted:

    libname sasdata 'SAS-data-library';
    data test;
        set sasdata.chemists;
        if jobcode = 'Chem2' then description = 'Senior Chemist';
        else description = 'Unknown';
    run;

A value for the variable JOBCODE is listed below:

    JOBCODE
    chem2

Which one of the following values does the variable DESCRIPTION contain?
A. Chem2
B. Unknown
C. Senior Chemist
D. ' ' (missing character value)

The following SAS program is submitted:

    libname sasdata 'SAS-data-library';
    data test;
        set sasdata.chemists;
        if jobcode = 'chem3' then description = 'Senior Chemist';
        else description = 'Unknown';
    run;

A value for the variable JOBCODE is listed below:

    JOBCODE
    CHEM3

Which one of the following values does the variable DESCRIPTION contain?
A. chem3

B. Unknown
C. Senior Chemist
D. ' ' (missing character value)

Which one of the following ODS statement options terminates output being written to an HTML file?
A. END
B. QUIT
C. STOP
D. CLOSE

The following SAS program is submitted:

    proc means data = sasuser.shoes;
        where product in ('Sandal' , 'Slipper' , 'Boot');
    run;

Which one of the following ODS statements completes the program and sends the report to an HTML file?
A. ods html = 'sales.html';
B. ods file = 'sales.html';
C. ods file html = 'sales.html';
D. ods html file = 'sales.html';

The following SAS program is submitted:

    proc format;
        value score 1 - 50 = 'Fail'
                    51 - 100 = 'Pass';
    run;
    proc report data = work.courses nowd;
        column exam;
        define exam / display format = score.;
    run;

The variable EXAM has a value of 50.5. How will the EXAM variable value be displayed in the REPORT procedure output?
A. Fail
B. Pass
C. 50.5
D. . (missing numeric value)

The following SAS program is submitted:

    options pageno = 1;
    proc print data = sasuser.houses;
    run;
    proc means data = sasuser.shoes;
    run;

The report created by the PRINT procedure step generates 5 pages of output. What is the page number on the first page of the report generated by the MEANS procedure step?
A. 1
B. 2
C. 5
D. 6

Which one of the following SAS system options displays the time on a report?
A. TIME
B. DATE
C. TODAY
D. DATETIME

Which one of the following SAS system options prevents the page number from appearing on a report?
A. NONUM
B. NOPAGE
C. NONUMBER
D. NOPAGENUM

The following SAS program is submitted:

    footnote1 'Sales Report for Last Month';
    footnote2 'Selected Products Only';
    footnote3 'All Regions';
    footnote4 'All Figures in Thousands of Dollars';
    proc print data = sasuser.shoes;
        footnote2 'All Products';
    run;

Which one of the following contains the footnote text that is displayed in the report?
A. All Products
B. Sales Report for Last Month
   All Products
C. All Products
   All Regions
   All Figures in Thousands of Dollars
D. Sales Report for Last Month
   All Products
   All Regions
   All Figures in Thousands of Dollars

The following SAS program is submitted:

    proc means data = sasuser.houses std mean max;
        var sqfeet;
    run;

Which one of the following is needed to display the standard deviation with only two decimal places?
A. Add the option MAXDEC = 2 to the MEANS procedure statement.
B. Add the statement MAXDEC = 7.2; in the MEANS procedure step.
C. Add the statement FORMAT STD 7.2; in the MEANS procedure step.
D. Add the option FORMAT = 7.2 option to the MEANS procedure statement.

Unless specified, which variables and data values are used to calculate statistics in the MEANS procedure?
A. non-missing numeric variable values only
B. missing numeric variable values and non-missing numeric variable values only
C. non-missing character variables and non-missing numeric variable values only

D. missing character variables, non-missing character variables, missing numeric variable values, and non-missing numeric variable values

The following SAS program is submitted:

    proc sort data = sasuser.houses out = houses;
        by style;
    run;
    proc print data = houses;
    run;

The following report is produced:

    style     bedrooms  baths  price
    CONDO     2         1.5     80050
              3         2.5     79350
              4         2.5    127150
              2         2.0    110700
    RANCH     2         1.0     64000
              3         3.0     86650
              3         1.0     89100
              1         1.0     34550
    SPLIT     1         1.0     65850
              4         3.0     94450
              3         1.5     73650
    TWOSTORY  4         3.0    107250
              2         1.0     55850
              2         1.0     69250
              4         2.5    102950

Which of the following SAS statement(s) create(s) the report?
A. id style;
B. id style; var style bedrooms baths price;
C. id style; by style; var bedrooms baths price;
D. id style; by style; var style bedrooms baths price;

A realtor has two customers. One customer wants to view a list of homes selling for less than $60,000. The other customer wants to view a list of homes selling for greater than $100,000. Assuming the PRICE variable is numeric, which one of the following PRINT procedure steps will select all desired observations?
A. proc print data = sasuser.houses;
       where price lt 60000;
       where price gt 100000;
   run;
B. proc print data = sasuser.houses;
       where price lt 60000 or price gt 100000;
   run;
C. proc print data = sasuser.houses;

       where price lt 60000 and price gt 100000;
   run;
D. proc print data = sasuser.houses;
       where price lt 60000 or where price gt 100000;
   run;

The value 110700 is stored in a numeric variable. Which one of the following SAS formats is used to display the value as $110,700.00 in a report?
A. comma8.2
B. comma11.2
C. dollar8.2
D. dollar11.2

The SAS data set SASUSER.HOUSES contains a variable PRICE which has been assigned a permanent label of "Asking Price". Which one of the following SAS programs temporarily replaces the label "Asking Price" with the label "Sale Price" in the output?
A. proc print data = sasuser.houses;
       label price = "Sale Price";
   run;
B. proc print data = sasuser.houses label;
       label price "Sale Price";
   run;
C. proc print data = sasuser.houses label;
       label price = "Sale Price";
   run;
D. proc print data = sasuser.houses label = "Sale Price";
   run;

The SAS data set BANKS is listed below:

    BANKS
    name           rate
    FirstCapital   0.0718
    DirectBank     0.0721
    VirtualDirect  0.0728

The following SAS program is submitted:

    data newbank;
        do year = 1 to 3;
            set banks;
            capital + 5000;
        end;
    run;

Which one of the following represents how many observations and variables will exist in the SAS data set NEWBANK?
A. 0 observations and 0 variables
B. 1 observations and 4 variables
C. 3 observations and 3 variables
D. 9 observations and 2 variables

The following SAS program is submitted:

    data work.clients;
        calls = 6;
        do while (calls le 6);

            calls + 1;
        end;
    run;

Which one of the following is the value of the variable CALLS in the output data set?
A. 4
B. 5
C. 6
D. 7

The following SAS program is submitted:

    data work.pieces;
        do while (n lt 6);
            n + 1;
        end;
    run;

Which one of the following is the value of the variable N in the output data set?
A. 4
B. 5
C. 6
D. 7

The following SAS program is submitted:

    data work.sales;
        do year = 1 to 5;
            do month = 1 to 12;
                x + 1;
            end;
        end;
    run;

Which one of the following represents how many observations are written to the WORK.SALES data set?
A. 0
B. 1
C. 5
D. 60

A raw data record is listed below:

    --------10-------20-------30
    1999/10/25

The following SAS program is submitted:

    data projectduration;
        infile 'file-specification';
        input date $ 1 - 10;
    run;

Which one of the following statements completes the program above and computes the duration of the project in days as of today's date?
A. duration = today( ) - put(date,ddmmyy10.);
B. duration = today( ) - put(date,yymmdd10.);

C. duration = today( ) - input(date,ddmmyy10.);
D. duration = today( ) - input(date,yymmdd10.);

A raw data record is listed below:

    --------10-------20-------30
    Printing    750

The following SAS program is submitted:

    data bonus;
        infile 'file-specification';
        input dept $ 1 - 11 number 13 - 15;
    run;

Which one of the following SAS statements completes the program and results in a value of 'Printing750' for the DEPARTMENT variable?
A. department = trim(dept) number;
B. department = dept input(number,3.);
C. department = trim(dept) || put(number,3.);
D. department = input(dept,11.) || input(number,3.);

The following SAS program is submitted:

    data work.month;
        date = put('13mar2000'd,ddmmyy10.);
    run;

Which one of the following represents the type and length of the variable DATE in the output data set?
A. numeric, 8 bytes
B. numeric, 10 bytes
C. character, 8 bytes
D. character, 10 bytes

The following SAS program is submitted:

    data work.products;
        Product_Number = 5461;
        Item = '1001';
        Item_Reference = Item'/'Product_Number;
    run;

Which one of the following is the value of the variable ITEM_REFERENCE in the output data set?
A. 1001/5461
B. 1001/ 5461
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

The following SAS program is submitted:

    data work.retail;
        cost = '20000';
        total = .10 * cost;
    run;

Which one of the following is the value of the variable TOTAL in the output data set?
A. 2000
B. '2000'

C. . (missing numeric value)
D. ' ' (missing character value)

Which one of the following SAS statements correctly computes the average of four numerical values?
A. average = mean(num1 - num4);
B. average = mean(of num1 - num4);
C. average = mean(of num1 to num4);
D. average = mean(num1 num2 num3 num4);

The following SAS program is submitted:

    data work.test;
        Author = 'Agatha Christie';
        First = substr(scan(author,1,' ,'),1,1);
    run;

Which one of the following is the length of the variable FIRST in the output data set?
A. 1
B. 6
C. 15
D. 200

The following SAS program is submitted:

    data work.test;
        Author = 'Christie, Agatha';
        First = substr(scan(author,2,' ,'),1,1);
    run;

Which one of the following is the value of the variable FIRST in the output data set?
A. A
B. C
C. Agatha
D. ' ' (missing character value)

The following SAS program is submitted:

    data work.test;
        Title = 'A Tale of Two Cities, Charles J. Dickens';
        Word = scan(title,3,' ,');
    run;

Which one of the following is the value of the variable WORD in the output data set?
A. T
B. of
C. Dickens
D. ' ' (missing character value)

The following SAS program is submitted:

    data work.test;
        First = 'Ipswich, England';
        City_Country = substr(First,1,7)!!', '!!'England';
    run;

Which one of the following is the length of the variable CITY_COUNTRY in the output data set?
A. 6
B. 7

C. 17
D. 25

The following SAS program is submitted:

    data work.test;
        First = 'Ipswich, England';
        City = substr(First,1,7);
        City_Country = City!!', '!!'England';
    run;

Which one of the following is the value of the variable CITY_COUNTRY in the output data set?
A. Ipswich!!
B. Ipswich, England
C. Ipswich, 'England'
D. Ipswich , England

Which one of the following is true of the RETAIN statement in a SAS DATA step program?
A. It can be used to assign an initial value to _N_ .
B. It is only valid in conjunction with a SUM function.
C. It has no effect on variables read with the SET, MERGE and UPDATE statements.
D. It adds the value of an expression to an accumulator variable and ignores missing values.

A raw data file is listed below:

    --------10-------20-------30
    1901 2
    1905 1
    1910 6
    1925 .
    1941 1

The following SAS program is submitted and references the raw data file above:

    data coins;
        infile 'file-specification';
        input year quantity;
    run;

Which one of the following completes the program and produces a non-missing value for the variable TOTQUANTITY in the last observation of the output data set?
A. totquantity + quantity;
B. totquantity = sum(totquantity + quantity);
C. totquantity 0; sum totquantity;
D. retain totquantity 0; totquantity = totquantity + quantity;

A raw data file is listed below:

    --------10-------20-------30
    squash 1.10
    apples 2.25
    juice 1.69

The following SAS program is submitted using the raw data file above:

    data groceries;
        infile 'file-specification';
        input item $ cost;
    run;

Which one of the following completes the program and produces a grand total for all COST values?
A. grandtot = sum cost;
B. grandtot = sum(grandtot,cost);
C. retain grandtot 0; grandtot = sum(grandtot,cost);
D. grandtot = sum(grandtot,cost); output grandtot;

The following SAS program is submitted:

    data work.total;
        set work.salary(keep = department wagerate);
        by department;
        if first.department then payroll = 0;
        payroll + wagerate;
        if last.department;
    run;

The SAS data set WORK.SALARY, currently ordered by DEPARTMENT, contains 100 observations for each of 5 departments. Which one of the following represents how many observations the WORK.TOTAL data set contains?
A. 5
B. 20
C. 100
D. 500

The following SAS program is submitted:

    data work.total;
        set work.salary(keep = department wagerate);
        by department;
        if first.department then payroll = 0;
        payroll + wagerate;
        if last.department;
    run;

The SAS data set named WORK.SALARY contains 10 observations for each department, currently ordered by DEPARTMENT. Which one of the following is true regarding the program above?
A. The BY statement in the DATA step causes a syntax error.
B. FIRST.DEPARTMENT and LAST.DEPARTMENT are variables in the WORK.TOTAL data set.
C. The values of the variable PAYROLL represent the total for each department in the WORK.SALARY data set.
D. The values of the variable PAYROLL represent a total for all values of WAGERATE in the WORK.SALARY data set.

ANSWERS:

 1: d    2: a    3: c    4: d    5: d    6: b    7: b    8: b    9: d   10: d
11: b   12: a   13: b   14: d   15: d   16: b   17: b   18: d   19: d   20: c
21: d   22: b   23: c   24: b   25: a   26: a   27: c   28: b   29: d   30: c
31: b   32: d   33: c   34: b   35: d   36: c   37: d   38: d   39: a   40: b
41: d   42: a   43: b   44: d   45: d   46: c or d   47: a   48: c   49: a   50: d or c

What SAS statements would you code to read an external raw data file into a DATA step?
The INFILE statement (to identify the file) together with the INPUT statement (to read the values).

How do you read in the variables that you need?
Using the INPUT statement with column pointers and column ranges such as @5 or 12-17.

Are you familiar with special input delimiters? How are they used?
DLM= and DSD are the ones I have used. They are specified on the INFILE statement. Comma-separated values (CSV) files are a common type of file read with the DSD option. DSD treats two delimiters in a row as a missing value and ignores delimiters enclosed in quotation marks.

If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
By using the MISSOVER option on the INFILE statement. If some data lines are shorter than others, use the TRUNCOVER option on the INFILE statement.

What is the difference between an informat and a format? Name three informats or formats.
Informats read the data; formats write the data.
Informats: COMMAw., DOLLARw., DATEw., MMDDYYw., TIMEw., PERCENTw. Formats can have the same names as informats.
Formats: WORDDATE18., WEEKDATEw.

Name and describe three SAS functions that you have used, if any?
LENGTH: returns the length of an argument, not counting trailing blanks (missing values have a length of 1).
    Ex: a='my cat'; x=LENGTH(a); Result: x=6
SUBSTR: SUBSTR(arg,position,n) extracts a substring from an argument, starting at position, for n characters or until the end if n is omitted.
    Ex: data dsn; a='(916)734-6241'; x=SUBSTR(a,2,3); run; Result: x='916'
TRIM: removes trailing blanks from a character expression.
    Ex: a='my '; b='cat'; x=TRIM(a)||b; Result: x='mycat'
SUM: sum of nonmissing values.
    Ex: x=SUM(3,5,1); Result: x=9
INT: returns the integer portion of the argument.

How would you code the criteria to restrict the output to be produced?
Use the NOPRINT option.

What is the purpose of the trailing @ and the @@? How would you use them?
@ holds the record until the next INPUT statement or the end of the current DATA step iteration. @@ holds the record across iterations of the DATA step, until the end of the line is reached.

Double trailing @@: when you have multiple observations per line of raw data, use the double trailing at sign (@@) at the end of the INPUT statement. The line-hold specifier acts like a stop sign telling SAS: stop, hold that line of raw data.

ex:

    data dsn;
        input sex $ days;
        cards;
    F 53
    F 56
    F 60
    F 60
    F 78
    F 87
    F 102
    F 117
    F 134
    F 160
    F 277
    M 46
    M 52
    M 58
    M 59
    M 77
    M 78
    M 80
    M 81
    M 84
    M 103
    M 114
    M 115
    M 133
    M 134
    M 175
    M 175
    ;
    run;

The above program can be made shorter using @@:

    data dsn;
        input sex $ days @@;
        cards;
    F 53 F 56 F 60 F 60 F 78 F 87 F 102 F 117 F 134 F 160 F 277
    M 46 M 52 M 58 M 59 M 77 M 78 M 80 M 81 M 84 M 103 M 114
    M 115 M 133 M 134 M 175 M 175
    ;
    run;

Trailing @: by using @ without specifying a column, it is as if you are telling SAS, "stay tuned for more information. Don't touch that dial." SAS will hold the line of data until it reaches either the end of the DATA step or an INPUT statement that does not end with a trailing @.

Under what circumstances would you code a SELECT construct instead of IF statements?
When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.
SELECT group:
    SELECT: begins a SELECT group.
    WHEN: identifies SAS statements that are executed when a particular condition is true.
    OTHERWISE (optional): specifies a statement to be executed if no WHEN condition is met.
    END: ends a SELECT group.

What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file?
The FILE statement names the external file, and the PUT statement writes the record to it.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
PUT _INFILE_;

If you're not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?
DATA _NULL_;

What is the one statement to set the criteria of data that can be coded in any step?
The OPTIONS statement: it is part of the SAS program and affects all steps that follow it.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

How would you include common or reuse code to be processed along with your statements?
By using SAS macros.

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
INDEX. It searches the string for a specified substring and returns its position. INDEXC searches for any one of a set of characters, and SCAN extracts the nth word rather than locating a value.

If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Use the KEEP= data set option or the KEEP statement.

Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

    proc sort data=one;
        by State District County;
    run;

How would you delete duplicate observations?
The NODUPLICATES option of PROC SORT (documented as NODUPRECS, alias NODUP).
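The two PROC SORT de-duplication options named in these answers can be compared on a small data set. The following is only a sketch: the data set WORK.VISITS and its variables are made up for illustration, and BY _ALL_ is used in the second step so that identical observations end up next to each other.

```sas
/* Hypothetical data: one exact duplicate row (1 A) and one repeated key (id 1) */
data work.visits;
    input id visit $;
    datalines;
1 A
1 A
1 B
2 A
;
run;

/* NODUPKEY: keeps one observation per BY value, here one per id */
proc sort data=work.visits out=work.one_per_id nodupkey;
    by id;
run;

/* NODUPRECS: removes an observation only when it is identical to the
   previous one, so sort by all variables to make duplicates adjacent */
proc sort data=work.visits out=work.no_dup_rows noduprecs;
    by _all_;
run;
```

With this input, WORK.ONE_PER_ID keeps two observations and WORK.NO_DUP_ROWS keeps three, which shows why the two options answer different questions.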

How would you delete observations with duplicate keys?
The NODUPKEY option of PROC SORT.

How would you code a merge that will keep only the observations that have matches from both sets?
Use the IN= data set option on each data set in the MERGE statement, then use a subsetting IF statement that requires both IN= variables to be 1.

How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to another?
Step 1: Define 3 data sets in the DATA statement.
Step 2: Assign the IN= option of each of the 2 input data sets to a different variable.
Step 3: Check the condition using IF statements and output the matches to the first data set and the non-matches to the other data sets.
Ex (matches only):

    data xxx;
        merge yyy (in = inyyy) zzz (in = inzzz);
        by aaa;
        if inyyy = 1 and inzzz = 1;
    run;

What is the Program Data Vector (PDV)? What are its functions?
Function: to store the current observation. The PDV (Program Data Vector) is a logical area in memory where SAS creates a data set one observation at a time. When SAS processes a DATA step it has two phases: the compilation phase and the execution phase. During the compilation phase the input buffer is created to hold a record from the external file. After the input buffer is created, the PDV is created. The PDV is the area of memory where SAS builds the data set, one observation at a time. The PDV contains two automatic variables, _N_ and _ERROR_.
The Logical Program Data Vector (PDV) is a set of buffers that includes all variables referenced either explicitly or implicitly in the DATA step. It is created at compile time, then used at execution time as the location where the working values of variables are stored as they are processed by the DATA step program (source: http://www2.sas.com/proceedings/sugi24/Posters/p235-24.pdf).

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
SAS compiles the code.

At compile time when a SAS data set is read, what items are created?
Automatic variables are created, along with the input buffer, the PDV and the descriptor information.

Name statements that are recognized at compile time only.
Declarative statements such as DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, ATTRIB, LENGTH, ARRAY, BY, WHERE and RETAIN. (PUT is an executable statement, so it does not belong in this list.)

Name statements that are execution only.
INFILE, PUT.

Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN.

Name statements that function at both compile and execution time.
INPUT.

In the flow of DATA step processing, what is the first action in a typical DATA step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

What is _n_?
It is a DATA step counter variable in SAS.
Note: both the _N_ and _ERROR_ variables are always available to you in the DATA step. _N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily equal to the observation number, since a simple subsetting IF statement can change the relationship between the observation number and the number of iterations of the DATA step. The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0 if there is not.
Ex: _N_ is an implicit variable created by SAS during data processing. It gives the number of records SAS has iterated through in a data set. It is available only in the DATA step and not in PROCs. For example, to select every third record in a data set, _N_ can be used as follows (this keeps observations 1, 4, 7, ...):

    data new;
        set old;
        if mod(_n_,3) = 1;
    run;

Note: if a WHERE clause is used to subset the input, _N_ will not yield the required result, because it then counts only the observations that pass the WHERE condition.

How do I convert a numeric variable to a character variable?
You must create a differently-named variable using the PUT function.

How do I convert a character variable to a numeric variable?
You must create a differently-named variable using the INPUT function.

How can I compute the age of something?

Given two SAS date variables born and calc:

    age = int(intck('month',born,calc) / 12);
    if month(born) = month(calc) then age = age - (day(born) > day(calc));

How can I compute the number of months between two dates?
Given two SAS date variables begin and end:

    months = intck('month',begin,end) - (day(end) < day(begin));

How can I determine the position of the nth word within a character string?
Use a combination of the INDEXW and SCAN functions:

    pos = indexw(string,scan(string,n));

I need to reorder characters within a string... use SUBSTR?
You can do this using only one function call with TRANSLATE versus two function calls with SUBSTR. The following lines each move the first character of a 4-character string to the last:

    reorder = translate('2341',string,'1234');
    reorder = substr(string,2,3) || substr(string,1,1);

How can I put my SAS date variable so that December 25, 1995 would appear as '19951225' (with no separator)?
Use a combination of the YEAR. and MMDDYY. formats to simply display the value:

    put sasdate year4. sasdate mmddyy4.;

or use a combination of the PUT and COMPRESS functions to store the value (YYMMDD10. writes the date with hyphens, so the hyphen is the character to remove):

    newvar = compress(put(sasdate,yymmdd10.),'-');

How can I put my SAS time variable with a leading zero for hours 1-9?
Use a combination of the Z. and MMSS. formats:

    hrprint = hour(sastime);
    put hrprint z2. ':' sastime mmss5.;

INFILE OPTIONS
Prepared by Sreeja E V (sreeja@kreara.com), source: kreara.blogspot.com.

The INFILE statement has a number of options available.

FLOWOVER
FLOWOVER is the default option on the INFILE statement. When the INPUT statement reaches the end of the non-blank characters without having filled all variables, a new line is read into the input buffer and INPUT attempts to fill the rest of the variables starting from column one. The next time an INPUT statement is executed, a new line is brought into the input buffer.
Consider the following text file containing three variables id, type and amount.

    11101 A
    11102 A 100
    11103 B 43
    11104 C
    11105 C 67

The following SAS code uses the FLOWOVER option, which takes the value for a missing variable from the start of the next line.

    data B;
        infile "External file" flowover;
        input id $ type $ amount;
    run;

which creates the following data set

    id     type  amount
    11101  A     11102
    11103  B     43
    11104  C     11105

MISSOVER
When INPUT reads a short line, the MISSOVER option on the INFILE statement does not allow it to move to the next line. MISSOVER sets all the variables without values to missing.

    data B;
        infile "External file" missover;
        input id $ type $ amount;
    run;

which creates the following data set

    id     type  amount
    11101  A     .
    11102  A     100
    11103  B     43
    11104  C     .
    11105  C     67

TRUNCOVER
Causes the INPUT statement to read variable-length records where some records are shorter than the INPUT statement expects. Variables which are not assigned values are set to missing.

Difference between TRUNCOVER and MISSOVER
Both will assign missing values to variables if the data line ends before the variable's field starts. But when the data line ends in the middle of a variable's field, TRUNCOVER will take as much as is there, whereas MISSOVER will assign the variable a missing value.
Consider the text file below containing a character variable chr.

    a
    bb
    ccc
    dddd
    eeeee
    ffffff

Consider the following SAS code

    data trun;
        infile "External file" truncover;
        input chr $3. ;
    run;

When using the TRUNCOVER option we get the following data set

    chr
    a
    bb
    ccc
    ddd
    eee
    fff

    data miss;
        infile "External file" missover;
        input chr $3. ;
    run;

While using the MISSOVER option we get the output (the first two values are missing because those records are shorter than the three-character field)

    chr
    (missing)
    (missing)
    ccc
    ddd
    eee
    fff
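The TRUNCOVER versus MISSOVER comparison can be tried without an existing raw file. The sketch below is an illustration only: the fileref RAW and the data set names are assumptions, and the records are first written to a temporary file because in-stream DATALINES records are padded with blanks and would hide the difference.

```sas
/* Write short, variable-length records to a temporary file (fileref RAW is hypothetical) */
filename raw temp;

data _null_;
    file raw;
    put 'a' / 'bb' / 'ccc' / 'dddd';
run;

/* TRUNCOVER: takes whatever characters are present in a short record */
data trun;
    infile raw truncover;
    input chr $3.;
run;

/* MISSOVER: sets chr to missing when the record is shorter than 3 columns */
data miss;
    infile raw missover;
    input chr $3.;
run;

proc print data=trun; run;
proc print data=miss; run;
```

Comparing the two printed data sets shows the first two values present under TRUNCOVER and missing under MISSOVER.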

What SAS statements would you code to read an external raw data file into a DATA step?
We use the following SAS statements:
FILENAME, to specify the location of the file
INFILE, to identify the external file to read with an INPUT statement
INPUT, to specify the variables that the data is read into.

How do you read in the variables that you need?
Using the INPUT statement with column/line pointers, informats and length specifiers.

Are you familiar with special input delimiters? How are they used?
DLM and DSD are the special input delimiter options.
DELIMITER=delimiter(s) specifies an alternate delimiter (other than a blank) to be used for list input.
DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use list input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189

If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
Use the MISSOVER and TRUNCOVER options.
MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.
TRUNCOVER overrides the default behavior of the INPUT statement when an input data record is shorter than the INPUT statement expects. By default, the INPUT statement automatically reads the next input data record. TRUNCOVER enables you to read variable-length records when some records are shorter than the INPUT statement expects. Variables without any values assigned are set to missing.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189

What is the difference between an informat and a format? Name three informats or formats.
INFORMAT statement: associates informats with variables. It is basically used in INPUT / SQL CREATE TABLE statements to read external raw data or data that is not in a SAS format.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178244.htm
e.g. COMMAw., DATEw., WORDDATEw., DOLLARw., $VARYINGw.
FORMAT statement: associates formats with variables. It is basically used in DATA step FORMAT / SQL SELECT / procedure FORMAT statements to output SAS data to a file, report, etc.
Formats can look like informats but are differentiated by the statement in which they are used, e.g. DATEw., WORDDATEw., MMDDYYw.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178212.htm

Name and describe three SAS functions that you have used, if any?
The most common functions that would be used are:
Conversion functions: input / put / int / ceil / floor
Character functions: scan / substr / index / left / trim / compress / cat / catx / upcase / lowcase
Arithmetic functions: sum / abs
Attribute info functions: attrn / length
Data set functions: open / close / exist
Directory functions: dexist / dopen / dclose / dcreate / dinfo
File functions: fexist / fopen / filename / fileref
SQL functions: coalesce / count / sum / mean
Date functions: date / today / datdif / datepart / datetime / intck / mdy
Array functions: dim
http://sastechies.com/SASfunctions.php

How would you code the criteria to restrict the output to be produced?
Since it is not clear what the interviewer is referring to, possible answers include:
Global statement: OPTIONS OBS=
Data set option: OBS=
PROC SQL: the NOPRINT option for reporting; INOBS= and OUTOBS= for SQL SELECT
PROC DATASETS: the NOLIST option

What is the purpose of the trailing @ and the @@? How would you use them?
Line-hold specifiers keep the pointer on the current input record when
   a data record is read by more than one INPUT statement (trailing @)
   one input line has values for more than one observation (double trailing @)
   a record needs to be reread on the next iteration of the DATA step (double trailing @).

Use a single trailing @ to allow the next INPUT statement to read from the same record. Use a double trailing @ to hold a record for the next INPUT statement across iterations of the DATA step.

Normally, each INPUT statement in a DATA step reads a new data record into the input buffer. When you use a trailing @, the following occurs:
   The pointer position does not change.
   No new record is read into the input buffer.
   The next INPUT statement for the same iteration of the DATA step continues to read the same record rather than a new one.

SAS releases a record held by a trailing @ when
   a null INPUT statement executes: input;
   an INPUT statement without a trailing @ executes
   the next iteration of the DATA step begins.

Normally, when you use a double trailing @ (@@), the INPUT statement for the next iteration of the DATA step continues to read the same record. SAS releases the record that is held by a double trailing @
   immediately if the pointer moves past the end of the input record
   immediately if a null INPUT statement executes: input;
   when the next iteration of the DATA step begins if an INPUT statement with a single trailing @ executes later in the DATA step: input @;

Comparison of the two line-hold specifiers:

Double trailing @@: A record held by the double trailing at sign (@@) is not released until the input pointer moves past the end of the record (the input pointer then moves down to the next record), or until an INPUT statement without a line-hold specifier executes, for example:
   input ID $4. @@;
   . . .
   input Department 5.;

Single trailing @: Enables the next INPUT statement to read from the same record, and releases the current record when a subsequent INPUT statement executes without a line-hold specifier. Unlike the @@, the single @ also releases a record when control returns to the top of the DATA step for the next iteration.

Example: reading repeating fields with a single trailing @

   data perm.sales97;
      infile data97 missover;
      input ID $4. @;
      do Quarter=1 to 4;
         input Sales : comma. @;
         output;
      end;
   run;

Raw Data File Data97 (values as far as they can be recovered)
   1---+----10---+----20---+----30---+----40
   0734 1,323.34 2,472.85 3,276.65 5,345.52
   0943 1,908.34 2,560.38
   1009 2,934.12 3,308.41 4,176.18 7,581.81

Example: reading a hierarchical file, one observation per person

   data perm.people (drop=type);
      infile census;
      retain Address;
      input type $1. @;
      if type='H' then input @3 Address $15.;
      if type='P';
      input @3 Name $10. @13 Age 3. @15 Gender $1.;
   run;

Raw data file Census (first household)
   H 321 S. MAIN ST
   P MARY E 21 F
   P WILLIAM M 23 M
   P SUSAN K 3 F

Example: reading a hierarchical file, one observation per household

   data perm.residnts;
      infile census;
      retain Address;
      input type $1. @;
      if type='H' then do;
         if _n_ > 1 then output;
         Total=0;
         input Address $ 3-17;
      end;
      else if type='P' then total+1;
   run;

Raw data file Census
   H 321 S. MAIN ST
   P MARY E 21 F
   P WILLIAM M 23 M
   P SUSAN K 3 F
   H 324 S. MAIN ST
   P THOMAS H 79 M
   P WALTER S 46 M
   P ALICE A 42 F
   P MARYANN A 20 F
   P JOHN S 16 M
   H 325A S. MAIN ST
   P JAMES L 34 M
   P LIZA A 31 F
   H 325B S. MAIN ST
   P MARGO K 27 F
   P WILLIAM R 27 M
   P ROBERT W 1 M
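The examples above all use the single trailing @. As a minimal sketch of the double trailing @@, the step below reads several observations from each input line. The data set name, variable names, and inline data are hypothetical and chosen only for illustration.

```sas
/* Hypothetical inline data: three ID/Score pairs per input line */
data work.scores;
   /* @@ holds the current line across DATA step iterations, so each  */
   /* iteration picks up the next ID/Score pair from the same record. */
   input ID $ Score @@;
   datalines;
A01 92 A02 78 A03 85
A04 64 A05 71 A06 99
;
run;

proc print data=work.scores;
run;
```

With a single trailing @ (or none), only the first pair on each line would be read, because the record is released when the next iteration begins.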

Under what circumstances would you code a SELECT construct instead of IF statements? The SELECT statement begins a SELECT group. SELECT groups contain WHEN statements that identify SAS statements that are executed when a particular condition is true. Use at least one WHEN statement in a SELECT group. An optional OTHERWISE statement specifies a statement to be executed if no WHEN condition is met. An END statement ends a SELECT group.
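As a rough illustration of the SELECT group described above, the sketch below classifies a hypothetical Score variable. The data set name, variable names, and cutoffs are assumptions made for this example only.

```sas
data work.graded;
   input Name $ Score;
   length Grade $ 12;
   select;
      when (Score >= 90) Grade = 'Excellent';
      when (Score >= 60) Grade = 'Pass';
      when (Score = .)   ;                 /* null statement: condition is recognized, no action taken */
      otherwise          Grade = 'Fail';   /* runs only if no WHEN condition is true */
   end;
   datalines;
Ann 95
Bob 72
Cy .
Dee 41
;
run;
```

The WHEN conditions are evaluated in order and only the first true one executes, which is why the missing-value check can sit after the range checks here: a missing Score fails both comparisons and falls through to the null statement.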

Null statements that are used in WHEN statements cause SAS to recognize a condition as true without taking further action. Null statements that are used in OTHERWISE statements prevent SAS from issuing an error message when all WHEN conditions are false. Using SELECT-WHEN improves processing efficiency and readability in programs that need to check a series of conditions for the same variable. Use IF-THEN/ELSE statements for programs with few statements. Using a subsetting IF statement without a THEN clause could be dangerous because it would process only those records that meet the condition specified in the IF clause.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201966.htm

What statement do you code to tell SAS that it is to write to an external file?
FILENAME / FILE / PUT
The FILENAME statement is an optional statement that specifies the location of the external file.
The FILE statement specifies the current output file for PUT statements in the DATA step. When multiple FILE statements are present, the PUT statement builds and writes output lines to the file that was specified in the most recent FILE statement. If no FILE statement was specified, the PUT statement writes to the SAS log. The specified output file must be an external file, not a SAS data library, and it must be a valid access type.
The PUT statement writes the variable values to the external file.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
Use the _INFILE_ specification in the PUT statement:

   filename some 'c:\cool.dat';
   filename cool1 'c:\cool1.dat';
   data _null_;
      infile some;
      input;            /* null INPUT: load the record into the input buffer */
      file cool1;
      put _infile_;     /* write the whole record as read */
   run;

Q. (1) Which SAS statement below will change the characteristics of a variable if it was used in a data step?
A. SCAN
B. ATTRIB
C. FORMAT
D. PUT
E. ARRAY

(2) The following SAS program is submitted:

   libname sasdata 'SAS-data-library';
   data test;
      set sasdata.chemists;
      if jobcode = 'chem3' then description = 'Senior Chemist';
      else description = 'Unknown';
   run;

A value for the variable JOBCODE is listed below:
   JOBCODE
   CHEM3
Which one of the following values does the variable DESCRIPTION contain?
A. chem3
B. Senior Chemist
C. Unknown
D. (missing character value)

Predictive Modeling Certification Question:
(1) Which SAS Enterprise Miner tool would you use to exclude certain observations in your data source, such as extreme outliers, from your analysis?
a. The Input Data tool
b. The Filter tool
c. The Data Partition tool
d. The Explore window

(2) Which SAS Enterprise Miner tool can be used to automatically explore alternative network architectures and hidden unit counts?
a. AutoNeural
b. DMNeural
c. Neural Network
d. Rule Induction

(3) Which of the following statements about assessing model performance using the Model Comparison tool is true?
a. Unless a profit matrix is defined, the Model Comparison tool selects the model with the smallest validation misclassification rate by default.
b. The Model Comparison tool calculates values for up to three statistics at a time.
c. For all fit statistics that the Model Comparison tool generates, the highest value indicates the best fit.
d. The Model Comparison tool appears on the Explore tab.

Interview Question:
Q1. When reading data, if you do not specify the length of a variable, what is the default length?
Q2. Can the trailing @ control be used in list, column, or formatted INPUT statements?
Q3. In which method, MERGE with a BY statement or the SQL procedure, will you get a warning message when combining common variables from data sets unless you select variables from individual data sets?

Q4. Mr Raj, a SAS programmer, has written the following code:
   data iisastr;
      if age>25 then drop sex;
   run;
Do you think the above program will run without any error? Please answer with a reason.
Q5. Can a WHERE statement be applied to DATA steps with an INPUT statement?
Q6. Ms Lily has written the following code:
   data iisastr;
      set iisastr_delhi;
   run;
Do you think that, during the execution phase of the above program, SAS will create an input buffer to store the values of the observation?

How might you use MOD and INT on numerics to mimic SUBSTR on character strings?
A) The first argument to the MOD function is a numeric, the second is a non-zero numeric; the result is the remainder when argument-1 is divided by argument-2. The INT function takes only one argument and returns the integer portion of that argument, truncating the decimal portion. Note that the argument can be an expression. Dividing by a power of 10 and taking INT drops trailing digits; taking MOD by a power of 10 keeps trailing digits.

   DATA NEW ;
      A = 123456 ;
      X = INT( A/1000 ) ;              /* first three digits  */
      Y = MOD( A, 1000 ) ;             /* last three digits   */
      Z = MOD( INT( A/100 ), 100 ) ;   /* digits three and four */
      PUT A= X= Y= Z= ;
   RUN ;

Result: A=123456 X=123 Y=456 Z=34

In ARRAY processing, what does the DIM function do?
A) DIM returns the number of elements in the array. When DIM is used as the stop value of an iterative DO statement, you do not have to respecify the stop value if you change the dimension of the array.

How would you determine the number of missing or nonmissing values in computations?

A) To determine the number of missing values that are excluded in a computation, use the NMISS function.

   data _null_;
      m=.; y=4; z=0;
      N = N(m, y, z);
      NMISS = NMISS(m, y, z);
      put N= NMISS=;
   run;

The above program results in N = 2 (number of nonmissing values) and NMISS = 1 (number of missing values).

Do you need to know if there are any missing values?
A) The MISSING function accepts a single argument and returns 1 if that value is missing and 0 otherwise: is_missing=MISSING(field1); To test several numeric fields at once, use any_missing=(NMISS(field1,field2,field3) > 0); which is 1 if at least one field is missing and 0 if none are. If you need to know how many missing values you have, use num_missing=NMISS(field1,field2,field3); You can also find the number of nonmissing values with non_missing=N(field1,field2,field3);

What is the difference between: x=a+b+c+d; and x=SUM(of a, b, c, d);?
A) Is anyone wondering why you wouldn't just use total=field1+field2+field3; First, how do you want missing values handled? The SUM function returns the sum of the nonmissing values. If you choose addition, you will get a missing value for the result if any of the fields are missing. Which one is appropriate depends upon your needs. However, there is an advantage to using the SUM function even if you want the result to be missing. If you have more than a couple of fields, you can often use shortcuts in writing the field names, such as total=SUM(of field1-field3); for sequentially numbered fields. If your fields are not numbered sequentially but are stored together in the program data vector, then you can use total=SUM(of fielda--zfield); Just make sure you remember the OF and the double dashes, or your code will run but you won't get your intended results. MEAN is another function that calculates differently from the written-out formula when there are missing values.

There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985. How would you accomplish this in data step code? Using only PROC FORMAT.

   data new;
      input date ddmmyy10.;
      cards;
   01/05/1955
   01/09/1970
   01/12/1975
   19/10/1979
   25/10/1982
   10/10/1988
   27/12/1991
   ;
   run;

   proc format;
      value dat
         low -< '01jan1975'd          = [date7.]
         '01jan1975'd - '31dec1985'd  = 'Disco Years'
         '31dec1985'd <- high         = [date9.];
   run;

   proc print;
      format date dat.;
   run;

An existing format used as a label must be enclosed in square brackets, and the ranges must not overlap.

In the following DATA step, what is needed for 'fraction' to print to the log?

   data _null_;
      x=1/3;
      if x=.3333 then put 'fraction';
   run;

A) The stored value of 1/3 is not exactly .3333, so the comparison is false. Round before comparing, for example: if round(x,.0001)=.3333 then put 'fraction';

What is the difference between calculating the 'mean' using the MEAN function and PROC MEANS?
A) By default PROC MEANS calculates summary statistics such as N, mean, standard deviation, minimum, and maximum for a variable across observations, whereas the MEAN function computes only the mean of its arguments within a single observation.

What are some differences between PROC SUMMARY and PROC MEANS?
PROC MEANS by default writes its results to the output window; you can suppress this with the NOPRINT option and send the results to a data set with the OUTPUT OUT= statement. PROC SUMMARY does not print by default; you have to supply an OUTPUT statement explicitly, or specify the PRINT option to see the results.

What is a problem with merging two data sets that have variables with the same name but different data?
A) Understanding the basic algorithm of MERGE will help you understand how the step processes. There are still a few common scenarios whose results sometimes catch users off guard. Here are a few of the most frequent 'gotchas':

1- BY variables have different lengths
It is possible to perform a MERGE when the lengths of the BY variables are different. But if the data set with the shorter version is listed first on the MERGE statement, the shorter length will be used for the length of the BY variable during the merge. Due to this shorter length, truncation occurs and unintended combinations could result. In Version 8, a warning is issued to point out this data integrity risk. The warning is issued regardless of which data set is listed first:
WARNING: Multiple lengths were specified for the BY variable name by input data sets. This may cause unexpected results.
Truncation can be avoided by naming the data set with the longest length for the BY variable first on the MERGE statement, but the warning message is still issued. To prevent the warning, check with PROC CONTENTS that the BY variables have the same length before combining them in the MERGE step. You can change the variable length with either a LENGTH statement in the merge DATA step prior to the MERGE statement, or by recreating the data sets to have identical lengths for the BY variables.
Note: Do not put a MERGE and an IF-THEN statement in the same DATA step if the IF-THEN statement involves two variables that come from two different merging data sets. If it is not completely clear when MERGE and IF-THEN can share a DATA step and when they cannot, it is best to always separate them into different DATA steps. Following this recommendation avoids a common source of incorrect merge results.

Which data set is the controlling data set in the MERGE statement?
A) SAS does not designate a controlling data set in a MERGE. For variables with the same name, the value from the data set listed last on the MERGE statement overwrites the others. If one data set should control which observations are kept, use its IN= variable in a subsetting IF.

How do the IN= variables improve the capability of a MERGE?
A) The IN= variables. What if you want to keep in the output data set of a merge only the matches (only those observations to which both input data sets contribute)?
SAS will set up for you special temporary variables, called the "IN=" variables, so that you can do this and more. Here's what you have to do:
   signal to SAS on the MERGE statement that you need the IN= variables for the input data set(s)
   use the IN= variables in the data step appropriately.
So to keep only the matches in the match-merge above, ask for the IN= variables and use them:

   data three;
      merge one(in=x) two(in=y);   /* x & y are your choices of names  */
      by id;                       /* for the IN= variables for data   */
      if x=1 and y=1;              /* sets one and two respectively    */
   run;

What techniques and/or PROCs do you use for tables?
A) PROC FREQ, PROC UNIVARIATE, PROC TABULATE and PROC REPORT.

Do you prefer PROC REPORT or PROC TABULATE? Why?
A) I prefer to use PROC REPORT until I have to create cross-tabulation tables, because it gives me so many options to modify the look of my table (e.g., the WIDTH= option, which changes the width of each column in the table), whereas PROC TABULATE is unable to produce some of the things I need in my table. For example, TABULATE doesn't produce n (%) in the desired format.

How experienced are you with customized reporting and use of DATA _NULL_ features?
A) I have very good experience in creating customized reports as well as with the DATA _NULL_ step. It is a DATA step that generates a report without creating a data set, which saves development time. Another advantage of DATA _NULL_ is that any syntax errors in the statements are detected and written to the log, so errors can be found by checking the log after submitting. It is also used to create macro variables in the DATA step.

What is the difference between nodup and nodupkey options?
A) NODUP compares all the variables in our data set, while NODUPKEY compares just the BY variables.

What is the difference between compiler and interpreter? Give any one example (software product) that acts as an interpreter?
A) Both achieve a similar purpose but differ in how they achieve it. An interpreter translates instructions one at a time and executes each instruction immediately. A compiler takes the whole program (source) written in the SAS programming language and translates it into object code or machine language before execution. Compiled code does the work much more efficiently, because the complete machine language program is produced first and can then be executed.

Under what circumstances would you code a SELECT construct instead of IF statements?
A: I think SELECT statements are used when you are comparing one variable against several conditions, like:

   select;
      when (Physics > 60) Result = 'Pass';
      when (Math > 100)   Result = 'Pass';
      when (English = 50) Result = 'Pass';
      otherwise           Result = 'Fail';
   end;

What is the one statement to set the criteria of data that can be coded in any step?
A) OPTIONS statement. The WHERE statement is another common answer, since it subsets observations in PROC steps and in DATA steps that read SAS data sets.
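The NODUP / NODUPKEY answer above can be sketched with a small hypothetical data set. The names work.visits, ID, Visit, and Dose are assumptions made for this example.

```sas
data work.visits;
   input ID $ Visit Dose;
   datalines;
P1 1 10
P1 1 10
P1 2 20
P2 1 10
;
run;

/* NODUPKEY: keeps only the first observation for each value of the BY variable(s) */
proc sort data=work.visits out=work.one_per_id nodupkey;
   by ID;
run;

/* NODUPRECS (alias NODUP): drops an observation only when it matches */
/* the previous observation on every variable                         */
proc sort data=work.visits out=work.no_dup_rows noduprecs;
   by ID;
run;
```

Here work.one_per_id ends up with one row per ID, while work.no_dup_rows loses only the repeated "P1 1 10" row. Because NODUPRECS compares adjacent observations, identical rows that are not next to each other after sorting can survive; sorting by all variables avoids that.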
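What is the effect of the OPTIONS statement ERRORS=1?
A) ERRORS= sets the maximum number of observations for which complete data error messages are printed in the log; with ERRORS=1, the full error message is printed for only the first observation with a data error. (It is the automatic variable _ERROR_ that has a value of 1 if there is an error in the data for that observation and 0 if there is not.)

What's the difference between VAR A1 - A4 and VAR A1 -- A4?
A: VAR A1-A4 is a numbered range list: it refers to A1, A2, A3 and A4, wherever they are positioned in the data set. VAR A1--A4 is a name range list: it refers to every variable positioned between A1 and A4 in the program data vector, whatever their names. The two give the same result only when A1 through A4 are stored next to each other with nothing in between. If you submit VAR A1---A4, you will see an error message in the log.

What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
It means that SAS performed an automatic conversion because a numeric value was used where a character value was required, for example as the argument of a character function. Automatic numeric-to-character conversion uses the BEST12. format, so the result may contain leading blanks; use the PUT function to control the conversion explicitly.

Why is a STOP statement needed for the POINT= option on a SET statement?
Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially. Without STOP, the DATA step could loop continuously.

How do you control the number of observations and/or variables read or written?
FIRSTOBS= and OBS= options for observations; KEEP= and DROP= for variables.

Approximately what date is represented by the SAS date value of 730?
31st December 1961 (SAS date 0 is 1 January 1960, and 1960 was a leap year).

Identify statements whose placement in the DATA step is critical.
A: INPUT, DATA and RUN

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain.
A) Compile. Each DATA step is compiled in full before it is executed.

What does the RUN statement do?
A) RUN marks the end of a DATA or PROC step and tells SAS to execute the step. If a step is followed by another DATA or PROC statement, that statement also acts as a step boundary, so RUN can be omitted for every step except the last one, although coding it explicitly is clearer.

Why is SAS considered self-documenting?
A) SAS is considered self-documenting because at compilation time it creates and stores descriptive information inside the data set, such as the date and time the data set was created, the number of variables, and their labels. You can look at that information using the PROC CONTENTS procedure.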
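To illustrate the POINT= and STOP answer above, here is a minimal sketch that reads every tenth observation by direct access. It assumes the sample data set sashelp.class is available; the output data set name and the variable names pick and total are hypothetical.

```sas
data work.sample;
   /* NOBS= is assigned at compile time, so total is known before SET executes */
   do pick = 1 to total by 10;
      set sashelp.class point=pick nobs=total;   /* direct access by observation number */
      output;
   end;
   stop;   /* no end-of-file is detected with POINT=, so end the step explicitly */
run;
```

The POINT= and NOBS= variables are temporary and are not written to work.sample.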

What are some good SAS programming practices for processing very large data sets?
A) Sort them once, and use FIRSTOBS= and OBS= to test on a subset of the data.

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) Functions are used inside the DATA step and work across variables within each observation of the same data set, whereas PROCs work across observations and can create new data sets to hold the results.

If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
A) It depends: I would use PROC TRANSPOSE if there are few variables and arrays if there are many.

What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
A) On unsorted data you can't use FIRST. or LAST. directly. Sort the data by the BY variable first or, if the observations are already grouped but not in sorted order, add the NOTSORTED option to the BY statement.

How do you debug and test your SAS programs?
A) The first thing is to look in the log for errors, warnings and, in some cases, notes, or to use the DATA step debugger.

What other SAS features do you use for error trapping and data validation?
A) Check the log, and for data validation use PROC FREQ, PROC MEANS, or sometimes PROC PRINT to see what the data looks like.

How would you combine 3 or more tables with different structures?
A) I think I would sort them by their common variables and use a MERGE statement. I am not sure what you mean by different structures.

What areas of SAS are you most interested in?
BASE, STAT, GRAPH, ETS

Briefly describe 5 ways to do a "table lookup" in SAS.
Match merging, direct access, format tables, arrays, PROC SQL

What versions of SAS have you used (on which platforms)?
SAS 8.2 in Windows and UNIX, SAS 7 and 6.12
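For the FIRST. / LAST. question above, the usual pattern is to sort and then use a BY statement. The sketch below counts observations per group; it assumes the sample data set sashelp.class, and the output data set names are hypothetical.

```sas
/* Sort so that observations with the same BY value are adjacent */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

data work.sex_counts;
   set work.class_sorted;
   by Sex;                        /* creates the temporary first.Sex and last.Sex flags */
   if first.Sex then Count = 0;   /* reset at the start of each BY group */
   Count + 1;                     /* sum statement: Count is retained automatically */
   if last.Sex then output;       /* one observation per BY group */
   keep Sex Count;
run;
```

If the input is already grouped but not in sorted order, `by Sex notsorted;` provides the same flags without the sort step.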

What are some good SAS programming practices for processing very large data sets?
Sampling with the OBS= option or subsetting, commenting the code, and using DATA _NULL_ when no output data set is needed.

What are some problems you might encounter in processing missing values? In Data steps? Arithmetic? Comparisons? Functions? Classifying data?
Any arithmetic operation on a missing value results in a missing value, whereas functions such as SUM ignore missing arguments. In comparisons, a numeric missing value is smaller than any number. Most SAS statistical procedures exclude observations with any missing variable values from an analysis.

How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
Using PROC TRANSPOSE

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A PROC has a wider scope, working across observations, and its results can be sent to a different data set. Functions work within each observation of the data set being processed.

If you were told to create many records from one record, show how you would do this using array and with PROC TRANSPOSE?
Declare an array over the variables in the record and write one observation per element inside a DO loop, or use PROC TRANSPOSE with a VAR statement.

What are _numeric_ and _character_ and what do they do?
They are special name lists that refer to all numeric and all character variables currently defined in the data set, for reading or writing.

How would you create multiple observations from a single observation?
Using the double trailing @@ when reading raw data.

For what purpose would you use the RETAIN statement?
The RETAIN statement is used to hold the values of variables across iterations of the DATA step. Normally, variables created by INPUT or assignment statements are set to missing at the start of each iteration of the DATA step.

What is the order of evaluation of the operators + - * / ** ()?
(), **, * and /, + and -

How could you generate test data with no input data?
Using DATA _NULL_ and the PUT statement

How do you debug and test your SAS programs?
Using OBS=0 and system options to trace the program execution in the log.

What can you learn from the SAS log when debugging?
It displays the execution of the whole program and its logic. It also displays each error with its line number so that you can find and edit the program.

What is the purpose of _error_?
It has only two values: 1 for error and 0 for no error.

How can you put a "trace" in your program?
By using ODS TRACE ON

How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs?
Missing values are assigned as missing in assignment statements. In sort order, numeric missing values are smaller than any number: ._ (underscore) sorts first, then . (period), then .A through .Z.

How do you test for missing values?
Using conditional statements such as IF-THEN/ELSE, WHERE and SELECT, comparing against . for numeric or blank for character values.

How are numeric and character missing values represented internally?
Character as a blank and numeric as a period (.).

Which date function advances a date, time or date/time value by a given interval?
INTNX.

In the flow of DATA step processing, what is the first action in a typical DATA step?
When you submit a DATA step, SAS processes the DATA step and then creates a new SAS data set. The compilation phase comes first (creation of the input buffer and the PDV), followed by the execution phase.

What are SAS/ACCESS and SAS/CONNECT?
SAS/ACCESS provides interfaces to data held in external databases such as Oracle, SQL Server, and MS Access. SAS/CONNECT provides client/server connections between SAS sessions on different machines.

What is the one statement to set the criteria of data that can be coded in any step?
OPTIONS statement, LABEL statement, KEEP / DROP statements.

What is the purpose of using the N=PS option?
The N=PS option creates a buffer in memory that is large enough to store PAGESIZE (PS) lines and enables a page to be formatted randomly prior to it being printed.

What are the scrubbing procedures in SAS?
PROC SORT with the NODUPKEY option, because it eliminates the duplicate values.

What are the new features included in the new version of SAS i.e., SAS 9.1.3?
The main advantage of version 9 is faster execution of applications and centralized access to data and support. Many changes have been made in version 9 compared with version 8. The following are a few:
   SAS version 9 supports format names longer than 8 bytes, which was not possible in version 8.
   The maximum length of a numeric format name is 32 in version 9, compared with 8 in version 8.
   The maximum length of a character format name is 31 in version 9, compared with 8 in version 8.
   The maximum length of a numeric informat name is 31 in version 9, compared with 8 in version 8.
   The maximum length of a character informat name is 30 in version 9, compared with 8 in version 8.
   Three new informats are available in version 9 to convert various date, time and datetime forms of data into a SAS date or SAS time:
      ANYDTDTEw. converts to a SAS date value.
      ANYDTTMEw. converts to a SAS time value.
      ANYDTDTMw. converts to a SAS datetime value.
   The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step, trimming trailing blanks and automatically converting numeric values to character.
   A new ODS option (the column option) is included to create multiple columns in the output.

What differences did you find among versions 6, 8 and 9 of SAS?
The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the Metadata Server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.

What has been your most common programming mistake?
Missing semicolons, not checking the log after submitting a program, not using debugging techniques, and not using the FSVIEW option rigorously.

Name several ways to achieve efficiency in your program. Explain trade-offs.
Efficiency and performance strategies can be classified into 5 different areas:
   CPU time
   Data storage
   Elapsed time
   Input/output
   Memory
CPU time and elapsed time are the baseline measurements.
A few examples of efficiency violations:
   Retaining unwanted data sets
   Not subsetting early to eliminate unwanted records.
Efficiency-improving techniques:
   Use KEEP and DROP statements to retain only the necessary variables.
   Use macros to reduce the amount of code.
   Use IF-THEN/ELSE statements rather than a series of independent IF statements.
   Use the SQL procedure to reduce the number of programming steps.
   Use LENGTH statements to reduce variable sizes and so reduce data storage.
   Use DATA _NULL_ steps when no output data set is needed, to save data storage.

What other SAS products have you used and consider yourself proficient in using?
DATA _NULL_ steps, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT, PROC UNIVARIATE, etc.

What is the significance of the 'OF' in X=SUM(OF a1-a4, a6, a9);
If you don't use the OF keyword, the argument list might not be interpreted as you expect. For example, without OF the function above calculates the sum of a1 minus a4, plus a6 and a9, and not the whole sum of a1 through a4, a6 and a9. The same is true for the MEAN function.
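The effect of the OF keyword described above can be checked with a short DATA _NULL_ step. The values assigned here are hypothetical and chosen so that the two results are easy to tell apart.

```sas
data _null_;
   a1 = 10; a2 = 20; a3 = 30; a4 = 40; a6 = 5; a9 = 1;
   with_of    = sum(of a1-a4, a6, a9);   /* variable list: 10+20+30+40+5+1 = 106 */
   without_of = sum(a1-a4, a6, a9);      /* subtraction:   (10-40)+5+1     = -24 */
   put with_of= without_of=;
run;
```

The log shows with_of=106 without_of=-24: without OF, a1-a4 is evaluated as an arithmetic expression before SUM is called.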

What do the PUT and INPUT functions do?
The INPUT function converts character data values to numeric values. The PUT function converts numeric values to character values.
For INPUT: INPUT(source, informat)
For PUT: PUT(source, format)
Note that the INPUT function requires an informat and the PUT function requires a format. If we omit the INPUT or the PUT function during data conversion, SAS will detect the mismatched variables and will try an automatic character-to-numeric or numeric-to-character conversion. But sometimes this doesn't work, because characters such as the $ sign prevent the conversion. Therefore it is always advisable to include INPUT and PUT functions in your programs when conversions occur.

Which date function advances a date, time or datetime value by a given interval?
INTNX: advances a date, time, or datetime value by a given interval, and returns a date, time, or datetime value. Ex: INTNX(interval, start-from, number-of-increments, alignment)
INTCK: INTCK(interval, start-of-period, end-of-period) is an interval function that counts the number of intervals between two given SAS dates, times and/or datetimes.
DATETIME() returns the current date and time of day.
DATDIF(sdate, edate, basis) returns the number of days between two dates.

What do the MOD and INT functions do? What do the PAD and DIM functions do?
MOD: MOD(value, modulo) returns the remainder after the numeric value is divided by the modulo, which is a constant or numeric variable.
INT: returns the integer portion of a numeric value, truncating the decimal portion.
PAD: pads each record with blanks so that all data lines have the same length. It is an option of the INFILE statement rather than a function, and is useful only when missing data occurs at the end of the record.
CATX: concatenates character strings, removes leading and trailing blanks, and inserts separators.
SCAN: returns a specified word from a character value. The SCAN function assigns a length of 200 to each target variable.
SUBSTR: extracts a substring or replaces character values.
   Extraction of a substring: Middleinitial=substr(middlename,1,1);
   Replacing character values: substr(phone,1,3)='433';
   If the SUBSTR function is on the left side of an assignment statement, it replaces the contents of the character variable.
TRIM: trims the trailing blanks from character values.

SCAN vs. SUBSTR: SCAN extracts words within a value that are marked by delimiters. SUBSTR extracts a portion of the value by stating the specific location.

It is best used when we know the exact position of the sub string to extract from a character value. How might you use MOD and INT on numeric to mimic SUBSTR on character Strings? The first argument to the MOD function is a numeric, the second is a non-zero numeric; the result is the remainder when the integer quotient of argument-1 is divided by argument-2. The INT function takes only one argument and returns the integer portion of an argument, truncating the decimal portion. Note that the argument can be an expression. DATA NEW ; A = 123456 ; X = INT( A/1000 ) ; Y = MOD( A, 1000 ) ; Z = MOD( INT( A/100 ), 100 ) ; PUT A= X= Y= Z= ; RUN ; A=123456 X=123 Y=456 Z=34 In ARRAY processing, what does the DIM function do? DIM: It is used to return the number of elements in the array. When we use Dim function we would have to re specify the stop value of an iterative DO statement if u change the dimension of the array. How would you determine the number of missing or nonmissing values in computations? To determine the number of missing values that are excluded in a computation, use the NMISS function. data _null_; m=.; y=4; z=0; N = N(m , y, z); NMISS = NMISS (m , y, z); run; The above program results in N = 2 (Number of non missing values) and NMISS = 1 (number of missing values). Do you need to know if there are any missing values? Just use: missing_values=MISSING(field1,field2,field3); This function

simply returns 0 if there aren't any or 1 if there are missing values. If you need to know how many missing values you have then use num_missing=NMISS(field1,field2,field3); You can also find the number of non-missing values with non_missing=N (field1,field2,field3); What is the difference between: x=a+b+c+d; and x=SUM (of a, b, c ,d);? Is anyone wondering why you wouldnt just use total=field1+field2+field3; First, how do you want missing values handled? The SUM function returns the sum of non-missing values. If you choose addition, you will get a missing value for the result if any of the fields are missing. Which one is appropriate depends upon your needs. However, there is an advantage to use the SUM function even if you want the results to be missing. If you have more than a couple fields, you can often use shortcuts in writing the field names If your fields are not numbered sequentially but are stored in the program data vector together then you can use: total=SUM(of fielda--zfield); Just make sure you remember the of and the double dashes or your code will run but you wont get your intended results. Mean is another function where the function will calculate differently than the writing out the formula if you have missing values. There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as 'Disco Years' if it's between 1975 and 1985. How would you accomplish this in data step code? Using only PROC FORMAT. data new ; input date ddmmyy10.; cards; 01/05/1955 01/09/1970 01/12/1975 19/10/1979 25/10/1982 10/10/1988 27/12/1991 ; run; proc format ; value dat low-'01jan1975'd=ddmmyy10. '01jan1975'd-'01JAN1985'd="Disco Years" '01JAN1985'd-high=date9.; run; proc print;

format date dat. ; run; In the following DATA step, what is needed for 'fraction' to print to the log? data _null_; x=1/3; if x=.3333 then put 'fraction'; run; What is the difference between calculating the 'mean' using the mean function and PROC MEANS? By default Proc Means calculate the summary statistics like N, Mean, Std deviation, Minimum and maximum, Where as Mean function compute only the mean values. What are some differences between PROC SUMMARY and PROC MEANS? Proc means by default give you the output in the output window and you can stop this by the option NOPRINT and can take the output in the separate file by the statement OUTPUTOUT= , But, proc summary doesn't give the default output, we have to explicitly give the output statement and then print the data by giving PRINT option to see the result. What is a problem with merging two data sets that have variables with the same name but different data? Understanding the basic algorithm of MERGE will help you understand how the stepProcesses. There are still a few common scenarios whose results sometimes catch users off guard. Here are a few of the most frequent 'gotchas': 1- BY variables has different lengthsIt is possible to perform a MERGE when the lengths of the BY variables are different, But if the data set with the shorter version is listed first on the MERGE statement, theShorter length will be used for the length of the BY variable during the merge. Due to this shorter length, truncation occurs and unintended combinations could result. In Version 8, a warning is issued to point out this data integrity risk. The warning will be issued regardless of which data set is listed first: WARNING: Multiple lengths were specified for the BY variable name by input data sets.This may cause unexpected results. Truncation can be avoided by naming the data set with the longest length for the BY

variable first on the MERGE statement, but the warning message is still issued. To prevent the warning, use PROC CONTENTS to check that the BY variables have the same length before combining the data sets in the MERGE step. You can change the variable length with either a LENGTH statement in the merge DATA step prior to the MERGE statement, or by recreating the data sets to have identical lengths for the BY variables.

Note: Do not combine MERGE and an IF-THEN statement in one DATA step if the IF-THEN statement involves two variables that come from two different merging data sets. If it is not completely clear when MERGE and IF-THEN can be used in one DATA step, it is best to always separate them into different DATA steps. Following this recommendation helps avoid unexpected merge results.

Which data set is the controlling data set in the MERGE statement?
MERGE itself has no built-in controlling data set: every observation from every input data set is kept, and for variables with the same name the value from the data set listed last is used. A data set becomes the controlling one only when you subset on its IN= variable.

How do the IN= variables improve the capability of a MERGE?
What if you want to keep in the output data set of a merge only the matches (only those observations to which both input data sets contribute)? SAS will set up for you special temporary variables, called the "IN=" variables, so that you can do this and more. Here's what you have to do:
- signal to SAS on the MERGE statement that you need the IN= variables for the input data set(s)
- use the IN= variables in the data step appropriately.
So to keep only the matches in the match-merge above, ask for the IN= variables and use them:
data three;
merge one(in=x) two(in=y); /* x and y are your choices of names for the IN= variables for data sets one and two respectively */
by id;
if x=1 and y=1;
run;

What techniques and/or PROCs do you use for tables?
PROC FREQ, PROC UNIVARIATE, PROC TABULATE and PROC REPORT.

Do you prefer PROC REPORT or PROC TABULATE? Why?
I prefer PROC REPORT unless I have to create cross-tabulation tables, because it gives me many options to modify the look of my table (for example the WIDTH= option, which changes the width of each

column in the table), whereas PROC TABULATE cannot produce some of the things I need in my tables; for example, TABULATE does not produce n (%) in the desired format.

How experienced are you with customized reporting and use of DATA _NULL_ features?
I have very good experience in creating customized reports as well as with the DATA _NULL_ step. It is a DATA step that generates a report without creating a data set, which saves development time. Another advantage of DATA _NULL_ is that any compilation error in the statements is detected and written to the log when the step is submitted, so errors can be found by checking the log. It is also used to create macro variables in the DATA step.

What is the difference between nodup and nodupkey options?
NODUP compares all the variables in the data set, while NODUPKEY compares just the BY variables.

What is the difference between compiler and interpreter? Give any one example (software product) that act as an interpreter?
Both serve a similar purpose but differ in how they achieve it. An interpreter translates instructions one at a time and executes each instruction immediately. A compiler takes a program (source) written in a programming language such as SAS and translates it into object code or machine language. Compiled code does the work much more efficiently, because a complete machine-language program is produced before it is executed.

Code the tables statement for a single level frequency?
proc freq data=lib.dataset;
table var; * here you can list a single variable, or multiple variables separated by spaces, to get one-way frequencies;
run;

What is the main difference between rename and label?
1. Label is global and rename is local, i.e., the LABEL statement can be used in either a PROC step or a DATA step, whereas the RENAME statement is used in the DATA step (or in PROC DATASETS).
2. If we rename a variable, the old name is lost, but if we label a variable, its short name (the old name) still exists along with its descriptive label.
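The second point can be illustrated with a short sketch. The data set WORK.STAFF and the variables EMP_ID and SAL are hypothetical names invented for this example; they do not come from the original answer.

```sas
/* Hypothetical input data, used only for illustration */
data work.staff;
    input emp_id sal;
    datalines;
101 52000
102 61000
;
run;

data work.staff2;
    set work.staff;
    rename sal = salary;                    /* SAL no longer exists in STAFF2 */
    label emp_id = 'Employee identifier';   /* EMP_ID keeps its name and gains a description */
run;

/* The LABEL option makes PROC PRINT show labels as column headings */
proc print data=work.staff2 label;
run;
```

After the second step, STAFF2 contains SALARY instead of SAL, while EMP_ID is still addressed by its original name in code and only displays with its label in reports.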

What is Enterprise Guide? What is the use of it?
SAS Enterprise Guide is a point-and-click Windows client for SAS. It organizes work into projects and uses tasks and wizards that generate SAS code; among other things it offers a wizard-driven way to import text files into SAS. (It was reported to come free with Base SAS version 9.0.)

What other SAS features do you use for error trapping and data validation? What are the validation tools in SAS?
For data sets: data dataset-name / debug; and data dataset-name / stmtchk;
For macros: options mprint mlogic symbolgen;

How can you put a "trace" in your program?
ODS TRACE ON; and ODS TRACE OFF; switch the trace record on and off.

How would you code a merge that will keep only the observations that have matches from both data sets?
Use the IN= data set option. Look at the following example.
data three;
merge one(in=x) two(in=y);
by id;
if x=1 and y=1;
run;
or
data three;
merge one(in=x) two(in=y);
by id;
if x and y;
run;

What are input dataset and output dataset options?
Input data set options include OBS=, FIRSTOBS=, WHERE= and IN=; output data set options include COMPRESS= and REUSE=. KEEP=, DROP= and RENAME= can be used on both input and output data sets.

How can you create a zero-observation dataset?
Create the data set with the LIKE clause, e.g.
proc sql;
create table latha.emp like oracle.emp;
quit;
Here the LIKE clause causes the structure of the existing table to be copied to the new table. This method results in the creation of an empty table.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the

step itself?
In the editor window we write
%include 'path of the sas file';
run;
In a non-windowing environment there is no need to give the RUN statement.

How can you import a .CSV file into SAS? Give the syntax.
A CSV file is a plain text file (it can be created in Notepad) whose first row usually declares the variable names. To import it:
proc import datafile='E:\age.csv' out=sarath dbms=csv replace;
getnames=yes;
run;
proc print data=sarath;
run;

What is the use of PROC SQL?
PROC SQL is a powerful tool in SAS that combines the functionality of DATA and PROC steps. PROC SQL can sort, summarize, subset, join (merge) and concatenate data sets, create new variables, and print the results or create a new data set, all in one step. PROC SQL can use fewer resources than the equivalent DATA and PROC steps. Joining files in PROC SQL does not require the data to be sorted beforehand, which is a must for a DATA step merge.

What is SAS GRAPH?
SAS/GRAPH software creates and delivers accurate, high-impact visuals that enable decision makers to gain a quick understanding of critical business issues.

Why is a STOP statement needed for the POINT= option on a SET statement?
When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop.

What is the difference between nodup and nodupkey options?
The NODUP option checks for and eliminates duplicate observations. The NODUPKEY option checks for and eliminates observations with duplicate BY-variable values.
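Returning to the POINT= question above, the following is a minimal sketch of direct access with an explicit STOP. It assumes the sample data set SASHELP.CLASS that ships with most SAS installations; the output name WORK.EVERY_OTHER and the variables I and TOTAL are made up for the example.

```sas
/* Read every second observation by observation number */
data work.every_other;
    do i = 1 to total by 2;
        /* NOBS= is set at compile time, so TOTAL is already known in the DO statement */
        set sashelp.class point=i nobs=total;
        output;
    end;
    stop;   /* no end-of-file is ever read with POINT=, so end the step explicitly */
run;
```

Without the STOP statement the step would start another iteration after the loop finishes and repeat the same reads indefinitely. The POINT= variable I is temporary and is not written to the output data set.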

If you do not want any SAS data set from a DATA step, how would you code the DATA statement to prevent SAS from producing one?
data _null_;
_NULL_ specifies that SAS does not create a data set when it executes the DATA step. DATA _NULL_ is mainly used for
- creating a custom report
- creating quick macro variables with the CALL SYMPUT routine, e.g.
data _null_;
set somedata;
call symput('macvar', dsnvariable);
run;

E.g. The second DATA step in this program produces a custom report and uses the _NULL_ keyword to execute the DATA step without creating a SAS data set:
data sales;
input dept : $10. jan feb mar;
datalines;
shoes 4344 3555 2666
housewares 3777 4888 7999
appliances 53111 7122 41333
;
data _null_;
set sales;
qtr1tot=jan+feb+mar;
put 'Total Quarterly Sales: ' qtr1tot dollar12.;
run;

What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement sets the selection criteria for a data set in either a DATA step or a PROC step.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
SAS code can be linked using the GO TO or the LINK statement.
GOTO http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201949.htm

LINK http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201972.htm

The difference between the LINK statement and the GO TO statement is in the action of a subsequent RETURN statement. A RETURN statement after a LINK statement returns execution to the statement that follows LINK. A RETURN statement after a GO TO statement returns execution to the beginning of the DATA step, unless a LINK statement precedes GO TO, in which case execution continues with the first statement after LINK. In addition, a LINK statement is usually used with an explicit RETURN statement, whereas a GO TO statement is often used without a RETURN statement.
When your program executes a group of statements at several points in the program, using the LINK statement simplifies coding and makes program logic easier to follow. If your program executes a group of statements at only one point in the program, using DO-group logic rather than LINK-RETURN logic is simpler.

GO TO example:
data info;
input x;
if 1<=x<=5 then go to add;
put x=;
add: sumx+x;
datalines;
7
6
323
;

LINK example:
data hydro;
input type $ depth station $;
/* link to label calcu: */
if type='aluv' then link calcu;
date=today();
/* return to top of step */
return;
calcu: if station='site_1' then elevatn=6650-depth;
else if station='site_2' then elevatn=5500-depth;
/* return to date=today(); */
return;
datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
more data lines
;
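For comparison, the LINK example above can be written with DO-group logic, which the text recommends when the statements run at only one point in the program. This is a sketch only: it assumes the comparison values are the quoted strings 'aluv', 'site_1' and 'site_2', and it leaves out the "more data lines" placeholder.

```sas
data hydro;
    input type $ depth station $;
    /* the statements that sat under the CALCU label now live in a DO group */
    if type = 'aluv' then do;
        if station = 'site_1' then elevatn = 6650 - depth;
        else if station = 'site_2' then elevatn = 5500 - depth;
    end;
    date = today();   /* runs for every record, as in the LINK version */
    datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
;
run;
```

No label and no RETURN statements are needed, which is why the DO group is the simpler choice when the logic is used in a single place.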

How would you include common or reusable code to be processed along with your statements?
- Using SAS macros.
- Using a %INCLUDE statement.

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
The INDEX function, which searches a character expression for a string of characters.
SAS statements:
a='ABC.DEF (X=Y)';
b='X=Y';
x=index(a,b);
put x;
Result: 10

For reference: the INDEXC function searches for the first occurrence of any individual character that is present within the excerpt strings, whereas the INDEX function searches for the first occurrence of the character string as a pattern.
b='have a good day';
x=indexc(b,'pleasant','very');
put x;
The INDEXW function searches for strings that are words, whereas the INDEX function searches for patterns as separate words or as parts of other words.
s='asdf adog dog';
p='dog ';
x=indexw(s,p);
put x;

If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
Use the KEEP= data set option (on the DATA statement or the SET statement) or a KEEP statement in the DATA step, e.g.
data fewdata;
set fulldata (keep=var1 var2 var3 var4 var5);
run;
The same result can be obtained with KEEP= on the DATA statement, data fewdata (keep=var1-var5);, or with a KEEP statement inside the step, keep var1-var5;. KEEP= on the SET statement is usually the most efficient form because the other variables are never read into the program data vector.

Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.
Proc sort data= Dist_County;

By state district county;
Run;

How would you delete duplicate observations?
Use the NODUPRECS option in PROC SORT.
data cricket;
input id country $9. score;
cards;
1 australia 342
2 somerset 343
1 australia 342
2 somerset 341
;
run;
proc sort data=cricket noduprecs;
by id;
run;
In this example observations 1 and 3 are duplicate records, so observation 1 is retained and observation 3 is deleted.

How would you delete observations with duplicate keys?
Use the NODUPKEY option in PROC SORT.
proc sort data=cricket nodupkey;
by id;
run;
In the above example observations 1 and 3, and observations 2 and 4, have duplicate key (variable id) values, i.e. 1 and 2 respectively, so observations 3 and 4 are deleted.

How would you code a merge that will keep only the observations that have matches from both sets?
data mergeddata;
merge one(in=A) two(in=B);
by ID;
if A and B;
run;

How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to a second, and the non-matches from the right-most data set to a third?
data one two three;
merge DSN1(in=A) DSN2(in=B);
by ID;
if A and B then output one;
if A and not B then output two;
if not A and B then output three;
run;

What is the Program Data Vector (PDV)? What are its functions?
The PDV is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS

language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred). SAS does not write these variables to the output data set.
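The contents of the PDV, including the two automatic variables, can be inspected by writing it to the log. The sketch below uses made-up variables X, Y and Z and a deliberately invalid third record so that _ERROR_ is set; none of these names come from the original text.

```sas
data work.pdv_demo;
    input x y;
    z = x + y;
    put _all_;   /* writes every PDV variable, including _N_ and _ERROR_, to the log */
    datalines;
1 2
3 .
a 5
;
run;

/* _N_ and _ERROR_ are not written to the output data set */
proc contents data=work.pdv_demo;
run;
```

The log shows _N_ increasing by one on each iteration, and _ERROR_ switching to 1 on the record where the character "a" cannot be read as a numeric value. PROC CONTENTS then lists only X, Y and Z.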

Does SAS translate (compile) or does it interpret? Explain. At compile time when a SAS data set is read, what items are created?
SAS compiles the code sent to the compiler. When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them, that is, automatically translates the statements into machine code. In this phase, SAS identifies the type and length of each new variable, and determines whether a type conversion is necessary for each subsequent reference to a variable. During the compile phase, SAS creates the following three items:

- Input buffer: a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement. Note that this buffer is created only when the DATA step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data directly into the program data vector.)
- Program data vector (PDV): a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred). SAS does not write these variables to the output data set.
- Descriptor information: information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

The Execution Phase

By default, a simple DATA step iterates once for each observation that is being created. The flow of action in the Execution Phase of a simple DATA step is described as follows:
- The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
- SAS sets the newly created program variables to missing in the program data vector (PDV).
- SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record.
- SAS executes any subsequent programming statements for the current record.
- At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here.
- SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation.
- The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
When variables are set to missing, character variables are assigned a blank and numeric variables are assigned a period (.).

Name statements that are recognized at compile time only?
DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, ATTRIB, WHERE, BY, RETAIN, LENGTH, ARRAY

Name statements that are execution only.
INFILE, INPUT, OUTPUT, CALL routines

Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN, CARDS, INFILE, WHERE, LABEL, SELECT, INFORMAT, FORMAT

Name statements that function at both compile and execution time.
OPTIONS, TITLE, FOOTNOTE

In the flow of DATA step processing, what is the first action in a typical DATA Step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

What is _n_?
The _N_ variable counts the number of times the DATA step begins to iterate.

It is one of the automatic variables of the DATA step (not of procedures) that SAS provides in the PDV; the other one is _ERROR_. Note that _N_ does not necessarily equal the observation number in a data set.

How do I convert a numeric variable to a character variable?
Practically, the data type of a variable cannot be changed within one DATA step, but the data values can be converted. Create a new variable of type character and assign it the values of the numeric variable with the PUT function, drop the numeric variable, and rename the character variable to the numeric variable's name. Note: if you try to assign the converted value to the original variable directly, you receive a message saying that the variable has already been defined as numeric. E.g.
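A minimal sketch of the create, drop and rename sequence described above. The data sets WORK.HAVE and WORK.WANT and the variable ID are hypothetical names chosen for the example, and the 8. format width is an assumption about how wide the values are.

```sas
/* Hypothetical input with a numeric ID */
data work.have;
    id = 1001; output;
    id = 1002; output;
run;

data work.want;
    set work.have;
    id_char = left(put(id, 8.));   /* PUT returns character; LEFT removes leading blanks */
    drop id;                       /* discard the numeric original */
    rename id_char = id;           /* the character variable takes over the old name */
run;

/* ID should now be listed as a character variable */
proc contents data=work.want;
run;
```

DROP is applied before RENAME when the output data set is written, so the two statements do not conflict even though they both involve the name ID.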

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000199354.htm#a000226452

How do I convert a character variable to a numeric variable?
Practically, the data type of a variable cannot be changed within one DATA step, but the data values can be converted. Create a new variable of type numeric and assign it the values of the character variable with the INPUT function, drop the character variable, and rename the numeric variable to the character variable's name. Note: if you try to assign the converted value to the original variable directly, you receive a message saying that the variable has already been defined as character.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000180357.htm

What SAS statements would you code to read an external raw data file to a DATA step?
We use the following SAS statements:
- FILENAME to specify the location of the file
- INFILE to identify the external file to read with an INPUT statement
- INPUT to specify the variables that the data values are read into.
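The three statements fit together as in the sketch below. The file path, the fileref RAWIN and the variable names are all hypothetical, and the file is assumed to be comma-delimited with a header row.

```sas
/* FILENAME: associate a fileref with the external file (hypothetical path) */
filename rawin 'C:\data\employees.csv';

data work.employees;
    /* INFILE: identify the file and describe how its records are laid out */
    infile rawin dlm=',' dsd firstobs=2 truncover;
    /* INPUT: name the variables and the informats used to read them */
    input emp_id name :$20. hire_date :date9. salary;
    format hire_date date9.;
run;
```

FIRSTOBS=2 skips the assumed header row, and TRUNCOVER keeps a short record from pulling values out of the next line.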

How do you read in the variables that you need?
Using the INPUT statement with column and line pointers, informats and length specifiers.

Are you familiar with special input delimiters? How are they used?
DLM= and DSD are the special input delimiter options of the INFILE statement.
DELIMITER= delimiter(s) specifies an alternate delimiter (other than a blank) to be used for list input.
DSD (delimiter-sensitive data) specifies that when data values are enclosed in quotation marks, delimiters within the value are treated as character data. The DSD option changes how SAS treats delimiters when you use list input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189

If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
Use the MISSOVER or TRUNCOVER option.
MISSOVER prevents an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. When an INPUT statement reaches the end of the current input data record, variables without any values assigned are set to missing.
TRUNCOVER overrides the default behavior of the INPUT statement when an input data record is shorter than the INPUT statement expects. By default, the INPUT statement automatically reads the next input data record. TRUNCOVER enables you to read variable-length records when some records are shorter than the INPUT statement expects. Variables without any values assigned are set to missing.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189

What is the difference between an informat and a format? Name three informats or formats.
INFORMAT statement: associates informats with variables. It is basically used in INPUT and SQL CREATE TABLE statements to read raw data from an external file, or data that is not in a SAS format.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178244.htm
e.g. commaw., datew., worddatew., dollarw., $varyingw.

FORMAT statement: associates formats with variables. It is basically used in DATA step FORMAT, SQL SELECT and procedure FORMAT statements to output SAS data to a file, report etc.

Formats can look like informats but are differentiated by the statement they are used in, e.g. datew., worddatew., mmddyyw.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178212.htm

Name and describe three SAS functions that you have used, if any?
The most common functions are:
- Conversion functions: input / put / int / ceil / floor
- Character functions: scan / substr / index / left / trim / compress / cat / catx / upcase / lowcase
- Arithmetic functions: sum / abs
- Attribute info functions: attrn / length
- Data set functions: open / close / exist
- Directory functions: dexist / dopen / dclose / dcreate / dinfo
- File functions: fexist / fopen / filename / fileref
- SQL functions: coalesce / count / sum / mean
- Date functions: date / today / datdif / datepart / datetime / intck / mdy
- Array functions: dim
http://sastechies.com/SASfunctions.php

How would you code the criteria to restrict the output to be produced?
Since it is not clear what the interviewer is referring to, the possibilities include:
- Global statement: options obs=;
- Data set option: obs=
- PROC SQL: NOPRINT option for reporting; INOBS= and OUTOBS= for SQL SELECT
- PROC DATASETS: NOLIST option

What is the purpose of the trailing @ and the @@? How would you use them?
Line-hold specifiers keep the pointer on the current input record when
- a data record is read by more than one INPUT statement (trailing @)
- one input line has values for more than one observation (double trailing @)
- a record needs to be reread on the next iteration of the DATA step (double trailing @).
Use a single trailing @ to allow the next INPUT statement to read from the same record. Use a double trailing @ to hold a record for the next INPUT statement across iterations of the DATA step.
Normally, each INPUT statement in a DATA step reads a new data record into the input buffer. When you use a trailing @, the following occurs: The pointer position does not change. No new record is read into the input buffer.
The next INPUT statement for the same iteration of the DATA step continues to read the same record rather than a new one.

SAS releases a record held by a trailing @ when
- a null INPUT statement executes: input;
- an INPUT statement without a trailing @ executes
- the next iteration of the DATA step begins.

Normally, when you use a double trailing @ (@@), the INPUT statement for the next iteration of the DATA step continues to read the same record. SAS releases the record that is held by a double trailing @
- immediately if the pointer moves past the end of the input record
- immediately if a null INPUT statement executes: input;
- when the next iteration of the DATA step begins if an INPUT statement with a single trailing @ executes later in the DATA step: input @;

A record held by the double trailing at sign (@@) is not released until
- the input pointer moves past the end of the record; then the input pointer moves down to the next record
- an INPUT statement without a line-hold specifier executes, e.g.
input ID $4. @@;
. . .
input Department 5.;
[Figure: column ruler over a raw data record, showing the input pointer moving past the end of the record]

The single trailing at sign (@)
- enables the next INPUT statement to read from the same record
- releases the current record when a subsequent INPUT statement executes without a line-hold specifier.
Unlike the @@, the single @ also releases a record when control returns to the top of the DATA step for the next iteration.

data perm.sales97;
infile data97 missover;
input ID $4. @;
do Quarter=1 to 4;
input Sales : comma. @;
output;
end;
run;

Raw Data File Data97
0734 1,323.34 2,472.85 3,276.65 5,345.52
0943 1,908.34 2,560.38
1009 2,934.12 3,308.41 4,176.18 7,581.81

data perm.people (drop=type);
infile census;
retain Address;
input type $1. @;
if type='H' then input @3 Address $15.;
if type='P';
input @3 Name $10. @13 Age 3. @15 Gender $1.;
run;

data perm.residnts;
infile census end=last; /* END= added so the last household can be written */
retain Address;
input type $1. @;
if type='H' then do;
if _n_ > 1 then output;
Total=0;
input Address $ 3-17;
end;
else if type='P' then total+1;
if last then output; /* write the final household at end of file */
run;

Raw Data File Census (column alignment not preserved)
H 321 S. MAIN ST
P MARY E 21 F
P WILLIAM M 23 M
P SUSAN K 3 F
H 324 S. MAIN ST
P THOMAS H 79 M
P WALTER S 46 M
P ALICE A 42 F
P MARYANN A 20 F
P JOHN S 16 M
H 325A S. MAIN ST
P JAMES L 34 M
P LIZA A 31 F
H 325B S. MAIN ST
P MARGO K 27 F
P WILLIAM R 27 M
P ROBERT W 1 M
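As a compact illustration of the double trailing @ described above, the sketch below reads several observations from each input line. The data set WORK.SCORES, the variables ID and SCORE, and the data values are invented for the example.

```sas
data work.scores;
    /* @@ holds the record across iterations, so each pass reads the next pair of values */
    input id score @@;
    datalines;
101 88 102 92 103 75
104 61 105 97
;
run;

proc print data=work.scores;
run;
```

The two data lines yield five observations. With a single trailing @ instead, the record would be released at the end of each iteration and only the first pair on each line would be read.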

Under what circumstances would you code a SELECT construct instead of IF statements?
The SELECT statement begins a SELECT group. SELECT groups contain WHEN statements that identify SAS statements that are executed when a particular condition is true. Use at least one WHEN statement in a SELECT group. An optional OTHERWISE statement specifies a statement to be executed if no WHEN condition is met. An END statement ends a SELECT group.
Null statements that are used in WHEN statements cause SAS to recognize a condition as true without taking further action. Null statements that are used in OTHERWISE statements prevent SAS from issuing an error message when all WHEN conditions are false.
Using SELECT-WHEN improves processing efficiency and readability in programs that need to check a series of conditions for the same variable. Use IF-THEN/ELSE statements for programs with few statements.
Using a subsetting IF statement without a THEN clause could be dangerous because the step then processes only those records that meet the condition specified in the IF clause.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201966.htm

What statement do you code to tell SAS that it is to write to an external file?

FILENAME / FILE / PUT
The FILENAME statement is an optional statement that specifies the location of the external file.
The FILE statement specifies the current output file for PUT statements in the DATA step. When multiple FILE statements are present, the PUT statement builds and writes output lines to the file that was specified in the most recent FILE statement. If no FILE statement was specified, the PUT statement writes to the SAS log. The specified output file must be an external file, not a SAS data library, and it must be a valid access type.
The PUT statement writes the variable values to the external file.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
Use the _INFILE_ automatic variable in the PUT statement.
filename some 'c:\cool.dat';
filename cool1 'c:\cool1.dat';
data _null_;
infile some;
input;
file cool1;
put _infile_;
run;

Question: The data set below shows the hospital visits by patients during the entire year (2012).
data hospital;
input name $ month $;
cards;
Tanu Jan
Tanoj Feb
Tanu Apr
Tanu Dec
Arun Oct
Kiran Nov
Tarun Mar
Tarun Apr
Tarun May
Tarun Dec
;
run;
Find out how many times each patient has visited the hospital in 2012 using proc freq, proc report, proc sql.
Answer:
proc freq data=hospital;
tables name / out=hosp1(drop=percent);
run;

data hosp2;
set hospital;
count=1;
run;
proc report data=hosp2;
column name count;
define name / group;
run;
proc sql;
select name, sum(count)
from hosp2
group by name;
quit;

Question:
data class;
input name $ subject $ marks;
cards;
Manoj Science 94
Raj Science 86
Tanu Maths 76
Manoj Maths 45
Manoj English 65
Tanu English 76
Tanu Science 76
Raj Maths 66
Raj English 56
;
run;
Use SQL to produce:
Sum as: 204 Manoj, 208 Raj, 228 Tanu
Avg Marks as: 68 Manoj, 69.33333 Raj, 76 Tanu
Use proc means to do the same.
Answer:
proc sql;
select name, sum(marks), avg(marks)
from class
group by name;
quit;
proc sort data=class;
by name;
run;
proc means data=class sum mean;

var marks;
by name;
run;

1. What areas of SAS are you most interested in?
2. Describe 5 ways to do a "table lookup" in SAS.
3. What versions of SAS have you used (on which platforms)?
4. What are some good SAS programming practices for processing very large data sets?
5. What are some problems you might encounter in processing missing values? In Data steps? Arithmetic? Comparisons? Functions? Classifying data?
6. How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
7. What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
8. If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
9. What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
10. Why is a STOP statement needed for the POINT= option on a SET statement?
11. How do you control the number of observations and/or variables read or written?
12. Approximately what date is represented by the SAS date value of 730?
13. Identify statements whose placement in the DATA step is critical.
14. What does the RUN statement do?
15. Why is SAS considered self-documenting?
16. What are some good SAS programming practices for processing very large data sets?
17. What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
18. If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
19. What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
20. How do you debug and test your SAS programs?
21. What other SAS features do you use for error trapping and data validation?
22. How would you combine 3 or more tables with different structures?
23. What are _numeric_ and _character_ and what do they do?
24. For what purpose would you use the RETAIN statement?
25. How could you generate test data with no input data?
26. How do you debug and test your SAS programs?
27. What can you learn from the SAS log when debugging?
28. What is the purpose of _error_?
29. How do you test for missing values?
30. In the flow of DATA step processing, what is the first action in a typical DATA Step?
31. What are SAS/ACCESS and SAS/CONNECT?
32. What is the one statement to set the criteria of data that can be coded in any step?
33. What is the purpose of using the N=PS option?
34. What are the scrubbing procedures in SAS?
35. What are the new features included in the new version of SAS, i.e., SAS 9.1.3?
36. What do the PUT and INPUT functions do?
37. Which date function advances a date, time or datetime value by a given interval?
38. How might you use MOD and INT on numeric values to mimic SUBSTR on character strings?
39. In ARRAY processing, what does the DIM function do?
40. How would you determine the number of missing or nonmissing values in computations?
41. Do you need to know if there are any missing values?
42. What are the validation tools in SAS?
43. How can you put a "trace" in your program?
44. What are input dataset and output dataset options?
45. What is SAS GRAPH?
