Topic:

SAS Expression and Functions


1. Overview of Operators in SAS
2. Infix Operators in SAS
3. SAS expression & Conditional expression (WHERE and IF)
4. Numeric functions
5. SAS Time and Date
6. Character functions
7. Global Statements

1. Overview of Operators in SAS

Definitions: A SAS operator is a symbol that represents a comparison, arithmetic calculation, or logical operation; a SAS function; or grouping parentheses. SAS uses two major kinds of operators:

- prefix operators
- infix operators

A prefix operator is an operator that is applied to the variable, constant, function, or parenthetic expression that immediately follows it. The plus sign (+) and minus sign (-) can be used as prefix operators. The following are examples of prefix operators used with variables, constants, functions, and parenthetic expressions:
+y
-25
-cos(angle1)
+(x*y)

An infix operator applies to the operands on each side of it, for example, 6<8. Infix operators include the following:

- arithmetic
- comparison
- logical, or Boolean
- minimum
- maximum
- concatenation

When used to perform arithmetic operations, the plus and minus signs are infix operators.

2. Infix Operators

2.1 Arithmetic operators:

Symbol   Definition       Example
*        multiplication   where bonus = salary * .10;
/        division         where f = g/h;
+        addition         where c = a+b;
-        subtraction      where f = g-h;
**       exponentiation   where y = a**2;
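The same operators work in assignment statements inside a DATA step. The following is a minimal sketch; the variable names and values are made up for illustration, and the results are written to the log with PUT.

```sas
data _null_;
   salary = 50000;
   bonus  = salary * .10;   /* multiplication */
   ratio  = 10 / 4;         /* division       */
   total  = 2 + 3;          /* addition       */
   diff   = 7 - 10;         /* subtraction    */
   square = 3 ** 2;         /* exponentiation */
   put bonus= ratio= total= diff= square=;
   /* log: bonus=5000 ratio=2.5 total=5 diff=-3 square=9 */
run;
```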

2.2 Comparison operators:

The following table lists the comparison operators:

Symbol             Mnemonic   Definition                            Example
=                  EQ         equal to                              where empnum eq 3374;
^= or ~= or ¬=     NE         not equal to                          where status ne 'fulltime';
>                  GT         greater than                          where hiredate gt '01jun1982'd;
<                  LT         less than                             where empnum < 2000;
>=                 GE         greater than or equal to              where empnum >= 3374;
<=                 LE         less than or equal to                 where empnum <= 3374;
                   IN         equal to one from a list of values    where state in ('NC','TX'); or num in (3, 4, 5)

Others

2.3.1 IN Operator

The IN operator, which is a comparison operator, searches for character and numeric values that are equal to one from a list of values. The list of values must be in parentheses, with each character value in quotation marks and separated by either a comma or blank. For example, suppose you want all sites that are in North Carolina or Texas. You could specify:
where state = 'NC' or state = 'TX';

However, the easier way would be to use the IN operator, which says you want any state in the list:
where state in ('NC','TX');

In addition, you can use the NOT logical operator to exclude a list. For example,
where state not in ('CA', 'TN', 'MA');

Example:
data in_ex;
   /* :$12. keeps store names longer than the default 8 characters */
   input store :$12. vcr_price vcd_price cd_player_price;
   datalines;
future_shop 169.99 69.99 79.99
sony_store 179.99 64.99 84.99
radio_shack 159.99 64.99 69.99
three_d 174.99 67.49 74.99
electron 174.99 65.99 69.99
;
run;

data in_ex_2;
   set in_ex;
   /* a subsetting IF with the same condition selects the same rows */
   where vcr_price not in (159.99, 169.99);
run;

2.3.2 Fully-Bounded Range Condition

A fully-bounded range condition consists of a variable between two comparison operators, specifying both an upper and lower limit. For example, the following expression returns the employee numbers that fall within the range of 500 to 1000 (inclusive):
where 500 <= empnum <= 1000;

You can combine the NOT logical operator with a fully-bounded range condition to select observations that fall outside the range. Note that parentheses are required:
where not (500 <= empnum <= 1000);

2.3.3 BETWEEN-AND Operator

The BETWEEN-AND operator is also considered a fully-bounded range condition that selects observations in which the value of a variable falls within an inclusive range of values. You can specify the limits of the range as constants or expressions. Any range you specify is an inclusive range, so that a value equal to one of the limits of the range is within the range. The general syntax for using BETWEEN-AND is:
WHERE variable BETWEEN value AND value;

For example:
where empnum between 500 and 1000;
where taxes between salary*0.30 and salary*0.50;
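The range conditions above can be tried on any data set with a numeric variable. The sketch below assumes the SASHELP.CLASS sample data set (with variables NAME and AGE) is available in your installation; the age limits are arbitrary.

```sas
/* Assumption: SASHELP.CLASS is installed with variables NAME and AGE */

/* inclusive range: ages 13 and 14 */
proc print data=sashelp.class;
   where age between 13 and 14;
   var name age;
run;

/* everything outside the same range; the parentheses are required */
proc print data=sashelp.class;
   where not (13 <= age <= 14);
   var name age;
run;
```

The two listings together should account for every observation in the data set, since each observation satisfies exactly one of the two conditions.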

2.3 Boolean operators (also called logical operators):

Symbol   Mnemonic   Definition
&        AND        and
|        OR         or
^        NOT        negation

Two comparisons with a common variable linked by AND (operator) can be condensed with an implied AND. For example, the following two subsetting IF statements produce the same result:

if 16<=age and age<=65;
if 16<=age<=65;
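To see the implied AND at work, here is a small self-contained sketch. The data set PEOPLE and its values are hypothetical and exist only for this illustration.

```sas
data people;                 /* hypothetical sample data */
   input name $ age;
   datalines;
Ann 15
Bob 16
Cy 40
Dee 65
Eve 70
;
run;

data working_age;
   set people;
   if 16 <= age <= 65;       /* implied AND: keeps Bob, Cy and Dee */
run;

proc print data=working_age;
run;
```

Both limits are inclusive, so the observations with ages 16 and 65 are kept.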

2.4 MIN and MAX Operators

Use the MIN and MAX operators to find the minimum or maximum value of two quantities. The MIN operator is written >< and the MAX operator is written <>. For example, x = a >< b; assigns the smaller of a and b to x.

2.5 Concatenation Operator

The concatenation operator concatenates character values. You indicate the concatenation operator as follows: ||

For example, in this DATA step, the value that results from the concatenation contains blanks because the length of the COLOR variable is eight:
data namegame;
   length color name $8 game $12;
   color='black';
   name='jack';
   game=color||name;
   put game=;
run;

The value of GAME is 'black   jack'. To correct this problem, use the TRIM function in the concatenation operation as follows:
game=trim(color)||name;
This statement produces a value of 'blackjack' for the variable GAME.

The next example uses the PUT function to convert a numeric value to a character value before concatenating:
data _null_;
   month='sep';
   year=99;
   date=trim(month) || left(put(year,8.));   /* PUT function converts a numeric value to a character value */
   put date=;
run;

3. SAS expression and Conditional expression (WHERE and IF)

IF expression: The subsetting IF statement can only be used in a DATA step. This means you are creating a new data set that includes only the selected observations. Only observations that satisfy the IF condition are sent to the output data set.

WHERE expression: a type of SAS expression that defines a condition for selecting observations. The WHERE statement can be added to any procedure, and it can also be used in a DATA step. Only observations that satisfy the WHERE condition are used by the procedure. Using the WHERE statement may improve the efficiency of your SAS programs because SAS is not required to read all observations from the input data set. The WHERE statement does not change the data set in any way.

4. Numeric functions

4.1 Mathematical:

The following is a brief summary of SAS functions useful for defining models.

ABS(x)     the absolute value of x
COS(x)     the cosine of x. x is in radians.
EXP(x)     e raised to the power x
LOG(x)     the natural logarithm of x
LOG10(x)   the log base ten of x
LOG2(x)    the log base two of x
SIN(x)     the sine of x. x is in radians.
SQRT(x)    the square root of x
TAN(x)     the tangent of x. x is in radians and is not an odd multiple of pi/2.

4.2 Statistical

Kurtosis: describes the heaviness of the tails of a distribution
Skewness: a measure of the tendency for the distribution values to be more spread out on one side than the other
Max and Min: the largest and the smallest value
Mean: the average, that is, the sum of the scores divided by the number of scores
Median: the point that corresponds to the value that lies in the middle of the distribution, or the value that divides a distribution exactly in half (8 for the values 3, 5, 8, 10, 11). It serves as a valuable alternative to the mean in specific situations: a) there are a few extreme scores in the distribution, b) some values are undetermined, c) the data are measured on an ordinal scale (see example below)
Range: the maximum minus the minimum
Sum: the total of the values
Variance: a measure of variability
Std: standard deviation, the square root of the variance
Stderr: standard error of the mean, the standard deviation divided by the square root of the sample size

Example:
data income_1;
   /* region is read from columns 1-8, so income must start after column 8 */
   input region $ 1-8 income;
   cards;
Edison    99800
Edison    109800
Edison    120000
Edison    96500
Edison    90550
Edison    115000
Edison    142500
Edison    73000
Edison    79850
Edison    55890
Edison    23000
Edison    19800
Edison    82000
Edison    76800
Edison    39800
Edison    22800
Edison    58650
;
run;

proc univariate data=income_1 plot normal;
   var income;
run;

proc chart data=income_1;
   vbar income / levels=8;
run;

data income_2;
   input region $ 1-8 income;
   cards;
Edison    11800
Edison    29800
Edison    15000
Edison    26500
Edison    39550
Edison    22000
Edison    62500
Edison    83000
Edison    29850
Edison    35890
Edison    53000
Edison    19800
Edison    72000
Edison    36800
Edison    39800
Edison    22800
Edison    58650
;
run;

proc univariate data=income_2 plot normal;
   var income;
run;

proc chart data=income_2;
   vbar income / levels=10;
run;
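Besides PROC UNIVARIATE, several of the statistics listed in 4.2 are also available as DATA step functions that work across their arguments. The sketch below applies them to the five values used in the median definition (3, 5, 8, 10, 11), together with a few of the mathematical functions from 4.1; the variable names are arbitrary.

```sas
data _null_;
   avg = mean(3, 5, 8, 10, 11);
   mid = median(3, 5, 8, 10, 11);
   tot = sum(3, 5, 8, 10, 11);
   rng = range(3, 5, 8, 10, 11);   /* maximum minus minimum */
   put avg= mid= tot= rng=;
   /* log: avg=7.4 mid=8 tot=37 rng=8 */

   root = sqrt(16);
   dist = abs(-4.5);
   lg   = log10(1000);
   put root= dist= lg=;
   /* log: root=4 dist=4.5 lg=3 */
run;
```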

5. SAS Time and Date

5.1 SAS Date, Time, and Datetime Values

Key Concepts:

SAS date value: a value that represents the number of days between January 1, 1960, and a specified date. SAS can perform calculations on dates ranging from A.D. 1582 to A.D. 19,900. Dates before January 1, 1960, are negative numbers; dates after are positive numbers. SAS date values account for all leap year days, including the leap year day in the year 2000. Various SAS language elements handle SAS date values: functions, formats and informats.

SAS time value: a value representing the number of seconds since midnight of the current day.

SAS datetime value: a special value that combines both date and time information. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time.

Example:

options nodate pageno=1 linesize=80 pagesize=60;
data test;
   Time1=86399;   /* datetime value */
   format Time1 datetime.;
   Date1=86399;   /* date value */
   format Date1 date.;
   Time2=86399;   /* time value */
   format Time2 timeampm.;
run;

proc print data=test;
   title 'Same Number, Different SAS Values';
run;

Output:
Obs    Time1               Date1      Time2
 1     01JAN60:23:59:59    20JUL96    11:59:59 PM
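The day-count definition can be checked directly by printing a few dates without a format. This is a minimal sketch; the dates are chosen only because they sit right next to the January 1, 1960 origin.

```sas
data _null_;
   d0 = '01JAN1960'd;            /* the origin            */
   d1 = '02JAN1960'd;            /* one day later         */
   dm = '31DEC1959'd;            /* one day earlier       */
   put d0= d1= dm=;
   /* log: d0=0 d1=1 dm=-1 */

   dt = '01JAN1960:00:01:00'dt;  /* one minute after the origin */
   put dt=;
   /* log: dt=60 */
run;
```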

[Figure: How SAS Converts Calendar Dates to SAS Date Values]

5.2 Two-Digit and Four-Digit Years

If dates in your external data sources or SAS program statements contain two-digit years, you can determine which century prefix should be assigned to them by using the YEARCUTOFF= system option. The YEARCUTOFF= system option specifies the first year of the 100-year span that is used to determine the century of a two-digit year.

Before you use the YEARCUTOFF= system option, examine the dates in your data:

- If the dates in your data fall within a 100-year span, you can use the YEARCUTOFF= system option.
- If the dates in your data do not fall within a 100-year span, you must either convert the two-digit years to four-digit years or use a DATA step with conditional logic to assign the proper century prefix.

Once you've determined that the YEARCUTOFF= system option is appropriate for your range of data, you can determine the setting to use. The best setting for YEARCUTOFF= is a year just slightly lower than the lowest year in your data. For example, if you have data in a range from 1921 to 1999, set YEARCUTOFF= to 1920, if that is not already your system default. The result of setting YEARCUTOFF= to 1920 is that SAS interprets all two-digit years in the range of 20 through 99 as 1920 through 1999. SAS interprets all two-digit years in the range 00 through 19 as 2000 through 2019. When the YEARCUTOFF= option is set to 1920, the 100-year span is therefore from 1920 to 2019.

[Figure: Span of Years When the YEARCUTOFF= Option Is Set to 1920]

With YEARCUTOFF= set to 1920, a two-digit year of 10 would be interpreted as 2010, and a two-digit year of 22 would be interpreted as 1922.

Example:
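options yearcutoff=1920;   /* assumed, to match the discussion above */
data date;
   /* list input with colon modifiers, so two- and four-digit years can both be read */
   input jobid $ date : mmddyy10. time : time5. date_time : datetime18.;
   format date mmddyy10. time time5. date_time datetime18.;
   datalines;
A010 01/15/15 16:25 30Jun1915/10:03
A100 01/15/25 2:16 13May1925/10:03
A110 03/15/2025 13:22 10Apr2025/06:00
A200 01/30/96 9:11 12Dec1996/09:55
B100 02/05/00 6:02 04Aug2000/08:34
B200 06/15/2000 5:09 11Jan2000/07:13
;
run;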
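The effect of the option can also be checked in isolation. The following sketch sets YEARCUTOFF= to 1920 and reads the two two-digit years discussed above with the INPUT function; the month and day are arbitrary.

```sas
options yearcutoff=1920;

data _null_;
   d10 = input('01/15/10', mmddyy8.);   /* two-digit year 10 */
   d22 = input('01/15/22', mmddyy8.);   /* two-digit year 22 */
   y10 = year(d10);
   y22 = year(d22);
   put y10= y22=;
   /* log: y10=2010 y22=1922 */
run;
```

Changing the option value and rerunning the step is a quick way to confirm which century a given two-digit year will receive in your session.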

5.3 Working with SAS Dates and Times

Informats and Formats

The SAS System converts date, time and datetime values back and forth between calendar dates and clock times with SAS language elements called formats and informats. SAS uses formats and informats to interpret and display data in convenient ways.

Formats present a value recognized by SAS, such as a time or date value, as a calendar date or clock time in a variety of lengths and notations. They provide instructions for how to display a variable on output.

Informats read notations or values, such as a clock time or a calendar date, which may be in a variety of lengths, and convert the data to a SAS date, time, or datetime value. They provide instructions for how to interpret data as it is read. Informats can be specified using an INFORMAT statement, or on the INPUT statement following a colon after the variable name. They ALWAYS end with a period!
* Example: Reading, Writing, and Calculating Date Values.
  This program reads four regional meeting dates and calculates
  the dates on which announcements should be mailed. ;

options nodate pageno=1 linesize=80 pagesize=60;
data meeting;
   input region $ mtg : mmddyy8.;
   sendmail=mtg-45;
   datalines;
N 11-24-99
S 12-28-99
E 12-03-99
W 10-04-99
;

proc print data=meeting;
   format mtg sendmail date9.;
   title 'When To Send Announcements';
run;

Example:
data aa;
   length dob 8;
   input @1 id 3. @5 doa mmddyy8. @14 dob mmddyy8.;
   informat dob mmddyy8. doa mmddyy8.;   /* month/day/year, matching the data lines */
   age  = (doa-dob)/365.25;
   age2 = int((doa-dob)/365.25);
   datalines;
001 06/21/97 05/13/66
002 05/04/98 11/28/96
003 10/15/99 09/25/45
;
run;
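The same SAS date value can be displayed in several ways simply by switching formats. The sketch below reads one of the meeting dates used earlier with the INPUT function and writes it to the log unformatted and with a handful of date formats. It assumes the session's YEARCUTOFF= setting places the two-digit year 99 in 1999, which is the case for the usual defaults.

```sas
data _null_;
   mtg = input('11-24-99', mmddyy8.);   /* informat: text -> SAS date value */
   put mtg=;                            /* unformatted: a plain day count   */
   put mtg mmddyy10.;                   /* formats: SAS date value -> text  */
   put mtg date9.;
   put mtg weekdate.;
   put mtg worddate.;
   put mtg monyy7.;
run;
```

Only the displayed text changes from one PUT statement to the next; the stored value of MTG is the same number throughout.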

Commonly used date formats include MMDDYYw., DATEw., WEEKDATEw., WORDDATEw., MONYYw., and others.

5.4 Date functions

- INTCK(interval, date1 (start), date2 (end)) returns the number of boundaries of intervals of the given kind that lie between the two date or datetime values.

data intk;
   input dob mmddyy8.;
   informat dob mmddyy8.;   /* month/day/year, matching the data line */
   age_year  = intck('year',  dob, today());
   age_month = intck('month', dob, today());
   age_day   = intck('day',   dob, today());
   datalines;
06/17/97
;
run;
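Because INTCK counts interval boundaries rather than elapsed time, its result is not always a person's age in completed years. The following sketch, with dates picked to make the point, shows two extreme cases.

```sas
data _null_;
   /* one day apart, but a January 1 boundary lies between the dates */
   one_day = intck('year', '31DEC1999'd, '01JAN2000'd);

   /* almost a full year apart, but no January 1 boundary in between */
   full_yr = intck('year', '01JAN1999'd, '31DEC1999'd);

   put one_day= full_yr=;
   /* log: one_day=1 full_yr=0 */
run;
```

This is one reason the earlier example computed age by dividing the difference in days by 365.25 instead.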

- YEAR, QTR, MONTH, WEEKDAY, DAY functions

data intk2;
   input dob mmddyy8.;
   dob_yr    = year(dob);
   dob_qtr   = qtr(dob);
   dob_month = month(dob);
   dob_wkday = weekday(dob);
   dob_day   = day(dob);
   datalines;
06/17/97
;
run;

Note:
The YEAR function produces a four-digit numeric value that represents the year.
The MONTH function returns a numeric value that represents the month from a SAS date value. Numeric values can range from 1 through 12.
The WEEKDAY function returns the day of the week (1 = Sunday).
The DAY function produces an integer from 1 to 31 that represents the day of the month.

- MDY

The MDY function converts month, day, and year values into a SAS date value.

Example:
data m_d_y;
   input month day year;
   date=mdy(month, day, year);
   drop month day year;
   format date mmddyy8.;
   datalines;
11 25 98
03 20 04
;
run;

- TODAY() Function

This function returns today's date as a SAS date value from your computer's system clock.

data to;
   input @1 jobid $ @6 date yymmdd10.;
   a=today();
   b=year(a)-year(date);   /* difference in calendar years */
   datalines;
A010 19970426
;
run;

5.5 Declaring a SAS Date, Time or Datetime Constant

Special features in Base SAS Software allow users to declare a particular date or time as a constant without having to know the number of days from January 1, 1960 and/or the number of seconds since midnight. A date or time constant is declared by enclosing a date or time in single quotes, followed by the letter D to signify a date, T to signify a time, or DT to signify a datetime value. For example:

X = '04JUL97'D sets the new variable X equal to the number of days between January 1, 1960 and July 4, 1997.
Y = '09:00'T sets the new variable Y to the number of seconds between midnight and 9 am.
Z = '04JUL97:12:00'DT sets the value of the variable Z to the number of seconds from January 1, 1960 to noon on July 4, 1997.

data dt;
   X = '04JUL97'D;
   Y = '09:00'T;
   Z = '04JUL97:12:00'DT;
run;
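Writing the constants to the log shows the numbers that SAS actually stores, and also that constants can be used directly in arithmetic. This is a small sketch; it assumes the session's YEARCUTOFF= setting places the two-digit year 97 in 1997, as the usual defaults do.

```sas
data _null_;
   x = '04JUL97'd;
   y = '09:00't;
   put x= y=;
   /* log: x=13699 y=32400 */

   /* date constants are ordinary numbers, so they can be subtracted */
   days = '04JUL97'd - '01JAN97'd;
   put days=;
   /* log: days=184 */

   /* the same values shown through formats */
   put x date9. +1 y time5.;
run;
```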

6. Character functions

SAS software is rich in its assortment of functions that deal with character data. This class of functions is sometimes called STRING functions. In this lecture, we demonstrate some of the more useful string functions.

6.1 Put and Input functions

Frequently a variable value needs to be converted from one type to another. For example, a new data set contains customer ID as a numeric value, but your permanent data set has customer ID as a character variable.

6.1.1 PUT function

Syntax: put(argument1, format)
where argument1 is a variable name or a constant, and format is a valid SAS format of the same type (numeric or character) as argument1. The PUT function writes a character string that consists of the value of argument1 output in the specified format. The result of the PUT function is always a character value, regardless of the type of the function's arguments.

6.1.2 INPUT function

Its syntax is similar to that of the PUT function: input(argument1, informat). The INPUT function, by contrast, is typically used to convert a character value to a numeric value.

6.2 SUBSTR function

The SAS DATA step function SUBSTR (commonly pronounced "substring") is used to work with a specific position or positions of characters within a defined character variable. The function focuses on a portion of a string and can go on either side of the = sign in a DATA step statement.

SYNTAX

The SUBSTR function has three arguments: SUBSTR(SOURCE, POSITION, N) or (char_var, start, length). The function returns N characters, beginning at character number POSITION from the string SOURCE.

- SOURCE: This is the larger or reference string. It can be a variable or a string of characters.
- POSITION: This value is a positive integer and references the starting point to begin reading the internal group of characters.
- N: This value is a positive integer and references the number of characters to read from the starting point POSITION in the field SOURCE.

Note that the SUBSTR function reads and writes character variables only.

Applications - right side application

Examples:
data subs;
   input city $;
   city_id=substr(city, 1, 3);
   datalines;
VAN505
CAL408
OTT307
;
run;

data subs2;
   input dxcode $;
   /* "input dxcode;" would read the code as numeric; SUBSTR expects a character argument */
   dxcode1=substr(dxcode, 1, 3);
   datalines;
5791
5792
5796
;
run;

data subs3;
   input longvar $15.;
   cards;
19JAN1985215000
27NOV1993317500
11JUL1996376250
;
run;

data subs4;
   set subs3;
   day=substr(longvar,1,2);
   month=substr(longvar,3,3);
   year=substr(longvar,6,4);
   homeval=substr(longvar,10,6);
run;
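The pieces returned by SUBSTR are character values. The sketch below combines SUBSTR with the INPUT and PUT functions from 6.1 to convert them, and also shows SUBSTR on the left side of the = sign, which the examples above do not cover. The data set name SUBS5 and the variable names are made up for this illustration.

```sas
data subs5;
   longvar = '19JAN1985215000';

   /* INPUT: character -> numeric (a SAS date and a plain number) */
   sale_date = input(substr(longvar, 1, 9), date9.);
   homeval   = input(substr(longvar, 10, 6), 6.);

   /* PUT: numeric -> character */
   homeval_c = put(homeval, 8.);

   /* left-side SUBSTR: overwrite 6 characters starting at position 10 */
   masked = longvar;
   substr(masked, 10, 6) = 'XXXXXX';

   format sale_date date9.;
   put sale_date= homeval= masked=;
   /* log: sale_date=19JAN1985 homeval=215000 masked=19JAN1985XXXXXX */
run;
```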

6.3 Scan function

The SCAN function is used when the data values contain a space or comma delimiter. It breaks up a character variable into separate variables.

Syntax: New_variable = Scan(variable-name, word, delimiter);
where word is a numeric value specifying the word you are looking for (first, second, third, etc.). The default length of a variable created with SCAN is 200 characters.

Examples:
data auth;
   input author $ 1-20;
   datalines;
David F. Drak
David G. Hartwell
Paul S. Lovecraft
Horace V. Wadpole
Stuart D. Schiff
;
run;

data auth2;
   length first_name middle_name last_name $ 20;
   set auth;
   first_name=scan(author, 1, ' ');
   middle_name=scan(author, 2, ' ');
   last_name=scan(author, 3, ' ');
run;

proc print data=auth2;
   var first_name middle_name last_name;
   title "Break the Author's name into Three Parts";   /* double quotes, because the text contains an apostrophe */
run;

data aa;
   /* address1 occupies columns 1-31, address2 columns 32-64 */
   input address1 $ 1-31 address2 $ 32-64;
   cards;
450 Shepard Ave. TO ON M3C7C1  450, Shepard Ave., TO, ON, M3C7C1
;
run;

data ab;
   set aa;
   /* default delimiters: blank, period, comma and others */
   num1       =scan(address1, 1);
   st_nam1    =scan(address1, 2);
   city_nam1  =scan(address1, 4);
   state_nam1 =scan(address1, 5);
   p_code1    =scan(address1, 6);
run;

data ac;
   set aa;
   /* comma as the only delimiter */
   num2       =scan(address2, 1, ',');
   st_nam2    =scan(address2, 2, ',');
   city_nam2  =scan(address2, 3, ',');
   state_nam2 =scan(address2, 4, ',');
   p_code2    =scan(address2, 5, ',');
run;
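SCAN also accepts a negative word number, which counts words from the right end of the string. That is handy when the last name is wanted but the number of name parts varies. A minimal sketch using one of the author names from above:

```sas
data _null_;
   author = 'David G. Hartwell';
   first_name = scan(author,  1, ' ');   /* first word from the left  */
   last_name  = scan(author, -1, ' ');   /* first word from the right */
   put first_name= last_name=;
   /* log: first_name=David last_name=Hartwell */
run;
```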

6.4 Upcase and Lowcase functions

- upcase: converts all letters within a variable value to uppercase
- lowcase: converts all letters within a variable value to lowercase

Example:
data up1;
   infile datalines;
   input gender $ @@;
   datalines;
f f m m
;
run;

data up2;
   set up1;
   gender=upcase(gender);
run;

6.5 VERIFY function

Syntax: Verify(source, excerpt)
Source: specifies any SAS character expression.
Excerpt: specifies any SAS character expression. If you specify more than one excerpt, separate them with a comma.

Example:
data scores;
   input Grade : $1. @@;
   check='abcdf';
   if verify(grade, check)>0 then put @1 'INVALID ' grade=;
   datalines;
a b c b c d f a a q a b d d b
;
run;

This lists any observations with values for grade that are not one of the letters a, b, c, d, f. See the log file.

DATA EXAMPLE4;
   INPUT ID $ 1-4 ANSWER $ 5-9;
   P = VERIFY(ANSWER,'ABCDE');
   OK = P EQ 0;
   DATALINES;
001 ACBED
002 ABXDE
003 12CCE
004 ABC E
;
PROC PRINT DATA=EXAMPLE4 NOOBS;
   TITLE 'Listing of Example 4';
RUN;

Note: OK = 1 means the value is valid.

VERIFY returns the position of the first character in source that is not present in excerpt. If every character in source also appears in excerpt, the function returns zero.

6.6 Comparing INDEX, INDEXC and INDEXW functions

INDEX(source,excerpt)
source is a character string or text expression.
excerpt is a character string or text expression.
It searches source for the first occurrence of excerpt and returns the position of its first character. If excerpt is not found, the function returns 0. If there are multiple occurrences of excerpt, INDEX returns only the position of the first occurrence.
* Use INDEX to search for 'grouped letters' anywhere in a string or to search for individual letters;
data one;
   INPUT string $25.;
   position=INDEX(string,'cat');
   letter=INDEX(string,'c');
   datalines;
the cat came back
catastrophic
curious cat
caterwauls
army
;
run;

INDEXC(source,excerpt-1<,... excerpt-n>)
source specifies the character expression to search.
excerpt specifies the characters to search for in the character expression.
It searches source, from left to right, for the first occurrence of any character present in the excerpts and returns the position in source of that character. If none of the characters in excerpt-1 through excerpt-n are found in source, INDEXC returns a value of 0.

/* Use INDEXC to locate the first occurrence of *any* character(s)
   specified in the excerpt. This example helps us determine if STRING
   is a street name or part of an address containing numbers */
DATA two;
   INPUT string $25.;
   IF INDEXC(string,'0123456789') > 0 then has_numbers=string;
   ELSE no_numbers=string;
   CARDS;
Box 101
Pine Street
;
RUN;

INDEXW(source, excerpt)
source specifies the character expression to search.
excerpt specifies the string of characters to search for in the character expression. SAS removes the leading and trailing blanks from excerpt.
The INDEXW function searches source, from left to right, for the first occurrence of excerpt as a whole word and returns the position in source of the substring's first character. If the substring is not found in source, INDEXW returns a value of 0. If there are multiple occurrences of the string, INDEXW returns only the position of the first occurrence.

/* Use INDEXW to find the target excerpt in a string on a 'word boundary' */
DATA three;
   INPUT string $25.;
   IF INDEXW(string,'my') > 0 then
      PUT 'The text string "' string '" contains the word MY.';
   CARDS;
my aunt amy
in the army
my oh my
mine
examine
;
RUN;
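Running the three functions against the same string makes the differences easier to see. The sketch below uses a short made-up string in which the letters "my" occur only inside longer words.

```sas
data _null_;
   string = 'amy army';
   p_index  = index(string,  'my');    /* substring anywhere: found at position 2  */
   p_indexc = indexc(string, 'xyz');   /* any one of x, y, z: the y at position 3  */
   p_indexw = indexw(string, 'my');    /* whole word only: "my" never stands alone */
   put p_index= p_indexc= p_indexw=;
   /* log: p_index=2 p_indexc=3 p_indexw=0 */
run;
```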

6.7 CAT, CATS, CATX, CATT functions

CAT  - concatenates character strings without removing leading or trailing blanks
CATS - concatenates character strings and removes leading and trailing blanks
CATT - concatenates character strings and removes trailing blanks
CATX - concatenates character strings, removes leading and trailing blanks, and inserts separators
data test;
   input (x1-x4) ($);
   x5=' 5';
   length new1 $40 new2-new4 $10;
   new1=cat(of x1-x5);
   new2=cats(of x1-x5);
   new3=catt(x1,x2,x3,x4,x5);
   new4=catx(',', of x1-x5);
   keep new:;
   datalines;
1 2 3 4
5 6 . 8
;
proc print;
   var new1-new4;
run;

data test2;
   infile cards missover;
   length first last $20;
   input first $ last $;
   datalines;
jone smith
john wayne
bill
phil hodge
;
run;

data test3;
   set test2;
   name  = catx(", ", of last first);   /* removes leading and trailing blanks, and inserts separators */
   name1 = cat(of last first);          /* without removing leading or trailing blanks */
   name2 = cats(of last first);         /* removes leading and trailing blanks */
   name3 = catt(of last first);         /* removes trailing blanks */
   newname1=trim(left(first))||' '||left(last);
   newname2=catx(' ', first, last);
run;

proc print data=test3;
run;

6.8 Compress

A more general problem is to remove selected characters from a string. For example, suppose you want to remove blanks, parentheses, and dashes from a phone number that has been stored as a character value. The COMPRESS function can remove any number of specified characters from a character variable. The program below uses the COMPRESS function twice: the first time to remove blanks from the string, the second time to remove blanks plus the other characters mentioned above. Here is the code:

data phone;
   input phone $ 1-15;
   phone1 = compress(phone);
   phone2 = compress(phone,'(-) ');
   datalines;
(908)235-4490
(201) 555-77 99
;
title "Listing of Data Set PHONE";
proc print data=phone noobs;
run;
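In SAS 9 and later, COMPRESS also takes an optional third argument of modifiers. As a sketch of an alternative to listing every unwanted character, the modifiers 'k' (keep instead of remove) and 'd' (digits) keep only the digits of the phone number:

```sas
data _null_;
   phone  = '(201) 555-77 99';
   digits = compress(phone, , 'kd');   /* second argument left empty; keep digits only */
   put digits=;
   /* log: digits=2015557799 */
run;
```

This version keeps working if some other stray character, such as a period, turns up in the data.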

6.9 Compbl

This example demonstrates how to convert multiple blanks to a single blank. Suppose you have some names and addresses in a file. Some of the data entry clerks placed extra spaces between the first and last names and in the address fields.

data multiple;
   /* name occupies columns 1-20, address columns 21-50 */
   input name $20. address $30.;
   name = compbl(name);
   address = compbl(address);
   datalines;
Ron    Cody         89 Lazy   Brook   Road
;
proc print data=multiple noobs;
run;

7. Global Statements

Global statements can be specified anywhere in your SAS program, and they remain in effect until changed.

FOOTNOTE   for printing footnote lines at the bottom of each page
%INCLUDE   for including files of SAS statements
LIBNAME    for accessing SAS data libraries
FILENAME   for associating a fileref with an external file
OPTIONS    for setting various SAS system options: pageno, center, linesize, pagesize, date, nodate
RUN        for executing the preceding SAS statements
TITLE      for printing title lines at the top of each page
ODS        html, listing, trace
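A short sketch of how a few of these statements behave in practice follows. It assumes the SASHELP.CLASS sample data set is available; the title and footnote text are made up.

```sas
/* global statements: they stay in effect until changed or cleared */
options nodate pageno=1 linesize=80;
title 'First Five Students';
footnote 'Source: SASHELP.CLASS';

proc print data=sashelp.class(obs=5);
run;

/* TITLE and FOOTNOTE with no text clear the current settings */
title;
footnote;
```

Without the two clearing statements at the end, the same title and footnote would also appear on the output of every later procedure in the session.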
