Section 2.3 - SAS Statements

SAS statements are inserted into DATA and PROC steps to provide the instructions needed to read and process data in a DATA step or to analyze data in a PROC step. A few statements are entered in the command file outside any DATA or PROC step (usually at the very beginning of the file, or wherever their effect is needed); other statements found in the DATA step do not begin with a keyword and cause some specific programming action (such as a calculation) to take place.

1. All SAS statements (except formulas) begin with a SAS keyword, and all statements (including formulas) must end with a semicolon (;). They may begin on any line and in any column. Keywords for commonly used SAS statements in the DATA step are:

       DATA < > ;      SET < > ;       MERGE < > ;     BY < > ;
       INFORMAT < > ;  INFILE < > ;    FORMAT < > ;    ARRAY < > ;
       KEEP < > ;      DROP < > ;      RETAIN < > ;    LABEL < > ;
       INPUT < > ;     IF < > ;        ELSE < > ;      DO < > ;
       OUTPUT < > ;    END;            RUN;

   The brackets < > indicate that your choice of options, variable names, or file names is to be inserted following the keyword. These keywords and many others will be described in subsequent chapters.

2. More than one statement may appear on a line as long as each one ends with a semicolon; however, for readability and diagnostic purposes, one statement per line is usually preferred.

3. If you're running SAS on a UNIX system, note that commands at the UNIX prompt are case-sensitive. This means that any external filenames, directories, or UNIX system commands must be entered exactly in the case in which UNIX expects them. [As noted earlier, keywords, options, and formulas within the SAS program are *not* case-sensitive, so you may use any combination of upper- or lower-case letters for the statements themselves.]

4. A statement may span many lines as long as no SAS keyword, variable name, quoted string, or number is divided by a line break.
   For example, an INPUT statement (used in the DATA step) may be divided across lines in the following way:

       * In the INPUT statement, specifying variable names across a line is OK;
       DATA new;
         INFILE 'c:\mydata\quest.dat' missover;
         INPUT var1 var2 var3 var4 var5
               var6 var7 var8 var9 var10;
       RUN;

5. Use hierarchical indentation to improve readability, especially for statements placed within DO loops (see example programs below).

6. A few statements require character text that is to be printed to the output listing. Enclose every character string of this kind in single or double quotes, e.g., 'yes' or "Yes". If the text contains an apostrophe (as in a contraction), enclose it in double quotes, e.g.,

       TITLE1 "It's a great day!";

7. SAS supports a special kind of variable name indexed by a number. When you have a set of variables of one data type (all character or all numeric) collected on each observation (e.g., responses from a survey), you can begin each variable name with one or more characters followed by a number that increases by one with each variable (e.g., for responses to survey questions, assign q1 q2 q3 q4 q5 q6, etc., where the number represents the survey item). This notation lets you refer to the whole group of variables in a much abbreviated form, e.g., q1-q6, which is especially efficient in ARRAY, KEEP, RETAIN, LENGTH, FORMAT, VAR, and many other SAS statements. As an example, the following two SAS statements are equivalent:

       VAR q1 q2 q3 q4 q5 q6;
       VAR q1-q6;

   Imagine the economy this feature provides when you have 239 questions in a survey! If you don't know how many variables begin with the same letters, the colon modifier will access all of them:

       DATA _null_;
         SET p2;
         PUT (q:) (best3.);
       RUN;

   The PUT statement with (q:) (best3.) expands the q stem to capture every variable whose name begins with q (q1 through qx) and writes each of them in a 3-column field, using the BEST format to choose the representation that best fits the actual value (e.g., integer, real).
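The two list notations in item 7 can be tried out with a small sketch like the one below. The data set name survey, the variable id, and the data values are all made up for illustration; only the q1-q6 and q: notations come from the text above.

```sas
* Hypothetical data set with three respondents and six survey items;
DATA survey;
  INPUT id q1-q6;
  DATALINES;
1 4 5 3 2 4 5
2 3 3 4 4 2 1
3 5 5 5 4 4 3
;
RUN;

* Numbered range list, which names q1 through q6;
PROC MEANS DATA=survey;
  VAR q1-q6;
RUN;

* Prefix list, which names every variable whose name begins with q;
PROC PRINT DATA=survey;
  VAR q: ;
RUN;
```

Both procedures should see the same six variables here. Note that the prefix list would also pick up any other variable whose name starts with q (say, a variable called quality), so the numbered range is the safer choice when the stem is short.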
%INCLUDE

The %INCLUDE statement allows you to bring in statements (e.g., DATA steps, a PROC FORMAT step that creates needed formats, or ESTIMATE and CONTRAST statements) from an external file at any point in your primary command file. SAS interprets them just as if they had been typed directly into the file at that point. For example, you can write all your formats in a file called formats.sas and then invoke them in your primary command file by entering the single line:

    %INCLUDE 'c:\data\formats.sas' ;

%INCLUDE is also especially helpful if you have many ESTIMATE or CONTRAST statements in PROC GLM or PROC MIXED, as these can get long and tedious with many factors and interactions.

In the log file, a + (plus sign) marks the statements that have been added. The + merely indicates that the lines come from a file read with a %INCLUDE statement; it serves no operational purpose.

The %INCLUDE statement acts somewhat like a SAS macro (see Chapter 9) in that it works by text substitution: the contents of the external file are inserted at that point in the program.
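As a rough sketch of how the pieces fit together, the program below assumes that c:\data\formats.sas exists and contains the PROC FORMAT step shown in the first comment. The format name yesno, the data set answers, and the data values are hypothetical. The SOURCE2 option on the %INCLUDE statement asks SAS to echo the included lines in the log; whether they appear without it depends on how the SOURCE2 system option is set at your site.

```sas
/* Assumed contents of c:\data\formats.sas (the format yesno is hypothetical):
     PROC FORMAT;
       VALUE yesno 1 = 'Yes' 0 = 'No';
     RUN;
*/

/* Primary command file: pull in the format definitions, then use them.
   SOURCE2 requests that the included lines be written to the log. */
%INCLUDE 'c:\data\formats.sas' / SOURCE2;

DATA answers;
  INPUT id agree;
  FORMAT agree yesno.;
  DATALINES;
1 1
2 0
;
RUN;

PROC PRINT DATA=answers;
RUN;
```

Because the included file is read as plain program text, the yesno format is defined before the DATA step runs, exactly as if the PROC FORMAT step had been typed at the top of the file.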