Base SAS Interview Questions and Answers Part – 1

01. Difference between INPUT and INFILE

Ans: The INFILE statement identifies the external file to read, while the INPUT statement describes the variables to be read from it.

Note: A variable name followed by $ (dollar sign) identifies the variable as character.
For example, in the statement input ID Name $ Sex; the variables ID and Sex are numeric and Name is a character variable.
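
The roles of the two statements can be sketched in a short DATA step. This is a minimal illustration, not the original example: 'external-file' is a placeholder path, and the file is assumed to hold space-delimited records with an ID, a name, and a numeric sex code.

```sas
data readin;
    /* INFILE names the external file ('external-file' is a placeholder) */
    infile 'external-file';
    /* INPUT describes the variables; the $ marks Name as character */
    input ID Name $ Sex;
run;
```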

02. Difference between Informat and Format

Ans: Informats read the data while Formats write the data.

Informat – Tells SAS how to read a raw value, for instance that a string of digits should be interpreted as a date.

For example: the informat mmddyy6. tells SAS to read the number 121713 as the date December 17, 2013.

Format – Tells SAS how to display the values of a variable when they are printed.
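
As a rough sketch of the two working together, the step below reads the value with an informat and displays it with a format. The data set name DATES and the variable name DT are made up for illustration, and the result assumes the default YEARCUTOFF setting, under which the two-digit year 13 is read as 2013.

```sas
data dates;
    /* Informat: read the six digits 121713 as a SAS date value */
    input dt mmddyy6.;
    /* Format: display the stored date value as 17DEC2013 */
    format dt date9.;
    datalines;
121713
;
run;

proc print data=dates;
run;
```

Without the FORMAT statement, PROC PRINT would show the underlying numeric date value (the number of days since January 1, 1960) instead of a readable date.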

03. Difference between Missover and Truncover

Ans: 

Missover – When the MISSOVER option is used on the INFILE statement, the INPUT statement does not jump to the next line when it reaches the end of a short line. Instead, MISSOVER sets any variables that have not yet been assigned a value to missing.

Truncover – It assigns the raw data value to the variable even if the value is shorter than the length that is expected by the INPUT statement.

The following is an example of an external file that contains data:

1
22
333
4444

This DATA step uses the numeric informat 4. to read a single field in each record of raw data and to assign values to the variable ID.

MISSOVER Option

data readin;
infile 'external-file' missover;
input ID 4.;
run;

proc print data=readin;
run;

The output is shown below:

Obs ID

1     .
2     .
3     .
4     4444

TRUNCOVER Option

data readin;
infile 'external-file' truncover;
input ID 4.;
run;

proc print data=readin;
run;

The output is shown below:

Obs ID

1     1
2     22
3     333
4     4444

04. Purpose of double trailing @@ in the INPUT statement?

Ans: The double trailing at sign (@@) tells SAS to hold the current input record for the next execution of the INPUT statement instead of advancing to a new record. This makes it possible to read several observations from a single line of raw data.

DATA Readin;
Input Name $ Score @@;
cards;
Sam 25 David 30 Ram 35
Deeps 20 Daniel 47 Pars 84
;
RUN;

The resulting data set READIN contains six observations:

Obs Name Score

1     Sam       25
2     David     30
3     Ram       35
4     Deeps     20
5     Daniel    47
6     Pars      84

05. How to include or exclude specific variables in a data set?

Ans: Use the DROP and KEEP statements, or the DROP= and KEEP= data set options.

DROP, KEEP Statement

The DROP statement specifies the names of the variables that you want to remove from the data set.

The KEEP statement specifies the names of the variables that you want to keep in the data set; all other variables are left out.
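
A small sketch of both statements follows. The data set SCORES and its variables are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical input data set with four variables */
data scores;
    input ID Name $ Score1 Score2;
    datalines;
1 Sam 25 30
2 David 30 35
;
run;

/* DROP statement: write every variable except Score2 */
data dropped;
    set scores;
    drop Score2;
run;

/* KEEP statement: write only ID and Name */
data kept;
    set scores;
    keep ID Name;
run;
```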

DROP, KEEP Data set Options

The main difference between the DROP / KEEP statements and the DROP= / KEEP= data set options is that the statements are valid only in a DATA step, whereas the data set options can be used in both DATA steps and procedures.
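
The data set options might be used as in the sketch below. The small READIN data set is rebuilt here only so the example runs on its own, and the output data set name NAMES is an arbitrary choice.

```sas
data readin;
    input Name $ Score @@;
    datalines;
Sam 25 David 30 Ram 35
;
run;

/* KEEP= data set option in a procedure: print only Name */
proc print data=readin (keep=Name);
run;

/* DROP= data set option on the output data set of a DATA step */
data names (drop=Score);
    set readin;
run;
```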

06. How to print observations 5 through 10 from a data set?

Ans: Use the FIRSTOBS= and OBS= data set options. Specifying FIRSTOBS=5 and OBS=10 on the data set READIN tells SAS to print observations 5 through 10.
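
A minimal sketch of this, assuming a READIN data set with at least 10 observations (a 12-row data set with a single made-up variable ID is generated here for illustration):

```sas
/* Build a hypothetical 12-row data set so that rows 5 to 10 exist */
data readin;
    do ID = 1 to 12;
        output;
    end;
run;

/* FIRSTOBS=5 starts at observation 5; OBS=10 stops after observation 10 */
proc print data=readin (firstobs=5 obs=10);
run;
```

Note that OBS= gives the number of the last observation to process, not a count of observations, so this prints six rows.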

07. What are the default statistics that PROC MEANS produces?

Ans: By default, PROC MEANS produces N, MEAN, STD DEV (standard deviation), MIN and MAX for each numeric variable.
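
To see the default output, PROC MEANS can be run without any statistic keywords. The sketch below assumes the sample data set SASHELP.CLASS, which ships with most SAS installations, is available; any data set with numeric variables would do.

```sas
/* No statistic keywords: the default statistics are reported */
proc means data=sashelp.class;
    var height weight;
run;
```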

08. Name and describe functions that you have used for data cleaning?

Ans: [Table of data cleaning functions; image not available]
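
As an illustration only (this is not the original answer, and the selection is an assumption), a few Base SAS character functions that are commonly used for cleaning are sketched below. The data set name CLEAN, the variable names, and the sample value are made up.

```sas
data clean;
    length raw $20;
    raw = '  ab-12  cd  ';
    no_dash  = compress(raw, '-');       /* remove every hyphen */
    trimmed  = strip(raw);               /* remove leading and trailing blanks */
    upper    = upcase(raw);              /* convert letters to uppercase */
    squeezed = compbl(raw);              /* collapse runs of blanks to one blank */
    replaced = tranwrd(raw, 'cd', 'ef'); /* replace one substring with another */
run;

proc print data=clean;
run;
```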

09. Difference between FUNCTION and PROC

Ans: Example: the MEAN function and PROC MEANS

The MEAN function calculates the average of the values of several variables within one observation.

PROC MEANS calculates the average of a single variable across observations: the sum of all nonmissing values of the variable divided by the number of nonmissing values.

In other words, the MEAN function works across a row, while the procedure works down a column.
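
The contrast can be sketched with a small made-up data set; the names SCORES, Q1 to Q3, and ROW_AVG are hypothetical.

```sas
data scores;
    input ID Q1 Q2 Q3;
    /* MEAN function: average of Q1, Q2, Q3 within each observation (row) */
    row_avg = mean(Q1, Q2, Q3);
    datalines;
1 10 20 30
2 40 50 60
;
run;

/* PROC MEANS: average of each variable across all observations (column) */
proc means data=scores mean;
    var Q1 Q2 Q3;
run;
```

Here ROW_AVG is 20 for the first observation and 50 for the second, while PROC MEANS reports means of 25, 35 and 45 for Q1, Q2 and Q3.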

10. Differences between the WHERE and IF statements?

Ans:

  1. The WHERE statement can be used in procedures to subset data, while the IF statement cannot be used in procedures.
  2. WHERE can be used as a data set option (WHERE=), while IF cannot.
  3. The WHERE statement is generally more efficient than the IF statement. SAS evaluates the WHERE condition before an observation is brought into the program data vector, so observations that do not match are not processed by the rest of the step.
  4. The WHERE statement supports the sounds-like operator (=*) to search for character values that sound alike, while the IF statement does not.
  5. The WHERE statement cannot be used when reading raw data with the INPUT statement, whereas the IF statement can.
  6. Multiple IF statements can be used to apply several conditions in sequence.
  7. To subset on a variable created in the same DATA step, use the IF statement. WHERE requires the variable to already exist in the input data set (READIN in these examples).
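
Several of the points above can be sketched in code. The READIN data set is rebuilt here so the example is self-contained; the variable Bonus, the data set HIGH, and the cutoff values are made up for illustration.

```sas
data readin;
    input Name $ Score;
    datalines;
Sam 25
David 30
Ram 35
;
run;

/* Point 1: WHERE statement inside a procedure */
proc print data=readin;
    where Score > 25;
run;

/* Point 2: WHERE= data set option */
proc print data=readin (where=(Score > 25));
run;

/* Point 4: sounds-like operator (=*) in a WHERE expression */
proc print data=readin;
    where Name =* 'Sem';
run;

/* Point 7: subsetting IF on a variable created in the same DATA step */
data high;
    set readin;
    Bonus = Score * 2;
    if Bonus > 55;
run;
```

Replacing the subsetting IF in the last step with where Bonus > 55; would fail, because Bonus does not exist in READIN.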