Base SAS Interview Questions and Answers Part – 2

11. What is Program Data Vector (PDV)?

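Ans: The PDV is a logical area in memory where SAS builds a data set, one observation at a time.

How is the PDV created?

When the DATA step reads raw data, SAS creates an input buffer at compile time to hold a record from the external file. The PDV is created after the input buffer. It contains a slot for every variable in the step, plus the automatic variables _N_ and _ERROR_. At execution time, SAS fills the PDV one observation at a time and writes its contents to the output data set.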
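One way to look at the PDV is to dump it to the log on every iteration. The following is a minimal sketch; the variable names x and y and the two data lines are made up for illustration.

```sas
/* Sketch: print the contents of the PDV on each iteration of the DATA step */
data _null_;
  input x y;      /* reads one raw record through the input buffer */
  put _all_;      /* writes every PDV variable to the log, including _N_ and _ERROR_ */
  datalines;
1 2
3 4
;
run;
```

Each log line lists x and y together with the automatic variables, which shows that SAS processes one observation per iteration.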

12. What is DATA _NULL_?

Ans: DATA _NULL_ runs a DATA step without creating an output data set; that is the sense of “null” here. It is mainly used to create macro variables, and it can also write output (for example, with PUT statements) without creating a data set.
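A minimal sketch of both uses is given here. It assumes the sample table sashelp.class is available, and the macro variable name nobs is hypothetical.

```sas
/* Sketch: create a macro variable and write to the log without creating a data set */
data _null_;
  set sashelp.class end=last;      /* assumed sample table */
  if last then do;
    call symputx('nobs', _n_);     /* store the number of rows read in macro variable NOBS */
    put 'Rows read: ' _n_;         /* PUT writes to the log; no data set is produced */
  end;
run;

%put nobs=&nobs;
```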

13. What is the difference between ‘+’ operator and SUM function?

Ans: The SUM function returns the sum of its non-missing arguments, whereas the “+” operator returns a missing value if any of its operands is missing. (SUM returns a missing value only when all of its arguments are missing.)

Suppose we have a data set containing three variables: X, Y and Z. Each of them has some missing values. We wish to compute the sum of all three variables.

The data is shown in the image below :

[Image: the input data set mydata with variables X, Y and Z]

data mydata2;
set mydata;
a=sum(x,y,z);   /* ignores missing arguments */
p=x+y+z;        /* missing if x, y or z is missing */
run;

The output is shown in the image below :

[Image: the output data set mydata2 with the new variables a and p]

In the output, the value of p is missing for the 4th, 5th and 6th observations, because at least one of X, Y and Z is missing in each of those rows.
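The same comparison can be tried with a small self-contained sketch. The three data lines are made-up values, not the data from the images above, and the data set name demo is hypothetical.

```sas
/* Sketch: SUM function versus the + operator with missing values */
data demo;
  input x y z;
  a = sum(x, y, z);   /* adds only the non-missing arguments */
  p = x + y + z;      /* missing as soon as one operand is missing */
  datalines;
1 2 3
4 . 6
. . .
;
run;

proc print data=demo;
run;
```

For these rows, a is 6, 10 and missing, while p is 6, missing and missing. The last row shows that SUM also returns a missing value when every argument is missing.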

14. How to identify and remove unique and duplicate values?

1. Use PROC SORT with the NODUPKEY and NODUP options.
2. Use FIRST. and LAST. variables.

The detailed explanation is shown below :

SAMPLE DATA SET

ID  Name    Score
1   David   45
1   David   74
2   Sam     45
2   Ram     54
3   Bane    87
3   Mary    92
3   Bane    87
4   Dane    23
5   Jenny   87
5   Ken     87
6   Simran  63
8   Priya   72

Create this data set in SAS

data readin;
input ID Name $ Score;
cards;
1   David   45
1   David   74
2   Sam   45
2   Ram   54
3   Bane   87
3   Mary   92
3   Bane   87
4   Dane   23
5   Jenny   87
5   Ken   87
6   Simran   63
8   Priya   72
;
run;

There are several ways to identify and remove unique and duplicate values:

PROC SORT

In PROC SORT, there are two options by which we can remove duplicates:

1. NODUPKEY option
2. NODUP option

The NODUPKEY option removes observations that repeat the values of the variables listed in the BY statement, keeping the first observation in each BY group. The NODUP option (an alias for NODUPRECS) removes observations in which the values of all the variables are repeated (identical observations).
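A minimal sketch of both options applied to the READIN data set follows. It assumes the sort is BY ID, and the output data set names are hypothetical.

```sas
/* Sketch: NODUPKEY keeps the first observation for each value of ID */
proc sort data=readin out=readin_nodupkey nodupkey;
  by id;
run;

/* Sketch: NODUP drops an observation only when it is identical to the previous one */
proc sort data=readin out=readin_nodup nodup;
  by id;
run;
```

Writing the results to separate OUT= data sets leaves READIN unchanged, so the two results can be compared side by side.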

The output is shown below :

[Image: SAS : NODUPKEY vs NODUP]

The NODUPKEY option deleted 5 observations with duplicate BY values, whereas the NODUP option did not delete any observations.

Why was no observation deleted when the NODUP option was used?

Although ID 3 has two identical records (see observations 5 and 7), the NODUP option did not remove them. This is because they are not next to one another in the data set, and SAS compares each record only with the one before it.

To fix this issue, sort on all the variables in the data set READIN.
To sort by all the variables without having to list them all in the program, you can use the keyword ‘_ALL_’ in the BY statement (see below).

PROC SORT DATA = readin NODUP;
BY _all_;
RUN;

The output is shown below :

[Image: SAS NODUP Output]
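The second approach listed at the start of this answer, FIRST. and LAST. variables, can be sketched as follows. The data set names sorted, unique and dups are hypothetical.

```sas
/* Sketch: split READIN into IDs that occur once and IDs that occur more than once */
proc sort data=readin out=sorted;
  by id;
run;

data unique dups;
  set sorted;
  by id;                          /* creates FIRST.id and LAST.id */
  if first.id and last.id then output unique;   /* the only row for this ID */
  else output dups;                             /* ID appears on several rows */
run;
```

With the sample data, UNIQUE receives the rows for IDs 4, 6 and 8, and DUPS receives the remaining nine rows. Unlike PROC SORT with NODUPKEY, this keeps the duplicated rows for inspection instead of discarding them.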

15. What is the difference between the NODUP and NODUPKEY options?

Ans: The NODUPKEY option removes observations that repeat the values of the variables listed in the BY statement, keeping the first observation in each BY group. The NODUP option removes observations in which the values of all the variables are repeated (identical observations), and it only compares each observation with the one before it.

See the detailed explanation for this question above (Q14).

16. What are _numeric_ and _character_ and what do they do?

Ans: 

1. _NUMERIC_ specifies all numeric variables that are currently defined in the current DATA step.
2. _CHARACTER_ specifies all character variables that are currently defined in the current DATA step.
3. _ALL_ specifies all variables that are currently defined in the current DATA step.

Example : To include all the numeric variables in PROC MEANS
proc means;
var _numeric_;
run;
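Inside a DATA step, these name lists are often combined with an ARRAY statement. The sketch below assumes the sample table sashelp.class; the array name nums and the output name class_filled are hypothetical.

```sas
/* Sketch: replace missing values with 0 in every numeric variable */
data class_filled;
  set sashelp.class;              /* assumed sample table */
  array nums {*} _numeric_;       /* all numeric variables defined up to this point */
  do i = 1 to dim(nums);
    if missing(nums{i}) then nums{i} = 0;
  end;
  drop i;                         /* i is created after the ARRAY statement, so it is not in the array */
run;
```

The position of the ARRAY statement matters: _NUMERIC_ picks up only the variables that already exist when SAS compiles that statement.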

17. How to sort in descending order?

Ans: Use the DESCENDING keyword in the BY statement of PROC SORT, placed before the variable that should be sorted in descending order. The example below shows the use of the DESCENDING keyword.

PROC SORT DATA=auto;
BY DESCENDING engine ;
RUN ;
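DESCENDING applies only to the variable that immediately follows it, so it has to be repeated for each variable that needs descending order. A minimal sketch, assuming the sample table sashelp.class and a hypothetical output name:

```sas
/* Sketch: ascending by sex, then descending by age within each sex */
proc sort data=sashelp.class out=class_sorted;
  by sex descending age;
run;
```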

18. Under what circumstances would you code a SELECT construct instead of IF statements?

Ans: When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using a series of IF-THEN or IF-THEN/ELSE statements, because it reduces CPU time.

The general form of a SELECT group is as follows, where condition stands for the select-expression whose value is compared with each WHEN value :

SELECT (condition);
WHEN (1) x=x;
WHEN (2) x=x*2;
OTHERWISE x=x-1;
END;

Example :

SELECT (str);
WHEN ('Sun') wage=wage*1.5;
WHEN ('Sat') wage=wage*1.3;
OTHERWISE DO;
wage=wage+1;
bonus=0;
END;
END;
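The fragment above can be placed in a complete DATA step to try it out. This is a minimal sketch; the data set name pay, the variable day and the three data lines are made up for illustration.

```sas
/* Sketch: SELECT group inside a runnable DATA step */
data pay;
  input day $ wage;
  bonus = 10;
  select (day);
    when ('Sun') wage = wage * 1.5;
    when ('Sat') wage = wage * 1.3;
    otherwise do;                 /* every other day */
      wage = wage + 1;
      bonus = 0;
    end;
  end;
  datalines;
Sun 100
Sat 100
Mon 100
;
run;
```

Without an OTHERWISE statement, SAS stops with an error when no WHEN condition matches, so it is safer to always include one, even if it is empty (otherwise;).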

19. How to convert a numeric variable to a character variable?

Ans: SAS cannot change the type of an existing variable, so you must create a differently-named variable using the PUT function.

The example below shows the use of the PUT function.

charvar = put(numvar, 7.) ;
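A minimal sketch of the conversion in a complete step, with a made-up value. A numeric format such as 7. right-aligns the result, so STRIP is used here to remove the leading blanks; the variable name trimmed is hypothetical.

```sas
/* Sketch: numeric to character with PUT */
data _null_;
  numvar  = 1234;
  charvar = put(numvar, 7.);      /* character value, 7 wide, right-aligned */
  trimmed = strip(charvar);       /* same digits without the leading blanks */
  put charvar= trimmed=;
run;
```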

20. How to convert a character variable to a numeric variable?

Ans: SAS cannot change the type of an existing variable, so you must create a differently-named variable using the INPUT function.

The example below shows the use of the INPUT function.

numvar=input(charvar,4.0);
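A minimal sketch of the conversion in a complete step, with a made-up value. The informat width should be at least as long as the text being read; the variable name doubled is hypothetical and only shows that the result can be used in arithmetic.

```sas
/* Sketch: character to numeric with INPUT */
data _null_;
  charvar = '1234';
  numvar  = input(charvar, 4.);   /* read 4 characters as a number */
  doubled = numvar * 2;           /* arithmetic works on the numeric copy */
  put numvar= doubled=;
run;
```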