11. What is Program Data Vector (PDV)?
Ans: The PDV is a logical area in memory where SAS builds a data set, one observation at a time.
How is the PDV created?
When the DATA step reads raw data from an external file, SAS creates an input buffer at compile time to hold one record at a time. The PDV is created right after the input buffer. SAS then builds each observation in the PDV before writing it to the output data set.
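One way to watch the PDV at work is to print its contents to the log on every iteration. The sketch below is only an illustration: the data set name pdv_demo, the variables and the values are all made up for this example.

```sas
/* Hypothetical data set and values, used only to watch the PDV change. */
data pdv_demo;
    input id score;
    bonus = score * 0.1;
    /* Write every variable in the PDV, including the automatic
       variables _N_ and _ERROR_, to the log at the end of each iteration. */
    put _all_;
    datalines;
1 50
2 80
;
run;
```

The log should show one line per record. The automatic variables _N_ and _ERROR_ appear in the PDV but are not written to the output data set.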
12. What is DATA _NULL_?
Ans: DATA _NULL_ is mainly used to create macro variables. It can also be used to write output (for example, to the log or to an external file) without creating a data set. The idea of “null” here is that the DATA step runs but does not actually create a data set.
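A minimal sketch of both uses is shown below. It assumes the SASHELP.CLASS sample data set that ships with SAS, and the macro variable name n_students is made up for this example.

```sas
/* sashelp.class ships with SAS; the macro variable name n_students is hypothetical. */
data _null_;
    set sashelp.class end=last;
    /* Write a line to the log for each observation; no data set is created. */
    put name= age=;
    /* On the last observation, store the number of rows read in a macro variable. */
    if last then call symputx('n_students', _n_);
run;

%put Number of students: &n_students;
```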
13. What is the difference between ‘+’ operator and SUM function?
Ans: SUM function returns the sum of non-missing arguments whereas “+” operator returns a missing value if any of the arguments are missing.
Suppose we have a data set containing three variables: X, Y and Z. Each of them has some missing values. We wish to compute the sum of the three variables for every observation.
The data is shown in the image below:
The output is shown in the image below:
In the output, the value of p (the variable computed with the “+” operator) is missing for the 4th, 5th and 6th observations.
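Since the images are not reproduced here, the difference can be sketched with a small made-up data set. The values and the names sum_demo and a are assumptions for illustration only; they are not the data from the original images.

```sas
/* Hypothetical values; a period in the data lines is a missing numeric value. */
data sum_demo;
    input x y z;
    a = sum(x, y, z);   /* SUM function: ignores missing arguments */
    p = x + y + z;      /* + operator: missing if any operand is missing */
    datalines;
1 2 3
4 . 6
. . .
;
run;

proc print data=sum_demo;
run;
```

In the second observation, a is 10 while p is missing. In the third observation both are missing, because SUM also returns a missing value when every argument is missing. Writing sum(0, x, y, z) is a common way to get 0 instead in that case.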
14. How to identify and remove unique and duplicate values?
1. Use PROC SORT with NODUPKEY and NODUP Options.
2. Use First. and Last. Variables
The detailed explanation is shown below :
SAMPLE DATA SET
Create this data set in SAS
data readin;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;
There are several ways to identify and remove unique and duplicate values:
In PROC SORT, there are two options by which we can remove duplicates.
1. NODUPKEY Option
2. NODUP Option
The NODUPKEY option removes observations in which the values of the variables listed in the BY statement are repeated, while the NODUP option removes observations in which the values of all the variables are repeated (identical observations).
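The two options can be compared with a sketch like the one below. It assumes the sample data is stored in a data set named READIN (the name used later in this answer). The OUT= data set names are made up, so the original data set is left untouched.

```sas
/* Assumes the sample data is in READIN; the OUT= names are hypothetical. */
proc sort data=readin out=readin_nodupkey nodupkey;
    by id;
run;

proc sort data=readin out=readin_nodup nodup;
    by id;
run;
```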
The output is shown below :
The NODUPKEY option deleted 5 observations with duplicate ID values, whereas the NODUP option did not delete any.
Why was no observation deleted when the NODUP option was used?
Although ID 3 has two identical records (see observations 5 and 7), the NODUP option did not remove them. They are not next to one another in the data set, and SAS compares each record only with the one written just before it.
To fix this issue, sort on all the variables in the dataset READIN.
To sort by all the variables without having to list them all in the program, you can use the keyword ‘_ALL_’ in the BY statement (see below).
PROC SORT DATA = readin NODUP;
BY _ALL_;
RUN;
The output is shown below :
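Separately from the PROC SORT options, the second approach listed at the start of this answer (FIRST. and LAST. variables) can be sketched as follows. This is one possible way to do it, not necessarily the original author's code. It assumes the sample data is in READIN, and the output data set names are made up.

```sas
/* Assumes the sample data is in READIN; output data set names are hypothetical. */
proc sort data=readin out=readin_sorted;
    by id;
run;

data unique_ids duplicate_ids;
    set readin_sorted;
    by id;
    /* An ID that is both first and last in its BY group occurs only once. */
    if first.id and last.id then output unique_ids;
    else output duplicate_ids;
run;
```

With the sample data, IDs 4, 6 and 8 occur only once, so those three observations go to unique_ids and the rest go to duplicate_ids.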
15. Difference between NODUP and NODUPKEY Options?
Ans: The NODUPKEY option removes observations in which the values of the variables listed in the BY statement are repeated, while the NODUP option removes observations in which the values of all the variables are repeated (identical observations).
See the detailed explanation for this question above (Q14).
16. What are _numeric_ and _character_ and what do they do?
1. _NUMERIC_ specifies all numeric variables that are already defined in the current DATA step.
2. _CHARACTER_ specifies all character variables that are currently defined in the current DATA step.
3. _ALL_ specifies all variables that are currently defined in the current DATA step.
Example : To include all the numeric variables in PROC MEANS
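A minimal sketch of that example is shown below. It uses the SASHELP.CLASS sample data set that ships with SAS as an assumed input; any data set with numeric variables would do.

```sas
/* sashelp.class ships with SAS and is used here only as a convenient input. */
proc means data=sashelp.class;
    var _numeric_;
run;
```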
17. How to sort in descending order?
Ans: Use DESCENDING keyword in PROC SORT code. The example below shows the use of the descending keyword.
PROC SORT DATA=auto;
BY DESCENDING engine;
RUN;
18. Under what circumstances would you code a SELECT construct instead of IF statements?
Ans: When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.
The syntax for SELECT WHEN is as follows :
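SELECT (a);   /* a is a placeholder for the numeric expression being tested */
WHEN (1) x=x;
WHEN (2) x=x*2;
OTHERWISE; END;
SELECT (day); /* day is a placeholder for a character variable */
WHEN ('Sun') wage=wage*1.5;
WHEN ('Sat') wage=wage*1.3;
OTHERWISE; END;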
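To show a SELECT group in context, here is a small self-contained DATA step. The data set name pay, the variable day and the data values are assumptions made for this sketch.

```sas
/* Hypothetical data: day is a three-letter weekday code, wage a base amount. */
data pay;
    input day $ wage;
    select (day);
        when ('Sun') wage = wage * 1.5;
        when ('Sat') wage = wage * 1.3;
        otherwise;   /* leave wage unchanged on other days */
    end;
    datalines;
Sun 100
Sat 100
Mon 100
;
run;
```

The empty OTHERWISE statement matters: if no WHEN condition matches and there is no OTHERWISE, SAS stops the step with an error.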
19. How to convert a numeric variable to a character variable?
Ans: You must create a differently-named variable using the PUT function.
The example below shows the use of the PUT function.
charvar = put(numvar, 7.) ;
20. How to convert a character variable to a numeric variable?
Ans: You must create a differently-named variable using the INPUT function.
The example below shows the use of the INPUT function.
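A minimal sketch of such a conversion is shown below. The data set name, variable names and value are made up for illustration, and the 8. informat is just one reasonable choice of width.

```sas
/* Variable names and values are hypothetical. */
data convert_demo;
    charvar = '1234';
    /* Read the character value with the 8. informat into a new numeric variable. */
    numvar = input(charvar, 8.);
    put charvar= numvar=;
run;
```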