Graphically Speaking

Getting started with SGPLOT – Part 3 – VBOX

The Tukey box plot is popular among statisticians for viewing the distribution of an analysis variable with or without classifiers. The figure on the right is from the SGPLOT Box Plot documentation showing all the features of the box.

The code shown below creates the simplest box plot graph, which displays the distribution of the analysis variable Cholesterol.

title 'Distribution of Cholesterol';
proc sgplot data=sashelp.heart;
vbox cholesterol;
run;


The graph on the right shows the result of the procedure step above: a single box for the variable Cholesterol. The box spans the interquartile range from Q1 to Q3, with a line drawn at the median and a marker at the mean. Whiskers are drawn to the observation nearest to each “fence” as defined in the documentation mentioned above, and “outlier” observations are displayed above and below the fences. See the online documentation for the GTL Box Plot for details of all the statistics that are displayed.
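To see the numbers behind the box, the same statistics can be computed directly. The sketch below is only an illustration: it assumes the usual Tukey convention of fences at 1.5 times the interquartile range beyond Q1 and Q3, and the data set names box_stats and box_fences are made up for this example. Values may differ slightly from the plot if the procedures use different percentile definitions.

```sas
/* Summary statistics that the box plot displays */
proc means data=sashelp.heart noprint;
var cholesterol;
output out=box_stats n=n mean=mean median=median
       q1=q1 q3=q3 qrange=iqr;
run;

/* Assumed Tukey fences: 1.5*IQR beyond the quartiles */
data box_fences;
set box_stats;
lower_fence = q1 - 1.5*iqr;
upper_fence = q3 + 1.5*iqr;
run;

proc print data=box_fences noobs;
var n mean median q1 q3 iqr lower_fence upper_fence;
run;
```

Observations outside lower_fence and upper_fence are the ones expected to show up as outlier markers.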


Box Plot by Category: The code below creates a box plot graph by a category variable – DeathCause. Note, we have used the XAXIS statement to remove the display of the label name on the axis.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart;
vbox cholesterol / category=deathcause;
xaxis display=(nolabel);
run;

The graph on the right displays the distribution of the cholesterol values by death cause. Note that, by default, the graph will try to split long axis tick values at the “white space” in the value.

Connect: A connect line is drawn connecting the mean statistic across the categories using the CONNECT=mean option. The connect line can connect any statistic like mean, median, Q1, Q3 etc.

For this graph, we have also simplified the layout by dropping the frame border of the wall and the axis lines, and by adding y-axis grid lines. This presents the data in an alternative way that reduces clutter and is pleasing to the eye. A DATASKIN is set for visual effect.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause
connect=mean fillattrs=graphdata3
dataskin=gloss;
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;


Grouped Box Plot: One additional classifier can be added – GROUP. The graph on the right displays the distribution of Cholesterol by death cause and sex. This is a common graph type useful in the Clinical Research domain where we want to view the results by category and treatment.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause
group=sex clusterwidth=0.5
boxwidth=0.8 meanattrs=(size=5)
outlierattrs=(size=5);
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;

Cluster width can be set to make the cluster of boxes for each category tighter. Here we have set CLUSTERWIDTH=0.5, so the boxes for each category are more tightly packed. BOXWIDTH can also be used to make the individual boxes narrower or wider; BOXWIDTH=1 makes the boxes within each cluster touch. Attributes for the mean marker and outlier markers can be set using the MEANATTRS and OUTLIERATTRS options.


Notches: Notches can be displayed by using the NOTCHES option. The graph on the right shows the result of the program shown below. Notches are displayed and the box width is reduced to 20% of the available spacing. The whisker cap is removed by setting CAPSHAPE=NONE.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause
boxwidth=0.2 meanattrs=(size=6)
notches capshape=none ;
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;
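
For readers who want to check the notch positions numerically, here is a rough sketch. It assumes the commonly documented notch limits of median ± 1.58 × IQR / sqrt(N); confirm the exact formula in the box plot documentation for your release. The data set names notch_stats and notch_limits are hypothetical.

```sas
/* Per-category statistics needed for the notch limits */
proc means data=sashelp.heart noprint nway;
class deathcause;
var cholesterol;
output out=notch_stats n=n median=median q1=q1 q3=q3;
run;

/* Assumed notch limits: median +/- 1.58*IQR/sqrt(N) */
data notch_limits;
set notch_stats;
iqr = q3 - q1;
notch_lower = median - 1.58*iqr/sqrt(n);
notch_upper = median + 1.58*iqr/sqrt(n);
run;

proc print data=notch_limits noobs;
var deathcause n median notch_lower notch_upper;
run;
```

Categories whose notch intervals do not overlap are the ones whose medians are usually read as differing.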


Whisker Percentile: The graph on the right shows how to control the whisker percentile. This is a popular option requested by many users. WHISKERPCT=value (0-25) can be used to set the length of the whisker as a percentile. WHISKERPCT=1 creates a graph with a 99% whisker percentile.
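
As a cross-check on where the whiskers should end, the percentiles can be listed per category. This is a sketch that assumes a whisker percentile of 1 places the whisker ends at the 1st and 99th percentiles of each category; small differences from the plot are possible if the percentile definitions differ.

```sas
/* 1st and 99th percentiles of Cholesterol for each death cause */
proc means data=sashelp.heart n p1 p99;
class deathcause;
var cholesterol;
run;
```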

By default, the box plot makes the category axis discrete. This happens even if the category variable is numeric or time. There are many cases where we want to see the distribution of some variable by a numeric x variable, such as weeks or over time. In such cases, we want the boxes to be positioned on the x-axis with the correct scale. This is supported and can be done by setting TYPE=LINEAR on the x-axis. We will discuss this in more detail in a subsequent article.
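
A minimal sketch of the linear-axis idea follows. It uses SASHELP.CLASS with the numeric variable Age as the category, which is a choice made for this illustration and not part of the examples above, and it assumes a SAS release in which VBOX supports a linear category axis.

```sas
title 'Distribution of Height by Age';
proc sgplot data=sashelp.class;
/* Age is numeric; TYPE=LINEAR places each box at its scaled x position */
vbox height / category=age;
xaxis type=linear;
run;
title;
```

Without the XAXIS statement, the same plot would treat each Age value as a discrete, evenly spaced category.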

Full SAS Code:

%let gpath='.';
ods html close;
%let dpi=200;
ods listing gpath=&gpath image_dpi=&dpi;

/*--VBox--*/
ods listing image_dpi=200;
ods graphics / reset width=2in height=3in imagename='VBox';
title 'Distribution of Cholesterol';
proc sgplot data=sashelp.heart;
vbox cholesterol;
run;
title;

/*--VBox by Category--*/
ods listing image_dpi=200;
ods graphics / reset width=4in height=3in imagename='VBoxByCat';
title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart;
vbox cholesterol / category=deathcause;
xaxis display=(nolabel);
run;
title;

/*--VBox by Category Connect--*/
ods listing image_dpi=200;
ods graphics / reset width=4in height=3in imagename='VBoxByCatConnect';
title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause connect=mean
fillattrs=graphdata3 dataskin=gloss;
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;
title;

/*--Sort by descending Sex to set the group order--*/
proc sort data=sashelp.heart out=heart;
by descending sex;
run;

/*--VBox by Category and Group--*/
ods listing image_dpi=200;
ods graphics / reset width=4in height=3in imagename='VBoxByCatGroup';
title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=heart noborder;
vbox cholesterol / category=deathcause group=sex
clusterwidth=0.5 boxwidth=0.8
meanattrs=(size=5) outlierattrs=(size=5);
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;
title;

/*--VBox by Category Notches--*/
ods listing image_dpi=200;
ods graphics / reset width=4in height=3in imagename='VBoxByCatNotch';
title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause notches
boxwidth=0.2 meanattrs=(size=6) capshape=none ;
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;
title;

/*--VBox by Category Whisker Percentile--*/
ods listing image_dpi=200;
ods graphics / reset width=4in height=3in imagename='VBoxByCatPct';
title "Distribution of Cholesterol by Death Cause";
footnote j=l italic "Whisker Percentile = 99%";
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause
boxwidth=0.2 meanattrs=(size=6) whiskerpct=1;
xaxis display=(noline nolabel noticks);
yaxis display=(noline noticks nolabel) grid;
run;
title;
footnote;
