SAS Day 33: Box Plot
Definition:
A box plot (or box-and-whisker plot) displays the distribution of a dataset through its 5-number summary: minimum, Q1, median, Q3, and maximum.
Interpreting quartiles:
The 5-number summary divides the data into 4 sections, each containing approximately 25% of the data.
Explore a little more
If we want to look at outliers, we define the points below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) as outliers.
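The outlier rule above can be sketched in SAS as follows. This is a minimal sketch, not part of the original post: the dataset names fences, class_sorted, and outliers and the variables iqr, lower_fence, and upper_fence are made up for illustration, and it assumes the default quantile definition of PROC MEANS.

```sas
/* Q1 and Q3 of weight within each sex */
proc means data=sashelp.class noprint nway;
  class sex;
  var weight;
  output out=fences q1=q1 q3=q3;
run;

/* 1.5*IQR fences below Q1 and above Q3 */
data fences;
  set fences;
  iqr = q3 - q1;
  lower_fence = q1 - 1.5*iqr;
  upper_fence = q3 + 1.5*iqr;
run;

/* sort so the student rows can be merged with the fences by sex */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* keep only the students whose weight falls outside the fences */
data outliers;
  merge class_sorted fences(keep=sex lower_fence upper_fence);
  by sex;
  if weight < lower_fence or weight > upper_fence;
run;

proc print data=outliers;
run;
```

Any rows printed here should match the points drawn individually beyond the whiskers in the box plot.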
Note: if the data follow a normal distribution, the Q1-Q3 box covers the central 50% of the curve around its peak, that is, μ ± 0.6745σ.
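As a short worked calculation, assuming the data are exactly normal with mean μ and standard deviation σ, the quartiles and the 1.5(Q3 - Q1) outlier fences work out to:

```latex
\begin{align*}
Q_1 &= \mu - 0.6745\,\sigma, \qquad Q_3 = \mu + 0.6745\,\sigma \\
\mathrm{IQR} &= Q_3 - Q_1 = 1.349\,\sigma \\
Q_1 - 1.5\,\mathrm{IQR} &= \mu - 0.6745\,\sigma - 2.0235\,\sigma = \mu - 2.698\,\sigma \\
Q_3 + 1.5\,\mathrm{IQR} &= \mu + 0.6745\,\sigma + 2.0235\,\sigma = \mu + 2.698\,\sigma
\end{align*}
```

Under that assumption, only about 0.7% of the observations would fall outside the fences and be flagged as outliers.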
Example:
We will use sashelp.class to draw a box plot in two ways, with SGPLOT and with TEMPLATE. Both produce the same result!
**Basic Box Plot**
Interpretation:
Reading the plot approximately: the median weight of female students is a little below 90; about 25% of female students' weights fall within 75-82, about 25% within 105-115, and the middle 50% between 85 and 102.
Code:
SGPLOT
proc sgplot data=sashelp.class;
title "Distribution of Weight by Sex";
vbox weight / category= sex;
run;
TEMPLATE
proc template;
define statgraph ClassBox;
begingraph;
entrytitle "Distribution of Weight by Sex";
layout overlay;
boxplot y=weight x=sex ;
endlayout;
endgraph;
end;
run;
proc sort data=sashelp.class out=class;
by sex;
run;
proc sgrender data=class template=ClassBox;
run;
Advanced Box Plot:
Code:
proc univariate data=sashelp.class;
var weight ;
class sex;
ods output quantiles =q;
run;
/* keep only the five labeled quantiles (Max, Q3, Median, Q1, Min) */
data q2(rename=(estimate=weight) where=(quantile ne " "));
set q;
quantile = scan(quantile, 2, " ");
run;
proc template;
define statgraph bpp;
begingraph;
entrytitle "Distribution of Weight by Sex";
layout overlay;
boxplotparm y=weight x=sex stat=quantile;
endlayout;
endgraph;
end;
run;
proc sgrender data=q2 template=bpp;
run;
With the extra UNIVARIATE step, we have a summary dataset that we can use to cross-validate the graph.
Indeed, we can see that the minimum weight among female students is about 50.
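One possible way to run the same cross-check without the intermediate datasets is to ask PROC MEANS for the 5-number summary directly. This is only a sketch, not part of the original example, and it assumes the default quantile definition.

```sas
/* 5-number summary of weight for each sex, printed as a table */
proc means data=sashelp.class min q1 median q3 max maxdec=1;
  class sex;
  var weight;
run;
```

The minimum reported for the female group can then be compared against the lowest point shown in the plot.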
Reference:
Creating Statistical Graphics in SAS, *Warren F. Kuhfeld*