SAS Day 34:
Background Story:
Once, in my machine learning class, the professor asked what software we use for data science. One student answered: “SAS”.
The professor laughed and said: “Oh dear, you must be in the wrong class. Nobody uses SAS in the data science industry.”
SAS originally stood for Statistical Analysis System, and it is widely used in health-related fields. Although Python and R are known as the most popular data science languages, I think SAS has its strengths as well (it is better than Excel!!).
At least it comes with the Iris dataset!
[Image: Fotomanie / Pixabay]
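The Iris data ships with SAS as sashelp.iris, so no import step is needed. Here is a minimal sketch for taking a first look at it, using only two standard procedures: PROC CONTENTS lists the variables and their attributes, and PROC PRINT with the OBS= data set option shows the first few rows.

```sas
/* Describe the variables in the built-in Iris data set */
proc contents data=sashelp.iris;
run;

/* Print only the first five observations */
proc print data=sashelp.iris(obs=5);
run;
```

The variable names shown here (Species, SepalLength, SepalWidth, PetalLength, PetalWidth) are the ones used in the plotting code that follows.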
Today we will use the Iris dataset to draw scatter plots:
Scatter Plot Matrix
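/* Scatter plot matrix for the Virginica species only */
ods graphics on / height=500px width=500px;
proc sgscatter data=sashelp.iris(where=(species="Virginica"));
   title "Fisher Iris Data";
   /* Off-diagonal cells: scatter plots with prediction ellipses;
      diagonal cells: histograms with normal and kernel density curves */
   matrix petallength petalwidth sepallength
          / ellipse=(type=predicted)
            diagonal=(histogram normal kernel);
run;
title;                          /* clear the title */
ods graphics on / reset=all;    /* restore default ODS GRAPHICS settings */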
Panel of Scatter Plots
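/* One scatter plot per variable pair, colored by species */
ods graphics on / height=500px width=500px;
proc sgscatter data=sashelp.iris;
   title "Fisher Iris Data";
   plot petallength*petalwidth
        sepallength*sepalwidth
        petallength*sepallength
        petalwidth*sepalwidth
        / group=species;
run;
title;                          /* clear the title */
ods graphics on / reset=all;    /* restore default ODS GRAPHICS settings */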
As the panel of scatter plots shows, Setosa is clearly separated from Versicolor and Virginica, which is consistent with our earlier cluster analysis of the Iris dataset in Python.
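The separation of Setosa is read off the plots by eye. A minimal sketch for checking it numerically is to compute per-species means and standard deviations with PROC MEANS; the output data set name species_means is an arbitrary choice for this example, not something from the original post. If the pattern in the plots holds, the Setosa row should stand apart most clearly in PetalLength and PetalWidth.

```sas
/* Per-species summary statistics for the four measurements */
proc means data=sashelp.iris mean std maxdec=2 nway;
   class species;
   var sepallength sepalwidth petallength petalwidth;
   /* Save the species means (e.g. PetalLength_Mean) to a data set;
      NWAY keeps only the one-row-per-species level */
   output out=species_means mean= / autoname;
run;
```

The printed table is enough for a quick check; the saved data set is handy if you want to plot or compare the means later.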
Personal Thought:
I used to feel a bit ashamed that I use SAS more often than Python or R, because those languages sound a lot cooler. Now I think SAS deserves my appreciation as well, just like the song lyric “Even wild lilies have their spring (野百合也有春天)”! SAS is wonderful software, and it even comes with the Iris dataset!