This post shows how to display descriptive statistics for variables quickly, that is, a common and convenient way to do it in SAS.
The following examples answer a few simple but very common questions:
- How to count distinct values
- How to count values by group
- How to produce the frequency table of variables
- How to calculate the statistics for variables
In R, Hmisc::describe is available for this, but it is not the only option; other external packages or base functions such as summary also work well.
Count Values or Distinct Values
Here we use proc sql with the SAS dataset sashelp.BirthWgt to count the non-missing values of the Race variable.
proc sql;
select count(Race) as cnt_race
from sashelp.BirthWgt;
quit;
But I feel that counting only the total number of Race values does not tell us much. If we would like to count the Married variable grouped by the Race variable:
proc sql;
select Race, count(Married) as cnt_married
from sashelp.BirthWgt
group by Race;
quit;
If you want to count distinct values, add the distinct keyword inside the count function.
proc sql;
select count(distinct Married) as distinct_married
from sashelp.BirthWgt;
quit;
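The three counts above can also be combined into one query. The following is only a minimal sketch: the output column names (n_rows, n_married, n_married_miss, n_married_levels) are my own choices, and the missing count is derived as count(*) minus count(Married), relying on the fact that count(variable) skips missing values.

```sas
proc sql;
  /* One row per Race level with several counts side by side */
  select Race,
         count(*)                  as n_rows,           /* all rows in the group        */
         count(Married)            as n_married,        /* non-missing Married values   */
         count(*) - count(Married) as n_married_miss,   /* missing Married values       */
         count(distinct Married)   as n_married_levels  /* distinct non-missing values  */
  from sashelp.BirthWgt
  group by Race;
quit;
```

Note that proc sql is ended with quit; rather than run;, because the procedure keeps running until it is explicitly quit.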
Frequency Table
We can use proc freq to create frequency tables for one or more variables. The example below tabulates the SomeCollege variable, including missing values, separately for each level of Race, and saves the result to the dataset result with cumulative frequencies and percentages. Because the by statement requires sorted input, the data are sorted by Race first.
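/* sashelp is normally read-only, so write the sorted copy to WORK */
proc sort data=sashelp.BirthWgt out=BirthWgt_sorted;
by Race;
run;
/* missing: treat missing values as a level; outcum: add cumulative columns to out= */
proc freq data=BirthWgt_sorted;
tables SomeCollege / out=result missing outcum;
by Race;
run;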
By the way, if you add a statistical option such as chisq to the tables statement, the output also includes the results of the chi-square test.
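As a minimal sketch of the chisq option, assuming we want to check whether Married is associated with Race (this pair of variables is my own choice for illustration), a two-way table request could look like this:

```sas
proc freq data=sashelp.BirthWgt;
  /* Race*Married requests a two-way crosstab;
     chisq adds the chi-square tests of association to the output */
  tables Race*Married / chisq;
run;
```

No sorting is needed here because there is no by statement; Race is used as a table dimension instead.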
Descriptive Statistics
Alternatively, we can use proc tabulate to create a table that displays multiple statistics at once.
proc tabulate data = sashelp.cars;
var weight;
table weight * (N Min Q1 Median Mean Q3 Max);
run;
But I think proc means is more convenient when you want to save the output to a dataset, for example:
proc means data = sashelp.cars n nmiss mean std median p25 p75 min max;
var weight;
output out=weight_tbl n=n nmiss=nmiss mean=mean std=std median=median p25=p25 p75=p75 min=min max=max;
run;
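The same approach can be extended to statistics by group. The sketch below assumes we want the statistics of weight for each level of Origin in sashelp.cars; the output dataset name weight_by_origin is hypothetical, and noprint and nway are added so that only the saved dataset is produced, with one row per group.

```sas
proc means data=sashelp.cars noprint nway;
  class Origin;   /* grouping variable; no prior sorting is required */
  var weight;
  /* nway keeps only the per-group rows and drops the overall summary row */
  output out=weight_by_origin
         n=n nmiss=nmiss mean=mean std=std median=median min=min max=max;
run;

/* Inspect the saved statistics */
proc print data=weight_by_origin;
run;
```

Compared with a by statement, class does not need the input to be sorted first, which keeps the example short.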
Reference
https://www.statology.org/sas-count-distinct/
https://www.statology.org/sas-count-by-group/
https://www.statology.org/sas-frequency-table/
https://www.codeleading.com/article/53981053526/
https://www.statology.org/proc-tabulate-sas/
Please indicate the source: http://www.bioinfo-scrounger.com