Displaying Descriptive Statistics for Variables - SASlearner

This post shows how to quickly display descriptive statistics for variables, that is, the usual, lightweight ways to do it in SAS.

The following examples answer a few simple but common questions:

  • How to count distinct values
  • How to count variables by group
  • How to produce the frequency table of variables
  • How to calculate the statistics for variables

In R, Hmisc::describe is one option for this, but not the only one; base functions such as summary and other external packages work well too.

Count Values or Distinct Values

Here we use proc sql with the SAS dataset sashelp.BirthWgt to count the values of the Race variable. Note that count(Race) counts only non-missing values.

proc sql;
    select count(Race) as cnt_race
        from sashelp.BirthWgt;
quit;

Counting all values of Race is not very informative on its own, though. To count the Married variable within each level of Race, add a group by clause:

proc sql;
    select Race, count(Married) as cnt_married
        from sashelp.BirthWgt
        group by Race;
quit;

To count distinct values, add the distinct keyword inside the count function.

proc sql;
    select count(distinct Married) as distinct_married
        from sashelp.BirthWgt;
quit;
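
To make the difference between these counts concrete, the sketch below puts them side by side in one query. The column aliases (n_rows, n_married, n_distinct_married) are illustrative names, not taken from the examples above. count(*) counts all rows, whereas count(Married) skips missing values.

```sas
proc sql;
    /* all rows, non-missing Married values, and distinct Married values */
    select count(*)                as n_rows,
           count(Married)          as n_married,
           count(distinct Married) as n_distinct_married
        from sashelp.BirthWgt;
quit;
```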

Frequency Table

We can use proc freq to create frequency tables for one or more variables. The example below tabulates the SomeCollege variable, including missing values, separately for each Race group, and saves the result to an output dataset named result that also contains cumulative frequencies and percentages. BY-group processing requires the data to be sorted by Race first.

/* SASHELP is usually read-only, so write the sorted copy to WORK */
proc sort data=sashelp.BirthWgt out=BirthWgt_sorted;
    by Race;
run;

proc freq data=BirthWgt_sorted;
    tables SomeCollege / out=result missing outcum;
    by Race;
run;

By the way, adding a statistical option such as chisq to the tables statement also produces the corresponding test, here the chi-square test.
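
As a rough sketch of the chisq option, the step below requests a chi-square test of association between Race and Married. Pairing these two variables is an assumption made only for illustration.

```sas
proc freq data=sashelp.BirthWgt;
    /* two-way table of Race by Married with chi-square statistics */
    tables Race*Married / chisq;
run;
```

No sorting is needed here because the step does not use a by statement.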

Descriptive Statistics

For descriptive statistics, we can use proc tabulate to display multiple statistics in one table quickly.

proc tabulate data = sashelp.cars;
    var weight;
    table weight * (N Min Q1 Median Mean Q3 Max);
run;

However, I find proc means more convenient when the output needs to be saved to a dataset:

proc means data = sashelp.cars n nmiss mean std median p25 p75 min max;
    var weight;
    output out=weight_tbl n=n nmiss=nmiss mean=mean std=std median=median p25=p25 p75=p75 min=min max=max;
run;
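
If the statistics are needed per group, a class statement can be added. The sketch below assumes grouping by the Origin variable of sashelp.cars, and the output dataset name weight_by_origin is hypothetical. The nway option keeps only the rows for the individual Origin levels and drops the overall summary row.

```sas
proc means data=sashelp.cars nway n mean std median min max;
    class Origin;   /* one set of statistics per Origin level */
    var weight;
    output out=weight_by_origin n=n mean=mean std=std median=median min=min max=max;
run;

/* inspect the saved dataset */
proc print data=weight_by_origin;
run;
```

Besides the requested statistics, the output dataset also contains the automatic variables _TYPE_ and _FREQ_.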

Reference

https://www.statology.org/sas-count-distinct/
https://www.statology.org/sas-count-by-group/
https://www.statology.org/sas-frequency-table/
https://www.codeleading.com/article/53981053526/
https://www.statology.org/proc-tabulate-sas/

Please indicate the source: http://www.bioinfo-scrounger.com