0%

Merge and Transpose - SASlearner

If you often work with data manipulation, obviously you need to know how to merge and transpose data as that is very common in our data processing.

In R, you can use the dplyr package to left_join or inner_join and other functions *_join() to handle your data in any form. As per to transposing, we generally call it Pivot data, and use corresponding pivot_wider and pivot_longer() functions from the tidyr package to handle your data from long to wide, or the opposite. This how to use R to pivot data could refer to my post. Pivoting data in R

As for the SAS, let's see this with step-by-step examples.


Merge

In SAS, it defines the merge processing as three approaches.

  • one-to-one reading
  • concatenating
  • Match-merging

The first two ways are the usage of set, not to discuss in this post. The third way I suppose is the most common in our data processing.

First, we create two example datasets as shown below.

proc sort data = sashelp.class;
    by Name;
run;

data cls1;
    set sashelp.class(keep = Name Sex obs = 10);
run;

data cls2;
    set sashelp.class(keep = Name Age firstobs = 5 obs = 15);
run;

Then we define the most important argument by to specify which variable to join by. Before merging, we must make sure the dataset is sorted by that by variable.

Why?

A simple answer is that the SAS match-merge is based on the classic sequential match algorithm, and the latter is based on the premise that all input streams are sorted identically.

data clss;
    merge cls1 cls2;
    by Name;
run;

Actually in my option, the above code is not commonly used, that is just similar with the full_join all x rows, followed by unmatched y rows. Maybe I would like to use merge processing such as left_join, inner_join. In this case we need to specify the IN= argument in the merge statement.

For instance, post-process like left_join.

data clss2;
    merge cls1(in = x) cls2(in = y);
    by Name;
    if x;
run;

post-process like inner_join.

data clss2;
    merge cls1(in = x) cls2(in = y);
    by Name;
    if x and y;
run;

It can be seen above that we can use IN= to control which rows to be filtered.

However, considering we have to sort the dataset first, I sometimes would like to use proc sql to merge data. Since it’s close to the form I used in R.

proc sql;
    create table clss3 as 
        select x.*, y.* from cls1 as x
            left join cls2 as y on x.Name=y.Name;
quit;

Transpose

In my opinion, for transpose processing, it's better to learn and understand it with a few examples. Simply memorizing the arguments is easy to confuse. I suggest running each code and seeing the output, and thinking of how to realize it.

So let's see the examples to show transpose a dataset from long to wide, i.e. rows to columns.

First, create an example data from sashelp.shoes dataset.

data shoes;
    set sashelp.shoes;
    if Subsidiary in ("Johannesburg" "Nairobi");
    keep Region Product Subsidiary Sales Inventory;
run;

By default, the transpose procedure only transposes the numeric columns from long to wide and ignores any character column. But actually in normal work, we generally will define a series of arguments like prefix, name and label. Besides with the var statement, we can select which column or columns you want to transpose. And for the id statement you can use the variable of a column as the new variable names.

proc transpose data = shoes(where = (Subsidiary = "Johannesburg")) 
    out = res;
    var Sales;
    id Product;
run;

If you want to group the data by a variable, then add the by argument.

proc transpose data = shoes out = res;
    var Sales;
    id Product;
    by Subsidiary;
run;

If you want to re-define the columns _NAME_ and _LABEL, then add the name and label options.

proc transpose data = shoes out = res name = var_name label = label_name;
    var Sales;
    id Product;
    by Subsidiary;
run;

Then we see an example to show how to transpose data from wide to long.

proc transpose data = res(drop = var_name label_name) out = res2;
    var Boot Sandal Slipper;
    by Subsidiary;
run;

Reference

SAS MERGING TUTORIAL
MATCH MERGING DATA FILES IN SAS | SAS LEARNING MODULES
SAS:数据合并简介
Complete Guide to PROC TRANSPOSE in SAS
HOW TO RESHAPE DATA WIDE TO LONG USING PROC TRANSPOSE | SAS LEARNING MODULES

Please indicate the source: http://www.bioinfo-scrounger.com