Selecting N rows from a dataset or K-th element from a macro variable is a common data manipulation process. This post is listing the ways how to resolve these questions.
First or Last N Rows
For the proc sql
method, inobs
or outobs
both can be used to select N rows from a dataset, but worth noting that it will cause differences if you join tables.
proc sql inobs = 5 /*outobs=5*/;
create table cls as
select * from sashelp.class;
quit;
For the SAS code instead of proc sql
, the most straightforward method is to utilize the obs
that is very similar to the sql
method. Otherwise if you would like to select a range of rows, just add another parameter firstobs
.
data raw_o;
set sashelp.class(firstobs = 5 obs = 10);
run;
Utilizing the _N_
variable with the IF-ELSE
statement to reach this purpose I suppose is more flexible sometimes.
data raw_1;
set sashelp.class;
if 5<= _N_ <=10 then output;
run;
So how about selecting the last rows? It seems we have to know the total number of rows, and then utilize the _N_
variable once.
data raw_2;
set sashelp.class;
if &n_rows.-4<=_N_<=&n_rows. then
output;
run;
BTW how to select N observations randomly, we can use the proc surveyselect
procedure and define method = srs
as the simple random selection method, so that we get the random 5 rows from this dataset.
proc surveyselect data = sashelp.class out = rd_class
method = srs sampsize = 5 seed = 123456;
run;
Besides for the row number, we can also add a new row number by group, as shown the following example:
proc sort data = sashelp.class out = sorted_class;
by age;
run;
data sorted_class_2;
set sorted_class;
by age;
if first.age then new_row_number=0;
new_row_number+1;
run;
K-th Elements from Macro Variable
First off, we would want to create a macro list to store information. If we just want to store a single value, as following:
proc sql;
select count(name) into: n_name trimmed from sashelp.class;
quit;
%put &n_name;
Storing multiple values is also very similar.
proc sql;
select count(name),mean(height) format=10.2
into: n_name trimmed, :mean_height trimmed
from sashelp.class;
quit;
%put &n_name &mean_height;
Or just simply want the list values to be assigned to a list of macro variables.
proc sql;
select distinct(name) into: n1-:n19 from sashelp.class;
quit;
But the above example seems like you have to know the total number of distinct values in the dataset. So maybe the common way is to store the column values in a list separated by any delimiter you want.
proc sql;
select distinct(name) into: nameList separated by ' '
from sashelp.class;
/* %let numNames = &sqlobs;*/
quit;
%put &nameList.;
Supposed that I just want the second element in this macro variable, what should I do? Maybe the %scan
function is enough to reach our purpose.
%put %scan(&nameList,2);
Obviously, it seems not as convenient to extract the element as nameList[2]
in R, but it is enough to use in SAS.
Another way is to loop the macro variable by %do
, as below to assign each element to a new column.
%let cntName = %sysfunc(countw(&nameList));
data raw;
array names[&cntName] $200 name1-name&cntName;
do i=1 to &cntName;
names[i]=scan("&nameList", i);
end;
drop i;
run;
Otherwise there are still many great documents that have been posted to show how to store and manipulate lists in SAS, like Choosing the Best Way to Store and Manipulate Lists in SAS
Reference
How to use the SAS SCAN Function?
Creating Lists! Using the Powerful INTO Clause with PROC SQL to Store Information in Macro Lists
How to Select the First N Rows in SAS
Using SAS® Macro Variable Lists to Create Dynamic Data-Driven Programs
Storing and Using a List of Values in a Macro Variable
Please indicate the source: http://www.bioinfo-scrounger.com