Combine RTF files into one file

This post is a note based on the article listed below. I think it will be useful to anyone who is as new to this as I am, since this requirement is fairly common in pharmaceutical programming.

A SAS macro to combine portrait and landscape rtf files into one single file

To make it suitable for each of the following situations, I also update it here so that it is more flexible:

  • handling multiple tables, figures and listings at the same time
  • using the titles as entries in the table of contents
  • ordering the files manually (I only outline a solution; it is not implemented yet.)

First of all, let's look at the structure of an RTF file, as described in that article.

[Figure: rtf_structure]

It is divided into three parts: an opening section, a content section and a closing section. Our single combined RTF file keeps this same structure. Consequently, the RTF combining process can be summarized as follows:

  • Read all filenames into SAS (sorted by filename or in a manually defined order).
  • Keep the opening section of the first RTF file.
  • Remove the opening section from every file except the first and the closing section from every file except the last, and add the \pard\sect code in front of \sectd so that all of the files can be combined correctly.
  • Keep the closing section of the last RTF file.
  • Save the updated RTF code of each file into its own SAS dataset. (Do not save everything in a single dataset, as the character length is limited in SAS.)
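
As a rough illustration of the three parts, the step below writes a tiny hand-made RTF file. This is only a sketch: the file name toy.rtf, the WORK location and the RTF content are my own assumptions, and real ODS RTF output has a much longer opening section.

```sas
/* Hypothetical toy file, not taken from the referenced article */
data _null_;
    file "%sysfunc(pathname(work))/toy.rtf";
    put '{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}';  /* opening section */
    put '\sectd\pard\plain\f0 Table 1\par';                    /* content section, starts at \sectd */
    put '}';                                                   /* closing section */
run;
```

When files like this are combined, only the first line of the first file and the last line of the last file survive as the opening and closing sections; everything from \sectd onward is kept from every file.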

Now let's look at the code used in this process. First, I read the RTF filenames from the external folder.

data refList(keep=filepath fn);
    length fref $8 fn $80 filepath $400;
    rc = filename(fref, "&inpath");

    if rc = 0 then
        dirid = dopen(fref);

    if dirid <= 0 then do;
        putlog 'ERR' 'OR: Unable to open directory.';
        stop;
    end;

    nfiles = dnum(dirid);

    do i = 1 to nfiles;
        fn = dread(dirid, i);
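        fid = mopen(dirid, fn);

        if fid > 0 then do;
            /* keep only names ending in .rtf (case-insensitive) */
            if prxmatch('/\.rtf\s*$/i', fn) then do;
                filepath = "&inpath\"||strip(fn);
                fn = strip(prxchange('s/\.rtf\s*$//i', 1, fn));
                output;
            end;

            /* release the member opened by mopen */
            rc = fclose(fid);
        end;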
    end;

    rc = dclose(dirid);
run;

Second, read each line of the RTF file until you find a line that starts with \sectd: everything above it is the opening section, and everything from it onward is the content section. Also remove the final } from each file; it is added back once at the end of the combined file.

data rtfdt&i(where = (ptline=1));
    retain ptline;
    set rtfdt&i end = last;

    if substr(line,1,6)='\sectd' then do;
        ptline = 1;

        /*allow portrait and landscape RTF files to be combined*/
        line="\pard\sect"||compress(tranwrd(line,"\pgnrestart\pgnstarts1",""));
    end;

    if last and line^='}' then
        line=substr(strip(line),1,length(strip(line))-1);
    else if last and line='}' then delete;
run;

Third, when you find the title code in the RTF file, replace \pard with \pard\outlinelevel1 so that the title can be used as an entry in the table of contents.

%if &titleindex = 1 %then %do;

    data rtfdt&i.;
        set rtfdt&i.;
        retain fl 0;

        if index(line,'\pard\plain\') and (not index(line,'\header\pard')) and (not index(line, '\footer\pard')) then
            fl=1+fl;
    run;

    data rtfdt&i;
        set rtfdt&i;
        by fl notsorted;

        if fl=1 and first.fl then /*add index for the contents as per titles*/
            line=tranwrd(line,'\pard','\pard\outlinelevel1');
    run;

%end;

Finally, do not store the RTF contents above in one single SAS dataset, because the character length is limited in SAS. Then add the } as the closing section so that the RTF file is complete.
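
A minimal sketch of this last step, using two hypothetical toy datasets (part1 and part2, with made-up RTF lines) in place of the per-file datasets. It follows the same stacking pattern as the complete macro, with one line of RTF code per observation.

```sas
/* Toy stand-ins for the per-file datasets (hypothetical content) */
data part1;
    length line $5000;
    line = '{\rtf1\ansi'; output;                        /* opening section of the first file */
    line = '\sectd\pard\plain Table 1\par'; output;      /* content of the first file */
run;

data part2;
    length line $5000;
    line = '\pard\sect\sectd\pard\plain Table 2\par'; output;  /* content of the second file */
run;

/* Stack the parts in order, then close the document with a single } */
data final;
    set part1 part2 end = last;

    if last then
        line = strip(line) || '}';
run;
```

Because END= applies to the whole concatenation, the closing brace is appended only to the very last line of the last dataset.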


The complete code is shown below:

/*Example*/
/*%s_combrtf(inpath=&inpath,outpath=&outpath,outfile=&outfile);*/
/*Parameter Description*/
/*inpath        input path*/
/*outpath       output path*/
/*outfile       output file name*/
/*titleindex    whether to add title index, default is 1*/

%macro s_combrtf(inpath= ,outpath= ,outfile= ,titleindex=1);

    data refList(keep=filepath fn);
        length fref $8 fn $80 filepath $400;
        rc = filename(fref, "&inpath");

        if rc = 0 then
            dirid = dopen(fref);

        if dirid <= 0 then do;
            putlog 'ERR' 'OR: Unable to open directory.';
            stop;
        end;

        nfiles = dnum(dirid);

        do i = 1 to nfiles;
            fn = dread(dirid, i);
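            fid = mopen(dirid, fn);

            if fid > 0 then do;
                /* keep only names ending in .rtf (case-insensitive) */
                if prxmatch('/\.rtf\s*$/i', fn) then do;
                    filepath = "&inpath\"||strip(fn);
                    fn = strip(prxchange('s/\.rtf\s*$//i', 1, fn));
                    output;
                end;

                /* release the member opened by mopen */
                rc = fclose(fid);
            end;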
        end;

        rc = dclose(dirid);
    run;

    /*sort by filename by default*/
    proc sort data = refList sortseq = linguistic(numeric_collation=on) out = sorted_refList;
        by fn;
    run;

    data fileorder;
        set sorted_refList;
        FileLevel = 2;
        order = .;
    run;

    data _null_;
        set fileorder  end=last;
        fnref=strip("filename fnref")||strip(_N_)||right(' "')||strip(filepath)||strip('" lrecl=5000 ;');
        call execute(fnref);

        if last then
            call symputx('maxn',vvalue(_n_), 'l');
    run;

    %do i=1 %to &maxn.;

        data rtfdt&i.;
            infile fnref&i. truncover;
            informat line $5000.;
            format line $5000.;
            length line $5000.;
            input line $1-5000;
            line=strip(line);
        run;

        /*add title index for more flexibility*/
        %if &titleindex = 1 %then %do;

            data rtfdt&i.;
                set rtfdt&i.;
                retain fl 0;

                if index(line,'\pard\plain\') and (not index(line,'\header\pard')) and (not index(line, '\footer\pard')) then
                    fl=1+fl;
            run;

            data rtfdt&i;
                set rtfdt&i;
                by fl notsorted;

                if fl=1 and first.fl then
                    /*add index for the contents as per titles*/
                    line=tranwrd(line,'\pard','\pard\outlinelevel1');
            run;

        %end;

        %if &i.=1 %then %do;

            data final;
                set rtfdt&i(keep = line) end = last;

                if last and line^='}' then
                    line=substr(strip(line),1,length(strip(line))-1);
                else if last and line='}' then delete;
            run;

        %end;

        %if &i.^=1 %then %do;

            data rtfdt&i(where = (ptline=1));
                retain ptline;
                set rtfdt&i end = last;

                if substr(line,1,6)='\sectd' then do;
                    ptline = 1;

                    /*allow portrait and landscape RTF files to be combined*/
                    line="\pard\sect"||compress(tranwrd(line,"\pgnrestart\pgnstarts1",""));
                end;

                if last and line^='}' then
                    line=substr(strip(line),1,length(strip(line))-1);
                else if last and line='}' then delete;
            run;

        %end;

        %if &i.=&maxn. %then %do;
            %local _cnt;

            data final;
                set final

                    %do _cnt=2 %to &maxn;
                        rtfdt&_cnt(keep = line)
                    %end;
                ;
            run;

            data final;
                set final end = last;

                if last then
                    line=strip(line)||strip("}");
            run;

        %end;
    %end;

    data _null_;
        file "&outpath\&outfile..rtf" lrecl=5000 nopad;
        set final;
        put line;
    run;

%mend;
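
Manual ordering is not implemented in the macro above. One possible way to do it, shown here only as a sketch, is to merge a user-supplied order onto the file list before the files are read. The dataset manual_order, the toy refList, the file names and the 999999 sentinel are all hypothetical; in the macro, the merge and sort would take the place of the default sort and the fileorder step, and manual_order would have to be created before the macro is called because DATALINES cannot be used inside a macro.

```sas
/* Toy stand-in for refList as built by the macro (hypothetical file names) */
data refList;
    length fn $80 filepath $400;
    input fn $;
    filepath = cats('C:\out\', fn, '.rtf');
    datalines;
f_14_2_1
l_16_2_1
t_14_1_1
;
run;

/* Hypothetical user-supplied order: fn is the file name without .rtf */
data manual_order;
    length fn $80;
    input fn $ order;
    datalines;
t_14_1_1 1
f_14_2_1 2
l_16_2_1 3
;
run;

proc sort data = refList;      by fn; run;
proc sort data = manual_order; by fn; run;

/* Attach the manual order to every file that was found in the folder */
data fileorder;
    merge refList(in = inref) manual_order;
    by fn;

    if inref;
    FileLevel = 2;

    /* assumption: files without a manual entry go last */
    if missing(order) then order = 999999;
run;

/* Listed files first in the requested order, then unlisted files by name */
proc sort data = fileorder;
    by order fn;
run;
```

With the toy data above, the files end up in the order t_14_1_1, f_14_2_1, l_16_2_1. Unlisted files fall back to a plain sort by name rather than the linguistic numeric collation used by default in the macro.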

This approach is, in my opinion, excellent because it resolves the following issue:

I know that different companies put titles and footnotes in different places. Some place them in the header and footer sections, and some place them in the body of the RTF document. The macro above works no matter where you place the titles and footnotes.


Reference

A SAS macro to combine portrait and landscape rtf files into one single file
Combine multiple RTF files to one file
SM05: An Efficient Way to Combine RTF Files and Create Multi-Level Bookmarks and a Hyperlinked TOC
utl-sas-macro-to-combine-rtf-files-into-one-single-file