This post refers to Chapter 8, "A graphical compendium", in <SAS and R: Data Management, Statistical Analysis, and Graphics (second edition)>.

I believe that data science is about more than building predictive models; data visualization is also an integral part, especially when it is done convincingly.

Read more »

This post refers to Chapter 3, "Statistical and mathematical functions", Chapter 4, "Programming and operating system interface", and Chapter 5, "Common statistical procedures", in <SAS and R: Data Management, Statistical Analysis, and Graphics (second edition)>.

Read more »

This post refers to Section 2.3 of "Data management" and Section 2.4, "Date and time variables", in <SAS and R: Data Management, Statistical Analysis, and Graphics (second edition)>.

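In R, dataset manipulation usually means working with data frames, though other data types are sometimes involved. In SAS, it means working with datasets; compared with other programming languages, SAS still has too few data types.

Data processing mostly comes down to combination, collation, and subsetting.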
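As a rough illustration of those three operations, here is a minimal sketch in plain Python rather than R or SAS. The records, field names, and helper functions are all hypothetical, and "collation" is assumed here to mean sorting.

```python
# Hypothetical records standing in for two datasets keyed by "id".
demographics = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 3, "age": 28},
]
labs = [
    {"id": 1, "glucose": 5.4},
    {"id": 3, "glucose": 6.1},
]


def combine(left, right, key):
    """Combination: inner-join two lists of records on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def collate(rows, key):
    """Collation (assumed to mean sorting): order records by one field."""
    return sorted(rows, key=lambda row: row[key])


def subset(rows, predicate):
    """Subsetting: keep only the records for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


merged = combine(demographics, labs, "id")
ordered = collate(merged, "age")
older = subset(ordered, lambda row: row["age"] > 30)
print(older)
```

In R the same steps would typically be done on data frames, and in SAS on datasets; the sketch only shows the shape of each operation.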

Read more »

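I have long wanted to find a suitable way to record my SAS study notes, ideally one that builds on my earlier programming experience (such as R or Python). I remembered that when I first learned Python, I learned it as practical needs arose and drew on my prior R/Perl experience to fill the gaps, so SAS can be learned the same way.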

Read more »

What is Git?

Git is a version control system used to track changes in computer files. Git's primary purpose is to manage any changes made in one or more projects over a given period of time. It helps coordinate work among members of a project team and tracks progress over time. Git also helps both programming professionals and non-technical users by monitoring their project files.

What is GitLab?

GitLab is a web-based Git repository manager that provides free public and private repositories, issue tracking, and wikis. It is a complete DevOps platform that enables professionals to perform all the tasks in a project, from project planning and source code management to monitoring and security. It also allows teams to collaborate and build better software.

Read more »

I have a strong interest in data visualization. For me, the motivation for learning this skill is to better understand and examine the question I want to solve.

Read more »

There is no denying that SAS is an essential skill for statistical programmers in the pharma field. Of course, SAS is a programming language that can be adapted to different requirements in different fields. So I think that anyone who wants to be a qualified statistical programmer should learn SAS by following the actual requirements of the pharmaceutical industry. The purpose of this post is therefore to record some practical applications of SAS so that I can understand and remember its syntax clearly.

Read more »

We all know that SAS/IML lets us run R code in SAS through the SUBMIT / R statement. A few months ago, I asked SAS support how to import plots created by R in IML directly into RTF templates, as I could not find any useful information on Google. Unfortunately, SAS support told me that a plot created in R would also need to be saved within the submit block using R code. This means that if you want to import R graphics directly into RTF, you probably need to use an R function to do it.

Read more »

I have kept a note about logistic regression for biomarkers using R, and mentioned that I’d like to compare the R and SAS code. So how do we use SAS to estimate a logistic regression model?

Read more »

Logistic regression is one of the machine learning (ML) methods used for solving classification problems. It is used to predict the result of a categorical dependent variable based on one or more continuous or categorical independent variables. I have summarized its basic principle in a blog post (https://www.bioinfo-scrounger.com/archives/750/), referring to the book "Statistical Learning Method".
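To make the idea concrete, here is a minimal sketch of how a fitted binary logistic regression turns predictors into a class prediction. The function names, coefficients, and the 0.5 threshold are assumptions chosen for illustration and do not come from the linked post.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Map a linear predictor to a probability with the logistic function."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


def classify(probability, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0


# Hypothetical fitted model with two predictors.
p = predict_probability(intercept=-1.0, coefficients=[0.8, 0.5], features=[2.0, 1.0])
print(p, classify(p))
```

Fitting the coefficients themselves (for example by maximum likelihood) is a separate step that this sketch does not cover.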

Read more »

It seems that making hex stickers has become popular for R packages, given the range of packages associated with RStudio. If you would like to have one of your own, try the approaches shown below.

Read more »

When we talk about how to find a cutoff, the first thing that comes to mind may be the ROC curve. Indeed, the ROC curve is a very common approach in the biomarker field for finding a suitable cutoff for a reagent. However, the ROC curve requires the dependent variable to be binary, so it does not apply to other types of data, such as quantitative variables or survival (censored) variables.
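For the binary case, one common way to pick a cutoff from the ROC curve is to maximize Youden's J (sensitivity + specificity - 1). The sketch below assumes that criterion, which the post itself does not name, and uses made-up scores and labels.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    labels use 1 for positive and 0 for negative; a sample is predicted
    positive when its score is >= the cutoff. Ties keep the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical biomarker values and disease status.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
print(cutoff, sensitivity, specificity)
```

This only works when the outcome is binary, which is exactly the limitation described above; quantitative or censored outcomes need a different approach.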

Read more »

Generally speaking, pie charts are not commonly used to present data, especially in statistics or research. They are, however, very popular in business charts, so here are some tips for drawing pie charts and bar plots.

Read more »

This post uses CLSI EP05A3 and EP15A3 as references.

Definition of Intermediate Precision:

Intermediate precision (also called within-laboratory or within-device) is a measure of precision under a defined set of conditions: same measurement procedure, same measuring system, same location, and replicate measurements on the same or similar objects over an extended period of time. It may include changes to other conditions such as new calibrations, operators, or reagent lots. (Source: Intermediate precision)

Read more »

I used to think I understood ANOVA well. But when I wanted to apply the MANOVA model, I found I was totally wrong. I did not even have a clear understanding of which variables, continuous or categorical, should be used in ANOVA. So I decided to keep notes to figure out the differences between ANOVA, MANOVA and ANCOVA.

Read more »

From now on, whenever possible, I will try my best to keep notes in English to practice writing for work.

Recently I discussed the non-standard evaluation mode in the dplyr package with a colleague. Before that conversation, I always described it as “dynamic variables” when searching Google to solve related problems. Then I learned that this dynamic mode is called “non-standard evaluation” in dplyr.

Read more »