
I have a strong interest in data visualization. For me, the motivation to learn this skill comes from wanting to understand and examine the question at hand more thoroughly.

Read more »

There is no denying that SAS is an essential skill for statistical programmers in the pharma field. Of course, SAS is a programming language that can be adapted to different requirements in different fields. So I think that anyone who wants to be a qualified statistical programmer should learn SAS according to the actual requirements of the pharmaceutical industry. The purpose of this post is therefore to record some practical applications of SAS so that I can understand and remember SAS syntax clearly.

Read more »

We all know that SAS/IML lets us run R code in SAS through the SUBMIT / R statement. A few months ago, I asked SAS support how to import plots created by R in IML directly into RTF templates, as I could not find any useful information on Google. Unfortunately, SAS support told me that if the plot is created in R, it also needs to be saved within the submit block using R code. This means that if you want to import R graphics into RTF directly, you probably need to use an R function to do it.

Read more »

I have kept a note about logistic regression for biomarkers using R, and mentioned that I’d like to compare the R and SAS code. So how do we use SAS to estimate a logistic regression model?

Read more »

Logistic regression is one of the machine learning (ML) methods used for solving classification problems. It is used to predict the outcome of a categorical dependent variable based on one or more continuous or categorical independent variables. I have summarized its basic principle in a blog post (https://www.bioinfo-scrounger.com/archives/750/), referring to the book "Statistical Learning Method".
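
As a rough illustration of the prediction step described above, here is a minimal Python sketch. It is not the code from the post: the function names and the coefficient values are made up for the example, and the coefficients are assumed to be already estimated rather than fitted here.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(intercept: float, coefs: list, x: list) -> float:
    """Probability of the positive class for one observation x."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return sigmoid(z)


def predict_class(intercept: float, coefs: list, x: list,
                  threshold: float = 0.5) -> int:
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    return int(predict_proba(intercept, coefs, x) >= threshold)


# Hypothetical coefficients for a single predictor (illustration only).
intercept, coefs = -1.0, [2.0]
p_high = predict_proba(intercept, coefs, [1.0])  # linear predictor z = 1
p_low = predict_proba(intercept, coefs, [0.0])   # linear predictor z = -1
```

The 0.5 threshold is only a default; in practice the cutoff is usually chosen to suit the application.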

Read more »

It seems that making hex stickers has become popular for R packages, given the range of packages associated with RStudio. If you would like to have one of your own, try the approaches shown below.

Read more »

When we think about how to find a cutoff, the first thing that comes to mind may be the ROC curve. Indeed, the ROC curve is a very common approach in the biomarker field for finding a suitable cutoff for a reagent. However, the ROC curve requires the dependent variable to be binary, so it does not apply to other types of data, such as quantitative variables or survival (censored) variables.
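
For the binary case, one common way to pick a cutoff from the ROC curve is to maximize Youden's J (sensitivity + specificity - 1). The excerpt does not say which criterion the post uses, so the sketch below is an assumption, with a hypothetical function name and toy data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A sample is called positive when its value is >= cutoff.
    labels must be 0/1 and contain both classes.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")

    best_cutoff, best_j = None, -1.0
    # Every distinct observed value is tried as a candidate cutoff.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy biomarker readings (made up): negatives are low, positives are high.
values = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1]
labels = [0, 0, 0, 1, 1, 1]
cutoff, j = youden_cutoff(values, labels)
```

This only covers a binary outcome; the quantitative and censored outcomes mentioned above need other methods.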

Read more »

Generally speaking, pie charts are not commonly used to present data, especially in statistics or research. They are, however, very popular in business charts, so here are some tips for drawing pie charts and bar plots.

Read more »

CLSI EP05A3 and EP15A3 serve as the references.

Definition of Intermediate Precision:

Intermediate precision (also called within-laboratory or within-device) is a measure of precision under a defined set of conditions: same measurement procedure, same measuring system, same location, and replicate measurements on the same or similar objects over an extended period of time. It may include changes to other conditions such as new calibrations, operators, or reagent lots. ——Intermediate precision

Read more »

I used to think I understood ANOVA well. But when I tried to apply the MANOVA model, I found I was totally wrong. I did not even have a clear understanding of which variables, continuous or categorical, should be used in ANOVA. So I decided to keep notes to figure out the differences between ANOVA, MANOVA and ANCOVA.

Read more »

From now on, I will try my best to keep notes in English where possible, to practice writing for work.

Recently I discussed the non-standard evaluation mode in the dplyr package with a colleague. Before that conversation, I had always described this mode as "dynamic variables" when searching Google to solve related problems. I then learned that this dynamic mode is called “non-standard evaluation” in dplyr.

Read more »

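The Box-Cox transformation is a generalized power transformation proposed by Box and Cox in 1964. It is a data transformation commonly used in statistical modeling when a continuous response variable does not follow a normal distribution. After the Box-Cox transformation, the correlation between the unobservable errors and the predictor variables can be reduced to some extent.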
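
For reference, the standard one-parameter form of the transformation for a positive response $y$ with power parameter $\lambda$ is shown below. The notation is assumed here, since the excerpt itself does not define any symbols.

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\
\ln y, & \lambda = 0.
\end{cases}
$$

The $\lambda = 0$ case is the limit of the first expression as $\lambda \to 0$, so the family is continuous in $\lambda$.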

Read more »

A SAS encoding problem

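When the u8 version of sasv9.cfg is chosen as the SAS configuration file, the SAS -ENCODING option becomes UTF-8. If the input data is in another encoding, such as euc-cn Simplified Chinese (EUC), and the SAS session is not converted to UTF-8, the following error may occur: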

ERROR: Some character data was lost during transcoding in the data set MYDATA.DS3. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.

Read more »

What is a reference interval:

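It is derived from laboratory measurement data from a large healthy population. The data are analyzed statistically by age and sex groups within that population to obtain the range in which the values of the vast majority of people fall, and the reference interval is determined from this range. Abnormal values that fall only slightly outside the reference limits can be handled case by case according to the patient's clinical presentation: treatment may be given, or the patient may simply be observed.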

What is a medical decision level:

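It is a value that clinicians should know and use when diagnosing and treating disease. These limits are distinct from the reference values. By checking whether a measured value is above or below these limits, clinicians can rule out or confirm a diagnosis, grade or classify certain diseases, or estimate prognosis. The limits thereby indicate what clinical action to take, such as ordering further tests of a particular kind or deciding on a particular treatment.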


Read more »