I have a strong interest in data visualization. For me, the purpose to learn this skill is driven by having a good understanding and examination for the question one wants to solve.
It is not to be denied that sas is an essential skill for statistical programmers in the pharma field. Of course sas is a programming language which can be derived to different using requirements in different fields. So I think we should follow the actual requirements in the pharmaceutical industry to learn SAS if you want to be a qualified statistical programmer. Therefore the purpose of this post is to record some actual applications by sas so that I can understand and remember sas syntax clearly.
We all know that IML/SAS make us use R code in SAS by
submit /R statement. A few months ago, I consulted with SAS support for how to import plots by R in IML into RTF templates directly as I could not find any useful information in google. Unfortunately SAS support told me if the plot was created in R, it would need to be saved within the submit block as well using R code. It means if you want to directly import R graphics to RTF, maybe you should use some R function to achieve it.
I have kept a note about logistic regression for biomarkers using R, and mentioned that I’d like to compare the code of R and SAS. Therefore how to use SAS to estimate a logistic regression model?
Logistic Regression is one of the machine learning(ML) used for solving classification problems. It is used to predict the result of a categorical dependent variable based on one or more continuous or categorical independent variables. I have summarized Its basic principle in one blog (https://www.bioinfo-scrounger.com/archives/750/) referring to the book of "Statistical Learning Method".
It seems that making hex stickers have become popular for R packages with the range of packages associated with RStudio. Therefore If you would like to own it, please try these approaches as shown below.
When we mention how to find the cutoff, the first response in our brain may be the ROC curve. Absolutely, ROC curve is a very common approach in the biomarker field to look for a fit cutoff to a reagent. However in the ROC curve, the dependent variable must be two categorical variables, which is not universal to different types of data, such as quantitative variables, survival(censored) variables.
Generally speaking, it's not common to use pie charts to demonstrate our data, especially in statistics or research. Whereas it’s very popular in business charts, so here are some tips in drawing pie charts and bar plots.
CLSI EP05A3 and EP15A3 as the reference
Definition of Intermediate Precision：
Intermediate precision (also called within-laboratory or within-device) is a measure of precision under a defined set of conditions: same measurement procedure, same measuring system, same location, and replicate measurements on the same or similar objects over an extended period of time. It may include changes to other conditions such as new calibrations, operators, or reagent lots. ——Intermediate precision
I thought I used to understand the ANOVA definitely. But when I’d like to apply the MANOVA model, I found I was totally wrong. I even had no clear understanding about which variables, continuous or categorical, should be used in ANOVA. So I decided to keep notes to figure out what is the difference between ANOVA, MANOVA and ANCOVA.
From now on, if any, I will try my best to keep notes in English to exercise written for work.
Recently I have discussed the non-standard evaluation mode in dplyr package with a colleague. Before that conversation, I always defined the mode as dynamic variables to search in google to solve related problems. Then I knowed that the dynamic mode is called “non-standard evaluation” in dplyr.
Excerpt from the paper： A Guide to SDTM Mapping The Process, Typical Scenarios, & Best Practices.
当SAS的配置附件选择u8的sasv9.cfg后，SAS的-ENCODING参数就变成UTF-8，那么若输入数据是其他格式，如euc-cn Simplified Chinese (EUC)；那么若不将SAS session 转化未UTF-8，则可能会出现以下报错：
ERROR: Some character data was lost during transcoding in the data set MYDATA.DS3. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
什么是医学决定水平（Medicine decide level）：
SAS 系统全称为 Statistics Analysis System，最早由北卡罗来纳大学的两位生物统计学研究生编制，并于 1976 年成立了 SAS 软件研究所，正式推出了 SAS 软件。经过多年的发展，SAS 已被全世界 120 多个国家和地区的近三万家机构所采用，直接用户则超过三百万人，遍及金融、医药卫生、生产、运输、通讯、政府和教育科研等领域。