
Logistic regression is a machine learning (ML) method for solving classification problems. It predicts the outcome of a categorical dependent variable from one or more continuous or categorical independent variables. I summarized its basic principle in an earlier post (https://www.bioinfo-scrounger.com/archives/750/), with reference to the book "Statistical Learning Method".
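To make the idea concrete, here is a minimal sketch in plain Python, assuming a single continuous predictor and a binary outcome. The data, the function names `sigmoid` and `fit_logistic`, and the learning-rate settings are all made up for illustration; this is not the derivation from the linked post.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by plain gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical, already-centered predictor values and their binary labels
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predicted = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

The fitted slope `w` is positive here, so larger predictor values push the predicted probability toward class 1.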

Read more »

Hex stickers seem to have become popular for R packages, especially across the range of packages associated with RStudio. If you would like to make your own, try the approaches shown below.

Read more »

When asked how to find a cutoff, the first thing that comes to mind is probably the ROC curve. Indeed, the ROC curve is a very common approach in the biomarker field for finding a suitable cutoff for a reagent. However, the ROC curve requires a binary dependent variable, so it does not generalize to other types of data, such as quantitative variables or survival (censored) variables.
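For the binary case, one common way to read a cutoff off the ROC curve is to maximize Youden's J (sensitivity + specificity - 1). The sketch below assumes that criterion, which the excerpt itself does not name; the function `youden_cutoff` and the biomarker values are hypothetical.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A sample is called positive when its value is >= cutoff.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < c and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # ties keep the smallest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j

# Hypothetical biomarker measurements with disease status (1 = diseased)
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, j = youden_cutoff(values, labels)
```

This only works because `labels` is binary, which is exactly the limitation described above.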

Read more »

Generally speaking, pie charts are not a common way to present data, especially in statistics or research. They are, however, very popular in business charts, so here are some tips for drawing pie charts and bar plots.

Read more »

CLSI EP05A3 and EP15A3 serve as the references.

Definition of Intermediate Precision:

Intermediate precision (also called within-laboratory or within-device) is a measure of precision under a defined set of conditions: same measurement procedure, same measuring system, same location, and replicate measurements on the same or similar objects over an extended period of time. It may include changes to other conditions such as new calibrations, operators, or reagent lots. (Source: Intermediate precision)

Read more »

I used to think I understood ANOVA well. But when I wanted to apply a MANOVA model, I found I was totally wrong. I did not even have a clear understanding of which variables, continuous or categorical, should be used in ANOVA. So I decided to keep notes to figure out the differences between ANOVA, MANOVA and ANCOVA.

Read more »

From now on, whenever possible, I will try my best to keep notes in English to practice my written English for work.

Recently I discussed non-standard evaluation in the dplyr package with a colleague. Before that conversation, I had always called it "dynamic variables" when searching Google for solutions to related problems. Then I learned that this dynamic mode is called “non-standard evaluation” in dplyr.

Read more »

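The Box-Cox transformation is a generalized power transformation proposed by Box and Cox in 1964. It is a data transformation commonly used in statistical modeling when a continuous response variable does not follow a normal distribution. After a Box-Cox transformation, the correlation between the unobservable error and the predictor variables can be reduced to some extent.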
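As a minimal sketch of the transformation itself, assuming the standard one-parameter form (y^λ - 1)/λ for λ ≠ 0 and log(y) for λ = 0: the function name `box_cox` and the sample values are made up for illustration, and choosing λ (usually by maximum likelihood) is not covered here.

```python
import math

def box_cox(y, lam):
    """Apply the one-parameter Box-Cox transform to strictly positive values."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda -> 0 limit of (y**lam - 1) / lam is the natural log
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

sample = [1.0, 4.0, 9.0]           # hypothetical positive responses
log_scale = box_cox(sample, 0)     # equivalent to a log transform
sqrt_scale = box_cox(sample, 0.5)  # a shifted and scaled square root
```

With λ = 1 the data are only shifted by 1, so values of λ away from 1 are the ones that actually change the shape of the distribution.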

Read more »

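A SAS encoding problem

When the u8 version of sasv9.cfg is selected as the SAS configuration file, the SAS -ENCODING option becomes UTF-8. If the input data is in another encoding, such as euc-cn Simplified Chinese (EUC), then unless the SAS session is converted to UTF-8, the following error may occur: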

ERROR: Some character data was lost during transcoding in the data set MYDATA.DS3. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
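The "truncation" part of that message is easier to see with a small Python sketch (not SAS). It assumes the cause is a fixed byte width sized for EUC-CN data: Chinese characters take 2 bytes in EUC-CN (Python codec name `gb2312`) but 3 bytes in UTF-8, so the same text no longer fits. The sample string and the fixed-width simulation are illustrative only.

```python
text = "中文数据"  # hypothetical cell value, four Chinese characters

euc_bytes = text.encode("gb2312")   # EUC-CN: 2 bytes per Chinese character
utf8_bytes = text.encode("utf-8")   # UTF-8: 3 bytes per Chinese character

# Simulate a character column whose byte length was sized for the EUC-CN data
column_width = len(euc_bytes)
truncated = utf8_bytes[:column_width]

# The cut lands in the middle of the third character, so it cannot be decoded
recovered = truncated.decode("utf-8", errors="ignore")
```

Here `recovered` keeps only the first two characters, which mirrors the data loss the error message warns about.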

Read more »

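What is a reference interval:

It is derived from laboratory measurements in a large normal population. The data are analyzed statistically by age and sex to obtain the range in which the values of the vast majority of that population fall, and this range defines the reference interval. Abnormal values that fall only slightly outside the reference limits can be handled case by case according to the patient's clinical presentation: treatment may be started, or the patient may simply be observed.

What is a medical decision level:

These are values that clinicians should know and use when diagnosing and treating disease. They are limits distinct from reference values: by checking whether a measured value is above or below them, clinicians can rule out or confirm a diagnosis, grade or classify certain diseases, or estimate prognosis. They thereby indicate what clinical action to take, such as further testing in a particular area or starting a particular treatment.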


Read more »

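The full name of the SAS system is Statistical Analysis System. It was first developed by two biostatistics graduate students at North Carolina State University. The SAS Institute was founded in 1976 and formally released the SAS software. After years of development, SAS has been adopted by nearly 30,000 organizations in more than 120 countries and regions, with more than three million direct users across finance, medicine and health care, manufacturing, transportation, telecommunications, government, and education and research.

I heard of SAS back when I was a student, but it is extremely expensive, and it did not seem to be widely used in academia, so I never got to work with it.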

Read more »

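A quick note

The R download is not split by language version; the language of the software follows the operating system language. For example, if Windows 10 is set to Chinese, R displays in Chinese after installation.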

Read more »

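This post is just a simple compilation of material collected online.

From the guideline on non-inferiority design in clinical trials:

When confirming the efficacy of a drug, a superiority trial (superiority of the investigational drug over placebo, a lower dose of the investigational drug, or an active comparator) is generally the ideal choice. When a superiority trial is not suitable, for example when a placebo control would be unethical, a non-inferiority trial can be considered. A non-inferiority trial aims to confirm the clinical efficacy of the investigational drug: even if it is lower than that of the active control, the difference is within a clinically acceptable range.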

Read more »

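In clinical trials for the registration of diagnostic reagents, the acceptance criteria usually focus on the lower limit of the CI (confidence interval), and a lower-limit condition is usually built into the sample size calculation as well. Different acceptance criteria call for different CI calculation methods, but they are all among a few common ones: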
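As one illustration of checking a lower CI limit against an acceptance criterion, here is a minimal sketch using the Wilson score interval for a proportion. The choice of the Wilson method, the counts, and the 85% threshold are all assumptions for the example; the excerpt does not say which methods it goes on to list.

```python
import math

def wilson_interval(successes, n, z=1.959963984540054):
    """Two-sided Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Hypothetical result: 95 of 100 reference-positive samples detected
lower, upper = wilson_interval(95, 100)

# Hypothetical acceptance criterion: lower 95% limit of at least 85%
meets_criterion = lower >= 0.85
```

For 95/100 the interval is roughly 0.888 to 0.978, so this made-up criterion would be met.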

Read more »

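If a gold standard is available in the trial, then when comparing against it you should report sensitivity and specificity, the positive and negative likelihood ratios, the positive and negative predictive values, and their two-sided 95% confidence intervals. When comparing against a non-gold standard, you should report the positive percent agreement, negative percent agreement, and overall percent agreement; describing the comparison between a new test and a non-gold standard using only "sensitivity" and "specificity" is inappropriate.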
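The point estimates for the gold-standard comparison all come from a 2x2 table, as in this minimal sketch. The function name `diagnostic_metrics` and the counts are hypothetical, and confidence intervals are left out.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates from a 2x2 table of new test vs. gold standard.

    tp/fn: gold-standard positives called positive/negative by the new test.
    fp/tn: gold-standard negatives called positive/negative by the new test.
    Assumes no row or column total is zero and specificity is below 1.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts: 100 gold-standard positives, 100 gold-standard negatives
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
```

Against a non-gold standard the same two ratios are computed, but they are reported as positive and negative percent agreement rather than sensitivity and specificity.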

To calculate agreement, the Kappa test is used.

The Kappa test was proposed by Cohen in 1960, so it is also called Cohen's Kappa.
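A minimal sketch of the Kappa statistic for two binary methods, using the usual definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the counts are hypothetical.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both methods positive      b: method 1 positive, method 2 negative
    c: method 1 negative, method 2 positive      d: both methods negative
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the marginal totals of each method
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical comparison of a new reagent against a comparator method
kappa = cohens_kappa(a=40, b=10, c=5, d=45)
```

In this example the observed agreement is 0.85 and the chance agreement is 0.50, giving a kappa of 0.70.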

Read more »