0%

Cox proportional hazards model

参考自:DATA MINING Desktop Survival Guide by Graham Williams

主要为了理解Cox比例风险模型以及其预测结果

首先看下lung数据集对于sex协变量的cox结果

library(survival)

l.coxph <- coxph(Surv(time, status) ~ sex, data=lung)
>summary(l.coxph)
Call:
coxph(formula = Surv(time, status) ~ sex, data = lung)

  n= 228, number of events= 165 

       coef exp(coef) se(coef)      z Pr(>|z|)   
sex -0.5310    0.5880   0.1672 -3.176  0.00149 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    exp(coef) exp(-coef) lower .95 upper .95
sex     0.588      1.701    0.4237     0.816

Concordance= 0.579  (se = 0.021 )
Likelihood ratio test= 10.63  on 1 df,   p=0.001
Wald test            = 10.09  on 1 df,   p=0.001
Score (logrank) test = 10.33  on 1 df,   p=0.001

其中sex=1代表male,sex=2代表female;从上述结果可看出,coef为log(Hazards ratio),即cox回归方程中的beta;所以HR(Hazards ratio)为0.5880,即exp(coef)

HR小于1,说明female的风险低于male(根据sex值的大小来,second group relative to the first group);而exp(-coef)则是反之,是first group relative to the second,其值大于1,同样说明male风险高于female


接下来我想基于训练集得到的cox回归模型,预测新测试集的预后结果,可使用predict函数,并看下其不同参数(type="lp"riskexpectedterms)的区别

results <- rownames_to_column(lung, var = "id") %>%
  select(c("id", "time", "status", "sex")) %>%
  bind_cols(tibble(lp = predict(l.coxph, type="lp"),
                   risk = predict(l.coxph, type="risk"),
                   expected = predict(l.coxph, type="expected"),
                   terms = predict(l.coxph, type="terms")))
> head(results, 10)
   id time status sex         lp      risk  expected        sex
1   1  306      2   1  0.2096146 1.2332026 0.8280030  0.2096146
2   2  455      2   1  0.2096146 1.2332026 1.3880051  0.2096146
3   3 1010      1   1  0.2096146 1.2332026 3.5060567  0.2096146
4   4  210      2   1  0.2096146 1.2332026 0.5185439  0.2096146
5   5  883      2   1  0.2096146 1.2332026 3.5060567  0.2096146
6   6 1022      1   1  0.2096146 1.2332026 3.5060567  0.2096146
7   7  310      2   2 -0.3214090 0.7251266 0.4999485 -0.3214090
8   8  361      2   2 -0.3214090 0.7251266 0.6107055 -0.3214090
9   9  218      2   1  0.2096146 1.2332026 0.5367625  0.2096146
10 10  166      2   1  0.2096146 1.2332026 0.3309150  0.2096146

对于type="lp"参数,所计算的值可以作为线性预测变量的各个组成部分,若值大于0,说明该subject有相比mean survival(这个计算自训练集)有high risk,小于0则是low risk。按照上述逻辑,其计算方式如下:

> (lung[1,"sex"] - l.coxph$means) %*% coef(l.coxph)["sex"]
          [,1]
[1,] 0.2096146

对于type="risk"参数,所计算的值作为exp后的结果(risk score exp(lp) ),计算subject的risk value;若value大于1,说明其高于人群平均水平,低于1则反之;通过对risk排序,可得到待预测的人群中top 20%的subjects

> exp((lung[1,"sex"] - l.coxph$means) %*% coef(l.coxph)["sex"])
         [,1]
[1,] 1.233203

对于type="expected"参数,其计算的是对应subject在随访观察期内预期发生的事件的数目,The survival probability for a subject is equal to exp(-expected)

对于type="terms"参数,假如cox中只有一个变量(且这个变量是二分类变量),其计算结果同lp。。我是这么理解的,这个type是指类的意思

假如我再加上ph.ecog变量

options(na.action=na.exclude)
l.coxph <- coxph(Surv(time, status) ~ sex+ph.ecog, data=lung)
results <- rownames_to_column(lung, var = "id") %>%
  select(c("id", "time", "status", "sex", "ph.ecog")) %>%
  bind_cols(tibble(lp = predict(l.coxph, type="lp"),
                   risk = predict(l.coxph, type="risk"),
                   expected = predict(l.coxph, type="expected"),
                   terms = predict(l.coxph, type="terms")))

查看这个协变量的组合个数:

> table(results$sex, results$ph.ecog)
   
     0  1  2  3
  1 36 71 29  1
  2 27 42 21  0

那么lp参数下就会有7个系数:

> length(table(results$lp))
[1] 7

而从type参数来看,其类只有2+4=6个:

> length(table(results$terms))
[1] 6

以上是根据网上资料简单的搬抄。。。。

本文出自于http://www.bioinfo-scrounger.com转载请注明出处