Doris Gao

Harrell's concordance index (c-index) can be used to evaluate the discriminatory power and the predictive accuracy of Cox models. An easy way out as a surrogate for ROC analysis.
=================================================
Approach 1
-------------------------------------------------------------------------
Use function "rcorrcens" in package "Hmisc"
Limitations:

Can only handle un-censored data
Roughly handle categorical predictor with more than 2 categories

=================================================
Sample code:
library(survival)
library(Hmisc)
attach(sample.data)
surv <- Surv(survival, censor)
rcorrcens(surv ~ group)

=================================================
Approach 2
-------------------------------------------------------------------------
Direct output from coxph
Require higher version of R, say R 2.15, didn't test with older versions
=================================================
Sample code:
library(survival)
attach(sample.data)
surv <- Surv(survival, censor)
sum.surv <- summary(coxph(surv ~ group))
c_index <- sum.surv$concordance

=================================================

Approach 3

-------------------------------------------------------------------------

Use function "survConcordance"
Result is the same as in Approach 2

=================================================

Sample code:

library(survival)

attach(sample.data)

surv <- Surv(survival, censor)

fit <- coxph( surv ~ group)

survConcordance(surv ~ predict(fit))

=================================================

Approach 4

-------------------------------------------------------------------------

Use package "survcomp" in bioconductor
Different result from Approach 2/3
The disparity is due to two different estimation approaches that are used to handle tied risks (i.e. cases when two observations have the same survival with the same x). The method that approaches 4/5 use ignores the tied risks, the other method (approaches 2/3) takes into consideration of the tied risks. In terms of formulation/symbol (for illustration only):

Approaches 4/5 used:

Concordance = #all concordant pairs/#total pairs ignoring ties.

Approaches 2/3 used:

Concordance = (#all concordant pairs + #tied pairs/2)/(#total pairs including ties).

Those #s can be obtained in the output of Approach 3.

=================================================
Sample code:

source("http://bioconductor.org/biocLite.R")
biocLite("survcomp")
library(survcomp)
surv <- Surv(survival, censor)
fit <- coxph(surv ~ group, data= sample.data)
coxPredict <- predict(fit, data=sample.data, type="risk")
concordance.index(x=coxPredict, surv.time=survival, surv.event=censor, method="noether")

=================================================

Approach 5

-------------------------------------------------------------------------

Use function "cph" in package "rms"
Provide both un-adjusted and bias adjusted c-index
Un-adjusted c-index is the same as the one from Approach 4
=================================================
Sample code:
library(rms)
surv <- Surv(survival, censor)
set.seed(1)
fit.cph <- cph(surv ~ group, data= sample.data, x=TRUE, y=TRUE, surv=TRUE)

# Get the Dxy
v <- validate.cph(fit.cph, dxy=TRUE, B=1000)
Dxy = v[rownames(v)=="Dxy", colnames(v)=="index.corrected"]
orig_Dxy = v[rownames(v)=="Dxy", colnames(v)=="index.orig"]

# The c-statistic according to Dxy=2(c-0.5)
bias_corrected_c_index  <- abs(Dxy)/2+0.5
orig_c_index <- abs(Orig.Dxy)/2+0.5

=================================================

Combining Approaches 2/3 & 4/5 and Calculate p-value for Testing Two C-indices for Both Estimation Methods

-------------------------------------------------------------------------

Did not find it elsewhere online. Hope someone could find this helpful.
=================================================
Sample code:
surv <- Surv(survival, censor)
c_index <- function(group, ties=TRUE){
fit <- coxph(surv ~ group, data=sample.data)
coxPredict <- predict(fit, data=sample.data, type="risk")

# Approaches 4/5
if (ties==F) {
concordance.index(x=coxPredict, surv.time=survival, surv.event=censor, method="noether")
}
  # Approaches 2/3
else if (ties==T) {
survConcordance(surv ~ coxPredict, data=sample.data)
}
}
c_index_ties1 <- c_index(group=group1, ties=TRUE)
c_index_ties2 <- c_index(group=group2, ties=TRUE)

c_index_no_ties1 <- c_index_ties(group=group1, ties=F)
c_index_no_ties2 <- c_index_ties(group=group2, ties=F)

# p-value of testing two c-indices ignoring ties
round(cindex.comp(c_index_no_ties1, c_index_no_ties2)$p.value,3)

# Function for p-value of testing two c-indices accounting for ties
# t-test for dependent variables is used for significance
# Input variables are objects obtained from the first function

cindex.p.ties <- function(c_index_ties1, c_index_ties2, c_index_no_ties1, c_index_no_ties2) {
   eps <- 1E-15
  n <- c_index_no_ties1$n
r <- cor(c_index_no_ties1$data$x, c_index_no_ties2$data$x, use="complete.obs", method="spearman")
if ((1 - abs(r)) > eps) {
t.stat <- (c_index_ties1$concordance - c_index_ties2$concordance) / sqrt(c_index_ties1$std.err^2 + c_index_ties2$std.err^2 - 2 * r * c_index_ties1$std.err * c_index_ties2$std.err)
diff.ci.p <- pt(q=t.stat, df=n - 1, lower.tail=FALSE)
} else { diff.ci.p <- 1 }
return(list("p.value"=diff.ci.p))
}
cindex.p.ties(c_index_ties1=c_index_ties1, c_index_ties2=c_index_ties2, c_index_no_ties1=c_index_no_ties1, c_index_no_ties2=c_index_no_ties2)

Suppose there are two parameters (START, END) to be carried from Linux shell (bash) script to R. The trick is to use the argument "--args START=1 END=10" in a SGE jobscript to submit a R job in the BATCH mode: qsub test.sh

R CMD BATCH --no-save --no-restore '--args START=1 END=10' sample.R

Here is a template test.sh

#!/bin/bash 
#$ -cwd 
#$ -N NAME_OF_JOB
#$ -M YOUR_EMAIL_ADDRESS
# email me when the job e-ends,a-aborts,s-suspends
#$ -m eas
#$ -S /bin/sh 
# Combining output/error messages into one file
#$ -j y

R CMD BATCH --no-save --no-restore '--args START=1 END=10' sample.R

In the sample.R program, it should contain


#======================================================== 

# At the beginning of R, it should contain 

#======================================================== 

## First read in the arguments listed at the command line 

args = (commandArgs(TRUE)) 



## args is now a list of character vectors 

if (length(args)==0) {

                                print("No arguments supplied.") 

                                ## Supply default values 

                                START = 0 

                                END = 0 

                               }

else {

        for (i in 1:length(args)) { 

                                            eval(parse(text=args[[i]])) 

                                           } 

       } 

#======================================================== 

print(START) 

print(END) 

#======================================================== 

# Main R program 

#======================================================== 

for (i in  START:END) {
...
}

Reference Click Me

Doris Gao

Friday, October 5, 2012

5 Ways to Estimate Concordance Index for Cox Models in R, Why Results Aren't Identical?

Wednesday, August 29, 2012

Pass Parameters from Linux/Unix Shell to R

Handling Large Datasets in R

Thursday, April 5, 2012

Decision Trees in R

Random Fields in R

A Trick to Better Search R in Google

Great News for Lazy Bones Who are Addictive to R