Title: | Imprecise Dirichlet Process for Survival Analysis |
---|---|
Description: | Functions to perform robust nonparametric survival analysis with right censored data using a prior near-ignorant Dirichlet Process. Mangili, F., Benavoli, A., de Campos, C.P., Zaffalon, M. (2015) <doi:10.1002/bimj.201500062>. |
Authors: | Francesca Mangili <[email protected]>, Alessio Benavoli <[email protected]>, Cassio P. de Campos <[email protected]>, Marco Zaffalon <[email protected]> |
Maintainer: | Francesca Mangili <[email protected]> |
License: | GPL (>= 3) | file LICENSE |
Version: | 1.2 |
Built: | 2025-03-01 04:36:26 UTC |
Source: | https://github.com/cran/IDPSurvival |
Data on patients diagnosed with AIDS in Australia before 1 July 1991.
Aids2
Aids2
This data frame contains 2843 rows and the following columns:
state
Grouped state of origin: "NSW "
includes ACT and
"other"
is WA, SA, NT and TAS.
sex
Sex of patient.
diag
(Julian) date of diagnosis.
death
(Julian) date of death or end of observation.
status
"A"
(alive) or "D"
(dead) at end of observation.
T.categ
Reported transmission category.
age
Age (years) at diagnosis.
This data set has been slightly jittered as a condition of its release, to ensure patient confidentiality.
Dr P. J. Solomon and the Australian National Centre in HIV Epidemiology and Clinical Research.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
The aml
data frame has 23 rows and 3 columns.
A clinical trial to evaluate the efficacy of maintenance chemotherapy for acute myelogenous leukaemia was conducted by Embury et al. (1977) at Stanford University. After reaching a stage of remission through treatment by chemotherapy, patients were randomized into two groups. The first group received maintenance chemotherapy and the second group did not. The aim of the study was to see if maintenance chemotherapy increased the length of the remission. The data here formed a preliminary analysis which was conducted in October 1974.
aml
aml
This data frame contains the following columns:
time
The length of the complete remission (in weeks).
cens
An indicator of right censoring. 1 indicates that the patient had a relapse
and so time
is the length of the remission. 0 indicates that the patient
had left the study or was still in remission in October 1974, that is the
length of remission is right-censored.
group
The group into which the patient was randomized. Group 1 received maintenance chemotherapy, group 2 did not.
Package boot also has the dataset aml
.
The data were obtained from
Miller, R.G. (1981) Survival Analysis. John Wiley.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
Embury, S.H, Elias, L., Heller, P.H., Hood, C.E., Greenberg, P.L. and Schrier, S.L. (1977) Remission maintenance therapy in acute myelogenous leukaemia. Western Journal of Medicine, 126, 267-272.
Tests if there is a difference between two survival curves based on two samples (X and Y)
with right censored data. More precisely it test whether the probabiliy P(X<Y) is greater
than, lower than or equal to 1/2.
The prior near-ignorance Dirichlet Process (IDP) rank sum test is used.
It returns the result of the deicison.
H=1 indicates that the alternative hypothesis is true with posterior
probability greater than level
.
H=0 indicates the hypothesis is not true with posterior greater than level
,
H=2 indicates an indeterminate instance. This means that the decision depends on
the choice of the prior.
isurvdiff(formula, data, groups=c(1,2), s=0.25, alternative = c("two.sided", "less", "greater"), exact=NULL, level = 0.95, display=TRUE, nsamples=10000, rope=0, tmax=NULL)
isurvdiff(formula, data, groups=c(1,2), s=0.25, alternative = c("two.sided", "less", "greater"), exact=NULL, level = 0.95, display=TRUE, nsamples=10000, rope=0, tmax=NULL)
formula |
a formula expression of the form |
data |
an optional data frame in which to interpret the variables occurring in the formula. |
groups |
a vector of two element indicating which value of the predictor represents groups 1 and 2. |
s |
sets the value of the prior strength s of the Dirichlet Process. |
alternative |
define the direction of the test:
"greater" –evaluates the hypothesis P(X < Y)>1/2, i.e., returns H=1
if the lower probability of the hypothesis is larger than
|
exact |
computes the posterior probability if value is TRUE,
or uses a normal approximation if value is FALSE.
If you omit this argument, |
level |
sets the significance level alpha = 1- |
display |
determines whether the posterior distributions of P(X<Y) have to be plotted (TRUE) or not (FALSE). |
nsamples |
if |
rope |
introduces a (symmetric) Region of Practical Equivalence (ROPE) around 1/2, i.e., [1/2-value,1/2+value]. |
tmax |
whether to consider the difference in survival up to time tmax. NULL is the default and means without limit. |
a list with components:
h |
The decision of the test: H=0 -> accept the null hypothesis; H=1 -> rejects the null hypothesis; H=2 -> indeterminate (a robust decision cannot be made). |
prob |
the probability of the alternatice hypotesis P(X<Y)>1/2
if |
Lower.Cred.Int |
lower HPD credible interval.
Confidence level defined by |
Upper.Cred.Int |
upper HPD credible interval. Confidence level defined by |
alternative |
the direction of the test |
strata |
the number of subjects contained in each group. |
exact |
logical variable saying if the exact posterior distributions
have been computed ( |
This function implements the IDP sum-rank test describe in Mangili and others (2014).
Benavoli, A., Mangili, F., Zaffalon, M. and Ruggeri, F. (2014). Imprecise Dirichlet process with application to the hypothesis test on the probability that X < Y. ArXiv e-prints, http://adsabs.harvard.edu/abs/2014arXiv1402.2755B.
Mangili, F., Benavoli, A., Zaffalon, M. and de Campos, C. (2014). Imprecise Dirichlet Process for the estimate and comparison of survival functions with censored data.
Surv
.
data(lung,package='survival') test <-isurvdiff(Surv(time,status)~sex,lung,groups=c(1,2), alternative = 'two.sided',s=0.5, nsamples=1000) print(test) data(Aids2) fdata <- Surv(time, status) ~ T.categ dataset <- Aids2 groups=c("blood","haem") dataset["time"]<-dataset[4]-dataset[3] dataset[5]<-as.numeric(unlist(dataset[5])) test <-isurvdiff(fdata,dataset,groups=groups, alternative = 'greater',s=0.5, nsamples=1000) print(test)
data(lung,package='survival') test <-isurvdiff(Surv(time,status)~sex,lung,groups=c(1,2), alternative = 'two.sided',s=0.5, nsamples=1000) print(test) data(Aids2) fdata <- Surv(time, status) ~ T.categ dataset <- Aids2 groups=c("blood","haem") dataset["time"]<-dataset[4]-dataset[3] dataset[5]<-as.numeric(unlist(dataset[5])) test <-isurvdiff(fdata,dataset,groups=groups, alternative = 'greater',s=0.5, nsamples=1000) print(test)
Search for the maximum values of parameter s for which the IDP test isurvdiff(formula,...)
issues a determinate decision.
The function test values of s up to the parameter smax. If for smax the IDP test is still determinate,
isurvdiff.smax
returns list(smax,testout). If for s=0 the test is already indeterminate,
isurvdiff.smax
returns list(-1,testout), where testout is the last executed test.
isurvdiff.smax(formula, ..., verbose=FALSE, accuracy=0.05, smax=12)
isurvdiff.smax(formula, ..., verbose=FALSE, accuracy=0.05, smax=12)
formula |
a formula expression of the form |
verbose |
whether to display each value of s that is tried |
accuracy |
to which precision s should be computed |
smax |
to which maximum value s should be tried |
... |
All arguments of |
A list with components:
s |
The maximum value of s for which the test returns a determinate decision (H=0 or H=1). |
test0 |
The value returned by |
This function implements the IDP sum-rank test describe in Mangili and others (2014).
Benavoli, A., Mangili, F., Zaffalon, M. and Ruggeri, F. (2014). Imprecise Dirichlet process with application to the hypothesis test on the probability that X < Y. ArXiv e-prints, http://adsabs.harvard.edu/abs/2014arXiv1402.2755B.
Mangili, F., Benavoli, A., Zaffalon, M. and de Campos, C. (2014). Imprecise Dirichlet Process for the estimate and comparison of survival functions with censored data.
data(lung,package='survival') lung <- lung[1:40,] # reduced data set just to ensure that the # example is very fast to run for building the package test <-isurvdiff.smax(Surv(time,status)~sex,lung,groups=c(1,2), alternative = 'two.sided', nsamples=1000) # better to use larger value of nsamples # this small value is to run it quickly print(test$test0) cat("Maximum s giving the same decision: ",test$s)
data(lung,package='survival') lung <- lung[1:40,] # reduced data set just to ensure that the # example is very fast to run for building the package test <-isurvdiff.smax(Surv(time,status)~sex,lung,groups=c(1,2), alternative = 'two.sided', nsamples=1000) # better to use larger value of nsamples # this small value is to run it quickly print(test$test0) cat("Maximum s giving the same decision: ",test$s)
This function creates survival curves from right censored data using the prior near-ignorance Dirichlet Process (IDP).
isurvfit(formula, data, s=0.5, weights, subset, display=TRUE, conf.type=c('exact', 'approx', 'none'), nsamples=10000, conf.int= .95)
isurvfit(formula, data, s=0.5, weights, subset, display=TRUE, conf.type=c('exact', 'approx', 'none'), nsamples=10000, conf.int= .95)
formula |
a formula object, which must have a
|
data |
a data frame in which to interpret the variables named in the |
s |
sets the value of the prior strength s of the Dirichlet Process. |
weights |
the weights must be finite and nonnegative; it is strongly recommended that
they be strictly positive, since zero weights are ambiguous, compared
to use of the |
subset |
expression saying that only a subset of the rows of the data should be used in the fit. |
display |
determines whether the survival curves have to be plotted (TRUE) or not (FALSE). |
conf.type |
a variable saying how the credible interval shold be computed: 'exact': Monte-Carlo smapling from the exact distribution, 'approx': Gaussian approximation, 'none': no credible interval is computed. |
nsamples |
number pf samples used to approximate the credible intervals
if |
conf.int |
confidence level of the credible interval. |
The estimates are obtained using the IDP estimator by Mangili and others (2014) based on the prior near-ignorance Dirichlet Process model by Benavoli and others (2014).
an object of class "isurvfit"
.
See isurvfit.object
for
details. Methods defined for survfit objects are
print
and plot
.
Benavoli, A., Mangili, F., Zaffalon, M. and Ruggeri, F. (2014). Imprecise Dirichlet process with application to the hypothesis test on the probability that X < Y. ArXiv e-prints, http://adsabs.harvard.edu/abs/2014arXiv1402.2755B.
Mangili, F., Benavoli, A., Zaffalon, M. and de Campos, C. (2014). Imprecise Dirichlet Process for the estimate and comparison of survival functions with censored data.
isurvfit.object
,
plot.isurvfit
,
Surv
.
data(aml) fit <- isurvfit(Surv(time, cens) ~ 1, data=aml, display=TRUE, nsamples=1000) legend('topright', c("Lower expectation", "Upper expectation","confidence intervals"), lty=c(1,1,2),lwd=c(1,2,1)) title("IDP survival curve (s=0.5) \nAcute Myelogenous Leukemia dataset") data(Aids2) dataset <- Aids2 dataset["time"]<-dataset[4]-dataset[3] dataset[5]<-as.numeric(unlist(dataset[5])) fit <- isurvfit(Surv(time, status) ~ T.categ, dataset,s=1, subset=(!is.na(match(T.categ, c('blood','haem','het')))), nsamples=1000,conf.type='none') legend('topright',c("Heterosexual contact","Hemophilia","Blood"), title="Transmission category:",lty=c(1,1,1),col=c(1,2,3),pch=c(1,2,3)) title("IDP survival curve (s=1) \nAids dataset") print(fit) leukemia.surv <- isurvfit(Surv(time, cens) ~ group, data = aml, display=FALSE) plot(leukemia.surv) legend(100, .9, c("Maintenance", "No Maintenance"), lty=c(1,1),lwd=c(2,1), col=c('black','red'),pch=c(1,2)) title("IDP Curves\nfor AML Maintenance Study")
data(aml) fit <- isurvfit(Surv(time, cens) ~ 1, data=aml, display=TRUE, nsamples=1000) legend('topright', c("Lower expectation", "Upper expectation","confidence intervals"), lty=c(1,1,2),lwd=c(1,2,1)) title("IDP survival curve (s=0.5) \nAcute Myelogenous Leukemia dataset") data(Aids2) dataset <- Aids2 dataset["time"]<-dataset[4]-dataset[3] dataset[5]<-as.numeric(unlist(dataset[5])) fit <- isurvfit(Surv(time, status) ~ T.categ, dataset,s=1, subset=(!is.na(match(T.categ, c('blood','haem','het')))), nsamples=1000,conf.type='none') legend('topright',c("Heterosexual contact","Hemophilia","Blood"), title="Transmission category:",lty=c(1,1,1),col=c(1,2,3),pch=c(1,2,3)) title("IDP survival curve (s=1) \nAids dataset") print(fit) leukemia.surv <- isurvfit(Surv(time, cens) ~ group, data = aml, display=FALSE) plot(leukemia.surv) legend(100, .9, c("Maintenance", "No Maintenance"), lty=c(1,1),lwd=c(2,1), col=c('black','red'),pch=c(1,2)) title("IDP Curves\nfor AML Maintenance Study")
This class of objects is returned by the isurvfit
class of functions
to represent a IDP survival curve.
Objects of this class have methods for the functions print
,
plot
.
n |
total number of subjects in each curve. |
time |
the time points at which the curve has a step. |
n.risk |
the number of subjects at risk at t. |
n.event |
the number of events that occur at time t. |
n.censor |
the number of subjects who exit the risk set, without an event, at time t. (This number can be computed from the successive values of the number at risk). |
survUP |
the estimate of upper expectation of the survival probability at time t+0. |
survLOW |
the estimate of lower expectation of the survival probability at time t+0. |
survLOW0 |
the estimate of lower expectation of the survival probability at time t=0. The upper expectation is always 1. |
std.up |
the standard deviation of the upper distribution of the survival probability. |
std.low |
the standard deviation of the lower distribution of the survival probability. |
upper |
upper confidence limit for the survival curve. |
lower |
lower confidence limit for the survival curve. |
lower0 |
lower confidence limit for the survival curve at t=0. The upper is always 1. |
conf.type |
the approximation used to compute the confidence limits. |
conf.int |
the level of the confidence limits, e.g. 90 or 95%. |
strata |
number of elements of the |
call |
an image of the call that produced the object. |
isurvfit
objects
A plot of survival curves is produced, one curve for each strata.
## S3 method for class 'isurvfit' plot(x, se.fit=TRUE, ...)
## S3 method for class 'isurvfit' plot(x, se.fit=TRUE, ...)
x |
an object of class |
se.fit |
determines whether confidence intervals will be plotted. |
... |
other arguments passed to the standard plot function |
leukemia.surv <- isurvfit(Surv(time, status) ~ x, data = aml, display=FALSE) plot(leukemia.surv) legend(100, .9, c("Maintenance", "No Maintenance"), lty=c(1,1),lwd=c(2,1),col=c('black','red'),pch=c(1,2)) title("IDP Curves\nfor AML Maintenance Study")
leukemia.surv <- isurvfit(Surv(time, status) ~ x, data = aml, display=FALSE) plot(leukemia.surv) legend(100, .9, c("Maintenance", "No Maintenance"), lty=c(1,1),lwd=c(2,1),col=c('black','red'),pch=c(1,2)) title("IDP Curves\nfor AML Maintenance Study")