Sample size calculation for studies with grouped survival data

Grouped survival data arise often in studies where the disease status is assessed at regular visits to the clinic. The time to the event of interest can only be determined to lie between two adjacent visits or is right censored at one visit. In data analysis, replacing the survival time with the endpoint or midpoint of the grouping interval leads to biased estimators of the effect size in group comparisons. Prentice and Gloeckler developed a maximum likelihood estimator for the proportional hazards model with grouped survival data, and the method has been widely applied. Previous work on sample size calculation for designing studies with grouped data is based either on the exponential distribution assumption or on approximating the variance under the alternative by the variance under the null. Motivated by studies in HIV trials, cancer trials, and in vitro experiments to study drug toxicity, we develop a sample size formula for studies with grouped survival endpoints that use the method of Prentice and Gloeckler to compare two arms under the proportional hazards assumption. We do not impose any distributional assumptions, nor do we use any approximation of the variance of the test statistic. The sample size formula only requires estimates of the hazard ratio and of the survival probabilities of the event time of interest and the censoring time at the endpoints of the grouping intervals for one of the two arms. The formula is shown to perform well in a simulation study, and its application is illustrated in the three motivating examples.

Keywords: grouped survival data, proportional hazards model, sample size calculation

1 |. INTRODUCTION

In studies with survival outcomes, it is desirable to observe the exact survival times for all subjects. In practice, however, some kind of censoring is inevitable, with right censoring, due to the finite study period and early dropout of study subjects, being the most common. 1 Under right censoring, the survival times of some of the subjects are exactly observed; in other studies, however, the survival time can never be determined to the desired precision, which results in interval censored data. 2 Interval censored data usually arise when the event status, eg, HIV infection or disease progression, is assessed at regular visits to the clinic. When the visit times are predetermined (nonrandom), the resulting data are called grouped survival data (see, eg, the works of Prentice and Gloeckler 3 and Li et al 4 ). Grouped survival data are common in HIV studies where subjects are tested for HIV infection at regular visits to the clinic. The time to infection is determined to be between the first visit at which the test is positive and the previous visit. Similar data are encountered in cancer clinical trials where the time to disease progression (TTP) is the endpoint of interest. TTP is the time from treatment initiation to disease progression, with dropouts or deaths before progression treated as censoring. In such studies, regular assessments of disease progression are performed, and the TTP can only be observed to be between two adjacent assessment times. A further example comes from the study of drug toxicity, where the LD50 (the dosage at which 50% of cells are killed) of a drug is of interest. In the experiments, only a finite number of doses are tested, and thus the LD50 of the drug can only be determined to be between two adjacent doses. See the work of Njiaju et al 5 for a specific example. Here, LD50 is not a “survival time” itself, but the data can be analyzed using techniques for grouped survival data.

Prentice and Gloeckler 3 discussed previously existing methods for analyzing grouped survival data. Those methods are either computationally infeasible or give inconsistent estimators of the hazard ratio in a proportional hazards model. This motivated them to develop the maximum likelihood method for analyzing grouped data with a proportional hazards model. Although the proportional hazards model is a semiparametric model, the likelihood function of the grouped survival data depends only on a finite number of parameters, and the Newton-Raphson method can be used to solve the score equation. This analysis method has been widely used in practice. As to the design of clinical trials with grouped survival data, Lui 6 and Lui et al 7 derived sample size formulae for cohort studies with grouped data in the special case of the exponential distribution. Inoue and Parmigiani 8 and Raab et al 9 discussed the design of such trials but focused on the design of follow-up intervals based on parametric models for the survival time rather than on sample size determination for group comparisons. Lachin 10 derived a sample size formula for grouped survival data based on the score test of the log hazard ratio. 3 In deriving the formula, the variance of the score statistic under the alternative is approximated by the variance under the null (see eq. (34) in the work of Lachin 10 ).

We are mainly motivated by clinical trials that study the time to HIV infection and by cancer clinical trials where the time to disease progression is the primary outcome of interest. As the first example, Flynn et al 11 reported results of a randomized placebo-controlled phase III trial to test the efficacy of a preventive HIV-1 vaccine and found that, compared with placebo, the vaccine did not prevent HIV-1 acquisition. In this trial, subjects were administered vaccine or placebo at months 0, 1, 6, 12, 18, 24, and 30, with a final follow-up visit at month 36. At each visit, a blood sample was obtained to assess HIV-1 status. Flynn et al used the endpoints of the intervals to approximate the time to HIV infection and then used the partial likelihood to estimate the hazard ratio in a proportional hazards model. They found similar infection rates in the two arms, and the difference was not significant. As we will show in the simulation, the commonly used method of approximating the grouped survival time with an endpoint or the midpoint of the corresponding interval and then applying methods for right censored data results in biased estimation (also see the work of Panageas et al 12 ). Ideally, the consistent and efficient method of Prentice and Gloeckler 3 should be used for the data analysis. If a similar clinical trial is planned in which the method of Prentice and Gloeckler is to be used for the analysis, the question arises of how to determine the necessary sample size for such a trial.

For the second example, consider cancer clinical trials in which TTP is the primary endpoint for evaluating the treatment effect. There are many such trials in practice. 13–15 Here, we are specifically motivated by CALGB 30607, a phase III trial studying the effect of Sunitinib, a multitargeted receptor tyrosine kinase inhibitor, versus placebo in treating advanced non–small cell lung cancer. 16 Computerized tomography (CT) scans were used to determine whether a patient had progressed after treatment initiation and were scheduled every 6 weeks for all patients until progression. The progression-free survival (PFS), which is the time to progression or death (whichever comes first), was used as the primary endpoint in the original trial. However, PFS is a composite endpoint that includes both TTP and time to death. It is either exactly observed (for death) or observed to lie in an interval (for TTP), and thus only part of the data are grouped. Generally, one consideration for using PFS as the endpoint is to have more events in the analysis (compared with TTP), but at the same time, it raises issues such as the validity of the proportional hazards model and the difficulty of interpreting this model for such data. 17 Therefore, it is of interest to use TTP as the primary endpoint to assess the treatment effect. In this case, grouped data arise, and the traditional sample size formulae for survival analysis do not apply.

We derive a sample size formula for clinical trials with grouped survival data, based on the Wald test for the logarithm of the hazard ratio in a proportional hazards model and on the method of Prentice and Gloeckler for the data analysis. The key is the determination of the formula for the asymptotic variance of the estimator of the log hazard ratio, which is the inverse of the efficient information for the log hazard ratio in the presence of nuisance parameters related to the baseline hazard function. While the Wald test is asymptotically equivalent to the score test (as well as the likelihood ratio test), we do not rely on any approximation of the variances, and thus our sample size formula differs from that in the work of Lachin. 10 We show that the difference between the sample size obtained with the approximation of variances and the sample size obtained without such an approximation can be quite substantial (10% to 20%). We conduct a simulation study to assess the performance of our sample size formula. It is shown that, while the sample size using the approximation of variances may yield coverage rates far from the nominal level, our sample size formula without this approximation performs much better in these cases. We also show by simulation that the common practice of replacing the grouped survival times with the endpoints (or midpoints) of the corresponding intervals leads to biased estimation. An additional contribution is to show by simulation that the power gain of the test with more frequent measurements becomes insignificant quite rapidly; thus, a high frequency of measurements, ie, fine intervals, may not be necessary. In practice, an “optimal” frequency can be chosen by weighing the efficiency of the inference against the cost of measurements. Finally, compared with the sample size calculation in trials with right censored data, we only need estimates of the baseline survival distribution at the finite number of visit times instead of the whole survival curve, which is due to the grouped nature of the data.
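To fix ideas, our calculation has the generic structure of a Wald-test sample size formula. Writing β_1 for the alternative log hazard ratio, σ_1^2 for the asymptotic variance of √n(β̂ − β_1) under the alternative (the inverse of the per-subject efficient information), z_q for the qth standard normal quantile, α for the two-sided significance level, and 1 − γ for the target power, the required sample size takes the form

n ≈ (z_{1−α/2} + z_{1−γ})^2 σ_1^2 / (β_1 − β_0)^2.

This display is only a sketch of the general structure, with the notation σ_1^2 introduced here for illustration; the grouped-data-specific work, carried out in Section 2 and the Appendix, lies in evaluating σ_1^2 from the hazard ratio and the survival probabilities of the event and censoring times at the endpoints of the grouping intervals.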

The remainder of the article is organized as follows. Notation is introduced and the sample size formula is derived in Section 2. The performance of the sample size formula is assessed via simulation in Section 3, and Section 4 illustrates the wide applicability of the sample size formula in three distinct areas: an HIV trial, a lung cancer trial, and an in vitro experiment studying the LD50 of paclitaxel, a drug for cancer treatment. We conclude with a discussion in Section 5. Details of the derivation of the sample size formula are given in the Appendix.

2 |. DERIVATION OF THE SAMPLE SIZE FORMULA

Suppose the survival time, denoted by T, is observed to be in one of r intervals, denoted by A_i = [a_{i−1}, a_i), for i = 1, …, r, with a_0 = 0 and a_r = ∞. Suppose that n subjects are randomized to one of two treatment arms. Let Z be the indicator for the two arms, where Z = 1 stands for the experimental treatment arm and Z = 0 stands for the placebo (or standard treatment) arm. Denote by p_z = P(Z = z), z = 0, 1, the randomization probabilities. Suppose the survival time of a subject falls into the Kth interval or is right censored at time a_{K−1} (early dropout), where K can take values 1, …, r. Let Δ be the event indicator that takes value 0 if the subject is right censored and 1 otherwise. Note that T being observed in the interval [a_{r−1}, a_r) is equivalent to T being right censored at a_{r−1}, and thus Δ = 0. The observed data consist of (K, Δ, Z). Suppose that, given Z, T follows a proportional hazards model

λ(t | Z) = λ_0(t) e^{Zβ},

where λ(t | Z) is the hazard function of T given Z, λ_0(t) is the baseline hazard function, and β is the log hazard ratio, which is the parameter of interest. We will derive the sample size formula for testing H_0 : β = β_0 versus H_1 : β ≠ β_0, based on the asymptotic distribution of the maximum likelihood estimator (MLE) of β, where usually β_0 = 0. Prentice and Gloeckler 3 gave the likelihood function of the observed data conditional on the covariate Z in the case that Δ = 1 holds whenever K < r, ie, T is not right censored before time a_{r−1} (see display (2) therein). However, in practice, this may not hold because of early dropout and the finite study period. Thus, we extend their setup by allowing T to be right censored at one of a_1, …, a_{r−2}. Let C be the censoring time with survival function S_C^Z(t) conditional on Z. Here, we allow different censoring distributions in the two arms. Suppose that C is independent of T given Z. Denote

α_j = exp( −∫_{a_{j−1}}^{a_j} λ_0(t) dt ), for 1 ≤ j ≤ r − 1.
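Before writing down the likelihood, it may help to see the data structure and the quantities α_j in a small simulation. The following Python sketch is purely illustrative: the constant baseline hazard, the exponential dropout distribution, the visit times, and all numerical values are assumptions made here for illustration and are not taken from this article.

```python
import numpy as np

# Illustrative sketch only: the visit times, the constant baseline hazard,
# the hazard ratio, the exponential dropout rate, and the randomization
# probability below are assumptions, not values from this article.
rng = np.random.default_rng(1)

a = np.array([6.0, 12.0, 18.0, 24.0, 30.0, 36.0])  # visit times a_1, ..., a_{r-1}
lam0 = 0.02          # assumed constant baseline hazard lambda_0(t)
beta = np.log(0.7)   # log hazard ratio
lam_c = 0.01         # assumed exponential dropout (censoring) rate in both arms
p1 = 0.5             # randomization probability P(Z = 1)

# alpha_j = exp(-integral of lambda_0 over [a_{j-1}, a_j)), j = 1, ..., r-1
widths = np.diff(np.concatenate(([0.0], a)))
alpha = np.exp(-lam0 * widths)
# Baseline survival at the visit times: S_0(a_k) = prod_{j <= k} alpha_j
S0_at_visits = np.cumprod(alpha)

def simulate_grouped(n):
    """Simulate the observed grouped data (K, Delta, Z) for n subjects."""
    Z = rng.binomial(1, p1, size=n)                       # arm indicator
    T = rng.exponential(1.0 / (lam0 * np.exp(beta * Z)))  # latent event time
    C = rng.exponential(1.0 / lam_c, size=n)              # latent dropout time
    K = np.empty(n, dtype=int)
    Delta = np.empty(n, dtype=int)
    for i in range(n):
        m = int(np.searchsorted(a, T[i]))  # number of visits a_j with a_j < T_i
        if m < len(a) and C[i] > a[m]:
            # event first detected at visit a_{m+1}: K = m + 1, Delta = 1
            K[i], Delta[i] = m + 1, 1
        else:
            # right censored at the last event-free visit attended (a_0 = 0 if
            # none was attended); K - 1 = number of visits before min(T_i, C_i)
            last = int(np.searchsorted(a, min(T[i], C[i])))
            K[i], Delta[i] = last + 1, 0
    return K, Delta, Z

K, Delta, Z = simulate_grouped(2000)
print("alpha_j:", np.round(alpha, 3))
print("S_0(a_k):", np.round(S0_at_visits, 3))
print("events (Delta = 1) by interval:",
      np.bincount(K[Delta == 1], minlength=len(a) + 1)[1:])
```

In this sketch, Δ = 1 with K = k means the event is first detected at visit a_k (which requires C > a_k), whereas Δ = 0 with K = k means the subject is right censored at a_{k−1}, the last event-free visit attended.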

Then, the likelihood function of the observed data conditional on Z is given by

P(Δ = 1, K = k | Z) = P(a_{k−1} < T ≤ a_k | Z) P(C > a_k | Z)
                    = (1 − α_k^{e^{Zβ}}) ( ∏_{j=1}^{k−1} α_j^{e^{Zβ}} ) S_C^Z(a_k), for 1 ≤ k ≤ r − 1,