In numerous studies, patient satisfaction with inpatient psychiatric care is considered an important quality indicator of mental health services. However, besides financial and organizational constraints, classical single-stage studies of patient satisfaction share a methodological and statistical drawback: they cannot identify promising or harmful treatment effects early.
Using the example of a patient satisfaction survey at two psychiatric hospitals in Germany, we apply an interim analysis to the observed data (e.g. after Stage 1 in a two-stage design) to determine whether the survey should be stopped prior to the scheduled end of the study. Additionally, we examine whether the two psychiatric hospitals differ in patient satisfaction.
From the wide range of interim analysis methods, we apply an adaptive group sequential design, which requires more careful planning than a single-stage design.
The project is designed as a prospective cohort study. Adaptive group sequential methods require that all necessary details be specified a priori. Hence, we choose the approach by Wang and Tsiatis (1987) with K = 2 stages and power parameter Δ = .25. To introduce adaptivity, we select the inverse normal combination test by Lehmacher and Wassmer (1999). This yields the first-stage boundary for early stopping for efficacy, α1 = .00768, and the second-stage rejection boundary, c = .0208, both based on the overall α = .025 with one-sided testing (which is numerically equivalent to two-sided testing at α = .05). Additionally, we set the early stopping boundary for futility to α0 = 1, that is, no early stopping for futility. Finally, we compute the critical boundary of the conditional error function, A(α1) = 0.32, and set it as the lower limit that the conditional power must reach before the sample size is increased.
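The boundary value A(α1) = 0.32 can be checked with a few lines of Python. The following is a minimal sketch, not the authors' R code: it assumes equal pre-specified weights w1 = w2 = 1/√2 for the inverse normal combination test (consistent with equal planned stage sizes), for which the conditional error function in the continuation region is A(p1) = 1 − Φ(√2·u2 − Φ⁻¹(1 − p1)), with u2 = Φ⁻¹(1 − c). The function name `conditional_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

ALPHA_1 = 0.00768  # first-stage efficacy boundary (from the text)
C_LEVEL = 0.0208   # second-stage rejection level (from the text)


def conditional_error(p1, c=C_LEVEL):
    """Conditional error A(p1) of the two-stage inverse normal test.

    Assumes equal weights w1 = w2 = 1/sqrt(2); valid for 0 < p1 < 1.
    """
    u2 = _STD.inv_cdf(1 - c)    # final critical value on the z scale
    z1 = _STD.inv_cdf(1 - p1)   # first-stage z statistic
    # The test rejects at the end if (z1 + z2) / sqrt(2) >= u2,
    # i.e. if z2 >= sqrt(2) * u2 - z1.
    return 1 - _STD.cdf(sqrt(2) * u2 - z1)


# Critical boundary of the conditional error function at p1 = alpha_1
a_alpha1 = conditional_error(ALPHA_1)

# Wang-Tsiatis shape check for K = 2 and Delta = 0.25:
# u1 / u2 should be close to (1/2) ** (0.25 - 0.5) = 2 ** 0.25
u1 = _STD.inv_cdf(1 - ALPHA_1)
u2 = _STD.inv_cdf(1 - C_LEVEL)
boundary_ratio = u1 / u2

print(round(a_alpha1, 2), round(boundary_ratio, 3))
```

Evaluating the function at p1 = α1 should reproduce a value close to the reported 0.32, and the ratio of the two critical values should be close to 2^0.25, the shape implied by Δ = .25.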
The data are sampled at two different psychiatric hospitals. For the application of group sequential methods, a priori sample size estimation for a two-group comparison with α = .05 (two-sided), β = .1 (for 90% power), and a medium effect size of δ = .5 yields an exact total sample size of N = 170.06. This has to be multiplied by the sample size inflation factor IF = 1.034 of the chosen Wang and Tsiatis design, which gives N(IF) = 175.85 and is rounded up to the next integer, N(IF;r.) = 176 ('r.' denotes rounded). This yields nA1 = nB1 = 44 for Stage 1 and nA2 = nB2 = 44 for Stage 2 (A and B denote the two psychiatric hospitals).
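The allocation arithmetic above can be retraced as follows. This is a minimal sketch under stated assumptions: the exact total N = 170.06 and the inflation factor IF = 1.034 are taken from the text as given, and the helper `total_n_normal_approx` (a hypothetical name) uses the standard normal approximation for a two-group comparison with a medium effect size of 0.5, which gives a slightly smaller total than the exact t-based calculation.

```python
from math import ceil
from statistics import NormalDist


def total_n_normal_approx(delta, alpha_two_sided=0.05, power=0.90):
    """Total N for a two-group comparison (normal approximation)."""
    z = NormalDist().inv_cdf
    per_group = 2 * ((z(1 - alpha_two_sided / 2) + z(power)) / delta) ** 2
    return 2 * per_group


# Cross-check only: the normal approximation for a medium effect size
approx_total = total_n_normal_approx(delta=0.5)

# Values taken from the text
n_fixed = 170.06   # exact total N of the single-stage design
inflation = 1.034  # inflation factor of the Wang-Tsiatis design

n_total = ceil(n_fixed * inflation)  # round up to the next integer

stages, groups = 2, 2
n_per_group_per_stage = n_total // (stages * groups)

print(round(approx_total, 1), n_total, n_per_group_per_stage)
```

The inflated total splits evenly into two hospitals and two stages, which is where the 44 patients per hospital and stage come from.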
Statistical analyses were performed with R and IBM® SPSS® Statistics, Version 23 (SPSS for the t-test only).
The statistical prerequisites for parametric analyses (on the aggregated mean score of the questionnaire), such as variance homogeneity and normality, were fulfilled. After data collection of nA1 = nB1 = 44 for Stage 1, the independent two-sample t-test revealed no effect large enough for early stopping for efficacy (t(86) = .81; p = .21; one-sided p-value). The conditional power actually achieved, computed under the current trend, was 10 per cent (cp = 0.1). This is below the a priori critical boundary of the conditional error function, A(α1), which indicates that the sample size should not be increased. In fact, achieving a significant result after Stage 2 would require an unrealistically large adjusted Stage 2 sample size of nA2 = nB2 = 578 for 80 per cent power and nA2 = nB2 = 765 for 90 per cent power.
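The conditional power and the re-estimated Stage 2 sample sizes can be approximated from the reported one-sided p-value alone. The sketch below is not the authors' R code: it assumes the inverse normal test with equal weights, converts p = .21 to a z statistic, and takes the "current trend" to mean that the observed standardized effect persists in Stage 2. The function names are hypothetical, and the normal approximation means the results will only be close to, not identical with, the reported figures.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def conditional_power(p1, c=0.0208, n1=44, n2=44):
    """Conditional power under the current trend (normal approximation).

    p1: one-sided Stage 1 p-value; n1, n2: per-group stage sizes.
    """
    z1 = _STD.inv_cdf(1 - p1)
    u2 = _STD.inv_cdf(1 - c)
    needed = sqrt(2) * u2 - z1      # Stage 2 z value required to reject
    drift = z1 * sqrt(n2 / n1)      # expected Stage 2 z under the trend
    return 1 - _STD.cdf(needed - drift)


def stage2_n_for_power(p1, power, c=0.0208, n1=44):
    """Per-group Stage 2 size needed to reach the target conditional power."""
    z1 = _STD.inv_cdf(1 - p1)
    if z1 <= 0:
        raise ValueError("no positive trend: target power is unreachable")
    u2 = _STD.inv_cdf(1 - c)
    needed = sqrt(2) * u2 - z1
    return ceil(n1 * ((needed + _STD.inv_cdf(power)) / z1) ** 2)


cp = conditional_power(0.21)          # planned Stage 2 size of 44 per group
n80 = stage2_n_for_power(0.21, 0.80)  # Stage 2 size for 80 per cent power
n90 = stage2_n_for_power(0.21, 0.90)  # Stage 2 size for 90 per cent power
print(round(cp, 2), n80, n90)
```

With the reported p = .21, this sketch gives a conditional power of roughly 10 per cent and Stage 2 group sizes in the high hundreds, in line with the values above. Small deviations from the reported 578 and 765 are expected because the sketch works on the z scale rather than with the t distribution.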
Given these results after Stage 1 and the very low conditional power achieved, we decided not to increase the sample size. Moreover, based on the very clear trend and overwhelming evidence of no difference in patient satisfaction between the two psychiatric hospitals, we applied stochastic curtailment and decided not to investigate this endpoint further after Stage 1. The conditional power achieved falls far short of the 90 per cent power targeted in the sample size estimation during the planning phase. A significant result after Stage 2 is therefore highly unlikely.
The greatest practical benefit of this approach is that the 88 patients planned a priori for Stage 2 did not need to be surveyed. In summary, because such designs save, on average over many studies, precious resources such as time and money while raising ethical standards, we conclude that sequential monitoring using adaptive group sequential designs greatly improves mental health services research.
Lehmacher, W., & Wassmer, G. (1999). Adaptive sample size calculations in group sequential trials. Biometrics, 55, 1286-1290.
Wang, S. K., & Tsiatis, A. A. (1987). Approximately optimal one-parameter boundaries for group sequential trials. Biometrics, 43, 193-199.