History of Interim Monitoring
The impetus for interim monitoring of clinical trials evolved principally
from some of the early multi-center clinical trials conducted in the 1960's by
the National Institutes of Health in the US and the Medical Research Council in
the UK. The University Group Diabetes Program (UGDP), one of the early
NIH-sponsored clinical trials, was one of the first clinical trials to employ an
independent data and safety monitoring board or DSMB. This was also one of the
first clinical trials in which the DSMB recommended termination of one of the
arms of the study prematurely based on interim results. Other major clinical
trials which followed, such as the Coronary Drug Project, also employed an
independent data monitoring committee.
These and other trials stimulated the development of statistical methods
for the sequential interim monitoring of emerging results. From the
beginning, it was well known that the problem of repeated significance tests
would distort the operating characteristics of basic statistical tests and
confidence intervals. Various statistical approaches have been developed over
the years to address this problem. Among the earliest was the work by Cornfield
using a Bayesian approach, followed by the RST plans of Armitage and colleagues,
which evolved into the now common group sequential procedures of Pocock,
O'Brien-Fleming, and Slud-Wei, among others, and then the more general spending
functions of Lan and DeMets and the stochastic curtailment procedures of Lan,
Simon and Halperin. These methods have now been applied to numerous statistical
procedures. We are now at the stage where it is possible to implement a
statistical monitoring procedure for virtually any type of statistical analysis
one might envision for a clinical trial.
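
The distortion caused by repeated significance tests can be seen in a small
simulation. The following is a minimal sketch in Python, not part of the
original talk: it assumes equally spaced looks, a normally distributed test
statistic, and an unadjusted two-sided critical value of 1.96 at every look.
The function name and its parameters are hypothetical.

```python
import math
import random


def repeated_test_error(n_looks, n_sims=20000, z_crit=1.96, seed=1):
    """Estimate the chance of at least one nominally significant result
    under the null hypothesis when the accumulating data are tested,
    without adjustment, at each of n_looks equally spaced looks."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0  # running sum of independent standard normal increments
        for k in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)
            z = total / math.sqrt(k)  # z statistic on all data so far
            if abs(z) >= z_crit:
                rejections += 1
                break  # the trial would have stopped at this look
    return rejections / n_sims


if __name__ == "__main__":
    for looks in (1, 2, 5, 10):
        print(looks, repeated_test_error(looks))
```

With a single look the estimate should sit near the nominal 0.05. With more
looks it rises well above that level, which is the problem the group
sequential and spending function methods were developed to control.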
In 1978 the NIH issued guidelines requiring that all NIH-funded clinical
research employ a procedure for safety monitoring. In 1988 the FDA
issued guidelines which addressed issues related to the statistical analysis of
clinical trials, including interim analyses. Largely in response to the FDA
guidelines, the PMA issued guidelines for the implementation of interim analyses
in industry-sponsored trials. More recently, the 1997 International Conference
on Harmonization (ICH) issued its Draft Guidelines on Statistical Principles for
Clinical Trials which included recommendations on the implementation of interim
monitoring in clinical trials.
In this presentation I would like to contrast the objectives of interim
monitoring in these settings and to offer some recommendations as to when it is
appropriate to consider interim monitoring in a pharmaceutical
industry-sponsored trial, and how it might best be implemented.
Interim Monitoring Objectives in Public and Industry-Sponsored Trials
The basic objectives of interim monitoring focus on 3 principal issues:
1. to protect the safety of the patients enrolled in the trial;
2. to terminate the trial as early as possible so that the best treatment may
then be made available to all subjects. This is sometimes called the ethical
argument for interim monitoring. And of course:
3. to reduce the cost of a study by terminating that study early if there is
overwhelming evidence that the treatment is effective or that it is ineffective.
Interim monitoring has also been used to select one among many doses or
different drugs, or to re-evaluate the sample size. In my presentation,
however, I am going to address only the 3 main objectives.
An NIH-sponsored trial is very different from an industry-sponsored trial. For
an NIH-sponsored trial there is only one audience: the scientific and the
clinical community. Such trials involve studies of non-pharmacologic
interventions such as intensive treatment in diabetes, new surgical procedures
such as laser treatment for diabetic eye disease, new uses of established agents
such as ACE inhibition in diabetic kidney disease, the evaluation of competing
agents such as various anti-arrhythmic drugs in the CAST trial, orphan drugs
such as chenodiol for the dissolution of gallstones, and occasionally studies of
novel new agents. In most NIH studies, the mechanism is to publish the results
in a major medical journal and then implement treatments in eligible patients.
The industry model, however, is quite different. The audience consists
foremost of the FDA and then the clinical community. The trials are designed to
evaluate new pharmacologic agents or new devices, or new indications for
established agents or devices. The mechanism requires that the results of the
studies undergo FDA review and approval, in addition at times to publication of
the results, prior to the treatment of appropriate patients. In this context,
the role of interim monitoring in meeting the three principal objectives is
different under these two models.
Both the NIH and industry models place a premium on patient safety. This is
largely not a statistical issue. However, the mechanisms used to monitor
adverse effects differ. In NIH trials, this responsibility is vested completely
with the independent DSMB. In industry-sponsored trials, there are rigorous
safeguards for patients due to the extensive laboratory screening and due to the
immediate reporting of a wide range of potential adverse effects directly to the
clinical monitor, who has access to the treatment code.
In my opinion, the ethical objective of offering the best treatment to all
patients as soon as possible applies to the NIH model but does not necessarily
apply to the industry model. If an industry-sponsored trial is stopped early
due to overwhelming signs of effectiveness in the opinion of the DSMB, the drug
still may not be made generally available outside of that clinical trial,
especially to the general public, until after the FDA has had the opportunity to
review and approve the complete New Drug Application. In fact, early
termination in this case may backfire and lead to a delay of FDA approval if the
FDA has concerns over the process of interim monitoring itself, or later
concludes that the evidence provided by the trial is found lacking in some
respect.
Similar considerations also apply to cost savings. In a government or
industry-sponsored trial, there would be substantial savings in total costs if
the study is stopped prematurely due to adverse events or if it is stopped early
due to lack of effectiveness. However, in an industry-sponsored trial the
potential for cost savings is questionable if a study is stopped early due to
interim signs of effectiveness. There will be no cost savings if early
termination in fact eventually delays FDA approval of the new drug.
Mechanism for Interim Monitoring of Public and Industry Sponsored Trials
As stated previously, the mechanisms for interim monitoring largely differ
between NIH and industry sponsored trials. These differences also involve other
features of trial management. In the NIH trial, all trial data are promptly
entered into the data management system as they are collected in the clinic, often
daily but no less than weekly. All data are immediately edited for errors using
computerized editing procedures. NIH trials generally do not employ CRAs to
harvest the data. The principal advantage of this process is that all of the
data are immediately available for analysis and review. Periodically, every 6
to 12 months, the DSMB reviews analyses of all outcome data to assess the
differences between groups in overall effectiveness and adverse effects, so as
to derive a judgment of the overall benefit/risk ratio. These analyses
generally present risk ratios based on both numerators and denominators for each
outcome. This approach relies heavily on an empirical evaluation of all outcome
data, which requires statistical adjustments for the multiple sequential
analyses of the data. Many of these procedures provide a "stopping
boundary," so called because termination of the trial may be justified when
the boundary is crossed, which indicates that statistical significance has been
achieved. However, there is some flexibility in acting on these boundaries.
Basically, statistical procedures for computing stopping rules or
boundaries have the same purpose as any statistical procedure: they allow one
to assess the strength of evidence in the data. The multiple analyses are then
used by the committee to form an overall opinion or Gestalt regarding the
overall benefit to risk ratio.
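
One common way to obtain such boundaries is through an alpha spending
function of the Lan-DeMets type, which fixes how much of the total type I
error may be used by each look. The following Python sketch is an
illustration only and is not drawn from any of the trials discussed here. It
evaluates the two commonly cited spending functions, the O'Brien-Fleming type
and the Pocock type, at assumed information times. The function names and the
choice of four equally spaced looks are hypothetical.

```python
import math
from statistics import NormalDist

_norm = NormalDist()


def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending function:
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)), for 0 < t <= 1."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information time must be in (0, 1]")
    z = _norm.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _norm.cdf(z / math.sqrt(t)))


def pocock_spending(t, alpha=0.05):
    """Pocock-type spending function:
    alpha(t) = alpha * ln(1 + (e - 1) * t), for 0 < t <= 1."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information time must be in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def alpha_increments(spend, info_times, alpha=0.05):
    """Type I error newly spent at each look for the given information times."""
    spent = [spend(t, alpha) for t in info_times]
    return [spent[0]] + [b - a for a, b in zip(spent, spent[1:])]


if __name__ == "__main__":
    looks = [0.25, 0.5, 0.75, 1.0]  # hypothetical information times
    print(alpha_increments(obf_spending, looks))
    print(alpha_increments(pocock_spending, looks))
```

The O'Brien-Fleming-type function spends very little alpha at early looks,
while the Pocock-type function spends it more evenly. Converting the spent
alpha into the actual critical values at each look requires numerical
integration over the joint distribution of the sequential statistics, which
this sketch does not attempt.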
Based on these comprehensive analyses, the DSMB may recommend termination
of the trial. In general, some of the considerations which enter into that
decision are whether the accumulated data provide compelling, conclusive results
with respect to the principal outcomes in an intention-to-treat analysis. Here
this is not purely a matter of statistical significance for any one outcome.
Rather, it is a matter of whether all of the objectives of the trial have been
met, or alternately whether no further gains are expected to accrue if the trial
were to continue to its originally designed conclusion. The first trial which
was modified on the recommendation of the DSMB was the University Group Diabetes
Program. Here the DSMB met to review all accumulating data, but there were no
formal stopping rules yet developed. The DSMB recommended termination of the
tolbutamide arm of the trial when an excess of deaths was observed compared to
the placebo group. As in all trials, mortality was monitored, but there was no
advance concern that any of the treatments under study would affect mortality.
When the results were nominally significant, without adjustment for repeated
sequential analyses, the DSMB recommended termination. An outcry followed
because many did not believe the result. Some have suggested that the
tolbutamide group was in fact stopped too early. Another major NIH trial was
the Beta-Blocker Heart Attack Trial which was terminated when a statistically
significant reduction in mortality was manifest. This trial is of academic
interest because it is through the process of developing monitoring boundaries
for this trial that the idea of the alpha spending function emerged. Because
there was a single dominant outcome, the considerations in stopping were simple.
This is not always the case. One of the most complex trials in my experience
was the Diabetes Control and Complications Trial (DCCT). This trial had one
primary outcome but numerous secondary outcomes, some in fact being more serious
clinical outcomes than the primary outcome. Thus, the trial was terminated over
a year after the primary outcome analysis had crossed the boundary
in order to accrue additional evidence that treatment had an impact on all of
the complications of diabetes. In all of these instances, the mechanism was to
publish, then treat. One of the few industry-sponsored clinical trials to have
been terminated early was the study of AZT for AIDS in pregnancy. This actually
was a co-sponsored trial by the NIH and industry, but it was designed and
monitored from the perspective of the regulatory requirements for an NDA.
All of this is substantially different from the traditional interim monitoring
procedure for safety in industry-sponsored trials centered about the role of a
clinical monitor. First, the data are often not entered in a timely manner.
The data forms are periodically harvested by a CRA, usually every few months,
and later the forms often are entered into a data base management system, in
some cases towards the end of the study. The safety net is the clinical monitor
appointed by the sponsor who monitors the adverse event reports submitted by the
study investigators. However, there is no ongoing statistical analysis of the
relative risks of adverse events between groups, in the sense of a computation
of a risk ratio or a p-value. Also, there is no ongoing analysis of the
benefits of therapy to allow the assessment of a benefit to risk ratio.
In part due to the successes of interim monitoring of NIH sponsored trials,
and the proliferation of statistical methods for interim monitoring, the use of
a DSMB or independent data monitoring committee (IDMC) has become more common in
industry sponsored trials. To do so requires some changes in the way the data are
collected and managed by the sponsor. In order to conduct accurate interim
analyses for the DSMB, the data must be collected, entered into the data base
management system, and edited for errors in a timely manner. It is critical
that a clean data base be locked or closed out as of a fixed date
prior to the preparation of analyses for the DSMB. However, in an
industry-sponsored trial, the DSMB has a more limited role and less flexibility
in the way it approaches its task. In an industry-sponsored trial, there must
be precise assessment of the type I error probability for each analysis which
may later be pivotal in the sponsor's application for marketing. This
requires that the operations of the DSMB be pre-specified as much as possible
with respect to the outcomes to which formal sequential boundaries are to be
applied and the statistical methods to be employed. Among these, the principal
decisions with regulatory implications concern the choice of the primary outcome
and the choice of the primary analysis strategy, both for the final analysis and
also for the interim analyses.
Some of these issues are illustrated by the recent clinical trial comparing
zidovudine (AZT) vs. placebo, administered to a pregnant woman with HIV and her
infant, in order to prevent the transmission of HIV to the infant. The trial
was designed to assess the effect of AZT on the rate of transmission of HIV to
the infant up to one year after birth. The usual approach to the analysis of
such a study might be an analysis of cumulative incidence. However, the
investigators wisely reasoned that since the study was designed to capture
information up to one year in the majority of patients, then the monitoring
procedure should be based upon a landmark analysis of the cumulative incidence
at one year. In this case, it would be undesirable to monitor the results using
the log rank test for the hazard function of the events. In fact, in a 1984
paper, Fleming, Green and Harrington (Controlled Clinical Trials) pointed out
that the log rank test or any rank test may in fact be misleading in the context
of interim monitoring. In this case it would be more appropriate to base the
monitoring procedure on a Z-test for proportions at a fixed point in time, such
as the overall proportion of events observed by one year in the cohort of
patients so followed, or estimated from the survival curves at one year.
Although the results were described by a Kaplan-Meier cumulative incidence
curve, the test statistic was based on the difference between the 72-week
cumulative incidences, estimated from this curve, using the Greenwood-estimated
variances. The 72-week cumulative incidence was 8.3% in the AZT group vs. 25.5%
in the placebo group, with z=4.03, greater than that specified by the
monitoring boundary. At the FDA review, which I attended as a guest of the
agency, this analysis was considered compelling. Had the study been monitored
by a sequential logrank test, the results might have "crossed the boundary"
sooner, without yielding a compelling difference at 72 weeks.
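
A landmark test of this kind can be sketched in a few lines of Python. The
code below is only an illustration under stated assumptions, not the analysis
program used in the AZT trial: it computes the Kaplan-Meier estimate at a
fixed landmark time, its Greenwood variance, and the z statistic for the
difference in cumulative incidence between two groups. The function names and
the input layout (parallel lists of follow-up times and event indicators) are
hypothetical.

```python
import math


def km_landmark(times, events, landmark):
    """Kaplan-Meier survival at `landmark` and its Greenwood variance.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    gw_sum = 0.0  # Greenwood sum of d / (n * (n - d)) over event times
    i = 0
    while i < len(data) and data[i][0] <= landmark:
        t = data[i][0]
        d = 0  # events at time t
        c = 0  # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            if n_at_risk > d:
                gw_sum += d / (n_at_risk * (n_at_risk - d))
        n_at_risk -= c
    return surv, surv * surv * gw_sum


def landmark_z(treated, control, landmark):
    """z statistic for cumulative incidence (treated minus control) at the
    landmark. Each group is a (times, events) pair."""
    s_t, v_t = km_landmark(treated[0], treated[1], landmark)
    s_c, v_c = km_landmark(control[0], control[1], landmark)
    # Cumulative incidence is 1 - survival, so the variances carry over.
    diff = (1.0 - s_t) - (1.0 - s_c)
    return diff / math.sqrt(v_t + v_c)
```

As a check on the reported figures, a difference of 25.5% - 8.3% = 17.2
percentage points with z=4.03 implies a standard error for the difference of
roughly 4.3 percentage points.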
General Issues in Interim Monitoring
With this background, the following are some general issues which should be
considered in establishing a DSMB for a Phase III clinical trial. The most
important is to emphasize that the judgment to terminate a trial should not be
based only on achieving a given p-value, but rather requires careful
consideration of a variety of issues, some clinical, some statistical, and some
regulatory.
The first issue concerns the nature of the outcome assessments. In all
clinical trials, whether under the NIH or the industry model, a major
consideration is the primacy of a single outcome versus the importance of other
outcomes. Perhaps the only instance in which there is no ambiguity is where the
principal outcome is all cause mortality. In almost any other situation, the
importance of various study outcomes can be debated. If there is a difference
of opinion as to the importance of the outcome which leads to early termination
of a trial, then the impact of that trial may be jeopardized.
An equally important consideration is whether the safety or toxicity
profile of the therapy has been adequately assessed. Phase III trials are
conducted to evaluate the potential adverse effects of an agent, and to do so
usually requires many more patient years of exposure to the agent than is
required to establish clinical effectiveness. Therefore, any decision to
terminate a trial due to a demonstration of effectiveness must be considered in
terms of the adequacy of the safety profile so far established from this and
other trials. The central issue is whether the benefit:risk ratio will still be
adequately assessed if a trial is stopped prematurely.
The DSMB must consider that data collection is a dynamic process, and that
yet-to-be observed features of the data may impact on the ultimate credibility
of the study. Among the most important considerations is the impact of
potential losses to follow-up at the time of an interim monitoring. One never
has complete ascertainment of all events at any interim look due to the built in
lags of data reporting and collection. This should be a major consideration in
any decision to terminate a study prematurely.
The DSMB must also assess the totality of the evidence from the trial. To
the extent possible, the DSMB and the statistical center serving the DSMB,
should conduct the same panels of analyses which would be employed in the final
marketing application to describe the overall consistency of the trial results.
These would include the consistency of the treatment effect among the various
secondary or related outcome measures related to the natural history of the
disease or to the mechanism of action of the agent. These would also include
analyses of the consistency of the overall treatment effect across clinic
populations and across subgroups of the population. The bottom line is that
the DSMB must consider whether early termination would affect the overall
precision with which the trial's results address the demonstration of
effectiveness and of safety, and the overall credibility of the trial in the
regulatory review process.
In fact, the ICH guidelines state that "Most clinical trials intended to
support the efficacy and safety of an investigational product should proceed to
full completion of planned sample size accrual; trials should be stopped early
only for ethical reasons or if power is no longer acceptable."
General Issues in FDA Review
Although the FDA and the ICH are open to the use of a DSMB in the monitoring of
a clinical trial, from my experience there are a variety of issues that may be
raised in the FDA review of a clinical trial in which interim monitoring was
performed, whether or not it was terminated early. Some of these are addressed
in the FDA, PMA and ICH guidelines, but many are not.
The first issue is the concern that the process of interim monitoring may
appear to introduce bias into the study results. Here we must draw the
distinction between clinical monitoring on the one hand versus data monitoring
of the aggregate group data. By clinical monitoring I refer to monitoring the
overall implementation of the protocol and the recording and classification of
study events. If individuals involved in data monitoring are also involved in
clinical monitoring, then it is possible that the entire study may be "tainted"
due to concerns that the process of study management may introduce a bias in the
study results. For this reason, there may be concerns raised if the sponsor
plays any role in the interim monitoring process.
Another issue is the duration of exposure for the assessment of the safety
or potential toxicity of an agent. If a trial is terminated early, then the
duration of exposure in individual patients and the overall patient years of
exposure in the cohort will be reduced. Both are important considerations,
especially in indications where prolonged use of the agent is anticipated.
Another issue is whether or not an adequate number of outcome events have
been observed, apart from considerations of p values or levels of significance.
If the indication is a discrete event, such as total mortality, there may be
concerns about approving an NDA in which fewer than 100 patients reach that
outcome during the two or more pivotal trials, irrespective of the observed
relative risks and p values.
A related consideration is the time course of the effects of the agent.
Again to use mortality as an example, if the median survival time is one year,
then it is reasonable to desire data showing the effects of the drug up to and
beyond one year of treatment. If a study is terminated early based on the first
six months of observation, then there will be few patients exposed for a year,
thus jeopardizing the overall NDA. This issue must be addressed in the
planning of the trial since the statistical approach to monitoring the trial
will differ for a trial aimed at establishing a short term effect, for which a
logrank test of significance may be appropriate, versus a trial aimed at
establishing long-term effects, for which a proportions test at a specific point
in time is more appropriate, as was the case in the AZT in pregnancy study
mentioned earlier. Finally, all plans, procedures, analyses, meetings, data
reviews and recommendations must be completely documented if the trial is to be
credible to the regulatory agencies.
Recommendations
My overall recommendation is that interim monitoring should not be considered
as routine practice in industry-sponsored trials. I feel that interim
monitoring should principally be considered in instances where:
1. there are pre-existing safety concerns for which a data and safety
monitoring board may provide an added measure of safety beyond that provided by
the usual clinical monitor;
2. early termination for effectiveness would be so clear as to not
jeopardize in any way the FDA review and approval of the resulting NDA; which in
turn requires that
3. a single, dominant, unambiguous outcome measure is employed. In
addition, due to potential concerns over the impact of the overall study
management on the biases in the outcome results, the sponsor should not
participate in the reviews by the DSMB in any way.
The DSMB should be completely external to the sponsor. The ICH guidelines
state that "when there are sponsor representatives on the IDMC, their role
should be clearly defined in the operating procedures of the committee... (and)
the procedures should also address the control of dissemination of interim trial
results within the sponsor organization." If any employee of the sponsor
participates in the review of the emerging data, it is difficult to provide
complete assurance that the information was in fact not disseminated or that it
had no effect on the conduct of the trial.
For this reason, I also recommend that the statistician member of the DSMB
should not be associated with the study, and the operational statistician
responsible for the conduct of the study and the final analyses of the study
results should not attend the DSMB meetings. In order to maintain complete
masking of the sponsor, it is preferable that the analyses for presentation to
the DSMB should be conducted by an independent statistician outside of the
company, possibly the statistician member of the DSMB. In this case, the data
are provided to an external independent statistician who also has access to the
unmasked treatment code and who then conducts the interim analyses for
presentation only to the external DSMB in a closed meeting.
Finally, as stressed in the various guidelines, all criteria for early
termination should be explicitly described in the protocol including the number
of planned looks, the approximate information times, the outcomes to be
monitored and the statistical techniques to be employed. The sponsor should
also request that the DSMB maintain complete documentation of interim looks with
archival of the interim database, the interim analyses, and the deliberations of
the committee. With these recommendations, I believe it is possible to
implement interim monitoring in appropriate industry-sponsored trials in a
manner which will meet the trial objectives and also minimize the potential
adverse effects that interim monitoring itself may have on the review of the
study and its results.
DIA298tx.htm DIA meeting 2/98 in California