John M. Lachin
John Wiley and Sons, 2000
ISBN: 0-471-36996-9
Thank you for your interest in this book. Throughout the book I
describe the implementation of the various methods using SAS
procedures, when available. Here I have also included my own SAS
programs and macros referred to in the book that were used to perform
supplemental computations. My programs are "workmanlike":
they get the job done without a lot of bells and whistles.
Thus, the programs may require a little study to see how they work.
Rather than allow the user to specify various options, and to customize
the program to meet the needs of an analysis, my programs do whole sets
of computations simultaneously. For example, the program for the
analysis of multiple 2x2 tables does all analyses for the risk
difference, odds ratio and relative risk scales simultaneously. Other
than for exercises, the user should decide a priori exactly which
elements of the output are to be employed. It is cheating to inspect
all the various computations and then choose the one that provides
the most "favorable" results.
The materials are organized by chapter. The individual elements may
be downloaded directly to your local computer and then accessed using a
basic word processor or by SAS. The entire contents of this site are
available as a single zip file for Windows/Mac users, and as a tar file
for Unix users. (These links will be provided at a later
date.)
One of the main features of the book is the extensive set of problems for
each chapter. However, I am sure that many more could have been
included. I welcome suggestions for additional Problems (or Examples).
Any appropriate suggestions will also be posted on this site (with
attribution) so that others may also benefit. Check the supplement with
additional examples, problems and data sets. I have also included here
some additional programs for computations related to those presented in
the text. (Supplement to be provided at a later date.)
In our graduate program, I teach a one semester course from this text
(Statistics 225) that is required of all of our MS and PhD students in
Biostatistics, and all PhD students in Epidemiology. All students are
required to have completed a two semester course in mathematical
statistics at the level of Hogg and Craig. See the syllabus for my one
semester course. (Syllabus to be provided at a later date.)
The following are the reference materials for each chapter,
principally SAS programs and macros. This link contains the data sets employed for the analyses presented in the
book. These are "flat files" which can be downloaded and
accessed directly.
Chapter 1
This introductory chapter describes the natural history of diabetic
nephropathy and the results of the
Diabetes Control and
Complications Trial. This link provides, among other information,
the complete DCCT bibliography, with links to the abstract of each
paper, and in some cases, to the complete manuscript. The following are
the links for the specific papers referred to in this book.
- The DCCT Research Group. The Diabetes Control and Complications
Trial (DCCT): Update. Diabetes Care, 13:427-433, 1990.
- The Diabetes Control and Complications Trial Research Group. The
effect of intensive treatment of diabetes on the development and
progression of long-term complications in insulin-dependent diabetes
mellitus. The New England Journal of Medicine, 329:977-986,
1993.
- The Diabetes Control and Complications (DCCT) Research Group.
Effect of intensive therapy on the development and progression of
diabetic nephropathy in the Diabetes Control and Complications Trial.
Kidney International, 47:1703-1720, 1995.
- The Diabetes Control and Complications Trial Research Group. The
relationship of glycemic exposure (HbA1c) to the risk of
development and progression of retinopathy in the Diabetes Control and
Complications Trial. Diabetes, 44:968-983, 1995.
- The Diabetes Control and Complications Trial Research Group. Adverse
events and their association with treatment regimens in the Diabetes Control
and Complications Trial. Diabetes Care, 18:1415-1427, 1995.
- The Diabetes Control and Complications Trial Research Group. The
absence of a glycemic threshold for the development of long-term
complications: The perspective of the Diabetes Control and Complications
Trial. Diabetes, 45:1289-1298, 1996.
- The Diabetes Control and Complications Trial Research Group.
Hypoglycemia in the Diabetes Control and Complications Trial.
Diabetes, 46:271-286, 1997.
- The Diabetes Control and Complications Trial Research Group. The
effect of pregnancy on microvascular complications in the Diabetes
Control and Complications Trial. Diabetes Care, 23:1084-1091, 2000.
Chapter 2
This chapter describes the basic analysis of proportions and a 2x2
table. The methods in Chapter 4 are generalizations of the methods in
this chapter to allow for a stratified analysis. The book describes
analyses using SAS PROC FREQ. I have also written macros to perform
many of these analyses. See the descriptions under Chapter 4. The
following programs are provided:
exact2x2.sas:
This program takes the exact confidence limits on the odds ratio from
StatXact (or SAS) and then computes the corresponding limits on the
probabilities, and the exact corresponding limits on the risk difference
and the relative risk. For illustration, the exact computations in SAS
PROC FREQ are also presented for the odds ratio, as well as the
asymptotic limits for the other scales.
Figure21.sas:
generates figures relating the odds ratio, relative risk and risk
difference when one measure is held constant over a range of values of
p2. A version of this was used to generate Figure 2.1 where the
risk difference was held constant.
hypr2x2.sas:
computes hypergeometric probabilities for all possible 2x2 tables with
fixed margins, as a function of a specified odds ratio.
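For readers who want to see this computation outside SAS, here is a minimal Python sketch of the quantity hypr2x2.sas tabulates (the function name `hypergeometric_2x2` is illustrative, not from the book). With the row totals n1, n2 and the first column total m1 fixed, the table is determined by the count a in cell (1,1). Under odds ratio ψ the conditional (noncentral hypergeometric) probability of a is proportional to C(n1, a) C(n2, m1 - a) ψ^a over the feasible range max(0, m1 - n2) ≤ a ≤ min(n1, m1). With ψ = 1 this is the central hypergeometric distribution used in Fisher's exact test.

```python
from math import comb

# Illustrative sketch; the function name is hypothetical, not from hypr2x2.sas.

def hypergeometric_2x2(n1, n2, m1, odds_ratio=1.0):
    """Return {a: P(a)} for all 2x2 tables with fixed margins.

    n1, n2: row totals (group sizes)
    m1:     first column total (total number of positive responses)
    a:      number of positive responses in group 1
    """
    lo = max(0, m1 - n2)          # smallest feasible value of a
    hi = min(n1, m1)              # largest feasible value of a
    weights = {a: comb(n1, a) * comb(n2, m1 - a) * odds_ratio ** a
               for a in range(lo, hi + 1)}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}
```

For example, `hypergeometric_2x2(2, 2, 2)` gives probabilities 1/6, 4/6 and 1/6 for a = 0, 1, 2.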
par2x2.sas:
computes the population attributable risk and C.I. using Walter's
limits, Leung-Kupper limits, and the logit limits derived in the
text.
tab2x2.sas:
SAS Proc FREQ analysis of basic 2x2 tables.
Table23.sas:
SAS Proc FREQ analysis of basic 2x2 tables.
Chapter 3
This chapter describes the relationships between power and sample
size in general and for the test for proportions. It also describes
Pitman efficiency and the ARE of competing tests. In later chapters,
these principles are used for the determination of sample size and the
derivation of asymptotically efficient tests. Programs for sample size
and power are also presented in later chapters for tests developed in
those chapters. Here programs are grouped according to the topic.
Precision of an estimate:
NCon1Pr.SAS:
Sample size N to provide a desired precision for a confidence interval
for a single probability.
Prec1Prb.SAS:
The precision for a confidence interval for a single probability
evaluated over the range 0.5-1.0 with plots for a given sample size.
NConProb.SAS:
N for a confidence interval for the difference between probabilities for
2 groups.
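The precision calculations above can be sketched in Python under the usual normal approximation, in which the confidence interval half-width e equals z_{1-α/2} times the standard error. This is an assumption about the approach (the SAS programs may differ in detail), and the function names are hypothetical. For one probability π, N = z² π(1 - π)/e². For a difference, with a fraction q1 of the total N in group 1, the unit variance becomes π1(1 - π1)/q1 + π2(1 - π2)/q2.

```python
from math import ceil
from statistics import NormalDist

# Illustrative sketch; function names are hypothetical.

def n_for_probability(pi, half_width, alpha=0.05):
    """N so that the normal-approximation CI for one probability pi
    has the requested half-width (precision)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * pi * (1 - pi) / half_width ** 2)

def n_for_difference(pi1, pi2, half_width, q1=0.5, alpha=0.05):
    """Total N (a fraction q1 in group 1) for a CI on pi1 - pi2."""
    q2 = 1 - q1
    z = NormalDist().inv_cdf(1 - alpha / 2)
    unit_variance = pi1 * (1 - pi1) / q1 + pi2 * (1 - pi2) / q2
    return ceil(z ** 2 * unit_variance / half_width ** 2)

print(n_for_probability(0.5, 0.05))       # 385
print(n_for_difference(0.5, 0.5, 0.05))   # 1537 in total
```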
Test for proportions:
NPwrProb.SAS:
Sample size based on the power function for the test of 2
proportions.
SSNProb.SAS:
Sample size based on power for the test of 2 proportions over a
specified range of control group probabilities and relative risks.
PowrProb.SAS:
Power for the test of 2 proportions over a specified range of control
group probabilities and relative risks.
PPwrPlot.SAS:
Plot the power function for the test for two proportions for different
sample sizes as a function of the risk difference, relative risk or odds
ratio.
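As a rough companion to NPwrProb.SAS and PowrProb.SAS, the following Python sketch uses the familiar large-sample relationship for the two-sided test of two proportions. The null variance is evaluated at the pooled probability φ = q1π1 + q2π2 and the alternative variance at π1, π2, giving √N |π1 - π2| = z_{1-α/2}σ0 + z_{1-β}σ1, where N is the total sample size and q1, q2 are the sample fractions. The function names are hypothetical, and the power function ignores the negligible probability of rejecting in the wrong tail.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Illustrative sketch; function names are hypothetical.
_Z = NormalDist()

def _sigmas(pi1, pi2, q1):
    """Unit standard deviations of pi1_hat - pi2_hat under H0 and H1."""
    q2 = 1 - q1
    phi = q1 * pi1 + q2 * pi2                   # pooled probability under H0
    sigma0 = sqrt(phi * (1 - phi) * (1 / q1 + 1 / q2))
    sigma1 = sqrt(pi1 * (1 - pi1) / q1 + pi2 * (1 - pi2) / q2)
    return sigma0, sigma1

def n_two_proportions(pi1, pi2, alpha=0.05, power=0.90, q1=0.5):
    """Total sample size for a two-sided level-alpha test."""
    sigma0, sigma1 = _sigmas(pi1, pi2, q1)
    z_a, z_b = _Z.inv_cdf(1 - alpha / 2), _Z.inv_cdf(power)
    return ceil(((z_a * sigma0 + z_b * sigma1) / (pi1 - pi2)) ** 2)

def power_two_proportions(n, pi1, pi2, alpha=0.05, q1=0.5):
    """Approximate power for total sample size n."""
    sigma0, sigma1 = _sigmas(pi1, pi2, q1)
    z_a = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf((abs(pi1 - pi2) * sqrt(n) - z_a * sigma0) / sigma1)
```

Because the two functions invert the same equation, the power evaluated at the returned N should be at least the target, and slightly below it at N - 1.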
Test for means:
TpwrPlot.SAS:
Plot the power function for the test for two means for different sample
sizes as a function of the non-central factor. A variation of this was
used to generate Figure 3.2.
Non-centrality parameter for proportions
SSNProbK.sas:
Used to generate Table 3.1 which presents the non-centrality parameter
for the test for two proportions as a function of the probabilities in
each group.
Chapter 4
Fixed Effects Model
Freqanal.sas:
The principal macro %freqanal that is used for the computation of
fixed-effects model estimates and tests in a stratified analysis of 2x2
tables. These include the Mantel-Haenszel estimates and confidence
limits (using the Robins, Breslow and Greenland variance), MVLE's and
confidence limits, Cochran and Mantel-Haenszel tests, Radhakrishna
tests, the Gastwirth MERT, and the Wei-Lachin test of stochastic
ordering. The program then calls SAS PROC FREQ to perform a
Cochran-Mantel-Haenszel analysis. Then the Cochran test of homogeneity
is computed for each scale. This routine should be
"submitted" or compiled before running another job, like
KTab2x2.sas, to invoke the macro for a specific set of tables.
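The core Mantel-Haenszel computation in %freqanal can be sketched in Python for the odds ratio scale only. This is a minimal illustration, not a port of the macro; the function name and the cell layout are assumptions. Each stratum is given as (a, b, c, d), with a, b the events and non-events in group 1 and c, d those in group 2. The estimate is ψ_MH = Σ(a_k d_k/N_k) / Σ(b_k c_k/N_k), and the variance of log ψ_MH uses the Robins, Breslow and Greenland formula.

```python
from math import exp, sqrt

# Illustrative sketch; the function name and (a, b, c, d) layout are hypothetical.

def mantel_haenszel_or(tables, z=1.959963984540054):
    """Return (psi_MH, Var(log psi_MH) by Robins-Breslow-Greenland, 95% CI)."""
    R = S = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        R += r
        S += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    psi = R / S
    var_log = (sum_pr / (2 * R ** 2) + sum_ps_qr / (2 * R * S)
               + sum_qs / (2 * S ** 2))
    half = z * sqrt(var_log)
    return psi, var_log, (psi * exp(-half), psi * exp(half))
```

For a single stratum the Robins-Breslow-Greenland variance reduces to Woolf's 1/a + 1/b + 1/c + 1/d, which is a convenient check.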
Freqmh.sas:
A version of Freqanal that also computes the Hauck estimates of the
variance of the Mantel-Haenszel estimators along with those of Robins,
Breslow and Greenland. This also computes the Breslow-Day test of
homogeneity of odds ratios and the corrected test of Tarone. This does
not compute the MVLE's etc.
homogen.sas:
A PROC IML macro that computes the MVLE and a Wald test based on the
MVLE and its estimated variance for a parameter estimate within
independent strata, and also computes the contrast Wald test of the
hypothesis of homogeneity. Here the parameter estimates could be any
quantity. The estimate and its variance within each stratum are input
using a CARDS statement (in-stream data).
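The computations in homogen.sas can be illustrated with a short Python sketch (the function name is hypothetical). With weights w_k = 1/v_k, the MVLE is the weighted mean θ̂ = Σ w_k θ̂_k / Σ w_k with variance 1/Σ w_k. The homogeneity statistic Q = Σ w_k (θ̂_k - θ̂)² is referred to a chi-square distribution on K - 1 df; for independent strata this Cochran form is algebraically equivalent to the contrast Wald test. Converting the statistics to p-values is left to a chi-square routine.

```python
# Illustrative sketch; the function name is hypothetical.

def mvle_homogeneity(estimates, variances):
    """Return (MVLE, its variance, 1 df Wald chi-square, Q, df for Q)."""
    weights = [1 / v for v in variances]
    sw = sum(weights)
    theta = sum(w * t for w, t in zip(weights, estimates)) / sw
    var_theta = 1 / sw
    wald_chisq = theta ** 2 / var_theta           # test of H0: theta = 0
    q = sum(w * (t - theta) ** 2 for w, t in zip(weights, estimates))
    return theta, var_theta, wald_chisq, q, len(estimates) - 1
```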
Random Effects Model
freqrand.sas:
This is the principal macro %freqrand that computes the random effects
model one-step adjusted estimates for multiple 2x2 tables. The macro
starts with the computation of the variance component estimate for each
scale, followed by the random effects model computations. The macro
%freqanal must be invoked before this macro. See Ktab2x2.sas for an
example.
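A minimal Python sketch of a one-step random-effects computation, assuming the DerSimonian-Laird moment estimator of the variance component (the function name is hypothetical; consult the book for the exact estimator used by %freqrand). The estimator is σ̂²_τ = max[0, (Q - (K - 1)) / (Σw_k - Σw_k²/Σw_k)], where Q is the Cochran homogeneity statistic; each stratum is then reweighted by 1/(v_k + σ̂²_τ). The iterative macro %random instead repeats such updates until convergence.

```python
# Illustrative sketch assuming the DerSimonian-Laird moment estimator;
# the function name is hypothetical.

def dersimonian_laird(estimates, variances):
    """Return (variance component, random-effects estimate, its variance)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    theta_fixed = sum(wi * t for wi, t in zip(w, estimates)) / sw
    q = sum(wi * (t - theta_fixed) ** 2 for wi, t in zip(w, estimates))
    k = len(estimates)
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wi * wi for wi in w) / sw))
    w_star = [1 / (v + tau2) for v in variances]   # random-effects weights
    sws = sum(w_star)
    theta_random = sum(wi * t for wi, t in zip(w_star, estimates)) / sws
    return tau2, theta_random, 1 / sws
```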
randiter.sas:
The macro %random that performs a fixed-point iterative random effects
model analysis of multiple 2x2 tables. The analysis is performed
simultaneously for each of the three scales until convergence on all
scales is reached. This is a stand-alone macro and does not require that
freqanal or freqrand be run beforehand.
KTab2x2.sas:
This job invokes the above macros for the analysis of the data in
Examples 4.1, 4.6 and 4.24 using fixed and random effects models.
tab4-4.sas:
The SAS program that appears in Table 4.4.
Power and Sample Size
CMHTstSN.sas:
computes the sample size for the Cochran-Mantel-Haenszel test for
stratified 2x2 tables and was used for the computations in Example 4.25
and Table 4.11. The program also computes the sample size from the
unconditional marginal parameters, but without an adjustment for
bias.
CMHTstpw.sas:
is the companion program that computes the power of the
Cochran-Mantel-Haenszel test for stratified 2x2 tables for fixed total
sample size.
Chapter 5
Retrospective Studies
parretro.sas:
A macro %retropar to compute the Population Attributable Risk for a
case-control 2x2 table and its confidence limits.
Matched Pairs
The frequency matched data presented in Example 5.2 are included as
part of Ktab2x2.sas used in Chapter 4.
matchtab.sas:
SAS PROC FREQ analysis of matched 2 x 2 tables as described in Table 5.1
and applied to the data from Examples 5.3 and 5.4.
pair2x2.sas:
The macro %paired for the analysis of a matched or paired 2x2 table with
McNemar's test, the conditional odds ratio and its confidence limits,
and the population averaged relative risk and its limits. This is used
for Examples 5.7 and 5.8.
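The quantities %paired reports can be sketched in Python for a paired table with pair counts e (both members positive), f and g (discordant) and h (both negative). This is an illustrative sketch, not the macro; the function name and cell labels are assumptions. McNemar's statistic is (f - g)²/(f + g), and the conditional odds ratio is f/g with Var(log) = 1/f + 1/g. The population-averaged relative risk is (e + f)/(e + g), with delta-method Var(log) = (f + g)/[(e + f)(e + g)].

```python
from math import exp, sqrt

# Illustrative sketch; the function name and cell labels are hypothetical.

def _ci(estimate, var_log, z=1.959963984540054):
    """Confidence limits computed on the log scale and back-transformed."""
    half = z * sqrt(var_log)
    return estimate * exp(-half), estimate * exp(half)

def paired_2x2(e, f, g, h):
    """e=(+,+), f=(+,-), g=(-,+), h=(-,-) pair counts; f, g are discordant.
    h does not enter these statistics but is kept for completeness."""
    mcnemar = (f - g) ** 2 / (f + g)
    cond_or = f / g
    var_log_or = 1 / f + 1 / g
    rr = (e + f) / (e + g)        # pi1 / pi2 with pi1=(e+f)/N, pi2=(e+g)/N
    var_log_rr = (f + g) / ((e + f) * (e + g))
    return {"mcnemar": mcnemar,
            "odds_ratio": cond_or, "or_ci": _ci(cond_or, var_log_or),
            "relative_risk": rr, "var_log_rr": var_log_rr,
            "rr_ci": _ci(rr, var_log_rr)}
```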
Kpair2x2.sas:
The macro %Kpaired for the stratified analysis of K paired or matched
2x2 tables. This is used in Examples 5.12 and 5.13.
KpairTab.sas:
Analysis of the data in Example 5.12 using the macro %Kpaired for the
stratified-adjusted analysis of K matched or paired 2x2 tables.
Breslow-Day Mack et al. Data
These data are employed both in Chapter 5 and again in Chapter 7.
Here we only employ matched pairs consisting of each case and the first
matched control (among 4). There are different versions of the data set
available. That used herein was downloaded from Norman Breslow's web
site in the fall of 1999 and is contained in the data sets directory
cited above.
MackSetP.sas:
generates the SAS data set jml/biostatmethods/datasets/pairs with the
information for each pair. The stratification variable is the
hypertension status at baseline. Run this job on your platform to
generate a saved SAS data set.
MackPair.sas:
generates the marginal summary 2x2 table shown in Example 5.8 and calls
macro %paired, and also generates the tables within strata defined by
the baseline hypertension status as described in Example 5.13. The
program calls %Kpaired but the analysis fails due to a zero frequency
within one stratum.
Power and Sample Size for McNemar's Test
McNemarN.sas:
Sample size calculations for McNemar's test for paired or matched 2x2
tables using the unconditional power function, as illustrated in Example
5.9.
McNemUPw.sas:
Power for McNemar's test for paired or matched 2x2 tables using the
unconditional power function. This calculation is not illustrated in the
book.
McNemCPw.sas:
Power for McNemar's test for paired or matched 2x2 tables using the
conditional power function, as illustrated in Example 5.10.
Additional programs are provided in the supplement that perform all
of the various approaches to the computation of sample size and power
described in Lachin (1992b).
Chapter 6
Newton.sas:
The macro %Newton that provides the iterative solution of a scalar
function using either Newton-Raphson or Fisher scoring. See
Recomb.sas for the use of this macro to solve for the recombination
fraction in the Fisher maize data using either Newton-Raphson or Fisher
scoring. You must also specify: 1) A macro %FOFX(x) which evaluates the
function of x to be solved and returns the value using the variable name
"yofx". FOFX must be such that the value is zero at the
solution. 2) A macro %DFOFX(x) which evaluates the derivative of the
function at x and returns the value using the variable name
"dyofx". For Newton-Raphson this is the Hessian; for Fisher
scoring it is the negative expected information. 3) A macro variable
ERROR giving the desired error (tolerance) of the result; the iteration
terminates once it is reached. 4) A starting value in the variable x0 (not a
macro variable). 5) A macro variable maxitn that fixes the maximum
number of iterations at which the program aborts. Macro variables are
defined using a %let statement, e.g. %let error=0.00000001.
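The logic of %Newton can be sketched in Python, with function arguments standing in for the %FOFX and %DFOFX macros and the parameters `error` and `maxitn` mirroring the macro variables. The Python names are illustrative, and the stopping rule (the size of the update falls below `error`) is an assumption. Each iteration replaces x by x - f(x)/f'(x); passing the negative expected information as the derivative gives Fisher scoring.

```python
# Illustrative sketch of a scalar Newton-Raphson / Fisher scoring solver.
# fofx, dfofx, error and maxitn are hypothetical stand-ins for the macro pieces.

def newton(fofx, dfofx, x0, error=1e-8, maxitn=50):
    """Solve fofx(x) = 0 starting from x0; returns (solution, iterations)."""
    x = x0
    for iteration in range(1, maxitn + 1):
        step = fofx(x) / dfofx(x)
        x -= step
        if abs(step) < error:          # assumed stopping rule: small update
            return x, iteration
    raise RuntimeError(f"no convergence after {maxitn} iterations")


# Usage: MLE of a binomial probability with x = 3 successes in n = 10 trials.
X_OBS, N_OBS = 3, 10

def score(p):
    return X_OBS / p - (N_OBS - X_OBS) / (1 - p)

def hessian(p):                        # derivative of the score
    return -X_OBS / p ** 2 - (N_OBS - X_OBS) / (1 - p) ** 2

def neg_expected_info(p):              # -I(p) for Fisher scoring
    return -N_OBS / (p * (1 - p))

p_nr, _ = newton(score, hessian, 0.5)            # Newton-Raphson
p_fs, _ = newton(score, neg_expected_info, 0.5)  # Fisher scoring
```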
Recomb.sas:
calls the macro %Newton to obtain an iterative estimate of the
recombination fraction for Fisher's maize data presented in Example 6.6.
Newton-Raphson and Fisher scoring are illustrated, each starting from
the moment estimate starting value and also from the null value. This
generates the computations described in Tables 6.1 and 6.2.
Khypr2x2.sas:
computes the MLE of the conditional odds ratio for K 2x2 tables using the Ulcer
clinical trial data as presented in Example 6.7. This program uses a
modification of the macro %Newton that performs the Newton-Raphson
iterative solution for a scalar function. In this version, %FOFX also
evaluates the derivative of the function at x and returns the value
using the variable name "dyofx". A macro DFOFX is not
used.
Chapter 7
Logistic Regression for i.i.d. Observations
CornfIML.sas:
applies the binomial logit model to the Cornfield data as shown in
Example 7.1 using the SAS PROC IML sample library routine LOGIT. The
output was used to generate the iterative solution presented in Table
7.2.
Cornf.sas:
The SAS Program for the Framingham CHD Data presented in Table 7.3. The
output from this program appears in Table 7.4. This also contains the
interaction model described in Example 7.10.
Renal.sas:
The SAS program and data set presented in Table 7.5 for the analysis of
a subset of the DCCT nephropathy data used in Example 7.4. This is used
to generate the output presented in Table 7.6. This also includes the
call of PROC GENMOD presented in Example 7.7 and the output presented in
Table 7.7. Further, it includes the computation of the sums of squares
(through Proc Univariate) described in Example 7.14.
This renal data set is a subset of the DCCT nephropathy data set. The
complete nephropathy data set is available from the National Technical
Information Service (see DCCT link for Chapter 1 above).
DickSton.sas:
Program for the logistic regression analysis of the unmatched
retrospective study of Dick and Stone (1973) presented in Example
7.5.
LogiScor.sas:
computes the score vector, and the estimated expected information under
the null hypothesis and computes the model score test as illustrated in
Example 7.6. First run Renal.sas to set up the renal data set.
LogiRob.sas:
computes the robust information sandwich estimate of the covariance
matrix of the coefficient estimates and robust confidence intervals and
Wald tests for the renal data as shown in Example 7.8. This also
computes the score vector and the information sandwich estimate under
the null hypothesis, and the robust efficient score test. First run
Renal.sas to set up the renal data set.
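The following NumPy sketch shows the computations behind a logistic model fit and the robust covariance described for LogiRob.sas. It runs Newton-Raphson iterations on the score X'(y - π) with information X'WX, W = diag[π(1 - π)], and then forms the information sandwich I⁻¹(Σ U_i U_i')I⁻¹ from the per-subject score contributions U_i. It is an illustrative sketch, not the SAS/IML program; the function name is hypothetical and X is assumed to contain an intercept column.

```python
import numpy as np

# Illustrative sketch; the function name is hypothetical.

def logistic_fit(X, y, tol=1e-10, maxit=50):
    """Return (beta_hat, model-based covariance, robust sandwich covariance)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(maxit):
        pi = 1 / (1 + np.exp(-X @ beta))
        score = X.T @ (y - pi)
        info = X.T @ (X * (pi * (1 - pi))[:, None])   # X' W X
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1 / (1 + np.exp(-X @ beta))
    info = X.T @ (X * (pi * (1 - pi))[:, None])
    inv_info = np.linalg.inv(info)
    u = X * (y - pi)[:, None]              # per-subject score contributions
    robust = inv_info @ (u.T @ u) @ inv_info
    return beta, inv_info, robust
```

With a single binary covariate the slope equals the log odds ratio of the corresponding 2x2 table, and because that model is saturated the sandwich and model-based covariances coincide.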
Sample Size and Power for Logistic Models
LogiBeta.sas:
For a given design matrix and sampling fractions, this routine computes
the expected logistic model parameters (betas) using two approaches.
The first is to use the iterative maximum likelihood solution using a
large set of hypothetical observations. The other is the WLS or GSK
method described by Rochon (1989) based on a one-step WLS computation.
This is used for Example 7.9.
LogiSSN.sas:
computes sample size for logistic regression based on the power of a
Wald test for a given design matrix, sample fractions and set of
coefficients. This is used for the computations described in Example
7.9. In addition to the 1 df test described in the Example, the
determination of sample size for the model Wald test is also
illustrated.
LogiPwr.sas:
computes the power of a Wald test in logistic regression for a given
sample size as described in Example 7.9.
Interactions
The Cornfield analysis in Example 7.10 is contained in Cornf.sas.
UlcrLogi.sas:
fits the interaction model to the ulcer clinical trial data presented in
Example 7.11.
RenalMod.sas:
fits the logistic regression models with interactions for the renal data
as shown in Example 7.12. The job renal.sas must be run to set up the
renal data before running this job.
RenalORs.sas:
performs the computations of the odds ratio for a unit increase in
HbA1c as a function of the level of systolic blood pressure presented in
Figure 7.1 of Example 7.13.
Conditional Logistic Regression Models
HLBwt.sas:
Low birthweight data from Hosmer and Lemeshow (1989) and the SAS program
presented in Table 7.8 that generates the output presented in Table 7.9.
See the documentation in DataSets.
Problems
Prostate.sas:
includes the prostate cancer data set from Collett (1991) presented in
Table 7.11. This data set was obtained (downloaded) from the SAS online
data sets for the SAS book Logistic Regression Examples Using the SAS
System, where it appears on pp. 17-18.
MackLogi.sas:
uses the Mack et al. data described in Breslow and Day (1980) from a
matched case control study. The data set is provided by Norman Breslow
on his web site at the University of Washington Department of
Biostatistics. The data set downloaded from his site in the fall of 1999
is contained in the file /jml/biostatmethods/datasets/bresmack.dat and
the documentation provided by Dr. Breslow is provided in
/jml/biostatmethods/datasets/bresmack.text. This program computes
additional variables used in Problem 7.18.
Chapter 8
Rates and Relative Risks
rate.sas:
Two macros for the computation of rates and relative risks with an
adjustment for over-dispersion. The macro %rate computes event rates and
relative risks, with an adjustment for over-dispersion using the moment
estimator for the dispersion variance. The macro %adjrate conducts a
stratified analysis of relative risks over strata, computing the MVLE
with the adjustment for overdispersion. The latter is not illustrated in
the text. This program must be submitted before calling the macros.
Instructions for use of the macros are provided in this program.
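For orientation only, here is a Python sketch of a crude event rate with a simple over-dispersion adjustment. It assumes a Pearson-type dispersion factor φ = Σ(d_i - λ̂t_i)²/(λ̂t_i)/(N - 1) that inflates the Poisson variance λ̂/Σt_i. This is a common quasi-likelihood device and may not match the moment estimator used in %rate; the function names are hypothetical.

```python
from math import log

# Illustrative sketch; the Pearson-type dispersion factor is an assumption and
# may differ from the estimator in %rate. Function names are hypothetical.

def rate_with_overdispersion(events, exposures):
    """events[i], exposures[i]: event count and follow-up time of subject i.
    Returns (rate, Poisson variance, dispersion factor, adjusted variance)."""
    d, t = sum(events), sum(exposures)
    rate = d / t
    n = len(events)
    # approximately 1 if the counts are Poisson; no floor at 1 is applied
    phi = sum((di - rate * ti) ** 2 / (rate * ti)
              for di, ti in zip(events, exposures)) / (n - 1)
    var_poisson = rate / t
    return rate, var_poisson, phi, phi * var_poisson

def log_relative_risk(group1, group2):
    """Log rate ratio and its adjusted variance (delta method); each group is
    an (events, exposures) pair of lists."""
    r1, _, _, v1 = rate_with_overdispersion(*group1)
    r2, _, _, v2 = rate_with_overdispersion(*group2)
    return log(r1 / r2), v1 / r1 ** 2 + v2 / r2 ** 2
```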
rate0.sas:
The macro %rate0 that computes the variance of the mean rate within each
group under the null hypothesis with an adjustment for over-dispersion
within each group, and the z test of the differences between rates.
Hyporrs.sas:
The analysis of the DCCT rates of hypoglycemia as in Examples 8.1, 8.2
and 8.3 using the macros %rate and %rate0. The programs for these macros
must be submitted before submitting this program, or %include statements
must be added. This program also conducts a stratified-adjusted
analysis, stratifying by adult versus adolescent, using the macro
%adjrate. These results are not shown in text. The analysis uses the
data set dccthypo.dat with the path
/jml/biostatmethods/datasets/hypoglycemia/dccthypo.dat.
Note that the data in Table 8.1 were not used as the basis for the
computations. Rather, the event rate and the years of follow-up for each
subject are computed in the DATA step, so that the computations use the
full precision of each subject's data.
This hypoglycemia data set is a subset of the complete DCCT data set
that is available from the National Technical Information Service (see
DCCT link for Chapter 1 above).
Table81.sas:
generates the data set displayed in Table 8.1 derived from the data set
dccthypo.dat. Note that all computations are based on the latter
data set, not the data summarized in Table81.dat. The latter is not
provided as a data set.
RandRate.sas:
is a supplemental program that uses a fixed point algorithm to compute
convergent estimates of the over-dispersion parameters, the mean rates
and their variances, both under the alternative hypothesis and also
under the null hypothesis. The latter could be used as the basis for a
Z-test of the difference between two groups. These computations are not
described in the text, but follow from those described in Section
4.10.2. The program is applied to the DCCT data used in Example 8.3.
Poisson Regression Models
HypoPois.sas:
fits the Poisson regression models using the SAS program shown in Table
8.2 that generates the output shown in Tables 8.3, 8.4 and 8.5. The
output generated differs slightly from that shown in the tables. This
program uses the class variable for treatment group defined using the
categories "exp" (experimental) versus "std"
(standard), older names for the treatment groups that are labeled as
intensive and conventional respectively. Since "std" has the
higher alphabetical order, this group is used as the reference category
for the model effects shown in Tables 8.4 and 8.5. These group labels
in the tables were later changed to intensive and conventional,
respectively. Thus, I edited the program output and manually changed
"Exp" to "Int" and "Std" to
"Conv".
Note that if the statements in the program are changed to (if
grp=1 then group='Int ';) and (if grp=0 then group='Conv';), then in this
case the "Int" group has the higher order and it, rather than
"Conv", is used as the reference category. In this case, the
sign of the group coefficient is opposite that shown in the tables.
Additional computations in the program compute the log likelihoods
for the null, full and saturated models, deviances for the null and full
model, and additional computations that provide the measures of
association described in Example 8.5.
HypoRob.sas:
fits the over-dispersed quasi-likelihood robust Poisson regression
models described in Example 8.6, and the robust information sandwich
analysis using PROC GENMOD that is described in Example 8.7. Additional
computations using a PROC IML program, as in LogiRob.sas, provide the
robust sandwich estimate plus robust inferences.
Problems
GailCnt.sas:
This reads the data from Gail, Santner and Brown (1980) that are used in
Problem 8.4. In their paper, the actual times start at 60. The data here
start with 60 as time zero. All animals were exposed for 122 days
(182-60).
FrCh.sas:
reads the data from Frome and Checkoway, 1985, that are used in Problem
8.6.
FHCnt.sas:
reads the data file containing the detailed information on the recurrent
infections from Fleming and Harrington (1991) and computes the numbers
of events and exposure time for each individual. Additional computations
can then be performed.
FHCnt.dat:
A data file constructed from the output of the above program. This
can be input or accessed directly using a filename and %include
statements.
FHCntAn.sas:
reads the file fhcnt.dat and defines additional variables that
can be used for analyses of numbers of events using rates and Poisson
models as requested in Problem 8.10.
Chapter 9
Survival Function and Tests of Significance
Survanal.sas is a set of survival
analysis macros that provide the basic computations of survival
probabilities and hazard rates in each of two groups, either for data in
continuous time or using the actuarial method, with generalized
Mantel-Haenszel or G-rho tests for differences between groups.
Instructions for use are given in the program.
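Two of the basic computations in survanal.sas for continuous-time data, the product-limit (Kaplan-Meier) survival estimate and the logrank test, can be sketched in Python as follows. This is a simplified illustration, not a port of the macros: it handles ties by grouping, omits the actuarial and G-rho options, and the function names are hypothetical. At each distinct event time the logrank statistic accumulates observed minus expected events in group 1 and the hypergeometric variance d(n - d)n1n2/[n²(n - 1)].

```python
# Illustrative sketch; function names are hypothetical, not the survanal macros.

def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time.
    events[i] is 1 for an observed event and 0 for a right-censored time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:    # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored   # both leave the risk set after t
    return curve


def logrank_chisq(times, events, groups):
    """1 df logrank (Mantel-Haenszel) chi-square; groups[i] is 1 or 0."""
    data = list(zip(times, events, groups))
    o_minus_e = variance = 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        risk = [g for tt, _, g in data if tt >= t]
        n, n1 = len(risk), sum(risk)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n           # observed minus expected
        if n > 1:
            variance += d * (n - d) * n1 * (n - n1) / (n ** 2 * (n - 1))
    return o_minus_e ** 2 / variance
```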
SWL-anal.sas uses the macro
%survanal to perform survival analyses of the Lagakos squamous cell data
in the complete cohort, and within subgroups defined by performance
status. The analysis within the non-ambulatory subgroup is presented in
Example 9.1 (Tables 9.1-3 and Figure 9.1) and Example 9.4.
DCCTneph.sas. This program uses
the data set nephdata.dat with the macros provided in survanal.sas to
perform the modified Kaplan-Meier lifetable analyses presented in Tables
9.5-5 of Example 9.2 and to compute the tests presented in Example 9.5.
The dataset contains additional variables not mentioned in the book and
could be used for supplemental exercises.
Proportional Hazards Model
SWL-PH.sas conducts the various
proportional hazards regression analyses presented in Example 9.6,
including those with covariate interactions, nested effects, and
stratified analyses.
SWL-R2.sas computes the Schemper
and Kent-O'Quigley R-square measures of explained variation in the
proportional hazards regression analysis of the Lagakos data presented
in Example 9.6.
SWL-IS.sas computes the robust
information sandwich estimate of the covariance matrix of the
coefficient estimates for the proportional hazards regression analysis
of the Lagakos data presented in Example 9.6.
SWL-Gof.sas assesses the PH model
assumptions using tests of interaction with time, plots of the
log(-log(survival)) function within strata, and the Lin (1991) test of
the PH assumption using the SAS macro %gofcox. This performs the
computations described in Example 9.8 and generates the functions
plotted in Figure 9.3.
GofCox.sas is a macro written by
Dr. Oliver Bautista to evaluate the proportional hazards assumption for
each covariate in the model using the method of Lin (1991). Instructions
for use of the macro are provided in the program as comments.
NephA1c.sas fits the PH model for
the effect of the current annual mean HbA1c on the risk of developing
microalbuminuria in the conventional treatment group of the secondary
cohort of the DCCT. This was used to generate the results presented in
Example 9.9.
This program uses the data set dcctneph in the datasets
directory. The analyses presented in the text used the raw SAS DCCT data
file, rather than the "flat" file dcctneph. Thus the
numerical results obtained using these data are slightly different from
those presented in the text.
Note that the data set also contains additional variables not
employed in the text that could be used for supplemental exercises.
Sample Size and Power
exsslogl.sas determines the
sample size to provide a desired level of power for a test of the
equality of the survival distributions for two groups under an
exponential model assuming uniform entry over the period (0,R) and
continued follow-up up to T>R years in all subjects. This program
uses the log hazard ratio as the basis for the test, rather than
the difference in hazards. The power function is presented in equation (2.4)
of Lachin and Foulkes (1986). The program also provides for losses to
follow-up that are also exponentially distributed, using (4.1) of Lachin
and Foulkes. This is used for computations presented in Example 9.10.
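The event probability that drives these calculations can be sketched in Python. Under uniform entry on (0, R), study end at T > R, exponential hazard λ and exponential losses with hazard η, the probability that an event is observed is (λ/(λ + η)){1 - [e^{-(λ+η)(T-R)} - e^{-(λ+η)T}]/[(λ + η)R]}. The sample size function then assumes Var(log λ̂_i) ≈ 1/(N q_i E_i) in both groups. That simplification may not match equation (2.4) of Lachin and Foulkes (1986) term for term, so it should not be expected to reproduce exsslogl.sas exactly. The function names are hypothetical.

```python
from math import ceil, exp, log
from statistics import NormalDist

# Illustrative sketch; function names are hypothetical, and the variance
# approximation is an assumption, not necessarily that of exsslogl.sas.

def prob_event(lam, R, T, eta=0.0):
    """P(event observed): uniform entry on (0, R), study ends at T > R,
    exponential event hazard lam, exponential loss hazard eta."""
    h = lam + eta
    # probability of exiting (event or loss) before the study ends,
    # averaged over follow-up times uniform on (T - R, T)
    exits = 1 - (exp(-h * (T - R)) - exp(-h * T)) / (h * R)
    return lam / h * exits          # share of exits that are events

def n_exponential_loghr(lam1, lam2, R, T, eta=0.0, q1=0.5,
                        alpha=0.05, power=0.90):
    """Total N for a two-sided test of log(lam1 / lam2), assuming
    Var(log lam_i_hat) ~ 1 / (N * q_i * E_i) in each group."""
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    q2 = 1 - q1
    e1, e2 = prob_event(lam1, R, T, eta), prob_event(lam2, R, T, eta)
    unit_variance = 1 / (q1 * e1) + 1 / (q2 * e2)
    return ceil((z_a + z_b) ** 2 * unit_variance / log(lam1 / lam2) ** 2)
```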
expwlogl.sas is the companion
program to exsslogl.sas that determines the level of power
provided by a specified sample size for a test of the equality of the
survival distributions for two groups under an exponential model
assuming uniform entry over the period (0,R) and continued follow-up up
to T>R years in all subjects. This program uses the log hazard ratio
as the basis for the test, rather than the difference in hazards. The power
function is presented in equation (2.4) of Lachin and Foulkes (1986).
The program also provides for losses to follow-up that are also
exponentially distributed, using (4.1) of Lachin and Foulkes. This is
used for computations presented in Example 9.10. Note that the power
values stated in the book for a 40% and 33% reduction are those with no
losses to follow-up. The power with losses is slightly less than that
stated.
The supplemental programs include the program expwplot.sas
that plots the power function over a range of relative risks for a given
sample size, for a test using the log hazard ratio. The programs
exsndiff.sas and expwdiff.sas are comparable to the above
programs except that the power function is based on the test of the
difference in hazards, rather than the test of the log hazard ratio.
These supplemental programs provide power estimates slightly less, and
sample size estimates slightly larger, than those provided by the above
programs. Two additional programs ssnopdok.sas and
pwropdok.sas provide the determination of sample size and power,
respectively, for a stratified analysis of differences in hazards as
described by Lachin and Foulkes (1986).
Recurrent Events
KernelSm.sas contains the macro
smooth that computes the kernel smoothed estimate of a hazard
function or intensity function for a counting process based on possibly
recurrent event time data. The macro uses the Epanechnikov kernel. The
estimate, its variance and the kernel are presented in equations
(9.137)-(9.139). The program allows for computations for two groups of
subjects. The usage is described in the macro; see hypokrnl.sas
as an example. Macro variables specify the band width, the number of
distinct event times and the axis specifications for the plots. The
input data must consist of a single observation with 5 arrays of
variables containing the successive time values, numbers of events and
numbers at risk in each group at each time. The input data set may be
constructed using the program hypotime.sas.
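Here is a Python sketch of the Epanechnikov kernel-smoothed intensity at a single time t, computed from the distinct event times t_j, numbers of events d_j and numbers at risk Y_j. The estimate is λ̂(t) = b⁻¹ Σ K((t - t_j)/b) d_j/Y_j, with K(u) = 0.75(1 - u²) for |u| ≤ 1 and band width b, and its variance is b⁻² Σ K²((t - t_j)/b) d_j/Y_j². These are the standard forms and are assumed, not verified, to match equations (9.137)-(9.139). No boundary correction is applied near the ends of follow-up, and the names are hypothetical.

```python
# Illustrative sketch; names are hypothetical, not those of the smooth macro.

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def kernel_intensity(t, times, n_events, n_risk, b):
    """Kernel-smoothed intensity at time t with band width b, and its variance."""
    estimate = variance = 0.0
    for tj, dj, yj in zip(times, n_events, n_risk):
        k = epanechnikov((t - tj) / b) / b
        estimate += k * dj / yj            # smoothed Nelson-Aalen increments
        variance += k * k * dj / yj ** 2
    return estimate, variance
```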
hypotime.sas is the program used
to generate the data set hytimes that contains a single
observation with the numbers at risk and numbers of events at each
distinct event time. This data set is then used with the macro
smooth in the program kernelsm.sas to generate kernel
smoothed estimates of the intensity function over time, or the macro
AGtests in AGtests.sas to compute the estimates of the
cumulative intensity functions and the Aalen-Gill tests for recurrent
event processes. This program uses the input data set hyevents
that has one observation for each event for each subject, or a single
observation if no events.
AGtimes.sas is a more general
macro that can be used to generate the appropriate data set with the
arrays for event times, numbers of events and numbers at risk from any
data set with recurrent events. In FG-CGDtm.sas, this program is
applied to the Fleming and Harrington (1991) CGD data described in
Problem 9.18. The macro requires that the data set contain a patient id
variable, a group variable (1=experimental, 2=control), both of which
are specified as macro variables, and additional variables to represent
an event time (etime), the maximum follow-up time (ftime),
and an indicator delta to indicate whether an event was observed
at the current time (1) or the observation is right censored (0) at that
time.
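The construction performed by hypotime.sas and %timeset can be sketched in Python as follows. This is a hypothetical illustration, not the SAS code: it assumes each record is a tuple (patient_id, group, etime, ftime, delta) with the codings given above, that ftime is the same on every record of a subject, and that a subject remains at risk for further events from time 0 until ftime. The keys of the returned dictionary mirror the hytimes array names.

```python
from collections import defaultdict


def event_time_arrays(records):
    """Build the per-time arrays from recurrent-event records.

    records: iterable of (patient_id, group, etime, ftime, delta) with
    group 1 = experimental, 2 = control, delta 1 = event, 0 = censored.
    """
    followup = {}                          # patient_id -> (group, ftime)
    counts = defaultdict(lambda: [0, 0])   # event time -> [exp events, ctl events]
    for pid, group, etime, ftime, delta in records:
        followup[pid] = (group, ftime)
        if delta == 1:
            counts[etime][group - 1] += 1
    times = sorted(counts)
    # With recurrent events a subject stays at risk until the end of follow-up.
    ye = [sum(1 for g, f in followup.values() if g == 1 and f >= t) for t in times]
    yc = [sum(1 for g, f in followup.values() if g == 2 and f >= t) for t in times]
    return {
        "MAXJ": len(times),
        "T": times,
        "XE": [counts[t][0] for t in times],
        "XC": [counts[t][1] for t in times],
        "YE": ye,
        "YC": yc,
        "Y": [a + b for a, b in zip(ye, yc)],
    }
```

Subjects with no events still contribute to the numbers at risk through their ftime, which is why a censored record is kept for every subject.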
hypokrnl.sas. This program uses
the macro smooth in kernelsm.sas to compute the kernel
smoothed estimate of the intensity function for the recurrent
hypoglycemia counting process described in Example 9.11 and presented in
Figure 9.4.
AGtests.sas. This program contains
the macro AGtests that computes the estimated cumulative
intensity for possibly recurrent event times in two groups as presented
in (9.132) and plots these estimates. The program then computes the Aalen-Gill
test statistics for possibly recurrent event times with an allowance for
ties, as in (9.149)-(9.150). Both the logrank and the Gehan-Wilcoxon tests are
presented. The usage is described in the macro; see hypotest.sas
as an example. A macro variable specifies the number of distinct event
times. The input data must consist of a single observation with 5 arrays
of variables containing the successive time values, numbers of events and
numbers at risk in each group at each time. The input data set may be
constructed using the program
hypotime.sas.
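The two computations in AGtests can be sketched in Python as below: the cumulative intensity as a running sum of d_j / Y_j, and a weighted two-group statistic with logrank (w_j = 1) or Gehan-Wilcoxon (w_j = Y_j) weights. This is an illustration under stated assumptions, not the SAS macro: the variance uses the familiar hypergeometric-type ties correction, which may not match (9.149)-(9.150) exactly, and the function names are hypothetical. The inputs are per-time arrays such as XE, XC, YE and YC.

```python
import math


def cumulative_intensity(events, at_risk):
    """Nelson-Aalen-type estimate: running sum of d_j / Y_j over event times."""
    total, path = 0.0, []
    for d, y in zip(events, at_risk):
        if y > 0:
            total += d / y
        path.append(total)
    return path


def aalen_gill_test(xe, xc, ye, yc, weights="logrank"):
    """Weighted two-group test from per-time counts; returns (Z, chi-square).

    weights="logrank" uses w_j = 1; weights="gehan" uses w_j = Y_j.
    The variance includes an assumed hypergeometric-type ties correction.
    """
    score, variance = 0.0, 0.0
    for de, dc, ne, nc in zip(xe, xc, ye, yc):
        d, y = de + dc, ne + nc
        if d == 0 or y == 0:
            continue
        w = 1.0 if weights == "logrank" else float(y)
        score += w * (de - ne * d / y)           # observed minus expected
        ties = (y - d) / (y - 1) if y > 1 else 1.0
        variance += w * w * d * ne * nc * ties / (y * y)
    z = score / math.sqrt(variance)
    return z, z * z
```

The Gehan-Wilcoxon weights emphasize early times, when many subjects are at risk, while the logrank weights treat all event times equally.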
hypotest.sas. This program uses
the macro AGtests in AGtests.sas to compute the
cumulative intensities and the Aalen-Gill tests for the recurrent
hypoglycemia counting process described in Example 9.11.
hypomim.sas fits the multiplicative
intensity model to the recurrent hypoglycemia events in the intensive
group patients of the secondary cohort of the DCCT as described in
Example 9.12. The data set hypomimi in the datasets
directory contains the data from the intensive group subjects.
The data set hypomimc contains the data from the conventional
group subjects. The latter may be used for additional problems. Both
data sets contain a number of additional baseline covariates for each
subject.
Problems
Table9-8.sas reads the data in
Table 9.8 in a format suitable for use with the other macros provided or
with SAS procedures.
Prentice.sas reads the data set
in Table 9.9 from Prentice (1973) in a format suitable for use with the
other macros provided or with SAS procedures.
VACURG85.sas reads the data from
the VA Cooperative Urology Research Group study of prostate cancer
described by Byar (1985) that are used in Problem 9.17. The variables
described in Problem 9.17 are computed from the raw data file
VACURG85.dat that is provided in the datasets directory.
The data are also available on StatLib as Table 46 of the book edited by
Andrews and Herzberg.
FH-CGDph.sas reads the data from
Fleming and Harrington (1991) with the times of recurrent infections
among children in a clinical trial of interferon versus placebo. This
uses the data set FH-cgd.dat that is suitable for use with PHREG
to fit multiplicative (proportional) intensity models.
FH-CGDtm.sas uses the macro
%timeset in AGtimes.sas to generate the data set
CGDtimes in a format required to compute the kernel smoothed
intensity estimates and the Aalen-Gill tests. This data set may then be
used with the other macros and programs to perform additional analyses
as specified in Problem 9.18.
FH-CGDag.sas reads the data from
Fleming and Harrington (1991) with the times of recurrent infections
among children in a clinical trial of interferon versus placebo. This
uses the data set cgdtimes that is suitable for use with the macro
AGtests to compute Aalen-Gill test statistics for recurrent
events.
Data Sets
Mack, et al. (Breslow-Day)
BresMack.text:
is the documentation to Appendix III of Breslow and Day (1980) that
presents data from the matched case-control study of endometrial cancer
described in Mack et al. (1976). This file was downloaded in the fall
of 1999 from Norman Breslow's web site at the Department of
Biostatistics of the University of Washington. It describes the
variables and corrections to the descriptions in Breslow and Day.
BresMack.dat:
is the data set downloaded from Norman Breslow's web site. The programs
described in Chapters 5 and 7 use this data set.
Hosmer-Lemeshow Low Birthweight Data
HosLem.sas:
generates the data set used as the basis for the analyses in the text,
Tables 7.8 and 7.9. The original source of the data was a table in the
SAS Technical Report P-229, p. 465-6. The data were scanned and used in
this program to generate the data set HLData.dat that is used in
the program HLBwt.sas for the analyses shown in Chapter 7.
Because the data were scanned from a secondary source, the data and
analyses may differ from those shown by Hosmer and Lemeshow.
DCCT Hypoglycemia
dccthypo.dat:
is a flat (.dat) file used in Examples 8.1-8.3. This is used as the
basis for the data presented in Table 8.1, see the program
Table81.sas in Chapter 8.
Fleming-Harrington CGD Data
FH-CGD.dat. Fleming and Harrington
(1991) present the data from a clinical trial of gamma interferon versus
placebo in the treatment of children with chronic granulomatous disease
(CGD) to reduce the incidence of recurrent pyogenic infections. The data
set includes multiple records for each subject to record the time of
each successive infection or the date of right censoring. The variables
include
| id | the patient ID |
| IDT | either the date of onset of a serious infection, or the date follow-up ended |
| Z2 | Inheritance pattern: X-linked (1) versus autosomal recessive (2) |
| Z4 | Height (cm) |
| Z6 | Corticosteroid use on entry: yes (1) versus no (2) |
| Z8 | Gender: male (1) versus female (2) |
| T1 | elapsed time from randomization to the value of IDT in the current observation, i.e. the time to an infection or the number of days of follow-up (IDT-RDT) |
| d | indicator (1) for an infection at the date IDT or (2) for censoring at that time (end of follow-up) |
| RDT | the date of randomization into the study, in mmddyy format |
| Z1 | treatment group: interferon (1) versus placebo (2) |
| Z3 | Age (years) |
| Z5 | Weight (kg) |
| Z7 | Antibiotic use on entry: yes (1) versus no (2) |
| Z9 | Type of hospital: NIH (1), other US (2), Amsterdam (3), other European (4) |
| T2 | the start time for the current interval of risk, either 0 for the first record of each subject, or the time IDT+1 from the previous infection time for that subject |
| S | the sequence number for the current infection (if any) for this subject. This is used for analyses of recurrent events in Chapter 9. |
Note: FHcnt.dat is a data set
created by FHcnt.sas that has one
record per subject containing the additional variables nevents, the
number of severe infections experienced, and futime, the number
of days of follow-up. This data set is used for the analyses using Poisson regression
in Chapter 8.
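FHcnt.sas itself is not reproduced here, but the collapse from multiple records to one record per subject can be sketched in Python. This hypothetical sketch assumes the FH-CGD.dat layout above, with T1 the days from randomization and d coded 1 for an infection, and it takes futime as the largest T1 among a subject's records. The pooled rate is the maximum likelihood estimate of a common Poisson rate (an intercept-only model); a Poisson regression would add covariates and use log(futime) as an offset.

```python
def per_subject_counts(records, event_code=1):
    """Collapse multi-record data to one (nevents, futime) pair per subject.

    records: iterable of (patient_id, t1, d), where t1 is days from
    randomization and d == event_code marks an infection.
    """
    out = {}
    for pid, t1, d in records:
        nevents, futime = out.get(pid, (0, 0))
        out[pid] = (nevents + (1 if d == event_code else 0), max(futime, t1))
    return out


def crude_rate(counts, per=365.25):
    """Pooled rate: total events / total follow-up, scaled to `per` days."""
    events = sum(n for n, _ in counts.values())
    days = sum(fu for _, fu in counts.values())
    return per * events / days
```

With per=365.25 the rate is expressed as infections per patient-year of follow-up.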
Lagakos Squamous Cell Carcinoma
Lagakos.sas reads the data from
Lagakos (1978) and creates a SAS data set that is used for the analyses
in Chapter 9. This job should be run on your platform to create the SAS
data set. The data set was originally used by Lagakos to describe an
approach to the analysis of competing risks, with two modes or
causes of failure (spread of disease): metastatic versus not. For the
analyses herein, however, a single outcome is employed: spread of
disease from any cause.
DCCT Nephropathy (Microalbuminuria) Data
nephdata.dat contains
data related to the onset of microalbuminuria in the DCCT. These data
are used for simple survival analyses as presented in Example 9.2. The
data set, however, contains additional variables that could be used for
supplemental exercises. See DCCTneph.sas. The variables in the
data set are
| Patient | ID number (a dummy number to mask the patient's identity) |
| primary | for primary prevention cohort (1) versus secondary intervention cohort |
| neur | for neuropathy present on entry (1) versus not (0) |
| neph2flg | the indicator for the development of microalbuminuria during the study (1) versus censored |
| duration | the duration of diabetes on entry (months) |
| age | in years |
| bcval5 | the entry level of stimulated C-peptide, a measure of residual endogenous insulin secretory function |
| bmi | a measure of obesity calculated as weight/(height**2) |
| int | for intensive (1) versus conventional (0) treatment |
| etdpatb | the baseline ETDRS grade of retinopathy severity (see DCCT, 1995) |
| aer0 | the entry level of albumin excretion rate (mg/24 h) |
| neph2vis | the quarterly visit number at which microalbuminuria was first observed or the last observation visit |
| female | (1) versus male (0) |
| adult | (1) (>17 years of age) versus adolescent (0) |
| hbael | the baseline level of HbA1c |
| mhba1-mhba9 | the array of variables that represent the current mean HbA1c over the period since randomization up to the current annual visit (1-9) |
DCCT Hypoglycemia Recurrent Event Data
Due to their size, the four data sets in this section are provided
as a single SAS export file. You can download this file as an
uncompressed file
(17.6 Mbytes), as a
gzip-compressed file
(952 Kbytes), or as a
zip-compressed file
(938 Kbytes).
Please run the program
impthypo.sas
to generate the following SAS data sets on your platform.
Dataset hyevents contains one record per hypoglycemia event for
each subject. The variables are
| ETIME | the day number since randomization when an event occurred; missing if no event in this observation |
| EVENTDAY | the calendar date of the event in MMDDYY8. format |
| FTIME | the total follow-up time of the subject |
| NEVENTS | the total number of events for this subject |
| RANDSAS | the calendar date of randomization into the study in MMDDYY6. format |
| EVENT | an indicator for whether an event occurred at this time (1=yes, .=no) |
| EVNUM | the cumulative event number since randomization |
| INTGROUP | an indicator for intensive (1) versus conventional (2) treatment group |
| PATIENT | the patient ID number (masked) |
Dataset hytimes contains a single observation
with the scalar MAXJ and six sets of array variables:
| MAXJ | the number of elements in each array, which equals the total number of distinct event times in the data set (1565 in this case) |
| XE1-XE1565 | the numbers of events in the intensive (experimental) group at each time |
| YE1-YE1565 | the numbers of subjects at risk in the intensive (experimental) group at each time |
| Y1-Y1565 | the total numbers of subjects at risk in both groups at each time |
| T1-T1565 | the times at which events occurred |
| XC1-XC1565 | the numbers of events in the conventional group at each time |
| YC1-YC1565 | the numbers of subjects at risk in the conventional group at each time |
Dataset hypomimi contains DCCT intensive group recurrent
hypoglycemia event observations with time dependent covariate data as
described in Example 9.12. Each observation is defined in terms of start
and stop times, the associated time dependent covariate (mhba) and the
number of events at the stop time, if any. The variables in the data
set are
| ADULT | Adult >=18 (0=no/1=yes) |
| AER00 | Albumin Excretion Rate (mg/24hr) at Baseline |
| AGE | Age at entry |
| BMI | Body Mass Index (kg/m**2) |
| CALORIES | Calories (kcal) per day |
| CPEPTIDE | Stimulated C-Peptide (pmol/ml) |
| DURATION | Duration of IDDM (months) at Baseline |
| EDUCAT | Mean Education (Years) - Form 013 |
| FAMIDDM | Family History of IDDM (0=no/1=yes) |
| FEMALE | Female (0=no/1=yes) |
| FULLIQ | Full Scale IQ |
| GROUP | treatment group 'EXPERIMENTAL' for intensive or 'STANDARD' for conventional |
| HBAEL | HbA1c at Eligibility |
| HDL | HDL Cholesterol (serum, mg/dl) |
| HYPOFLG | 1 if had a hypoglycemia event at this time |
| INSULIN | Total Insulin Dosage Units/Weight (kg) |
| LAER00 | Log of Baseline AER |
| LDL | LDL Cholesterol (serum, mg/dl) |
| LHBA1C | time dependent Log of the current mean HbA1c since randomization |
| LHBAEL | Log of HbA1c at Eligibility |
| MARRIED | Marital Status (0=NOT Married,1=Married) |
| MBP | Mean Arterial Pressure |
| NEUR0FLG | Clinical Neuropathy at Baseline (0,1) |
| NPRIOR | time dependent cumulative number of hypoglycemia events since randomization prior to the current interval |
| PATIENT | Patient ID number (masked) |
| PHASE2 | Randomization in phase 2 (1) versus phase 3 (0) |
| PRIMARY | Base retinopathy strata (0=Scnd, 1=Prim) for primary versus secondary cohort |
| PROTEIN | Dietary Protein (gm) |
| RET20FLG | Baseline ETDRS 20/20 (0=No,1=Yes) |
| RET35FLG | Baseline ETDRS 35/<=35 (0=No,1=Yes) |
| RET43FLG | Baseline ETDRS 43/<43 + (0=No,1=Yes) |
| RETBASE | Baseline Retinopathy Strata 'PRIM' for primary or 'SCND' for secondary |
| SMOKER | Smoking Status at Baseline (0=No,1=Yes) |
| STARTS | Start of Interval (in study time) |
| STOPS | End of Interval (in study time) |
| TRG | Triglycerides (serum, mg/dl) |
| WPMEAN | Within-Profile Mean Blood Glucose (mg/dl) |
All covariates are DCCT baseline values except LHBA1C and NPRIOR, which are time dependent covariates.
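The start/stop layout of hypomimi and hypomimc can be illustrated with the hypothetical Python sketch below, which turns one subject's event times into (STARTS, STOPS] intervals carrying HYPOFLG and NPRIOR. It is a simplified assumption, not the program used to build the data sets: event times are taken as distinct and no later than the end of follow-up, and intervals are not split further at the visits where LHBA1C changes. In SAS, data in this form are typically analyzed with PROC PHREG using the counting-process syntax MODEL (starts, stops)*hypoflg(0) = covariates.

```python
def counting_process_intervals(event_times, ftime):
    """Split one subject's follow-up (0, ftime] into (start, stop] intervals.

    Returns tuples (starts, stops, hypoflg, nprior).
    """
    rows, start = [], 0
    for k, t in enumerate(sorted(event_times)):
        rows.append((start, t, 1, k))   # k events occurred before this interval
        start = t
    if start < ftime:                   # event-free tail of follow-up
        rows.append((start, ftime, 0, len(event_times)))
    return rows


print(counting_process_intervals([30, 90], 365))
```

A subject with no events contributes a single interval (0, ftime] with HYPOFLG = 0 and NPRIOR = 0.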
Dataset hypomimc contains DCCT conventional group recurrent
hypoglycemia event observations with time dependent covariate data as
described in Example 9.12. Each observation is defined in terms of start
and stop times, the associated time dependent covariate (mhba) and the
number of events at the stop time, if any. The variables in the data set
are the same as those described above.
Veterans Administration Cooperative Urological Research Group
VACURG85.dat presents
the data from the VACURG study of prostate cancer described by Byar in
the book edited by Andrews and Herzberg (1985) which gives the variable
descriptions. These data have been used by many, including Thall and
Lachin (1986). The data are also available from StatLib in a slightly
different format as Table46.dat of the file Andrews. The
variables included are
| patid | patient number |
| rx | treatment group: 1=placebo, 2=0.2 mg. estrogen, 3=1 mg., 4=5 mg. |
| mosfu | months of follow-up |
| age | in years, 89=>88 |
| pf | performance status: 0=normal, 1=<50% time in bed, 2=50-<100% time in bed, 3=confined to bed |
| sbp | systolic blood pressure/10 in mm Hg (e.g. 118 recorded as 12) |
| ekg | EKG: 0=normal, 1=benign, 2=rhythmic disturbances and electrolyte changes, 3=heart blocks or conduction defects, 4=heart strain, 5=old myocardial infarct (MI), 6=recent MI |
| sz | tumor size in cm**2 (0=none palpable) |
| stage | tumor stage |
| startm startd starty | date of randomization (m d y) |
| ap | alkaline phosphatase in King-Armstrong units *10, a measure of liver function |
| status | survival status: 0=alive, 1=dead from prostate cancer, 2=dead from heart/vascular disease, 3=dead from cerebrovascular disease, 4=dead from pulmonary embolus, 5=dead from other cancer, 6=dead from respiratory disease, 7=dead from other specific non-cancer cause, 8=dead from unspecified non-cancer cause, 9=dead from unknown cause |
| wt | weight index = weight (kg) - height (cm) + 200 |
| hx | history of cardiovascular disease: 0=no, 1=yes |
| dbp | diastolic BP/10 |
| hg | serum hemoglobin in g/100 ml *10 |
| sg | combined index of tumor stage and histology grade |
| bm | bone metastases: 0=no, 1=yes |