John M. Lachin
John Wiley and Sons, 2000
ISBN: 0-471-36996-9
In 1993 to 1994 I led the effort to establish a graduate program in
biostatistics at the George Washington University. The program, which I now
direct, was launched in 1995 and is a joint initiative of the Department of
Statistics, the Biostatistics Center (which I have directed since 1988) and the
School of Public Health and Health Services. Biostatistics has long been a
specialty of the statistics faculty, starting with Samuel Greenhouse, who joined
the faculty in 1946. When Jerome Cornfield joined the faculty in 1972, he
established a two-semester sequence in biostatistics (Statistics 225-6) as an
elective for the graduate program in statistics (our 200 level being equivalent
to the 600 level in other schools). Over the years these courses were taught by
many faculty as a lecture course on current topics. With the establishment of
the graduate program in biostatistics, however, these became pivotal courses in
the graduate program and it was necessary that Statistics 225 be structured so
as to provide students with a review of the foundations of biostatistics.
Thus I was faced with the question "what are the foundations of
biostatistics?" In my opinion, biostatistics is set apart from other
statistics specialties by its focus on the assessment of risks and relative
risks through clinical research. Thus biostatistical methods are grounded in the
analysis of binary and count data such as in 2x2 tables. For example, the
Mantel-Haenszel procedure for stratified 2x2 tables forms the basis for many
families of statistical procedures such as the G-rho family of modern
statistical tests in the analysis of survival data. Further, all common medical
study designs, such as the randomized clinical trial and the retrospective
case-control study, are rooted in the desire to assess relative risks. Thus I
developed Statistics 225, and later this text, around the principle of the
assessment of relative risks in clinical investigations.
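As a minimal illustration of the quantities referred to above (a sketch that is not taken from the text), the following Python fragment computes the risk difference, relative risk and odds ratio from a single 2x2 table. The function name, the cell layout and the example counts are all assumptions made for illustration only.

```python
def risk_measures(a, b, c, d):
    """Basic risk measures for a single 2x2 table.

    Assumed (hypothetical) layout:
        a = exposed with the event,   b = exposed without the event
        c = unexposed with the event, d = unexposed without the event
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }


# Made-up counts, for illustration only.
measures = risk_measures(20, 80, 10, 90)
```

With these made-up counts the risks are 0.2 and 0.1, so the relative risk is 2, while the odds ratio is somewhat larger because the event is not rare.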
In doing so, I felt that it was important first to develop basic concepts
and derive core biostatistical methods through the application of classical
mathematical statistical tools, and then to show that these and comparable
methods may also be developed through the application of more modern,
likelihood-based theories. For example, the large sample distribution of the
Mantel-Haenszel test can be derived using the large sample approximation to the
hypergeometric and the Central Limit Theorem, and also as an efficient score
test based on a hypergeometric likelihood.
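To make the first of these derivations concrete, here is a minimal Python sketch of the Mantel-Haenszel chi-square statistic for stratified 2x2 tables. It uses the hypergeometric mean and variance of the index cell in each stratum and a large sample chi-square reference distribution on one degree of freedom. The function name, the table layout and the absence of a continuity correction are assumptions of this sketch, not details taken from the text.

```python
import math


def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square test over a list of 2x2 tables.

    Each stratum is a tuple (a, b, c, d), with an assumed layout of
    rows = groups and columns = event / no event. Under the null
    hypothesis the index cell a is hypergeometric given the margins.
    No continuity correction is applied (an assumption of this sketch).
    """
    sum_observed = 0.0
    sum_expected = 0.0
    sum_variance = 0.0
    for a, b, c, d in strata:
        n1, n2 = a + b, c + d  # row totals
        m1, m2 = a + c, b + d  # column totals
        total = n1 + n2
        if total < 2:
            continue  # a stratum this small carries no information
        sum_observed += a
        # Hypergeometric mean and variance of the index cell.
        sum_expected += n1 * m1 / total
        sum_variance += n1 * n2 * m1 * m2 / (total ** 2 * (total - 1))
    if sum_variance == 0:
        raise ValueError("no informative strata")
    chi2 = (sum_observed - sum_expected) ** 2 / sum_variance
    # Upper tail of a chi-square on 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Two made-up strata, for illustration only.
chi2, p_value = mantel_haenszel_chi2([(20, 80, 10, 90), (15, 35, 5, 45)])
```

Summing the observed counts, expectations and variances over strata before forming the statistic is what distinguishes the stratified test from a simple pooling of the tables.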
Thus the first five chapters present methods for the analysis of single and
multiple 2x2 tables for cross-sectional, prospective and retrospective
(case-control) sampling, without and with matching. Both fixed and random
effects (two-stage) models are employed. Then, starting in Chapter 6 and
proceeding through Chapter 9, a more modern likelihood or model-based treatment
is presented. These chapters broaden the scope of the book to include the
unconditional and conditional logistic regression models in Chapter 7, the
analysis of count data and the Poisson regression model in Chapter 8, and the
analysis of event time data including the proportional hazards and
multiplicative intensity models in Chapter 9. Core mathematical statistical
tools employed in the text are presented in the Appendix. Following each chapter,
problems are presented that are intended to expose the student to the key
mathematical statistical derivations of the methods presented in that chapter,
and to illustrate their application and interpretation.
Although the text provides a valuable reference to the principal literature,
it is not intended to be exhaustive. For this purpose, readers are referred to
any of the excellent existing texts on the analysis of categorical data,
generalized linear models and survival analysis. Rather, this manuscript was
prepared as a textbook for advanced courses in biostatistics. Thus the course
(and book) material was selected on the basis of its current importance in
biostatistical practice and its relevance to current methodological research and
more advanced methods. For example, Cornfield's approximate procedure for
confidence limits on the odds ratio, though brilliant, is no longer employed
because we now have the ability to readily perform exact computations. Also, I
felt it was more important that students be exposed to over-dispersion and the
use of the information sandwich in model-based inference than to residual
analysis in regression models. Thus each chapter must be viewed as one
professor's selection of relevant and insightful topics.
In my Statistics 225 course, I cover perhaps two-thirds of the material in
this text. Chapter 9, on survival analysis, has been added for completeness, as
has the section in the Appendix on quasi-likelihood and the family of
generalized linear models. These topics are covered in detail in other courses.
My detailed syllabus for Statistics 225, listing the specific sections covered
and exercises assigned, is available at the Biostatistics Center web site
(www.bsc.gwu.edu/jml/biostatmethods). Also, the data sets employed in the text
and problems are available at this site or the web site of John Wiley and Sons,
Inc. (www.wiley.com).
Although I was not trained as a mathematical statistician, during my career
I have learned much from those with whom I have been blessed with the
opportunity to collaborate (chronologically): Jerry Cornfield, Sam Greenhouse,
Nathan Mantel, and Max Halperin, among the founding giants in biostatistics; and
also Robert Smythe, L.J. Wei, Peter Thall, K.K. Gordon Lan and Zhaohai Li, among
others, who are among the best of their generation. I have also learned much
from my students, who have always sought to better understand the rationale for
biostatistical methods and their application.
I especially acknowledge the collaboration of Zhaohai Li, who graciously
agreed to teach Statistics 225 during the fall of 1998, while I was on
sabbatical leave. His detailed reading of the draft of this text identified many
areas of ambiguity and greatly improved the mathematical treatment. I also thank
Costas Cristophi for typing my lecture notes, and Yvonne Sparling for a careful
review of the final text and programming assistance. I also wish to thank my
present and former statistical collaborators at the Biostatistics Center, who
together have shared a common devotion to the pursuit of good science: Raymond
Bain, Oliver Bautista, Patricia Cleary, Mary Foulkes, Sarah Fowler, Tavia
Gordon, Shuping Lan, James Rochon, William Rosenberger, Larry Shaw, Elizabeth
Thom, Desmond Thompson, Dante Verme, Joel Verter, Elizabeth Wright, and Naji
Younes, among many.
Finally, I especially wish to thank the many scientists with whom I have had
the opportunity to collaborate in the conduct of medical research over the past
30 years: Dr. Joseph Schachter, who directed the Research Center in Child
Psychiatry where I worked during graduate training; Dr. Leslie Schoenfield, who
directed the National Cooperative Gallstone Study; Dr. Edmund Lewis, who
directed the Collaborative Study Group in the conduct of the Study of
Plasmapheresis in Lupus Nephritis and the Study of Captopril in Diabetic
Nephropathy; Dr. Thomas Garvey, who directed the preparation of the New Drug
Application for treatment of gallstones with ursodiol; Dr. Peter Stacpoole, who
directed the Study of Dichloroacetate in the Treatment of Lactic Acidosis; and
especially Drs. Oscar Crofford, Saul Genuth and David Nathan, among many others,
with whom I have collaborated since 1982 in the conduct of the Diabetes Control
and Complications Trial, the study of the Epidemiology of Diabetes Interventions
and Complications, and the Diabetes Prevention Program. The statistical
responsibility for studies of such great import has provided the dominant
motivation for me to continually improve my skills as a biostatistician.
John M. Lachin
Rockville, Maryland