
• Modeling Binary Data, by D. Collett. Review by Potter C. Chang, Journal of the American Statistical Association, Vol. 88, No. 422 (Jun., 1993), pp. 706-707. Published by the American Statistical Association.



Fitting such a model requires specifying the ARIMA orders of the input processes (p_i, q_i, i = 1, ..., p), the noise process (p, q), and the orders of the numerator and denominator components of the transfer functions (r_i, h_i, i = 1, ..., p), as well as the pure delays (b_i, i = 1, ..., p). My poor understanding of the various procedures by which one arrives at such specifications led me to restrict discussions in Shumway (1988) to a nonparametric approach to transfer function estimation in the frequency domain or to simple state-space or multivariate autoregressive approaches in the time domain.
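For orientation, the dynamic regression (transfer function) model under discussion can be written in standard Box-Jenkins notation; the display below is supplied here as a sketch, not quoted from the book:

y_t = C + \sum_{i=1}^{p} \frac{\omega_i(B)}{\delta_i(B)} B^{b_i} x_{i,t} + \frac{\theta(B)}{\phi(B)} a_t

Here B is the backshift operator, \omega_i(B) and \delta_i(B) are polynomials of degrees r_i and h_i, b_i is the pure delay on input i, \theta(B)/\phi(B) is the ARMA(p, q) noise filter, and a_t is white noise.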

I am happy to report that the lucid discussion of the transfer function procedures in Pankratz's book has dispelled many of my misgivings about this methodology. Even though the discussion is on a practical level, one can still see the theoretical underpinnings well enough to understand what kinds of developments are needed to put the procedure on a relatively rigorous basis (see, for example, Brockwell, Davis, and Salehi 1990).


It is clear that what Box and Jenkins originally called model identification, and what is sometimes called model selection, is the most difficult part of the procedure. It is here that Pankratz makes a bold choice. Instead of choosing the cross-correlation function between the prewhitened input and transformed output process as the basic tool for identifying the transfer function structure, he chooses to use the multiple regression coefficients relating the lagged input processes to the output (see Liu and Hudak 1986).

This approach, known as the linear transfer function (LTF) method, seems to provide an improvement over the usual Box-Jenkins approach in that it works better when there are multiple inputs. The residuals are then used to build a simple ARMA model for the noise, after which one can reestimate the linear transfer function coefficients.
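A minimal sketch of this lagged-regression step may help fix ideas; it is written in Python rather than SCA, and the series, lag length, and simulated example are hypothetical, not drawn from the book:

```python
# A minimal sketch of the LTF identification step: regress the output on
# current and lagged inputs to obtain free-form transfer function weights.
# This is an illustration, not the SCA implementation.
import numpy as np
import statsmodels.api as sm

def ltf_weights(y, x, max_lag=10):
    """Estimate v_0, ..., v_{max_lag} in y_t ~ const + sum_j v_j * x_{t-j}."""
    n = len(y)
    # Column j of the design matrix holds x_{t-j} for t = max_lag, ..., n-1.
    X = np.column_stack([x[max_lag - j : n - j] for j in range(max_lag + 1)])
    fit = sm.OLS(y[max_lag:], sm.add_constant(X)).fit()
    return fit.params[1:]  # drop the intercept; the v_j weights remain

# Simulated example: y responds to x with a delay of 2 and decaying weights.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(3, 300):
    y[t] = 0.8 * x[t - 2] + 0.4 * x[t - 3] + rng.normal(scale=0.5)
print(np.round(ltf_weights(y, x, max_lag=6), 2))  # weights peak at lags 2-3
```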

The estimated coefficients are then used to identify a parsimonious ratio of polynomials that can approximate the transfer function of the LTF. This is perhaps the most difficult step in the identification procedure for the reader, and the book provides many examples and graphs that illustrate various common ratios used to describe transfer functions. The use of a corner table based on Padé approximations is introduced as an aid for choosing b, r, and h in the rational polynomial model.

In simple cases, a first-order polynomial in the denominator can emulate exponential decrease; combined with a pure delay, this often provides an economical description of the transfer function. Once a tentative identification has been made for the ratio of polynomials and for the ARMA form of the noise process, the input series x_{i,t} can be analyzed to determine the best ARMA model for each. These kinds of analyses are assumed to be carried out using the SCA software (see Liu and Hudak 1986), which can also be used to estimate the parameters of the final model by conditional or exact maximum likelihood (least squares). The forecasts are computed as 'finite past' approximations to the 'infinite past' minimum mean square error estimators, as in Box and Jenkins (1976). Professor Pankratz provides examples that show this identification, estimation, and forecasting sequence in action for a number of classical time series; examples include federal government receipts, electricity demand, housing sales and starts, industrial production, stock prices, and vendor performance.
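To see why the first-order denominator mentioned above produces exponential decay, expand the rational operator (standard algebra, not a quotation from the book):

\frac{\omega B^{b}}{1 - \delta B}\, x_t = \omega \sum_{j=0}^{\infty} \delta^{j} x_{t-b-j}, \qquad |\delta| < 1

so a change in the input begins to affect the output after b periods, and its effect then decays geometrically at rate \delta.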

Tools for solving the identification problem are the autocorrelation function (ACF), the partial autocorrelation function (PACF), and the extended autocorrelation function (EACF). More modern model selection techniques, such as the Akaike information criterion (AIC) (see Brockwell and Davis 1991), are not applied except in the last chapter on multivariate ARMA processes. Pankratz tends to gravitate towards particular forms, such as first-order multiplicative seasonal and ordinary autoregressive models for the noise, and towards exponentially decaying models with a delay for the rational polynomial. It would be interesting to see whether AIC would discriminate between these simple models and variations that might be suggested by the diagnostic tools.
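As a hedged illustration of the comparison the reviewer has in mind, AIC can adjudicate between simple candidate noise models; the AR(1) series below is simulated purely for the example.

```python
# Sketch: compare candidate ARMA noise models by AIC with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
noise = np.empty(200)
noise[0] = rng.standard_normal()
for t in range(1, 200):                     # simulate an AR(1) noise process
    noise[t] = 0.7 * noise[t - 1] + rng.standard_normal()

candidates = [(1, 0, 0), (2, 0, 0), (0, 0, 1), (1, 0, 1)]
fits = {order: ARIMA(noise, order=order).fit() for order in candidates}
best = min(fits, key=lambda order: fits[order].aic)
print("AIC-preferred noise model:", best)
```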

From a theoretical viewpoint, it is important to note that the transfer function model has been formulated in state-space form by Brockwell and Davis (1991) and by Brockwell et al. (1990). This approach has strong appeal when the series are short or when some of the series have missing values. In these cases the forecasting approximations Pankratz uses in this book may be very poor; furthermore, there is no discussion at all of the common case in which some or all of the series contain missing values (see Shumway 1988). The innovations form of the likelihood is still easy to compute in those cases, and one gets an exact treatment of the estimation problem using the Gaussian likelihood of y_t, x_{1,t}, ..., x_{p,t}, t = 1, ..., n.
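For reference, the innovations form of the Gaussian log-likelihood mentioned here is, in standard state-space notation,

-2 \log L(\Theta) = n \log 2\pi + \sum_{t=1}^{n} \left( \log v_t + \frac{e_t^{2}}{v_t} \right)

where e_t is the one-step-ahead innovation produced by the Kalman filter and v_t is its variance; both remain computable when some observations are missing, which is what makes the exact treatment feasible.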

The Kalman filter also yields the exact linear predictor of y_{n+h}, instead of the large-sample approximation used in this book. A distinctive feature of this book is its extended coverage of intervention analysis (Chaps. 7 and 8). The author gives a detailed discussion of this procedure, which is designed for modeling changes in time series as responses of rational polynomial systems to pulse or step interventions.

The intervention analysis then becomes a special case of the preceding material in which the inputs x_{i,t} are known deterministic functions. Chapter 8 extends the discussion to outlier detection, where the outliers can be additive or innovational (variance changing) in effect.
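In the usual notation (supplied here for orientation), the pulse and step interventions at time T are the deterministic indicator inputs

P_t^{(T)} = \begin{cases} 1, & t = T \\ 0, & t \neq T \end{cases} \qquad S_t^{(T)} = \begin{cases} 1, & t \geq T \\ 0, & t < T \end{cases}

and each enters the model through the same rational polynomial form \omega(B)B^{b}/\delta(B) used for stochastic inputs.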

Any assessment of this book's overall importance will be tempered by one's personal view as to how the Box-Jenkins technology fits into modern time series analysis. Some will opt for the more general state-space model and, in particular, the additive structural models of the British School (see, for example, Harrison and Stevens 1976 and Harvey 1989) for applications in economics and the physical sciences. It is intuitively appealing to break series into their components and to identify and directly estimate factors such as seasonals and trends. Such models are appealing to statisticians because they are time series generalizations of random and fixed effects models already in common use. State-space models with multivariate ARMA components are also possible, and even the transfer function model can be put into state-space form (see, for example, Brockwell et al. 1990). In general, Forecasting With Dynamic Regression Models fulfills its purpose admirably; in my opinion, it is the clearest and most readable exposition of the Box-Jenkins transfer function methodology currently available. I often recommend this book to graduate students from other fields as an excellent text for self-study.

In conjunction with Pankratz (1982), it might serve best in an instructional setting as the second of a two-quarter sequence in time series analysis in a traditional MBA program. The level of mathematical sophistication required may be too high for undergraduates outside of the sciences. This stems not from any formal mathematical requirements that the book imposes, which are limited to simple algebra, but from the level of sophistication required for understanding the practical implications of the rational polynomial transfer function model.

SHUMWAY, University of California, Davis

REFERENCES

Box, G. E. P., and Jenkins, G. M. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods (2nd ed.), New York: Springer-Verlag.

Brockwell, P. J., Davis, R. A., and Salehi, H. (1990), 'A State-Space Approach to Transfer Function Modelling,' in Inference From Stochastic Processes, eds. I. V. Basawa and N. U. Prabhu, New York: Marcel Dekker.

Harrison, P. J., and Stevens, C. F. (1976), 'Bayesian Forecasting' (with discussion), Journal of the Royal Statistical Society, Ser. B, 38, 205-247.

Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge, U.K.: Cambridge University Press.

Liu, L.-M., and Hudak, G. B. (1986), The SCA Statistical System: Reference Manual for Forecasting and Time Series Analysis, Version III, Lisle, IL: Scientific Computing Associates.

Pankratz, A. (1982), Forecasting With Univariate Box-Jenkins Models: Concepts and Cases, New York: John Wiley.

Shumway, R. H. (1988), Applied Statistical Time Series Analysis, Englewood Cliffs, NJ: Prentice-Hall.

Tiao, G. C., and Box, G. E. P. (1981), 'Modeling Multiple Time Series With Applications,' Journal of the American Statistical Association, 81, 228-237.

Modeling Binary Data. D. Collett. New York: Chapman and Hall, 1991. xiii + 369 pp.

$85 (cloth); $39.95 (paper). This is a valuable companion volume to Cox and Snell (1989). It requires a lower level of mathematical sophistication and covers many topics pertinent to applications. The mathematical preparation required for this text is similar to that required for Dobson (1990); the focus of this text, however, is on binary responses and the logistic model. Chapters 5, 6, 7, 8, and 9 are particularly interesting.

Chapter 5 reviews topics on model diagnostics, including definitions of different types of residuals, illustrations of residual plots, examination of the adequacy of the form of linear predictors, assessment of the appropriateness of link functions, and detection of outliers. The discussion of influential observations is especially lucid.
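A hedged illustration of the kind of residual diagnostics Chapter 5 covers, using statsmodels; the data here are simulated, not taken from the book.

```python
# Sketch: deviance and Pearson residuals for a logistic regression fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))    # true success probabilities
y = rng.binomial(1, p)                        # binary responses

fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
dev_res = fit.resid_deviance                  # deviance residuals
pea_res = fit.resid_pearson                   # Pearson residuals
print(fit.summary())
```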

Chapter 6 discusses problems arising from overdispersion (e.g., variability in response probabilities, correlation between binary responses, and random effects). Chapter 7 is devoted to one of the most important applications of the logistic model: data from epidemiological studies. The chapter examines the importance of controlling for confounders and the meaning of the 'adjusted odds ratio' (sec. 7.3), and it clearly explains the modeling of data from case-control (sec. 7.6) and pair-matched case-control studies.
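For readers unfamiliar with the term, the 'adjusted odds ratio' can be read off a logistic model directly; with exposure indicator x and confounder z (an illustrative model, not one of the book's examples),

\log \frac{\pi(x, z)}{1 - \pi(x, z)} = \beta_0 + \beta_1 x + \beta_2 z

the odds ratio for exposure adjusted for z is \exp(\beta_1), which this model takes to be constant across levels of z.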

Chapter 9 covers some advanced topics, including the analysis of proportions when their denominators are not known, quasi-likelihood, cross-over studies, errors in explanatory variables, multivariate binary data, and so on. The chapter consists of short notes that cover material in the literature as recent as 1991. The final chapter presents helpful introductions to some computer programs for logistic models, explaining, for instance, how qualitative explanatory variables are handled by GLIM, Genstat, SAS, BMDP, SPSS, and EGRET. The author is careful to indicate the versions of the packages he refers to. One special feature of this text is its many real-life examples.

CHANG, University of California, Los Angeles

REFERENCES

Cox, D. R., and Snell, E. J. (1989), Analysis of Binary Data (2nd ed.), New York: Chapman and Hall.

Dobson, A. J. (1990), An Introduction to Generalized Linear Models, London: Chapman and Hall.

Item Response Theory: Parameter Estimation Techniques. Frank B. Baker. New York: Marcel Dekker, 1992. viii + 440 pp.

One of the greatest services a researcher can perform for his colleagues is to gather information from various journal articles, conference papers, unpublished manuscripts, and computer manuals and provide a lucid, in-depth summary in book form.

This is by no means a simple task. Making such a synthesis readily available goes a long way towards advancing understanding and literacy. This is exactly what Baker has done for researchers in the area of item response theory (IRT), focusing on item and ability parameter estimation. As Baker points out, because of the computational demands of estimation procedures, IRT is not practical without the computer. But in our computer age, when complex algorithms and overwhelming computations can be done so easily and quickly, many people tend to analyze their data without fully understanding the appropriateness of the models they are using. This is especially true for IRT.

Estimation programs such as PC-BILOG (Mislevy and Bock 1986) and MULTILOG (Thissen 1986) let the practitioner obtain item and ability parameter estimates quite readily, with little understanding of the procedures used to fit the model to the data. One might even forget that these programs produce only estimates. Although it is not always necessary to write down the code that produced the results, it is essential to be able to interpret the results, both quantitatively and substantively. To do so, one needs a basic understanding of the statistical logic of the estimation paradigm and the underlying formulation. Here, too, Baker has done IRT practitioners a great service.

This book describes the most currently used unidimensional IRT models and furnishes detailed explanations of algorithms that can be used to estimate the parameters in each model. All chapters but the introductory chapter focus on estimating item and/or ability parameters in different IRT models, including the graded and nominal response models. Each chapter begins with an introduction that outlines the material to be covered, and each chapter's summary highlights the main points. Throughout the text the author presents interesting background information from other nonmeasurement disciplines that have developed related estimation techniques. The reader will find numerous examples drawn from various research studies that help illustrate the different concepts or relationships.

Chapter One traces the roots of IRT from the work of Binet and Simon in the early 1900s, through the ground-breaking work of Lawley, and finally to the seminal development of IRT attributable to Lord. The chapter details the development of and assumptions underlying IRT and the item characteristic curve as formulated by both the normal ogive and logistic models.
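For orientation (standard notation, not quoted from the book), the two-parameter normal ogive and logistic item characteristic curves are

P_i(\theta) = \Phi\bigl(a_i(\theta - b_i)\bigr) \qquad \text{and} \qquad P_i(\theta) = \frac{1}{1 + e^{-D a_i(\theta - b_i)}}

where a_i is the item discrimination, b_i the difficulty, and the scaling constant D = 1.702 makes the two curves nearly indistinguishable.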

Chapter Two focuses on estimating item parameters when examinee abilities are given. In this chapter, as well as in subsequent chapters, the author supplies the complete equations and formulations used in the estimation process. Several calibration algorithms are discussed, including maximum likelihood estimation, minimum chi-square, and minimum transform chi-square. Chapter Three discusses maximum likelihood estimation of examinee ability with item parameters known. This chapter also clarifies the information function, IRT's method of assessing the measurement precision of an item and a test. Chapter Four covers the iterative approach of joint maximum likelihood estimation (JMLE) for computing both ability and item parameter estimates for the two-parameter logistic model. Because the JMLE method combines the material presented in Chapters 2 and 3, the algorithms are presented in more general terms. Many examples are treated for clarification, using the computer program LOGIST (Wood, Wingersky, and Lord 1976), which takes this approach. Important related issues, such as placing bounds on the parameter estimates, accounting for missing data, determining the ability metric, and improving consistency of estimation, are also discussed. Chapter Five is devoted to the historical development of the Rasch model and its underlying theory.

The last part of the chapter presents the JMLE procedure for estimating item and ability parameters and evaluating goodness of fit to the model. Chapter Six is a modified version of the didactic article by Harwell, Baker, and Zwarts (1988) on marginal maximum likelihood estimation and the EM algorithm. Several examples from the computer program PC-BILOG (Mislevy and Bock 1986), which uses these algorithms, are discussed.
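In outline (again in standard notation rather than the book's own), marginal maximum likelihood integrates the unknown abilities out of the likelihood:

L(\xi) = \prod_{j=1}^{N} \int P(\mathbf{u}_j \mid \theta, \xi)\, g(\theta)\, d\theta

where \mathbf{u}_j is examinee j's item response vector, \xi collects the item parameters, and g(\theta) is the assumed ability distribution; the EM algorithm maximizes this by alternating expected response counts at quadrature points with item-by-item maximization steps.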

Chapter Seven is a modification of the journal article by Harwell and Baker (1991) on various Bayesian procedures, including Bayes modal (MAP) and expected a posteriori (EAP) estimation. The chapter explains, in a clear and concise manner, the effects of applying different priors to help estimate item and ability parameters. The book's last two chapters are devoted to estimation routines for IRT models for polychotomous data. Chapter Eight focuses on Samejima's graded response model, and Chapter Nine discusses the estimation of item and ability parameters for Bock's nominal response model. Both chapters explicitly detail the mathematics of the estimation process. An added bonus is provided in the seven appendixes.

Each appendix contains BASIC computer code that parallels the estimation algorithms discussed in the book, along with annotated output. It must be noted that the book's intended readership does not include measurement students or testing practitioners who are just beginning to learn IRT. A good background in mathematical statistics, matrix algebra, and calculus is necessary to work through the detailed equations. But Item Response Theory: Parameter Estimation Techniques is an excellent resource for the serious investigator doing research involving estimation of IRT model parameters. About the only negative feature of the book is its high price.

ACKERMAN, University of Illinois

REFERENCES

Harwell, M. R., and Baker, F. B. (1991), 'The Use of Prior Distributions in Marginalized Bayesian Item Parameter Estimation: A Didactic,' Applied Psychological Measurement, 15, 375-390.

Harwell, M. R., Baker, F. B., and Zwarts, M. (1988), 'Item Parameter Estimation Via Marginal Maximum Likelihood and an EM Algorithm: A Didactic,' Journal of Educational Statistics, 13, 243-271.

Mislevy, R. J., and Bock, R. D. (1986), PC-BILOG: Item Analysis and Test Scoring With Binary Logistic Models, Mooresville, IN: Scientific Software, Inc.

Thissen, D. (1986), MULTILOG Version 5 User's Guide, Mooresville, IN: Scientific Software, Inc.

Wood, R. L., Wingersky, M. S., and Lord, F. M. (1976), LOGIST: A Computer Program for Estimating Examinee Ability and Item Characteristic Curve Parameters (RM-76-6), Princeton, NJ: Educational Testing Service.

The Design and Analysis of Sequential Clinical Trials (2nd ed.). John Whitehead. New York: Ellis Horwood, 1992.

This is the second edition of a well-written book about sequential clinical trials.

The book begins with several chapters on general principles of such trials:

1. 'Clinical Trials'
2. 'Allocating Patients to Treatments'
3. 'Measurement of Treatment Differences'

The clear models presented are worthwhile reading for any statistician, whether or not he or she plans to participate in clinical trials. I found myself marking the margins with exclamation points or 'amen' at many points in these chapters.

Chapter 3 begins to introduce ideas of sequential testing. The next two chapters present the core material on sequential trials. Chapter 4 covers designing a sequential trial using the boundaries approach; it introduces several procedures, including the triangular test, the truncated sequential probability ratio test, and the restricted procedure.
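As a point of reference (not drawn from the book), the untruncated sequential probability ratio test continues sampling while the log likelihood ratio lies between Wald's boundaries,

\log \frac{\beta}{1 - \alpha} \;<\; \sum_{t=1}^{n} \log \frac{f_1(x_t)}{f_0(x_t)} \;<\; \log \frac{1 - \beta}{\alpha}

accepting H_0 at the lower boundary and H_1 at the upper; truncation and the triangular test replace these parallel boundaries with converging ones so that the trial is guaranteed to stop.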

The chapter concludes with a theoretical discussion of these procedures. Chapter 5 covers material related to the analysis of a sequential trial, including continuous and discrete monitoring, overrunning and underrunning, and estimation of treatment effects. Chapter 6, 'Alternative Approaches to Design and Analysis of Sequential Clinical Trials,' covers repeated significance testing, the spending-function approach of Lan and DeMets, and Bayesian approaches, among others. Chapter 7 covers prognostic factors, including stratification, covariate adjustment, and generalized linear models.

Chapter 8 discusses comparisons of more than two treatments: orthogonal comparisons and dose-response. The last chapter, Chapter 9, discusses implementation of these ideas. The author has developed a computer program, PEST2, that implements these procedures. Chapter 9 also provides analysis of several examples. The book is clearly written and a pleasure to read.

I found no typographical errors. The reader needs a solid introductory year of statistics (of course, more never hurts).


Modelling Binary Data, Second Edition

Since the original publication of the bestselling Modelling Binary Data, a number of important methodological and computational developments have emerged, accompanied by the steady growth of statistical computing. Mixed models for binary data analysis and procedures that lead to an exact version of logistic regression are valuable additions to the statistician's toolbox, and author Dave Collett has fully updated his popular treatise to incorporate these important advances. Modelling Binary Data, Second Edition now provides an even more comprehensive and practical guide to statistical methods for analyzing binary data. Along with thorough revisions to the original material, which is now independent of any particular software package, it includes a new chapter introducing mixed models for binary data analysis and another on exact methods for modelling binary data. The author has also added material on modelling ordered categorical data and provides a summary of the leading software packages. All of the data sets used in the book are available for download from the Internet, and the appendices include additional data sets useful as exercises.