Regression Models for Categorical, Count, and Related Variables: An Applied Approach
About this ebook
This book provides an introduction to and overview of several statistical models designed for categorical, count, and related outcome variables—all presented with the assumption that the reader has only a good working knowledge of elementary algebra and has taken introductory statistics and linear regression analysis.
Numerous examples from the social sciences demonstrate the practical applications of these models. The chapters address logistic and probit models, including those designed for ordinal and nominal variables, regular and zero-inflated Poisson and negative binomial models, event history models, models for longitudinal data, multilevel models, and data reduction techniques such as principal components and factor analysis.
Each chapter discusses how to utilize the models and test their assumptions with the statistical software Stata, and also includes exercise sets so readers can practice using these techniques. Appendices show how to estimate the models in SAS, SPSS, and R; provide a review of regression assumptions using simulations; and discuss missing data.
A companion website includes downloadable versions of all the data sets used in the book.
Dr. John P. Hoffmann
John P. Hoffmann is Professor of Sociology at Brigham Young University. Before arriving at BYU, he was a senior research scientist at the National Opinion Research Center (NORC), a nonprofit firm affiliated with the University of Chicago. He received a master’s in Law and Justice from American University and a doctorate in Criminal Justice from SUNY–Albany. He also received a master’s in Public Health with emphases in Epidemiology and Behavioral Sciences at Emory University. His research addresses drug use, juvenile delinquency, and the sociology of religion.
Regression Models for Categorical, Count, and Related Variables
An Applied Approach
JOHN P. HOFFMANN
UNIVERSITY OF CALIFORNIA PRESS
University of California Press, one of the most distinguished university presses in the United States, enriches lives around the world by advancing scholarship in the humanities, social sciences, and natural sciences. Its activities are supported by the UC Press Foundation and by philanthropic contributions from individuals and institutions. For more information, visit www.ucpress.edu.
University of California Press
Oakland, California
© 2016 by The Regents of the University of California
Library of Congress Cataloging-in-Publication Data
Names: Hoffmann, John P. (John Patrick), 1962– author.
Title: Regression models for categorical, count, and related variables : an applied approach / John P. Hoffmann.
Description: Oakland, California : University of California Press, [2016] | Includes bibliographical references and index.
Identifiers: LCCN 2016030975 (print) | LCCN 2016032215 (ebook) | ISBN 9780520289291 (pbk. : alk. paper) | ISBN 0520289293 (pbk. : alk. paper) | ISBN 9780520965492 (epub)
Subjects: LCSH: Regression analysis—Mathematical models. | Regression analysis—Computer programs. | Social sciences—Statistical methods.
Classification: LCC HA31.3 .H64 2016 (print) | LCC HA31.3 (ebook) | DDC 519.5/36—dc23
LC record available at https://lccn.loc.gov/2016030975
Stata® is a registered trademark of StataCorp LP, 4905 Lakeway Drive, College Station, TX 77845 USA. Its use herein is for informational and instructional purposes only.
Manufactured in the United States of America
25 24 23 22 21 20 19 18 17 16
10 9 8 7 6 5 4 3 2 1
For Brandon
CONTENTS
Preface
Acknowledgments
1 Review of Linear Regression Models
A Brief Introduction to Stata
An OLS Regression Model in Stata
Checking the Assumptions of the OLS Regression Model
Modifying the OLS Regression Model
Examining Effect Modification with Interaction Terms
Assessing Another OLS Regression Model
Final Words
2 Categorical Data and Generalized Linear Models
A Brief Review of Categorical Variables
Generalized Linear Models
Link Functions
Probability Distributions as Family Members
How Are GLMs Estimated?
Hypothesis Tests with ML Regression Coefficients
Testing the Overall Fit of ML Regression Models
Example of an ML Linear Regression Model
Final Words
3 Logistic and Probit Regression Models
What Is an Alternative? Logistic Regression
The Multiple Logistic Regression Model
Model Fit and Testing Assumptions of the Model
Probit Regression
Comparison of Marginal Effects: dprobit and Margins Using dy/dx
Model Fit and Diagnostics with the Probit Model
Limitations and Modifications
Final Words
4 Ordered Logistic and Probit Regression Models
The Ordered Logistic Regression Model
The Ordered Probit Model
Multiple Ordered Regression Models
Model Fit and Diagnostics with Ordered Models
Final Words
5 Multinomial Logistic and Probit Regression Models
The Multinomial Logistic Regression Model
The Multiple Multinomial Logistic Regression Model
The Multinomial Probit Regression Model
Examining the Assumptions of Multinomial Regression Models
Final Words
6 Poisson and Negative Binomial Regression Models
The Multiple Poisson Regression Model
Examining Assumptions of the Poisson Model
The Extradispersed Poisson Regression Model
The Negative Binomial Regression Model
Checking Assumptions of the Negative Binomial Model
Zero-Inflated Count Models
Testing Assumptions of Zero-Inflated Models
Final Words
7 Event History Models
Event History Models
Example of Survivor and Hazard Functions
Continuous-Time Event History Models with Censored Data
The Cox Proportional Hazards Model
Discrete-Time Event History Models
Final Words
8 Regression Models for Longitudinal Data
Fixed- and Random-Effects Regression Models
Generalized Estimating Equations for Longitudinal Data
Final Words
9 Multilevel Regression Models
The Basic Approach of Multilevel Models
The Multilevel Linear Regression Model
Checking Model Assumptions
Group-Level Variables and Cross-Level Interactions
Multilevel Generalized Linear Models
Multilevel Models for Longitudinal Data
Cross-Level Interactions and Correlational Structures in Multilevel Models for Longitudinal Data
Checking Model Assumptions
A Multilevel Poisson Regression Model for Longitudinal Data
Final Words
10 Principal Components and Factor Analysis
Principal Components Analysis
Factor Analysis
Creating Latent Variables
Factor Analysis for Categorical Variables
Confirmatory Factor Analysis Using Structural Equation Modeling (SEM)
Generalized SEM
A Brief Note on Regression Analyses Using Structural Equation Models
Final Words
Appendix A: SAS, SPSS, and R Code for Examples in Chapters
Section 1: SAS Code
Section 2: SPSS Syntax
Section 3: R Code
Appendix B: Using Simulations to Examine Assumptions of OLS Regression
Appendix C: Working with Missing Data
Notes
References
Index
PREFACE
About a dozen years ago I published a book titled Generalized Linear Models: An Applied Approach (Hoffmann, 2004). Based on a detailed set of lecture notes I had used for several years in a graduate sociology course, its goal was to provide a straightforward introduction to various regression models designed to answer research questions of interest to social and behavioral scientists. Although I did not keep track of how often the book was used in university and college courses, I did receive the occasional email asking me for assistance with the data sets used in the book or with matters of clarification. Moreover, it has garnered several hundred citations—admittedly a modest amount in these days of an ever-increasing number of print and online books and research articles—on Google Scholar, so I assume it has served as a useful guide to at least a few students, instructors, and researchers.
The book relied on the statistical software Stata® for most of its examples. As many readers may know, Stata is user-friendly yet sophisticated software for conducting quantitative research. I have now used Stata for more than 20 years and continue to be impressed by its evolution and depth. It contains a remarkable range of tools for applied statisticians and researchers from many disciplines. Yet its evolution has taken it far beyond what was presented in my earlier book.¹ Moreover, some of the material I provided there has been superseded by innovations in the statistical sciences. Therefore, I opted to entirely revamp the material presented in that book with the material presented in this book. Although there are similarities, this book takes a broader view of the types of models used by social and behavioral scientists. Whereas I was primarily concerned with generalized linear models (GLMs) such as linear, logistic, and Poisson regression in my earlier work, I’ve found that most researchers need a more general toolkit of approaches to data analyses that involve multiple variables.
In particular, I continue to see studies in my home discipline and related disciplines that rely on categorical or discrete variables: those that are grouped into nonoverlapping categories that measure some qualitative condition or count some phenomenon. Although most studies recognize the need to use specific models to predict such outcomes—or to use them as explanatory variables—there is no concomitant attention to many of the nuances in using such models. This includes models that fall under the general category of GLMs, but it also involves models, such as those designed for data reduction, that do not fit neatly under this category. Yet, I see the need for a more general treatment of categorical variables than is typically found in GLM and related books.
Therefore, this book provides an overview of a set of models for predicting outcome variables that are measured, primarily, as categories or counts of some phenomenon, as well as the timing of some event or events. At times, though, continuous outcome variables are used to illustrate particular models. In addition, Chapter 9 presents multilevel models, which are included since they have gained popularity in my home fields of sociology and criminology, and offer such an intriguing way of approaching socially relevant research questions. Finally, since data reduction techniques such as factor analysis are, implicitly, regression models designed to predict a set of variables simultaneously with one or more latent variables, the final chapter (Chapter 10) reviews these models, with sections that show how to estimate them with categorical variables. At the conclusion of each chapter, a set of exercises is provided so that readers may gain practice with the models. A solutions manual and the data sets are available on the publisher’s website (http://www.ucpress.edu/go/regressionmodels).
I have to admit, though, that I do not address an issue that has become more prominent in the social and behavioral sciences in recent years. Specifically, this book takes a traditional approach to modeling by relying on frequentist assumptions as opposed to Bayesian assumptions.² Frequentists assume that we may, theoretically, take repeated random samples from some predetermined population and make inferences to that population based on what we find in the samples. In general, they assume that the parameters—quantities representing something about a population—are fixed. Of course, in the social and behavioral sciences, we rarely have the luxury of taking repeated random samples, so we usually rely on a single sample to infer characteristics of the population.
Bayesians typically assume that the data are fixed and the parameters may be described probabilistically. Bayesian statistics also differ from frequentist statistics in their focus on using prior beliefs to inform one's statistical models. In frequentist statistics, we rely mainly on patterns in the available data to show statistical relationships. In Bayesian approaches, the data are combined with these prior beliefs to provide posterior beliefs that are then used for making inferences about some phenomenon. Frequentist and Bayesian statistical frameworks are often described, perhaps unfairly, as relying on objective versus subjective probability.
Although this presentation, like my earlier book, rests firmly on a frequentist foundation, I hope readers will consider in more detail the promise (some argue advantage) of Bayesian methods for conducting statistical analyses (see Gelman and Hill, 2007, and Wakefield, 2013; both books provide an impressive balance of information on regression modeling from Bayesian and frequentist perspectives). Perhaps, if I feel sufficiently energetic and up-to-date in a few years, I’ll undertake a revision of this presentation that will give equal due to both approaches to statistical analyses. But, until that time and with due apologies to my Bayesian friends, let’s get to work.
ACKNOWLEDGMENTS
I am indebted to many students and colleagues for their help in preparing this book and other material that presaged it. First, the many students I have taught over the past 20 years have brought wisdom and joy to my life. They have taught me much more than I have ever taught them. Some former students who were especially helpful in preparing this material include Scott Baldwin, Colter Mitchell, Karen Spence, Kristina Beacom, Bryan Johnson, and Liz Warnick. I am especially indebted to Kaylie Carbine, who did a wonderful job writing Appendix A, and to Marie Burt for preparing the index. I would also like to thank Seth Dobrin, Kate Hoffman, Stacy Eisenstark, Lia Tjandra, S. Bhuvaneshwari, and Mansi Gupta for their excellent assistance and careful work in seeing this book to print. Several confidential reviewers provided excellent advice that led to significant improvements in the book. My family has, as always, been immensely supportive of all my work, completed and unfinished, and I owe them more than mere words can impart. This book is dedicated to my son, Brandon, who lights up every life he comes across.
ONE
Review of Linear Regression Models
As you should know,¹ the linear regression model is normally characterized with the following equation:

yi = β0 + β1x1i + β2x2i + ⋯ + βkxki + εi
Consider this equation and try to answer the following questions:
• What does the yi represent? The β? The x? (Which often include subscripts i—do you remember why?) The εi?
• How do we judge the size and direction of the β?
• How do we decide which xs are important and which are not? What are some limitations in trying to make this decision?
• Given this equation, what is the difference between prediction and explanation?
• What is this model best suited for?
• What role does the mean of y play in linear regression models?
• Can the model provide causal explanations of social phenomena?
• What are some of its limitations for studying social phenomena and causal processes?
Researchers often use an estimation technique known as ordinary least squares (OLS) to estimate this regression model. OLS seeks to minimize the following:

SSE = Σ(yi − ŷi)²
The SSE is the sum of squared errors, with the observed y and the predicted y (y-hat) utilized in the equation. In an OLS regression model² that includes only one explanatory variable, the slope (β1) is estimated with the following least squares equation:

β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
Notice that the variance of x appears in the denominator, whereas the numerator is part of the formula for the covariance (cov(x,y)). Given the slope, the intercept is simply

β0 = ȳ − β1x̄
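Although the book's examples rely on Stata, the two formulas above can be checked in a few lines of any language. The following is a minimal Python sketch with made-up numbers (not the book's GSS data):

```python
# Simple OLS slope and intercept computed directly from the least squares
# formulas above. The data are hypothetical, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 6.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of cross-products over the sum of squared deviations of x
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) \
     / sum((a - x_bar) ** 2 for a in x)
# Intercept: the fitted line passes through the point of means
b0 = y_bar - b1 * x_bar

print(b1, b0)   # approximately 1.2 and -0.4 for these data
```

The same two numbers would appear in the Coef. column of Stata's regress output for these data.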
Estimation is more complicated in a multiple OLS regression model. If you recall matrix notation, you may have seen this model represented as

Y = Xβ + ε
The letters are bolded to represent vectors and matrices, with Y representing a vector of values for the outcome variable, X indicating a matrix of explanatory variables, and β representing a vector of regression coefficients, including the intercept (β0) and slopes (βi). The OLS regression coefficients may be estimated with the following equation:

β̂ = (X′X)⁻¹X′Y
A vector of residuals is then given by

e = Y − Xβ̂
Often, the residuals are represented as e to distinguish them from the errors, ε. You should recall that residuals play an important role in linear regression analysis. Various types of residuals also have a key role throughout this book. Assuming the model is estimated from a sample and includes an intercept, some of the properties of the OLS residuals are (a) they sum to zero (Σei = 0), (b) they have a mean of zero (ē = 0), and (c) they are uncorrelated with the predicted values of the outcome variable (ŷ).
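These three properties are easy to verify numerically. Here is a Python sketch (again with hypothetical data, in place of the book's Stata examples):

```python
# Numerical check of the three residual properties for an OLS fit that
# includes an intercept. The data are hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 3.1, 4.8, 5.4, 7.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) \
     / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * a for a in x]
resid = [b - f for b, f in zip(y, fitted)]

sum_e = sum(resid)                       # (a) residuals sum to zero
mean_e = sum_e / n                       # (b) so their mean is zero
f_bar = sum(fitted) / n
cov_ef = sum((e - mean_e) * (f - f_bar)  # (c) uncorrelated with fitted values
             for e, f in zip(resid, fitted)) / n
print(sum_e, mean_e, cov_ef)             # all zero up to floating-point error
```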
Analysts often wish to infer something about a target population from the sample. Thus, you may recall that the standard error (SE) of the slope is needed since, in conjunction with the slope, it allows estimation of the t-values and the p-values. These provide the basis for inference in linear regression modeling. The standard error of the slope in a simple OLS regression model is computed as

SE(β1) = √[σ̂² / Σ(xi − x̄)²], where σ̂² = SSE/(n − 2)
Assuming we have a multiple OLS regression model, as shown earlier, the standard error formula requires modification:

SE(βj) = √{σ̂² / [Σ(xij − x̄j)²(1 − R²j)]}

where σ̂² = SSE/(n − k − 1) and R²j is the R² from regressing xj on the other explanatory variables.
Consider some of the components in this equation and how they might affect the standard errors. The matrix formulation of the standard errors is based on deriving the variance-covariance matrix of the OLS estimator. A simplified version of its computation is

var(β̂) = σ̂²(X′X)⁻¹, where σ̂² = e′e/(n − k − 1)
Note that the numerator in the right-hand-side expression is simply the SSE, since e′e = Σe²i. The term e′e/(n − k − 1) is called the residual variance or the mean squared error (MSE). You may recognize that it provides an unbiased and consistent estimate of the variance of the errors. The square roots of the diagonal elements of the variance–covariance matrix yield the standard errors of the regression coefficients. As reviewed subsequently, several of the assumptions of the OLS regression model are related to the accuracy of the standard errors and thus the inferences that can be made to the target population.
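In the simple one-predictor case, the variance-covariance matrix reduces to the scalar formula given earlier. A Python sketch of that special case, using the same hypothetical numbers as before rather than the book's data:

```python
import math

# Standard error of the slope in a simple OLS regression, computed from
# the residual variance (MSE). Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 6.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual variance: SSE divided by n - k - 1, with k = 1 predictor
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
sigma2 = sse / (n - 2)

# SE of the slope: square root of sigma2 over the sum of squares of x
se_b1 = math.sqrt(sigma2 / s_xx)
print(se_b1)
```

Dividing the slope by this standard error yields the t-value reported by statistical software.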
OLS yields, by construction, the smallest possible value of the SSE for a linear model. If some of the specific assumptions of the model discussed later are satisfied, the model is said to result in the best linear unbiased estimators (BLUE) (Weisberg, 2013). It is important to note that this says best linear, so we are concerned here with linear estimators (there are also nonlinear estimators). In any event, BLUE implies that the estimators, such as the slopes, from an OLS regression model are unbiased, efficient, and consistent. But what does it mean to say they have these qualities? Unbiasedness refers to whether the mean of the sampling distribution of a statistic equals the parameter it is meant to estimate in the population. For example, is the slope estimated from the sample a good estimate of an analogous slope in the population? Even though we rarely have more than one sample, simulation studies indicate that the mean of the sample slopes from the OLS regression model (if we could take many samples from a population), on average, equals the population slope (see Appendix B). Efficiency refers to how stable a statistic is from one sample to the next. A more efficient statistic has less variability from sample to sample; it is therefore, on average, more precise. Again, if some of the assumptions discussed later are satisfied, OLS-derived estimates are more efficient—they have a smaller sampling variance—than those that might be estimated using other techniques. Finally, consistency refers to whether the statistic converges to the population parameter as the sample size increases. Thus, it combines characteristics of both unbiasedness and efficiency.
A standard way to consider these qualities is with a target, such as a dartboard. As shown in figure 1.1, estimators from a statistical model can be imagined as trying to hit a target in the population known as a parameter. Estimators can be unbiased and efficient, biased but efficient, unbiased but inefficient, or neither. Hopefully, it is clear why having these properties with OLS regression models is valuable.
FIGURE 1.1
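The unbiasedness property can be illustrated with a small simulation along the lines of Appendix B. The sketch below is in Python rather than the book's Stata, and the true parameter values are arbitrary choices:

```python
import random

# Draw many samples from a known population model and check that the
# average OLS slope recovers the true slope (unbiasedness).
random.seed(42)
TRUE_B0, TRUE_B1 = 1.0, 2.0   # arbitrary population parameters

slopes = []
for _ in range(1000):
    x = [random.uniform(0, 10) for _ in range(50)]
    y = [TRUE_B0 + TRUE_B1 * a + random.gauss(0, 3) for a in x]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) \
         / sum((a - x_bar) ** 2 for a in x)
    slopes.append(b1)

mean_slope = sum(slopes) / len(slopes)
print(mean_slope)   # close to the true slope of 2.0
```

Any single sample's slope misses the target, but the mean across samples lands on the parameter, which is exactly the dartboard picture of an unbiased estimator.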
You may recall that we wish to assess not just the slopes and standard errors, but also whether the OLS regression model provides a good fit to the data. This is one way of asking whether the model does a good job of predicting the outcome variable. Given your knowledge of OLS regression, what are some ways we may judge whether the model is a good fit? Recall that we typically examine and evaluate the R², adjusted R², and root mean squared error (RMSE). How is the R² value computed? Why do some analysts prefer the adjusted R²? What is the RMSE and why is it useful?
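As a reminder of how these three fit statistics are computed, here is a Python sketch (hypothetical data; Stata reports the same quantities in its regress output):

```python
import math

# R-squared, adjusted R-squared, and RMSE for a simple OLS fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 6.0]

n, k = len(x), 1                     # k = number of explanatory variables
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) \
     / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar

sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))   # residual sum of squares
sst = sum((b - y_bar) ** 2 for b in y)                      # total sum of squares

r2 = 1 - sse / sst                              # proportion of variance explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalizes additional predictors
rmse = math.sqrt(sse / (n - k - 1))             # typical size of a residual
print(r2, adj_r2, rmse)
```

The adjustment term shows why the adjusted R² can only fall, never rise, when a useless predictor is added: k increases while the unadjusted R² barely moves.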
A BRIEF INTRODUCTION TO STATA³
In this presentation, we use the statistical program Stata to estimate regression models (www.stata.com). Stata is a powerful and user-friendly program that has become quite popular in the social and behavioral sciences. It is more flexible and powerful than SPSS and, in my judgment, much more user-friendly than SAS or R, its major competitors. Stata’s default style consists of four windows: a command window where we type the commands; a results window that shows output; a variables window that shows the variables in the data file; and a review window that keeps track of what we have entered in the command window. If we click on a line in the review window, it shows up in the command window (so we don’t have to retype commands). If we click on a variable in the variables window, it shows up in the command window, so we do not have to type variable names if we do not want to.
It is always a good idea to save the Stata commands and output by opening a log file. This can be done by clicking the brown icon in the upper left-hand corner (Windows) or the upper middle portion (Mac) of Stata or by typing the following in the command window (filename is a placeholder for whatever file name you choose):

log using filename
This saves a log file to the local drive listed at the bottom of the Stata screen. To suspend the log file, type log off in the command window; or to close it completely type log close.
It is also a good idea to learn to use .do files. These are similar to SPSS syntax files or R script files in that we write—and, importantly, save—commands in them and then ask Stata to execute the commands. Stata has a do-editor that is simply a notepad screen for typing commands. The Stata icon that looks like a small pad of paper opens the editor. But we can also use Notepad++, TextEdit, WordPad, Vim, or any other text-editing program that allows us to save text files. I recommend that you use the handle .do when saving these files, though. In the do-editor, clicking the run or do icon feeds the commands to Stata.
AN OLS REGRESSION MODEL IN STATA
We will now open a Stata data file and estimate an OLS regression model. This allows us to examine Stata’s commands and output and provide guidance on how to test the assumptions of the model. A good source of additional instructions is the Regression with Stata web book found at http://www.ats.ucla.edu/stat/stata/webbooks/reg. Stata’s help menu (e.g., type help regress in the command window) is also very useful.
To begin, open the GSS data file (gss.dta). This is a subset of data from the biennial General Social Survey (see www3.norc.org/GSS+Website). You may use Stata’s drop-down menu to open the file. Review the content of the Variables window to become familiar with the file and its contents. A convenient command for determining the coding of variables in Stata is called codebook. For example, typing and entering codebook sei returns the label and some information about this variable, including its mean, standard deviation, and some percentiles. Other frequently used commands for examining data sets and variables include describe, table, tabulate, summarize, graph box (boxplot), graph dotplot (dot plot), stem (stem-and-leaf plot), hist (histogram), and kdensity (kernel density plot) (see the Chapter Resources at the end of this chapter). Stata’s help menu provides detailed descriptions of each. As shown later, several of these come in handy when we wish to examine residuals and predicted values from regression models.
Before estimating an OLS regression model, let’s check the distribution of sei with a kernel density graph (which is also called a smoothed histogram). The Stata command that appears below opens a new window that provides the graph in figure 1.2. If sei follows a normal distribution, it should look like a bell-shaped curve. Although it appears to be normally distributed until it hits about 50, it has a rather long tail that is suggestive of positive skewness. We investigate some implications of this skewness later.
kdensity sei
FIGURE 1.2
To estimate an OLS regression model in Stata, we may use the regress command.⁴ The Stata code in Example 1.1 estimates an OLS regression model that predicts sei based on sex (the variable is labeled female). The term beta that follows the comma requests that Stata furnish standardized regression coefficients, or beta weights, as part of the output. You may recall that beta weights are based on the following equation:

beta weight = β1 × (sx/sy)

where sx and sy are the sample standard deviations of the explanatory and outcome variables.
Whereas unstandardized regression coefficients (the Coef. column in Stata) are interpreted in the original units of the explanatory and outcome variables, beta weights are interpreted in terms of z-scores. Of course, the z-scores of the variables must be interpretable, which is not always the case (think of a categorical variable like female).
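The rescaling by the ratio of standard deviations is equivalent to fitting the model on z-scored variables. A Python sketch with hypothetical data makes the equivalence concrete:

```python
import statistics as st

# Beta weight: the unstandardized slope rescaled by the SD ratio.
# Hypothetical data, not the book's GSS file.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 6.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) \
     / sum((a - x_bar) ** 2 for a in x)

beta = b1 * st.stdev(x) / st.stdev(y)   # beta weight

# Check: it equals the slope from regressing z-scores of y on z-scores
# of x (which, with a single predictor, is just the correlation of x and y).
zx = [(a - x_bar) / st.stdev(x) for a in x]
zy = [(b - y_bar) / st.stdev(y) for b in y]
b_std = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
print(beta, b_std)   # the two values match
```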
Example 1.1

regress sei female, beta
The results should look familiar. There is an analysis of variance (ANOVA) table in the top-left panel, some model fit statistics in the top-right panel, and a coefficients table in the bottom panel. For instance, the R² for this model is 0.002, which could be computed from the ANOVA table using the regression (Model) sum of squares and the total sum of squares (SS(sei)): 2,002/1,002,219 = 0.002. Recall that the R² is the squared value of the correlation between the predicted values and the observed values of the outcome variable. The F-value is computed as MSReg/MSResid or 2,002/360 = 5.56, with degrees of freedom equal to k and {n − k − 1}. The adjusted R² and the RMSE⁵—two useful fit statistics—are also provided.
The output presents coefficients (including one for the intercept or constant), standard errors, t-values, p-values, and, as we requested, beta weights. Recall that the unstandardized regression coefficient for a binary variable like female is simply the difference in the expected means of the outcome variable for the two groups. Moreover, the intercept is the predicted mean for the reference group if the binary variable is coded as {0, 1}. Because female is coded as {0 = male and 1 = female}, the model predicts that mean sei among males is 48.79 and mean sei among females is 48.79 − 1.70 = 47.09. The p-value of 0.018 indicates that, if the null hypothesis of no association were true and we were to draw many samples from the target population, we would expect to find a slope of −1.70 or one farther from zero about 18 times out of every 1,000 samples.⁶
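The equivalence between the dummy-variable coefficients and the two group means is easy to confirm. This Python sketch uses invented scores rather than the GSS sei data:

```python
# Regressing an outcome on a 0/1 dummy reproduces the two group means.
# Hypothetical scores, not the GSS data.
group0 = [50.0, 47.0, 49.5, 48.0]   # coded 0 (e.g., male)
group1 = [46.0, 48.5, 47.0, 45.5]   # coded 1 (e.g., female)

x = [0.0] * len(group0) + [1.0] * len(group1)
y = group0 + group1

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) \
     / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar

# Intercept = mean of the reference group; slope = difference in group means
print(b0, sum(group0) / len(group0))        # both 48.625
print(b0 + b1, sum(group1) / len(group1))   # both 46.75
```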
The beta weight is not useful in this situation because a one z-score shift in female makes little sense. Perhaps it will become more useful as we include other explanatory variables. In the next example, we add years of education, race/ethnicity (labeled nonwhite, with 0 = white and 1 = nonwhite), and parents' socioeconomic status (pasei) to the model.
The results shown in Example 1.2 suggest that one or more of the variables added to the model may explain the association between female and socioeconomic status (or does it?—note the sample sizes of the two models). And we now see that education, nonwhite, and parents’ status are statistically significant predictors of socioeconomic status. Whether they are important predictors or have a causal impact is another matter, however.
Example 1.2
The R² increased from 0.002 to 0.353, which appears to be quite a jump. Stata’s test command provides a multiple partial (nested) F-test to determine if the addition of these variables leads to a statistically significant increase in the R². Simply type test and then list the additional