Logistic Regression

Logistic regression is a statistical method for analyzing a dataset in which there are one or more
independent variables that determine an outcome. The outcome is measured with a dichotomous variable
(in which there are only two possible outcomes).
In logistic regression, the dependent variable is binary or dichotomous, i.e. it only contains data
coded as 1 (TRUE, success, pregnant, etc.) or 0 (FALSE, failure, non-pregnant, etc.).
The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the
relationship between the dichotomous characteristic of interest (dependent variable = response or outcome
variable) and a set of independent (predictor or explanatory) variables. Logistic regression generates the
coefficients (and their standard errors and significance levels) of a formula to predict a logit transformation of
the probability of presence of the characteristic of interest:
logit(p) = b0 + b1*X1 + b2*X2 + ... + bk*Xk
where p is the probability of presence of the characteristic of interest, X1 ... Xk are the independent
variables, and b0 ... bk are the coefficients. The logit transformation is defined as the logged odds:
odds = p/(1 - p)
and
logit(p) = ln(p/(1 - p))
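The logit transformation and its inverse can be sketched in a few lines of Python. This is only an illustration: the function names logit and inverse_logit and the example probability 0.8 are chosen here and do not come from the text.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Logistic function: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative value, not taken from the text.
p = 0.8
odds = p / (1.0 - p)       # odds in favour of the outcome
log_odds = logit(p)        # the logged odds
recovered = inverse_logit(log_odds)

print("odds:", odds)
print("logit:", log_odds)
print("probability recovered from the logit:", recovered)
```

Because the logit can take any real value while p stays between 0 and 1, a linear formula in the predictors can be fitted on the logit scale and then mapped back to a probability with the inverse function.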
Rather than choosing parameters that minimize the sum of squared errors (as in ordinary least-squares regression),
estimation in logistic regression chooses parameters that maximize the likelihood of observing the sample
values.
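A minimal sketch of maximum-likelihood estimation for a one-predictor model follows. Everything in it is an assumption made for illustration: the ten data points are invented (they are not the sample discussed in this document), and plain gradient ascent with a fixed step size is only one simple way to maximize the log-likelihood; statistical packages typically use faster iterative schemes.

```python
import math

# Hypothetical data, invented for illustration only.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def predict(b0, b1, x):
    """Modelled probability that the outcome is 1 for predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Log of the probability of observing the sample under (b0, b1)."""
    total = 0.0
    for x, y in zip(hours, passed):
        p = predict(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def gradient(b0, b1):
    """Partial derivatives of the log-likelihood with respect to b0 and b1."""
    g0 = g1 = 0.0
    for x, y in zip(hours, passed):
        residual = y - predict(b0, b1, x)
        g0 += residual
        g1 += residual * x
    return g0, g1

def fit(learning_rate=0.01, steps=20000):
    """Gradient ascent: repeatedly step uphill on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0, g1 = gradient(b0, b1)
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1

b0_hat, b1_hat = fit()
print("estimated intercept:", b0_hat)
print("estimated slope:", b1_hat)
print("log-likelihood at start:", log_likelihood(0.0, 0.0))
print("log-likelihood at estimate:", log_likelihood(b0_hat, b1_hat))
```

The log-likelihood at the estimate should be higher than at the starting point, which is the sense in which the fitted parameters make the observed sample more probable.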
In statistics, logistic regression (also called logit regression or the logit model) [1] is a regression model
where the dependent variable (DV) is categorical. This article covers the case of a binary
dependent variable, that is, where it can take only two values, "0" and "1", which represent
outcomes such as pass/fail, win/lose, alive/dead or healthy/sick.
Logistic regression is used in various fields, including machine learning, most medical fields, and
social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely
used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic
regression.
Example: Probability of passing an exam versus hours of study

The reason for using logistic regression for this problem is that the values of the dependent variable, pass
and fail, although represented by "1" and "0", are not cardinal numbers. If the problem were changed so that
pass/fail was replaced with a grade of 0-100 (cardinal numbers), then simple regression analysis could be
used.
A group of 20 students spend between 0 and 6 hours studying for an exam.

How does the number of hours spent studying affect the probability that
the student will pass the exam?
The graph shows the probability of passing the exam versus the number of hours studying, with the
logistic regression curve fitted to the data.
Figure: Logistic regression curve showing probability of passing an exam versus hours studying.
The logistic regression analysis gives the following output.
            Coefficient   Std.Error   z-value   P-value (Wald)
Intercept   -4.0777       1.7610      -2.316    0.0206
Hours        1.5046       0.6287       2.393    0.0167
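The z-values and Wald p-values in this output can be checked from the coefficients and standard errors. The sketch below assumes the usual Wald construction (z is the coefficient divided by its standard error, with a two-sided p-value from the standard normal distribution) and takes the intercept as -4.0777, the value used in the regression equation. The function name wald_test is chosen here for illustration.

```python
import math

def wald_test(coefficient, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = coefficient / std_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Coefficients and standard errors from the output table.
rows = [("Intercept", -4.0777, 1.7610), ("Hours", 1.5046, 0.6287)]

for name, coefficient, std_error in rows:
    z, p_value = wald_test(coefficient, std_error)
    print(f"{name}: z = {z:.3f}, p = {p_value:.4f}")
```

Under these assumptions the computed values should agree with the table to rounding.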
The output indicates that hours studying is significantly associated with the probability of passing the
exam (p=0.0167, Wald test). The output also provides the coefficients for Intercept = -4.0777 and
Hours = 1.5046. These coefficients are entered in the logistic regression equation to estimate the
probability of passing the exam:
Probability of passing exam = 1/(1 + exp(-(-4.0777 + 1.5046*Hours)))
For example, for a student who studies 2 hours, entering the value Hours = 2 in the equation gives
the estimated probability of passing the exam of p = 0.26:
Probability of passing exam = 1/(1 + exp(-(-4.0777 + 1.5046*2))) = 0.26.
Similarly, for a student who studies 4 hours, the estimated probability of passing the exam is
p=0.87:
Probability of passing exam = 1/(1 + exp(-(-4.0777 + 1.5046*4))) = 0.87.
This table shows the probability of passing the exam for several values of hours studying.
Hours of study   Probability of passing exam
1                0.07
2                0.26
3                0.61
4                0.87
5                0.97
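The table can be reproduced directly from the fitted equation. The sketch below uses the two coefficients reported in the output; the names INTERCEPT, HOURS_COEF and pass_probability are chosen here for illustration. The last two quantities are derived from the same coefficients as an aside and are not stated in the text.

```python
import math

# Coefficients from the logistic regression output.
INTERCEPT = -4.0777
HOURS_COEF = 1.5046

def pass_probability(hours):
    """Estimated probability of passing after studying the given hours."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + HOURS_COEF * hours)))

print("Hours  Probability")
for h in range(1, 6):
    print(f"{h:<6} {pass_probability(h):.2f}")

# Derived quantities (an aside, not part of the original table):
# factor by which the odds of passing change per additional hour,
# and the study time at which the estimated probability equals 0.5.
odds_ratio_per_hour = math.exp(HOURS_COEF)
hours_for_even_odds = -INTERCEPT / HOURS_COEF
print("odds ratio per extra hour:", round(odds_ratio_per_hour, 2))
print("hours for a 50% chance:", round(hours_for_even_odds, 2))
```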
Applications:
Adaptive websites
An adaptive website adjusts the structure, content, or presentation of information in response to
measured user interaction with the site, with the objective of optimizing future user interactions.
One or more models of user interaction are built using artificial intelligence and statistical methods.
Affective computing
Affective computing is the study and development of systems and devices that can recognize,
interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer
science, psychology, and cognitive science.
Bioinformatics
Bioinformatics is an interdisciplinary field that develops methods
and software tools for understanding biological data. As an interdisciplinary field of science,
bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and
interpret biological data.
Brain-machine interfaces
A brain-computer interface (BCI), sometimes called a mind-machine interface (MMI), direct
neural interface (DNI), or brain-machine interface (BMI), is a direct communication pathway
between an enhanced or wired brain and an external device. BCIs are often directed at researching,
mapping, assisting, augmenting, or repairing human cognitive or sensory-motor functions.
Classifying DNA sequences
Computational anatomy
Computer vision, including object recognition
Detecting credit card fraud
Game playing
Natural language processing
Natural language processing (NLP) is a field of computer science, artificial intelligence,
and computational linguistics concerned with the interactions between computers and human
(natural) languages.
Information retrieval
Internet fraud detection
Marketing
Machine perception
Medical diagnosis
Economics
