Where Does Logistic Regression Come From?
The question is really: why does logistic regression take the form it does? Why is the inverse of the link function, to borrow the language of generalized linear models, a sigmoid? How did this framework come about?
Let's go back to linear regression for a moment. The old model for linear regression is

y = b0 + b1*x1 + ... + bn*xn + epsilon

where epsilon is a random normal variable centered at 0. As long as y is a continuous real variable, this works fine. The exact relationship between inputs and outputs may not be linear, but it's often close enough. But what if y is a binary categorical (i.e. 0-1) variable? Then the output needs to be a probability between 0 and 1, and there is no way to guarantee that with the formula above.
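To see the problem concretely, here is a quick sketch (the data and variable names are made up for illustration) that fits an ordinary least-squares line to 0-1 outcomes:

```python
import numpy as np

# Illustrative 1-D data: inputs x and binary 0/1 outcomes y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Ordinary least squares fit: y is approximated by b0 + b1*x.
b1, b0 = np.polyfit(x, y, deg=1)

# Predictions away from the training range escape [0, 1],
# so they cannot be read as probabilities.
preds = b0 + b1 * np.array([-2.0, 7.0])
print(preds)  # first value is negative, second exceeds 1
```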
Enter the notion of odds, one of the few places where o gets used as a variable. The odds of an event that happens with probability p is

o = p / (1 - p)
There's no upper bound on this value: as p approaches 1, o grows arbitrarily large, while long odds (very small p) push o toward 0. How about this for a regression formula?

o = b0 + b1*x1 + ... + bn*xn + epsilon
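A quick numeric sketch of the odds (the function name here is mine, not a standard one):

```python
# Odds of an event with probability p: bounded below by 0, unbounded above.
def odds(p):
    return p / (1.0 - p)

print(odds(0.5))   # even odds -> 1.0
print(odds(0.99))  # near-certainty -> about 99
print(odds(0.01))  # long odds -> about 0.0101, small but still positive
```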
That still doesn't quite work, since o can't be negative and there's no good way to force the right side to be positive. One little trick gets us to a quantity that can be any real number: instead of the odds, use the log-odds.

log(o) = log(p / (1 - p)) = b0 + b1*x1 + ... + bn*xn + epsilon
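The log-odds (the logit) maps probabilities in (0, 1) onto the whole real line; a minimal sketch:

```python
import math

# Log-odds: log(p / (1 - p)) can be any real number.
def logit(p):
    return math.log(p / (1.0 - p))

print(logit(0.5))    # 0.0 -- even odds land at the center of the real line
print(logit(0.001))  # large and negative
print(logit(0.999))  # large and positive
```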
From here, it's a matter of simple algebra to get to

p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn + epsilon))
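That algebra can be checked numerically: exponentiating the log-odds and solving for p shows the sigmoid undoes the logit. A small sketch (function names are mine):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# The sigmoid is the inverse of the log-odds: sigmoid(logit(p)) == p.
for p in [0.1, 0.25, 0.5, 0.9]:
    assert abs(sigmoid(logit(p)) - p) < 1e-12
print("sigmoid inverts the log-odds")
```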
which is the sigmoid function. A few ad-hoc modifications are made at this point: the epsilon term is dropped, and the whole thing is reformulated as a maximum-likelihood problem with the logistic loss for the probability p. But that was all a posteriori work. The original justification was simply to shoehorn a linear model (which was all the computing power of the day could handle) into a probability context.
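A minimal sketch of that reformulation, with illustrative data and plain gradient descent (not how production libraries actually fit the model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 1-D data: binary 0/1 outcomes y for inputs x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Drop epsilon, model p = sigmoid(b0 + b1*x), and minimize the
# negative log-likelihood (the logistic loss) by gradient descent.
b0, b1 = 0.0, 0.0
lr = 0.1
for _ in range(2000):
    p = sigmoid(b0 + b1 * x)
    b0 -= lr * np.sum(p - y)        # d(loss)/d(b0) = sum(p - y)
    b1 -= lr * np.sum((p - y) * x)  # d(loss)/d(b1) = sum((p - y) * x)

p = sigmoid(b0 + b1 * x)
print(np.round(p, 2))  # fitted values are genuine probabilities in (0, 1)
```

Unlike the least-squares fit, every prediction here lies strictly between 0 and 1, which is the whole point of the sigmoid form.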