Where Does Logistic Regression Come From?

The question is really: why does logistic regression take the form that it does? Why is the link function, for that is what it is called in the world of generalized linear models, the logit, so that the model's output passes through a sigmoid? How did this framework come about?

Let's go back to linear regression for a moment. The old model for linear regression is


y = b_0 + b_1 x_1 + ... + b_k x_k + epsilon

where epsilon is a random normal variable centered at 0. As long as y is a continuous real variable this works fine. The exact relationship between inputs and outputs may not be linear, but it's often close enough. But what if y is a binary categorical (i.e. 0-1) variable? Then the output needs to be some probability between 0 and 1, and there is no way to guarantee that with the above formula.
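To see the failure concretely, here's a minimal sketch (with made-up data) of fitting an ordinary least-squares line to 0-1 labels and getting "probabilities" outside [0, 1]:

```python
import numpy as np

# Hypothetical 0-1 labels against a single feature x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least squares: y ~ b1*x + b0
b1, b0 = np.polyfit(x, y, 1)

# Predictions at extreme inputs escape the [0, 1] range,
# so they cannot be read as probabilities.
preds = b1 * np.array([-2.0, 7.0]) + b0
print(preds)  # one value below 0, one above 1
```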

Enter the notion of odds, one of the few places where o gets used as a variable. The odds of an event that happens with probability p are


o = p / (1 - p)

There's no upper bound on this value. As p approaches 1, o can be arbitrarily large; as p approaches 0 (long odds against, in betting terms), o approaches 0 but never goes below it. How about this for a regression formula?


o = b_0 + b_1 x_1 + ... + b_k x_k + epsilon

That still doesn't quite work: o can't be negative, and there's no good way to force the right side to be nonnegative. One little trick fixes this. Instead of the odds, use the log-odds, which can be any real number.


log(p / (1 - p)) = b_0 + b_1 x_1 + ... + b_k x_k + epsilon
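As a quick numerical check, a sketch with arbitrary probabilities, the log-odds transform maps (0, 1) onto the whole real line, symmetric about p = 0.5:

```python
import math

def log_odds(p):
    """Log-odds (logit) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

for p in [0.001, 0.25, 0.5, 0.75, 0.999]:
    print(f"p = {p:5.3f}  ->  log-odds = {log_odds(p):+.3f}")
# Small p gives a large negative value, large p a large positive one,
# and p = 0.5 gives exactly 0: every real number is reachable.
```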

From here, it's a matter of simple algebra, exponentiating both sides and solving for p, to get to

p = 1 / (1 + exp(-(b_0 + b_1 x_1 + ... + b_k x_k + epsilon)))

which is the sigmoid function. A few ad hoc modifications are made at this point. The epsilon term is dropped, and the whole thing is reformulated as a maximum-likelihood problem for the probability p under the logistic loss. But that was all a posteriori work. The original justification was simply to shoehorn a linear model (which was all the computing power of the day could handle) into a probability context.
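That MLE reformulation can be sketched in a few lines: gradient ascent on the log-likelihood, which is the same as descent on the logistic loss. The data, learning rate, and iteration count here are all hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data: the label tends to be 1 for larger x.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    p = sigmoid(b0 + b1 * x)
    # Average gradient of the log-likelihood
    # sum(y*log(p) + (1-y)*log(1-p)) w.r.t. (b0, b1)
    # is the mean of (y - p) * [1, x].
    b0 += lr * np.mean(y - p)
    b1 += lr * np.mean((y - p) * x)

probs = sigmoid(b0 + b1 * x)
print(np.round(probs, 2))  # low for the 0-labels, high for the 1-labels
```

One caveat: with perfectly separable data like this, the unregularized coefficients drift upward indefinitely as the fitted probabilities approach 0 and 1; the fixed iteration count is what keeps the sketch finite.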

By Daniel Morton PhD
