Do you know popular approaches for automatic power transforms?

Many #machinelearning algorithms perform better when numerical variables have a #Gaussian probability distribution. Some, such as #linearregression, are commonly described as assuming Gaussian-distributed variables (strictly speaking, the classical assumption concerns the residuals rather than the inputs), and models like #logisticregression often benefit from less skewed inputs as well. Other nonlinear algorithms do not make this assumption, yet often perform better when variables have a Gaussian-like distribution.

There are data preparation techniques that can be used to #transform each variable to make its distribution Gaussian or, if not Gaussian, at least more Gaussian-like. These transforms are most effective when the data distribution is nearly Gaussian to begin with but is skewed or contains outliers. They are "automatic" in the sense that the power parameter (usually called lambda) is estimated from the data rather than chosen by hand. I know of two popular approaches for such automatic #powertransforms:

1.    #BoxCox Transform

2.    #YeoJohnson Transform
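
To make the "automatic" part concrete, here is a minimal sketch of Box-Cox in plain Python. It is not a production implementation: the function names, the grid of candidate lambdas, and the synthetic log-normal data are my own assumptions for illustration. The idea is to pick the lambda that maximizes the Gaussian profile log-likelihood of the transformed values.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return math.log(x)          # limiting case lambda = 0
    return (x ** lam - 1.0) / lam


def box_cox_log_likelihood(data, lam):
    """Profile log-likelihood of lambda (up to a constant) under a Gaussian model."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n   # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in data)


def fit_lambda(data, grid=None):
    """Pick lambda by a simple grid search (hypothetical grid: -2.00 to 2.00)."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(data, lam))


def skewness(values):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed data (assumption: log-normal, so lambda near 0 is expected).
random.seed(0)
data = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]

best_lam = fit_lambda(data)
transformed = [box_cox(x, best_lam) for x in data]

print("estimated lambda:", best_lam)
print("skewness before:", round(skewness(data), 3))
print("skewness after: ", round(skewness(transformed), 3))
```

In practice I would reach for a library rather than a hand-written grid search; scikit-learn's `PowerTransformer`, for example, offers both methods and estimates lambda for each feature.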

If you have any experience with these transforms, I would be happy to hear about it.

Arash Bakhtiary

Data Science | Applied AI | Machine Learning | LLMs & Generative AI | Expert in Python, MLOps, Docker and Kubernetes

2y

Note that, unlike the Box-Cox transform, the Yeo-Johnson transform does not require the values of each input variable to be strictly positive; it also handles zero and negative values.
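
A small sketch of that difference, in plain Python with a fixed lambda. The function names and sample values are only illustrative assumptions; a real pipeline would estimate lambda from the data.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform; only defined for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; defined for any real x."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log(x + 1.0)                 # lambda = 0 branch
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log(-x + 1.0)                   # lambda = 2 branch
    return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


values = [-2.0, 0.0, 3.0]   # hypothetical sample with negative and zero entries
lam = 0.5                   # fixed here for illustration only

# Yeo-Johnson accepts every value in the sample.
print("Yeo-Johnson:", [round(yeo_johnson(v, lam), 4) for v in values])

# Box-Cox rejects the non-positive entries.
for v in values:
    try:
        print("Box-Cox:", v, "->", round(box_cox(v, lam), 4))
    except ValueError as err:
        print("Box-Cox:", v, "-> error:", err)
```

A common workaround for Box-Cox is to shift the data so that all values are positive, but the shift is one more parameter to choose, which is part of why Yeo-Johnson is convenient.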
