Do you know any popular approaches for automatic power transforms?
Many #machinelearning algorithms perform better when numerical variables have a #Gaussian probability distribution. Some algorithms, such as #linearregression and #logisticregression, explicitly assume that the real-valued input variables follow a Gaussian distribution. Other nonlinear algorithms may not make this assumption, yet they often perform better when variables are Gaussian-distributed.
There are data preparation techniques that can be used to #transform each variable so that its distribution becomes Gaussian, or at least more Gaussian-like. These transforms are most effective when the data is already roughly Gaussian but is affected by skew or outliers. I know two popular approaches for such automatic #powertransforms; they are listed below, followed by a short code sketch:
1. #BoxCox Transform
2. #YeoJohnson Transform
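As an illustration, here is a minimal sketch using scikit-learn's PowerTransformer, which supports both methods and estimates the lambda parameter of the transform per feature by maximum likelihood. The skewed sample data below is made up for this example.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up, right-skewed, strictly positive sample data
rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=(1000, 1))

# Box-Cox transform: requires strictly positive inputs
boxcox = PowerTransformer(method="box-cox", standardize=True)
data_bc = boxcox.fit_transform(data)

# Yeo-Johnson transform: also handles zero and negative inputs
yeojohnson = PowerTransformer(method="yeo-johnson", standardize=True)
data_yj = yeojohnson.fit_transform(data)

# The lambda fitted for each feature
print("Box-Cox lambda:", boxcox.lambdas_)
print("Yeo-Johnson lambda:", yeojohnson.lambdas_)
```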
If you have any experience in this regard, I would be happy if you shared it with me.
It should be noted that, unlike the Box-Cox transform, the Yeo-Johnson transform does not require the values of each input variable to be strictly positive.
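To illustrate that point, here is a quick sketch with SciPy using made-up values that include zero and negatives: Yeo-Johnson transforms them, while Box-Cox rejects them.

```python
import numpy as np
from scipy.stats import boxcox, yeojohnson

# Made-up data containing zero and negative values
data = np.array([-3.0, -1.0, 0.0, 2.0, 5.0, 10.0])

# Yeo-Johnson works and returns the transformed data plus the fitted lambda
transformed, lmbda = yeojohnson(data)
print("Yeo-Johnson lambda:", lmbda)

# Box-Cox raises an error because the data is not strictly positive
try:
    boxcox(data)
except ValueError as exc:
    print("Box-Cox rejected the data:", exc)
```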