This paper uses a case study, product sales estimation on real-time data, to examine the applicability
of linear and non-linear models in machine learning and data mining. We take a systematic approach
to the problem of estimating sales for a particular set of products in multiple categories,
applying both linear and non-linear machine learning techniques to
a data set of features selected from the original data set. Feature selection reduces the
dimensionality of the data set by excluding those features that contribute minimally to the prediction of the
dependent variable. The models are then trained using multiple techniques from the linear and
non-linear domains, each among the best in its respective area. Data Remodeling
is then applied to extract new features from the data set by changing its structure,
and the performance of the models is checked again. Data Remodeling often plays a crucial
role in boosting classifier accuracy by changing the properties of the given data set. We then
analyze the reasons why one model performs better than the other and thereby
develop an understanding of the applicability of linear and non-linear machine learning models.
While developing this understanding is our primary goal, we also aim to find the classifier with the best possible
accuracy for product sales estimation in the given scenario.
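
The workflow described above (feature selection, training a linear and a non-linear model, then re-checking performance after remodeling the data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the correlation threshold, the choice of least squares and k-nearest-neighbour regression as the linear and non-linear models, and the squared feature used as a stand-in for the Data Remodeling step.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def select_features(rows, target, threshold=0.3):
    """Keep column indices whose |correlation| with the target reaches the threshold."""
    return [j for j in range(len(rows[0]))
            if abs(pearson([r[j] for r in rows], target)) >= threshold]

def fit_linear(xs, ys):
    """Closed-form least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def rmse(model, xs, ys):
    """Root-mean-square error of a fitted model on held-out data."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Synthetic data (hypothetical, not from the paper): column 0 drives the
# target non-linearly, column 1 is an uninformative alternating signal.
rows = [(i / 10, (-1) ** i) for i in range(100)]
sales = [x ** 2 for x, _ in rows]

# Hold out every fifth row for testing.
test = list(range(0, 100, 5))
train = [i for i in range(100) if i % 5 != 0]

# Step 1: feature selection on the training rows only.
selected = select_features([rows[i] for i in train], [sales[i] for i in train])
col = selected[0]
x_train = [rows[i][col] for i in train]
y_train = [sales[i] for i in train]
x_test = [rows[i][col] for i in test]
y_test = [sales[i] for i in test]

# Step 2: train one linear and one non-linear model on the selected feature.
linear_rmse = rmse(fit_linear(x_train, y_train), x_test, y_test)
knn_rmse = rmse(fit_knn(x_train, y_train), x_test, y_test)

# Step 3: "remodel" the data by deriving a squared feature, then refit the linear model.
remodeled = fit_linear([x ** 2 for x in x_train], y_train)
remodeled_rmse = rmse(remodeled, [x ** 2 for x in x_test], y_test)

print("selected columns:", selected)
print("linear RMSE:", linear_rmse)
print("k-NN RMSE:", knn_rmse)
print("linear RMSE after remodeling:", remodeled_rmse)
```

On this toy data the non-linear model has a lower error than the linear one on the raw feature, and the linear model catches up once the derived feature exposes the underlying relationship. Real sales data may of course behave differently.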