Introduction to Data Science

Data science is the process of extracting value from data in all its forms. Through this process, raw data from disparate sources is compared and refined, and the useful portion is carried forward for further action.

The term covers the collective processes, scientific theories, technologies, analyses, knowledge bases, and tools involved.

Using this approach, data scientists apply machine learning algorithms to numbers, text, audio, video, images, and more to build artificial intelligence systems.

The overall process of data science starts with engineering the raw data, manipulating and cleansing it to make it valuable; a model built on this processed data is then validated and deployed.

Data Science Pipeline

  • Obtaining the data

Data science can’t do anything without data. There are some rules of thumb that must be taken into consideration while obtaining it.

First, identify all of the available datasets, whether from the internet or from external/internal databases, then extract the data into a usable format such as CSV, JSON, or XML.
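As a minimal sketch of this step, the snippet below reads the same records from CSV and JSON with pandas; the inline strings are hypothetical stand-ins for files pulled from a database export or the internet.

```python
import pandas as pd
from io import StringIO

# In practice these would be files on disk or API responses;
# small inline strings stand in for them here.
csv_data = StringIO("id,amount\n1,10\n2,25")
json_data = StringIO('[{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]')

# Extract each source into the same usable tabular format.
df_from_csv = pd.read_csv(csv_data)
df_from_json = pd.read_json(json_data)

print(df_from_csv.equals(df_from_json))
```

Whatever the original format, the goal is the same: land every source in one consistent tabular structure that the rest of the pipeline can consume.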

  • Scrubbing / Cleaning of data

This phase of the data science pipeline requires the most effort and time. Most of the time, data comes with its own anomalies.

So it becomes very important to do a cleanup exercise and keep only the information relevant to the problem at hand, because the results and output of your machine learning model are only as good as what you put into it.
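The cleanup exercise can be sketched with pandas as below; the tiny DataFrame and the specific anomalies (duplicates, missing values, an invalid negative) are illustrative assumptions, not data from the article.

```python
import pandas as pd
import numpy as np

# A small illustrative dataset with typical anomalies.
df = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", None, "Cara"],
    "amount": [10.0, 10.0, np.nan, 30.0, -5.0],
})

df = df.drop_duplicates()                                  # remove duplicate rows
df = df.dropna(subset=["customer"])                        # drop rows missing a key field
df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing values
df = df[df["amount"] >= 0]                                 # discard invalid negatives

print(len(df))  # rows remaining after cleanup
```

Which fixes apply, and in what order, depends on the problem asked; the point is that every row that survives should carry information the model can trust.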

  • Exploring / Visualizing the data will help us to find patterns or trends.

Now, during the exploration phase, we try to understand what patterns and values our data holds. We use different types of visualizations and statistical tests to back up our findings.

This is where we derive the hidden meaning behind our data through various graphs and analyses.
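A brief sketch of this exploration phase with pandas is shown below; the transactions data is a hypothetical example, and the histogram call is left as a comment since it needs a plotting backend.

```python
import pandas as pd

# Hypothetical transactions data for illustration.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "amount": [12.0, 7.5, 15.0, 9.0, 11.0],
})

# Summary statistics describe the shape of a numeric column.
print(df["amount"].describe())

# Grouped aggregates expose patterns across categories.
by_region = df.groupby("region")["amount"].mean()
print(by_region)

# A quick visual check, when a plotting backend is available:
# df["amount"].plot.hist(bins=5)
```

Even these two lines of analysis suggest a pattern worth testing, e.g. whether one region consistently spends more than another.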

  • Modeling the data will give us the predictive power

Machine learning models are generic tools: many algorithms are available, and they can be applied to different business goals. The better the features you use, the better your predictive power will be.

After cleaning the data and identifying the features most important for a given business problem, fitting a relevant model as a predictive tool will enhance the decision-making process.
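A minimal modeling sketch using scikit-learn is given below; the feature matrix and target are synthetic stand-ins (via `make_classification`) for real business data, and logistic regression is just one of the many generic tools the text mentions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic features and labels standing in for cleaned business data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out a test set so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out score, not the training score, is what indicates how much predictive power the model would add to a real decision.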

  • Interpreting our data

Interpreting the data means communicating your findings to the interested parties. If you can’t explain your findings to stakeholders, whatever you have done is of little use. Hence, this step is crucial.

The objective of this step is to first identify the business insight and then correlate it to your data findings.

