What is Data Science?
Data science is a field of study and practice that’s focused on obtaining insights from data
Practitioners of data science use programming skills, statistics knowledge, and machine learning techniques to mine large data sets for patterns that can be used to analyze the past or even predict the future.
Math Meets Programming: A Quick History
To get a better idea of what data science really is, it’s helpful to take a quick look at where it comes from. In many ways, data science is the result of a merger between two fields that have been around for decades: statistics and computer science.
Statisticians, of course, have been crunching numbers for centuries. But the dawn of computer science in the mid 20th century provided statisticians with a new tool for analyzing data faster than had previously been possible.
As early as the 1960s, statisticians like John W. Tukey were theorizing about how computers could revolutionize the field, but their impact at the time was minimal — they were simply too slow and too expensive. In the 1980s, the rise of personal computers made digital data collection possible, and companies started collecting what they could. By the 1990s, some were successfully making use of that data to design marketing strategies. Analyzing these new digital data sets data required both the statistics knowledge of a statistician and the programming skills of a computer scientist.
By the early 2000s, thanks in part to the advent of the internet, many companies had access to mountains of data. At the same time, computer processing power had advanced to the point that complex analyses of huge data sets was possible, and more advanced techniques like predictive analytics with machine learning were coming into reach.
Both business and academia began to recognize the value of having experts with the programming skills required to collect, manipulate, and analyze digital data and the statistics skills required to select the type of analysis needed to accurately answer questions and gain meaningful insights. “Data Science,” a term that had been around for decades by that point, became the mainstream phrase of choice to describe this confluence of skills.
What Do Data Scientists Do?
In day to day work, data scientists are often responsible for everything that happens to data, from collecting it all the way through analyzing it and reporting on the results. Although every data science job is different, here’s one way to visualize the data science workflow, with some examples of typical tasks a data scientist might perform at each step.
It works like this:
Capture data. For example: pulling the data from a company database, scraping it from a website, accessing an API, etc.
Manage data. For example: properly storing the data, and will almost always involve cleaning the data.
Exploratory Analysis. For example: performing different analyses and visualizing the data in various ways to look for patterns, questions, and opportunities for deeper study.
Final Analysis. For example: digging deeper into the data to answer specific business questions, and fine-tuning predictive models for the most accurate results.
Reporting. For example: presenting the results of analysis to management, which might include writing a report, producing visualizations, and making recommendations based on the results of analysis. Reporting might also mean plugging the results of analysis into a data product or dashboard so that other team members or clients can easily access it.
All of that said, what data scientists do from day to day can vary tremendously, in no small part because different companies make use of data science in different ways.
How Businesses Benefit From Hiring Data Scientists
At the highest level, data science is what allows companies to convert data into actual business value.
Consider, for example, a specialty ecommerce retailer. Such a company might get tens of thousands of pageviews every day, and hundreds of orders. With each pageview it can use automated tools to collect a large amount of data about who the visitors are and what actions they’ve taken on the site. And with each order, their sales system can easily collect a variety of data points about actual customers.
This data piles up quickly, however, and it doesn’t have any inherent value on its own. To get value from it, the company will need to analyze it, looking for patterns and insights that suggest future business strategies and tactics. The more actionable, forward-looking insight this data provides, the more value it has to the company.
Because there are many different types of data companies can collect, there are a wide variety of ways data scientists can add value. Here are just a few examples of how data science adds value at businesses across the globe:
Improving decision-making — data science gives management actionable intelligence that leaders can use to shape short- and long-term strategies.
Improving hiring — data science can help more objectively evaluate candidates and root out inefficiencies and biases.
Predicting the future — using machine learning algorithms, data scientists can find patterns in data that humans would not be able to, and forecast future results with a higher level of accuracy.
Improving targeting — data science can help companies find new target markets, better understand existing customers, and more accurately predict what customers want.
Identifying new opportunities — by exploring data and looking for patterns, data scientists can identify new business opportunities that might not otherwise be apparent.
Improving risk assessment — data science often makes it possible to “test” risky ideas by running the numbers before putting them into action, allowing companies to avoid potentially costly risks and mistakes.
Fostering data-first culture — a data scientist or data science team can help facilitate data-based decision-making in every team across the company by providing them with data tools like dashboards and the training necessary to understand them..