Unbearable Lightness of Being a Data Scientist

"In general, we look for a new law by the following process. First, we guess it (audience laughter), no, don’t laugh, that’s really true. Then we compute the consequences of the guess, to see what, if this is right, if this law we guess is right, to see what it would imply and then we compare the computation results to nature, or we say compare to experiment or experience, compare it directly with observations to see if it works. If it disagrees with experiment, it’s wrong. In that simple statement is the key to science. It doesn’t make any difference how beautiful your guess is; it doesn’t matter how smart you are who made a guess, or what his name is… If it disagrees with experiment, it’s wrong. That’s all there is to it.”

I first came across this definition as a first-year economics and political science student. Since then, far more often than I would have wanted, I have been drawn into rather pointless philosophical arguments about whether this or that discipline is a science. However, this definition of the scientific method, provided by Richard Feynman, has always served me as a good guideline to sanity through the many different situations I have encountered.

While this description is right, applying it outside an academic environment is often a problem. With the rise of the digital world, marketing and digital products are living through a scientific revolution similar in some ways to the one financial services went through about 20 years ago. The quant-jock methods brought to financial trading by PhDs in physics and mathematics are now arriving in the marketing world. Having spent about 20 years of my professional career in different internet startups, I have been lucky enough to see the progress from the first computer applications to advanced machine learning algorithms, and along with it the arrival into Data Science of new disciplines: Applied Mathematics, Computer Science and Particle Physics, to name just a few. The change I have seen through these years is breathtaking. Data Science has made a huge leap from being merely one of the areas of econometrics and statistics to a discipline spanning several scientific branches, with numerous applications and billions of dollars in revenue. However, as one well-known biological mutation has said, with great power comes great responsibility, and for me one of the areas where scientists must show a clear and distinct pattern of behaviour is the application of the scientific method.

So where does the problem lie with applying something as precise as the scientific method in an environment far removed from the academic lab or library? First of all, we always have to remember that we are scientists and that our work is governed by the definition of the scientific method. In practice and real life, that means a few rules to follow.

There are no shallow questions. When a stakeholder approaches you with an issue that seems shallow, remember that this person isn't a scientist and doesn't pretend to be one. They do, however, have the right to a guess, which you as a scientist must be able to prove or disprove through a carefully designed experiment (a minimal sketch of such a test follows below). Everything should be written down most carefully and added to the specification of the test. Quite often your stakeholders won't write a proper spec for what should be done; as a scientist, you can write it yourself and then send an email asking whether that is what they meant. Data scientists often complain that industry demands quick answers from them, which contradicts the concept of academic research. On the other side, stakeholders complain that data scientists take too much time to produce a simple answer. The truth is always somewhere in the middle and, as I like to say, the balance is the art. From my personal experience, things are much simpler than they are portrayed, and what looks like procrastination is quite often just a poorly communicated estimate of when the work will be done. Stakeholders are ready to wait for the 'right' answer, but they want to know when it will arrive. Equally, scientists must be given the opportunity to learn and apply new methods if they believe those methods might bring better results, reduce uncertainty, or help obtain answers faster in the future. Stakeholders have to remember that if scientists don't learn anything they wither like roses without water, and then again, if there is nothing left to learn, one doesn't need a scientist anymore. In any case, this process must be communicated and mutually agreed in terms of time and resources.
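To make the "carefully designed experiment" concrete, here is a minimal sketch in Python of how a stakeholder's guess might be turned into a testable hypothesis. The scenario, the conversion figures, the function name and the 5% significance level are all hypothetical assumptions of mine, not taken from any real project; the point is only that the guess is written down as a hypothesis, the decision rule is fixed in advance, and the data then confirms or refutes it.

```python
# A minimal sketch of turning a stakeholder's guess into a testable experiment.
# All names and numbers here are hypothetical, for illustration only.
from scipy.stats import norm

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0: no difference
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                      # two-sided p-value
    return z, p_value

# Hypothetical spec: "the new landing page converts better than the old one".
# H0: the two conversion rates are equal; reject at alpha = 0.05.
z, p = two_proportion_ztest(conv_a=310, n_a=10_000, conv_b=370, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("Guess supported" if p < 0.05 else "No evidence against H0 at the 5% level")
```

Writing even a small test in this form doubles as the spec you send back to the stakeholder: it states what is being compared, on how much data, and what counts as a positive result, before anyone looks at the answer.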

Quite often, data scientists complain that they are treated as data monkeys and that all that is required of them is to fetch data. That is a matter that depends solely on you as a scientist. Whenever you are approached with a request to obtain this or that dataset, try to find out what the data is needed for. First, you might suggest a different dataset; second, the discussion might raise important points that alter the request; and third, you can accompany the dataset with a spec and a conclusion of your own. You are a scientist and you are paid to provide analysis, so do it: don't just send out a dataset and then complain that you have been treated as a data monkey. The other important thing: no matter what kind of research you have done, even the simplest, even if it is only one slide or one email, it must always bear your name and carry a conclusion. Always. You are a scientist, and you have to take the credit, or the stick, for what you do. By submitting a slide with a chart describing this or that dataset without a conclusion and your name on it, you are treating yourself as a data monkey. The rules are simple, and they all come from the definition of the scientific method; you don't need to break your head over it or reinvent the wheel. It is all there. Just follow it.
