How can a data scientist bring value?

I often get asked this wonderful question by aspiring minds: “How can a data scientist bring value?” It often comes in a more explicit variant: “Isn’t data science quickly getting commoditised?”

Summing up my experience so far, the answer is like an onion.

Outermost layer of the onion: You are given a standard problem. You use off-the-shelf tools to churn out standard analytics and run-of-the-mill reports with minimal coding or algorithmic intervention. This layer is already commoditised, and as a DS you add little value if your work is restricted to this layer. That said, this isn’t a bad place to start your career as a DS.

Second layer: You are given a unique data-science problem along with the data necessary to solve it. You unleash your creativity and build a solution for it through iterative experimentation, a good dose of learning, common sense, perseverance and disciplined coding. You may draw upon your existing technical skills, or you may pick up new ones along the way. The end result may be an algorithm that goes into a product, or an insight that forms the core of a sought-after report. The core value you bring is through deep technical experiments in developing the algorithm, extracting the valuable insights, or both. The majority of data science projects fall into this category. Upwork, Elance, Experfy and their kind have begun commoditising this layer in the last few years.

Third layer: You are given a business problem, or, often, you discover a business problem yourself after talking with customers and domain experts. You translate the business problem into a DS problem. You gather, clean, format and prepare the data (this is almost always the longest step in the process). You build a solution for the problem using data science wizardry. You translate the solution / insights back into the business context. You communicate the solution to the right people in the right way, which helps your solution go live. You follow through until it delivers the impact it was designed to deliver to the end user. In this layer, in addition to your tech wizardry (as in Layer 2), you draw upon your domain knowledge, communication skills, attention to detail and discipline. The core value you bring here is in translating from and to the business context, data munging, and seeing the solution through to the end. This layer isn’t at risk of getting commoditised any time soon.

Fourth layer: You may have been given a business problem, or you may discover it yourself. You may have gotten shining clean data on a silver platter, or you may toil through the data gathering and preparation exercise as usual. You may know very well the data science techniques that need to be applied, or you may pick them up on the fly.

Those are mere details.

What makes this layer most valuable is the fact that the insights / outcomes of your work clash in powerful ways with your stakeholders' expectations / world view. Examples: Your company culture takes pride in doing things a certain way, but your experiments show conclusively that an alternative is more beneficial to the company. Your client has plotted its manufacturing strategy while implicitly betting on the success of material/process A, but looking at the data, you find that alternative B works much better. You are a statistical auditor hired by a Big Pharma company to provide an external evaluation of the effectiveness of a new pill it has invented, but when you crunch the numbers, it proves no more effective than a placebo. You are told to quantify the impact of a proposed acquisition that the CEO is very gung-ho about, but your forecast indicates that the acquisition may be a bad move.

The opportunity for you lies not only in communicating the troublesome findings, but in ensuring that they are embraced by the people at the helm and result in appropriate action. To achieve that, you need to draw not only upon your technical and communication skills, but also upon your courage, compassion, perseverance and faith in your ability to bring good: the very qualities that make us human. Here you deliver your value more through emotional labor and less through technical / skilled labor. Also, while most people and companies want to be data driven, they are usually driven more by the stories crafted around the data than by the data itself. If you just show the data, smart people will quickly craft a story around it that fits their existing world view. So you need to tell the right stories at the right time and in the right way to drive the message home. Of course this is art, not science, and of course you may often fail to drive the message home. The inherent conflict, friction, uncertainty, risk and emotional labor are what make this opportunity the least pursued, and yet this layer the most valuable. I will assert that this layer is *never* getting commoditised, at least not until robots have the same EQ as humans.

Raja Sengupta

People Analytics Change Leader

8y

Definitely, Kalpit. Energy disaggregation and temperature monitoring are prime candidates for near-complete automation or productisation (as you put it). However, the extent of possible automation is domain-specific, don't you feel? The explicitly available data can't always account for the required variance. Additional feature engineering can help so much in mapping noise. Cross-validation needs to be rigorous and extremely robust here, which is critical for automation; otherwise these systems are at best semi-automated support systems. The ongoing challenges with driverless cars, and with fly-by-wire autopilots attempting to oust human pilots from the cockpit, are key examples of this problem. As you noted, the best decisions are often based on "relevant" statistical hypothesis testing of representative data, under the guidance of a process domain expert. Coming from an industrial statistics background, this is something I have experienced closely. Of course, there is no template here, and the optimal approach is completely requirement-specific.

PMML is a very effective tool when used as a documentation medium for KDD workflows. Additionally, innovative measures, such as manually replacing the PMML tags with "SQL-like" tags to superimpose the data treatment not yet covered by PMML, and combining this with a descriptive statistics report of the treated dataset, make it into a good knowledge repository. Again, the customised knowledge repository can itself be subjected to computational-linguistics classification, turning it into an effective ontology (a productivity tool, in essence).

However, as a standard exchange medium, I completely agree: PMML has a long way to go before it can be of practical relevance in the production environment. There are many problems (and severe limitations) here. IBM SPSS Modeler has been the leader and a pioneer in the PMML space, and lately KNIME among open-source analytics platforms. PMML is more oriented towards component-based workflow environments, and it continues to evolve.

Raja Sengupta

People Analytics Change Leader

9y

Thanks for the comprehensive article, Kalpit. I have a feeling that, with the humongous number of onions peeled over the last half a decade or so, there is now a sufficiently large (and ever-increasing) repository of documented ontologies. The PMML standard is a good example (though, of course, with a lot of shortcomings as yet!). In the coming days this should make a typical analytics project life cycle much shorter. On the other hand, it might also reduce the intake of data science professionals in absolute numbers. Shouldn't it imply a logical shift in focus from data science to decision sciences, which I think is what your third layer is about? Dreamweaver (along with thousands of preconfigured third-party templates) literally halved the demand for HTML and JavaScript programmers in the early part of this century. It might be an unusual thought, but I also feel that quantitative economics as a discipline hasn't penetrated the data science field as much as it ideally should have. I feel it is one of the key reasons for the disproportionate number of failures among data-science-enabled start-ups, particularly in India.

Deepak Bandyopadhyay

Data Science Group Leader | Experience across R&D pipeline | Innovator

9y

Very good article, and thanks for these insights, they will be valuable to those considering this field!

Prashant Sachdev

Co-Founder @ NextBurb, Reforge & GrowthX Community Member, Angel Investor @ Startups (US & India)

9y

Very interesting insights and clarification. The art of Data Science will be a very valuable skill for serving many businesses in the coming decade.



More articles by Dr Kalpit Desai

  • Choosing to become remarkably different

    This post from Steve Dennis struck a chord. An excerpt: In the inevitable battle between denial, defending the status…

