Avoiding strategic mistakes in Data Science

One of the most important aspects of any Data Science project is avoiding mistakes. Most data science work involves coding and developing algorithms, often deploying very complex algorithms and delivering product or service value through them.

One of the best ways to minimize the probability of mistakes is to think strategically about avoiding them. Here are some strategies I am sharing on how to avoid mistakes in data science, tested through my own experience.

  1. No matter how interesting the Data Science part of a project may be to us, it generally has to be integrated with the non-data-science parts. Most data scientists focus only on the data science; integrating it with the rest of the project can often be a challenge, so understanding both sides can be a big competitive advantage for a data scientist.
  2. One of the most important aspects of Data Science is evaluating uncertainty, which may originate from many sources. Conventional wisdom says the most important metrics to report around uncertainty are confidence and prediction intervals, but what many don't take into account are latent variables, and these should be kept in mind. Another aspect I advise being careful with is the data sample: no matter how large or well sampled the dataset, the full population is almost never observed, and there will always be some uncertainty around that fact. Take all of these sources into account when evaluating uncertainty. Probability theory and the study of stochastic processes tell us that there will generally be more uncertainty than is estimated by simple procedures such as calculating parametric confidence intervals.
  3. Double-check or multiple-check your code, depending on the time available. Strategically, it is best to plan extra time and reserve specific resources for these checks. Especially in large projects, the probability of typos in the code increases, so make sure to recheck for them. In larger Data Science teams, a good strategy is also to have a specific role dedicated just to rechecking code.
  4. When testing each part of the code, functional segments should be tested in real-world conditions. Most data scientists focus first on EDA, developing the model, model metrics and so on, but in my opinion the focus should be on functionality first. Why? Because functionality tests resemble empirical, real-world tests, and this is important. There are many situations where data science models have great metrics but take too long to train, require more resources than are available, or don't integrate well with the other moving parts of the project. It's not just about the model; it's about how well the model can produce empirical results.
  5. Theory is great, but bringing real value with data science models is even better. A Data Scientist should always keep in mind that any project, even the most data-driven one, has overall goals with different kinds of value: research, business, financial and so on. Having the big picture in mind is very important, because producing the final, real-world value of a data science model is what counts.
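To make point 2 concrete, here is a minimal sketch of comparing a parametric confidence interval for a mean against a bootstrap one. The data here is hypothetical (a skewed lognormal sample drawn for illustration); the point is simply that a resampling-based interval can differ from the normal-approximation interval, especially on skewed data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sample: 200 observations from a skewed distribution
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Parametric 95% CI for the mean (normal approximation)
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))
parametric_ci = (mean - 1.96 * se, mean + 1.96 * se)

# Bootstrap 95% CI: resample with replacement, take percentiles of the means
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])
bootstrap_ci = (np.percentile(boot_means, 2.5),
                np.percentile(boot_means, 97.5))

print(f"Parametric CI: {parametric_ci}")
print(f"Bootstrap  CI: {bootstrap_ci}")
```

On skewed data like this, the bootstrap interval is typically asymmetric around the mean, which the parametric interval cannot capture; and neither accounts for latent variables or sampling gaps, which is exactly why reported uncertainty should be treated as a lower bound.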
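Point 4, functionality-first testing, can be as simple as wrapping a training routine in a wall-clock budget check before ever looking at metrics. The sketch below uses a hypothetical `check_fit_budget` helper and a toy `toy_fit` stand-in for a real training function; in practice you would pass your actual model-fitting call.

```python
import time

def check_fit_budget(fit_fn, X, y, max_seconds=60.0):
    """Run a training function and fail fast if it exceeds a wall-clock budget."""
    start = time.perf_counter()
    model = fit_fn(X, y)
    elapsed = time.perf_counter() - start
    if elapsed > max_seconds:
        raise RuntimeError(
            f"Training took {elapsed:.1f}s, over the {max_seconds:.0f}s budget"
        )
    return model, elapsed

# Hypothetical stand-in for a real training routine:
# the "model" is just the mean of y
def toy_fit(X, y):
    return sum(y) / len(y)

model, elapsed = check_fit_budget(toy_fit, X=[[0], [1]], y=[0.0, 1.0],
                                  max_seconds=5.0)
print(model, round(elapsed, 3))
```

A model with great offline metrics that blows this kind of budget, or exceeds the memory available in production, fails the empirical test that actually matters for the project.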




More articles by Darko Medin
