Data Science Advice That I Learnt the Hard Way

Some harsh lessons from my data science journey

Introduction

In my short time working on data science projects, I’ve learnt a few lessons that I think are worth sharing.

I didn’t learn these lessons as a data science student; some of them sound obvious, but they were never taught to me directly. Hopefully, by sharing them, I can save you from learning them the hard way!

Double and Triple Check Your Numbers

When doing anything in data science, it’s important to double and triple check your numbers.

Data scientists often have a lot of technical details to work with, so it can be easy to forget what’s important: extracting useful, valid, and actionable insight from data.

If the numbers in a data science project don’t make sense, every conclusion built on them is in doubt. And if this keeps happening at work, no one will take your analysis seriously anymore.

I’ve made this mistake at work even on very basic tasks, and honestly, it’s quite embarrassing. My main way of preventing it is to be more paranoid at each step.

I’d rather be overly paranoid than realise at the end that I’ll have to check everything from the beginning again.

If your calculations or formulae are complex, I’d advise making some notes or comments on how they’re calculated. This gives you something to fall back on if anything goes wrong, and my future self has always been thankful for it when someone asks me how I got a particular figure.
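
As a rough illustration of both habits (documenting how a figure is calculated, and checking it before reporting it), here is a minimal Python sketch. The metric, the function names, and the tolerance are hypothetical choices made for this example, not something taken from a specific project.

```python
import math
from typing import Mapping


def conversion_rate(orders: int, visits: int) -> float:
    """Share of visits that ended in an order.

    Formula: orders / visits
    Both inputs are assumed to be counts for the same period.
    """
    # Sanity checks: fail loudly instead of reporting a nonsense figure.
    if visits <= 0:
        raise ValueError("visits must be positive")
    if not 0 <= orders <= visits:
        raise ValueError("orders must be between 0 and visits")
    return orders / visits


def check_shares_sum_to_one(shares: Mapping[str, float], tol: float = 1e-9) -> None:
    """Raise if segment shares do not add up to 100%."""
    total = math.fsum(shares.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"shares sum to {total:.6f}, expected 1.0")
```

The docstring is the note to your future self, and the checks catch impossible values (more orders than visits, shares that don’t total 100%) at the step where they appear rather than at the end.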

Organise Your Projects and Code

I used to have a habit of diving headfirst into the data, which I think a lot of other data scientists can relate to. This can leave you with over-complicated code and a project that isn’t as well structured as it could be.

In the early stages of a project, this might be fine, but everything should be refined later. I’ve had plenty of projects where I had no clue what I’d done each time I revisited them. This wastes your time, and the time of anyone else who works on the same project.

The code should eventually be simplified with plenty of comments and notes to explicitly show what’s happening. If someone looks through your work, they should have a good idea of what you were trying to accomplish without too much difficulty.

The structure of your project also needs to be organised in a way that makes sense, especially if multiple people are working on the same project. There shouldn’t be a single folder where notebooks and CSV files are mixed together with random scripts. Keep each part of the project separated into its own folders as much as possible.
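
One possible way to start from a tidy layout is to create the folders up front. The sketch below is only an example: the folder names and the `create_project_skeleton` helper are assumptions made for illustration, and your team may prefer a different split.

```python
from pathlib import Path
from typing import List

# Hypothetical layout; rename the folders to match your own conventions.
PROJECT_FOLDERS = [
    "data/raw",        # original files, kept as received
    "data/processed",  # cleaned outputs produced by code
    "notebooks",       # exploration only
    "src",             # reusable functions and scripts
    "reports",         # figures and write-ups
]


def create_project_skeleton(root: str) -> List[Path]:
    """Create the folders above under `root` and return their paths."""
    created = []
    for folder in PROJECT_FOLDERS:
        path = Path(root) / folder
        # exist_ok=True makes the function safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_skeleton("my_project")` gives notebooks, CSV files, and scripts each their own home from the first day.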

For more serious projects, the code needs to be modularised. If there are functions that are often re-used, don’t leave them in a notebook: create a separate folder that stores these functions in a single location. This lets you and others import functions as needed, and it becomes easier to keep consistency if a function needs tweaking.
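
To make this concrete, here is a small sketch of a helper moved out of a notebook and into its own module. The file name `src/cleaning.py` and the function `standardise_column_names` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical file: src/cleaning.py
import re
from typing import List


def standardise_column_names(columns: List[str]) -> List[str]:
    """Lower-case each name and replace runs of non-alphanumeric
    characters with a single underscore."""
    cleaned = []
    for name in columns:
        name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
        # Drop underscores left over at either end, e.g. from "Total (GBP)".
        cleaned.append(name.strip("_"))
    return cleaned
```

Assuming the notebook is run from the project root, it can then use `from src.cleaning import standardise_column_names` instead of pasting the function into every notebook, so a tweak to the function only has to be made in one place.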

Question the Question

When there’s data available, it’s often the question you ask that determines what you’ll do with the data.

A lot of the time, if people know you’re a data scientist, they’ll want to ask a lot of data questions out of curiosity. However, not every question is worth taking the time to answer.

What I’ve learnt is that it’s always good to question the question. Why are they asking this? Is it worth looking into from a business perspective? Is it aligned with my own objectives?

It’s a good habit to get people to explain why it’s beneficial to look into something. If there’s no clear value compared to your other priorities, then it’s better to politely decline.

The reality is that most of a data scientist’s time is spent on data cleaning. It takes a long time just to get the data into the correct format to arrive at a single answer. If you say yes to too many things without questioning the question, you’ll most likely end up burdened with too many requests. I’ve struggled with this since the beginning, and I’ve gotten better at it by paying closer attention to my own priorities and objectives.

Conclusion

As we each make our way through our own data science journey, it’s inevitable that we’ll make mistakes along the way. Hopefully, by taking on board what I’ve learnt in my short time as a data scientist, you’ll improve much faster than I did.

More articles by Ayush Chauhan
