Today's "data engineer"

I get offers to apply for "data engineer" or equivalent positions almost weekly through LinkedIn, and of course I read through them. It seems that the roles "data engineer" and "machine learning engineer" vary a lot! (Let's leave out architect and lead developer positions, which would make the subject even more complex.)

What do these roles mean? What kind of skills do they require? And, as an applicant, how do you know whether a certain position is interesting enough for you?

The problem is that these roles can mean anything. I mean ANYTHING:

  • "Legacy" SQL/ETL developer with or without cloud knowledge (MS SSIS, Informatica, DW/DB, Snowflake, Redshift)
  • "Modern" software engineer with or without experience of cloud-native tools (microservices, etc.)
  • "Legacy" big data engineer (Hadoop, Java, HBase, etc.)
  • Data engineer with SQL knowledge and coding experience (Spark, Python/Pandas, whatever)
  • Data engineer with experience of cloud-native tools (AWS Glue, Azure Data Factory, etc.)
  • Part-DevOps engineer with or without cloud experience (Ansible, Puppet, Helm charts, Kubernetes, Terraform/resource creation)
  • Streaming data engineer with or without cloud-native experience (Kafka, Flink, AWS Kinesis, NoSQL databases, etc.)
  • ML engineer with enough algorithm knowledge to support and productionize model inference, in real time or in batches, with or without cloud experience (the tech might consist of all of the above)

Then there are the hidden skills:

  • Basic continuous integration/deployment standards and tools (Git, Jenkins)
  • Software testing (unit, regression, etc.)
  • Data validation and analysis (SQL-like skills)
  • Communication skills
  • Business understanding

In our current team we expect data engineers to handle something from each role and most of the hidden skills. 

This means two things:

  • Generalists are not enough.
  • Single-technology specialists are not enough.

Of course, this makes recruitment really hard. I think the key to success is to hire a team with a good balance between these areas: we need developers who already know something well and are able to learn the rest, i.e. generalists with a specialist background. Our current team has backgrounds such as ETL/Spark specialist, DevOps specialist, long-term software engineers, software engineers with an ML background, and Hadoop developer. But this works if and only if communication works. When it does, problems get solved much faster.

I think that once you have built solutions for all or most of these roles, you can consider yourself a good data engineer. I also think that nobody should start doing real-time analytics without knowing how to do it with batches or micro-batches. "Slower" mostly also means "cheaper".

Now read the list of roles again. It looks like an ordered list, right? :)

It is a long way from ETL developer to ML engineer... and the gap gets bigger all the time.


Timo I.

Process~Data~Analytics~Viz 🤸

5y

I agree! It is no secret that many data analytics job postings have mixed-up roles and requirements. Your list could be represented as a tree of roles with optional branches. Such a graph would help juniors, students, and pros review their careers and skill sets. I bet that recruiters would like it as well!

Askar Ibragimov

Cloud Architect and Sr. Developer | 9xAWS certified | Azure

5y

Are you sure that you want a massively proficient ML specialist mixed with DevOps? Some jobs you listed require at least a diploma from a good university and some are... not so much. Sure, it is convenient to have a one-stop shop of a specialist, but if you waste a professor-level specialist on configuring some VMs... then most likely next month he will be doing something advanced for your competition, plain and simple. Recruitment may be hard. But it is a first step in building trust in each other, in the hope of arriving at a healthy working relationship as time passes. When a person is hired, s/he is not "all yours". Instead, you test each other, and the employee tests you throughout that entire career step. Granted, some similar-level skills can be a great mix, and granted, a good specialist should have an idea of most of the bits and be able to get things working on his/her own if really needed. But if you are tweaking the balance and asking your seniors to spend a massive portion of their time on certain technical tasks, you are sending them a message that you'd like to lose them. If a person has an inclination for a certain kind of activity, you'd want to keep him/her engaged in it.

Markus Ansamaa

Data and Business Advisor & Strategist | Solopreneur

5y

Very good summary of the diversity of technologies within the field! Back in the old days it was simple: BI was ETLs and reports, programming was C++/Java, and ML was done in universities only. Today everything is mixed, and keeping up with the development of technologies is a challenge for any techie.


Article by Olli Salakari