Today's "data engineer"
I get offers to apply for "data engineer" or equivalent positions almost weekly through LinkedIn, and of course I read them through... It seems that roles like "data engineer" or "machine learning engineer" vary a lot! (Let's leave architect and lead developer positions out so as not to make the subject even more complex.)
What do these roles mean? What kinds of skills do they require? And on the other hand, as an applicant, how do you know whether a certain position is interesting enough for you?
The problem is that these roles can mean anything. I mean ANYTHING:
- "Legacy" SQL/ETL developer with or without cloud knowledge (MS SSIS, Informatica, DW/db, Snowflake, Redshift)
- "Modern" software engineer with or without experience in cloud-native tools (microservices, etc.)
- "Legacy" big data engineer (Hadoop, Java, HBase, etc.)
- Data engineer with SQL knowledge and coding experience (Spark, Python/pandas, whatever)
- Data engineer with experience in cloud-native tools (AWS Glue, Azure Data Factory, etc.)
- Partly a DevOps engineer, with or without cloud experience (Ansible, Puppet, Helm charts, Kubernetes, Terraform/resource creation)
- Streaming data engineer with or without cloud-native experience (Kafka, Flink, AWS Kinesis, NoSQL databases, etc.)
- ML engineer with enough algorithm knowledge to support and productionize model inference, in real time or in batches, with or without cloud experience (the tech stack might include all of the above).
Then there are the hidden skills:
- Basic continuous integration/deployment standards and tools (Git, Jenkins)
- Software testing (unit, regression, etc.)
- Data validation/analysis (SQL-like skills)
- Communication skills
- Business understanding
In our current team we expect data engineers to handle a bit of each role and most of the hidden skills.
This means two things:
- Generalists are not enough.
- Single-technology specialists are not enough.
Of course, this makes recruitment really hard. I think the key to success is to hire a team with a good balance across these areas: we need developers who already know something well and are able to learn the rest, i.e. generalists with a specialist background. In our current team we have backgrounds such as ETL/Spark specialist, DevOps specialist, long-term software engineers, software engineers with an ML background, and Hadoop developer. But this works if and only if communication works; when it does, problems get solved much faster.
I think that once you have built solutions for all or most of these roles, you can consider yourself a good data engineer. I also think a person shouldn't start doing real-time analytics without knowing how to do it with batches or micro-batches. "Slower" usually also means "cheaper".
Now read the list of roles again. It looks like an ordered list, right? :)
It is a long way from ETL developer to ML engineer... and the gap keeps growing.
Process~Data~Analytics~Viz 🤸
I agree! It is no secret that many data analytics job postings mix up roles and requirements. Your list could be represented as a tree of roles with optional branches. Such a graph would help juniors, students and pros review their careers and skill sets. I bet recruiters would like it as well!
Cloud Architect and Sr. Developer | 9xAWS certified | Azure
Are you sure you want a highly proficient ML specialist mixed with DevOps? Some of the jobs you listed require at least a degree from a good university, and some... not so much. Sure, it is convenient to have a one-stop-shop specialist, but if you waste a professor-level specialist on configuring some VMs, then most likely next month s/he will be doing something advanced for your competition, plain and simple. Recruitment may be hard, but it is only the first step in building mutual trust, in the hope of arriving at a healthy working relationship as time passes. Once a person is hired, s/he is not "all yours". Instead, you test each other, and the employee tests you throughout that entire career step. Granted, a mix of similar-level skills can work well, and granted, a good specialist should have an idea of most of the pieces and be able to get things working on his/her own if really needed. But if you tip the balance and ask your seniors to spend a large portion of their time on certain technical tasks, you are sending them a message that you'd like to lose them. If a person has an inclination for a certain kind of activity, you want to keep him/her engaged in it.
Data and Business Advisor & Strategist | Solopreneur
Very good summary of the diversity of technologies within the field! Back in the old days it was simple: BI was ETL and reports, programming was C++/Java, and ML was done only in universities. Today everything is mixed, and keeping up with the development of technologies is a challenge for any techie.