Operational Excellence - #1
Building high-quality software requires rigor and a firm commitment to excellence. I believe deeply in the importance of quality assurance (QA), and that belief drives my teams to adopt robust practices so that their products meet the highest standards before reaching customers. To keep the quality bar high, my teams rely on several mechanisms: a comprehensive workflow, multiple kinds of tests (unit and end-to-end), meticulous reviews (code, security, and performance), and a Correction Of Error (COE) process for bugs.
I have been working and iterating on this topic with my teams over the past few years, and we have come up with an approach that works for us. Let's look closer at the workflow we adhere to. It is divided into distinct steps, running from setup through preproduction to production.
At first sight, this workflow may appear complicated and time-consuming, but the investment pays off daily. Thanks to this commitment to quality, we are continuously improving our software while encountering fewer than one issue per week among thousands of active users. The workflow also helps us anticipate problems before they reach us.
Fixing issues is a major part of our workflow. Issues arrive from various sources (e.g. users and ticketing systems). To make sure each one is fixed and does not recur, we describe it meticulously, identify the scenarios and impacts, and schedule a meeting with the engineers involved to discuss the COE. The main questions are: What happened? Why? And how do we avoid this happening again? Our aim is to identify the root cause, create generic solutions, and permanently reduce the number of similar issues.
Identifying issues is just as important in our quest for quality, especially since we want to find and address them before our customers do. To achieve this, we run monthly BugBash sessions, where the entire team collaborates to “break” the application. We've found that this exercise fosters team cohesion while purposefully challenging our product's integrity. All major findings are prioritized and addressed in the following days, if not hours.
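To make the three COE questions concrete, here is a minimal sketch of how a COE entry could be represented and checked for completeness before an issue is closed. It is an illustration only, not a description of our actual tooling: the `CoeRecord` class, its field names, and the `missing_answers` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoeRecord:
    """Hypothetical record for one Correction Of Error (COE) discussion."""

    title: str
    source: str                      # where the issue came from, e.g. "user" or "ticketing system"
    what_happened: str = ""          # answer to "What happened?"
    why: str = ""                    # answer to "Why?" (the root cause)
    prevention: List[str] = field(default_factory=list)  # actions to avoid recurrence

    def missing_answers(self) -> List[str]:
        """Return the COE questions that have not been answered yet."""
        missing = []
        if not self.what_happened.strip():
            missing.append("What happened?")
        if not self.why.strip():
            missing.append("Why?")
        if not self.prevention:
            missing.append("How to avoid this happening again?")
        return missing

    def is_complete(self) -> bool:
        """A COE is complete once all three questions have an answer."""
        return not self.missing_answers()
```

With a structure like this, a COE meeting can end by checking `is_complete()`, so an issue is only closed once the root cause and at least one prevention action have been written down.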
In my next articles, I’ll share more details on each of the steps from setup to preproduction, and then on the remaining part, from preproduction to production.
Thanks for reading!