Don't underestimate Cloud Observability!

Are you running applications in a cloud-native environment? If so, you should be aware of the importance of Cloud Observability. You may have adopted the methods and tools for working cloud native. Your teams work with Scrum and take operational responsibility through a “You build it, you own it” approach. Usually, these teams implement automated tests and don’t ship critical or major bugs. But with a growing number of services, automated deployments, Infrastructure as Code (IaC), or a service mesh, the complexity of the platform increases. The number of configuration options is huge, and nearly every one of them can cause significant trouble that you don’t see but your customers do.

Cloud Observability allows you to monitor your live infrastructure, detect anomalies, and keep declarative monitoring rules stored as code. In this article, we'll explore why Cloud Observability is essential and how best to implement it. In general, Cloud Observability covers two parts: the infrastructure and the applications running on it.

Monitoring the live infrastructure is crucial for ensuring that systems function properly and for detecting anomalies. Testing and knowledge of familiar failure patterns can increase confidence, but they cannot cover every possible failure scenario. To be alerted to anomalies, active monitoring is necessary. There are many resources available on monitoring infrastructure in a cloud-native environment, including books and videos. The best approach is to have declarative monitoring rules stored as code and available in a self-service manner, while avoiding over-engineering the monitoring itself.
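To make "declarative monitoring rules stored as code" a bit more concrete, here is a minimal Python sketch. It is not tied to any particular monitoring product: the `Rule` class, the rule names, the metric names, and the thresholds are all hypothetical and only illustrate the idea that rules are plain data, kept in version control, and evaluated by a small generic function.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """A declarative monitoring rule (hypothetical structure for illustration)."""
    name: str
    metric: str
    threshold: float
    comparison: str  # "above" or "below"


# The rules are plain data: they can live in a repository, be reviewed in
# pull requests, and be extended by teams in a self-service manner.
RULES: List[Rule] = [
    Rule("high_error_rate", "error_rate", 0.05, "above"),
    Rule("low_disk_space", "disk_free_ratio", 0.10, "below"),
]


def evaluate(rules: List[Rule], metrics: Dict[str, float]) -> List[str]:
    """Return the names of all rules violated by the given metric values."""
    violated = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not reported; this sketch simply skips the rule
        if rule.comparison == "above" and value > rule.threshold:
            violated.append(rule.name)
        elif rule.comparison == "below" and value < rule.threshold:
            violated.append(rule.name)
    return violated


if __name__ == "__main__":
    current = {"error_rate": 0.08, "disk_free_ratio": 0.50}
    print(evaluate(RULES, current))
```

In practice, most monitoring tools bring their own rule format. The point of the sketch is only the separation between the rule definitions (data) and the engine that evaluates them.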

Monitoring and logging should be a baseline for running applications on the infrastructure, and their configuration should be declarative and stored as code. The logging system should consolidate logs based on metadata, while monitoring looks holistically at applications to support debugging and to verify desired states. Applications should run multiple instances to avoid single points of failure. Alerting is different from monitoring: it should be triggered based on the metrics and SLOs of the applications. If an application has 100 instances, a single unhealthy instance should not trigger an alert. Proper setup and ongoing adjustment are key.
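The difference between monitoring a single instance and alerting on the application as a whole can be sketched in a few lines of Python. The function name `should_alert` and the 95% target are assumptions chosen for illustration, not a recommendation for a specific SLO.

```python
def should_alert(healthy: int, total: int, slo_target: float = 0.95) -> bool:
    """Decide whether to alert, based on the share of healthy instances.

    slo_target is a hypothetical availability objective: the alert fires
    only when the healthy share drops below it, not for each failing instance.
    """
    if total <= 0:
        # No instances at all is treated as an outage in this sketch.
        return True
    return healthy / total < slo_target


if __name__ == "__main__":
    # 1 of 100 instances unhealthy: visible in monitoring, but no alert.
    print(should_alert(healthy=99, total=100))
    # 10 of 100 instances unhealthy: the objective is violated, so alert.
    print(should_alert(healthy=90, total=100))
```

The unhealthy instance still shows up in monitoring and can be replaced automatically. Only the violation of the application-level objective wakes someone up.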

Cloud Observability can be extended into the fields of audit reporting, financial operations (FinOps), and security. As with any method or tool, you should keep it simple (KISS principle). More important than a rich feature set is the reliability of your Cloud Observability implementation. The field of tools is already large: the CNCF landscape lists 145 observability tools. The cloud hyperscalers have built-in tools (e.g. the AWS CloudWatch ecosystem or Azure Monitor), which are more or less sophisticated. OpenTelemetry-based tool stacks (e.g. Jaeger) close some gaps. The full suite of features is provided by independent vendors such as Datadog, New Relic, or Instana. Which tool is the best, you may ask? As usual, it depends on the use cases you need to cover and on the balance of cost and value. (tarent can help you make a good decision for your individual use case.)

I hope this article was helpful to you. 👉 Follow or connect with me for more insights about #cloudnative, #technology, #metaverse, and #trends!

👇 Do you have any awesome tool suggestions? Leave a comment!

Daniel Clasen

Driving Digital Impact Through Technology, Product & Agile Synergy | Practice Lead Custom Software Solutions @ Qvest

2y

May I just add 'proper logging' and 'meaningful metrics' on the application layer as first go-to tools? Rumor has it that those might just pay off big-time in the end if done right in the first place. 😅 Fun aside, I heard some good stuff about Apptio and their products. 🔍 🙂 Also, LeanIX with their SaaS Management Platform could be interesting if we are talking about rather big numbers in terms of systems involved. 📈 😯
