Stream processing applications can experience downtime for a variety of reasons: a Kafka broker or another part of the infrastructure breaks down, an unexpected record (known as a poison pill) causes the processing logic to get stuck, or a poorly executed application upgrade has unintended consequences. Kafka Streams, Apache Kafka's native stream processing solution, has been run successfully with little or no downtime at many companies. This has been made possible by robustness features built into Streams over the years and by best practices that have evolved from many years of experience with production workloads. In this talk, I will present the solutions the community has found for making Streams robust, explain how to apply them to your workloads, and discuss the remaining challenges. Specifically, I will talk about standby tasks and rack-aware assignment, which can help when you lose a single node or a whole data center. I will also demonstrate how custom exception handlers and dead letter queues can make a pipeline more resilient to bad data. Finally, I will discuss options for evolving stream topologies safely.
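As a rough illustration of the features named above, the following sketch collects the Kafka Streams settings that typically enable standby tasks, rack-aware standby assignment, and a non-failing deserialization handler. It is a minimal sketch and not taken from the talk: the class name, application id, bootstrap address, and the `zone` tag value are hypothetical placeholders. The keys are written as plain strings so the example runs without the Kafka client on the classpath. Exact key names and availability depend on the Kafka version (tag-based rack awareness, for instance, is assumed to require a reasonably recent release).

```java
import java.util.Properties;

public class RobustStreamsConfig {

    // Builds an illustrative configuration; all values are placeholders.
    static Properties build() {
        Properties props = new Properties();

        // Hypothetical application id and broker address.
        props.setProperty("application.id", "robust-streams-demo");
        props.setProperty("bootstrap.servers", "localhost:9092");

        // Standby tasks: keep one warm replica of each state store on
        // another instance so failover does not start from an empty store.
        props.setProperty("num.standby.replicas", "1");

        // Rack-aware standby assignment: tag this instance with its zone
        // and ask the assignor to spread standbys across "zone" values.
        props.setProperty("client.tag.zone", "zone-a"); // hypothetical zone name
        props.setProperty("rack.aware.assignment.tags", "zone");

        // Poison pills: log and skip records that fail deserialization
        // instead of stopping the application. This is a built-in handler;
        // forwarding bad records to a dead letter queue would need a custom
        // handler, which is not shown here.
        props.setProperty(
            "default.deserialization.exception.handler",
            "org.apache.kafka.streams.errors.LogAndContinueExceptionHandler");

        return props;
    }

    public static void main(String[] args) {
        // Print the resulting settings for inspection.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application these properties would be passed to the `KafkaStreams` constructor together with the topology. Each instance would set its own `client.tag.zone` value, usually from the deployment environment.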