This article briefly describes how we handled the transit of externally generated customer events to our internal Kafka cluster, and how we built a reliable failover system using Flume-NG.
We have worked with Apache Flume since its 0.9 release, because it fit our need well: landing internet events in a first, dumb bucket before they get processed.
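As a rough illustration of the pattern described above, here is a minimal Flume-NG agent configuration sketch with a failover sink group: events land on an HTTP source, are normally forwarded to Kafka, and fall back to HDFS if the Kafka sink fails. All names, hosts, ports, and paths below are hypothetical placeholders, not our actual production configuration.

```properties
# Hypothetical agent: one source, one channel, two sinks in a failover group
agent1.sources = http-src
agent1.channels = mem-ch
agent1.sinks = kafka-sink hdfs-sink
agent1.sinkgroups = failover-group

# HTTP source receiving external events (port is an assumption)
agent1.sources.http-src.type = http
agent1.sources.http-src.port = 8080
agent1.sources.http-src.channels = mem-ch

# In-memory channel buffering events between source and sinks
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# Primary sink: forward events to the internal Kafka cluster
agent1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.kafka.bootstrap.servers = kafka1:9092
agent1.sinks.kafka-sink.kafka.topic = customer-events
agent1.sinks.kafka-sink.channel = mem-ch

# Fallback sink: dump events to HDFS when Kafka is unreachable
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /flume/failover/events
agent1.sinks.hdfs-sink.channel = mem-ch

# Failover sink processor: higher priority sink is tried first
agent1.sinkgroups.failover-group.sinks = kafka-sink hdfs-sink
agent1.sinkgroups.failover-group.processor.type = failover
agent1.sinkgroups.failover-group.processor.priority.kafka-sink = 10
agent1.sinkgroups.failover-group.processor.priority.hdfs-sink = 5
```

With the failover processor, Flume routes every event to the highest-priority healthy sink, so the HDFS sink only drains the channel while the Kafka sink is marked as failed.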
At the beginning of this story we clustered our Spark applications using Apache YARN as our main resource manager. At that time we considered a RM to be little more than a "Spark extension", basically used to optimize Spark processes. Our usage of YARN never went beyond deploying those applications and monitoring them in a web browser by typing something like http://yarn.url:4040/<spark_app>.