Flink Forward San Francisco Session Preview: Scaling a real-time streaming warehouse with Apache Flink, Parquet and Kubernetes
Authors: Ramesh Shanmugam & Aditi Verma
Flink Forward San Francisco is a couple of days away! In case you haven’t booked your tickets yet, here’s a sneak preview of our session, “Scaling a real-time streaming warehouse with Apache Flink, Parquet and Kubernetes”, on April 2, 2019, to give you more insight into what you can expect at the conference next week.
If you haven’t registered already, make sure to book your last-minute tickets while they last! Spots are limited, so hurry to secure your place at Flink Forward and learn more about the exciting world of Apache Flink!
Scaling a real-time streaming warehouse with Apache Flink, Parquet and Kubernetes
Background
Branch is the industry-leading mobile measurement and deep linking platform. For this, we process more than 20 billion events and store several terabytes of data per day.
In this talk, we cover what we learned running and scaling an Apache Flink Parquet warehouse on Kubernetes, including the challenges we faced around memory management and failure recovery. We also talk in detail about our current Apache Flink infrastructure and its recovery and auto-scaling mechanisms.
Topics covered
This talk gives a detailed overview of our challenges around writing columnar file formats with Flink. We also talk about the decisions we made and what we learned while migrating Flink jobs from Mesos to Kubernetes. Finally, we talk about auto-scaling Flink jobs on Kubernetes and handling failure scenarios efficiently.
Key takeaways
- Learnings from running Apache Flink clusters on Mesos and Kubernetes
- Takeaways from writing Parquet files with Apache Flink
Make sure to secure your spot by registering on the Flink Forward website today. The event includes multiple tracks, and it’s a unique opportunity to take your knowledge and stream processing expertise to the next level! Sessions cover, among other topics, Flink use cases, technology deep dives, and talks on the Apache Flink and stream processing ecosystem, so don’t miss out on the exciting conference schedule!
About the authors:
Aditi Verma
Aditi is a senior software engineer at Branch, working on developing and scaling the company’s data platform, which processes tens of billions of events per day. Prior to Branch, she worked at Yahoo, developing data systems that provide actionable insights and audience targeting from petabytes of data. She has a wide range of experience in the data domain, from stream and batch processing to resource management, scaling and monitoring.
Ramesh Shanmugam
Ramesh Shanmugam is a Senior Data Platform Engineer at Branch Metrics, where he currently builds streaming and batch pipelines at a huge scale using Apache Flink, Spark, and Airflow. He has been creating distributed applications for more than 15 years and is passionate about building data-intensive applications.