As the organizers of Flink Forward, we at Ververica take great pride in bringing together the Apache Flink® and streaming data communities. Every year, we appoint a Program Chair responsible for curating a diverse Program Committee. The committee members come from a wide range of industries, and each has extensive expertise in Apache Flink and streaming data technologies. They carefully evaluate talk submissions from the community to shape the Flink Forward program.
We’re proud to welcome Na Yang, Engineering Manager of Uber's Flink Platform team, as one of this year’s Flink Forward Program Committee members. Let's delve into her perspectives on the evolving landscape of real-time data analytics and the pivotal role Apache Flink plays within it.
My journey through MapR, PayPal, and Uber has covered batch processing, real-time messaging, and real-time data processing platforms. All of them try to solve the same type of problem: discovering and creating business value from data. No matter what technology customers adopt, their business goal remains the same. Good technology therefore needs to make it easy for customers to achieve their business goals quickly and reliably. That has shaped my perspective on real-time stream data processing. In today's world, how to quickly and easily extract the most valuable information from massive data and apply it to make people’s lives better has become a key consideration for many companies. Real-time data processing plays a crucial role in helping these companies reach that goal.
Sure, I’m looking for two key areas of interest. The first is simplified usage of stream processing tools, paired with user-friendly interfaces, so that beginners and non-programmers can easily use a stream processing system. The second is the unification of batch and real-time streaming in the industry. I’d like to see more successful unified batch and streaming use cases in large-scale data lake ingestion and other business areas.
Flink Forward provides an awesome knowledge-sharing opportunity for Flink developers from different companies to learn from each other. It helps boost innovation in real-time stream processing and speeds up new feature development, maturity, and production adoption. Flink Forward helps both my team and the broader community quickly build professional knowledge of real-time processing, and it gives us a good opportunity to contribute back to the continued maturing of real-time data processing.
What excites me the most is the chance to actively address the issues and challenges I encountered in real-time data processing, having worked in it firsthand. Having a seat at the table enables me to shape and influence the broader community.
Having a good observability tool and an auto-recovery system is crucial to managing real-time stream processing platforms at scale. A real-time stream processing job is usually a long-running application that must be highly reliable and resilient to failures. A good observability tool helps detect issues or failures in a timely manner and triggers auto-recovery so that processing continues without data loss. This is a “must-have” for real-time businesses like Uber. In addition, a dynamic resource allocation and auto-scaling system that gracefully handles traffic spikes without service disruption is also crucial to managing real-time stream processing platforms at scale.
Apache Flink is widely used at Uber to support a variety of real-time business challenges and use cases. Typical use cases include surge pricing for Uber rides, fraud and security attack detection, driver search and matching for Uber rides, and real-time advertising. From these implementations, the top 3 lessons learned were:
I think the key factors in making Apache Flink successful in enterprise environments include, but are not limited to, high reliability and scalability, a simple user interface, ease of operation and maintenance, and a mature ecosystem. Flink Forward promotes innovation in those areas to make Apache Flink more mature and a better fit for the enterprise environment.
A common pitfall of operating Flink or other stream processing platforms is failing to allocate resources properly or to tune memory configurations, which leads to jobs failing or running inefficiently. Another common pitfall is the noisy neighbor problem in a multi-tenant environment. Without good resource isolation, multi-tenancy can cause jobs to fail unexpectedly or run inefficiently.
Uber engineers aspire to integrate new Apache Flink features, strengthen its capabilities, and actively contribute to its improvement within the open-source community. Recently, my team made some contributions to Flink's native Protocol Buffers support. As the native Protocol Buffers support gains widespread adoption at Uber, we foresee opportunities to enhance its maturity through our contributions to its development. Through Flink Forward, Uber engineers will have the opportunity to learn about new Flink features and use cases that could potentially be adopted at Uber, creating opportunities for Uber engineers and fostering avenues for reciprocal contributions to open-source development. At the same time, they will share Uber's Flink use cases and insights, helping other companies adopt Flink's latest features.
I’d expect the future of real-time data processing to bring a powerful, easy-to-plug-in tool that can be used anywhere and by anyone, similar to iPhone adoption: accessible across various age groups and educational backgrounds. The Apache Flink community serves as a driving force behind the evolution of technology and product direction. This engagement further encourages enterprises to embrace and actively contribute to the development of new features and products. More activity and enthusiasm from enterprises in turn encourages customers to adopt new features and products.
Speaking at Flink Forward Berlin 2024 is a great way to connect with hundreds of your peers and broaden your knowledge of data streaming.
Be quick: the Call for Presentations closes at 11:59 pm (CEST) on May 17th, 2024.