What are the benefits of stream processing with Apache Flink for modern application development?
This blog post is a Q&A session with Vino Yang, Senior Engineer at Tencent’s Big Data team. Below, we discuss the benefits of adopting stream processing and Apache Flink for modern application development.
Q: Tell us a bit about you and your role at Tencent?
Vino: I am a senior engineer on Tencent's big data team. I currently work on developing and maintaining the Flink engine underneath Oceanus, Tencent's real-time stream computing platform. I have helped expand the adoption of Flink within Tencent from the very early days to the current setup, which processes nearly 20 trillion events per day. I am a long-time active contributor to the Flink project and one of Flink's early evangelists in China.
Q: When did you start working with Flink? How did you discover the framework and what was your first impression of Flink as a stream processing engine?
Vino: I started researching Flink in early 2016. I first discovered the framework through an article mentioning that Flink had been promoted to an Apache top-level project. At the time, my impression was that Flink was technologically advanced compared to other stream processing engines.
Q: What do you think about Apache Flink as a stream processing framework nowadays? Where do you see Flink going in the next few years?
Vino: I think that in the domain of stream computing, Flink is still ahead of any other framework, and it is still the first choice. As the community continues to grow and contribute new features, I can see Flink unifying streaming and batch processing and improving its domain libraries for graph computing, machine learning and so on. At the same time, provided that Flink remains connected to the wider ecosystem and to other frameworks and programming languages, its prospects are very good.
Q: What is your favourite Apache Flink feature? How do you use it or have used it in the past?
Vino: My favourite Flink feature is its guarantee of correctness. This covers event-time semantics, checkpoint alignment, the ABS (asynchronous barrier snapshotting) checkpoint algorithm, flexible state backends, and so on.
On our Oceanus platform, most of the applications we create turn on checkpointing so that they are fault-tolerant and produce correct results.
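To make the event-time idea concrete, here is a minimal sketch in plain Python. It does not use Flink's API; the function name `tumbling_event_time_counts` and the sample events are assumptions made up for illustration. It shows why assigning records to windows by the timestamp they carry, rather than by arrival time, gives the same result even when records arrive out of order.

```python
from collections import defaultdict


def tumbling_event_time_counts(events, window_size_ms):
    """Count events per (key, window start).

    Each event is a (key, event_time_ms) pair. The window is chosen from the
    event's own timestamp, so arrival order does not affect the result.
    This is a hypothetical helper, not part of any Flink API.
    """
    counts = defaultdict(int)
    for key, event_time_ms in events:
        # Start of the tumbling window that contains this timestamp.
        window_start = event_time_ms - (event_time_ms % window_size_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


# The same four events, once in timestamp order and once shuffled.
in_order = [("a", 1000), ("a", 4000), ("b", 6000), ("a", 11000)]
out_of_order = [("a", 11000), ("b", 6000), ("a", 1000), ("a", 4000)]

result_in_order = tumbling_event_time_counts(in_order, 5000)
result_out_of_order = tumbling_event_time_counts(out_of_order, 5000)

# Both arrival orders produce identical per-window counts.
print(result_in_order == result_out_of_order)
```

A real engine also has to decide when a window is complete, which is what watermarks are for; this sketch leaves that out and only looks at the final counts.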
Q: Do you think that stream processing is changing the way companies run their business? If yes, how is stream processing changing the modern enterprise?
Vino: The answer is clearly yes. As big data has developed, companies aim not only to handle massive volumes of data but also to process that data in a timely way. Early studies have shown that the lower the latency of data processing, the higher the value of the data. Stream processing is currently the best-known and lowest-latency approach to data processing, and I believe it has broad prospects.
Q: What do you think about the future of open source technologies?
Vino: I think open source technology is already a trend, and this trend will continue to grow. Open source brings together developers from all over the world who contribute their ideas and code in the same field. This cohesion is very powerful, as the Linux project has proven. However, it is worth noting that the profit model for open source frameworks still needs further exploration.
Q: Do you think that Apache Flink is a good fit for companies eager to build their stream processing architecture? If yes, why?
Vino: My answer is yes. For many companies, and especially startups, the main goal is to use Flink's API to implement their business logic. For this, Flink provides multiple levels of API abstraction and a rich set of transformation functions to meet their needs.
Q: What’s one element that truly differentiates Flink from other stream processing engines? What makes Flink unique and what would be your advice to developers starting with Flink now?
Vino: In my opinion, Flink’s native support for state is one of its core highlights, making it different from other stream processing engines. It provides a prerequisite for ensuring the correctness of stream processing.
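As a rough illustration of why engine-managed state matters for correctness, the sketch below uses plain Python, not Flink's API. The `KeyedSum` class, its `snapshot` and `restore` methods, and the sample records are all hypothetical. It keeps a per-key running sum, takes a checkpoint of the state together with the input position, simulates a crash, and then recovers by restoring the checkpoint and replaying the input from the recorded position. It covers a single operator and does not implement the barrier-based algorithm Flink uses.

```python
import copy


class KeyedSum:
    """Toy stateful operator: running sum per key (hypothetical, not Flink)."""

    def __init__(self):
        self.state = {}   # key -> running sum
        self.offset = 0   # index of the next input record to process

    def process(self, record):
        key, value = record
        self.state[key] = self.state.get(key, 0) + value
        self.offset += 1

    def snapshot(self):
        # A checkpoint stores the state together with the input position.
        return {"state": copy.deepcopy(self.state), "offset": self.offset}

    def restore(self, checkpoint):
        self.state = copy.deepcopy(checkpoint["state"])
        self.offset = checkpoint["offset"]


records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# Reference run without any failure.
reference = KeyedSum()
for record in records:
    reference.process(record)

# Run with a failure: checkpoint after two records, process one more, crash.
op = KeyedSum()
for record in records[:2]:
    op.process(record)
checkpoint = op.snapshot()
op.process(records[2])  # this progress is lost in the simulated crash

# Recovery: restore the checkpoint and replay from the recorded offset.
recovered = KeyedSum()
recovered.restore(checkpoint)
for record in records[recovered.offset:]:
    recovered.process(record)

# The recovered result matches the failure-free run.
print(recovered.state == reference.state)
```

Because the state and the input position are captured together, the record processed after the checkpoint is replayed exactly once during recovery instead of being lost or counted twice.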
For new developers, the project’s official website is a good way to get a deeper understanding of Flink. Flink's dev and user mailing lists are also very active and are a good place to get questions answered.
Q: Do you have any interaction with the Apache Flink community? What is your impression of the community and how is it compared to other open source communities?
Vino: Yes, I participate in the Flink community. While developing Oceanus, I have contributed features and bug fixes back to Flink, and I have submitted nearly 100 commits so far. I am also active on the mailing lists and help review PRs. Other colleagues on my team actively contribute to the community as well. I feel that the community is constantly growing: more and more developers and users are involved, and many software developers from China have joined recently. This is a very good sign. The one thing to improve is the community's review process, which is relatively slow. As a result, some PRs wait longer for a response, but I believe the community will find a way to solve this problem.
Q: Tell us a bit more about Tencent’s Oceanus platform built around Flink? What are the platform’s highlights?
Vino: Oceanus is a one-stop real-time stream computing platform. It adds numerous enhancements to Apache Flink and improves its ease of use. Users can submit jobs in one of three ways: as a JAR, as SQL, or through a canvas interface. We previously published an introductory article on the Flink community blog that describes Oceanus in detail.
If you want to get involved and stay up-to-date with the latest developments of Apache Flink, we encourage you to subscribe to the Apache Flink Mailing Lists. If you have questions or feedback, feel free to get in touch below!