This blog post is a Q&A session with Vino Yang, Senior Engineer at Tencent’s Big Data team. Below, we discuss the benefits of adopting stream processing and Apache Flink for modern application development.
Vino: I am a senior engineer on Tencent's big data team. I currently develop and maintain the Flink engine that underlies Oceanus, Tencent's real-time stream computing platform. I have helped expand the adoption of Flink within Tencent from the very early days to the current setup, which processes nearly 20 trillion events per day. I am a long-time active contributor to the Flink project and one of Flink's early evangelists in China.
Vino: I started researching Flink in early 2016. I first discovered the framework through an article mentioning that Flink had been promoted to an Apache top-level project. At the time, my impression was that Flink was technologically more advanced than other stream processing engines.
Vino: I think that in the domain of stream computing, Flink is still ahead of any other framework and remains the first choice. As the community continues to grow and contribute new features, I can see Flink achieving the unification of stream and batch processing and improving its domain libraries for graph computing, machine learning, and so on. At the same time, provided that Flink remains connected to the wider ecosystem and to other frameworks and programming languages, its prospects are very good.
Vino: My favourite Flink feature is its "guarantee of correctness". To elaborate, this includes event-time semantics, checkpoint alignment, the ABS (Asynchronous Barrier Snapshotting) checkpoint algorithm, flexible state backends, and so on.
On our Oceanus platform, most of the applications we create turn on checkpointing so that they are fault-tolerant and produce correct results.
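Of the correctness features listed above, event-time semantics is perhaps the easiest to illustrate in isolation. The following is a minimal plain-Python sketch, not Flink's API: the function names, the 10-second window size, and the sample events are all assumptions made for illustration. It shows that when events are grouped by the timestamp they carry rather than by when they arrive, out-of-order arrival does not change the result. A real engine also needs watermarks to decide when a window is complete, which this sketch leaves out.

```python
from collections import defaultdict

WINDOW_SIZE_MS = 10_000  # hypothetical 10-second tumbling windows


def window_start(event_time_ms, size_ms=WINDOW_SIZE_MS):
    """Return the start of the tumbling window that contains the timestamp."""
    return event_time_ms - (event_time_ms % size_ms)


def count_by_event_time(events, size_ms=WINDOW_SIZE_MS):
    """Count events per (key, window) using the timestamp carried by each event.

    `events` is an iterable of (key, event_time_ms) pairs in arrival order,
    which may differ from the order in which the events actually happened.
    """
    counts = defaultdict(int)
    for key, event_time_ms in events:
        counts[(key, window_start(event_time_ms, size_ms))] += 1
    return dict(counts)


# Arrival order differs from event-time order: the 9_500 ms event arrives last.
arrivals = [("user-a", 1_000), ("user-a", 12_000), ("user-b", 3_000), ("user-a", 9_500)]
result = count_by_event_time(arrivals)
```

The late-arriving event at 9,500 ms is still counted in the first window for `user-a`, so sorting the input by timestamp first would give the same counts.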
Vino: Obviously, the answer is yes. As big data has developed, companies' goal is no longer only to handle massive volumes of data but also to pay attention to the timeliness of data processing. Early studies have shown that the lower the latency of data processing, the higher the value of the data. Stream processing is currently the best-known and lowest-latency approach to data processing, and I believe it has broad prospects.
Vino: I think open source technology is already a trend, and this trend will continue to expand. Open source helps bring together developers from all over the world who contribute their ideas and code in the same field. This cohesion is very powerful, and the Linux project has proven this. However, it is worth noting that the profit model of open source technology frameworks needs additional exploration.
Vino: My answer is yes. The main goal of many companies, especially startups, is to use Flink's API to implement their business logic. For this, Flink provides multiple levels of API abstraction and rich transformation functions to meet their needs.
Vino: In my opinion, Flink’s native support for state is one of its core highlights, making it different from other stream processing engines. It provides a prerequisite for ensuring the correctness of stream processing.
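To make the link between state and correctness concrete, here is a minimal plain-Python sketch. It is not Flink code: the `CountingOperator` class, its methods, and the sample records are hypothetical. It assumes a replayable input and shows the general idea that if per-key state is snapshotted together with the input position, a job can recover from a failure and still produce the same result as a run with no failure.

```python
import copy


class CountingOperator:
    """Toy operator that keeps one running count per key (hypothetical, not Flink code)."""

    def __init__(self):
        self.counts = {}  # keyed state: key -> count
        self.offset = 0   # position of the next input record to read

    def process(self, records, stop_at=None):
        """Consume records from the current offset up to `stop_at` (exclusive)."""
        end = len(records) if stop_at is None else stop_at
        while self.offset < end:
            key = records[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def snapshot(self):
        """Capture state and input position together as one consistent checkpoint."""
        return {"counts": copy.deepcopy(self.counts), "offset": self.offset}

    @classmethod
    def restore(cls, checkpoint):
        """Build a fresh operator from a checkpoint taken earlier."""
        op = cls()
        op.counts = copy.deepcopy(checkpoint["counts"])
        op.offset = checkpoint["offset"]
        return op


records = ["a", "b", "a", "c", "a", "b"]

# Reference run without any failure.
reference = CountingOperator()
reference.process(records)

# Run with a checkpoint after 3 records and a simulated crash after 5.
op = CountingOperator()
op.process(records, stop_at=3)
checkpoint = op.snapshot()
op.process(records, stop_at=5)  # this progress is lost in the simulated crash

recovered = CountingOperator.restore(checkpoint)
recovered.process(records)      # replay from the checkpointed offset
```

Because the counts and the offset are saved together, the records processed after the checkpoint are replayed exactly once on recovery, and `recovered.counts` matches `reference.counts`.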
For new developers, the project’s official website can help them get a deeper understanding of Flink. Flink's dev and user mailing lists are also very active and can help answer their questions.
Vino: I have participated in the Flink community. While developing Oceanus, I contributed features and fixes to the Flink community, and I have submitted nearly 100 commits. I also actively participate in the mailing list and help review PRs. Of course, other colleagues on my team also actively contribute to the community. I feel that the community is constantly growing: more and more developers and users are involved, and a lot of software developers from China have joined recently. This is a very good sign. The one thing to improve is the community's review process, which is relatively slow. This increases the response time for some PRs, but I believe the community will find a way to solve this problem.
Vino: Oceanus is a one-stop real-time stream computing platform. It makes numerous enhancements to Apache Flink and improves its ease of use. It allows users to submit jobs in one of three ways: as a JAR, as SQL, or through a canvas. We previously published an introductory article on the Flink community blog, which gave a detailed introduction to Oceanus.
If you want to get involved and stay up-to-date with the latest developments of Apache Flink, we encourage you to subscribe to the Apache Flink Mailing Lists. If you have questions or feedback, feel free to get in touch below!