Stream Processing & Apache Flink - News and Best Practices

Announcing the Release of Apache Flink 1.20

Written by Ververica | 02 August 2024

Ververica, the original creators of Apache Flink congratulates the Apache Flink PMC and community on the release of Apache Flink 1.20.0! This release is jam-packed with a wide variety of improvements and new features. Overall, 142 people contributed to this release completing 13 FLIPs and 300+ issues. Thank you!

Flink 1.20 release aims to streamline data pipeline development and improve the overall user experience. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the release notes.

Let’s dive into the highlights.

What's New in Apache Flink 1.20?

Flink SQL Improvements

Two notable Flink SQL improvements include the introduction of a New Materialized Table for Simplifying Data Pipelines and Catalog-Related Syntax.

The new Materialized table (FLIP-435) is designed to simplify the development of data processing pipelines and simplify both batch and streaming data processing by allowing uniform SQL operations and automatic data freshness management. With this dynamic table, uniform SQL statements and freshness, users can define batch and streaming transformations to data in the same way, accelerate ETL pipeline development, and manage task scheduling automatically.

Users no longer have the burden of understanding the concepts and differences between streaming and batch processing, and they do not have to directly maintain Flink streaming or batch jobs. All operations are done on Materialized tables, which can significantly accelerate ETL pipeline development.

Catalog-Related Syntax provides better metadata management capabilities, facilitating easier and more efficient SQL-based data processing in Flink. With the growing adoption of Flink SQL, the implementation of Flink's Catalog Interface plays an increasingly important role. With Flink 1.20, you can use the DQL syntax to obtain detailed metadata from existing catalogs, and the DDL syntax to modify metadata such as properties or comments in the specified catalog.

And the newly added “DISTRIBUTED BY” clause introduces flexible data partitioning options (FLIP-376) designed to facilitate easier and more efficient SQL-based data processing in Flink.

State and Checkpoint Improvements

In Flink 1.20, improvements to state management and checkpointing enhance performance and reliability including a unified file merging mechanism for checkpoints and compaction of small SST files.

The unified file merging mechanism for checkpointing (=FLIP-306) allows combining multiple small checkpoint files into fewer, larger files, reducing file system metadata overhead.

The compaction of small SST files addresses issues with excessive small file generation. From release 1.20 on, Flink can merge these files in the background using the RocksDB API.

Batch Processing Improvements

For batch processing, a new job recovery mechanism enables batch jobs to resume progress after JobMaster failures, avoiding the need to rerun completed tasks (FLIP-383). These enhancements aim to optimize checkpointing efficiency and state management robustness in Flink.

Additionally, Flink 1.20 also supports dynamic source parallelism inference in batch jobs for the Hive source connector (FLIP-445). This allows the connector to dynamically determine parallelism based on the actual partitions with dynamic partition pruning.

DataStream API Improvements

Flink 1.20 supports full partition processing on the DataStream API (FLIP-380). This adds built-in support for aggregations on non-keyed streams (subtask-scope aggregations) with the FullPartitionWindow API.

Configuration Improvements

To improve ease of use and maintainability, Flink 1.20 provides significant configuration improvements. Key updates include the conversion of several configuration options to types like Duration, Enum, and Int for consistency. Additionally, Flink 1.20 reorganizes state and checkpointing options with new prefixes for better clarity.

Upgrade Notes

The Flink community tries to ensure that upgrades are as seamless as possible. However, certain changes may require users to make adjustments to certain parts of the program when upgrading to version 1.20. Please refer to the release notes for a comprehensive list of adjustments to make and issues to check during the upgrading process.

Getting Started with Flink

  • Apache Flink is available for download.
  • Get started with Veverica Cloud to unlock the full power of Apache Flink as a managed service.
  • For additional information on the release of 1.20, check out the blog post published by Weijie Guo and Rui Fan on The Apache Flink Blog.

Get Involved

In addition to boasting continuous growth over the past 10 years, Apache Flink received the 2023 SIGMOD Systems Award, which is awarded to an individual or set of individuals to recognize the development of a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems for: “...greatly expanding the use of stream data-processing.”

The vitality of Flink relies on continued community growth, which would not be possible without each and every contributor to the project. The Flink community welcomes contributions from anyone with a passion for open source, messaging and streaming! Looking for more ways to stay connected with the Flink community? Check out the following resources:

  • Ready to start contributing to the project? Start with the  Apache Flink Contributors guide.
  • Follow @ApacheFlink on Twitter/X , and join the Flink community on Slack.
  • Get started with Veverica Cloud to unlock the full power of Apache Flink as a managed service.
  • Join the Flink Community at Flink Forward Berlin 2024 in Berlin! Network with experts and thought leaders in the data streaming space, including a session about upcoming Apache Flink 2.0.

List of Contributors

Ververica offers our congratulations and thanks to all the contributors who made this release possible:

Ahmed Hamdy, Alan Sheinberg, Aleksandr Pilipenko, Alexander Fedulov, Andrey Gaskov, Antonio Vespoli, Anupam Aggarwal, Barak Ben-Nathan, Benchao Li, Brad, Cheng Pan, Chesnay Schepler, DamonXue, Danny Cranmer, David Christle, David Moravek, David Schlosnagle, Dawid Wysakowicz, Dian Fu, Dmitriy Linevich, Elphas Toringepi, Emre Kartoglu, Fang Yong, Feng Jin, Ferenc Csaky, Frank Yin, Gabor Somogyi, Gyula Fora, HCTommy, Hangxiang Yu, Hanyu Zheng, Hao Li, Hong Liang Teoh, Hong Teoh, HuangXingBo, Jacky Lau, James Hughes, Jane Chan, Jeyhun Karimov, Jiabao Sun, Jim Hughes, Jing Ge, Jinzhong Li, JunRuiLee, Juntao Hu, JustinLee, Kartikey Pant, Kumar Mallikarjuna, Leonard Xu, Lorenzo Affetti, Luke Chen, Martijn Visser, Mason Chen, Matthias Pohl, Mingliang Liu, Panagiotis Garefalakis, Peter Huang, Peter Vary, Piotr Nowojski, Puneet Duggal, Qinghui Xu, Qingsheng Ren, Ravi Dutt Singh, Robert Metzger, Robert Young, Roc Marshal, Roman, Roman Boyko, Roman Khachatryan, Ron, Rui Fan, Ryan Skraba, Samrat, Sergey Nuyanzin, Shilun Fan, Stefan Richter, SuDewei, Timo Walther, Ufuk Celebi, Vincent Woo, Wang FeiFan, Weijie Guo, Wencong Liu, Wouter Zorgdrager, Xiangyu Feng, Xintong Song, Xuyang, Yanfei Lei, Yangze Guo, Yu Chen, Yubin Li, Yuepeng Pan, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhen Wang, Zhenqiu Huang, Zhu Zhu, Zmm, ammar-master, anupamaggarwal, bvarghese1, caicancai, caodizhou, chenzihao, drymatini, dsaisharath, eason.qin, elon-X, fengli, gongzhongqiang, hejufang, jectpro7, jiangxin, liming.1018, lincoln lee, liuyongvs, lxliyou001, oleksandr.nitavskyi, plugatarev, rmoff, slfan1989, spoon-lz, sunxia, sxnan, sychen, wforget, xiaogang, xingbo, yebukong, yunfengzhou-hub, yunhong, zhouyisha, 马越