Improvements for Apache Flink with the Flink 1.5.4 and 1.6.1 releases

Written by Fabian Hueske | 21 September 2018

The Apache Flink® community just released two bugfix versions, Apache Flink 1.5.4 and Apache Flink 1.6.1, which include many resolved issues and improvements.

In the sections below we will outline the implemented improvements and main resolved issues. We encourage you to check the release notes for both Apache Flink 1.5.4 and 1.6.1 and upgrade to the latest release to benefit from the improvements.

Feel free to reach out with your feedback and suggestions for further improvements and new features through the Apache Flink mailing list.

Apache Flink 1.5.4

Apache Flink 1.5.4 is the fourth bugfix version of the Apache Flink 1.5 series. The release includes more than 20 fixes and improvements, such as several logging updates and the resolution of an HA bug that prevented proper job cleanup when using standby JobMasters. Additional fixes resolve timeout issues with long-running savepoints and reduce the lock contention and synchronization overhead introduced with 1.5 (FLINK-10141, FLINK-10142). Below is the list of bug fixes and improvements in Apache Flink 1.5.4.

Bug Fixes:

  • [FLINK-9878] - IO worker threads BLOCKED on SSL Session Cache while CMS full gc

  • [FLINK-10011] - Old job resurrected during HA failover

  • [FLINK-10101] - Mesos web UI URL is missing.

  • [FLINK-10115] - Content-length limit is also applied to FileUploads

  • [FLINK-10116] - createComparator fails on case class with Unit type fields prior to the join-key

  • [FLINK-10141] - Reduce lock contention introduced with 1.5

  • [FLINK-10142] - Reduce synchronization overhead for credit notifications

  • [FLINK-10150] - Chained batch operators interfere with each other

  • [FLINK-10172] - Inconsistency in ExpressionParser and ExpressionDsl for order by asc/desc

  • [FLINK-10193] - Default RPC timeout is used when triggering savepoint via JobMasterGateway

  • [FLINK-10204] - StreamElementSerializer#copy broken for LatencyMarkers

  • [FLINK-10255] - Standby Dispatcher locks submitted JobGraphs

  • [FLINK-10261] - INSERT INTO does not work with ORDER BY clause

  • [FLINK-10267] - [State] Fix arbitrary iterator access on RocksDBMapIterator

  • [FLINK-10293] - RemoteStreamEnvironment does not forward port to RestClusterClient

  • [FLINK-10314] - Blocking calls in Execution Graph creation bring down cluster

  • [FLINK-10328] - Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks

  • [FLINK-10329] - Fail with exception if the job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph

Improvements:

  • [FLINK-10082] - Initialize StringBuilder in Slf4jReporter with an estimated size

  • [FLINK-10131] - Improve logging around ResultSubpartition

  • [FLINK-10137] - YARN: Log completed Containers

  • [FLINK-10185] - Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous

  • [FLINK-10223] - TaskManagers should log their ResourceID during startup

  • [FLINK-10301] - Allow a custom Configuration in StreamNetworkBenchmarkEnvironment
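
FLINK-10082 in the list above concerns pre-sizing a StringBuilder. The following is a minimal sketch of that general technique in plain Java, not the actual Slf4jReporter code: the class PresizedReportSketch, its field and method names, and the 10% headroom factor are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PresizedReportSketch {
    // Length of the previous report; serves as the size estimate for the next one.
    // The initial value of 16 is an arbitrary assumption.
    private int previousSize = 16;

    // Builds a textual report, pre-sizing the buffer so that it rarely
    // has to grow (and copy its contents) while appending.
    public String report(Map<String, Long> metrics) {
        // Assumed heuristic: last report's length plus roughly 10% headroom.
        StringBuilder builder = new StringBuilder((int) (previousSize * 1.1));
        for (Map.Entry<String, Long> entry : metrics.entrySet()) {
            builder.append(entry.getKey())
                   .append(": ")
                   .append(entry.getValue())
                   .append('\n');
        }
        previousSize = builder.length();
        return builder.toString();
    }

    // Exposes the current estimate, mainly for inspection.
    public int estimatedSize() {
        return previousSize;
    }

    public static void main(String[] args) {
        PresizedReportSketch reporter = new PresizedReportSketch();
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("numRecordsIn", 42L);
        metrics.put("numRecordsOut", 40L);
        System.out.print(reporter.report(metrics));
    }
}
```

Because periodic reports tend to have a similar length from one interval to the next, the previous length is a cheap and usually accurate estimate for the next buffer.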

Apache Flink 1.6.1

Apache Flink 1.6.1 is the first bugfix version of the Apache Flink 1.6 series. The release includes a new feature, metrics for input/output buffers, and more than 60 fixes and improvements, such as additional metrics for the Kinesis source connector and support for resuming from savepoints when using the container entrypoint. Further improvements include additional control over Akka's thread pools and a fix for a possible memory leak in which the JobManager retained archived checkpoints. Below is the list of bug fixes and improvements in Apache Flink 1.6.1.

Bug Fixes:

  • [FLINK-9289] - Parallelism of generated operators should have max parallelism of input

  • [FLINK-9546] - The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0

  • [FLINK-9693] - Possible memory leak in jobmanager retaining archived checkpoints

  • [FLINK-9972] - Debug memory logging not working

  • [FLINK-10011] - Old job resurrected during HA failover

  • [FLINK-10063] - Jepsen: Automatically restart Mesos Processes

  • [FLINK-10101] - Mesos web UI URL is missing.

  • [FLINK-10105] - Test failure because of jobmanager.execution.failover-strategy is outdated

  • [FLINK-10115] - Content-length limit is also applied to FileUploads

  • [FLINK-10116] - createComparator fails on case class with Unit type fields prior to the join-key

  • [FLINK-10141] - Reduce lock contention introduced with 1.5

  • [FLINK-10142] - Reduce synchronization overhead for credit notifications

  • [FLINK-10150] - Chained batch operators interfere with each other

  • [FLINK-10151] - [State TTL] Fix false recursion call in TransformingStateTableKeyGroupPartitioner.tryAddToSource

  • [FLINK-10154] - Make sure we always read at least one record in KinesisConnector

  • [FLINK-10169] - RowtimeValidator fails with custom TimestampExtractor

  • [FLINK-10172] - Inconsistency in ExpressionParser and ExpressionDsl for order by asc/desc

  • [FLINK-10192] - SQL Client table visualization mode does not update correctly

  • [FLINK-10193] - Default RPC timeout is used when triggering savepoint via JobMasterGateway

  • [FLINK-10204] - StreamElementSerializer#copy broken for LatencyMarkers

  • [FLINK-10255] - Standby Dispatcher locks submitted JobGraphs

  • [FLINK-10261] - INSERT INTO does not work with ORDER BY clause

  • [FLINK-10267] - [State] Fix arbitrary iterator access on RocksDBMapIterator

  • [FLINK-10269] - Elasticsearch 6 UpdateRequest fail because of binary incompatibility

  • [FLINK-10283] - FileCache logs unnecessary warnings

  • [FLINK-10293] - RemoteStreamEnvironment does not forward port to RestClusterClient

  • [FLINK-10314] - Blocking calls in Execution Graph creation bring down cluster

  • [FLINK-10328] - Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks

  • [FLINK-10329] - Fail with exception if the job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph

New Feature:

Improvements:

  • [FLINK-9013] - Document yarn.containers.vcores only being effective when adapting YARN config

  • [FLINK-9446] - Compatibility table not up-to-date

  • [FLINK-9795] - Update Mesos documentation for flip6

  • [FLINK-9859] - More Akka config options

  • [FLINK-9899] - Add more metrics to the Kinesis source connector

  • [FLINK-9962] - Allow users to specify TimeZone in DateTimeBucketer

  • [FLINK-10001] - Improve Kubernetes documentation

  • [FLINK-10006] - Improve logging in BarrierBuffer

  • [FLINK-10020] - Kinesis Consumer listShards should support more recoverable exceptions

  • [FLINK-10082] - Initialize StringBuilder in Slf4jReporter with estimated size

  • [FLINK-10094] - Always backup default config for end-to-end tests

  • [FLINK-10110] - Harden e2e Kafka shutdown

  • [FLINK-10131] - Improve logging around ResultSubpartition

  • [FLINK-10137] - YARN: Log completed Containers

  • [FLINK-10164] - Add support for resuming from savepoints to StandaloneJobClusterEntrypoint

  • [FLINK-10170] - Support string representation for map and array types in descriptor-based Table API

  • [FLINK-10185] - Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous

  • [FLINK-10223] - TaskManagers should log their ResourceID during startup

  • [FLINK-10301] - Allow a custom Configuration in StreamNetworkBenchmarkEnvironment

  • [FLINK-10325] - [State TTL] Refactor TtlListState to use only loops, no java stream API for performance
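
To illustrate why a configurable time zone matters for time-based bucketing (FLINK-9962 in the list above), the sketch below formats the same instant into a bucket name under two different zones using only java.time. It does not use Flink's DateTimeBucketer; the class BucketPathSketch, the helper bucketFor, and the pattern string are hypothetical names chosen for this example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class BucketPathSketch {
    // Hypothetical helper: formats a timestamp into a bucket name
    // as seen from the given time zone.
    public static String bucketFor(Instant timestamp, String pattern, ZoneId zone) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern).withZone(zone);
        return formatter.format(timestamp);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2018-09-21T00:00:00Z");
        String pattern = "yyyy-MM-dd--HH"; // assumed hourly bucket pattern

        // The same record lands in different buckets depending on the zone.
        System.out.println(bucketFor(ts, pattern, ZoneId.of("UTC")));
        // prints 2018-09-21--00
        System.out.println(bucketFor(ts, pattern, ZoneId.of("America/Los_Angeles")));
        // prints 2018-09-20--17
    }
}
```

With a fixed, implicit zone, buckets follow whatever zone the formatter happens to use; making the zone explicit lets bucket boundaries line up with the day and hour boundaries the consumers of the files expect.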

We encourage everyone to download the new versions and check out the latest binaries. Feedback through the Flink mailing lists or JIRA is, as always, very much appreciated!

Apache Flink, Flink®, Apache®, the squirrel logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.