How to process data in chunks/batches with Kafka Streams?


In many big data scenarios, it is better to process a small buffer of records at a time than to process records one by one.

The natural example is calling an external API that supports batching for efficiency.
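To make the efficiency argument concrete, here is a minimal Scala sketch. It assumes a hypothetical external service where every request costs one round trip regardless of how many records it carries; FakeApi, callOne and callBatch are illustrative names, not a real library. With 5000 records and a chunk size of 2000, the per-record approach makes 5000 calls while the batched approach makes 3 (2000 + 2000 + 1000).

```scala
object CallCountSketch {
  // Hypothetical stand-in for an external service: it only counts requests.
  final class FakeApi {
    var calls = 0
    // Hypothetical per-record endpoint: one request per record.
    def callOne(s: String): String = { calls += 1; s.toUpperCase }
    // Hypothetical batched endpoint: one request for a whole chunk.
    def callBatch(xs: Seq[String]): Seq[String] = { calls += 1; xs.map(_.toUpperCase) }
  }

  // Number of requests when every record is sent on its own.
  def perRecordCalls(records: Seq[String]): Int = {
    val api = new FakeApi
    records.foreach(api.callOne)
    api.calls
  }

  // Number of requests when records are sent in chunks of `chunkSize`
  // (the last chunk may be smaller).
  def batchedCalls(records: Seq[String], chunkSize: Int): Int = {
    val api = new FakeApi
    records.grouped(chunkSize).foreach(api.callBatch)
    api.calls
  }

  def main(args: Array[String]): Unit = {
    val records = (1 to 5000).map(i => s"r$i")
    println(perRecordCalls(records))     // 5000
    println(batchedCalls(records, 2000)) // 3
  }
}
```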

How can we do this in Kafka Streams? I can't find anything in the API that looks like what I want.

So far I have:

builder.stream[String, String]("my-input-topic")
  .mapValues(externalApiCall).to("my-output-topic")

What I want is:

builder.stream[String, String]("my-input-topic")
  .batched(chunkSize = 2000).map(externalBatchedApiCall).to("my-output-topic")

In Scala and Akka Streams, this function is called grouped or batch. In Spark Structured Streaming, we can do mapPartitions(_.grouped(2000).map(externalBatchedApiCall)).
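For reference, the grouped idiom mentioned above can be sketched in plain Scala over an Iterator, which is what mapPartitions hands you in Spark. This is a minimal sketch, not a Kafka Streams API: batched is a hypothetical helper name, and batchCall stands in for externalBatchedApiCall under the assumption that it returns one result per input record, so the results can be flattened back into a record stream.

```scala
import scala.collection.mutable.ArrayBuffer

object GroupedBatchingSketch {
  // Hypothetical helper: split the input into chunks of at most `chunkSize`,
  // make one batch call per chunk, then flatten the results back into a stream.
  def batched[A, B](records: Iterator[A], chunkSize: Int)(batchCall: Seq[A] => Seq[B]): Iterator[B] = {
    require(chunkSize > 0, "chunkSize must be positive")
    records.grouped(chunkSize).flatMap(batch => batchCall(batch))
  }

  def main(args: Array[String]): Unit = {
    val batchSizes = ArrayBuffer.empty[Int]
    // The block below plays the role of a hypothetical externalBatchedApiCall.
    val out = batched(Iterator.range(1, 6), chunkSize = 2) { batch =>
      batchSizes += batch.size
      batch.map(_ * 10)
    }.toList
    println(out)               // List(10, 20, 30, 40, 50)
    println(batchSizes.toList) // List(2, 2, 1)
  }
}
```

Note that size-only grouping emits the last, partial chunk only when the input ends. A Kafka topic is unbounded, so a streaming version also needs a time bound; Akka Streams offers groupedWithin for this reason, and in Kafka Streams the usual suggestion is the Processor API with a state store plus a punctuator registered through ProcessorContext.schedule.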

元芳怎么了

