Structured Streaming's built-in sink types include the File sink, Foreach sink, Console sink, and Memory sink.
This post looks specifically at how to use the Foreach sink, taking writes to an external Redis instance as the example.
lastEtlData.writeStream().foreach(new TestForeachWriter()).outputMode("update").start();
The foreach method takes a ForeachWriter object. The API documentation describes it like this:
    datasetOfString.writeStream().foreach(new ForeachWriter<String>() {
        @Override
        public boolean open(long partitionId, long version) {
            // Open the connection here. With Redis, take a connection from the pool.
            return true; // true means this partition/version should be processed
        }

        @Override
        public void process(String value) {
            // Write the value to the connection (here: to Redis).
            // For a Dataset<Row>, value arrives as a GenericRowWithSchema.
        }

        @Override
        public void close(Throwable errorOrNull) {
            // Close the connection here.
        }
    });
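Now look at how the three methods are invoked. They are called once per partition for each batch of data, so the frequency of open and close is worth watching: every partition of every batch gets exactly one open and one close. The Spark source shows this: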
    data.queryExecution.toRdd.foreachPartition { iter =>
      if (writer.open(TaskContext.getPartitionId(), batchId)) {
        try {
          while (iter.hasNext) {
            writer.process(encoder.fromRow(iter.next()))
          }
        } catch {
          case e: Throwable =>
            writer.close(e)
            throw e
        }
        writer.close(null)
      } else {
        writer.close(null)
      }
    }
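To make the call order concrete, here is a minimal plain-Java sketch that mirrors the control flow of the Scala snippet above without depending on Spark. The `Writer` interface, `RecordingWriter`, and `runPartition` are hypothetical stand-ins invented for illustration; they are not Spark classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForeachLifecycleSketch {

    /** Hypothetical stand-in for Spark's ForeachWriter, reduced to its three callbacks. */
    public interface Writer<T> {
        boolean open(long partitionId, long version);
        void process(T value);
        void close(Throwable errorOrNull);
    }

    /** Records every callback so the call order can be inspected afterwards. */
    public static class RecordingWriter implements Writer<String> {
        public final List<String> calls = new ArrayList<>();

        @Override
        public boolean open(long partitionId, long version) {
            calls.add("open(" + partitionId + "," + version + ")");
            return true;
        }

        @Override
        public void process(String value) {
            calls.add("process(" + value + ")");
        }

        @Override
        public void close(Throwable errorOrNull) {
            calls.add("close(" + (errorOrNull == null ? "null" : "error") + ")");
        }
    }

    /** Same control flow as the Scala snippet, for one partition of one batch. */
    public static <T> void runPartition(Writer<T> writer, long partitionId,
                                        long batchId, Iterator<T> rows) {
        if (writer.open(partitionId, batchId)) {
            try {
                while (rows.hasNext()) {
                    writer.process(rows.next());
                }
            } catch (RuntimeException | Error e) {
                writer.close(e);
                throw e;
            }
            writer.close(null);
        } else {
            writer.close(null);
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        // Two batches with two partitions each: four open/close pairs in total.
        for (long batchId = 0; batchId < 2; batchId++) {
            for (long partitionId = 0; partitionId < 2; partitionId++) {
                runPartition(writer, partitionId, batchId,
                             Arrays.asList("a", "b").iterator());
            }
        }
        writer.calls.forEach(System.out::println);
    }
}
```

Each call to `runPartition` produces one `open`, one `process` per row, and one `close`, which is the per-partition, per-batch frequency described above.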
For reference, a ForeachWriter implementation that writes to Redis:
    import java.io.Serializable;
    import org.apache.spark.sql.ForeachWriter;
    import org.apache.spark.sql.Row;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.JedisPoolConfig;

    public static class TestForeachWriter extends ForeachWriter<Row> implements Serializable {

        // One pool per JVM, created when the class is first loaded.
        public static JedisPool jedisPool;
        // Per-task connection; transient so it is never part of the serialized writer.
        private transient Jedis jedis;

        static {
            JedisPoolConfig config = new JedisPoolConfig();
            config.setMaxTotal(20);
            config.setMaxIdle(5);
            config.setMaxWaitMillis(1000);
            config.setMinIdle(2);
            config.setTestOnBorrow(false);
            jedisPool = new JedisPool(config, "127.0.0.1", 6379);
        }

        public static synchronized Jedis getJedis() {
            return jedisPool.getResource();
        }

        @Override
        public boolean open(long partitionId, long version) {
            // Borrow a connection from the pool for this partition of this batch.
            jedis = getJedis();
            return true;
        }

        @Override
        public void process(Row row) {
            // Look the columns up by name instead of relying on their position.
            String id = row.get(row.fieldIndex("ID")).toString();
            String count = row.get(row.fieldIndex("COUNT(ADA1)")).toString();
            System.out.println(row.get(0) + "-----------" + row.get(2)); // debug output
            jedis.set(id, count);
            System.out.println("++++++++++" + id); // debug output
        }

        @Override
        public void close(Throwable errorOrNull) {
            // For a pooled Jedis instance, close() hands the connection back to the pool.
            if (jedis != null) {
                jedis.close();
            }
        }
    }
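Because open and close run once per partition per batch, it is worth seeing what the static pool saves. The following plain-Java sketch counts how many connections get created with and without a pool. `FakeConnection` and `FakePool` are hypothetical stand-ins, not Jedis classes, and the simulation assumes partitions run one after another in a single JVM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PoolReuseSketch {

    /** Hypothetical stand-in for a Redis connection. */
    public static class FakeConnection { }

    /** Hypothetical minimal pool: reuses an idle connection if there is one, else creates one. */
    public static class FakePool {
        private final Deque<FakeConnection> idle = new ArrayDeque<>();
        private int created = 0;

        public synchronized FakeConnection borrow() {
            FakeConnection c = idle.poll();
            if (c == null) {
                c = new FakeConnection();
                created++;
            }
            return c;
        }

        public synchronized void giveBack(FakeConnection c) {
            idle.push(c);
        }

        public synchronized int created() {
            return created;
        }
    }

    /** Simulates open()/close() once per partition per batch, run one after another. */
    public static int connectionsCreated(int batches, int partitionsPerBatch, boolean usePool) {
        FakePool pool = new FakePool();
        int createdWithoutPool = 0;
        for (int b = 0; b < batches; b++) {
            for (int p = 0; p < partitionsPerBatch; p++) {
                if (usePool) {
                    FakeConnection c = pool.borrow();   // what open() would do
                    pool.giveBack(c);                   // what close() would do
                } else {
                    new FakeConnection();               // open() connects from scratch
                    createdWithoutPool++;
                }
            }
        }
        return usePool ? pool.created() : createdWithoutPool;
    }

    public static void main(String[] args) {
        System.out.println("with pool:    " + connectionsCreated(3, 2, true));
        System.out.println("without pool: " + connectionsCreated(3, 2, false));
    }
}
```

In this simulation the pooled variant creates a single connection for all three batches, while the unpooled variant creates six. With tasks running concurrently in a real executor, the pooled count would presumably grow to the number of simultaneous tasks, but it would still not grow with the number of batches.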
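Important: the writer must be serializable, because Spark sends the writer object to the executors. The example declares the Serializable interface explicitly, and it acquires the Jedis connection in open rather than in the constructor.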
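The effect of serialization on a writer's fields can be sketched with plain Java serialization. `DemoWriter` and `FakeConnection` below are hypothetical stand-ins, and the round trip through a byte array only approximates how a writer travels from the driver to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldSketch {

    /** Hypothetical stand-in for a connection; deliberately NOT Serializable. */
    public static class FakeConnection { }

    /** Hypothetical writer-like class: configuration is serialized, the connection is not. */
    public static class DemoWriter implements Serializable {
        private static final long serialVersionUID = 1L;

        public final String keyPrefix;                 // plain configuration, travels with the object
        public transient FakeConnection connection;    // skipped by serialization

        public DemoWriter(String keyPrefix) {
            this.keyPrefix = keyPrefix;
        }

        /** Acquires the connection on whichever JVM this is called. */
        public void open() {
            connection = new FakeConnection();
        }
    }

    /** Serializes the writer to bytes and reads it back, like a trip to another JVM. */
    public static DemoWriter roundTrip(DemoWriter writer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(writer);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DemoWriter) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DemoWriter original = new DemoWriter("etl:");
        original.open();                               // connection exists on the "driver"
        DemoWriter copy = roundTrip(original);
        System.out.println("prefix survives: " + copy.keyPrefix);
        System.out.println("connection after round trip: " + copy.connection);
        copy.open();                                   // re-acquire on the "executor"
        System.out.println("connection after open(): " + (copy.connection != null));
    }
}
```

The transient field comes back as null after the round trip, which is why a connection has to be obtained inside open on the side where the data is actually processed.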
Author: 假文艺的真码农
Link: https://www.jianshu.com/p/fa802faf747a
Top comments
If each batch gets a single open and then goes straight to process, does the JedisPool actually serve any purpose?