As far as the implementation details inside the DataflowRunner go, many people probably do not care whether BigQuerySink or WriteToBigQuery is used.
However, in my case I am trying to deploy my code as a Dataflow template with RuntimeValueProvider parameters. This behavior is supported by WriteToBigQuery (a usage sketch follows the docstring excerpt below):
class WriteToBigQuery(PTransform):
....
table (str, callable, ValueProvider): The ID of the table, or a callable
that returns it. The ID must contain only letters ``a-z``, ``A-Z``,
numbers ``0-9``, or underscores ``_``. If dataset argument is
:data:`None` then the table argument must contain the entire table
reference specified as: ``'DATASET.TABLE'``
or ``'PROJECT:DATASET.TABLE'``. If it's a callable, it must receive one
argument representing an element to be written to BigQuery, and return
a TableReference, or a string table name as specified above.
Multiple destinations are only supported on Batch pipelines at the
moment.
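For context, here is a minimal sketch of the pattern I am after: a templated pipeline whose destination table arrives as a RuntimeValueProvider. The option name output_table and the dummy data are illustrative, not from the Beam source:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class TemplateOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # add_value_provider_argument yields a RuntimeValueProvider when the
        # pipeline is staged as a template and the value is supplied at run time.
        parser.add_value_provider_argument(
            '--output_table', type=str,
            help='BigQuery table as PROJECT:DATASET.TABLE')

options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (p
     | 'Create' >> beam.Create([{'id': 1}, {'id': 2}])
     | 'Write' >> beam.io.WriteToBigQuery(
         table=options.view_as(TemplateOptions).output_table,  # ValueProvider accepted
         schema='id:INTEGER',
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))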
It is not supported by BigQuerySink:
class BigQuerySink(dataflow_io.NativeSink):
table (str): The ID of the table. The ID must contain only letters
``a-z``, ``A-Z``, numbers ``0-9``, or underscores ``_``. If
**dataset** argument is :data:`None` then the table argument must
contain the entire table reference specified as: ``'DATASET.TABLE'`` or
``'PROJECT:DATASET.TABLE'``.
More interestingly, BigQuerySink has been deprecated in the code since 2.11.0:
@deprecated(since='2.11.0', current="WriteToBigQuery")
But in the DataflowRunner, the current code and comments seem completely at odds with the expected default of WriteToBigQuery being used over BigQuerySink:
def apply_WriteToBigQuery(self, transform, pcoll, options):
# Make sure this is the WriteToBigQuery class that we expected, and that
# users did not specifically request the new BQ sink by passing experiment
# flag.
My question is two-fold:

1. Why is there a discrepancy between the contract/expectation of the DataflowRunner class and the io.BigQuery classes?
2. Short of waiting for a bug fix, does anyone have suggestions on how to force the DataflowRunner to use WriteToBigQuery over BigQuerySink?
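In case it helps others: the comment in apply_WriteToBigQuery says the new BQ sink is requested "by passing experiment flag". In the Beam versions I looked at, that flag appears to be use_beam_bq_sink, so forcing it would look something like this (the flag name is my reading of the Beam source and may change between versions):

from apache_beam.options.pipeline_options import DebugOptions, PipelineOptions

options = PipelineOptions()  # or your existing options object
debug_options = options.view_as(DebugOptions)
# Opt into the Beam-native sink so the runner does not swap WriteToBigQuery
# for the deprecated BigQuerySink; equivalent to --experiments=use_beam_bq_sink.
debug_options.experiments = (debug_options.experiments or []) + ['use_beam_bq_sink']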