从 Apache Beam 读取 CSV 并写入 BigQuery

OP 的评论让我意识到我的错误：预期的库是regex，而不是 python 的 builtin re。使用import regex as re不仅让我感到困惑，而且还导致re库抛出nothing to repeat错误。这是因为默认情况下 Dataflow 不会保存您的主会话。当您的解析函数中的代码正在执行时，它无法访问re您在构建时导入的上下文。通常，这会失败NameError，但由于您使用的是有效的库名称，因此代码假定您指的是内置re库并尝试按原样执行它。如果您import regex改用，您会看到NameError: name 'regex' is not defined，这是代码失败的真正原因。为了解决这个问题，要么将导入语句移动到解析函数本身，要么--save_main_session作为选项传递给运行程序。有关更多详细信息，请参见此处。老答案：虽然我不知道您使用的是哪个版本的 Python，但您对正则表达式的怀疑似乎是正确的。 *是一个特殊字符，表示它之前的重复，但它(是一个表示分组的特殊字符，所以类似的模式(*SKIP)在语法上似乎不正确。在 Python 3.7 中，上述表达式甚至无法编译：python -c 'import re; rx = re.compile(r"""\{[^{}]+\}(*SKIP)(*FAIL)|,""")'Traceback (most recent call last):  File "<string>", line 1, in <module>  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/re.py", line 234, in compile    return _compile(pattern, flags)  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/re.py", line 286, in _compile    p = sre_compile.compile(pattern, flags)  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/sre_compile.py", line 764, in compile    p = sre_parse.parse(p, flags)  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/sre_parse.py", line 930, in parse    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/sre_parse.py", line 426, in _parse_sub    not nested and not items))  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/sre_parse.py", line 816, in _parse    p = _parse_sub(source, state, sub_verbose, nested + 1)  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/sre_parse.py", line 426, in _parse_sub    not nested and not items))  File "/home/ajp1/miniconda3/envs/b-building/lib/python3.7/sre_parse.py", line 651, in _parse    source.tell() - here + len(this))re.error: nothing to repeat at position 11Python 2.7.15 也不接受它：python2 -c 'import re; rx = re.compile(r"""\{[^{}]+\}(*SKIP)(*FAIL)|,""")'Traceback (most recent call last):  File "<string>", line 1, in <module>  File "/usr/lib/python2.7/re.py", line 194, in compile    return _compile(pattern, flags)  File "/usr/lib/python2.7/re.py", line 251, in _compile    raise error, v # invalid expressionsre_constants.error: nothing to repeat虽然我不知道您要匹配哪些字符串，但我怀疑您的某些字符需要转义。例如"\{[^{}]+\}(\*SKIP)(\*FAIL)|,"

从 Apache Beam 读取 CSV 并写入 BigQuery

1回答