PySpark: passing a function as an argument to a UDF

I'm trying to create a UDF that takes another function as an argument, but execution ends with an exception. The code I ran:


import pandas as pd
from pyspark import SparkConf, SparkContext, SQLContext
from pyspark.sql.types import MapType, DataType, StringType
from pyspark.sql.functions import udf, struct, lit
import os

# `conf` was not defined in the original snippet; a default SparkConf is assumed here
conf = SparkConf()
sc = SparkContext.getOrCreate(conf=conf)
sqlContext = SQLContext(sc)

df_to_test = sqlContext.createDataFrame(
    pd.DataFrame({
        'inn': ['111', '222', '333'],
        'field1': [1, 2, 3],
        'field2': ['a', 'b', 'c']
    }))

def foo_fun(row, b) -> str:
    return 'a' + b()

def bar_fun():
    return 'I am bar'

foo_fun_udf = udf(foo_fun, StringType())

# Fails: bar_fun is a Python function, not a Column, so it cannot be passed as a UDF argument
df_to_test.withColumn(
    'foo',
    foo_fun_udf(struct([df_to_test[x] for x in df_to_test.columns]), bar_fun)
).show()

Exception:


Invalid argument, not a string or column: <function bar_fun at 0x7f0e69ce6268> of type <class 'function'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.

I tried wrapping bar_fun in a udf as well, but that didn't work either. Is there a way to pass a function as an argument?
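The error says that every argument in a UDF call must be a column (a string is treated as a column name), so a plain Python function such as bar_fun has to be bound into the Python function before udf wraps it. A minimal plain-Python sketch of that binding using functools.partial, offered as an alternative to the closure in the answer and not taken from the original thread; foo_with_bar is a hypothetical name:

```python
from functools import partial


def foo_fun(row, b) -> str:
    return 'a' + b()


def bar_fun() -> str:
    return 'I am bar'


# Bind the function-valued argument up front; the result takes only `row`,
# the single column-shaped argument a UDF call would supply.
foo_with_bar = partial(foo_fun, b=bar_fun)
print(foo_with_bar(('111', 1, 'a')))  # prints "aI am bar"
```

In PySpark the bound callable would then be wrapped as udf(partial(foo_fun, b=bar_fun), StringType()) and called with only the struct column. That udf accepts a partial object directly is an assumption here and may vary across PySpark versions; wrapping it in a def or lambda sidesteps the question.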


慕后森
1 Answer

墨色风雨

You're not far from a solution. Here is how I would do it:

def foo_fun_udf(func):
    # Build the UDF inside a factory so `func` is captured by the closure
    # instead of being passed as a UDF argument
    def foo_fun(row) -> str:
        return 'a' + func()
    out_udf = udf(foo_fun, StringType())
    return out_udf

df_to_test.withColumn(
    'foo',
    foo_fun_udf(bar_fun)(struct([df_to_test[x] for x in df_to_test.columns]))
).show()
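This works because foo_fun_udf(bar_fun) returns a UDF whose wrapped function closes over func, so Spark only ever passes it the struct column. The closure can be checked in plain Python without a Spark session; make_row_fn below is a hypothetical stand-in for the answer's foo_fun_udf with the udf(...) call removed:

```python
from typing import Callable


def make_row_fn(func: Callable[[], str]) -> Callable[[object], str]:
    # Same shape as the answer's foo_fun_udf, minus the udf(...) wrapping:
    # `func` is captured in the closure, so the returned function takes only `row`.
    def foo_fun(row: object) -> str:
        return 'a' + func()
    return foo_fun


def bar_fun() -> str:
    return 'I am bar'


row_fn = make_row_fn(bar_fun)
print(row_fn(('111', 1, 'a')))  # prints "aI am bar"
```

When the UDF runs, PySpark serializes the wrapped function, including the captured func, with cloudpickle and ships it to the executors, so whatever function you pass in must be serializable (a module-level function like bar_fun is). Each call to foo_fun_udf builds a separate UDF, so different functions can be plugged in without redefining foo_fun.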