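I wrote a Spark program that reads a CSV file and writes the result to the console. I get an error when I run it. I am using Spark 2.2.0.
Sample file (Employee.csv):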
EmployeeID,FirstName,LastName,DepartmentId,Salaray
1,Gowdhaman,Dhandapani,IT,10000
2,Shaara,Gowdhaman,IT,150000
3,Karthiga,Gowdhaman,IT,120000
4,Aravind,Gunasekaran,Mech,100000
5,Padma,Dhandapani,Home,10000
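When this file is read with header='true' and without schema inference (as the program below does), Spark's CSV source takes the column names from the first line and types every column as a string. The following is only a rough stdlib illustration of the rows you should expect, not Spark itself; the function name read_like_spark_header_true is hypothetical.

```python
import csv
import io

# Sample data copied from the question (header spelling kept as-is).
SAMPLE = """EmployeeID,FirstName,LastName,DepartmentId,Salaray
1,Gowdhaman,Dhandapani,IT,10000
2,Shaara,Gowdhaman,IT,150000
3,Karthiga,Gowdhaman,IT,120000
4,Aravind,Gunasekaran,Mech,100000
5,Padma,Dhandapani,Home,10000
"""


def read_like_spark_header_true(text):
    """Hypothetical helper mimicking header=true without inferSchema:
    the first line gives the column names and every value stays a string."""
    reader = csv.reader(io.StringIO(text), delimiter=",")
    columns = next(reader)  # header row -> column names
    rows = [dict(zip(columns, values)) for values in reader]
    return columns, rows


columns, rows = read_like_spark_header_true(SAMPLE)
print(columns)
print(rows[0])
```

If numeric columns such as Salaray are needed as numbers in Spark, the inferSchema option (or an explicit schema) is the usual way to get them.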
Program:
from pyspark.sql import SparkSession


def read_csv(spark, filename):
    # The data source short name is 'csv' (no leading dot);
    # header='true' takes the column names from the first line.
    df = spark.read.load(filename, format='csv', sep=',', header='true')
    return df


def main():
    spark = SparkSession \
        .builder \
        .appName('Python Spark SQL Basic example') \
        .getOrCreate()
    emp = read_csv(spark, 'Employee.csv')
    emp.show()  # print the first rows of the DataFrame to the console


if __name__ == '__main__':
    main()
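The format argument of spark.read.load expects a data source short name such as 'csv', not a file extension such as '.csv'. If you want to derive the format from the file name, note that os.path.splitext keeps the leading dot. A minimal sketch, assuming a hypothetical helper named spark_format_for (not part of PySpark):

```python
import os


def spark_format_for(path):
    """Hypothetical helper: derive a Spark data source short name
    ('csv', 'json', 'parquet', ...) from a file name.

    os.path.splitext returns the extension with its leading dot ('.csv'),
    so the dot is stripped and the case normalised.
    """
    ext = os.path.splitext(path)[1]
    if not ext:
        raise ValueError("no file extension in %r" % (path,))
    return ext.lstrip(".").lower()


# Usage with the program above (requires pyspark, not executed here):
#   df = spark.read.load(filename, format=spark_format_for(filename),
#                        sep=',', header='true')
# Spark 2.x also offers the CSV shortcut on DataFrameReader:
#   df = spark.read.csv(filename, sep=',', header=True)
print(spark_format_for("Employee.csv"))  # csv
```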