I have a PySpark script that reads a collection from a MongoDB database. When I run the script in local mode, it works:
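from pyspark.sql import SparkSession

# Connection string in the form mongodb://user:password@host:port/database.collection
MONGO_URL = "mongodb://USER:PASSWORD@HOST:27017/DB_NAME.COLLECTION"

spark = SparkSession.builder \
    .appName('TestMongoLoad') \
    .config('spark.mongodb.input.uri', MONGO_URL) \
    .getOrCreate()

# Load the collection named in spark.mongodb.input.uri as a DataFrame
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()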
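
The connection string above packs the credentials, host, database, and collection into a single value, so a typo in any part only shows up at read time. As a minimal sketch (not part of the original script), the standard library can split it apart for a quick sanity check before submitting the job. The helper name `split_mongo_uri` is hypothetical, and the sketch assumes the URI has the same `mongodb://USER:PASSWORD@HOST:PORT/DB.COLLECTION` shape as the one in the question.

```python
from urllib.parse import urlsplit


def split_mongo_uri(uri):
    """Split a mongodb:// URI shaped like the one in the question (hypothetical helper)."""
    parts = urlsplit(uri)
    # The path looks like "/DB_NAME.COLLECTION". MongoDB database names cannot
    # contain a dot, so everything after the first dot is the collection name.
    database, _, collection = parts.path.lstrip("/").partition(".")
    return {
        "host": parts.hostname,  # note: urlsplit lowercases the hostname
        "port": parts.port,
        "user": parts.username,
        "database": database,
        "collection": collection,
    }


info = split_mongo_uri("mongodb://USER:PASSWORD@HOST:27017/DB_NAME.COLLECTION")
print(info)
```

This only inspects the string and does not contact MongoDB, so it cannot tell whether the host is reachable from the cluster nodes.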
spark-submit \
--master local[*] \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 \
load_from_mongo.py
[SUCCESS]
When I run the script on the cluster, it fails:
spark-submit \
--master yarn \
--deploy-mode client \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 3 \
--num-executors 10 \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 \
load_from_mongo.py