I am deploying PySpark in my AKS Kubernetes cluster using the following guides:
https://towardsdatascience.com/ignite-the-spark-68f3f988f642
http://blog.brainlounge.de/memoryleaks/getting-started-with-spark-on-kubernetes/
I have deployed my driver pod following the instructions in the links above:
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: spark
  name: my-notebook-deployment
  labels:
    app: my-notebook
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-notebook
  template:
    metadata:
      labels:
        app: my-notebook
    spec:
      serviceAccountName: spark
      containers:
        - name: my-notebook
          image: pidocker-docker-registry.default.svc.cluster.local:5000/my-notebook:latest
          ports:
            - containerPort: 8888
          volumeMounts:
            - mountPath: /root/data
              name: my-notebook-pv
          workingDir: /root
          resources:
            limits:
              memory: 2Gi
      volumes:
        - name: my-notebook-pv
          persistentVolumeClaim:
            claimName: my-notebook-pvc
---
apiVersion: v1
kind: Service
metadata:
  namespace: spark
  name: my-notebook-deployment
spec:
  selector:
    app: my-notebook
  ports:
    - protocol: TCP
      port: 29413
  clusterIP: None
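As an aside: because the Service is headless (clusterIP: None), its DNS name resolves directly to the driver pod's IP rather than to a virtual service IP. A minimal sketch to check that resolution from another pod in the cluster (the service name and namespace are taken from the manifest above):

import socket

# With a headless service, this returns the driver pod's IP directly.
driver_host = "my-notebook-deployment.spark.svc.cluster.local"
print(socket.gethostbyname(driver_host))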
I can then create the Spark cluster with the following code:
import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
# Create Spark config for our Kubernetes based cluster manager
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark")
sparkConf.set("spark.kubernetes.container.image", "<MYIMAGE>")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "7")
sparkConf.set("spark.executor.cores", "2")
sparkConf.set("spark.driver.memory", "512m")
sparkConf.set("spark.executor.memory", "512m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
As far as I understand, I am running my Spark cluster in client mode, with the jupyter pod acting as the master and spawning the worker pods. It works when I run the code inside the jupyter pod, but it fails when other pods try to connect to it.
How can I fix this?