I'm working with an old PySpark script and am trying to convert the DataFrame df to an RDD.
#Importing the required libraries
import pandas as pd
from pyspark.sql.types import *
from pyspark.ml.regression import RandomForestRegressor
from pyspark.mllib.util import MLUtils
from pyspark.ml import Pipeline
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.mllib.fpm import *
from pyspark.sql import SparkSession
spark = (SparkSession.builder
         .appName("Python Spark")
         .config("spark.some.config.option", "some-value")
         .getOrCreate())  # without getOrCreate(), spark is only a Builder, not a session
# read the data
df = pd.read_json("events.json")
df = (df.rdd
        .map(lambda x: (x[1], [x[0]]))
        .reduceByKey(lambda x, y: x + y)
        .sortBy(lambda k_v: (k_v[0], sorted(k_v[1], key=lambda x: x[1], reverse=True)))
        .collect())
Here is the error output: AttributeError: 'DataFrame' object has no attribute 'rdd'
What am I missing? How do I convert a DataFrame to an RDD?
I have Anaconda 3.6.1 and Spark 2.3.1 installed.
慕妹3242003