To compare two single-row PySpark DataFrames and modify the result, you can follow the steps below. First, create the two DataFrames:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data1 = [("Alice", 25, "Female")]
df1 = spark.createDataFrame(data1, ["name", "age", "gender"])
data2 = [("Bob", 30, "Male")]
df2 = spark.createDataFrame(data2, ["name", "age", "gender"])
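Use the exceptAll() method to compare the two DataFrames and find the rows that differ. It returns the rows of df1 that do not appear in df2:
diff_df = df1.exceptAll(df2)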
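Note that exceptAll() works in one direction only and keeps duplicates, so it behaves like a multiset difference. The following is a minimal plain-Python sketch of those semantics on lists of row tuples. The helper name except_all is hypothetical and only models the behaviour; it is not how Spark implements the method.

```python
from collections import Counter


def except_all(left, right):
    """Model exceptAll on lists of row tuples: multiset difference left - right."""
    remaining = Counter(left) - Counter(right)  # keeps only positive counts
    return list(remaining.elements())


rows1 = [("Alice", 25, "Female")]
rows2 = [("Bob", 30, "Male")]

only_in_1 = except_all(rows1, rows2)  # rows of rows1 missing from rows2
only_in_2 = except_all(rows2, rows1)  # the opposite direction
print(only_in_1)
print(only_in_2)
print(except_all(rows1, rows1))  # identical inputs leave nothing
```

To see the differences in both directions in PySpark itself, one option is df1.exceptAll(df2).union(df2.exceptAll(df1)).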
Use the collect() method to fetch the data of the differing row:
diff_row = diff_df.collect()[0]
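Modify the differing row (in this example it is simply replaced with new values) and append it to df2:
diff_row = ("Charlie", 35, "Male")
updated_df = df2.union(spark.createDataFrame([diff_row], df2.schema))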
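The step above assigns a brand-new tuple, so the values collected from the differing row are discarded. If the goal is to change only some fields of that row, one possible approach is sketched below. It works on a plain dict, which is what diff_row.asDict() returns for a PySpark Row. The helper name modify_row and the chosen change (age=26) are assumptions made for illustration.

```python
def modify_row(row_dict, columns, **changes):
    """Return a tuple in column order with the given fields overridden."""
    unknown = set(changes) - set(columns)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    updated = {**row_dict, **changes}
    return tuple(updated[col] for col in columns)


columns = ["name", "age", "gender"]  # in PySpark: df2.columns
# Stand-in for diff_df.collect()[0].asDict()
diff = {"name": "Alice", "age": 25, "gender": "Female"}

new_row = modify_row(diff, columns, age=26)
print(new_row)  # ('Alice', 26, 'Female')
```

The resulting tuple can be passed to spark.createDataFrame([new_row], df2.schema) exactly as in the step above. Also keep in mind that collect()[0] raises an IndexError when the two DataFrames are identical; diff_df.first() returns None in that case and can be checked before modifying anything.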
The complete code example is shown below:
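from pyspark.sql import SparkSession
# Start (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()
# Create two single-row DataFrames with the same columns
data1 = [("Alice", 25, "Female")]
df1 = spark.createDataFrame(data1, ["name", "age", "gender"])
data2 = [("Bob", 30, "Male")]
df2 = spark.createDataFrame(data2, ["name", "age", "gender"])
# Rows of df1 that do not appear in df2
diff_df = df1.exceptAll(df2)
# Bring the differing row to the driver
diff_row = diff_df.collect()[0]
# "Modify" the row; note that this replaces the collected values entirely
diff_row = ("Charlie", 35, "Male")
# Append the modified row to df2, reusing df2's schema
updated_df = df2.union(spark.createDataFrame([diff_row], df2.schema))
updated_df.show()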
This produces the following output:
+-------+---+------+
|   name|age|gender|
+-------+---+------+
|    Bob| 30|  Male|
|Charlie| 35|  Male|
+-------+---+------+
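In this example, we first create two single-row DataFrames, then use the exceptAll() method to find the differing row and the collect() method to fetch its data. We then modify the differing row, append the modified row to the original DataFrame, and finally display the updated DataFrame.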
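Because each DataFrame holds only one row, it can also be useful to see which columns actually differ once both rows are collected. The sketch below compares two plain dicts, such as those returned by Row.asDict(). The helper name changed_columns is hypothetical, and it assumes both rows have the same columns.

```python
def changed_columns(row_a, row_b):
    """Return {column: (value_in_a, value_in_b)} for columns whose values differ."""
    return {
        col: (row_a[col], row_b[col])
        for col in row_a
        if row_a[col] != row_b[col]
    }


# Stand-ins for df1.first().asDict() and df2.first().asDict()
a = {"name": "Alice", "age": 25, "gender": "Female"}
b = {"name": "Bob", "age": 30, "gender": "Male"}

diff = changed_columns(a, b)
print(diff)
```

An empty result means the two rows are identical, which corresponds to exceptAll() returning no rows in either direction.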