parquet pyspark

# Write the dataframe to a parquet file, overwriting any existing file
df.write.parquet("AA_DWF_All.parquet", mode="overwrite")

# Read the data back from the parquet file into a new dataframe
df_new = spark.read.parquet("AA_DWF_All.parquet")

# Print the number of rows
print(df_new.count())

Here is what the above code is doing:
1. Write the dataframe `df` to the parquet file, overwriting any file already at that path
2. Read the data back from the parquet file into a new dataframe, `df_new`
3. Print the number of rows in `df_new`, which should match the row count of the original `df`
