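# Assumes `sc` is an existing SparkContext (for example, the one the pyspark shell provides).
samples = sc.parallelize([
    ("abonsanto@fakemail.com", "Alberto", "Bonsanto"),
    ("mbonsanto@fakemail.com", "Miguel", "Bonsanto"),
    ("stranger@fakemail.com", "Stranger", "Weirdo"),
    ("dbonsanto@fakemail.com", "Dakota", "Bonsanto")
])

# Bring the elements back to the driver and print them.
print(samples.collect())

# Writes a directory named "folder/here.txt" containing part files; fails if the path already exists.
samples.saveAsTextFile("folder/here.txt")

# Each element read back is one line of text (a string), not a tuple.
read_rdd = sc.textFile("folder/here.txt")
read_rdd.collect()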
Here is what the code above does:
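1. Create an RDD from a Python list of tuples with sc.parallelize.
2. Collect the RDD to the driver and print its contents.
3. Save the RDD as text with saveAsTextFile. Spark creates a directory at "folder/here.txt" holding one part file per partition, and each element is written as its string representation.
4. Read the saved text back into a new RDD with sc.textFile. Each element is now one line of text, not a tuple.
5. Collect the re-read RDD so its contents can be inspected.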
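
Because the elements are saved as their string representations, the RDD read back holds strings rather than the original tuples. The following is a minimal sketch of one way to recover the tuples, assuming every line was written from a Python tuple of plain strings as in the example. The helper name parse_line is hypothetical and not part of the original code, and the snippet uses a local stand-in for a saved line so it runs without Spark.

```python
import ast


def parse_line(line):
    """Turn one saved line, e.g. "('a@x.com', 'A', 'B')", back into a tuple."""
    # literal_eval only evaluates Python literals, so it is safer than eval().
    return ast.literal_eval(line)


# Local stand-in for one line that saveAsTextFile would write: str() of the element.
record = ("abonsanto@fakemail.com", "Alberto", "Bonsanto")
line = str(record)
restored = parse_line(line)

# With a SparkContext available, the same helper could be applied as:
#   parsed_rdd = read_rdd.map(parse_line)
```

This approach only suits small, trusted data whose lines are valid Python literals. For anything larger, a structured format such as CSV or JSON is likely a better fit than parsing tuple strings.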