Sorts this RDD by the given keyfunc

# sortBy(keyfunc, ascending=True, numPartitions=None)

tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
# [('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
# [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]

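Here is what the above code is doing:
1. Create a list of tuples, each consisting of a string and an integer.
2. Use an existing SparkContext, sc (the code assumes one is already available, as in the PySpark shell).
3. Parallelize the list from step 1 into an RDD.
4. Sort the RDD by the first item of each tuple (the string) and collect the results into a list. Strings compare lexicographically, so the digit keys '1' and '2' come before the letters.
5. Repeat the sort with the second item of each tuple (the integer) as the key and collect the results.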
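
The expected orderings can be checked without a running Spark session by applying the same key functions with Python's built-in sorted. The sketch below is only a local analogy: local_sort_by is a hypothetical helper, not part of PySpark, and it says nothing about how sortBy partitions or shuffles data. It also shows what ascending=False would be expected to produce for the second sort.

```python
# Hypothetical local stand-in for RDD.sortBy(...).collect(); not a PySpark API.
def local_sort_by(data, keyfunc, ascending=True):
    """Return a new list ordered by keyfunc, ascending or descending."""
    return sorted(data, key=keyfunc, reverse=not ascending)

tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]

# Sort by the string in each tuple; digits sort before letters.
by_key = local_sort_by(tmp, lambda x: x[0])

# Sort by the integer in each tuple, largest first.
by_value_desc = local_sort_by(tmp, lambda x: x[1], ascending=False)

print(by_key)         # [('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
print(by_value_desc)  # [('2', 5), ('d', 4), ('1', 3), ('b', 2), ('a', 1)]
```

The keys in this data are all distinct, so the comparison does not depend on how ties are ordered.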
