Creating a key/value RDD
Syntactically, there's nothing really special about key/value RDDs; ordinary Python handles it all. If each element you store in the RDD is a tuple of two items, then it is a key/value RDD and you can treat it as such. Here's a simple example:
totalsByAge = rdd.map(lambda x: (x, 1))
If I want to create a totalsByAge RDD out of an original RDD that contains a single value in each element, my lambda function here can take each rating or each number, whatever happens to be x, and transform it into a pair of the original number and the number 1. The parentheses indicate that this is a single entity as far as Python is concerned, but it consists of two elements; it's a tuple of two things, and in this example the first item will be the key and the second item will be the value. Again, the key is important because we can do things like aggregate by key.
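To see what that transformation produces without a Spark cluster, here is a minimal sketch in plain Python. The sample list of ages is hypothetical, and Python's built-in map stands in for the RDD's map; the comments note what I'd expect the PySpark equivalents to be.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the original single-value RDD.
ages = [33, 26, 33, 40, 26, 33]

# Local stand-in for rdd.map(lambda x: (x, 1)):
# each element becomes a (key, value) tuple.
pairs = list(map(lambda x: (x, 1), ages))

# Local stand-in for aggregating by key, which in PySpark could be
# something like reduceByKey(lambda a, b: a + b) on the pair RDD.
counts = defaultdict(int)
for key, value in pairs:
    counts[key] += value

print(pairs)
print(dict(counts))
```

Every element of pairs is a two-item tuple, which is all it takes for the data to be treated as key/value pairs.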
That's all there is to creating a key/value RDD. It's also okay to have complex things in the value of a key/value RDD: I could keep the original value from the first RDD as the key and make the value itself a list of as many elements as I want. I'm not limited to storing just one thing in the value, and we're going to do that in this example too, just to illustrate how that works.
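As a rough illustration of a compound value, here is another plain-Python sketch. The sample ratings and the choice of what goes into the value (a count of 1 and the square of the number) are made up purely for illustration; a list comprehension stands in for the RDD's map.

```python
# Hypothetical sample data standing in for the original single-value RDD.
ratings = [3, 5, 3, 4]

# Local stand-in for rdd.map(lambda x: (x, (1, x * x))):
# the key is the original number, and the value is itself a tuple of two items.
pairs = [(x, (1, x * x)) for x in ratings]

for key, value in pairs:
    count, squared = value  # unpack the compound value
    print(key, count, squared)
```

Each element is still a pair; only the second item has grown into a collection of its own.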