Assignment
Submitted by: Vaibhav Singh
14B00033
CSC
Write a program in PySpark for each of the following questions:
1. To increment each number in a list by one.
l1 = sc.parallelize([1, 2, 3, 4, 5])
l1.collect()
l1 = l1.map(lambda x: x + 1)  # add 1 to every element
l1.collect()
Output = [2, 3, 4, 5, 6]
2. To multiply each number in a list by 10.
l1 = sc.parallelize([1, 2, 3, 4, 5])
l1.collect()
l1 = l1.map(lambda x: x * 10)  # multiply every element by 10
l1.collect()
Output = [10, 20, 30, 40, 50]
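3. To find the most commonly occurring words with their associated
frequencies.
s = ["a", "b", "a", "c", "a"]
s1 = sc.parallelize(s)
# Pair each word with 1, sum the counts per word, then sort by descending frequency (ties by word).
s2 = s1.map(lambda x: (x, 1)).reduceByKey(lambda a, b: a + b).sortBy(lambda kv: (-kv[1], kv[0]))
print(s2.collect())
Output = [("a", 3), ("b", 1), ("c", 1)]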
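As a cross-check that needs no Spark cluster, the frequencies for the list s above can also be computed in plain Python. This is only a local sketch using collections.Counter from the standard library, not part of the PySpark answer; most_common() returns the words ordered from most to least frequent.

```python
from collections import Counter

# Same input list as in question 3.
s = ["a", "b", "a", "c", "a"]

# Counter tallies each word; most_common() sorts the tallies by descending frequency.
frequencies = Counter(s).most_common()
print(frequencies)
```

If the PySpark result and this local result disagree on the counts, the RDD pipeline is the first place to look.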
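4. Find the frequency of each state:
State = ["delhi", "HP", "HR", "HR", "UP"]
from operator import add
s = ["delhi", "HP", "HR", "HR", "UP"]
s1 = sc.parallelize(s)
s2 = s1.map(lambda x: (x, 1)).reduceByKey(add)  # sum the 1s for each state
print(s2.collect())  # the order of the pairs may vary between runs
Output = [("delhi", 1), ("HP", 1), ("HR", 2), ("UP", 1)]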
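To see what the map and reduceByKey(add) steps compute, here is a plain-Python sketch of the same logic that runs without Spark. The helper name reduce_by_key is hypothetical and only mimics the behaviour on a single machine; real Spark performs the reduction across partitions and does not guarantee the order of the result.

```python
from operator import add

def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: merge values that share a key using func."""
    merged = {}
    for key, value in pairs:
        # Keep the first value seen for a key; combine later values with func.
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

states = ["delhi", "HP", "HR", "HR", "UP"]
pairs = [(state, 1) for state in states]  # mirrors map(lambda x: (x, 1))
counts = reduce_by_key(pairs, add)
print(counts)
```

Because every state starts with the value 1 and add sums the values per key, each final value is the number of times that state appears.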
5. To print the even numbers out of a list of numbers.
l1 = sc.parallelize([1, 2, 3, 4, 5, 6])
l1.collect()
l2 = l1.filter(lambda x: x % 2 == 0)  # keep only the numbers divisible by 2
print(l2.collect())
Output = [2, 4, 6]
7. Differentiate between map and flatMap.
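map applies a function to every element and returns exactly one output element per input element, so an RDD of N elements becomes another RDD of N elements. flatMap applies a function that returns a sequence for every element and then flattens those sequences into a single RDD.
Here is an example of the difference:
textFile = sc.textFile("README.md")  # create an RDD of lines of text
# MAP: one output element (the line length) per line
lengths = textFile.map(lambda line: len(line))
# FLATMAP: each line is split into words, and all the words are flattened into one RDD
words = textFile.flatMap(lambda line: line.split(" "))
With flatMap, the input and output RDDs will therefore typically be of different sizes.
(You may need to call collect() on the RDDs generated in the examples above; I have
omitted this for clarity.)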
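The size difference between map and flatMap can also be seen without a Spark cluster. The following is a minimal plain-Python sketch of the two behaviours, assuming a small hypothetical list named lines in place of the RDD; a list comprehension stands in for map and itertools.chain.from_iterable stands in for the flattening step of flatMap.

```python
from itertools import chain

# Hypothetical stand-in for an RDD of text lines (the second line is empty).
lines = ["hello world", "", "map versus flatMap"]

# map: exactly one output element (a list of words) per input line.
mapped = [line.split() for line in lines]

# flatMap: apply the same function, then flatten the per-line lists into one list.
flat_mapped = list(chain.from_iterable(line.split() for line in lines))

print(len(lines), len(mapped), len(flat_mapped))
```

Here mapped always has as many elements as lines, while flat_mapped has one element per word, so its length depends on the content of the lines.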