Professional documents
Culture documents
TODO:
Before we start: was the table created for the Spark DataFrame version, the Python version, or the SQL version?
https://cloudxlab.com/assessment/slide/18/writing-spark-applications?course_id=1
https://cloudxlab.com/assessment/slide/58/spark-project-log-parsing?course_id=1
https://cloudxlab.com/assessment/slide/29/apache-spark-key-value-rdd/1244/project-handling-binary-files?course_id=1
==> Grouping
"NYC", [(20, "1-1-2018"), (21, "2-1-2018")]
"SEATTLE", [(20, "1-1-2017")]
=> max
"NYC", (21, "2-1-2018")
"SEATTLE", (20, "1-1-2017")
----
Discuss
----
class Customer {
  var name: String = _
  var address: String = _
}
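One point for the discussion: if Customer is ever used as a key in keyed operations, it needs value-based equals/hashCode, which a case class generates for free. A minimal sketch, with field names assumed from the notes:

```scala
// Case class variant: structural equality and hashCode are generated,
// so two customers with the same field values compare equal
// (a plain class with vars compares by reference instead)
case class Customer(name: String, address: String)

val a = Customer("Alice", "1 Main St")
val b = Customer("Alice", "1 Main St")
// a == b holds even though a and b are distinct objects
```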
=======
var nums = sc.parallelize((1 to 100000), 50)

// Sum each partition independently; emits one value per partition
def mysum(itr: Iterator[Int]): Iterator[Int] = {
  Iterator(itr.sum)
}

var partitions1 = nums.mapPartitions(mysum)
partitions1.persist()
partitions1.collect()
partitions1.unpersist()

// Custom storage level: (useDisk, useMemory, useOffHeap, deserialized, replication)
import org.apache.spark.storage.StorageLevel
var mysl = StorageLevel(true, true, false, true, 1)
partitions1.persist(mysl)
partitions1.collect()
:history
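What mapPartitions computes with mysum can be mimicked outside the shell with plain Scala collections, `grouped` standing in for the 50 partitions:

```scala
// Same per-partition summing function as in the notes
def mysum(itr: Iterator[Int]): Iterator[Int] = Iterator(itr.sum)

// 1 to 100000 in 50 chunks of 2000 elements, one partial sum per chunk
val partials = (1 to 100000).grouped(2000).flatMap(c => mysum(c.iterator)).toList

// The 50 partial sums total the full sum 1 + ... + 100000
// (widened to Long, since the total overflows Int)
val total = partials.map(_.toLong).sum
```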
res = userData.join(events)
res = [(UserID1, (UserInfo1, LinkInfo1)), (UserID2, (UserInfo2, LinkInfo2))]
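The shape of that join result can be sketched with plain Maps (the UserInfo/LinkInfo values are the placeholder strings from the line above); Spark's pair-RDD join produces the same (key, (left, right)) pairing:

```scala
val userData = Map("UserID1" -> "UserInfo1", "UserID2" -> "UserInfo2")
val events   = Map("UserID1" -> "LinkInfo1", "UserID2" -> "LinkInfo2")

// Inner join on the key: (UserID, (UserInfo, LinkInfo))
val res = for ((id, info) <- userData; link <- events.get(id))
  yield (id, (info, link))
```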