----------------------------------------------
----------------------------------------------
Java Syntax:
---------------
<data_type> <variable_name> = <value / exp> ;
Scala Syntax:
---------------
val <variable_name> : <data_type> = <value / exp>
----------------------------------------------
How to start Scala from the `command prompt`:
----------------------------------------------
Command: scala
----------------------------------------------
orienit@kalyan:~$ scala
Welcome to Scala version 2.11.7 (OpenJDK 64-Bit Server VM, Java 1.7.0_101).
Type in expressions to have them evaluated.
Type :help for more information.
scala>
----------------------------------------------
scala> val name : String = "kalyan"
name: String = kalyan
----------------------------------------------
Scala:
----------
In `Scala` everything is an `Object`;
there are no primitive datatypes like in Java.
Java:
----------
we have Objects
we have primitive datatypes
Examples:
-------------
int a = 1;
Integer a = 1;
Note:
1. In Java, `Objects` are serializable, not the primitive datatypes.
----------------------------------------------
`Scala` provides `Type Inference`.
----------------------------------------------
Examples for `Type Inference`:
scala> val id = 1
id: Int = 1
scala> val id = 1l
id: Long = 1
scala> val id = 1d
id: Double = 1.0
scala> val id = 1f
id: Float = 1.0
----------------------------------------------
Examples for `Operator Overloading`:
scala> val a = 10
a: Int = 10
scala> val b = 20
b: Int = 20
scala> val c = a + b
c: Int = 30
a + b ====> a.+(b)
scala> a.-(b)
res0: Int = -10
scala> a.*(b)
res1: Int = 200
scala> a./(b)
res2: Int = 0
scala> a.%(b)
res3: Int = 10
scala> a min b
res4: Int = 10
scala> a max b
res5: Int = 20
----------------------------------------------
Data Type Conversions:
---------------------------
scala> val id = 10
id: Int = 10
scala> id.to
toByte toDouble toInt toShort
toChar toFloat toLong toString
scala> id.toDouble
res6: Double = 10.0
scala> id.toLong
res7: Long = 10
scala> id.toString
res8: String = 10
scala> id.toChar
res9: Char =
(char code 10 is the newline character, so nothing visible is printed)
scala> id.toByte
res10: Byte = 10
----------------------------------------------
If, If-Else expressions in Scala
----------------------------------------------
if(exp1) {
  body1
}

if(exp2) {
  body1
} else {
  body2
}

if(exp1) {
  body1
} else if(exp2) {
  body2
} else {
  body3
}
Note:
1. Java, C, C++ support the ternary operator; Scala does not need it, because
`if / else` is itself an expression that returns a value (see the sketch below).
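A minimal sketch of using `if / else` where Java would use the ternary operator
(the variable names here are illustrative):

val a = 10
val b = 20
// Java: int max = (a > b) ? a : b;
val max = if (a > b) a else b   // max: Int = 20
----------------------------------------------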
Scala Syntax:
-------------------
val <variable_name> : Array[<data_type>] = Array[<data_type>](list of values)
----------------------------------------------
Java Examples:
----------------------
String[] names = {"kalyan", "venkat", "ravi"};
(or)
String[] names = new String[]{"kalyan", "venkat", "ravi"};
Scala Examples:
----------------------
val names : Array[String] = Array[String]("kalyan", "venkat", "ravi")
(or)
val names = Array("kalyan", "venkat", "ravi")
----------------------------------------------
scala> names(0)
res11: String = kalyan
scala> names(1)
res12: String = venkat
scala> names(2)
res13: String = ravi
----------------------------------------------
scala> val names : Array[String] = new Array[String](3)
names: Array[String] = Array(null, null, null)
scala> names(0) = "kalyan"
scala> names(1) = "venkat"
scala> names(2) = "ravi"
scala> names
res17: Array[String] = Array(kalyan, venkat, ravi)
----------------------------------------------
scala> val names = Array[String]("kalyan", "venkat", "ravi")
names: Array[String] = Array(kalyan, venkat, ravi)
----------------------------------------------
----------------------------------------------
scala> val ids = Array[Int](1,2,3,4,5,6)
ids: Array[Int] = Array(1, 2, 3, 4, 5, 6)
scala> for( id <- ids) println(id)
1
2
3
4
5
6
----------------------------------------------
Scala supports 2 types of collections:
1. Immutable (scala.collection.immutable)
2. Mutable (scala.collection.mutable)
----------------------------------------------
scala> scala.collection.immutable.
:: LongMap SortedMap
AbstractMap LongMapEntryIterator SortedSet
BitSet LongMapIterator Stack
DefaultMap LongMapKeyIterator Stream
HashMap LongMapUtils StreamIterator
HashSet LongMapValueIterator StreamView
IndexedSeq Map StreamViewLike
IntMap MapLike StringLike
IntMapEntryIterator MapProxy StringOps
IntMapIterator Nil Traversable
IntMapKeyIterator NumericRange TreeMap
IntMapUtils Page TreeSet
IntMapValueIterator PagedSeq TrieIterator
Iterable Queue Vector
LinearSeq Range VectorBuilder
List RedBlackTree VectorIterator
ListMap Seq VectorPointer
ListSerializeEnd Set WrappedString
ListSet SetProxy
----------------------------------------------
scala> scala.collection.mutable.
AVLIterator ListBuffer
AVLTree ListMap
AbstractBuffer LongMap
AbstractIterable Map
AbstractMap MapBuilder
AbstractSeq MapLike
AbstractSet MapProxy
AnyRefMap MultiMap
ArrayBuffer MutableList
ArrayBuilder Node
ArrayLike ObservableBuffer
ArrayOps ObservableMap
ArraySeq ObservableSet
ArrayStack OpenHashMap
BitSet PriorityQueue
Buffer PriorityQueueProxy
BufferLike Publisher
BufferProxy Queue
Builder QueueProxy
Cloneable ResizableArray
DefaultEntry RevertibleHistory
DefaultMapModel Seq
DoubleLinkedList SeqLike
DoubleLinkedListLike Set
FlatHashTable SetBuilder
GrowingBuilder SetLike
HashEntry SetProxy
HashMap SortedSet
HashSet Stack
HashTable StackProxy
History StringBuilder
ImmutableMapAdaptor Subscriber
ImmutableSetAdaptor SynchronizedBuffer
IndexedSeq SynchronizedMap
IndexedSeqLike SynchronizedPriorityQueue
IndexedSeqOptimized SynchronizedQueue
IndexedSeqView SynchronizedSet
Iterable SynchronizedStack
LazyBuilder Traversable
Leaf TreeSet
LinearSeq Undoable
LinkedEntry UnrolledBuffer
LinkedHashMap WeakHashMap
LinkedHashSet WrappedArray
LinkedList WrappedArrayBuilder
LinkedListLike
----------------------------------------------
1. Convert `Immutable Collection` to `Mutable Collection` using `toBuffer`
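A minimal sketch of that conversion (the names below are illustrative):

val nums = List(1, 2, 3)      // immutable collection
val buf = nums.toBuffer       // mutable Buffer
buf += 4                      // mutation is now allowed
val back = buf.toList         // back to an immutable List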
----------------------------------------------
Examples on Collections:
----------------------------
val ids = Array[Int](1,2,3,4,5,6)
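A few common operations on `ids` (results shown as comments; this is a sketch,
not captured REPL output):

ids.map(x => x * x)           // Array(1, 4, 9, 16, 25, 36)
ids.filter(x => x % 2 == 0)   // Array(2, 4, 6)
ids.sum                       // 21
ids.max                       // 6
ids.toBuffer                  // mutable Buffer(1, 2, 3, 4, 5, 6)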
----------------------------------------------
----------------------------------------------
----------------------------------------------
----------------------------------------------
Scala supports 3 types of functions:
----------------------------------------------
1. Anonymous functions
2. Named functions
3. Curried functions
1. Anonymous functions
----------------------------------------------
(a: Int, b: Int) => { a + b }
scala> val add = (a: Int, b: Int) => { a + b }
add: (Int, Int) => Int = <function2>
scala> add(10,20)
res29: Int = 30
2. Named functions
----------------------------------------------
def add(a: Int, b: Int) = { a + b }
scala> add(1,3)
res30: Int = 4
scala> add(20,10)
res31: Int = 30
3. Curried functions
----------------------------------------------
def add1(a: Int, b: Int) = { a + b }
def add2(a: Int)(b: Int) = { a + b }
----------------------------------------------
scala> add1(10,20)
res32: Int = 30
scala> add2(10,20)
<console>:14: error: too many arguments for method add2: (a: Int)(b: Int)Int
add2(10,20)
^
scala> add2(10)(20)
res34: Int = 30
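One benefit of currying is partial application; a small sketch reusing the
`add2` definition above:

val add10 = add2(10) _        // fix the first parameter list
add10(20)                     // Int = 30
add10(5)                      // Int = 15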
----------------------------------------------
----------------------------------------------
Spark
----------------------------------------------
RDD features:
---------------------
1. Immutability
2. Lazy Evaluation
3. Cacheable
4. Type Inference
RDD Operations:
---------------------
1. Transformations ( convert old_rdd into new_rdd )
2. Actions ( compute a result from the rdd data )
1. Transformations:
---------------------------
f1(x) = { x + 1}
f2(x) = { x * x}
2. Actions: (a Spark sketch of both follows below)
---------------------------
min(list) <- 1
max(list) <- 4
sum(list) <- 10
count(list) <- 4
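A small sketch of the same idea on an RDD (assumes a SparkContext `sc`, as in
spark-shell; the list values are illustrative):

val nums = sc.parallelize(List(1, 2, 3, 4))
val plusOne = nums.map(x => x + 1)   // transformation: old_rdd -> new_rdd (lazy)
val squares = nums.map(x => x * x)   // transformation (lazy)
nums.min()                           // action: 1
nums.max()                           // action: 4
nums.sum()                           // action: 10.0 (sum returns a Double)
nums.count()                         // action: 4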
----------------------------------------------
Spark Shell Start commands:
----------------------------------------------
scala => spark-shell
python => pyspark
R => SparkR
----------------------------------------------
----------------------------------------------
How to create RDD?
----------------------------------------------
We can create an RDD in Spark in 2 ways:
1. from collections (list, seq, ...)
2. from datasets (txt, csv, tsv, json, hbase, ...)
----------------------------------------------
How to create RDD from collections?
----------------------------------------------
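A minimal sketch using `sc.parallelize` (the collection and partition count are
illustrative):

val ids = List(1, 2, 3, 4, 5, 6)
val rdd = sc.parallelize(ids)           // default number of partitions
val rddWith4 = sc.parallelize(ids, 4)   // explicit number of partitions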
----------------------------------------------
How to create RDD from datasets?
----------------------------------------------
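A minimal sketch using `sc.textFile`, matching the example shown further below:

val path = "file:///home/orienit/work/input/demoinput"
val rdd = sc.textFile(path)             // one RDD element per line of the file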
----------------------------------------------
Examples on RDD:
----------------------------------------------
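The definitions of `rdd1` and `rdd2` are not shown in this excerpt; given the
partition counts and `collect()` output below, they were likely created along
these lines:

val rdd1 = sc.parallelize(List(1, 2, 3, 4, 5, 6), 4)
val rdd2 = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)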
scala> rdd1.getNumPartitions
res0: Int = 4
scala> rdd2.getNumPartitions
res1: Int = 2
----------------------------------------------
NOTE:
1. `rdd.collect()` will display the RDD data in the console, similar to the PIG
`dump` command.
----------------------------------------------
scala> rdd1.collect()
res3: Array[Int] = Array(1, 2, 3, 4, 5, 6)
scala> rdd2.collect()
res4: Array[Int] = Array(1, 2, 3, 4, 5, 6)
scala> rdd1.glom().collect()
res5: Array[Array[Int]] = Array(Array(1), Array(2, 3), Array(4), Array(5, 6))
scala> rdd2.glom().collect()
res6: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6))
----------------------------------------------
scala> rdd3.getNumPartitions
res7: Int = 3
scala> rdd3.glom().collect()
res8: Array[Array[Int]] = Array(Array(5), Array(1, 2, 6), Array(3, 4))
scala> rdd4.glom().collect()
res10: Array[Array[Int]] = Array(Array(5, 3, 4), Array(1, 2, 6))
----------------------------------------------
scala> rdd1.collect()
res11: Array[Int] = Array(1, 2, 3, 4, 5, 6)
scala> rdd11.collect()
res13: Array[Int] = Array(2, 3, 4, 5, 6, 7)
scala> rdd12.collect()
res14: Array[Int] = Array(2, 3, 4, 5, 6, 7)
----------------------------------------------
scala> val rdd11 = rdd1.map((x : Int) => { x + 1 })
rdd11: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[14] at map at <console>:28
scala> rdd11.collect
res15: Array[Int] = Array(2, 3, 4, 5, 6, 7)
scala> rdd13.collect
res16: Array[Int] = Array(2, 3, 4, 5, 6, 7)
scala> rdd14.collect
res17: Array[Int] = Array(2, 3, 4, 5, 6, 7)
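`rdd12`, `rdd13` and `rdd14` are not defined in this excerpt; since they produce
the same result as `rdd11`, they were presumably equivalent ways of writing the
same map, e.g.:

val rdd12 = rdd1.map(x => x + 1)        // parameter type inferred
val rdd13 = rdd1.map(_ + 1)             // placeholder syntax
val rdd14 = rdd1.map((x: Int) => x + 1)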
----------------------------------------------
val path = "file:///home/orienit/work/input/demoinput"
val rdd = sc.textFile(path)
----------------------------------------------
scala> rdd.getNumPartitions
res18: Int = 2
scala> rdd.getNumPartitions
res19: Int = 1
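The two different partition counts suggest the RDD was re-created between these
calls; `textFile` accepts an optional minimum-partition argument, e.g.:

scala> val rdd = sc.textFile(path)       // default splits => 2 partitions here
scala> val rdd = sc.textFile(path, 1)    // minPartitions = 1 => 1 partition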
scala> rdd.collect()
res20: Array[String] = Array(I am going, to hyd, I am learning, hadoop course)
scala> rdd.collect().foreach(println)
I am going
to hyd
I am learning
hadoop course
----------------------------------------------
Word Count in Spark:
----------------------------------------------
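The full pipeline is not shown here; a typical word count that produces `sorted`
might look like this (the input and output paths are illustrative):

val input = "file:///home/orienit/work/input/demoinput"
val output = "file:///home/orienit/work/output/wordcount"
val lines = sc.textFile(input)
val words = lines.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))
val counts = pairs.reduceByKey(_ + _)
val sorted = counts.sortByKey()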
sorted.saveAsTextFile(output)
----------------------------------------------
----------------------------------------------