
Hadoop Pre-Test

Q1. In a traditional programming model, your application runs on one machine and your database runs on another machine at a different location. Assume the database size is 1 TB and the network bandwidth is 10 MBPS. What is the most likely performance bottleneck?
Choose the best option.
A. Network bandwidth is the performance bottleneck. Increasing the network bandwidth would improve performance, because it increases the data transfer rate.
B. The huge amount of data in the database is the bottleneck. Transferring that data to the application server over a slow network takes a long time, and this is the main reason for the performance bottleneck.
C. Performance depends only on the latency of reading and processing the data in the application. Neither the network nor the database size causes any performance degradation.
D. All of the above
E. There is no performance bottleneck
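To put a number on the transfer cost in Q1, the ideal transfer time for 1 TB can be computed directly. "10 MBPS" is ambiguous (megabits or megabytes per second), so this minimal Python sketch computes both, assuming decimal units (1 TB = 10^12 bytes) and a link running at full throughput with no protocol overhead or latency.

```python
def transfer_time_seconds(size_bytes, bandwidth_bits_per_second):
    """Ideal transfer time: payload size in bits divided by link throughput.

    Ignores protocol overhead, latency and congestion, so real transfers take longer.
    """
    return size_bytes * 8 / bandwidth_bits_per_second


ONE_TB = 10**12                 # decimal terabyte, in bytes (assumption)
TEN_MEGABITS = 10 * 10**6       # 10 Mbit/s, in bits per second
TEN_MEGABYTES = 10 * 10**6 * 8  # 10 MB/s, in bits per second

for label, bandwidth in [("10 Mbit/s", TEN_MEGABITS), ("10 MB/s", TEN_MEGABYTES)]:
    seconds = transfer_time_seconds(ONE_TB, bandwidth)
    print(f"{label}: {seconds:,.0f} s = {seconds / 3600:,.1f} h")
```

Even under these ideal assumptions, moving the database once takes about 28 hours at 10 MB/s and about 222 hours (over nine days) at 10 Mbit/s.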
Q2. You are performing matrix multiplication on a quad-core machine (4 cores per CPU). The matrices are 1024 × 1024, and each element is a random decimal number. The multiplication is currently implemented as a single-threaded application, and you have observed that it takes too much time. How can you improve the performance?
A. Since I have 4 cores, I will run 4 threads to perform the multiplication, each working on 256 rows.
B. I will transpose one of the matrices to avoid cache misses, and I will also use 4 threads to utilize all the cores, each working on 256 rows.
C. I will transpose both matrices and use 4 threads to utilize all the cores, each working on 256 rows.
D. Since I can run as many threads as I want, I will run 256 threads, each working on only 4 rows.
E. I will run 1024 threads, each working on only 1 row, so it will be very fast.
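The technique described in option B (transpose one operand, then split the rows among one worker per core) can be sketched in Python as follows. The function names are hypothetical, and the thread pool from the standard concurrent.futures module is used only to show how the work is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor


def transpose(matrix):
    """Return the transpose, so that the columns of the original become rows."""
    return [list(column) for column in zip(*matrix)]


def multiply_rows(a, b_t, start, stop):
    """Compute rows start..stop-1 of A x B, given B already transposed."""
    # Row i of A is dotted with row j of B^T (column j of B), so both operands
    # are walked sequentially instead of striding down a column of B.
    return [
        [sum(x * y for x, y in zip(row, col)) for col in b_t]
        for row in a[start:stop]
    ]


def parallel_matmul(a, b, workers=4):
    """Split the rows of A into `workers` contiguous bands (256 rows each for 1024 x 1024)."""
    n = len(a)
    if n == 0:
        return []
    b_t = transpose(b)
    band = -(-n // workers)  # ceiling division
    bands = [(start, min(start + band, n)) for start in range(0, n, band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the bands' results in submission order.
        parts = pool.map(lambda r: multiply_rows(a, b_t, r[0], r[1]), bands)
        result = [row for part in parts for row in part]
    return result
```

In CPython the global interpreter lock prevents threads from speeding up pure-Python arithmetic, so this shows only the decomposition. The cache benefit of the transpose also depends on contiguous storage, as in C or Java arrays, which Python lists of lists do not provide; a real implementation would use native threads or a tuned linear-algebra library.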
Q3. You want to store 10 TB of data on a machine with a 100 GB hard disk. What would your approach be? Choose the most feasible and realistic option.
A. I will upgrade the machine's hard disk capacity to 100 TB.
B. I will buy a high-end machine and store the data on it.
C. I will partition the data into 100 GB chunks and distribute them across several machines.
D. All of the above
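The partitioning in option C can be sketched as splitting the byte range of the data set into fixed-size chunks and assigning them to machines round-robin. This minimal Python sketch uses hypothetical machine names and decimal units; it ignores replication (which HDFS applies by default) and the disk space each machine needs for its own operating system, so a real cluster would need more machines than this arithmetic suggests.

```python
GB = 10**9   # decimal units (assumption)
TB = 10**12


def plan_chunks(total_bytes, chunk_bytes, machines):
    """Split [0, total_bytes) into fixed-size chunks and assign them round-robin.

    Returns {machine: [(start_offset, end_offset), ...]}.
    """
    placement = {machine: [] for machine in machines}
    offset, index = 0, 0
    while offset < total_bytes:
        end = min(offset + chunk_bytes, total_bytes)  # last chunk may be shorter
        machine = machines[index % len(machines)]
        placement[machine].append((offset, end))
        offset, index = end, index + 1
    return placement


# 10 TB in 100 GB chunks gives 100 chunks, one per hypothetical node.
nodes = [f"node{i:03d}" for i in range(100)]
plan = plan_chunks(10 * TB, 100 * GB, nodes)
```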
Q4. If 1 TB of data is stored on a machine with 64 GB of RAM, how will you sort the 1 TB of data? Choose the best option.
A. I will use the quicksort algorithm to perform the sorting.
B. I will first sort the first 100 GB of data, then the second 100 GB, and so on. Finally, I will do a final sort over all the intermediate outputs.
C. I will use a combination of A and B.
D. I will first sort the first 64 GB of data, then the second 64 GB, and so on. Finally, I will do a final sort over all the intermediate outputs.
E. I will use a combination of A and D.
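The chunk-then-merge strategy in options B and D is known as external merge sort. A minimal Python sketch, assuming one integer per record: each chunk of chunk_size records is sorted in memory with Python's built-in sorted (Timsort, not quicksort), written to a temporary file as a sorted run, and the runs are then merged with heapq.merge, which keeps only one pending record per run in memory. In practice the chunk must be somewhat smaller than the available RAM to leave room for the operating system and the sort's own overhead.

```python
import heapq
import tempfile


def _write_run(sorted_chunk):
    """Write one sorted chunk to a temporary file (one integer per line) and rewind it."""
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(f"{value}\n" for value in sorted_chunk)
    run.seek(0)
    return run


def external_sort(records, chunk_size):
    """Yield the integers from `records` in ascending order.

    At most `chunk_size` records are held in memory while building runs.
    """
    runs, chunk = [], []
    try:
        # Phase 1: sort memory-sized chunks and spill each one to disk.
        for record in records:
            chunk.append(record)
            if len(chunk) == chunk_size:
                runs.append(_write_run(sorted(chunk)))
                chunk = []
        if chunk:
            runs.append(_write_run(sorted(chunk)))
        # Phase 2: k-way merge of the sorted runs, streaming from the files.
        streams = [(int(line) for line in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()
```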
Q5. Suppose a traditional database holds 10 million records and you run a query with a WHERE clause, such as SELECT * FROM employeeTable WHERE emp_salary > 100000. Which options best describe this operation? (Choose 2 options.)
A. Traditional databases are row-oriented, with rows and columns tightly coupled, so the database loads all 10 million records into memory, scans the table, and returns the result. Because this operation is done in memory, it is very fast.
B. Traditional databases are row-oriented, with rows and columns tightly coupled, so the database loads all 10 million records into memory, scans the table, and returns the result. It is very slow, because it performs the comparison on every record.
C. The internal data structures of these databases are B+ trees, so when the database size exceeds a few hundred GB, retrieving the data becomes very slow.
D. Traditional databases are column-oriented, giving importance to columns instead of rows; hence the database knows which column to select depending on the condition, and it is very fast.
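The cost difference behind Q5 can be sketched in Python: a full table scan compares every row with the predicate, while an ordered index lets a range query jump straight to the first matching entry. The sorted list of (salary, position) pairs below is a deliberately simplified stand-in for a B+ tree index, not how any particular database stores one; the row dictionaries are hypothetical.

```python
import bisect


def full_scan(rows, threshold):
    """Row-oriented table scan: every row is compared against the predicate."""
    comparisons, result = 0, []
    for row in rows:
        comparisons += 1
        if row["emp_salary"] > threshold:
            result.append(row)
    return result, comparisons


def build_salary_index(rows):
    """Sorted (salary, position) pairs: a simplified stand-in for a B+ tree index."""
    return sorted((row["emp_salary"], i) for i, row in enumerate(rows))


def index_range_scan(rows, index, threshold):
    """Return rows with emp_salary > threshold, in salary order, using the index."""
    # (threshold, inf) sorts after every entry whose salary equals threshold,
    # so bisect_right lands on the first salary strictly greater, in O(log n).
    start = bisect.bisect_right(index, (threshold, float("inf")))
    return [rows[i] for _, i in index[start:]]
```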
Q6. Which options best describe traditional databases? (Choose the best options.)
A. Rigid schema
B. Dynamic column creation
C. No dynamic column creation
D. Table scanning
E. No table scanning
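As an illustration of a rigid schema, the sketch below uses Python's built-in sqlite3 module, with SQLite standing in for a traditional RDBMS (an assumption; server databases differ in details). The extra department column is hypothetical: a row that uses an undeclared column is rejected until an explicit ALTER TABLE changes the schema.

```python
import sqlite3


def insert_with_rigid_schema():
    """Show that an undeclared column is rejected until the schema is altered.

    Returns (was_rejected, rows_after_schema_change).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employeeTable (emp_id INTEGER PRIMARY KEY, emp_salary INTEGER)")
    conn.execute("INSERT INTO employeeTable (emp_id, emp_salary) VALUES (1, 120000)")
    insert_sql = ("INSERT INTO employeeTable (emp_id, emp_salary, department) "
                  "VALUES (2, 90000, 'IT')")
    try:
        # 'department' was never declared, so the database refuses the row.
        conn.execute(insert_sql)
        rejected = False
    except sqlite3.OperationalError:
        rejected = True
    # The only way forward is an explicit schema change; row 1 gets NULL.
    conn.execute("ALTER TABLE employeeTable ADD COLUMN department TEXT")
    conn.execute(insert_sql)
    rows = conn.execute(
        "SELECT emp_id, emp_salary, department FROM employeeTable ORDER BY emp_id"
    ).fetchall()
    conn.close()
    return rejected, rows
```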
Q7. If you store unstructured data in a traditional RDBMS, which option best describes this scenario?
A. I can store unstructured data in a traditional RDBMS, but I will be storing a lot of NULLs along with the data.
B. I cannot store unstructured data.
C. I can store unstructured data; the only problem is that I will be continuously changing the schema depending on the columns.
D. Both A and C
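The effects named in options A and C can be observed with a small Python sketch using the built-in sqlite3 module (SQLite standing in for a traditional RDBMS). Each record is a dictionary whose keys vary; every previously unseen key triggers an ALTER TABLE, and the cells for keys a record lacks are stored as NULL. The table name docs and the helper function are hypothetical.

```python
import sqlite3


def store_unstructured(records):
    """Store dict records of varying shape in one relational table.

    Returns (number of ALTER TABLE statements issued, number of NULL cells).
    Keys become column names, so this sketch assumes trusted input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY)")
    columns, schema_changes = [], 0
    for record in records:
        for key in record:
            if key not in columns:
                # Every new field forces a schema change; existing rows get NULL.
                conn.execute(f'ALTER TABLE docs ADD COLUMN "{key}" TEXT')
                columns.append(key)
                schema_changes += 1
        if record:
            names = ", ".join(f'"{key}"' for key in record)
            marks = ", ".join("?" for _ in record)
            conn.execute(f"INSERT INTO docs ({names}) VALUES ({marks})",
                         [str(value) for value in record.values()])
        else:
            conn.execute("INSERT INTO docs DEFAULT VALUES")
    nulls = sum(
        conn.execute(f'SELECT COUNT(*) FROM docs WHERE "{col}" IS NULL').fetchone()[0]
        for col in columns
    )
    conn.close()
    return schema_changes, nulls
```

For example, storing {"name": "a", "email": "x"}, {"name": "b", "phone": "1"} and {"tags": "t"} issues 4 schema changes and leaves 7 of the 12 data cells NULL.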
Q8. What is an efficient way to serialize data sent over the network? Assume you are passing some integer values from machine 1 to machine 2.
A. I will transfer the data as it is.
B. I will convert the data into binary and then transfer it.
C. I will compress the integer values using a compression algorithm and then transfer them.
D. I will convert the integer values into binary, apply a compression algorithm, and then transfer the data.
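The options in Q8 can be compared with a small Python sketch using the standard struct and zlib modules. It assumes the integers fit in 32-bit signed values and that both machines agree on little-endian byte order; the function names are hypothetical.

```python
import struct
import zlib


def encode_text(values):
    """Plain text: comma-separated decimal digits; the receiver must parse them."""
    return ",".join(str(v) for v in values).encode("ascii")


def encode_binary(values):
    # '<' = little-endian, 'i' = 4-byte signed int: fixed width, no parsing needed.
    return struct.pack(f"<{len(values)}i", *values)


def decode_binary(payload):
    return list(struct.unpack(f"<{len(payload) // 4}i", payload))


def encode_binary_compressed(values, level=6):
    """Binary encoding followed by zlib (DEFLATE) compression."""
    return zlib.compress(encode_binary(values), level)


def decode_binary_compressed(payload):
    return decode_binary(zlib.decompress(payload))
```

For the integers 0 to 999, the comma-separated text is 3,889 bytes, slightly less than the 4,000-byte fixed-width binary, so binary alone is not automatically smaller for small values; its benefit is a fixed layout that needs no parsing. Compressing the binary form removes the mostly-zero high-order bytes.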
Q9. Why would you choose distributed computing? Choose the best option.
A. Because I am facing a storage problem
B. Because I am facing a performance problem
C. Both A and B
D. None of the above

Q10. What is data localization?
A. Executing the application on the machine where the data resides
B. Moving the data to the machine where the application is running
C. Distributing the data across several machines
D. Keeping all the data on a single machine
E. None of these
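A scheduler that prefers to run each task on a machine already holding its input block can be sketched in Python as below. It is a hypothetical greedy policy (one free task slot per node, blocks processed in order), not Hadoop's actual scheduler, and the block and node names are made up.

```python
def schedule_tasks(block_locations, free_nodes):
    """Assign one task per block, preferring a free node that already stores the block.

    block_locations: {block_id: set of nodes holding a copy of the block}
    free_nodes: nodes with one free task slot each
    Returns {block_id: (node or None, "data-local" | "remote" | "waiting")}.
    """
    free = list(free_nodes)
    plan = {}
    for block, replicas in block_locations.items():
        local = next((node for node in free if node in replicas), None)
        if local is not None:
            plan[block] = (local, "data-local")   # computation moves to the data
            free.remove(local)
        elif free:
            plan[block] = (free.pop(0), "remote") # block must cross the network
        else:
            plan[block] = (None, "waiting")       # no slot left
    return plan
```

With blocks b1 on {n1, n2}, b2 on {n3} and b3 on {n1}, and free nodes n1, n3 and n4, b1 and b2 run data-local on n1 and n3, while b3 must run remotely on n4 and pull its block over the network.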
Q11. What is the disadvantage of traditional applications/systems?
A. Data is local to the application
B. Data is moved to the application over a slow network
C. Network bandwidth is finite
D. Reading is slow
E. None of these
