Question: 1 of 60

You are trying to load a JSON-formatted data file from Hadoop into a BigSheets
workbook. The data appears to be formatted incorrectly. What should you do?

A. Export the file to a compatible format.
B. Load the data into an RDBMS table.
C. Pick a different catalog table.
D. Change the line reader for the file.

Question: 2 of 60
Which IBM BigInsights component should be used to build extractors from
unstructured data?

A. CRAN
B. Web Tooling
C. HDFS
D. BigSheets

Question: 3 of 60
Which of the following units is larger than the others?

A. TiB
B. ZiB
C. EiB
D. MiB

Question: 4 of 60
Which feature of Apache Spark allows it to perform faster computations
than Hadoop 2 alone?

A. APIs for Scala, Python, and Java
B. in-memory computation and caching
C. allows for batch applications
D. support for the MapReduce model
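
Spark's speed advantage comes largely from keeping intermediate data in memory across operations. A minimal Scala sketch of that caching behavior (the file path is illustrative):

    val lines = sc.textFile("hdfs:///data/logs.txt")  // path is made up
    lines.cache()                                     // mark for in-memory storage
    lines.filter(_.contains("ERROR")).count()         // first action materializes the cache
    lines.filter(_.contains("WARN")).count()          // second action reuses the in-memory copy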

Question: 5 of 60
Which of the following describes a big data stream of data?

A. structured data from multiple systems
B. real-time transactions of stocks
C. information from email correspondence
D. collections of unstructured documents

Question: 6 of 60
In a YARN architecture, which component do clients submit new jobs to?

A. Container
B. NodeManager
C. Application Manager
D. Resource Manager

Question: 7 of 60
How is Apache Ambari used?

A. for a flexible data processing framework
B. for machine learning
C. for Hadoop cluster administration
D. for statistics processing

Question: 8 of 60
BigSheets was designed for which group of users?

A. business analysts
B. DBAs
C. programmers
D. system administrators

Question: 9 of 60
When comparing Pig and SQL, which is a true statement?

A. SQL supports pipeline splits and ETL techniques.
B. Pig is a declarative language.
C. Pig can only store data at the end of evaluation.
D. SQL and RDBMSs are generally much faster after data is loaded.

Question: 10 of 60
Which two statements are true regarding the functionality of HDFS? (Choose
two.) (Please select ALL that apply)

A. Files are split across the cluster.
B. Programs are moved to the data for processing.
C. Each node has a copy of all the data.
D. Data can be updated from any node.

Question: 11 of 60
Which command can be used to read the results of a Hadoop computation
stored in a file named results?

A. hadoop fs -pwd results
B. hadoop fs -cat results
C. hadoop fs -ls -R results
D. hadoop fs -rm results

Question: 12 of 60
Which command lists the files available in a folder on HDFS?

A. hdfs dfs -chown
B. hdfs dfs -ls
C. hdfs dfs -put
D. hdfs dfs -cat
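
The commands in questions 11 and 12 can be tried directly from a terminal; hadoop fs and hdfs dfs are interchangeable for these operations. An illustrative session (paths are made up):

    hdfs dfs -put results /user/biadmin/   # copy a local file into HDFS
    hdfs dfs -ls /user/biadmin             # list the files in a folder
    hadoop fs -cat /user/biadmin/results   # print the stored results to stdout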

Question: 13 of 60
What is the minimum version of Java required to program with Spark v1.x?

A. Java 8
B. Java 7
C. Java 5
D. Java 6 with Spark 1.x, Java 7 with Spark 2.x

Question: 14 of 60
[missing]

Question: 15 of 60
What is the name of the variable of the Spark context in the Scala shell?

A. sp
B. sc
C. scala
D. spk
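
In the interactive Scala shell (spark-shell) the context is pre-created and bound to sc, so it can be used immediately; a minimal illustrative session:

    scala> sc.parallelize(1 to 5).sum()
    res0: Double = 15.0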

Question: 16 of 60
[missing]

Question: 17 of 60
Which command is used to submit a job to a Hadoop 2 cluster?

A. hdfs ls filename.jar
B. hadoop dfs -put filename.jar
C. hadoop jar filename.jar
D. jar -tf filename.jar
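
A typical submission passes the application jar, its main class, and any arguments to the hadoop jar command; an illustrative invocation (jar, class, and paths are made up):

    hadoop jar wordcount.jar WordCount /user/biadmin/input /user/biadmin/output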

Question: 18 of 60
Which Apache tool should be used to initialize the Hadoop cluster?

A. Spark
B. Knox
C. Ambari
D. Yarn

Question: 19 of 60
Which two characteristics describe the Platform Symphony grid management
platform? (Choose two.)

A. it is designed around a single-tenant platform
B. it can run applications built in Java
C. it manages only grid-aware applications
D. it is reactive to time-critical requirements

Question: 20 of 60
When using Spark Streaming, what is a DStream?

A. the processed data stream
B. a bidirectional input/output stream
C. a sequence of RDDs
D. a SQL-to-stream reader
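
A DStream is a discretized stream: each batch interval arrives as one RDD. A minimal Spark Streaming sketch in Scala (host, port, and interval are illustrative):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))      // 10-second batches
    val lines = ssc.socketTextStream("localhost", 9999)  // DStream[String]
    lines.foreachRDD(rdd => println(rdd.count()))        // each batch is an RDD
    ssc.start()
    ssc.awaitTermination()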

Question: 21 of 60
What does the hadoop fsck command do?

A. deletes old files
B. checks the status of the HDFS cluster
C. enables file system replication
D. rebalances files in the HDFS cluster
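
The health check is run against an HDFS path; an illustrative invocation:

    hadoop fsck / -files -blocks   # report file, block, and replication status under /
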
Question: 22 of 60
What are two primary issues when dealing with unstructured data in
comparison to structured data? (Choose two.)

A. lack of value
B. lack of context
C. lack of availability
D. lack of data types

Question: 23 of 60
What is Hive used for?

A. job submission and monitoring via a GUI
B. migrating data from one cluster to another
C. cluster resource allocation and reporting
D. managing and querying structured data in Hadoop

Question: 24 of 60
Which of the following units is smaller than the others?

A. YiB
B. EiB
C. PiB
D. ZiB

Question: 25 of 60
What does it mean to create a new shard in HBase?

A. Scale out seamlessly to another partition.
B. Create a data restore point.
C. Collect related data together.
D. Remove unused data from the cluster.

Question: 26 of 60
What should be done to load data that is at rest into Hadoop?

A. Use standard HDFS commands to load the data.
B. Set up Flume to import the data into Hadoop.
C. Create a data streamer for the data.
D. Import the data into an RDBMS.

Question: 27 of 60
Which type of statement is available in the Pig Latin language?

A. if
B. case
C. foreach
D. do loops
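
For reference, FOREACH...GENERATE is the Pig Latin statement that transforms each tuple of a relation; a small illustrative script (file and fields are made up):

    raw    = LOAD '/data/people.csv' USING PigStorage(',') AS (name:chararray, age:int);
    adults = FILTER raw BY age >= 18;
    names  = FOREACH adults GENERATE name;
    DUMP names;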

Question: 28 of 60
In a YARN architecture, what does all computation code run inside of?

A. ApplicationManager
B. Container
C. ResourceManager
D. Distributed Application.

Question: 29 of 60
[missing]

Question: 30 of 60
Which of the following describes data in motion?

A. a file being updated by a sensor network
B. a read-only database view of archived data
C. a text file containing data from the last census
D. a particular encounter in a patient’s medical record

Question: 31 of 60
What command will launch ZooKeeper from a Linux terminal window?

A. ZK.sh
B. zkCli.sh
C. zk-admin
D. zookeepersh
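
An illustrative session with the ZooKeeper command-line client (host and port assume a default local install):

    zkCli.sh -server localhost:2181   # connect to the ensemble
    ls /                              # then list the root znodes at the prompt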

Question: 32 of 60
What is the default storage level for Apache Spark?

A. MEMORY_AND_DISK
B. MEMORY_ONLY
C. OFF_HEAP
D. DISK_ONLY
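
For RDDs, persist() with no argument uses MEMORY_ONLY; other levels must be requested explicitly. A minimal Scala sketch (the path is illustrative):

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.textFile("hdfs:///data/events")
    rdd.persist()   // no argument: StorageLevel.MEMORY_ONLY
    // a different level must be chosen up front, e.g.:
    // rdd.persist(StorageLevel.MEMORY_AND_DISK)
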
Question: 33 of 60
What are two features that BigR provides for R users? (Choose two.)

A. BigR queries on Big Data without MapReduce code
B. provides scalability of machine learning for large data sets
C. is an open source implementation of the R language
D. has a large community library of statistics packages for big data

Question: 34 of 60
Which IBM BigInsights tool is designed to build extractors that extract structured
data from both unstructured and semi-structured data?

A. Online Analytical Programming (OLAP)


B. Annotation Query Language (AQL)
C. Structured Query Language (SQL)
D. Distributed File System (DFS)

Question: 35 of 60
What is the correct definition of a Proximity Rule?

A. Proximity Rule is a component for building extractors that pull structured
information from unstructured text.
B. Proximity Rule is a union of two or more elements with the same schema.
C. Proximity Rule specifies the minimum and maximum number of tokens that
might occur before or after.
D. Proximity Rule is a programming model for processing and generating large
datasets.

Question: 36 of 60
What does each view in an AQL extractor define?

A. a row
B. a column
C. a relation
D. a dictionary

Question: 37 of 60
What should you do when an extractor generates multiple rows for the same
text?

A. Re-create the extractors.
B. Edit the properties of the sequence.
C. Create a new filter.
D. Create a consolidation rule.

Question: 38 of 60
Which text analytics phase should you stay in until you are satisfied that
extractors are meeting your requirements?

A. Analysis
B. Production
C. Performance Tuning
D. Rule Development

Question: 39 of 60
Which pre-built extractors are used for extracting numeric information and
percentages?

A. generic extractors
B. named-entity extractors
C. financial extractors
D. other extractors

Question: 40 of 60
During which text analytics phase should you refine the rules for runtime
enhancements and speed?

A. Analysis
B. Production
C. Rule Development
D. Performance Tuning

Question: 41 of 60
Which category of pre-built extractors is used to extract dates and emails?

A. financial extractors
B. generic extractors
C. other extractors
D. named-entity extractors

Question: 42 of 60
You have some results from an extractor. You need to see which extractor
found the results. Which button should you click on?

A. Service Actions
B. Filter Button
C. Show Extractor Name
D. Tag Button

Question: 43 of 60
Which approach is taken during the Analysis phase using text analytics?

A. Refine rules for runtime performance.
B. Create extractors that will meet requirements.
C. Verify appropriate data is being extracted.
D. Locate examples of information to be extracted.

Question: 44 of 60
Which AQL candidate rule is used to perform pattern matching?

A. Sequence
B. Select
C. Union
D. Blocks

Question: 45 of 60
Which two types of encoding can an HBase entity have? (Choose two.)

A. binary
B. hex
C. string
D. decimal
E. character

Question: 46 of 60
Which ISO/IEC standards and capabilities does Big SQL support?

A. SQL:2008
B. SQL:2011
C. SQL:2013
D. SQL:2006

Question: 47 of 60
Which column storage format for Big SQL allows for parallel processing of row
collections in a cluster?

A. Parquet
B. ORC
C. Sequence
D. Avro

Question: 48 of 60
Which feature of Big SQL can split data that is added later into multiple
files?

A. BigSheets
B. Primary Key
C. Foreign Key
D. Partitioned Tables
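
A partitioned table stores each partition value in its own directory, so later loads are split across multiple files. A hedged Big SQL sketch (table and columns are made up):

    CREATE HADOOP TABLE sales (
      id INT,
      amount DECIMAL(10,2)
    )
    PARTITIONED BY (sale_date VARCHAR(10));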

Question: 49 of 60
Which Big SQL file format is supported by a native I/O engine, is the
highest-performance format, and has an optimal block size the same as the
HDFS block size?

A. Parquet
B. ORC
C. Sequence
D. Avro

Question: 50 of 60
What is the reason that no information other than EXPLAIN itself is collected in
the snapshot column when a Big SQL EXPLAIN statement is executed?

A. No column was defined for the snapshot.
B. The WITH SNAPSHOT clause was used.
C. No errors occurred during compilation of the explainable statement.
D. The FOR SNAPSHOT clause was used.
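
Big SQL inherits the Db2 EXPLAIN statement, where an optional clause controls what is written to the snapshot column; an illustrative form (the query is made up):

    EXPLAIN PLAN FOR SNAPSHOT FOR
      SELECT * FROM sales WHERE amount > 100;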

Question: 51 of 60
What should you create when you need to query a remote table and a Big SQL
table?

A. a snapshot
B. a database
C. a nickname
D. a union
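
A nickname is a local alias for a remote table, letting one query join it with a Big SQL table. A hedged sketch in Db2-style federation syntax (server, schema, and table names are made up):

    CREATE NICKNAME remote_sales FOR orasrv.SALES.ORDERS;

    SELECT r.id, b.amount
    FROM remote_sales r
    JOIN local_sales b ON r.id = b.id;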

Question: 52 of 60
How is HBase different from a relational database?

A. It is not useful for sparse datasets.
B. All data is stored as bit arrays.
C. It has a schema.
D. All data is stored as byte arrays.

Question: 53 of 60
Which connector type does LOAD use to retrieve data from a Hadoop data
source into an HBase table?

A. DATAACCESS
B. JDBC URL
C. Insert
D. Sqoop

Question: 54 of 60
Which command is recommended to get HDFS data into a Big SQL table
because it has the best runtime performance?

A. Select
B. Create
C. Load
D. Insert

Question: 55 of 60
You need to create a table called "users" in an HDFS file system. Which
command should you use?

A. create hdfs table users
B. create hadoop table users
C. create external users
D. create table users
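
Taking questions 54 and 55 together, a hedged Big SQL sketch that creates a Hadoop table and then loads HDFS data into it (path, columns, and property values are illustrative):

    CREATE HADOOP TABLE users (
      id   INT,
      name VARCHAR(64)
    );

    LOAD HADOOP USING FILE URL '/tmp/users.csv'
      WITH SOURCE PROPERTIES ('field.delimiter' = ',')
      INTO TABLE users;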

Question: 56 of 60
What are the two ways to invoke the EXPLAIN statement in Big SQL? (Choose
two.)

A. with an incremental scan
B. interactively
C. embedded into an application
D. in batch scripts

Question: 57 of 60
In a Spark operation, what is considered a "lazy" evaluation?

A. Count
B. Transformations
C. Actions
D. Sequences
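
Transformations only record lineage; nothing executes until an action runs. A minimal Scala sketch:

    val nums  = sc.parallelize(1 to 1000000)
    val evens = nums.filter(_ % 2 == 0)  // transformation: lazy, no job yet
    val total = evens.count()            // action: triggers the computation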

Question: 58 of 60
What is the default directory in HDFS that holds any tables created by Big
SQL?

A. /apps/hive/warehouse/
B. /apps/hbase/data/
C. /apps/hive/warehouse/schema/
D. /apps/hbase/warehouse

Question: 59 of 60
Which ANALYZE TABLE...COMPUTE STATISTICS... command option should
you run to collect just basic table statistics (number of files and partitions, total
size)?

A. -NOSCAN
B. -COPYHIVE
C. -INCREMENTAL
D. -PARTIALSCAN

Question: 60 of 60
How should you collect statistics about your data in a Big SQL table to help
better organize your data?

A. Run the SYSPROC.SYSINSTALLOBJECTS procedure.
B. Run the $JSQSH_HOME/bin/jsqsh script.
C. Run the EXPLAIN command with no columns selected.
D. Run the ANALYZE command with at least one column selected.
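
Hedged examples of the two ANALYZE variants these questions describe, following the Hive-style syntax the options suggest (schema, table, and columns are made up):

    -- basic table statistics only, without scanning the data
    ANALYZE TABLE myschema.sales COMPUTE STATISTICS NOSCAN;

    -- table plus column statistics for the query optimizer
    ANALYZE TABLE myschema.sales COMPUTE STATISTICS FOR COLUMNS id, amount;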
