1. To copy a file into the Hadoop file system, what command should you use?
Correct Answer: hadoop fs -copyFromLocal
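A minimal usage sketch (the local and HDFS paths are hypothetical examples, and a running Hadoop cluster is required):

```shell
# Copy a local file into HDFS (paths are examples only)
hadoop fs -copyFromLocal ./sales.csv /user/hue/sales.csv

# -put is a close equivalent that can also read from stdin
hadoop fs -put ./sales.csv /user/hue/sales.csv
```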
2. What kind of storage and processing does Hadoop support?
Correct Answer: Distributed
3. Which file system does Hadoop use for storage?
Correct Answer: HDFS
4. Hadoop Common is written in which language?
Correct Answer: Java
5. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?
Correct Answer: Snapshot
6. To view the execution details of an Impala query plan, which function would you use?
Correct Answer: EXPLAIN
7. Which object can be used to distribute jars or libraries for use in MapReduce tasks?
Correct Answer: Distributed cache
8. Hadoop systems are _ RDBMS systems.
Correct Answer: Additions for
9. Where would you configure the size of a block in a Hadoop environment?
Correct Answer: dfs.block.size in hdfs-site.xml
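For reference, a sketch of the relevant hdfs-site.xml entry. Note that in Hadoop 2.x and later the preferred property name is dfs.blocksize; dfs.block.size is the older, deprecated spelling:

```xml
<!-- hdfs-site.xml: set the default block size to 128 MB (example value) -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>
```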
10. In a MapReduce job, which phase runs after the Map phase completes?
Correct Answer: Combiner
11. Suppose you are trying to finish a Pig script that converts the text in the input string to uppercase. What code is needed on line 2 below?
1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2
Correct Answer: as (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);
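Put together, the full script might read as follows (a sketch: the piggybank jar must be registered for the UDF to resolve, and on current Pig versions the built-in UPPER would work without it):

```pig
REGISTER piggybank.jar;
data = LOAD '/user/hue/pig/examples/data/midsummer.txt'
       as (text:CHARARRAY);
upper_case = FOREACH data GENERATE
             org.apache.pig.piggybank.evaluation.string.UPPER(text);
DUMP upper_case;
```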
12. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?
Correct Answer: NameNode
13. MapReduce 1.0 _ YARN
Correct Answer: Does not include
14. _ is the query language, and _ is storage for NoSQL on Hadoop
Correct Answer: HQL; HBase
15. MapReduce applications use which of these classes to report their statistics?
Correct Answer: Counter
16. If no reduction is desired, you should set the number of _ tasks to zero.
Correct Answer: Reduce
17. What type of software is Hadoop Common?
Correct Answer: Distributed computing framework
18. In MapReduce, _ have _
Correct Answer: Jobs; tasks
19. Hadoop 2.x and later implement which service as the resource coordinator?
Correct Answer: YARN
20. To implement high availability, how many instances of the master node should you configure?
Correct Answer: Two or more (https://data-flair.training/blogs/hadoop-high-availability-tutorial)
21. Which Hive query returns the first 1,000 values?
Correct Answer: SELECT … LIMIT 1000
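For example (the table and column names are hypothetical):

```sql
-- Return the first 1,000 rows from a Hive table
SELECT name, visit_count
FROM page_views
LIMIT 1000;
```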
22. To what does the Mapper map input key/value pairs?
Correct Answer: A set of intermediate key/value pairs
23. In what format does RecordWriter write an output file?
Correct Answer: <key, value> pairs
24. In the Hadoop system, what administrative mode is used for maintenance?
Correct Answer: Safe mode
25. When implemented on a public cloud, with what does Hadoop processing interact?
Correct Answer: Files in object storage
26. What is the output of the Reducer?
Correct Answer: A set of <key, value> pairs
27. Which library should you use to perform ETL-type MapReduce jobs?
Correct Answer: Pig
28. A distributed cache file path can originate from what location?
Correct Answer: HDFS or HTTP
29. HDFS files are of what type?
Correct Answer: Append-only
30. HBase works with which type of schema enforcement?
Correct Answer: Schema on read
31. To connect Hadoop to AWS S3, which client should you use?
Correct Answer: S3A
32. To create a MapReduce job, what should be coded first?
Correct Answer: A Job class and instance (NOT SURE)
33. State _ between the JVMs in a MapReduce job
Correct Answer: Is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)
34. If you started the NameNode, then which kind of user must you be?
Correct Answer: Super-user
35. Which library should be used to unit test MapReduce code?
Correct Answer: MRUnit
36. In what form is Reducer output presented?
Correct Answer: Compressed (NOT SURE)
37. Which command imports data to Hadoop from a MySQL database?
Note: This question is unanswered.
38. The skip-bad-records feature provides an option whereby a certain set of bad input records can be skipped when processing what type of data?
Correct Answer: Map inputs
39. To reference a master file for lookups during Mapping, what type of cache should be used?
Correct Answer: Distributed cache
40. In a MapReduce job, where does the map() function run?
Correct Answer: On the data nodes of the cluster (NOT SURE)
41. Which method is used to implement Spark jobs?
Correct Answer: In memory of all workers
42. DataNode supports which type of drives?
Correct Answer: Hot swappable
43. For high availability, use multiple nodes of which type?
Correct Answer: Name
44. To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the ___ service, which is ___.
Correct Answer: Zookeeper; open source
45. What are the primary phases of a Reducer?
Correct Answer: Shuffle, sort, and reduce
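The three phases can be illustrated with a plain-Java word count that mimics what the framework does between Map and Reduce (a simplified single-process sketch, not Hadoop API code):

```java
import java.util.*;

public class MiniMapReduce {
    // Map: emit (word, 1) pairs; Shuffle: group values by key;
    // Sort: keep keys ordered (TreeMap); Reduce: sum each group.
    public static SortedMap<String, Integer> wordCount(List<String> lines) {
        // Map phase: one (word, 1) pair per word
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                mapped.add(new AbstractMap.SimpleEntry<>(word, 1));

        // Shuffle & sort: group values by key; TreeMap keeps keys sorted
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped)
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());

        // Reduce: collapse each key's value list into a single count
        SortedMap<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
    }
}
```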
46. Hadoop Core supports which CAP capabilities?
Correct Answer: A, P
47. To get the total number of mapped input records in a map job task, you should review the value of which counter?
Correct Answer: TaskCounter (NOT SURE)
48. Which line of code implements a Reducer method in MapReduce 2.0?
Correct Answer: public void reduce(Text key, Iterator values, Context context) {…}
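For context, in the current org.apache.hadoop.mapreduce API the values parameter is typed as an Iterable rather than an Iterator. A sketch of such a reducer (not runnable without the Hadoop libraries on the classpath; IntWritable as the value type is an assumption for illustration):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();  // aggregate all values for this key
        context.write(key, new IntWritable(sum));     // emit (key, total)
    }
}
```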
49. To verify job status, look for the value ___ in the ___.
Correct Answer: SUCCEEDED; stdout
50. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?
Correct Answer: Combiner
51. MapReduce jobs can be written in which language?
Correct Answer: Java or Python
52. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?
Correct Answer: Signed HTTP
53. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?
Correct Answer: Add a partitioned shuffle to the Reduce job.
54. SQL Windowing functions are implemented in Hive using which keywords?
Correct Answer: OVER, RANK
55. Partitioner controls the partitioning of what data?
Correct Answer: Intermediate keys
56. The Hadoop framework uses the ________ algorithm to solve large-scale problems.
Correct Answer: MapReduce
57. Which of the following operators is used for un-nesting the nested tuples and bags?
Correct Answer: FLATTEN
58. Which of the following is the correct syntax for resetting the space quota for directories in HDFS?
Note: This question is unanswered.
68. Which of the following functions is NOT performed by the InputFormat class for a MapReduce Hadoop job?
Correct Answer: It presents a record view of the data to the Map task and reads from an InputSplit class.
69. Which of the following is the Hadoop directory service that stores the metadata related to the files present in the cluster storage?
Correct Answer: NameNode
70. Which of the following are the required command line arguments for the oev command of HDFS?
Correct Answer: -i, --inputFile arg
71. Which of the following HDFS shell commands is used for setting a group for a particular file or directory?
Correct Answer: chgrp
72. While configuring HTTP authentication in Hadoop, which of the following is set as the value of the "hadoop.http.filter.initializers" property?
Correct Answer: org.apache.hadoop.security.AuthenticationFilterInitializer class name
73. What is the default value of the following security configuration property of the YARN architecture? yarn.timeline-service.delegation.token.renew-interval
Correct Answer: 1 day
74. In case of service-level authorization in Hadoop, which of the following properties is used for determining the ACEs used for granting permissions for the DataNodes to communicate and access the NameNode?
Correct Answer: security.datanode.protocol.acl
75. Which of the following is the correct syntax for the docs Maven profile that is used for creating documentation in Hadoop Auth?
Correct Answer: $ mvn package -Pdocs
76. Which of the following Pig commands is/are used for sampling a data and applying a query on it?
Correct Answer: ILLUSTRATE
77. Which of the following HDFS commands is used for setting an extended attribute name and value for a file or a directory?
Correct Answer: setfattr
78. Which of the following configuration properties of YARN's Resource Manager is used for specifying the host:port for clients to submit jobs?
Note: This question is unanswered.
79. Which of the given options is the correct function of the following HiveQL command? !
Correct Answer: It is used for executing a shell command from the Hive shell.
80. Which of the following HiveQL commands is used for printing the list of configuration variables overridden by Hive or the user?
Correct Answer: set
81. In order to execute a custom, user-built JAR file, the jar command is used. Which of the following is the correct syntax of this command?
Correct Answer: yarn jar <jar file path> [main class name] [arguments…]
82. Which of the following commands is used for creating a keytab file used in Kerberos authentication?
Correct Answer: ktutil
83. Which of the following join operations are NOT supported by Hive?
Correct Answer: Theta join
84. Which of the following permission levels is NOT allowed in HDFS authorization?
Correct Answer: Execute
85. For a file named abc, which of the following Hadoop commands is used for setting all the permissions for owner, setting read permissions for group, and setting no permissions for other users in the system?
Correct Answer: hadoop fs -chmod 740 abc
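hadoop fs -chmod follows POSIX octal semantics, so the effect of mode 740 can be checked with the local chmod (a sketch on a scratch file; the file name abc comes from the question):

```shell
# 7 = rwx for the owner, 4 = r-- for the group, 0 = --- for others
touch abc
chmod 740 abc
ls -l abc | cut -c1-10    # -rwxr-----
```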
86. Which of the following are the advantages of the disk-level encryption in HDFS?
Correct Answer: It provides high performance.
87. Which of the following commands is used for setting an environment variable in a streaming command?
Note: This question is unanswered.
88. Which two of the following parameters of the Hadoop streaming command are optional?
Correct Answer: -cmdenv name=value
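A hedged example of -cmdenv in a Hadoop streaming invocation (the paths, script names, and variable are hypothetical):

```shell
hadoop jar hadoop-streaming.jar \
  -input   /data/in \
  -output  /data/out \
  -mapper  mapper.py \
  -reducer reducer.py \
  -cmdenv  APP_MODE=prod   # exported into the environment of mapper.py and reducer.py
```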
89. Which of the following statements is correct about YARN's Web Application Proxy?
Correct Answer: It strips the cookies from the user and replaces them with a single cookie, providing the user name of the logged-in user.
90. Which two of the following are the correct differences between MapReduce and traditional RDBMS?
Correct Answer: In MapReduce, the read operation can be performed many times but the write operation can be performed only once. In traditional RDBMS, both read and write operations can be performed many times.
91. What is the function of the following configuration property of YARN's Resource Manager? yarn.resourcemanager.ha.id
Correct Answer: It is used for identifying the Resource Manager in an ensemble.
92. Consider an input file named abc.dat.txt with default block size 128 MB. Which of the following is the correct command that will upload this file into an HDFS, with a block size of 512 MB?
Note: This question is unanswered.
93. Which of the following Hadoop commands is used for creating a file of zero length?
Correct Answer: touchz
94. Which of the following statements is correct about the Hadoop file system namespace?
Correct Answer: User access permission is not implemented in HDFS.
95. Suppose you need to select a storage system that supports Resource Manager High Availability (RM HA). Which of the following types of storage should be selected in this case?
Correct Answer: ZooKeeper-based state-store
96. Which of the following is used for changing the group of a file?
Note: This question is unanswered.
99. Which of the following operators must be used for a theta-join?
Correct Answer: Cross
100. Which of the following are the characteristics of the UNION operator of Pig?
Correct Answer: It does not impose any restrictions on the schema of the two datasets that are being concatenated.
101. Which of the following functions is performed by the Scheduler of Resource Manager in the YARN architecture?
Correct Answer: It allocates resources to the applications running in the cluster.
102. Which of the given Pig data types has the following characteristics? i) It is a collection of data values. ii) These data values are ordered and have a fixed length.
Correct Answer: Tuple
103. In which of the following Pig execution modes is a Java program capable of invoking Pig commands by importing the Pig libraries?
Correct Answer: Embedded mode
104. Which of the given options is the function of the following Hadoop command? -a
Correct Answer: It is used for checking whether all the libraries are available.
105. Which of the following Hive commands is used for creating a database named MyData?
Correct Answer: CREATE DATABASE MyData
106. Which of the following Hive clauses should be used for imposing a total order on the query results?
Correct Answer: ORDER BY
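The distinction matters because SORT BY only orders rows within each reducer, while ORDER BY forces a single total order over the full result. A sketch (table and column names are hypothetical):

```sql
-- Total order across all results (funnels through one reducer)
SELECT user_id, score FROM scores ORDER BY score DESC;

-- Per-reducer order only: the global output may be interleaved
SELECT user_id, score FROM scores SORT BY score DESC;
```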
107. Which of the following interfaces can decrease the amount of memory required by UDFs, by accepting the input in chunks?
Correct Answer: Accumulator interface
108. Before being spilled to disk, the intermediate output records emitted by the Map task are buffered in local memory, using a circular buffer. Which of the following properties is used to configure the size of this circular buffer?
Correct Answer: mapreduce.task.io.sort.mb
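A sketch of the corresponding mapred-site.xml entry (the default is 100 MB; the value shown here is an example):

```xml
<!-- mapred-site.xml: size of the map-side circular sort buffer, in MB -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value>
</property>
```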
109. In Hadoop, HDFS snapshots are taken for which of the following reasons? i) For providing protection against user error. ii) For providing backup. iii) For disaster recovery. iv) For copying data from data nodes.
Correct Answer: Only i), ii), and iii)
110. What is the function of the following Hadoop command? du
Correct Answer: In case of a file, it displays the length of the file, while in case of a directory, it displays the sizes of the files and directories present in that directory.
111. Which of the following interfaces is used for accessing the Hive metastore?
Correct Answer: Thrift
112. Which of the following commands is used to view the content of a file named /newexample/example1.txt?
Note: This question is unanswered.
113. Which of the following HDFS commands is used for checking inconsistencies and reporting problems with various files?
Correct Answer: fsck
114. Which of the following environment variables is used for determining the Hadoop cluster for Pig, in order to run the MapReduce jobs?
Correct Answer: HADOOP_CONF_DIR
115. Which of the following is the master process that accepts job submissions from the clients and schedules the tasks to run on worker nodes?
Correct Answer: JobTracker
116. In Pig, which of the following types of join operations can be performed using the Replicated join? i) Inner join ii) Outer join iii) Right-outer join iv) Left-outer join
Correct Answer: Only i) and iv)
117. Which of the following is the correct syntax for the command used for setting replication for an existing file in Hadoop WebHDFS REST API?
Correct Answer: setReplication(Path src, short replication)