Basic Hadoop MCQ

1. To copy a file into the Hadoop file system, what command should you use?


Correct Answer: hadoop fs -copyFromLocal

2. What kind of storage and processing does Hadoop support?


Correct Answer: Distributed

3. Which file system does Hadoop use for storage?


Correct Answer: HDFS

4. Hadoop Common is written in which language?


Correct Answer: Java

5. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?


Correct Answer: Snapshot

6. To view the execution details of an Impala query plan, which function would you use?


Correct Answer: EXPLAIN

7. Which object can be used to distribute jars or libraries for use in MapReduce tasks?


Correct Answer: Distributed cache

8. Hadoop systems are ___ RDBMS systems.


Correct Answer: Additions for

9. Where would you configure the size of a block in a Hadoop environment?


Correct Answer: dfs.block.size in hdfs-site.xml (named dfs.blocksize in Hadoop 2.x and later)

10. In a MapReduce job, which phase runs after the Map phase completes?


Correct Answer: Combiner

11. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?
1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2


Correct Answer: AS (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);

12. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?


Correct Answer: NameNode

13. MapReduce 1.0 ___ YARN.


Correct Answer: Does not include

14. ___ is the query language, and ___ is storage for NoSQL on Hadoop.


Correct Answer: HQL; HBase

15. MapReduce applications use which of these classes to report their statistics?


Correct Answer: Counter

16. If no reduction is desired, you should set the number of ___ tasks to zero.


Correct Answer: Reduce

17. What type of software is Hadoop Common?


Correct Answer: Distributed computing framework

18. In MapReduce, ___ have ___.


Correct Answer: Jobs; tasks

19. Hadoop 2.x and later implement which service as the resource coordinator?


Correct Answer: YARN

20. To implement high availability, how many instances of the master node should you configure?


Correct Answer: Two or more

21. Which Hive query returns the first 1,000 values?


Correct Answer: SELECT … LIMIT 1000

22. To what does the Mapper map input key/value pairs?


Correct Answer: A set of intermediate key/value pairs
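
The following is a rough plain-Python sketch of the usual word-count mapper. It does not use the Hadoop Java API. It assumes each input pair is (byte offset, line of text), as produced by the default text input, and it emits one intermediate (word, 1) pair per word. The function name is hypothetical.

```python
from typing import Iterator, Tuple


def word_count_map(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map one input pair (offset, line) to intermediate (word, 1) pairs."""
    # The offset key is not needed for counting words, so it is ignored.
    for word in line.split():
        yield (word, 1)
```

For example, `list(word_count_map(0, "to be or not to be"))` yields six intermediate pairs, two of which are ("to", 1).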

23. In what format does RecordWriter write an output file?


Correct Answer: <key, value> pairs

24. In the Hadoop system, what administrative mode is used for maintenance?


Correct Answer: Safe mode

25. When implemented on a public cloud, with what does Hadoop processing interact?


Correct Answer: Files in object storage

26. What is the output of the Reducer?


Correct Answer: A set of <key, value> pairs

27. Which library should you use to perform ETL-type MapReduce jobs?


Correct Answer: Pig

28. A distributed cache file path can originate from what location?


Correct Answer: HDFS or HTTP

29. HDFS files are of what type?


Correct Answer: Append-only

30. HBase works with which type of schema enforcement?


Correct Answer: Schema on read

31. To connect Hadoop to AWS S3, which client should you use?


Correct Answer: S3A

32. To create a MapReduce job, what should be coded first?


Correct Answer: A Job class and instance

33. State ___ between the JVMs in a MapReduce job.


Correct Answer: Is not shared

34. If you started the NameNode, then which kind of user must you be?


Correct Answer: Super-user

35. Which library should be used to unit test MapReduce code?


Correct Answer: MRUnit

36. In what form is Reducer output presented?


Correct Answer: Not sorted

37. Which command imports data to Hadoop from a MySQL database?


Correct Answer: sqoop import --connect jdbc:mysql:// --username sqoop --password sqoop --warehouse-dir user/hue/oozie/deployments/sqoop

38. Skip bad records provides an option where a certain set of bad input records can be skipped when processing what type of data?


Correct Answer: Map inputs

39. To reference a master file for lookups during Mapping, what type of cache should be used?


Correct Answer: Distributed cache

40. In a MapReduce job, where does the map() function run?


Correct Answer: On the data nodes of the cluster

41. Which method is used to implement Spark jobs?


Correct Answer: In memory of all workers

42. DataNode supports which type of drives?


Correct Answer: Hot swappable

43. For high availability, use multiple nodes of which type?


Correct Answer: NameNode

44. To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the ___ service, which is ___.


Correct Answer: ZooKeeper; open source

45. What are the primary phases of a Reducer?


Correct Answer: Shuffle, sort, and reduce
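
The three phases can be imitated in memory with plain Python. This is only a sketch under simplifying assumptions. In Hadoop, the shuffle fetches map output over the network and the sort merges already-sorted segments; here both are ordinary dictionary and list operations. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def shuffle(map_outputs: Iterable[List[Pair]]) -> Dict[str, List[int]]:
    """Shuffle: gather every map task's intermediate pairs, grouped by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups


def sort_and_reduce(groups: Dict[str, List[int]],
                    reduce_fn: Callable[[str, List[int]], Pair]) -> List[Pair]:
    """Sort: visit keys in order. Reduce: call reduce_fn once per key."""
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


def sum_reducer(key: str, values: List[int]) -> Pair:
    """Example reduce function: total the values for one key."""
    return (key, sum(values))
```

With two map outputs `[("b", 1), ("a", 1)]` and `[("a", 1)]`, shuffling and then sorting and reducing with `sum_reducer` gives `[("a", 2), ("b", 1)]`.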

46. Hadoop Core supports which CAP capabilities?


Correct Answer: A, P

47. To get the total number of mapped input records in a map job task, you should review the value of which counter?


Correct Answer: TaskCounter (its MAP_INPUT_RECORDS counter)

48. Which line of code implements a Reducer method in MapReduce 2.0?


Correct Answer: public void reduce(Text key, Iterable values, Context context){…}

49. To verify job status, look for the value ___ in the ___.


Correct Answer: SUCCEEDED; stdout

50. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?


Correct Answer: Combiner
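
A combiner usually applies the reduce logic to a single map task's output before the shuffle. The plain-Python sketch below sums word counts locally; its function name is hypothetical and it does not use the Hadoop API. Hadoop may run a combiner zero, one, or several times, so the operation must be associative and commutative. Summation meets that requirement.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def combine(map_output: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum the counts emitted by a single map task, per word."""
    totals = Counter()
    for word, count in map_output:
        totals[word] += count
    # Fewer pairs now cross the network during the shuffle.
    return sorted(totals.items())
```

Four map-side pairs such as ("x", 1), ("y", 1), ("x", 1), ("x", 1) shrink to two, ("x", 3) and ("y", 1), and the reducer's final totals are unchanged.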

51. MapReduce jobs can be written in which language?


Correct Answer: Java or Python
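
Python MapReduce code usually runs through Hadoop Streaming. There, the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout. The framework sorts the mapper output by key before the reducer sees it. Below is a minimal word-count reducer along those lines. It assumes one `key<TAB>count` pair per line, and the function name is hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def streaming_reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of each key in sorted "key<TAB>count" lines."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    # Sorted input means all lines for one key are adjacent, so groupby works.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"
```

An actual streaming script would iterate over `streaming_reducer(sys.stdin)` and print each result line.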

52. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?


Correct Answer: Signed HTTP

53. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?


Correct Answer: Add a partitioned shuffle to the Reduce job.

54. SQL Windowing functions are implemented in Hive using which keywords?


Correct Answer: OVER, RANK
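
To show what RANK computes, the plain-Python sketch below imitates a query such as `SELECT dept, name, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp`. The table, column, and function names are hypothetical. Tied rows share a rank, and the next rank is skipped.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, str, int]  # (dept, name, salary)


def rank_over(rows: List[Row]) -> List[Tuple[str, str, int, int]]:
    """RANK() OVER (PARTITION BY dept ORDER BY salary DESC), in memory."""
    result: List[Tuple[str, str, int, int]] = []
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        previous_salary = None
        rank = 0
        for position, (dept, name, salary) in enumerate(partition, start=1):
            # A tie keeps the previous rank; otherwise rank jumps to the position.
            if salary != previous_salary:
                rank = position
            previous_salary = salary
            result.append((dept, name, salary, rank))
    return result
```

Two tied salaries at the top of a department both get rank 1, and the next row gets rank 3, not 2.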

55. Partitioner controls the partitioning of what data?


Correct Answer: Intermediate keys
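
Hadoop's default HashPartitioner picks a reduce task with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. As a result, every pair with the same intermediate key reaches the same reducer. The plain-Python sketch below follows that formula but uses CRC-32 in place of Java's `hashCode()`, so its partition numbers will not match Hadoop's. The function name is hypothetical.

```python
import zlib


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Return the index of the reduce task that receives this key."""
    if num_reduce_tasks < 1:
        raise ValueError("num_reduce_tasks must be at least 1")
    # CRC-32 is a stand-in for Java's hashCode(); any deterministic hash works.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Masking with 0x7FFFFFFF mirrors "& Integer.MAX_VALUE": a non-negative value.
    return (key_hash & 0x7FFFFFFF) % num_reduce_tasks
```

The hash must be deterministic. Otherwise, pairs with the same key emitted by different map tasks could land on different reducers.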

56. The Hadoop framework consists of the ___ algorithm to solve large-scale problems.


Correct Answer: MapReduce

57. Which of the following operators is used for un-nesting the nested tuples and bags?


Correct Answer: FLATTEN
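
Applied to a bag, FLATTEN produces one output row per element of the bag and repeats the other generated fields. The plain-Python sketch below imitates `FOREACH grouped GENERATE group, FLATTEN(bag)` over (key, bag) rows; the relation and function names are hypothetical. As in Pig, a row whose bag is empty produces no output.

```python
from typing import List, Tuple


def flatten_bags(rows: List[Tuple[str, List[Tuple]]]) -> List[Tuple]:
    """Un-nest each (key, bag) row into one flat tuple per bag element."""
    flat: List[Tuple] = []
    for key, bag in rows:
        for inner in bag:
            # The inner tuple's fields are spliced in next to the key.
            flat.append((key, *inner))
    return flat
```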

58. Which of the following is the correct syntax for resetting the space quota for directories in HDFS?


Correct Answer: hdfs dfsadmin -clrSpaceQuota <dir1> ... <dirN>

59. Which of the following objects is used by the RecordReader class for reading data from an InputSplit class?


Correct Answer: FSDataInputStream

60. Which of the following is the correct data type of the "totalMB" element of the clusterMetrics object used in the YARN ResourceManager REST API?


Correct Answer: long

61. Which of the following is NOT a property of the app (Application) object of the NodeManager REST API?


Correct Answer: containers

62. Which of the following Hadoop DFSAdmin commands generates a list of DataNodes?


Correct Answer: bin/hdfs dfsadmin -report

63. Which of the following Hadoop commands is used for copying a source path to stdout?


Correct Answer: cat 

64. In multi-query execution, which of the following return codes indicates retriable errors?


Correct Answer: 1

65. What is the default value of the hadoop.http.authentication.token.validity property that is used in case of authentication via HTTP interface?


Correct Answer: 36,000 seconds

66. While accessing a Hadoop Auth protocol URL using curl, which of the following command options are used for storing and sending HTTP cookies?


Correct Answer: -b and -c

67. Which of the following commands is used for distributing an exclude file to all the NameNodes?


Correct Answer: [hdfs]$ $HADOOP_PREFIX/sbin/ <exclude_file>

68. Which of the following functions is NOT performed by the InputFormat class for a MapReduce Hadoop job?


Correct Answer: It presents a record view of the data to the Map task and reads from an InputSplit class.

69. Which of the following is the Hadoop directory service that stores the metadata related to the files present in the cluster storage?


Correct Answer: NameNode

70. Which of the following are the required command line arguments for the oev command of HDFS?


Correct Answer: -i, --inputFile arg and -o, --outputFile arg

71. Which of the following HDFS shell commands is used for setting a group for a particular file or directory?


Correct Answer: chown

72. While configuring HTTP authentication in Hadoop, which of the following is set as the value of the "hadoop.http.filter.initializers" property?


Correct Answer: A class name (org.apache.hadoop.security.AuthenticationFilterInitializer)

73. What is the default value of the following security configuration property of the YARN architecture?


Correct Answer: 1 day

74. In service-level authorization in Hadoop, which of the following properties determines the ACL used for granting the DataNodes permission to communicate with the NameNode?


Correct Answer: security.datanode.protocol.acl

75. Which of the following is the correct syntax for the docs Maven profile that is used for creating documentation in Hadoop Auth?


Correct Answer: $ mvn package -Pdocs

76. Which of the following Pig commands is/are used for sampling data and applying a query to it?


Correct Answer: ILLUSTRATE

77. Which of the following HDFS commands is used for setting an extended attribute name and value for a file or a directory?


Correct Answer: setfattr

78. Which of the following configuration properties of YARN's ResourceManager specifies the host:port for clients to submit jobs?


Correct Answer: yarn.resourcemanager.address.rm-id

79. Which of the given options is the correct function of the following HiveQL command?


Correct Answer: It is used for executing a shell command from the Hive shell.

80. Which of the following HiveQL commands is used for printing the list of configuration variables overridden by Hive or the user?


Correct Answer: set

81. In order to execute a custom, user-built JAR file, the jar command is used. Which of the following is the correct syntax of this command?


Correct Answer: yarn jar <jar file path> [main class name] [arguments…]

82. Which of the following commands is used for creating a keytab file used in Kerberos authentication?


Correct Answer: ktutil

83. Which of the following join operations is NOT supported by Hive?


Correct Answer: Theta join

84. Which of the following permission levels is NOT allowed in HDFS authorization?


Correct Answer: Execute

85. For a file named abc, which of the following Hadoop commands is used for setting all the permissions for owner, setting read permissions for group, and setting no permissions for other users in the system?


Correct Answer: hadoop fs -chmod 740 abc
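
Each octal digit is the sum of read (4), write (2), and execute (1), for owner, group, and others in turn. So 7 = 4 + 2 + 1 (everything), 4 = read only, and 0 = nothing. The small Python helper below is hypothetical and only for illustration. It expands a mode into the rwx string that `ls -l` style listings show.

```python
def octal_to_rwx(mode: str) -> str:
    """Expand a three-digit octal mode such as "740" into an rwx string."""
    if len(mode) != 3 or any(d not in "01234567" for d in mode):
        raise ValueError("expected three octal digits, e.g. '740'")
    symbols = ""
    for digit in mode:  # owner, then group, then others
        bits = int(digit)
        symbols += "r" if bits & 4 else "-"
        symbols += "w" if bits & 2 else "-"
        symbols += "x" if bits & 1 else "-"
    return symbols
```

`octal_to_rwx("740")` returns `"rwxr-----"`: full access for the owner, read for the group, and nothing for others.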

86. Which of the following are the advantages of the disk-level encryption in HDFS?


Correct Answer: It provides high performance.

87. Which of the following commands is used for setting an environment variable in a streaming command?


Correct Answer: -cmdenv ABC=/home/example/dictionaries/

88. Which two of the following parameters of the Hadoop streaming command are optional?


Correct Answer: -cmdenv name=value

89. Which of the following statements is correct about YARN's Web Application Proxy?


Correct Answer: It strips the cookies from the user and replaces them with a single cookie providing the user name of the logged-in user.

90. Which two of the following are the correct differences between MapReduce and traditional RDBMS?


Correct Answer: In MapReduce, the read operation can be performed many times but the write operation can be performed only once. In traditional RDBMS, both read and write operations can be performed many times.

91. What is the function of the following configuration property of YARN's ResourceManager?


Correct Answer: It is used for identifying the ResourceManager in the ensemble.

92. Consider an input file named abc.dat.txt with default block size 128 MB. Which of the following is the correct command that will upload this file into an HDFS, with a block size of 512 MB?


Correct Answer: hadoop fs -D dfs.blocksize=536870912 -put abc.dat.txt abc.dat.newblock.txt
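
The value passed to `dfs.blocksize` here is in bytes: 512 MB = 512 × 1024 × 1024 = 536,870,912 bytes. The short Python check below reproduces that arithmetic. For a hypothetical 1,300 MB file, it also compares how many blocks the file would occupy at 128 MB and at 512 MB, keeping in mind that the last block may be only partly filled. The helper names are hypothetical.

```python
MIB = 1024 * 1024


def block_size_bytes(size_mib: int) -> int:
    """Convert a block size in MB (binary megabytes) to bytes."""
    return size_mib * MIB


def blocks_needed(file_bytes: int, block_bytes: int) -> int:
    """Blocks a file occupies; ceiling division because the last block may be partial."""
    return -(-file_bytes // block_bytes)


file_bytes = 1300 * MIB  # hypothetical file size
print(block_size_bytes(512))                             # 536870912
print(blocks_needed(file_bytes, block_size_bytes(128)))  # 11
print(blocks_needed(file_bytes, block_size_bytes(512)))  # 3
```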

93. Which of the following Hadoop commands is used for creating a file of zero length?


Correct Answer: touchz

94. Which of the following statements is correct about the Hadoop file system namespace?


Correct Answer: User access permission is not implemented in HDFS.

95. Suppose you need to select a storage system that supports Resource Manager High Availability (RM HA). Which of the following types of storage should be selected in this case?


Correct Answer: ZooKeeper-based state store

96. Which of the following is used for changing the group of a file?


Correct Answer: hdfs dfs -chgrp [-R] <group> <filepath>

97. Which of the following statements is correct about the Hive joins?


Correct Answer: In Hive, more than two tables can be joined.

98. Which of the following is the correct command-line syntax of the Hadoop streaming command?


Correct Answer: hadoop command [genericOptions] [streamingOptions]

99. Which of the following operators must be used to perform a theta join?


Correct Answer: Cross
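
Pig's JOIN matches only on equality. A theta join, which uses any other condition such as a range test, is therefore usually written as CROSS followed by FILTER. The plain-Python sketch below shows that pattern; the function name is hypothetical. Because CROSS builds every pairing, its cost grows with the product of the two input sizes.

```python
from itertools import product
from typing import Callable, List, Tuple


def theta_join(left: List[Tuple], right: List[Tuple],
               condition: Callable[[Tuple, Tuple], bool]) -> List[Tuple]:
    """CROSS every left row with every right row, then FILTER on the condition."""
    return [left_row + right_row
            for left_row, right_row in product(left, right)
            if condition(left_row, right_row)]
```

For example, consider hypothetical orders `[("o1", 5), ("o2", 20)]` and price tiers `[("low", 0, 10), ("high", 10, 100)]`. Joining them on the condition `lower <= amount < upper` pairs o1 with the low tier and o2 with the high tier.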

100. Which of the following are the characteristics of the UNION operator of Pig?


Correct Answer: It does not impose any restrictions on the schema of the two datasets that are being concatenated.

101. Which of the following functions is performed by the Scheduler of Resource Manager in the YARN architecture?


Correct Answer: It allocates resources to the applications running in the cluster.

102. Which of the given Pig data types has the following characteristics?
i) It is a collection of data values.
ii) These data values are ordered and have a fixed length.


Correct Answer: Tuple 

103. In which of the following Pig execution modes can a Java program invoke Pig commands by importing the Pig libraries?


Correct Answer: Embedded mode

104. Which of the given options is the function of the following Hadoop command?


Correct Answer: It is used for checking whether all the libraries are available.

105. Which of the following Hive commands is used for creating a database named MyData?


Correct Answer: CREATE DATABASE MyData

106. Which of the following Hive clauses should be used for imposing a total order on the query results?


Correct Answer: ORDER BY
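
The difference between Hive's two ordering clauses can be sketched in plain Python. ORDER BY sends all rows through a single reducer and so produces one total order. SORT BY sorts only the rows inside each reducer, so the combined output is generally not globally ordered. The function names are hypothetical. The round-robin assignment of rows to reducers is a simplifying assumption, not how Hive actually distributes rows.

```python
from typing import List


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """SORT BY: rows are spread over reducers and sorted inside each one only."""
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Round-robin stands in for Hive's real row distribution.
        buckets[i % num_reducers].append(row)
    return [sorted(bucket) for bucket in buckets]


def order_by(rows: List[int]) -> List[int]:
    """ORDER BY: every row passes through one reducer, giving a total order."""
    return sorted(rows)
```

With rows `[5, 1, 4, 2, 3]` and two reducers, `sort_by` yields `[[3, 4, 5], [1, 2]]`. Each part is sorted, but the parts read back-to-back are not, while `order_by` yields `[1, 2, 3, 4, 5]`.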

107. Which of the following interfaces can decrease the amount of memory required by UDFs, by accepting the input in chunks?


Correct Answer: Accumulator interface 

108. Before being written to disk, the intermediate output records emitted by the Map task are buffered in local memory, using a circular buffer. Which of the following properties is used to configure the size of this circular buffer?


Correct Answer: mapreduce.task.io.sort.mb (io.sort.mb in Hadoop 1.x)

109. In Hadoop, HDFS snapshots are taken for which of the following reasons?
i) For providing protection against user error.
ii) For providing backup.
iii) For disaster recovery.
iv) For copying data from data nodes.


Correct Answer: Only i), ii), and iii)

110. What is the function of the following Hadoop command?


Correct Answer: In case of a file, it displays the length of the file, while in case of a directory, it displays the sizes of the files and directories present in that directory.

111. Which of the following interfaces is used for accessing the Hive metastore?


Correct Answer: Thrift

112. Which of the following commands is used to view the content of a file named /newexample/example1.txt?


Correct Answer: bin/hadoop dfs -cat /newexample/example1.txt

113. Which of the following HDFS commands is used for checking inconsistencies and reporting problems with various files?


Correct Answer: fsck 

114. Which of the following environment variables is used for determining the Hadoop cluster for Pig, in order to run the MapReduce jobs?


Correct Answer: HADOOP_CONF_DIR

115. Which of the following is the master process that accepts job submissions from clients and schedules tasks to run on worker nodes?


Correct Answer: JobTracker

116. In Pig, which of the following types of join operations can be performed using the Replicated join?
i) Inner join
ii) Outer join
iii) Right-outer join
iv) Left-outer join


Correct Answer: Only i) and iv)
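
A replicated join (`JOIN big BY key, small BY key USING 'replicated';` in Pig) loads the small relation into memory in every map task and streams the large relation past it. The plain-Python sketch below uses hypothetical names and allows at most one small-side row per key for simplicity. It hints at why only inner and left-outer joins fit. Each large-side row can be kept or dropped on its own. A single map task, however, cannot tell whether a small-side key went unmatched in every other task, which a right-outer join would need to know.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def replicated_join(large: Iterable[Tuple[str, str]],
                    small: Dict[str, str],
                    left_outer: bool = False
                    ) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Join streamed large-side rows against an in-memory small relation."""
    for key, value in large:
        match = small.get(key)
        if match is not None:
            yield (key, value, match)
        elif left_outer:
            # Left-outer keeps the unmatched large-side row with a null.
            yield (key, value, None)
```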

117. Which of the following is the correct syntax for the command used for setting replication for an existing file in Hadoop WebHDFS REST API?


Correct Answer: setReplication(Path src, short replication) 
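
The method named above belongs to the Java `FileSystem` API. Over HTTP, WebHDFS exposes the same operation as an HTTP PUT with `op=SETREPLICATION` and a `replication` query parameter. The Python helper below only builds such a URL and does not send the request. The function name is hypothetical, as are the host, port, and path in the example.

```python
from urllib.parse import quote, urlencode


def set_replication_url(host: str, port: int, path: str, replication: int) -> str:
    """Build a WebHDFS SETREPLICATION URL; the request must be sent as an HTTP PUT."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": "SETREPLICATION", "replication": replication})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
```

For a hypothetical NameNode at namenode.example.com:9870, `set_replication_url("namenode.example.com", 9870, "/user/abc/data.txt", 2)` gives `http://namenode.example.com:9870/webhdfs/v1/user/abc/data.txt?op=SETREPLICATION&replication=2`.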
