
Basic Hadoop MCQ

1. To copy a file into the Hadoop file system, what command should you use?

Correct Answer: hadoop fs -copyFromLocal
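
For example, a minimal usage sketch (the local file name and HDFS target directory below are hypothetical):

hadoop fs -copyFromLocal sales.csv /user/hue/input/sales.csv
hadoop fs -ls /user/hue/input    # confirm the file now exists in HDFS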

2. What kind of storage and processing does Hadoop support?

Correct Answer: Distributed

3. Which file system does Hadoop use for storage?

Correct Answer: HDFS

4. Hadoop Common is written in which language?

Correct Answer: Java

5. Which feature is used to roll back a corrupted HDFS instance to a previously known good point in time?

Correct Answer: Snapshot
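
A minimal sketch of taking a snapshot (the directory and snapshot names are hypothetical); an administrator must first allow snapshots on the directory:

hdfs dfsadmin -allowSnapshot /data
hdfs dfs -createSnapshot /data beforeUpgrade
# files deleted or corrupted later can be recovered from /data/.snapshot/beforeUpgrade/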

6. To view the execution details of an Impala query plan, which function would you use?

Correct Answer: Explain

7. Which object can be used to distribute jars or libraries for use in MapReduce tasks?

Correct Answer: Distributed cache

8. Hadoop systems are _ RDBMS systems.

Correct Answer: Additions for

9. Where would you configure the size of a block in a Hadoop environment?

Correct Answer: dfs.block.size in hdfs-site.xml
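
The property lives in hdfs-site.xml; dfs.block.size is the legacy name and current releases use dfs.blocksize. A quick way to check the effective value from the shell:

hdfs getconf -confKey dfs.blocksize    # prints the default block size in bytes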

10. In a MapReduce job, which phase runs after the Map phase completes?

Correct Answer: Combiner

11. Suppose you are trying to finish a Pig script that converts text in the input string to uppercase. What code is needed on line 2 below?
1 data = LOAD '/user/hue/pig/examples/data/midsummer.txt'...
2

Correct Answer: AS (text:CHARARRAY); upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);

12. Which type of Hadoop node executes file system namespace operations like opening, closing, and renaming files and directories?

Correct Answer: NameNode

13. MapReduce 1.0 _ YARN

Correct Answer: Does not include

14. _ is the query language, and _ is storage for NoSQL on Hadoop

Correct Answer: HQL; HBase

15. MapReduce applications use which of these classes to report their statistics?

Correct Answer: Counter

16. If no reduction is desired, you should set the number of _ tasks to zero.

Correct Answer: Reduce

17. What type of software is Hadoop Common?

Correct Answer: Distributed computing framework

18. In MapReduce, _ have _

Correct Answer: Jobs; tasks

19. Hadoop 2.x and later implement which service as the resource coordinator?

Correct Answer: YARN

20. To implement high availability, how many instances of the master node should you configure?

Correct Answer: Two or more (https://data-flair.training/blogs/hadoop-high-availability-tutorial)

21. Which Hive query returns the first 1,000 values?

Correct Answer: SELECT … LIMIT 1000
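
For example, a hedged sketch run from the shell (the table name my_table is hypothetical); note that without an ORDER BY, LIMIT returns 1,000 arbitrary rows:

hive -e "SELECT * FROM my_table LIMIT 1000;"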

22. To what does the Mapper map input key/value pairs?

Correct Answer: A set of intermediate key/value pairs

23. In what format does RecordWriter write an output file?

Correct Answer: <key, value> pairs

24. In the Hadoop system, what administrative mode is used for maintenance?

Correct Answer: Safe mode

25. When implemented on a public cloud, with what does Hadoop processing interact?

Correct Answer: Files in object storage

26. What is the output of the Reducer?

Correct Answer: A set of <key, value> pairs

27. Which library should you use to perform ETL-type MapReduce jobs?

Correct Answer: Pig

28. A distributed cache file path can originate from what location?

Correct Answer: HDFS or HTTP

29. HDFS files are of what type?

Correct Answer: Append-only

30. HBase works with which type of schema enforcement?

Correct Answer: Schema on read

31. To connect Hadoop to AWS S3, which client should you use?

Correct Answer: S3A
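
A minimal sketch, assuming a bucket named my-bucket and that S3 credentials are already configured (for example via fs.s3a.access.key and fs.s3a.secret.key or an instance profile):

hadoop fs -ls s3a://my-bucket/
hadoop distcp /user/hue/input s3a://my-bucket/backup/input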

32. To create a MapReduce job, what should be coded first?

Correct Answer: A Job class and instance (NOT SURE)

33. State _ between the JVMs in a MapReduce job

Correct Answer: Is not shared (https://www.lynda.com/Hadoop-tutorials/Understanding-Java-virtual-machines-JVMs/191942/369545-4.html)

34. If you started the NameNode, then which kind of user must you be?

Correct Answer: Super-user

35. Which library should be used to unit test MapReduce code?

Correct Answer: MRUnit

36. In what form is Reducer output presented?

Correct Answer: Compressed (NOT SURE)

37. Which command imports data to Hadoop from a MySQL database?

Correct Answer: sqoop import --connect jdbc:mysql://mysql.example.com/sqoop --username sqoop --password sqoop --warehouse-dir user/hue/oozie/deployments/sqoop

38. The skip-bad-records feature provides an option by which a certain set of bad input records can be skipped when processing what type of data?

Correct Answer: Map inputs

39. To reference a master file for lookups during Mapping, what type of cache should be used?

Correct Answer: Distributed cache

40. In a MapReduce job, where does the map() function run?

Correct Answer: On the data nodes of the cluster (NOT SURE)

41. Which method is used to implement Spark jobs?

Correct Answer: In memory, across all workers

42. DataNode supports which type of drives?

Correct Answer: Hot swappable

43. For high availability, use multiple nodes of which type?

Correct Answer: Name

44. To set up Hadoop workflow with synchronization of data between jobs that process tasks both on disk and in memory, use the ___ service, which is ___.

Correct Answer: ZooKeeper; open source

45. What are the primary phases of a Reducer?

Correct Answer: Shuffle, sort, and reduce

46. Hadoop Core supports which CAP capabilities?

Correct Answer: A, P

47. To get the total number of mapped input records in a map job task, you should review the value of which counter?

Correct Answer: TaskCounter (NOT SURE)

48. Which line of code implements a Reducer method in MapReduce 2.0?

Correct Answer: public void reduce(Text key, Iterator values, Context context){…}

49. To verify job status, look for the value ___ in the ___.

Correct Answer: SUCCEEDED; stdout

50. To perform local aggregation of the intermediate outputs, MapReduce users can optionally specify which object?

Correct Answer: Combiner

51. MapReduce jobs can be written in which language?

Correct Answer: Java or Python

52. Hadoop Auth enforces authentication on protected resources. Once authentication has been established, it sets what type of authenticating cookie?

Correct Answer: Signed HTTP

53. Rather than adding a Secondary Sort to a slow Reduce job, it is Hadoop best practice to perform which optimization?

Correct Answer: Add a partitioned shuffle to the Reduce job.

54. SQL Windowing functions are implemented in Hive using which keywords?

Correct Answer: OVER, RANK

55. Partitioner controls the partitioning of what data?

Correct Answer: Intermediate keys

56. The Hadoop framework uses the ________ algorithm to solve large-scale problems.

Correct Answer: MapReduce

57. Which of the following operators is used for un-nesting the nested tuples and bags?

Correct Answer: FLATTEN

58. Which of the following is the correct syntax for resetting the space quota for directories in HDFS?

Correct Answer: hdfs dfsadmin -clrSpaceQuota <dir1> … <dirN>
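
For context, a sketch that sets a quota and then clears it (the directory and quota size are hypothetical):

hdfs dfsadmin -setSpaceQuota 10g /user/alice
hdfs dfs -count -q /user/alice        # shows the quota currently in effect
hdfs dfsadmin -clrSpaceQuota /user/alice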

59. Which of the following objects is used by the RecordReader class for reading data from an InputSplit class?

Correct Answer: FSDataInputStream

60. Which of the following is the correct data type of the "totalMB" element of the clusterMetrics object used in the YARN ResourceManager REST API?

Correct Answer: long

61. Which of the following is NOT a property of the app (Application) object of the NodeManager REST API?

Correct Answer: containers

62. Which of the following Hadoop DFSAdmin commands generates a list of DataNodes?

Correct Answer: bin/hdfs dfsadmin -report

63. Which of the following Hadoop commands is used for copying a source path to stdout?

Correct Answer: cat

64. In case of a Multiquery execution, which of the following return codes indicates retrievable errors for an execution?

Correct Answer: 1

65. What is the default value of the hadoop.http.authentication.token.validity property that is used in case of authentication via HTTP interface?

Correct Answer: 36,000 seconds

66. While accessing a Hadoop Auth protocol URL using curl, which of the following command options are used for storing and sending HTTP cookies?

Correct Answer: -b and -c
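
A hedged sketch of the pattern (the host, port, and WebHDFS path are placeholders); -c writes the signed cookie to a file on the first, Kerberos-authenticated request, and -b sends it back on later requests:

curl -c cookies.txt --negotiate -u : "http://namenode.example.com:9870/webhdfs/v1/tmp?op=LISTSTATUS"
curl -b cookies.txt "http://namenode.example.com:9870/webhdfs/v1/tmp?op=LISTSTATUS"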

67. Which of the following commands is used for distributing an exclude file to all the NameNodes?

Correct Answer: [hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>

68. Which of the following functions is NOT performed by the InputFormat class for a MapReduce Hadoop job?

Correct Answer: It presents a record view of the data to the Map task and reads from an InputSplit class.

69. Which of the following is the Hadoop directory service that stores the metadata related to the files present in the cluster storage?

Correct Answer: NameNode

70. Which of the following are the required command line arguments for the oev command of HDFS?

Correct Answer: -i, --inputFile arg

71. Which of the following HDFS shell commands is used for setting a group for a particular file or directory?

Correct Answer: chown
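
For example (the user, group, and path are hypothetical), chown can set the owner and group together, while chgrp changes only the group:

hdfs dfs -chown alice:analysts /data/reports
hdfs dfs -chgrp -R analysts /data/reports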

72. While configuring HTTP authentication in Hadoop, which of the following is set as the value of the "hadoop.http.filter.initializers" property?

Correct Answer: org.apache.hadoop.security.AuthenticationFilterInitializer class name

73. What is the default value of the following security configuration property of the YARN architecture?
yarn.timeline-service.delegation.token.renew-interval

Correct Answer: 1 day

74. In case of service-level authorization in Hadoop, which of the following properties is used for determining the ACEs used for granting permissions for the DataNodes to communicate and access the NameNode?

Correct Answer: security.datanode.protocol.acl

75. Which of the following is the correct syntax for the docs Maven profile that is used for creating documentation in Hadoop Auth?

Correct Answer: $ mvn package -Pdocs

76. Which of the following Pig commands is/are used for sampling data and applying a query to it?

Correct Answer: ILLUSTRATE

77. Which of the following HDFS commands is used for setting an extended attribute name and value for a file or a directory?

Correct Answer: setfattr
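
A minimal sketch (the attribute name, value, and path are hypothetical); extended attribute names must use a reserved namespace prefix such as user.:

hadoop fs -setfattr -n user.origin -v "ingest-job-42" /data/file.txt
hadoop fs -getfattr -d /data/file.txt    # lists the attributes set on the file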

78. Which of the following configuration properties of YARN's Resource Manager is used for specifying the host:port for clients to submit jobs?

Correct Answer: yarn.resourcemanager.address.rm-id

79. Which of the given options is the correct function of the following HiveQL command?
!

Correct Answer: It is used for executing a shell command from the Hive shell.

80. Which of the following HiveQL commands is used for printing the list of configuration variables overridden by Hive or the user?

Correct Answer: set

81. In order to execute a custom, user-built JAR file, the jar command is used. Which of the following is the correct syntax of this command?

Correct Answer: yarn jar <jar file path> [main class name] [arguments…]
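
For example, a hedged invocation (the JAR name, driver class, and paths are hypothetical):

yarn jar wordcount.jar com.example.WordCount /user/hue/input /user/hue/output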

82. Which of the following commands is used for creating a keytab file used in Kerberos authentication?

Correct Answer: ktutil

83. Which of the following join operations are NOT supported by Hive?

Correct Answer: Theta join

84. Which of the following permission levels is NOT allowed in HDFS authorization?

Correct Answer: Execute

85. For a file named abc, which of the following Hadoop commands is used for setting all the permissions for owner, setting read permissions for group, and setting no permissions for other users in the system?

Correct Answer: hadoop fs -chmod 740 abc
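
As a quick check of the octal value: 7 = rwx for the owner, 4 = r-- for the group, 0 = --- for others, so a listing after the command shows permissions of the form -rwxr-----:

hadoop fs -chmod 740 abc
hadoop fs -ls abc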

86. Which of the following are the advantages of the disk-level encryption in HDFS?

Correct Answer: It provides high performance.

87. Which of the following commands is used for setting an environment variable in a streaming command?

Correct Answer: -cmdenv ABC=/home/example/dictionaries/
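
In context, a hedged Hadoop streaming sketch showing where -cmdenv fits (the streaming JAR path, scripts, and directories are hypothetical, and the mapper and reducer scripts are assumed to be available on the task nodes):

hadoop jar hadoop-streaming.jar \
  -input /user/hue/input \
  -output /user/hue/output \
  -mapper mapper.py \
  -reducer reducer.py \
  -cmdenv ABC=/home/example/dictionaries/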

88. Which two of the following parameters of the Hadoop streaming command are optional?

Correct Answer: -cmdenv name=value

89. Which of the following statements is correct about YARN's Web Application Proxy?

Correct Answer: It strips the cookies from the user and replaces them with a single cookie, providing the user name of the logged-in user.

90. Which two of the following are the correct differences between MapReduce and traditional RDBMS?

Correct Answer: In MapReduce, the read operation can be performed many times but the write operation can be performed only once. In traditional RDBMS, both read and write operations can be performed many times.

91. What is the function of the following configuration property of YARN's Resource Manager?
yarn.resourcemanager.ha.id

Correct Answer: It is used for identifying the Resource Manager in an ensemble.

92. Consider an input file named abc.dat.txt with a default block size of 128 MB. Which of the following is the correct command that will upload this file into HDFS with a block size of 512 MB?

Correct Answer: hadoop fs -D dfs.blocksize=536870912 -put abc.dat.txt abc.dat.newblock.txt
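
The value checks out: 512 MB = 512 × 1024 × 1024 = 536,870,912 bytes. A hedged way to confirm the block size after the upload (assuming the file landed in the user's HDFS home directory, here /user/hue):

hdfs fsck /user/hue/abc.dat.newblock.txt -files -blocks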

93. Which of the following Hadoop commands is used for creating a file of zero length?

Correct Answer: touchz

94. Which of the following statements is correct about the Hadoop file system namespace?

Correct Answer: User access permission is not implemented in HDFS.

95. Suppose you need to select a storage system that supports Resource Manager High Availability (RM HA). Which of the following types of storage should be selected in this case?

Correct Answer: ZooKeeper-based state-store

96. Which of the following is used for changing the group of a file?

Correct Answer: hdfs dfs -chgrp [-R] <group> <filepath>

97. Which of the following statements is correct about the Hive joins?

Correct Answer: In Hive, more than two tables can be joined.

98. Which of the following is the correct command-line syntax of the Hadoop streaming command?

Correct Answer: hadoop command [genericOptions] [streamingOptions]

99. Which of the following operators must be used for a theta join?

Correct Answer: Cross

100. Which of the following are the characteristics of the UNION operator of Pig?

Correct Answer: It does not impose any restrictions on the schema of the two datasets that are being concatenated.

101. Which of the following functions is performed by the Scheduler of Resource Manager in the YARN architecture?

Correct Answer: It allocates resources to the applications running in the cluster.

102. Which of the given Pig data types has the following characteristics?
i) It is a collection of data values.
ii) These data values are ordered and have a fixed length.

Correct Answer: Tuple

103. In which of the following Pig execution modes can a Java program invoke Pig commands by importing the Pig libraries?

Correct Answer: Embedded mode

104. Which of the given options is the function of the following Hadoop command?
-a

Correct Answer: It is used for checking whether all the libraries are available.

105. Which of the following Hive commands is used for creating a database named MyData?

Correct Answer: CREATE DATABASE MyData

106. Which of the following Hive clauses should be used for imposing a total order on the query results?

Correct Answer: ORDER BY

107. Which of the following interfaces can decrease the amount of memory required by UDFs, by accepting the input in chunks?

Correct Answer: Accumulator interface

108. Before being written to disk, the intermediate output records emitted by the Map task are buffered in local memory using a circular buffer. Which of the following properties is used to configure the size of this circular buffer?

Correct Answer: mapreduce.task.io.sort.mb

109. In Hadoop, HDFS snapshots are taken for which of the following reasons?
i) For providing protection against user error.
ii) For providing backup.
iii) For disaster recovery.
iv) For copying data from data nodes.

Correct Answer: Only i), ii), and iii)

110. What is the function of the following Hadoop command?
du

Correct Answer: In case of a file, it displays the length of the file, while in case of a directory, it displays the sizes of the files and directories present in that directory.

111. Which of the following interfaces is used for accessing the Hive metastore?

Correct Answer: Thrift

112. Which of the following commands is used to view the content of a file named /newexample/example1.txt?

Correct Answer: bin/hadoop dfs -cat /newexample/example1.txt

113. Which of the following HDFS commands is used for checking inconsistencies and reporting problems with various files?

Correct Answer: fsck
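
For example, a typical invocation that checks the whole namespace (the flags shown are standard fsck options):

hdfs fsck / -files -blocks -locations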

114. Which of the following environment variables is used for determining the Hadoop cluster for Pig, in order to run the MapReduce jobs?

Correct Answer: HADOOP_CONF_DIR

115. Which of the following is the master process that accepts job submissions from the clients and schedules tasks to run on worker nodes?

Correct Answer: JobTracker

116. In Pig, which of the following types of join operations can be performed using the Replicated join?
i) Inner join
ii) Outer join
iii) Right-outer join
iv) Left-outer join

Correct Answer: Only i) and iv)

117. Which of the following is the correct syntax for the command used for setting replication for an existing file in Hadoop WebHDFS REST API?

Correct Answer: setReplication(Path src, short replication)
