Basic Data Mining MCQ

1. Data mining is the ____

Answer

Correct Answer: Knowledge discovery from large databases

2. Data mining is more ________ than OLAP.

Answer

Correct Answer: Discovery driven

3. Data mining applications are used to accomplish all of the following tasks except ________.

Answer

Correct Answer: Do what-if analysis only

4. Suppose that a company's marketing department collects data from customers and wants to group them so that each offer can be targeted at the most appropriate group. Choose the appropriate data mining task for this business problem.

Answer

Correct Answer: Segmentation

5. The silhouette coefficient can be used to determine the natural number of clusters for ________.

Answer

Correct Answer: Partitioning Algorithms
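
As a rough illustration: the silhouette of a point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b). The sketch below uses hypothetical 1-D data and a hypothetical helper name, and averages s over all points for two candidate partitions. The partition with the higher mean silhouette suggests the more natural number of clusters.

```python
def mean_silhouette(clusters):
    """Mean silhouette over all points; clusters is a list of lists of 1-D values."""
    scores = []
    for i, own in enumerate(clusters):
        for idx, p in enumerate(own):
            if len(own) == 1:
                scores.append(0.0)  # common convention: singleton clusters score 0
                continue
            rest = own[:idx] + own[idx + 1:]
            # a: mean distance to the other members of the point's own cluster
            a = sum(abs(p - q) for q in rest) / len(rest)
            # b: smallest mean distance to any other cluster
            b = min(
                sum(abs(p - q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: the same six points split into two or three clusters.
two_clusters = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
three_clusters = [[1.0, 2.0], [3.0], [10.0, 11.0, 12.0]]

print(mean_silhouette(two_clusters))
print(mean_silhouette(three_clusters))
```

On this toy data the two-cluster split scores higher, which matches the visible structure of the points.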

6. What is Hive?

Answer

Correct Answer: Hive enables Hadoop to operate as a data warehouse.

7. What is the purpose of the Hadoop Distributed File System (HDFS)?

Answer

Correct Answer: To enable computation to take place by allowing each server to have access to the data.

8. Data mining provides a link between:

Answer

Correct Answer: Separate transactional and analytical systems

9. Where can a website operator generally find data on her customers' IP addresses?

Answer

Correct Answer: server logfiles

10. Which of the following is not valid JSON?

Answer

Correct Answer: {["answer": "this one"]}
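
One way to check this is to hand the string to a JSON parser. The sketch below uses Python's standard `json` module; the helper name `is_valid_json` and the two valid comparison strings are illustrative assumptions, not part of the original question.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The quiz answer: an object must contain "key": value pairs, and a key
# must be a string, so an array cannot appear where the key is expected.
print(is_valid_json('{["answer": "this one"]}'))

# Two well-formed variants for comparison (assumed, not from the quiz).
print(is_valid_json('{"answer": "this one"}'))
print(is_valid_json('[{"answer": "this one"}]'))
```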

11. How do you measure interestingness in association patterns?

Answer

Correct Answer: measure lift
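
For a rule A -> B, lift is usually defined as confidence(A -> B) divided by support(B). A value above 1 suggests that A and B occur together more often than independence would predict. A minimal sketch over a hypothetical list of transactions (the item names and helper functions are made up for illustration):

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"pick", "shovel"},
    {"pick", "shovel", "gloves"},
    {"pick"},
    {"gloves"},
    {"shovel", "gloves"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule relative to the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


print(support({"pick", "shovel"}))
print(confidence({"pick"}, {"shovel"}))
print(lift({"pick"}, {"shovel"}))
```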

12. A descriptive approach to exploring data that can help identify relationships among values in a database is:

Answer

Correct Answer: Link analysis

13. Which of the following is not an appropriate tool for harvesting data from a website that accesses its database through JavaScript/AJAX calls?

Answer

Correct Answer: wget

14. The two major functions of BI servers are:

Answer

Correct Answer: Management and delivery

15. Which decision tree method performs multi-level splits when computing classification trees?

Answer

Correct Answer: CHAID (Chi-square Automatic Interaction Detection)

16. Which of the following is part of a retail customer data mining strategy?

Answer

Correct Answer: loyalty cards

17. Hash-based techniques, transaction reduction, partitioning, sampling, and dynamic itemset counting are all examples of what?

Answer

Correct Answer: Techniques to improve the efficiency of the Apriori algorithm

18. The measured differences between a model and its predictions are known as:

Answer

Correct Answer: Noise

19. True or False? Artificial neural networks are linear predictive models.

Answer

Correct Answer: False

20. Which of these is a possible architecture of a data mining system?

Answer

Correct Answer: No-coupling

21. Which of the following is not a primary phase of a Hadoop Reducer?

Answer

Correct Answer: Map

22. Which of the following methods can be used for modeling a categorical target variable?

Answer

Correct Answer: Logistic Regression

23. In any numerical data set with a meaningful mean value, what is the minimum fraction of data that will fall within n standard deviations of the mean?

Answer

Correct Answer: 1-1/n^2
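
This bound is Chebyshev's inequality. As a quick sanity check, the sketch below measures the fraction of a small, hypothetical data set (deliberately skewed by one outlier) that lies within n population standard deviations of the mean, and compares it with 1 - 1/n^2. The helper name `fraction_within` is an assumption for illustration.

```python
import statistics


def fraction_within(data, n):
    """Fraction of values lying within n population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= n * sd for x in data) / len(data)


# Hypothetical data with one large outlier.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

for n in (1.5, 2, 3):
    bound = 1 - 1 / n**2
    print(n, fraction_within(data, n), ">=", bound)
```

The observed fraction is typically much larger than the bound. Chebyshev's inequality only guarantees the minimum, whatever the shape of the distribution.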

24. Which of the following applications are usually used to classify students' performances?

Answer

Correct Answer: If...then... analysis

25. The authentication protocol used by many significant web APIs is called:

Answer

Correct Answer: OAuth

26. Apriori is a seminal algorithm for finding frequent item sets using:

Answer

Correct Answer: Candidate generation

27. The level of the model that specifies the strengths of the dependencies using some numerical scale.

Answer

Correct Answer: Quantitative Level

28. What is the first step in the business understanding phase?

Answer

Correct Answer: Firmly grasp business objectives and needs

29. If more than one value occurs the same number of times, the data is:

Answer

Correct Answer: Multi-modal
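
A small check with Python's standard library, on hypothetical values: `statistics.multimode` returns every value that ties for the highest count, so a result with more than one entry indicates multi-modal data.

```python
import statistics

# Hypothetical sample in which 2 and 3 both occur twice.
data = [1, 2, 2, 3, 3, 4]

modes = statistics.multimode(data)
print(modes)
print("multi-modal" if len(modes) > 1 else "unimodal")
```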

30. The component of the Hadoop Distributed Filesystem responsible for storing metadata is called the

Answer

Correct Answer: NameNode

31. Which of the following properties is a constraint on a RESTful application?

Answer

Correct Answer: stateless

32. Which of the following algorithms produces decision trees?

Answer

Correct Answer: ID3

33. Which XPath selector expression captures all link elements of the form 'http://example.com/profile/12345' in an HTML page while excluding all links of the form 'http://example.com/casenumber/12345'?

Answer

Correct Answer: //a[contains(@href, "profile")]

34. Taking multiple random samples of data and building a classification model for each is known as:

Answer

Correct Answer: Bagging (bootstrap aggregating)

35. A commonly used continuous alternative to the step function in multi-layered neural network output is the

Answer

Correct Answer: logistic function
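
For comparison, here is a minimal sketch of the two functions. The step function jumps from 0 to 1 at the threshold, while the logistic (sigmoid) function 1 / (1 + e^(-x)) moves smoothly between the same two limits, which makes it differentiable everywhere. The function names are illustrative.

```python
import math


def step(x):
    """Hard threshold: 0 below zero, 1 at or above zero."""
    return 1.0 if x >= 0 else 0.0


def logistic(x):
    """Smooth, continuous alternative: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, step(x), round(logistic(x), 4))
```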

36. "In 2% of the purchases at the hardware store, both a pick and a shovel were bought," is an example of:

Answer

Correct Answer: Support

37. What is Summarization?

Answer

Correct Answer: Methods for finding a compact description for a subset of data.

38. Which of the following is NOT a method of combining multiple models into an ensemble model?

Answer

Correct Answer: Bootstrapping

39. Which of the following properties applies to Single-Layer Perceptrons?

Answer

Correct Answer: random initialization of weights

40. Converted information to provide insights about historical patterns and future trends is known as:

Answer

Correct Answer: Knowledge

41. In which type of analysis is a Kohonen feature map typically employed?

Answer

Correct Answer: Cluster analysis

42. What is Clustering?

Answer

Correct Answer: A descriptive task where one seeks to identify a finite set of categories to describe the data.

43. In Natural Language Processing, what is the role of a lexical analyzer?

Answer

Correct Answer: splits the stream of input characters into tokens

44. In the MapReduce model, Map and Reduce functions act directly on which kind of data structure?

Answer

Correct Answer: key-value pair
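
To make the key-value idea concrete, here is a single-process word-count sketch in plain Python. It does not use the Hadoop API, and the names `map_fn`, `reduce_fn`, and `run` are hypothetical. The map function receives a (key, value) pair and emits (key, value) pairs. The framework groups the values by key, and the reduce function receives each key with its list of values.

```python
from collections import defaultdict


def map_fn(_offset, line):
    """Map: (line offset, line text) -> stream of (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    yield word, sum(counts)


def run(lines):
    """Tiny in-memory stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            groups[key].append(value)  # group intermediate values by key
    result = {}
    for key in sorted(groups):  # process keys in sorted order
        for out_key, out_value in reduce_fn(key, groups[key]):
            result[out_key] = out_value
    return result


print(run(["a b a", "B c"]))
```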

45. What is Interestingness?

Answer

Correct Answer: An overall measure of pattern value, combining validity, novelty, usefulness, and simplicity.

46. Which of the following is not a common goal of the KDD Process:

Answer

Correct Answer: Performance

47. Which of the following is most appropriate for finding the shortest chain of friends linking two people in a social graph who are not friends with each other?

Answer

Correct Answer: Dijkstra's algorithm
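
A minimal sketch of Dijkstra's algorithm on a small, hypothetical friendship graph, with every friendship given a cost of 1 (an assumption, since the question does not mention weights). With unit costs the search visits people in the same order as a breadth-first search would.

```python
import heapq


def shortest_chain(graph, start, goal):
    """Return the shortest list of people linking start to goal, or None."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, person = heapq.heappop(heap)
        if person == goal:
            break
        if d > dist.get(person, float("inf")):
            continue  # stale heap entry
        for friend in graph.get(person, ()):
            nd = d + 1  # every friendship edge costs 1
            if nd < dist.get(friend, float("inf")):
                dist[friend] = nd
                prev[friend] = person
                heapq.heappush(heap, (nd, friend))
    if goal not in dist:
        return None
    chain = [goal]
    while chain[-1] != start:
        chain.append(prev[chain[-1]])
    return chain[::-1]


# Hypothetical social graph as adjacency lists.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

print(shortest_chain(friends, "ann", "eve"))
```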

48. True or False? The MARS algorithm cannot produce rules.

Answer

Correct Answer: True

49. Which of the following is NOT a function of data warehouses?

Answer

Correct Answer: Cleaning dirty data

50. What is Classification?

Answer

Correct Answer: Learning a function that maps a data item into one of several predefined groups.

51. A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset is:

Answer

Correct Answer: Nearest Neighbor
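
A minimal k-nearest-neighbor sketch, assuming Euclidean distance and a simple majority vote (the question only says "a combination of the classes"). The historical records and labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Label query by majority vote among its k nearest historical records."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (features, class label).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 8.5), "high"),
    ((8.5, 9.0), "high"),
]

print(knn_classify(history, (1.2, 1.1)))
print(knn_classify(history, (8.7, 8.8)))
```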

52. Which of the following is NOT a common source system?

Answer

Correct Answer: Node

53. Which of these are evolutionary computational methods?

Answer

Correct Answer: Genetic algorithms

54. Generalization error is a consequence of

Answer

Correct Answer: Overfit

55. Which of the following storage solutions is most appropriate for a semi-structured dataset whose members do not all have the same attributes?

Answer

Correct Answer: MongoDB

56. Which of the following algorithms is generally suitable for unsupervised learning tasks?

Answer

Correct Answer: k-means algorithm

57. What is Change and Deviation Detection?

Answer

Correct Answer: A task focusing on discovering the most significant changes in the data from previously measured or normative values

58. Sharding refers to:

Answer

Correct Answer: partitioning a database for distribution across different servers
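
One common way to decide which server holds a record is to hash its key. The sketch below shows hash-based routing only. The server names and the `shard_for` helper are hypothetical, and real systems often use range-based or consistent-hashing schemes instead.

```python
import hashlib

# Hypothetical shard servers.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(key):
    """Map a record key to one shard, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


for key in ("customer:1", "customer:2", "customer:3"):
    print(key, "->", shard_for(key))
```

Because the mapping depends only on the key, every client that uses the same shard list routes a given record to the same server.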

59. Which of these is NOT a common description of layers?

Answer

Correct Answer: Functional

60. What is Dependency Modeling?

Answer

Correct Answer: The process of finding a model which describes significant dependencies between variables

61. In the analysis of time-series data, the mean value over a given time period (usually some interval in the past up to the present) is called a(n)

Answer

Correct Answer: moving average
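
A minimal sketch of a simple (unweighted) moving average over a fixed trailing window. The series values and the function name are hypothetical.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values, ending at each position."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical monthly sales figures.
sales = [10, 20, 30, 40, 50]
print(moving_average(sales, 3))  # [20.0, 30.0, 40.0]
```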

62. In the association between two variables, what is the difference between the antecedent and the consequent?

Answer

Correct Answer: The antecedent is on the left, the consequent on the right

63. The algorithm powering the Google search engine is:

Answer

Correct Answer: PageRank

64. To increase confidence in your estimate of classification performance on the entire population, you should:

Answer

Correct Answer: Increase the size of the test dataset

65. The level of the model that specifies (often graphically) which variables are locally dependent on each other.

Answer

Correct Answer: Structural Level

66. Which data mining technique organizes sets of data into predefined groups?

Answer

Correct Answer: Classification

67. Which of these are NOT considered internal data factors?

Answer

Correct Answer: Economic downturns

68. Data not collected by the organization, such as data from a proprietary database, that is combined with the organization’s own data is known as:

Answer

Correct Answer: Overlay

69. What is the front end layer of data mining architecture?

Answer

Correct Answer: An intuitive and user friendly user interface

70. The annual revenue of an international company is correlated with other attributes such as advertising spend, exchange rate, and inflation rate. Given these values (or reliable estimates of them for the next year), the company has to calculate its expected revenue for the next year. Choose the appropriate data mining task for this business problem.

Answer

Correct Answer: Regression

71. Which of these is an example of a sequential pattern relationship?

Answer

Correct Answer: Predicting the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes

72. What is the measure of how much two random variables change together?

Answer

Correct Answer: covariance
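
As a small worked example, the sample covariance of paired observations is the sum of (x - mean_x) * (y - mean_y) divided by n - 1. It is positive when the variables tend to rise together and negative when one rises as the other falls. The data and the function name below are hypothetical.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4]
rising = [2, 4, 6, 8]   # moves with xs
falling = [8, 6, 4, 2]  # moves against xs

print(covariance(xs, rising))
print(covariance(xs, falling))
```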

73. A function used by a node in a neural net to transform input data from any domain of values into a finite range of values is known as a(n):

Answer

Correct Answer: Activation Function

74. True or False? Loose coupling data mining architecture is mainly for memory-based data mining systems that do not require high scalability and high performance.

Answer

Correct Answer: True

75. Which are popular data mining methods?

Answer

Correct Answer: All of these

76. What are decision trees?

Answer

Correct Answer: Structures that generate rules for the classification of a dataset

77. Data items grouped into relationships and preferences are known as:

Answer

Correct Answer: Clusters

78. You are a credit risk manager at a retail bank. Some information about customers is available for analysis. Based on this data, you have to decide whether a person will be a good or bad customer. Choose the appropriate data mining task for this business problem.

Answer

Correct Answer: Classification

79. In predictive models, the values or classes to be predicted are called the:

Answer

Correct Answer: All of these

80. Which of the following disciplines overlaps Data Mining?

Answer

Correct Answer: All of the above

81. True or False? Economic indicators are external data factors.

Answer

Correct Answer: True

82. Which of these are NOT types of analytical software:

Answer

Correct Answer: All are valid types

83. What is data visualization?

Answer

Correct Answer: The visual interpretation of complex relationships in multidimensional data

84. Which of the following is not a relational database?

Answer

Correct Answer: All of the above

85. Which of the following is valid XML?

Answer

Correct Answer: All are valid

86. A(n) _____ algorithm creates rules that describe how often events have occurred together.

Answer

Correct Answer: associative

87. Decision trees are able to handle missing values without using any impute transformation. True or False?

Answer

Correct Answer: True

88. Which of the following clustering algorithms can find clusters of arbitrary shape?

Answer

Correct Answer: Both of these

89. In a neural net, to what does topology refer?

Answer

Correct Answer: The number of layers and the number of nodes in each layer

90. Changes to parts of a code could lead to the problem of ______________ data.

Answer

Correct Answer: inconsistent

91. With which of these layers does a neural network start?

Answer

Correct Answer: Input layer

92. Which industry can benefit from data mining?

Answer

Correct Answer: All of these
