1. Which industry can benefit from data mining?
2. With which of these layers does a neural network start?
3. Changes to parts of a code could lead to the problem of ______________ data.
4. In a neural net, to what does topology refer?
5. Which of the following clustering algorithms can find clusters of arbitrary shape?
6. Decision trees are able to handle missing values without using any impute transformation. True or False?
7. A(n) _____ algorithm creates rules that describe how often events have occurred together.
8. Which of the following is valid XML?
9. Which of the following is not a relational database?
10. What is data visualization?
11. What is a KDD Process?
12. Which of these are NOT types of analytical software?
13. True or False? Economic indicators are external data factors.
14. Which of the following disciplines overlaps Data Mining?
15. In predictive models, the values or classes to be predicted are called the:
16. You are a credit risk manager at a retail bank. Some information about customers is available for analysis. Based on this data, you have to decide whether a person will be a good or bad customer. Choose the appropriate data mining task for this business problem.
17. Data items grouped into relationships and preferences are known as:
18. What are decision trees?
19. Which are popular data mining methods?
20. True or False? A loose-coupling data mining architecture is mainly for memory-based data mining systems that do not require high scalability and high performance.
21. What is CRISP-DM?
22. A function used by a node in a neural net to transform input data from any domain of values into a finite range of values is known as a(n):
23. True or False? Tests in CART are always Binary.
24. What is the measure of how much two random variables change together?
25. Which of these is an example of a sequential pattern relationship?
26. The annual revenue of an international company is correlated with other attributes such as advertising, exchange rate, and inflation rate. Given these values (or reliable estimates of them for the next year), the company has to calculate its expected revenue for the next year. Choose the appropriate data mining task for this business problem.
27. What is the front end layer of data mining architecture?
28. A hyperplane is a
29. Data not collected by the organization, such as data from a proprietary database, that is combined with the organization’s own data is known as:
30. Which of these are NOT considered internal data factors?
31. Which data mining technique organizes sets of data into predefined groups?
32. The level of the model that specifies (often graphically) which variables are locally dependent on each other.
33. To increase the confidence of your estimate of classification performance on the entire population, you should:
34. The algorithm powering the Google search engine is:
35. In the association between two variables, what is the difference between the antecedent and the consequent?
36. In the analysis of time-series data, the mean value over a given time period (usually some interval in the past up to the present) is called a(n)
37. What is Regression?
38. What is Dependency Modeling?
39. Which of these is NOT a common description of layers?
40. Sharding refers to:
41. What is Change and Deviation Detection?
42. What is the type of data mining that drives the Amazon.com recommendation system?
43. Which of the following algorithms is generally suitable for unsupervised learning tasks?
44. Which of the following storage solutions is most appropriate for a semi-structured dataset whose members do not all have the same attributes?
45. In order to estimate classification performance on an entire population, you need _______
46. Generalization error is a consequence of
47. Which of these are evolutionary computational methods?
48. Support Vector Machines have an advantage over Neural Networks because SVMs are
49. Which of the following is NOT a common source system?
50. A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset is:
51. What is the extraction of useful if-then rules from data based on statistical significance?
52. What is Classification?
53. Which of the following is NOT a function of data warehouses?
54. True or False? The MARS algorithm cannot produce rules.
55. Which of the following is most appropriate for finding the shortest chain of friends linking two people in a social graph who are not friends with each other?
56. Which of the following is not a common goal of the KDD Process?
57. What is a genetic algorithm?
58. What is Interestingness?
59. In the MapReduce model, Map and Reduce functions act directly on which kind of data structure?
60. In Natural Language Processing, what is the role of a lexical analyzer?
61. What is Clustering?
62. A DBMS reduces data redundancy and inconsistency by
63. In which type of analysis is a Kohonen feature map typically employed?
64. Which of the following clustering algorithms can optimize an objective function?
65. Information converted to provide insights about historical patterns and future trends is known as:
66. Which of the following properties applies to Single-Layer Perceptrons?
67. Which of the following is NOT a method of combining multiple models into an ensemble model?
68. What is Summarization?
69. "In 2% of the purchases at the hardware store, both a pick and a shovel were bought," is an example of:
70. A commonly used continuous alternative to the step function in multi-layered neural network output is the
71. What is Pig?
72. Taking multiple random samples of data and building a classification model for each is known as:
73. Which XPath selector expression captures all link elements of the form 'http://example.com/profile/12345' in an HTML page while excluding all links of the form 'http://example.com/casenumber/12345'?
74. Which of the following algorithms produces decision trees?
75. Which of the following properties is a constraint on a RESTful application?
76. The component of the Hadoop Distributed Filesystem responsible for storing metadata is called the
77. If more than one value occurs the same number of times, the data is:
78. What is the first step in the business understanding phase?
79. What is CURL?
80. The level of the model that specifies the strengths of the dependencies using some numerical scale.
81. Apriori is a seminal algorithm for finding frequent item sets using:
82. The authentication protocol used by many significant web APIs is called:
83. Which of these is not a step in the KDD process?
84. Which of the following applications are usually used to classify students' performances?
85. In any numerical data set with a meaningful mean value, what is the minimum fraction of data that will fall within n standard deviations of the mean?
86. Which of the following methods can be used for modeling a categorical target variable?
87. Which of the following is not a primary phase of a Hadoop Reducer?
88. Which of these is a possible architecture of a data mining system?
89. True or False? Artificial neural networks are linear predictive models.
90. The measured differences between a model and its predictions are known as:
91. Hash-based technique, Transaction Reduction, Partitioning, Sampling, and Dynamic Itemset Counting are all examples of what?
92. Which of the following is part of a retail customer data mining strategy?
93. Which decision tree method performs multi-level splits when computing classification trees?
94. What is the advantage of the k-Medoids Clustering Algorithm over the k-Means Clustering (Lloyd's) Algorithm?
95. The two major functions of BI servers are:
96. Which of the following is not an appropriate tool for harvesting data from a website that accesses its database through JavaScript/AJAX calls?
97. A descriptive approach to exploring data that can help identify relationships among values in a database is:
98. How do you measure interestingness in association patterns?
99. Which of the following is not valid JSON?
100. Where can a website operator generally find data on her customers' IP addresses?
101. Data mining provides a link between:
102. What is the purpose of the Hadoop Distributed File System (HDFS)?
103. What is Hive?
104. The silhouette coefficient can be used to determine the natural number of clusters for ________.
105. Suppose a company's marketing department collects data from its customers and wants to form customer groups so that each offer can be targeted at the most appropriate group. Choose the appropriate data mining task for this business problem.
106. Data mining applications are used to accomplish all of the following tasks except ________.
107. Data mining is more ________ than OLAP.
108. Data mining is the ____