Correct Answer:
Knowledge discovery from large databases
2. Data mining is more ________ than OLAP.
Answer
Correct Answer:
Discovery driven
3. Data mining applications are used to accomplish all of the following tasks except ________.
Answer
Correct Answer:
Do what-if analysis only
4. Suppose a company's marketing department collects data from its customers and wants to form customer groups so that each offer can be targeted at the most appropriate group. Choose the appropriate data mining task for this business problem.
Answer
Correct Answer:
Segmentation
5. The silhouette coefficient can be used to determine the natural number of clusters for ________.
Answer
Correct Answer:
Partitioning Algorithms
6. What is Hive?
Answer
Correct Answer:
Hive enables Hadoop to operate as a data warehouse.
7. What is the purpose of the Hadoop Distributed File System (HDFS)?
Answer
Correct Answer:
To enable computation to take place by allowing each server to have access to the data.
8. Data mining provides a link between:
Answer
Correct Answer:
Separate transactional and analytical systems
9. Where can a website operator generally find data on her customers' IP addresses?
Answer
Correct Answer:
server logfiles
10. Which of the following is not valid JSON?
Answer
Correct Answer:
{["answer": "this one"]}
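As a rough check of why that string is rejected, the sketch below feeds it to Python's standard `json` parser. The first two candidate strings are hypothetical comparisons added for illustration; they are not options from the original question.

```python
import json

candidates = [
    '{"answer": "this one"}',    # hypothetical: an object with one key/value pair
    '["answer", "this one"]',    # hypothetical: an array of two strings
    '{["answer": "this one"]}',  # the string given as the answer above
]

def is_valid_json(text):
    """Return True if json.loads accepts the text, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

results = [is_valid_json(c) for c in candidates]
print(results)
```

The last string fails because an object must start with a quoted key, not with an array bracket.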
11. How do you measure interestingness in association patterns?
Answer
Correct Answer:
measure lift
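A minimal sketch of how lift can be computed for a rule A -> B, using the usual definition lift = support(A and B) / (support(A) * support(B)). The transactions and item names are made up for illustration.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Lift of the rule antecedent -> consequent."""
    joint = support(antecedent | consequent, transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))

bread_milk_lift = lift({"bread"}, {"milk"}, transactions)
print(round(bread_milk_lift, 3))
```

A lift above 1 suggests the two itemsets occur together more often than independence would predict; a lift near 1 suggests no association.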
12. A descriptive approach to exploring data that can help identify relationships among values in a database is:
Answer
Correct Answer:
Link analysis
13. Which of the following is not an appropriate tool for harvesting data from a website that accesses its database through Javascript/AJAX calls?
Answer
Correct Answer:
wget
14. The two major functions of BI servers are:
Answer
Correct Answer:
Management and delivery
15. Which decision tree method performs multi-level splits when computing classification trees?
Note: This Question is unanswered, help us to find answer for this one
16. Which of the following is part of a retail customer data mining strategy?
Answer
Correct Answer:
loyalty cards
17. Hash-based technique, transaction reduction, partitioning, sampling, and dynamic itemset counting are all examples of what?
Answer
Correct Answer:
Techniques to improve the efficiency of the Apriori algorithm
18. The measured differences between a model and its predictions are known as:
Answer
Correct Answer:
Noise
19. True or False? Artificial neural networks are linear predictive models.
Answer
Correct Answer:
False
20. Which of these is a possible architecture of a data mining system?
Answer
Correct Answer:
No-coupling
21. Which of the following is not a primary phase of a Hadoop Reducer?
Answer
Correct Answer:
Map
22. Which of the following methods can be used for modeling a categorical target variable?
Answer
Correct Answer:
Logistic Regression
23. In any numerical data set with a meaningful mean value, what is the minimum fraction of data that will fall within n standard deviations of the mean?
Answer
Correct Answer:
1-1/n^2
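This bound is Chebyshev's inequality. The sketch below checks it empirically on a small made-up, skewed data set, using the population standard deviation; the data and helper names are illustrative assumptions.

```python
import statistics

def fraction_within(data, n):
    """Fraction of values lying within n population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mean) <= n * sd) / len(data)

def chebyshev_bound(n):
    """Minimum fraction guaranteed by Chebyshev's inequality (meaningful for n > 1)."""
    return 1 - 1 / n**2

# Hypothetical data with one large outlier.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

for n in (1.5, 2, 3):
    print(n, fraction_within(data, n), chebyshev_bound(n))
```

The observed fraction is usually far above the bound; the inequality only gives the worst case over all distributions.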
24. Which of the following applications are usually used to classify students' performances?
Answer
Correct Answer:
If...then... analysis
25. The authentication protocol used by many significant web APIs is called:
Answer
Correct Answer:
OAuth
26. Apriori is a seminal algorithm for finding frequent item sets using:
Answer
Correct Answer:
Candidate generation
27. The level of the model that specifies the strengths of the dependencies using some numerical scale.
Answer
Correct Answer:
Quantitative Level
28. What is the first step in the business understanding phase?
Answer
Correct Answer:
Firmly grasp business objectives and needs
29. If more than one value occurs the same number of times, the data is:
Answer
Correct Answer:
Multi-modal
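For illustration, Python's standard library can list every value tied for the highest frequency. The sample values below are made up.

```python
import statistics

# Hypothetical sample in which 2 and 3 each occur twice.
values = [1, 2, 2, 3, 3, 4]

# multimode returns all of the most frequent values, in first-seen order.
modes = statistics.multimode(values)
is_multimodal = len(modes) > 1
print(modes, is_multimodal)
```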
30. The component of the Hadoop Distributed Filesystem responsible for storing metadata is called the
Answer
Correct Answer:
Namenode
31. Which of the following properties is a constraint on a RESTful application?
Answer
Correct Answer:
stateless
32. Which of the following algorithms produces decision trees?
Answer
Correct Answer:
ID3
33. Which XPath selector expression captures all link elements of the form 'http://example.com/profile/12345' in an HTML page while excluding all links of the form 'http://example.com/casenumber/12345'?
Answer
Correct Answer:
//a[contains(@href, "profile")]
34. Taking multiple random samples of data and building a classification model for each is known as:
Answer
Correct Answer:
Boosting
35. A commonly used continuous alternative to the step function in multi-layered neural network output is the
Answer
Correct Answer:
logistic function
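A small sketch contrasting the two functions, assuming the standard definition of the logistic function, 1 / (1 + e^(-x)). The branch on the sign of x is only there to avoid overflow for large negative inputs.

```python
import math

def step(x):
    """Hard threshold: jumps from 0 to 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0

def logistic(x):
    """Smooth, differentiable alternative with outputs strictly between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for very negative x
    return e / (1.0 + e)

for x in (-5, -1, 0, 1, 5):
    print(x, step(x), round(logistic(x), 4))
```

Because the logistic function is differentiable everywhere, it can be used with gradient-based training, which the step function cannot.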
36. "In 2% of the purchases at the hardware store, both a pick and a shovel were bought," is an example of:
Answer
Correct Answer:
Support
37. What is Summarization?
Answer
Correct Answer:
Methods for finding a compact description for a subset of data.
38. Which of the following is NOT a method of combining multiple models into an ensemble model?
Answer
Correct Answer:
Bootstrapping
39. Which of the following properties applies to Single-Layer Perceptrons?
Answer
Correct Answer:
random initialization of weights
40. Information converted to provide insights about historical patterns and future trends is known as:
Answer
Correct Answer:
Knowledge
41. In which type of analysis is a Kohonen feature map typically employed?
Answer
Correct Answer:
Cluster analysis
42. What is Clustering?
Answer
Correct Answer:
A descriptive task where one seeks to identify a finite set of categories to describe the data.
43. In Natural Language Processing, what is the role of a lexical analyzer?
Answer
Correct Answer:
splits the stream of input characters into tokens
44. In the MapReduce model, Map and Reduce functions act directly on which kind of data structure?
Answer
Correct Answer:
key-value pair
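To make the key-value idea concrete, here is a single-process word-count sketch written in the MapReduce style. It does not use Hadoop; `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for this illustration.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: take one (offset, line) pair and emit (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: take one key with all of its values and emit (word, total)."""
    yield word, sum(counts)

def run_job(records):
    """Run map, group values by key (the shuffle step), then reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    output = {}
    for key in sorted(grouped):
        for out_key, out_value in reduce_fn(key, grouped[key]):
            output[out_key] = out_value
    return output

counts = run_job([(0, "big data big ideas"), (1, "data mining")])
print(counts)
```

Both functions consume and produce key-value pairs, which is what lets a framework distribute the map and reduce steps across machines.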
45. What is Interestingness?
Answer
Correct Answer:
An overall measure of pattern value, combining validity, novelty, usefulness, and simplicity.
46. Which of the following is not a common goal of the KDD Process:
Answer
Correct Answer:
Performance
47. Which of the following is most appropriate for finding the shortest chain of friends linking two people in a social graph who are not friends with each other?
Answer
Correct Answer:
Dijkstra's algorithm
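A minimal sketch of Dijkstra's algorithm on a small friendship graph, assuming every friendship has the same cost of 1 (in which case a plain breadth-first search would find the same chain). The graph and names are made up.

```python
import heapq

def shortest_chain(graph, start, goal):
    """Return the shortest list of people linking start to goal, or None."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, person = heapq.heappop(heap)
        if person == goal:
            break
        if d > dist[person]:
            continue  # stale heap entry
        for friend in graph.get(person, ()):
            nd = d + 1  # assumed unit cost per friendship
            if nd < dist.get(friend, float("inf")):
                dist[friend] = nd
                prev[friend] = person
                heapq.heappush(heap, (nd, friend))
    if goal not in dist:
        return None
    chain = [goal]
    while chain[-1] != start:
        chain.append(prev[chain[-1]])
    return chain[::-1]

# Hypothetical undirected social graph as adjacency lists.
friends = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana", "dee"],
    "dee": ["ben", "cy", "eve"],
    "eve": ["dee"],
}
chain = shortest_chain(friends, "ana", "eve")
print(chain)
```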
48. True or False? The MARS algorithm cannot produce rules.
Answer
Correct Answer:
True
49. Which of the following is NOT a function of data warehouses?
Answer
Correct Answer:
Cleaning dirty data
50. What is Classification?
Answer
Correct Answer:
Learning a function that maps a data item into one of several predefined groups.
51. A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset is:
Answer
Correct Answer:
Nearest Neighbor
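The technique can be sketched in a few lines: find the k historical records closest to the new record and take a majority vote of their classes. The records, labels, and the choice of Euclidean distance below are illustrative assumptions.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Classify query by majority vote among its k nearest historical records."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: (features, class) pairs.
history = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]

print(knn_classify(history, (1.1, 1.0)))
print(knn_classify(history, (5.1, 5.0)))
```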
52. Which of the following is NOT a common source system?
Answer
Correct Answer:
Node
53. Which of these are evolutionary computational methods?
Answer
Correct Answer:
Genetic algorithms
54. Generalization error is a consequence of
Answer
Correct Answer:
Overfit
55. Which of the following storage solutions is most appropriate for a semi-structured dataset whose members do not all have the same attributes?
Answer
Correct Answer:
MongoDB
56. Which of the following algorithms is generally suitable for unsupervised learning tasks?
Answer
Correct Answer:
k-means algorithm
57. What is Change and Deviation Detection?
Answer
Correct Answer:
A task focusing on discovering the most significant changes in the data from previously measured or normative values
58. Sharding refers to:
Answer
Correct Answer:
partitioning a database for distribution across different servers
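One common way to decide which server holds a record is to hash its key, sketched below. This is only an illustration of the idea: `shard_for` is a hypothetical helper, the in-memory dictionaries stand in for separate servers, and real systems often use range-based or consistent-hashing schemes instead.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 3
shards = {i: {} for i in range(NUM_SHARDS)}  # each dict stands in for one server

for user_id in ["u1", "u2", "u3", "u4", "u5"]:
    shards[shard_for(user_id, NUM_SHARDS)][user_id] = {"id": user_id}

for index, rows in shards.items():
    print(index, sorted(rows))
```

A stable hash is used rather than Python's built-in `hash`, whose string hashing varies between interpreter runs.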
59. Which of these is NOT a common description of layers?
Answer
Correct Answer:
Functional
60. What is Dependency Modeling?
Answer
Correct Answer:
The process of finding a model which describes significant dependencies between variables
61. In the analysis of time-series data, the mean value over a given time period (usually some interval in the past up to the present) is called a(n)
Answer
Correct Answer:
moving average
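A short sketch of a simple (unweighted) moving average over a fixed-size trailing window. The series values and the window size are made up.

```python
from collections import deque

def moving_average(values, window):
    """Return the mean of each trailing window of the given size."""
    buf = deque(maxlen=window)  # automatically drops the oldest value
    averages = []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            averages.append(sum(buf) / window)
    return averages

series = [10, 20, 30, 40, 50]
smoothed = moving_average(series, 3)
print(smoothed)
```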
62. In the association between two variables, what is the difference between the antecedent and the consequent?
Answer
Correct Answer:
The antecedent is on the left, the consequent on the right
63. The algorithm powering the Google search engine is:
Answer
Correct Answer:
PageRank
64. To increase the confidence in your estimate of classification performance on the entire population, you should:
Answer
Correct Answer:
Increase the size of the test dataset
65. The level of the model that specifies (often graphically) which variables are locally dependent on each other.
Answer
Correct Answer:
Structural Level
66. Which data mining technique organizes sets of data into predefined groups?
Answer
Correct Answer:
Classification
67. Which of these are NOT considered internal data factors?
Answer
Correct Answer:
Economic downturns
68. Data not collected by the organization, such as data from a proprietary database, that is combined with the organization’s own data is known as:
Answer
Correct Answer:
Overlay
69. What is the front end layer of data mining architecture?
Answer
Correct Answer:
An intuitive and user friendly user interface
70. The annual revenue of an international company is correlated with other attributes such as advertising, exchange rate, and inflation rate. Given these values (or reliable estimates of them for the next year), the company has to calculate its expected revenue for the next year. Choose the appropriate data mining task for this business problem.
Answer
Correct Answer:
Regression
71. Which of these is an example of a sequential pattern relationship?
Answer
Correct Answer:
Predicting the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes
72. What is the measure of how much two random variables change together?
Answer
Correct Answer:
covariance
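For reference, the sample covariance of paired observations can be computed as below. The n - 1 denominator (the sample rather than population form) and the data are assumptions made for this sketch.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4]
rising = [2, 4, 6, 8]   # moves with xs, so the covariance is positive
falling = [8, 6, 4, 2]  # moves against xs, so the covariance is negative

print(covariance(xs, rising), covariance(xs, falling))
```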
73. A function used by a node in a neural net to transform input data from any domain of values into a finite range of values is known as a(n):
Answer
Correct Answer:
Activation Function
74. True or False? Loose-coupling data mining architecture is mainly for memory-based data mining systems that do not require high scalability and high performance.
Answer
Correct Answer:
True
75. Which are popular data mining methods?
Answer
Correct Answer:
All of these
76. What are decision trees?
Answer
Correct Answer:
Structures that generate rules for the classification of a dataset
77. Data items grouped into relationships and preferences are known as:
Answer
Correct Answer:
Clusters
78. You are a credit risk manager at a retail bank. Some information about customers is available for analysis. Based on this data, you have to decide whether a person will be a good or bad customer. Choose the appropriate data mining task for this business problem.
Answer
Correct Answer:
Classification
79. In predictive models, the values or classes to be predicted are called the:
Answer
Correct Answer:
All of these
80. Which of the following disciplines overlaps Data Mining?
Answer
Correct Answer:
All of the above
81. True or False? Economic indicators are external data factors.
Answer
Correct Answer:
True
82. Which of these are NOT types of analytical software:
Answer
Correct Answer:
All are valid types
83. What is data visualization?
Answer
Correct Answer:
The visual interpretation of complex relationships in multidimensional data
84. Which of the following is not a relational database?
Answer
Correct Answer:
All of the above
85. Which of the following is valid XML?
Answer
Correct Answer:
All are valid
86. A(n) _____ algorithm creates rules that describe how often events have occurred together.
Answer
Correct Answer:
associative
87. Decision trees are able to handle missing values without using any impute transformation. True or False?
Answer
Correct Answer:
True
88. Which of the following clustering algorithms can find clusters of arbitrary shape?
Answer
Correct Answer:
Both of these
89. In a neural net, to what does topology refer?
Answer
Correct Answer:
The number of layers and the number of nodes in each layer
90. Changes to parts of a code could lead to the problem of ______________ data.
Answer
Correct Answer:
inconsistent
91. With which of these layers does a neural network start?
Answer
Correct Answer:
Input layer
92. Which industry can benefit from data mining?
Answer
Correct Answer:
All of these