1. ________ tools and techniques process data and do statistical analysis for insight and discovery.
Answer
Correct Answer:
Business Intelligence
Note: This Question is unanswered, help us to find answer for this one
2. ____ case tools provide support for the coding and implementation phases.
Answer
Correct Answer:
Back-end
Note: This Question is unanswered, help us to find answer for this one
3. The ______ of a worksheet defines its appearance.
Answer
Correct Answer:
Format
Note: This Question is unanswered, help us to find answer for this one
4. Dirty data is ________.
Answer
Correct Answer:
Inaccurate, incomplete data
Note: This Question is unanswered, help us to find answer for this one
5. Diigo and delicious are ________ tools.
Answer
Correct Answer:
Social bookmarking
Note: This Question is unanswered, help us to find answer for this one
6. Data types that are created by the programmer are known as ________.
Answer
Correct Answer:
Abstract data types (ADTs)
Note: This Question is unanswered, help us to find answer for this one
7. _______ reduces the number of bits in a file by identifying and eliminating redundancy
Answer
Correct Answer:
Lossless compression
Note: This Question is unanswered, help us to find answer for this one
8.
Regression equation of Z on V is given as following:
Z = c +dV Using the least square method, which of the given normal equations will be used to calculate the values of c and d?
Answer
Correct Answer:
C)
Note: This Question is unanswered, help us to find answer for this one
9.
Regression equation of Z on V is given as following:
7. = c + dV
The relationship between two variables a and b, is given as b + 6a = 20 and between another two variables c and d, as 4c + 10d = 50. The regression coefficient of c on a is given as 0.90. Find the regression coefficient of d on b.
Answer
Correct Answer:
3/50
Note: This Question is unanswered, help us to find answer for this one
10. If the significance level of a test is 5%, what will be the outcome of the test if p-value obtained is greater than 0.05?
Answer
Correct Answer:
Fail to reject null hypothesis
Note: This Question is unanswered, help us to find answer for this one
11. A parametric statistical model is given as: (S, P) with P = [P6 : e e 9]. Based on statistical notations, which of the following is the correct method of representing a?
Answer
Correct Answer:
e g R 0d
Note: This Question is unanswered, help us to find answer for this one
12. In Web Analytics, which of the following metrics is monitored in the Ecommerce Dashboard?
Answer
Correct Answer:
Total sale by products
Note: This Question is unanswered, help us to find answer for this one
13.
If median weight is 46. compute the missing frequency in the given table.
Answer
Correct Answer:
20
Note: This Question is unanswered, help us to find answer for this one
14.
The given data shows the relation between the number of students enrolled in an institute and their age.
Which of the following is the appropriate regression equation for the given data?
Answer
Correct Answer:
y = 4.261 +1.239x
Note: This Question is unanswered, help us to find answer for this one
15. A user can obtain the pageviews of a website with the help of which of the following web analytics goals?
Answer
Correct Answer:
Destination goal
Note: This Question is unanswered, help us to find answer for this one
16. In association rule mining, which of the following statements is correct about Frequent Itemset Generation of the two-step approach?
Answer
Correct Answer:
Generates all itemsets whose support 5 minsup
Note: This Question is unanswered, help us to find answer for this one
17. Which Of the following t-tests should be performed in order to compare means from two different groups?
Answer
Correct Answer:
Independent samples t-test
Note: This Question is unanswered, help us to find answer for this one
18. Which of the following is the correct R command used for saving the contents of a workspace into the file. .RData?
Answer
Correct Answer:
save.image()
Note: This Question is unanswered, help us to find answer for this one
19.
In association rule mining, under tree projection, node P of a tree stores which of the following information?
(i)Itemset for node P
(ii)List of possible lexicographic extensions of P
(iii)Pointer to projected database of its child node
(iv)Bitvector containing information about which transactions in the projected database contain the
itemset
Answer
Correct Answer:
Only (0, (ii) and (iv)
Note: This Question is unanswered, help us to find answer for this one
20. In data mining, which of the following classification models is built by kNN algorithm?
Answer
Correct Answer:
No classification model is built by kNN
Note: This Question is unanswered, help us to find answer for this one
21. In the linear discriminant function of discriminant function analysis, what is the function Of the following method?
Answer
Correct Answer:
It prints discriminant functions based on variables that are centered, but not standardized.
Note: This Question is unanswered, help us to find answer for this one
22. Which of the following is the correct way of expressing null hypothesis of the lower tail test of the population mean? It is given that uo is a hypothesized lower bound of the true population mean
Answer
Correct Answer:
up 5 ll
Note: This Question is unanswered, help us to find answer for this one
23. For a given set of 25 items, coefficient of correlation between x and y is 0.6. The values of the arithmetic mean of x and y are 14 and 18, respectively, and the values of standard deviation of x and y are 4 and 6. respectively. If the pair (25. 18) has been wrongly taken as (18, 25). then find the correct value of correlation coefficient.
Answer
Correct Answer:
0.51
Note: This Question is unanswered, help us to find answer for this one
24. In association rule mining, an indication of how often the rule has been found to be true is represented by a term known as confidence. How is this term. confidence. represented for the rule, A => B?
Answer
Correct Answer:
conf(A => B) = supp(A U B) / supp(A)
Note: This Question is unanswered, help us to find answer for this one
25. In which of the key technologies, which are used for extracting business value from big data, data is managed as a strategic. core asset with ongoing process control for big data analytics?
Answer
Correct Answer:
Information management for big data
Note: This Question is unanswered, help us to find answer for this one
26.
For a group of employees of an organization, find the mean salary (in thousands) using the given data.
Answer
Correct Answer:
33.36
Note: This Question is unanswered, help us to find answer for this one
27.
Consider the given data.
Find the regression equation Of Y on X and the total variation in Y.
Answer
Correct Answer:
Regression equation: Y = 2.25X +1.5, total variation in Y: 250
Note: This Question is unanswered, help us to find answer for this one
28. Which of the following clustering algorithms can handle noisy data?
Answer
Correct Answer:
CURE
Note: This Question is unanswered, help us to find answer for this one
29.
What is the function of the following R command?
dataframename.colnames <— namesfdataframename)
Answer
Correct Answer:
It is used to store the column names of a data frame in the variable. Dataframename.colnames.
Note: This Question is unanswered, help us to find answer for this one
30.
Suppose a user has typed the following command, where mdata is a variable to which the user's data is stored. head(mdata)
Answer
Correct Answer:
6
Note: This Question is unanswered, help us to find answer for this one
31.
In which of the following types of reasoning in data science, the conclusions reached are probable,
reasonable. plausible and believable?
Deductive reasoning
Inductive reasoning
Answer
Correct Answer:
Only 2
Note: This Question is unanswered, help us to find answer for this one
32. Which of the following statements is NOT correct about data science?
Answer
Correct Answer:
In order to achieve success. organizations need to reach maximum data science maturity.
Note: This Question is unanswered, help us to find answer for this one
33. Which of the following is a descriptive function involved in data mining?
Answer
Correct Answer:
Mining of associations
Note: This Question is unanswered, help us to find answer for this one
34. In survival analysis, which of the following methods is used to model the hazard function on a set of predictor variables?
Answer
Correct Answer:
coxph()
Note: This Question is unanswered, help us to find answer for this one
35. Which of the following is the correct R syntax used for selecting certain rows from a data frame, based on specific logical criteria?
Note: This Question is unanswered, help us to find answer for this one
36. In Google Analytics tool, which of the following analysis should be performed in order to identify the origin of a user's web traffic?
Answer
Correct Answer:
Acquisition analysis
Note: This Question is unanswered, help us to find answer for this one
37. If a user wants to learn about the top keywords that send traffic to his/her website, then which of the following acquisition segmentations should be preferred?
Answer
Correct Answer:
Organic traffic
Note: This Question is unanswered, help us to find answer for this one
38.
Consider the given information.
What should be the expenses budget (in Rs. thousands). if the salary of an individual is increased to Rs. 70 thousand?
Answer
Correct Answer:
11
Note: This Question is unanswered, help us to find answer for this one
39. ln data mining, according to Bayes‘ theorem, which of the following formulae represents posterior probability in terms of prior probability?
Answer
Correct Answer:
P(H/X) = P(X/H)P(H)/P(X)
Note: This Question is unanswered, help us to find answer for this one
40. Which of the following commands is used for starting iPython interface in inline Pylab mode and opening iPython notebook in pylab environment?
Answer
Correct Answer:
ipython notebook —pylab=inline
Note: This Question is unanswered, help us to find answer for this one
41.
Which of the following challenges are faced in text mining?
(i) No publication is in electronic form.
(ii) Large textual database.
(iii) Complex relationships between concepts in text.
(iv) Limited number Of possible dimensions.
Answer
Correct Answer:
Only (ii) and (iii)
Note: This Question is unanswered, help us to find answer for this one
42. In the Baysian model, which of the following is the correct representation of the joint density of (6, X), if it is known that for a given 0, the observed data x are a realization of pa?
Answer
Correct Answer:
n(0)p(xl0)
Note: This Question is unanswered, help us to find answer for this one
43. Which of the following is a non-probability sampling method?
Answer
Correct Answer:
Judgement sampling
Note: This Question is unanswered, help us to find answer for this one
44. Sam is popular for hitting a target in 6 out of 12 shots, whereas John can hit the same target in 8 out of 14 shots. What will be the probability that the target will be hit when they both try?
Answer
Correct Answer:
11/14
Note: This Question is unanswered, help us to find answer for this one
45. It is given that there are 15 pairs of readings on X and Y such that the coefficient of correlation is 0.87. It is also given that the standard deviation on is 5.60. What will be the approximate standard error of estimate of Y on X?
Answer
Correct Answer:
2.8
Note: This Question is unanswered, help us to find answer for this one
46. While calculating rank correlation coefficient between sales and expenditure for a time period of12 years. the difference in rank for a year was mistakenly taken as 9 instead of 7 and as a result, the value Of rank correlation coefficient was calculated as 0.79. If the mistake is rectified, then what will be the approximate correct value of rank correlation coefficient?
Answer
Correct Answer:
0.90
Note: This Question is unanswered, help us to find answer for this one
47. In association rule mining, an itemset is considered to be closed in which of the following situations?
Answer
Correct Answer:
When none of its immediate supersets has the same support as the itemset.
Note: This Question is unanswered, help us to find answer for this one
48.
Consider the following data:
Average cost of wafers = Rs. 35
Average cost of chocolates = Rs. 37
Standard deviation of cost of wafers = 2.0
Standard deviation of cost of chocolates = 3.0
Correlation coefficient between the costs of chocolates and wafers = 0.7
What will be the expected cost of chocolates when the cost of wafers is Rs. 40?
Answer
Correct Answer:
Rs. 42.25
Note: This Question is unanswered, help us to find answer for this one
49.
Consider the following list:
squares_list = [2. 3. 5. 2. 8. 9. 7. 6}
What will be the output of the following Python command?
squares_list[-2]
Answer
Correct Answer:
7
Note: This Question is unanswered, help us to find answer for this one
50. Which of the following data mining algorithms is applied to a database containing a large number of transactions and also learns association rules?
Answer
Correct Answer:
Apriori
Note: This Question is unanswered, help us to find answer for this one
51. Which of the following statements is NOT correct about pandas?
Answer
Correct Answer:
Only labelled data can be placed into a pandas data structure.
Note: This Question is unanswered, help us to find answer for this one
52.
Consider the following list:
squares_list = [2, 3. S. 2. 8. 9. 7. 6}
In which of the following IR models of text mining, a document is represented by a set of key terms that is either chosen from a fixed set of key terms or automatically from the documents?
Answer
Correct Answer:
Boolean model
Note: This Question is unanswered, help us to find answer for this one
53. Which of the following statements is incorrect about sampling methods?
Answer
Correct Answer:
No specialized knowledge is required to use a sampling method.
Note: This Question is unanswered, help us to find answer for this one
54. Which of the following sampling methods is used for heterogeneous units of universe rather than the homogeneous units and can be adopted only when its population is known?
Answer
Correct Answer:
Stratified random sampling
Note: This Question is unanswered, help us to find answer for this one
55.
Consider the following parameters:
Vector input = x
Total number of digits displayed = digits
Minimum number of digits to the right of the decimal point = nsmall
Minimum width to be displayed by the padding blanks in the beginning = width
Term to denote the option used to display scientific notation = scientific
Term to denote the option used to display the string left. right or center =justify
Option used for eliminating the space in between two strings = collapse
Separator between the arguments = sep
As per string manipulation in R programming language, which of the following options is the correct syntax Of the format() function for formatting numbers and strings?