1. The values of X and Y are given in figure-1 of the image. Choose the correct value of 2X - 5Y from figure-2.
2. Which of the following types of time series analysis aims at separating periodic or cyclical components in a time series?
3. With respect to the Microsoft sequence clustering algorithm, which of the following options is the correct syntax of the PredictCaseLikelihood (DMX) function?
4. Which of the following is the correct syntax of the PredictVariance (DMX) prediction function used in the Microsoft logistic regression algorithm?
5. Which of the following options represent(s) the correct application of association rule mining?
6. Which of the following options is/are the correct application(s) of text mining?
7. Select the value of X given in figure-1, from the options given in figure-2.
8. Consider the matrix Z given in figure-1 of the image. Using matrix methods, find the 1x3 vector.
9. For what purpose is the following R function run? print(getwd())
10. With respect to the Microsoft neural network algorithm, which of the following options is the neuron type that represents predictable attribute values for a data mining model?
11. Which of the following options is/are correct about the Microsoft naive bayes algorithm?
12. Which of the following options is correct about the logistic regression technique?
13. In data mining, which of the following options is correct about the regression algorithm?
14. As per the Microsoft association rules model, which of the following options is the correct viewer tab that combines information about itemsets and their relative value?
15. Which of the following statements is correct about the intervention analysis type of time series analysis?
16. Which of the following is the correct syntax of the PredictAssociation prediction function used in the Microsoft association rule algorithm?
17. Which of the following is the correct default value of the MAXIMUM_ITEMSET_SIZE parameter, which is used with the Microsoft association rules algorithm?
18. With respect to advanced statistics, which of the following options is the correct syntax of the glm() function?
19. Find the output of the following R programming language code. z1 <- c(7,5,8,4,4,16); z2 <- c(9,6); add.result <- z1+z2; print(add.result); sub.result <- z1-z2; print(sub.result)
20. Find the output of the following code of the R programming language. z1 <- c(4,3,TRUE,2+6i); z2 <- c(4,7,TRUE,2+7i); print(z1&z2)
21. What will be the output of the following R code? c(4,7,TRUE,3+7i) -> v1; c(9,6,FALSE,3+7i) ->> v2; print(v1); print(v2)
22. Which of the following is the correct syntax of the command that will verify the installation of the xlsx package and load the library into R workspace?
23. As per the Microsoft sequence clustering algorithm, which of the following options is the correct syntax of the Cluster (DMX) prediction function?
24. In the given image, which set of vectors is linearly independent?
25. Which of the following text mining techniques can be used for finding groups of documents with similar content?
26. What will be the output of the following code of the R programming language? a <- c(9,0,FALSE,2+9i); b <- c(8,0,TRUE,2+7i); print(a|b)
27. Find the output of the following R programming language code. a <- c(7,5,FALSE,4+4i); b <- c(6,0,TRUE,4+7i); print(a&&b)
28. In SQL Server data mining, which of the following algorithm types predicts one or more discrete variables that are based on other attributes in a dataset?
29. What will the following R code do? mydata$v2 <- mydata$v4 <- NULL
30. In data mining, which of the following options is the correct syntax for association?
31. Text mining is used in spam filtering, content enrichment, and contextual advertising.
32. A user wants to read and print the contents of a CSV file named myexample.csv that is present in his current working directory. Which of the following is the correct syntax of the command that should be executed by him to accomplish this task?
33. Which of the following regression techniques attempts to maximize the prediction power with the minimum number of predictor variables?
34. Which of the following is the correct syntax of the PredictSupport (DMX) prediction function used with Microsoft linear regression algorithm?
35. Which of the following statements is correct about the Predictable column supported by the Microsoft linear regression algorithm?
36. Find the correct syntax of the R function used for creating binary files. Assume object is the binary file to be written, n is the number of bytes, and con is the connection object.
37. Which of the following statements is correct about the PREDICTION_SMOOTHING parameter used in the Microsoft time series algorithm?
38. Find the output of the following code of the R programming language. lista <- list(5:7); print(lista); listb <- list(12:14); print(listb); x1 <- unlist(lista); x2 <- unlist(listb); print(x1); print(x2); r <- x1+x2; print(r)
39. Which of the following is the correct default value for the INSTABILITY_SENSITIVITY parameter used with the Microsoft time series algorithm?
40. Which of the following is the correct syntax of the command used for merging two data frames, myFrame1 and myFrame2, by ID and Country?
41. From figure-2 of the given image, select the option representing the inverse of the matrix given in figure-1.
42. Which of the following options represent correct applications of time series analysis? i) Yield Projections ii) Workload Projections iii) Census Analysis iv) Inventory Studies
43. Which of the following is the correct syntax for the PredictAdjustedProbability (DMX) prediction function used with the Microsoft association rules algorithm?
44. With respect to advanced statistics, which of the following options is correct about the arima() function?
45. In data mining, which of the following options is correct about the F-score measure for text retrieval?
46. Which of the following is the default value of the parameter HISTORICAL_MODEL_GAP used in the Microsoft time series algorithm?
47. Which of the following advanced statistics techniques is used for identifying latent variables that form groups?
48. In data mining, which of the following options correctly defines Precision, which is used for assessing the quality of text retrieval?
49. Which of the given options will be the output of the following code when it is executed in R? var <- c(8,4,NA,12); mean(var, na.rm=TRUE)
50. Which of the following text retrieval measures is the percentage of documents that are relevant to the query and were actually retrieved?
51. Which of the following is the correct default value of the HOLDOUT_PERCENTAGE parameter of the Microsoft logistic regression algorithm, which is used for specifying the percentage of cases within the training data used to calculate a holdout error?
52. In advanced statistics, which of the following statements is correct about the Dirichlet Regression method?
53. In the given image, which matrix is in the row echelon form?
54. What is the rank of the matrix shown in the image?
55. Which of the matrices given in figure-2 is the reduced row echelon form of the matrix given in figure-1 of the image?
56. What will be the output of the following code of the R programming language? b1 <- 17; b2 <- 13; z <- 5:7; print(b1 %in% z); print(b2 %in% z)
57. In which of the following text mining methods are terms analyzed at the sentence and document level?
58. In advanced statistics, which of the following regression methods is used to model variables within the (0, 1) range?
59. As per the Microsoft association rules algorithm, which of the following parameters specifies the minimum number of cases that must contain an itemset before the algorithm generates a rule?
60. Which of the following is the correct syntax of the IsDescendant (DMX) prediction function used in data mining?
61. As per the Microsoft naive bayes algorithm, which two of the following options are the correct syntax of the Predict (DMX) prediction function?
62. According to the generalized linear model in advanced statistics, which of the following is the default link function for the gaussian family?
63. Consider the following parameters: control - Optional parameters for controlling boot data. frequency - Specifies the number of observations per unit time. data - Specifies the data frame. bootobject - The object returned by the boot function. conf - The desired confidence interval. type - The type of confidence interval returned. According to bootstrapping in advanced statistics, which of the following options is the correct syntax of the boot.ci() function?
64. As per the Microsoft association rules algorithm, which of the following prediction functions has/have a Boolean return type?
65. As per the Microsoft association rules algorithm, which of the following options is the prediction function with a scalar value as the return type?
66. Which of the following options is the default CLUSTERING_METHOD used by the Microsoft clustering algorithm?
67. Which of the following options is the correct return type of the PredictHistogram (DMX) prediction function used by the Microsoft logistic regression algorithm?
68. Which of the following options is the parameter of the Microsoft time series algorithm, which is used for controlling the growth of a decision tree?
69. Which of the following statements is correct about the NOT NULL modeling flag used in the Microsoft time series algorithm?
70. Vector input = x; Total number of digits displayed = digits; Minimum number of digits to the right of the decimal point = nsmall; Minimum width to be displayed by padding blanks at the beginning = width; Term to denote the option used to display scientific notation = scientific; Term to denote the option used to display the string left, right, or center = justify; Option used for eliminating the space between two strings = collapse; Separator between the arguments = sep. As per string manipulation in the R programming language, which of the following options is the correct syntax of the format() function for formatting numbers and strings?
71. Which of the following sampling methods is used for heterogeneous units of universe rather than the homogeneous units and can be adopted only when its population is known?
72. Which of the following statements is incorrect about sampling methods?
73. Consider the following list: squares_list = [2, 3, 5, 2, 8, 9, 7, 6] In which of the following IR models of text mining is a document represented by a set of key terms that is chosen either from a fixed set of key terms or automatically from the documents?
74. Which of the following statements is NOT correct about pandas?
75. Which of the following fundamental measures used for assessing the quality of text retrieval represent(s) the percentage of retrieved documents relevant to a query?
76. Which of the following data mining algorithms is applied to a database containing a large number of transactions and also learns association rules?
77. Consider the following list: squares_list = [2, 3, 5, 2, 8, 9, 7, 6] What will be the output of the following Python command? squares_list[-2]
78. While working in a Pylab environment, which of the following options do NOT need to be imported?
79. Consider the following data: Average cost of wafers = Rs. 35; Average cost of chocolates = Rs. 37; Standard deviation of cost of wafers = 2.0; Standard deviation of cost of chocolates = 3.0; Correlation coefficient between the costs of chocolates and wafers = 0.7. What will be the expected cost of chocolates when the cost of wafers is Rs. 40?
80. In association rule mining, an itemset is considered to be closed in which of the following situations?
81. It is given that a and b are two independent binomial variables having parameters (3, 1/4) and (2, 1/4), respectively. Find P(a + b ≥ 1).
82. The bag-of-words model is used in which of the following text mining processes?
83. For a group of 12 students, the sum of squares of differences in their ranks for science and math is given as 60. On the basis of the given information, find the value of the rank correlation coefficient.
84. While calculating the rank correlation coefficient between sales and expenditure for a time period of 12 years, the difference in rank for a year was mistakenly taken as 9 instead of 7 and, as a result, the value of the rank correlation coefficient was calculated as 0.79. If the mistake is rectified, then what will be the approximate correct value of the rank correlation coefficient?
85. Which of the following clustering algorithms is used for grid-based partitioning?
86. It is given that there are 15 pairs of readings on X and Y such that the coefficient of correlation is 0.87. It is also given that the standard deviation of Y is 5.60. What will be the approximate standard error of estimate of Y on X?
87. Sam is known to hit a target in 6 out of 12 shots, whereas John can hit the same target in 8 out of 14 shots. What will be the probability that the target will be hit when they both try?
88. Which of the following is a non-probability sampling method?
89. Which of the following statements are NOT correct about the Bayesian belief network?
90. Which of the following statements is correct about the judgement sampling method?
91. In the Bayesian model, which of the following is the correct representation of the joint density of (θ, X), if it is known that for a given θ, the observed data x are a realization of p_θ?
92. Which of the following commands is used to observe the way an R object is structured? It is given that mydata is a variable where a user's data is stored.
93. In which of the following Big Data technologies does moving relevant data management, analytics, and reporting tasks to where the data resides improve speed to insight, reduce data movement, and promote better data governance?
94. Which of the following challenges are faced in text mining? (i) No publication is in electronic form. (ii) Large textual database. (iii) Complex relationships between concepts in text. (iv) Limited number of possible dimensions.
95. Which of the following commands is used for starting the iPython interface in inline Pylab mode and opening the iPython notebook in the pylab environment?
96. In data mining, according to Bayes' theorem, which of the following formulae represents posterior probability in terms of prior probability?
97. In data mining, which of the following statements is NOT correct about the C4.5 algorithm?
98. What should be the expenses budget (in Rs. thousands) if the salary of an individual is increased to Rs. 70 thousand?
99. If a user wants to learn about the top keywords that send traffic to his/her website, then which of the following acquisition segmentations should be preferred?
100. In the Google Analytics tool, which of the following analyses should be performed in order to identify the origin of a user's web traffic?
101. Which of the following types of association mining discovers subsequences that are common to more than the minsup sequences in a sequence database?
102. Which of the following factors is responsible for the occurrence of sampling errors?
103. In data mining, which of the following is the correct syntax for defining recall, which is used to assess the quality of text retrieval?
104. Which of the following is the correct R syntax used for selecting certain rows from a data frame, based on specific logical criteria?
105. In survival analysis, which of the following methods is used to model the hazard function on a set of predictor variables?
106. Which of the following is a descriptive function involved in data mining?
107. Which of the following statements is NOT correct about data science?
108. In which of the following types of reasoning in data science are the conclusions reached probable, reasonable, plausible, and believable: deductive reasoning or inductive reasoning?
109. Suppose a user has typed the following command, where mdata is a variable in which the user's data is stored: head(mdata)
110. Data science is used in which of the following industries? (i) Financial services (ii) Digital advertisements (iii) Healthcare (iv) Image recognition
111. What is the function of the following R command? dataframename.colnames <- names(dataframename)
112. Which of the following clustering algorithms can handle noisy data?
113. Find the regression equation of Y on X and the total variation in Y.
114. Which of the following statements is correct about the query-driven approach of data warehousing?
115. For a group of employees of an organization, find the mean salary (in thousands) using the given data.
116. It is given that y is a Poisson variate and satisfies the condition P(y=4) = P(y=5). What are the values of mean and standard deviation of y?
117. In logistic regression, which of the given methods is used to display the conditional density plot of the binary outcome, F, on the continuous x variable?
118. Which of the following functions is used to decompose a time series with additive trend, seasonal, and irregular components?
119. In data mining, which of the following models is/are used to predict the categorical class labels?
120. In which of the key technologies used for extracting business value from big data is data managed as a strategic, core asset with ongoing process control for big data analytics?
121. In association rule mining, an indication of how often the rule has been found to be true is represented by a term known as confidence. How is this term, confidence, represented for the rule A => B?
122. For a given set of 25 items, the coefficient of correlation between x and y is 0.6. The values of the arithmetic mean of x and y are 14 and 18, respectively, and the values of the standard deviation of x and y are 4 and 6, respectively. If the pair (25, 18) has been wrongly taken as (18, 25), then find the correct value of the correlation coefficient.
123. Which of the following is the correct way of expressing the null hypothesis of the lower tail test of the population mean? It is given that μ0 is a hypothesized lower bound of the true population mean.
124. In data mining, which of the following parts of a decision tree represents the outcome of a test?
125. Which of the following statements is/are correct about an SAS differentiator?
126. Which of the following is correct about classification of data?
127. The following code represents a function performed in data mining; identify the function represented. mine comparison [as <pattern_name>] for <target_class> where <target_condition> {versus <contrast_class_i> where <contrast_condition_i>} analyze <measure(s)>
128. In a generalized linear model, which of the following link functions belongs, by default, to the Poisson family?
129. In the linear discriminant function of discriminant function analysis, what is the function of the following method?
130. In data mining, which of the following classification models is built by the kNN algorithm?
131. In data mining, which of the following is the correct syntax of the foil method, FOIL_Prune, used for rule pruning for a rule R? It is given that p is the number of positive tuples covered by R and n is the number of negative tuples covered by R.
132. In association rule mining, under tree projection, node P of a tree stores which of the following information? (i) Itemset for node P (ii) List of possible lexicographic extensions of P (iii) Pointer to projected database of its child node (iv) Bitvector containing information about which transactions in the projected database contain the itemset
133. Which of the following options denotes the probability of avoiding a type-II error in hypothesis testing?
134. Which of the following is the correct R command used for saving the contents of a workspace into the file .RData?
135. Which of the given options is the correct way of representing the regression equation of Y on X, given that byx is the regression coefficient of Y on X?
136. In hypothesis testing, what will you call a population whose data is categorical and belongs to a collection of discrete non-overlapping classes?
137. Which of the following t-tests should be performed in order to compare means from two different groups?
138. By default, which of the following events is/are set using the KISSmetrics analytics tool? (i) Visited site (ii) Search engine hit
139. In association rule mining, which of the following statements is correct about Frequent Itemset Generation of the two-step approach?
140. A user can obtain the pageviews of a website with the help of which of the following web analytics goals?
141. If there is some data with missing values and you need to read a help file of a function, say median, then which of the following is the correct R syntax to do so?
142. The given data shows the relation between the number of students enrolled in an institute and their age. Which of the following is the appropriate regression equation for the given data?
143. If the median weight is 46, compute the missing frequency in the given table.
144. In Web Analytics, which of the following metrics is monitored in the Ecommerce Dashboard?
145. A parametric statistical model is given as: (S, P) with P = {P_θ : θ ∈ Θ}. Based on statistical notations, which of the following is the correct method of representing a?
146. If the significance level of a test is 5%, what will be the outcome of the test if the p-value obtained is greater than 0.05?
147. The regression equation of Z on V is given as follows: Z = c + dV. The relationship between two variables, a and b, is given as b + 6a = 20, and between another two variables, c and d, as 4c + 10d = 50. The regression coefficient of c on a is given as 0.90. Find the regression coefficient of d on b.
148. Which of the following is the default value of the parameter HISTORICAL_MODEL_GAP used in the Microsoft time series algorithm?
149. Z = c + dV. Using the least squares method, which of the given normal equations will be used to calculate the values of c and d?
150. Which of the following is the DMQL syntax that is used for specifying task-relevant data?
151. _______ reduces the number of bits in a file by identifying and eliminating redundancy.
152. Data types that are created by the programmer are known as ________.
153. Diigo and Delicious are ________ tools.
154. Dirty data is ________.
155. The ______ of a worksheet defines its appearance.
156. ____ CASE tools provide support for the coding and implementation phases.
157. ________ tools and techniques process data and do statistical analysis for insight and discovery.