Basic Data Analytics MCQ

1. ________ tools and techniques process data and do statistical analysis for insight and discovery.

Enterprise data governance

Proprietary information systems

Business Intelligence

Business Processes

Correct Answer: Business Intelligence

2. ____ CASE tools provide support for the coding and implementation phases.

Horizontal

Front-end

Back-end

Vertical

Correct Answer: Back-end

3. The ______ of a worksheet defines its appearance.

Form

Format

View

Record

Correct Answer: Format

4. Dirty data is ________.

Virus-infected data

Worm-infected data

Inaccurate, incomplete data

Stolen data

Correct Answer: Inaccurate, incomplete data

5. Diigo and Delicious are ________ tools.

Social bookmarking

Research

Discussion group

Synchronous communication

Correct Answer: Social bookmarking

6. Data types that are created by the programmer are known as ________.

Variables

Abstract data types (ADTs)

Functions

Parameters

None of these

Correct Answer: Abstract data types (ADTs)

7. _______ reduces the number of bits in a file by identifying and eliminating redundancy.

Lossless compression

Lossy compression

Bitmap

Data visualization

Correct Answer: Lossless compression

8.
The regression equation of Z on V is given as follows:

Z = c + dV

Using the least squares method, which of the given normal equations will be used to calculate the values of c and d?

Correct Answer:
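
For the line Z = c + dV fitted to n observed pairs, the least squares normal equations are ΣZ = nc + dΣV and ΣVZ = cΣV + dΣV². The options for this question are not reproduced here, so the following is only a minimal Python sketch that forms these sums and solves the resulting 2 × 2 system by Cramer's rule; the function name fit_by_normal_equations and the data are hypothetical.

```python
def fit_by_normal_equations(v, z):
    """Solve sum(Z) = n*c + d*sum(V) and sum(V*Z) = c*sum(V) + d*sum(V^2) for c and d."""
    n = len(v)
    sum_v = sum(v)
    sum_z = sum(z)
    sum_vv = sum(vi * vi for vi in v)
    sum_vz = sum(vi * zi for vi, zi in zip(v, z))
    # Determinant of the coefficient matrix [[n, sum_v], [sum_v, sum_vv]]
    det = n * sum_vv - sum_v * sum_v
    if det == 0:
        raise ValueError("V has no spread; the normal equations are singular")
    # Cramer's rule for the two unknowns
    c = (sum_z * sum_vv - sum_v * sum_vz) / det
    d = (n * sum_vz - sum_v * sum_z) / det
    return c, d

# Hypothetical data lying exactly on Z = 1 + 2V
v = [1, 2, 3, 4, 5]
z = [3, 5, 7, 9, 11]
print(fit_by_normal_equations(v, z))  # (1.0, 2.0)
```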

9.
The relationship between two variables a and b is given as b + 6a = 20, and between another two variables c and d as 4c + 10d = 50. The regression coefficient of c on a is given as 0.90. Find the regression coefficient of d on b.

2/49

3/50

3/74

5/29

Correct Answer: 3/50
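
The answer follows because a linear change of variables only rescales the regression coefficient: from b = 20 − 6a and d = 5 − 0.4c, cov(d, b) = (−0.4)(−6) · cov(c, a) and var(b) = 36 · var(a), so b_db = (0.4/6) · b_ca = 0.90/15. A short check in Python with exact fractions (the variable names are ours):

```python
from fractions import Fraction

# b + 6a = 20   ->  b = 20 - 6a, so b changes by k1 = -6 per unit of a
k1 = Fraction(-6)
# 4c + 10d = 50 ->  d = 5 - (4/10)c, so d changes by k2 = -4/10 per unit of c
k2 = Fraction(-4, 10)
b_ca = Fraction(9, 10)  # given regression coefficient of c on a

# cov(d, b) = k2*k1*cov(c, a) and var(b) = k1**2 * var(a),
# so b_db = cov(d, b) / var(b) = (k2 / k1) * b_ca
b_db = (k2 / k1) * b_ca
print(b_db)  # 3/50
```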

10. If the significance level of a test is 5%, what will be the outcome of the test if the p-value obtained is greater than 0.05?

Reject null hypothesis

Fail to reject null hypothesis

Acceptance or rejection of null hypothesis is independent of p-value.

Correct Answer: Fail to reject null hypothesis

11. A parametric statistical model is given as (S, P) with P = {P_θ : θ ∈ Θ}. Based on statistical notation, which of the following is the correct way of representing Θ?

Θ ⊆ ℝ^d

Θ = ℝ^(2d)

Θ ⊂ 2dℝ

Θ ∈ dℝ^2

Correct Answer: Θ ⊆ ℝ^d

12. In Web Analytics, which of the following metrics is monitored in the Ecommerce Dashboard?

Page load time by browser

Total sale by products

Conversion by blog post

Real time traffic source

Correct Answer: Total sale by products

13.
If the median weight is 46, compute the missing frequency in the given table.

Correct Answer:
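
The table for this question is not reproduced here. For grouped data the median is usually computed as L + ((N/2 − cf) / f) × h, where L is the lower limit of the median class, cf the cumulative frequency before it, f its frequency and h the class width. The sketch below assumes a hypothetical table and simply tries integer values for the single missing frequency until that formula gives the target median; a textbook solution would instead solve the equation algebraically. The function names are illustrative.

```python
import math

def grouped_median(lowers, width, freqs):
    """Median of a grouped frequency table: L + (N/2 - cf) / f * h."""
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lowers, freqs):
        # The median class is the first class whose cumulative frequency reaches N/2
        if f > 0 and cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("table has no observations")

def find_missing_frequency(lowers, width, freqs, target, max_f=1000):
    """Try integer values for the single None entry until the median matches target."""
    idx = freqs.index(None)
    for k in range(max_f + 1):
        trial = list(freqs)
        trial[idx] = k
        if math.isclose(grouped_median(lowers, width, trial), target):
            return k
    return None

# Hypothetical table: classes 0-10, 10-20, ..., 40-50 with one unknown frequency
lowers = [0, 10, 20, 30, 40]
freqs = [5, None, 20, 15, 5]
print(find_missing_frequency(lowers, 10, freqs, target=26.75))  # 8
```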

14.
The given data shows the relation between the number of students enrolled in an institute and their age.

Which of the following is the appropriate regression equation for the given data?

y = 4.261 + 1.239x

y = 3.456 + 1.128x

y = 4.338 + 1.244x

y = 3.125 + 1.045x

Correct Answer: y = 4.261 + 1.239x

15. A user can obtain the pageviews of a website with the help of which of the following web analytics goals?

Pages/session goal

Duration goal

Destination goal

Event goals

Correct Answer: Destination goal

16. In association rule mining, which of the following statements is correct about Frequent Itemset Generation of the two-step approach?

Generates only one itemset whose support ≥ minsup

Generates all itemsets whose support ≥ minsup

Generates high confidence rules from each frequent itemset

Correct Answer: Generates all itemsets whose support ≥ minsup
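
The first step of the two-step approach, frequent itemset generation, keeps every itemset whose support is at least minsup. A minimal brute-force sketch in Python, using a small hypothetical set of market-basket transactions; unlike Apriori, it does not prune candidates, so it is only practical for a handful of items.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Return every itemset whose support (fraction of transactions) is >= minsup."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    # Enumerate every non-empty candidate itemset, smallest first
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(cand <= t for t in transactions) / n
            if support >= minsup:
                result[cand] = support
    return result

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, s in frequent_itemsets(transactions, minsup=0.6).items():
    print(sorted(itemset), s)
```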

17. Which of the following t-tests should be performed in order to compare the means of two different groups?

One sample t-test

Paired samples t-test

Independent samples t-test

Analysis of Variance (ANOVA)

Correct Answer: Independent samples t-test
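
The independent samples t-test compares the means of two unrelated groups. A minimal sketch of Student's pooled-variance t statistic using the standard library statistics module; the function name and the two samples are hypothetical, and the equal-variance assumption is ours (Welch's version of the test drops it).

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    # Pooled estimate of the common variance from the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

# Hypothetical samples from two independent groups
t, df = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```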

18. Which of the following is the correct R command for saving the contents of a workspace into the file .RData?

attach.image("archive.RData")

save.image(file="archive.RData")

attach("tempscales.RData")

save.image()

Correct Answer: save.image()

19.
In association rule mining, under tree projection, which of the following information does node P of the tree store?
(i) Itemset for node P
(ii) List of possible lexicographic extensions of P
(iii) Pointer to the projected database of its child node
(iv) Bitvector containing information about which transactions in the projected database contain the itemset

Only (i), (ii) and (iv)

Only (ii), (iii) and (iv)

Only (i), (iii) and (iv)

All (i), (ii), (iii) and (iv)

Correct Answer: Only (i), (ii) and (iv)

20. In data mining, which of the following classification models is built by kNN algorithm?

Decision tree classification model

Ensemble classification model

Hyperplane classification model

No classification model is built by kNN

Correct Answer: No classification model is built by kNN
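
kNN is a lazy learner: "training" only stores the labelled examples, and all distance computations happen when a query arrives. A minimal Python sketch of that behaviour; the class name KNNClassifier and the toy points are hypothetical, and Euclidean distance with majority voting is assumed.

```python
import math
from collections import Counter

class KNNClassifier:
    """Lazy learner: fit() only stores the training data; no model is built."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are estimated; the training set itself is kept
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # All work happens at query time: rank stored points by distance to x
        dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical training points with two classes
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
clf = KNNClassifier(k=3).fit(X, y)
print(clf.predict_one((0.5, 0.5)))  # a
print(clf.predict_one((5.5, 5.5)))  # b
```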

21. In the linear discriminant function of discriminant function analysis, what is the function of the following method?

It generates jackknifed predictions.

It is used to obtain the quadratic discriminant function.

It prints discriminant functions based on variables that are centered, but not standardized.

It can display the results of a linear or quadratic classification with two variables at a time.

Correct Answer: It prints discriminant functions based on variables that are centered, but not standardized.

22. Which of the following is the correct way of expressing the null hypothesis of the lower tail test of the population mean? It is given that μ0 is a hypothesized lower bound of the true population mean μ.

μ0 ≤ μ

μ0 < μ

μ0 = μ

μ0 ≥ μ

Correct Answer: μ0 ≤ μ

23. For a given set of 25 items, the coefficient of correlation between x and y is 0.6. The arithmetic means of x and y are 14 and 18, respectively, and the standard deviations of x and y are 4 and 6, respectively. If the pair (25, 18) was wrongly taken as (18, 25), find the correct value of the correlation coefficient.

0.31

0.42

0.51

0.67

Correct Answer: 0.51
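
The corrected value can be reproduced by rebuilding the raw sums from the summary statistics, swapping the wrong pair for the right one, and recomputing r. A minimal Python sketch, assuming the stated standard deviations are population values (divisor n):

```python
import math

n, r = 25, 0.6
mean_x, mean_y = 14, 18
sd_x, sd_y = 4, 6  # assumed to be population standard deviations

# Recover the raw sums from the reported summary statistics
sum_x = n * mean_x
sum_y = n * mean_y
sum_x2 = n * (sd_x**2 + mean_x**2)
sum_y2 = n * (sd_y**2 + mean_y**2)
sum_xy = n * (r * sd_x * sd_y + mean_x * mean_y)

# Replace the wrong pair (18, 25) with the correct pair (25, 18)
wrong, right = (18, 25), (25, 18)
sum_x += right[0] - wrong[0]
sum_y += right[1] - wrong[1]
sum_x2 += right[0] ** 2 - wrong[0] ** 2
sum_y2 += right[1] ** 2 - wrong[1] ** 2
sum_xy += right[0] * right[1] - wrong[0] * wrong[1]

# Pearson's r from the corrected sums
num = n * sum_xy - sum_x * sum_y
den = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r_corrected = num / den
print(round(r_corrected, 2))  # 0.51
```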

24. In association rule mining, an indication of how often a rule has been found to be true is called confidence. How is confidence represented for the rule A => B?

conf(A => B) = supp(A U B) / supp(A)

conf(A => B) = supp(B) / supp(A)

conf(A => B) = supp(A U B) / (supp(A) × supp(B))

conf(A => B) = supp(A U B) / 1 - supp(A)

Correct Answer: conf(A => B) = supp(A U B) / supp(A)
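
To make the formula concrete, here is a minimal Python sketch that computes supp and conf on a small hypothetical set of transactions; the function names support and confidence are illustrative only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """conf(A => B) = supp(A U B) / supp(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
# {diapers, milk} appears in 3 baskets; 2 of those also contain beer
print(confidence({"diapers", "milk"}, {"beer"}, transactions))
```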

25. Among the key technologies used for extracting business value from big data, in which one is data managed as a strategic, core asset with ongoing process control for big data analytics?

Information management for big data

High-performance analytics for big data

Flexible deployment options for big data

Correct Answer: Information management for big data

26.
For a group of employees of an organization, find the mean salary (in thousands) using the given data.

36.88

33.36

35.59

37.42

Correct Answer: 33.36

27.
Consider the given data.

Find the regression equation of Y on X and the total variation in Y.

Regression equation: Y = 2.5X + 1.5, total variation in Y: 500

Regression equation: Y = 2X + 3, total variation in Y: 0

Regression equation: Y = 2.25X + 1.5, total variation in Y: 250

Regression equation: Y = 2.25X + 3, total variation in Y: 1000

Correct Answer: Regression equation: Y = 2.25X + 1.5, total variation in Y: 250
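
The data for this question are not reproduced here. As a sketch under that limitation, the regression of Y on X can be computed from deviation sums (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄), and the total variation in Y is Σ(y − ȳ)². The function names and data below are hypothetical.

```python
def least_squares(x, y):
    """Slope and intercept of the regression of Y on X from deviation sums."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def total_variation(y):
    """Total variation (total sum of squares) in Y: sum of (y - mean(y))**2."""
    m = sum(y) / len(y)
    return sum((yi - m) ** 2 for yi in y)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = least_squares(x, y)
print(f"Y = {intercept:.2f} + {slope:.2f}X, total variation in Y: {total_variation(y)}")
```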

28. Which of the following clustering algorithms can handle noisy data?

CURE

ROCK

BIRCH

Chameleon

Correct Answer: CURE

29.
What is the function of the following R command?
dataframename.colnames <- names(dataframename)

It is used to see all the column names in a data frame.

It is used to store the column names of a data frame in the variable dataframename.colnames.

It is used to access the data in a data frame corresponding to their column names.

Correct Answer: It is used to store the column names of a data frame in the variable dataframename.colnames.

30.
Suppose a user has typed the following command, where mdata is a variable in which the user's data is stored:
head(mdata)

Correct Answer:

31.
In which of the following types of reasoning in data science are the conclusions reached probable, reasonable, plausible and believable?
(1) Deductive reasoning
(2) Inductive reasoning

Only 1

Only 2

Both 1 and 2

Neither 1 nor 2

Correct Answer: Only 2

32. Which of the following statements is NOT correct about data science?

It is used for turning data into actions.

It supports and encourages shifting between deductive and inductive reasoning.

In order to achieve success, organizations need to reach maximum data science maturity.

It is necessary for companies to stay with the pack and compete in future.

Correct Answer: In order to achieve success, organizations need to reach maximum data science maturity.

33. Which of the following is a descriptive function involved in data mining?

Evolution analysis

Prediction

Outlier analysis

Mining of associations

Correct Answer: Mining of associations

34. In survival analysis, which of the following methods is used to model the hazard function on a set of predictor variables?

Surv()

coxph()

survdiff()

survfit()

Correct Answer: coxph()

35. Which of the following is the correct R syntax for selecting certain rows from a data frame based on specific logical criteria?

select(dataframename, logical expression)

filter(logical expression, dataframename)

filter(dataframename, logical expression)

select(logical expression, dataframename)

Correct Answer: filter(dataframename, logical expression)

36. In the Google Analytics tool, which of the following analyses should be performed in order to identify the origin of a user's web traffic?

Acquisition analysis

Audience analysis

Behavior analysis

Conversion analysis

Correct Answer: Acquisition analysis

37. If a user wants to learn about the top keywords that send traffic to his/her website, which of the following acquisition segmentations should be preferred?

Referrals traffic

Organic traffic

Direct traffic

Social traffic

Correct Answer: Organic traffic

38.
Consider the given information.

What should be the expenses budget (in Rs. thousands) if the salary of an individual is increased to Rs. 70 thousand?

Correct Answer:

39. In data mining, according to Bayes' theorem, which of the following formulae represents posterior probability in terms of prior probability?

P(X/H) = P(H/X)P(H)/P(X)

P(H/X) = P(X/H)P(H)/P(X)

P(H/X) = P(X/H)P(X)/P(H)

P(X/H) = P(H/X)/P(H)P(X)

Correct Answer: P(H/X) = P(X/H)P(H)/P(X)
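
A minimal numeric illustration of P(H/X) = P(X/H)P(H)/P(X) in Python, with the evidence P(X) expanded by the law of total probability; all probabilities below are hypothetical.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical values: prior P(H), likelihoods P(X|H) and P(X|not H)
p_h = 0.01
p_x_given_h = 0.9
p_x_given_not_h = 0.05

# Law of total probability for the evidence term P(X)
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)
print(posterior(p_x_given_h, p_h, p_x))  # about 0.1538, i.e. 2/13
```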

40. Which of the following commands is used for starting the IPython interface in inline pylab mode and opening an IPython notebook in the pylab environment?

ipython --pylab=inline

ipython --pylab=inline -notebook

ipython=notebook --pylab.inline

ipython notebook --pylab=inline

Correct Answer: ipython notebook --pylab=inline

41.
Which of the following challenges are faced in text mining?
(i) No publication is in electronic form.
(ii) Large textual database.
(iii) Complex relationships between concepts in text.
(iv) Limited number of possible dimensions.

Only (i) and (ii)

Only (iii) and (iv)

Only (ii) and (iii)

Only (i) and (iv)

Correct Answer: Only (ii) and (iii)

42. In the Bayesian model, which of the following is the correct representation of the joint density of (θ, X), if it is known that, for a given θ, the observed data x are a realization of p_θ?

π(x|θ)p(x)

π(θ)p(x)

π(θ)p(x|θ)

π(x)p(θ|x)

Correct Answer: π(θ)p(x|θ)

43. Which of the following is a non-probability sampling method?

Judgement sampling

Stratified random sampling

Cluster sampling

Multistage random sampling

Correct Answer: Judgement sampling

44. Sam is known to hit a target in 6 out of 12 shots, whereas John can hit the same target in 8 out of 14 shots. What is the probability that the target will be hit when they both try?

11/14

13/14

1/14

3/14

Correct Answer: 11/14
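
The answer follows from the complement rule, assuming the two shots are independent: the target is missed only if both miss, so P(hit) = 1 − (6/12)(6/14) = 1 − 3/14 = 11/14. The same arithmetic with exact fractions in Python:

```python
from fractions import Fraction

p_sam = Fraction(6, 12)
p_john = Fraction(8, 14)
# The target is missed only if both miss (shots assumed independent)
p_hit = 1 - (1 - p_sam) * (1 - p_john)
print(p_hit)  # 11/14
```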

45. It is given that there are 15 pairs of readings on X and Y such that the coefficient of correlation is 0.87. It is also given that the standard deviation of Y is 5.60. What will be the approximate standard error of estimate of Y on X?

2.5

2.8

3.2

3.4

Correct Answer: 2.8
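
Using the common textbook formula for the standard error of estimate of Y on X, S_yx = σ_Y √(1 − r²), which does not use the number of pairs, gives 5.60 × √(1 − 0.87²) ≈ 2.76. A quick check in Python:

```python
import math

r, sd_y = 0.87, 5.60
# Standard error of estimate of Y on X
se = sd_y * math.sqrt(1 - r**2)
print(round(se, 1))  # 2.8
```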

46. While calculating the rank correlation coefficient between sales and expenditure for a period of 12 years, the difference in rank for one year was mistakenly taken as 9 instead of 7, and as a result the rank correlation coefficient was calculated as 0.79. If the mistake is rectified, what will be the approximate correct value of the rank correlation coefficient?

0.88

0.82

0.95

0.90

Correct Answer: 0.90
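
Spearman's formula ρ = 1 − 6Σd² / (n(n² − 1)) can be inverted to recover Σd² from the wrong value, corrected by replacing 9² with 7², and then re-applied. A minimal Python sketch of those three steps:

```python
n = 12
rho_wrong = 0.79
denom = n * (n**2 - 1)

# Invert rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)) to recover the wrong sum of d^2
sum_d2 = (1 - rho_wrong) * denom / 6
# Replace the wrong rank difference 9 with the correct value 7
sum_d2 += 7**2 - 9**2
rho_correct = 1 - 6 * sum_d2 / denom
print(round(rho_correct, 2))  # 0.9
```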

47. In association rule mining, an itemset is considered to be closed in which of the following situations?

When all of its immediate supersets have the same support as the itemset.

When none of its immediate subsets has the same support as the itemset.

When all of its immediate subsets have the same support as the itemset.

When none of its immediate supersets has the same support as the itemset.

Correct Answer: When none of its immediate supersets has the same support as the itemset.
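
A minimal Python sketch of the closedness test on a small hypothetical transaction set: an itemset is closed if adding any single further item changes (lowers) its support count. The function names are illustrative.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

def is_closed(itemset, transactions):
    """True if no immediate superset has the same support as itemset."""
    itemset = frozenset(itemset)
    all_items = set().union(*transactions)
    s = support_count(itemset, transactions)
    # Each immediate superset adds exactly one item not already in the itemset
    return all(
        support_count(itemset | {extra}, transactions) != s
        for extra in all_items - itemset
    )

# Hypothetical transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
print(is_closed({"beer"}, transactions))             # False: {beer, diapers} has the same support
print(is_closed({"beer", "diapers"}, transactions))  # True
```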

48.
Consider the following data:
Average cost of wafers = Rs. 35
Average cost of chocolates = Rs. 37
Standard deviation of cost of wafers = 2.0
Standard deviation of cost of chocolates = 3.0
Correlation coefficient between the costs of chocolates and wafers = 0.7
What will be the expected cost of chocolates when the cost of wafers is Rs. 40?

Rs. 42.25

Rs. 45

Rs. 39.85

Rs. 41.75

Correct Answer: Rs. 42.25
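
The expected value comes from the regression line of chocolate cost C on wafer cost W, C − 37 = r(σ_C/σ_W)(W − 35) = 0.7 × 1.5 × (W − 35). Evaluated in Python at W = 40:

```python
mean_w, mean_c = 35, 37
sd_w, sd_c = 2.0, 3.0
r = 0.7

# Regression of chocolate cost (C) on wafer cost (W):
# C - mean_c = r * (sd_c / sd_w) * (W - mean_w)
w = 40
expected_c = mean_c + r * (sd_c / sd_w) * (w - mean_w)
print(round(expected_c, 2))  # 42.25
```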

49.
Consider the following list:
squares_list = [2, 3, 5, 2, 8, 9, 7, 6]
What will be the output of the following Python command?
squares_list[-2]

-2

It is an invalid command.

Correct Answer: 7
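
Python negative indices count from the end of the list, so index -2 selects the second-to-last element. A quick check with the list from the question:

```python
squares_list = [2, 3, 5, 2, 8, 9, 7, 6]
# -1 is the last item, -2 the second-to-last
print(squares_list[-2])  # 7
```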

50. Which of the following data mining algorithms is applied to a database containing a large number of transactions and also learns association rules?

K-means

C4.5

Apriori

Correct Answer: Apriori

51. Which of the following statements is NOT correct about pandas?

It is well suited for tabular data with heterogeneously-typed columns.

Only labelled data can be placed into a pandas data structure.

It is suitable for arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.

Ordered and unordered (not necessarily fixed-frequency) time series data can also be analyzed with pandas.

Correct Answer: Only labelled data can be placed into a pandas data structure.

52.
In which of the following IR models of text mining is a document represented by a set of key terms that are either chosen from a fixed set of key terms or extracted automatically from the documents?

Vector model

Boolean model

Connectionist model

Probabilistic model

Correct Answer: Boolean model

53. Which of the following statements is incorrect about sampling methods?

Data can be collected faster in a sampling method.

A sampling method provides the facility to organize and execute the research work conveniently.

It is less expensive.

No specialized knowledge is required to use a sampling method.

Correct Answer: No specialized knowledge is required to use a sampling method.

54. Which of the following sampling methods is used for heterogeneous units of the universe rather than homogeneous units, and can be adopted only when the population is known?

Simple random sampling

Stratified random sampling

Extensive sampling

Quota sampling

Correct Answer: Stratified random sampling
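
Stratified random sampling splits a heterogeneous population into more homogeneous strata and samples randomly within each. A minimal Python sketch of proportional allocation using the standard random module; the population of "urban" and "rural" units, the function name and the 10% fraction are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Proportional stratified random sampling: sample the same fraction from every stratum."""
    rng = random.Random(seed)
    # Group the units by stratum
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = round(len(units) * fraction)  # proportional allocation
        sample.extend(rng.sample(units, k))  # simple random sample within the stratum
    return sample

# Hypothetical population: 60 urban and 40 rural households
population = [(i, "urban" if i < 60 else "rural") for i in range(100)]
sample = stratified_sample(population, lambda unit: unit[1], fraction=0.1, seed=1)
print(len(sample))  # 10
```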

55.
Consider the following parameters:
Vector input = x
Total number of digits displayed = digits
Minimum number of digits to the right of the decimal point = nsmall
Minimum width to be displayed by padding blanks at the beginning = width
Term denoting the option used to display scientific notation = scientific
Term denoting the option used to justify the string left, right or centre = justify
Option used for eliminating the space between two strings = collapse
Separator between the arguments = sep
As per string manipulation in the R programming language, which of the following options is the correct syntax of the format() function for formatting numbers and strings?

format(digits, nsmall, sep = " ", width = NULL)

format(x, nsmall, height, scientific = NULL, collapse = NULL, sep = " ")

format(x, nsmall, collapse, digits, justify = c("left", "right", "centre", "none"), width = NULL)

format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))

Correct Answer: format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))

56. Which of the following statements is correct about the NOT NULL modeling flag used in the Microsoft time series algorithm?

It applies to mining model columns.

It applies to mining structure columns.

It applies to both mining model columns and mining structure columns.

It applies neither to mining model columns nor to mining structure columns.

Correct Answer: It applies to mining structure columns.

57. As per the Microsoft association rules algorithm, which of the following options is the prediction function with a scalar value as the return type?

IsInNode (DMX)

PredictAssociation(DMX)

PredictAdjustedProbability(DMX)

PredictHistogram(DMX)

Correct Answer: PredictAdjustedProbability(DMX)

58. As per the Microsoft association rules algorithm, which of the following prediction functions has/have a Boolean return type?

IsInNode()

IsDescendant()

PredictNodeId (DMX)

Both a and b

None of the above

Correct Answer: Both a and b

59. In advanced statistics, for a generalized linear model, which of the following is the default link function for the gaussian family?

(link = "identity")

(link = "logit")

(link = "log")

(link = "inverse")

Correct Answer: (link = "identity")

60. As per the Microsoft Naive Bayes algorithm, which two of the following options are the correct syntax of the Predict (DMX) prediction function?

61. In advanced statistics, which of the following regression methods is used to model variables within the (0, 1) range?

62. What will be the output of the following code of the R programming language?
b1 <- 17
b2 <- 13
z <- 5:7
print(b1 %in% z)
print(b2 %in% z)

63. What is the rank of the matrix shown in the image?

64. In the given image, which matrix is in the row echelon form?

65. Which of the given options will be the output of the following code when it is executed in R?
var <- c(8, 4, NA, 12)
mean(var, na.rm = TRUE)

66. In data mining, which of the following options correctly defines Precision, which is used for assessing the quality of text retrieval?

67. Which of the following advanced statistics techniques is used for identifying latent variables that form groups?

68. Which of the following is the default value of the parameter HISTORICAL_MODEL_GAP used in Microsoft time series algorithm?

69. With respect to advanced statistics, which of the following options is correct about the arima() function?

70. Which of the following options represent correct applications of time series analysis?
i) Yield projections
ii) Workload projections
iii) Census analysis
iv) Inventory studies

71. Find the output of the following code of the R programming language.
lista <- list(5:7)
print(lista)
listb <- list(12:14)
print(listb)
x1 <- unlist(lista)
x2 <- unlist(listb)
print(x1)
print(x2)
r <- x1 + x2
print(r)

72. Which of the following statements is correct about the PREDICTION_SMOOTHING parameter used in the Microsoft time series algorithm?

73. Using the following information, find the correct syntax of the R function used for creating binary files. Assume object is the binary file to be written, n is the number of bytes, and con is the connection object.

74. Which of the following statements is correct about the Predictable column supported by the Microsoft linear regression algorithm?

75. Which of the following is the correct syntax of the PredictSupport (DMX) prediction function used with Microsoft linear regression algorithm?

76. Which of the following regression techniques attempts to maximize the prediction power with the minimum number of predictor variables?

79. In data mining, which of the following options is the correct syntax for association?
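Association rules are commonly written in the form A ⇒ B [support, confidence], where support is the fraction of transactions containing both A and B and confidence is support(A and B) / support(A). A minimal Python sketch over hypothetical market-basket data (the zero-support convention is an assumption of this sketch):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # assumed convention when the antecedent never occurs
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: bread => milk
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667
```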

81. In SQL Server data mining, which of the following algorithm types predicts one or more discrete variables that are based on other attributes in a dataset?

9.
The relationship between two variables a and b is given as b + 6a = 20, and between another two variables c and d as 4c + 10d = 50. The regression coefficient of c on a is 0.90. Find the regression coefficient of d on b.
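A worked sketch, treating both relations as exact linear transformations (b = 20 - 6a and d = 5 - 0.4c), so that the covariance scales by the product of the slopes and the variance of b by the square of its slope:

```latex
\begin{aligned}
b &= 20 - 6a, \qquad d = 5 - 0.4c,\\
b_{db} &= \frac{\operatorname{Cov}(d,b)}{\operatorname{Var}(b)}
        = \frac{(-0.4)(-6)\,\operatorname{Cov}(c,a)}{(-6)^2\,\operatorname{Var}(a)}
        = \frac{-0.4}{-6}\, b_{ca}
        = \frac{0.4}{6}\times 0.90 = 0.06.
\end{aligned}
```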

13.
If the median weight is 46, compute the missing frequency in the given table.

14.
The given data shows the relation between the number of students enrolled in an institute and their age.

Which of the following is the appropriate regression equation for the given data?

26.
For a group of employees of an organization, find the mean salary (in thousands) using the given data.

27.
Consider the given data.

Find the regression equation of Y on X and the total variation in Y.
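The least-squares regression of Y on X has slope b = Sxy / Sxx and intercept a = mean(Y) - b * mean(X), and the total variation in Y is the sum of squared deviations of Y from its mean. The question's own data table is not reproduced here, so this Python sketch uses hypothetical values:

```python
def regress_y_on_x(xs, ys):
    """Least-squares line Y = a + bX, plus total variation in Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: regression coefficient of Y on X
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    total_variation = sum((y - mean_y) ** 2 for y in ys)
    return a, b, total_variation


# Hypothetical data, not the question's table.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, tv = regress_y_on_x(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X; total variation in Y = {tv:.2f}")
# Y = 2.20 + 0.60 X; total variation in Y = 6.00
```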

29.
What is the function of the following R command?
dataframename.colnames <- names(dataframename)

30.
Suppose a user has typed the following command, where mdata is a variable in which the user's data is stored.
head(mdata)

31.
In which of the following types of reasoning in data science are the conclusions reached probable, reasonable, plausible and believable?
Deductive reasoning
Inductive reasoning

38.
Consider the given information.

What should be the expenses budget (in Rs. thousands) if the salary of an individual is increased to Rs. 70 thousand?

41.
Which of the following challenges are faced in text mining?
(i) No publication is in electronic form.
(ii) Large textual database.
(iii) Complex relationships between concepts in text.
(iv) Limited number of possible dimensions.

49.
Consider the following list:
squares_list = [2, 3, 5, 2, 8, 9, 7, 6]
What will be the output of the following Python command?
squares_list[-2]
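Python lists accept negative indices, which count back from the end: -1 is the last element and -2 the one before it. A quick check, assuming the list is [2, 3, 5, 2, 8, 9, 7, 6]:

```python
squares_list = [2, 3, 5, 2, 8, 9, 7, 6]

# squares_list[-2] is the same element as squares_list[len(squares_list) - 2].
print(squares_list[-2])                      # 7
print(squares_list[len(squares_list) - 2])   # 7
```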

52.
In which of the following IR models of text mining is a document represented by a set of key terms that are either chosen from a fixed set of key terms or extracted automatically from the documents?

62.
What will be the output of the following code of the R programming language?
b1 <- 17
b2 <- 13
z <- 5:7
print(b1 %in% z)
print(b2 %in% z)
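R's `%in%` returns, for each element of its left operand, whether that value occurs in the right operand (R documents it as `match(x, table, nomatch = 0) > 0`). Assuming the first print also uses `%in%`, neither 17 nor 13 is in 5:7, so each print shows `[1] FALSE`. A rough Python analogue; the helper name `r_in` is hypothetical:

```python
def r_in(values, table):
    """Rough analogue of R's `values %in% table`: one bool per element of values."""
    table_set = set(table)
    return [v in table_set for v in values]


z = list(range(5, 8))  # R's 5:7 -> [5, 6, 7]
print(r_in([17], z))   # [False]
print(r_in([13], z))   # [False]
```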

63.
What is the rank of the matrix shown in the image?

64.
In the given image, which matrix is in the row echelon form?
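The rank of a matrix equals the number of nonzero rows in any row echelon form of it, which is how both of these questions are usually worked by hand. A sketch in Python using exact fractions to avoid rounding; the matrices in the images are not reproduced here, so `example` is hypothetical:

```python
from fractions import Fraction


def row_echelon(matrix):
    """Return a row echelon form of `matrix` (a list of rows), using exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Zero out every entry below the pivot.
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return rows


def rank(matrix):
    """Rank = number of nonzero rows in a row echelon form."""
    return sum(1 for row in row_echelon(matrix) if any(x != 0 for x in row))


example = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # second row is twice the first
print(rank(example))  # 2
```

The same `rank` helper can also test linear independence: k vectors are independent exactly when the matrix having them as rows has rank k.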

65.
Which of the given options will be the output of the following code when it is executed in R?
var <- c(8,4,NA,12)
mean(var, na.rm=TRUE)
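With `na.rm=TRUE`, R drops missing values before averaging. Assuming the vector is c(8,4,NA,12), the mean is (8 + 4 + 12) / 3 = 8, printed as `[1] 8`. A Python sketch that uses None as a stand-in for NA (an assumption of this sketch, not R semantics):

```python
import math


def mean_na_rm(values):
    """Mean of the non-missing values, mimicking R's mean(x, na.rm = TRUE)."""
    present = [v for v in values if v is not None]  # None plays the role of NA
    if not present:
        return math.nan  # R likewise gives NaN when every value is removed
    return sum(present) / len(present)


var = [8, 4, None, 12]
print(mean_na_rm(var))  # 8.0
```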

70.
Which of the following options represent correct applications of time series analysis?
i) Yield Projections
ii) Workload Projections
iii) Census Analysis
iv) Inventory Studies

71.
Find the output of the following code of the R programming language.
lista <- list(5:7)
print(lista)
listb <- list(12:14)
print(listb)
x1 <- unlist(lista)
x2 <- unlist(listb)
print(x1)
print(x2)
r <- x1+x2
print(r)
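`unlist()` flattens a list into a plain vector, after which `+` adds element-wise, so r is 17 19 21. A Python sketch of the same steps; this hypothetical `unlist` helper only flattens one level, whereas R's `unlist()` is recursive by default:

```python
def unlist(nested):
    """Flatten one level of nesting (R's unlist() recurses fully by default)."""
    return [x for inner in nested for x in inner]


lista = [list(range(5, 8))]    # R: list(5:7), a list holding one vector
listb = [list(range(12, 15))]  # R: list(12:14)

x1 = unlist(lista)
x2 = unlist(listb)
r = [a + b for a, b in zip(x1, x2)]  # element-wise sum (lengths are equal here)
print(x1, x2, r)  # [5, 6, 7] [12, 13, 14] [17, 19, 21]
```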

73.
Using the following information, find the correct syntax of the R function used for creating binary files.
Assume object is the binary file to be written, n is the number of bytes, and con is the connection object.

77.
A user wants to read and print the contents of a CSV file named myexample.csv that is present in his current working directory. Which of the following is the correct syntax of the command he should execute to accomplish this task?

78.
Choose True or False.
Text mining is used in spam filtering, content enrichment and contextual advertising.

80.
What will the following R code do?
mydata$v2 <- mydata$v4 <- NULL

82.
Find the output of the following R programming language code.
a <- c(7,5,FALSE,4+4i)
b <- c(6,0,TRUE,4+7i)
print(a&&b)

83.
What will be the output of the following code of the R programming language?
a <- c(9,0,FALSE,2+9i)
b <- c(8,0,TRUE,2+7i)
print(a|b)

84.
In the given image, which set of vectors is linearly independent?

85.
What will be the output of the following R code?
c(4,7,TRUE,3+7i) -> v1
c(9,6,FALSE,3+7i) ->> v2
print(v1)
print(v2)
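Both `->` and `->>` are rightward assignment operators, and `c()` coerces all elements to the most general type present, following R's ordering logical < integer < double < complex < character. Because each vector contains a complex number, TRUE and FALSE become 1+0i and 0+0i, and the first print shows `[1] 4+0i 7+0i 1+0i 3+7i`. A Python sketch of that coercion step; `coerce_to_complex` is a hypothetical helper:

```python
def coerce_to_complex(values):
    """Mimic c() when a complex element is present: every element becomes complex."""
    return [complex(v) for v in values]


v1 = coerce_to_complex([4, 7, True, 3 + 7j])
v2 = coerce_to_complex([9, 6, False, 3 + 7j])
print(v1)  # [(4+0j), (7+0j), (1+0j), (3+7j)]
print(v2)  # [(9+0j), (6+0j), 0j, (3+7j)]
```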

86.
Find the output of the following code of the R programming language.
z1 <- c(4,3,TRUE,2+6i)
z2 <- c(4,7,TRUE,2+7i)
print(z1&z2)
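The element-wise operators `&` and `|` coerce numeric and complex values to logical, with zero as FALSE and any nonzero value as TRUE, so `z1 & z2` here gives `[1] TRUE TRUE TRUE TRUE`. A Python sketch of both operators; the helper names are hypothetical, and it assumes equal-length vectors (R would recycle a shorter one, while `zip` simply truncates):

```python
def as_logical(v):
    """R-style coercion for & and |: zero (including 0+0i) is FALSE, anything else TRUE."""
    return v != 0


def r_and(a, b):
    """Element-wise AND, like R's `&` on equal-length vectors."""
    return [as_logical(x) and as_logical(y) for x, y in zip(a, b)]


def r_or(a, b):
    """Element-wise OR, like R's `|` on equal-length vectors."""
    return [as_logical(x) or as_logical(y) for x, y in zip(a, b)]


z1 = [4, 3, True, 2 + 6j]
z2 = [4, 7, True, 2 + 7j]
print(r_and(z1, z2))  # [True, True, True, True]

# Vectors from question 83, for the element-wise OR operator `|`.
a = [9, 0, False, 2 + 9j]
b = [8, 0, True, 2 + 7j]
print(r_or(a, b))  # [True, False, True, True]
```

By contrast, `&&` and `||` (as in question 82) work on single values, and recent R versions signal an error when either operand has length greater than one.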

87.
Find the output of the following R programming language code.
z1 <- c(7,5,8,4,4,16)
z2 <- c(9,6)
add.result <- z1+z2
print(add.result)
sub.result <- z1-z2
print(sub.result)
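When two vectors of different lengths are combined, R recycles the shorter one. Assuming the operations are z1+z2 and z1-z2, z2 (length 2) is reused three times against z1 (length 6), giving 16 11 17 10 13 22 and -2 -1 -1 -2 -5 10. A Python sketch of recycling; `recycle_op` is a hypothetical helper, and its warning mirrors the one R gives when the longer length is not a multiple of the shorter:

```python
import operator
import warnings


def recycle_op(x, y, op):
    """Apply op element-wise, recycling the shorter vector as R does."""
    if not x or not y:
        return []  # R returns a zero-length result here
    n = max(len(x), len(y))
    if n % len(x) or n % len(y):
        # R still computes the result in this case but warns.
        warnings.warn("longer object length is not a multiple of shorter object length")
    return [op(x[i % len(x)], y[i % len(y)]) for i in range(n)]


z1 = [7, 5, 8, 4, 4, 16]
z2 = [9, 6]
add_result = recycle_op(z1, z2, operator.add)
sub_result = recycle_op(z1, z2, operator.sub)
print(add_result)  # [16, 11, 17, 10, 13, 22]
print(sub_result)  # [-2, -1, -1, -2, -5, 10]
```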

94.
For what purpose is the following R function run?
print(getwd)

95.
Consider the matrix Z given in figure-1 of the image. Using matrix methods, find the 1x3 vector.

96.
Select the value of X given in figure-1 from the options given in figure-2.