Latest 2014 Pass4sure and Lead2pass Cloudera DS-200 Practice Tests

July 23, 2014

Vendor: Cloudera
Exam Code: DS-200
Exam Name: Data Science Essentials

QUESTION 1
Why should you stop an iterative machine learning algorithm as soon as the performance of the model on a test set stops improving?

A.    To avoid the need for cross-validating the model
B.    To prevent overfitting
C.    To increase the VC (Vapnik-Chervonenkis) dimension for the model
D.    To keep the number of terms in the model as low as possible
E.    To maintain the highest VC (Vapnik-Chervonenkis) dimension for the model

Answer: B
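A minimal sketch of early stopping, to illustrate the idea behind Question 1. The names `train_with_early_stopping`, `step`, `evaluate`, and `patience` are hypothetical, and the held-out losses are made-up toy values, not output from a real model.

```python
def train_with_early_stopping(step, evaluate, max_iters=100, patience=3):
    """Call step() repeatedly; stop once the held-out loss has not improved
    for `patience` consecutive iterations. Returns (best_iter, best_loss)."""
    best_loss = float("inf")
    best_iter = -1
    stale = 0
    for i in range(max_iters):
        step()             # one training iteration (hypothetical callback)
        loss = evaluate()  # loss on held-out data (hypothetical callback)
        if loss < best_loss:
            best_loss, best_iter, stale = loss, i, 0
        else:
            stale += 1
            if stale >= patience:
                break      # further training would only fit noise
    return best_iter, best_loss


# Toy held-out losses: they improve, then rise as the model starts to overfit.
losses = iter([0.9, 0.6, 0.5, 0.45, 0.47, 0.50, 0.55, 0.60])
state = {}


def step():
    state["loss"] = next(losses)


def evaluate():
    return state["loss"]


best_iter, best_loss = train_with_early_stopping(step, evaluate)
print(best_iter, best_loss)
```

In practice the stopping decision is usually made on a separate validation set, so that the test set remains an unbiased final check.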

QUESTION 2
What is the default field delimiter for Hive tables?

A.    ^A (Control-A)
B.    , (comma)
C.    \t (tab)
D.    : (colon)

Answer: A
Explanation:
http://blog.spryinc.com/2013/10/four-useful-tricks-for-working-with-hive.html (see "change the delimiter when exporting hive table")
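As a rough illustration, text files written by Hive with the default settings separate fields with the Control-A character (`\x01`). The sketch below splits such lines and rewrites them as CSV. The helper names and the sample rows are hypothetical.

```python
import csv
import io

HIVE_DEFAULT_FIELD_DELIM = "\x01"  # Control-A, shown as ^A in many editors


def parse_hive_row(line):
    """Split one line of a default-delimited Hive text file into fields."""
    return line.rstrip("\n").split(HIVE_DEFAULT_FIELD_DELIM)


def hive_rows_to_csv(lines):
    """Convert default-delimited Hive rows to CSV text (quoting where needed)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        writer.writerow(parse_hive_row(line))
    return out.getvalue()


# Made-up sample rows; the second one contains a comma inside a field.
sample = ["1\x01alice\x01NY\n", "2\x01bob\x01Austin, TX\n"]
csv_text = hive_rows_to_csv(sample)
print(csv_text)
```

Because ^A rarely appears in real data, fields containing commas or tabs survive the split without any escaping.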

QUESTION 3
Certain individuals are more susceptible to autism if they have particular combinations of genes expressed in their DNA. Given a sample of DNA from persons who have autism and a sample of DNA from persons who do not have autism, what is the best technique for predicting whether or not a given individual is susceptible to developing autism?

A.    Naive Bayes
B.    Linear Regression
C.    Survival analysis
D.    Sequence alignment

Answer: B

QUESTION 4
Under what two conditions does stochastic gradient descent outperform 2nd-order optimization techniques such as iteratively reweighted least squares?

A.    When the volume of input data is so large and diverse that a 2nd-order optimization technique can be fit to a sample of the data
B.    When the model’s estimates must be updated in real-time in order to account for new observations.
C.    When the input data can easily fit into memory on a single machine, but we want to calculate confidence intervals for all of the parameters in the model.
D.    When we are required to find the parameters that return the optimal value of the objective function.

Answer: AB
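To make option B concrete, here is a minimal sketch of an online stochastic gradient descent update for least-squares linear regression. Each new observation adjusts the weights immediately, with no need to refit on the full data set. The function name `sgd_update`, the learning rate, and the synthetic stream (y = 1 + 2x) are all assumptions made for illustration.

```python
import random


def sgd_update(weights, x, y, lr=0.1):
    """One SGD step on a single example for squared-error loss."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    err = pred - y
    # Gradient of 0.5 * err**2 with respect to each weight is err * xi.
    return [w - lr * err * xi for w, xi in zip(weights, x)]


rng = random.Random(0)
weights = [0.0, 0.0]  # [intercept, slope]

# Simulated stream of observations from y = 1 + 2 * x (noise-free toy data).
for _ in range(2000):
    x1 = rng.uniform(-1.0, 1.0)
    weights = sgd_update(weights, [1.0, x1], 1.0 + 2.0 * x1)

print(weights)  # close to [1.0, 2.0]
```

A 2nd-order method such as iteratively reweighted least squares would instead need to revisit the data and solve a linear system on each refit, which is why it suits batch settings better than streaming ones.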

QUESTION 5
What is the result of the following command (the database username is foo and password is bar)?
$ sqoop list-tables --connect jdbc:mysql://localhost/databasename --table --username foo --password bar

A.    sqoop lists only those tables in the specified MySQL database that have not already been imported into HDFS
B.    sqoop returns an error
C.    sqoop lists the available tables from the database
D.    sqoop imports all the tables from SQL to HDFS

Answer: C
Explanation:
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-15/getting-sqoop

QUESTION 6
What is the most common reason for a k-means clustering algorithm to return a sub-optimal clustering of its input?

A.    Non-negative values for the distance function
B.    Input data set is too large
C.    Non-normal distribution of the input data
D.    Poor selection of the initial centroids

Answer: D
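The standard k-means procedure (Lloyd's algorithm) only converges to a local optimum, and which one it reaches depends on where the centroids start, which is what option D refers to. A minimal one-dimensional sketch, with made-up data and a hypothetical `kmeans_1d` helper, shows two starting points producing very different results:

```python
def kmeans_1d(points, centroids, max_iters=100):
    """Plain Lloyd's algorithm on 1-D data. Returns (centroids, inertia)."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged (possibly to a local optimum)
        centroids = new_centroids
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia


points = [0, 1, 10, 11, 20, 21]  # three obvious groups

# Poor start: two centroids share the first group, one covers the rest.
bad_centroids, bad_inertia = kmeans_1d(points, [0, 1, 15])
# Good start: one centroid near each group.
good_centroids, good_inertia = kmeans_1d(points, [0, 10, 20])

print(bad_centroids, bad_inertia)
print(good_centroids, good_inertia)
```

Common mitigations are to run the algorithm several times from different random starts and keep the lowest-inertia result, or to use a seeding scheme such as k-means++.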

QUESTION 7
You have a large m x n data matrix M. You decide you want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also called principal components analysis or PCA).
You performed the SVD on your data matrix but you did not center your data first. What does your first singular component describe?

A.    The mean of the data set
B.    The variance of the data set
C.    The standard deviation of the data set
D.    The maximum of the data set
E.    The median of the data set

Answer: A
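When the data are not centered and the mean is large relative to the spread, the first right singular vector points roughly along the mean of the rows. The sketch below checks this on synthetic data using power iteration on M^T M in plain Python. The data, the seed, and the helper names are assumptions chosen for illustration.

```python
import math
import random


def top_right_singular_vector(rows, iters=200):
    """Approximate the first right singular vector of a matrix (given as a
    list of rows) by power iteration on M^T M."""
    n = len(rows[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        mv = [sum(r[j] * v[j] for j in range(n)) for r in rows]        # M v
        w = [sum(rows[i][j] * mv[i] for i in range(len(rows)))
             for j in range(n)]                                        # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
true_mean = [10.0, 5.0, -3.0]
# 200 uncentered rows: a large mean offset plus unit-variance noise.
rows = [[m + rng.gauss(0.0, 1.0) for m in true_mean] for _ in range(200)]

sample_mean = [sum(r[j] for r in rows) / len(rows) for j in range(3)]
v1 = top_right_singular_vector(rows)
alignment = abs(cosine(v1, sample_mean))
print(alignment)  # close to 1: v1 is nearly parallel to the mean
```

Subtracting `sample_mean` from every row before the decomposition removes this effect, so that the first component describes the direction of greatest variance instead.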

If you want to pass the Cloudera DS-200 exam, do not miss the latest lead2pass Cloudera DS-200 dumps.
If you can master all the lead2pass questions, you will be able to pass, 100% guaranteed.

http://www.lead2pass.com/DS-200.html

 