COMP 3354 Statistical Learning: 2024 Midterm Solutions (The University of Hong Kong)
STATISTICAL LEARNING MIDTERM    >> NAME: <<

Information: Note that the midterm is reasonably long! Briefly look at all of the problems. If you get stuck, try the other problems. Partial credit is available on written problems. If you do not know how to answer a step, write down any feasible answer and use it in the next parts to get partial credit.

Timing: You have the whole lecture time (15:30-17:20) to complete the exam.

Resources: The midterm exam is open book. You may bring printed/hand-written notes or books. But no device with communication (such as computers/phones) is allowed. You may use standard calculators (that cannot communicate with other devices).

Sharing policy: You may not share or communicate the midterm questions during or after the exam. This policy is strict.

1 Multiple Choice Questions

Q1 (5 points) Which is not an example of Machine Learning?
(A) Clustering
(B) Breadth first search
(C) Regression
(D) Image classification
Answer: B

Q2 (5 points) Supervised learning is different from unsupervised learning because:
(A) Supervised learning needs to interact with humans to get feedback during training
(B) Supervised learning can cluster data
(C) Supervised learning needs data
(D) Supervised learning uses labels
Answer: D

Q3 (5 points) Which of the following is true about machine learning:
(A) Machine Learning is a subset of supervised learning
(B) Machine Learning enables computers to learn without being explicitly programmed
(C) Machine Learning requires data that consists of inputs with associated outputs
(D) Machine Learning teaches computers using very specific instructions without the need to learn from experience
Answer: B

Q4 (5 points) Machine learning most often attempts to (multiple correct answers are possible):
(A) Minimize a loss function
(B) Maximize a loss function
(C) Extract data from patterns
(D) Extract patterns from data
Answer: A, D

Q5 (5 points) Which of the following is an unsupervised learning method?
(A) Agglomerative clustering
(B) Logistic regression
(C) Naïve Bayes
(D) Linear regression
Answer: A
Q6 (5 points) Given P(A) = P(B) = 1/4, P(C) = 1/3, P(AB) = 0, P(AC) = P(BC) = 1/12. What is the probability that exactly one event happens among events A, B, C? Note: P(AB) denotes the probability that A and B happen at the same time.
(A) 5/12
(B) 1/2
(C) 2/3
(D) 3/4
Answer: B. Since P(ABC) ≤ P(AB) = 0, the probability that exactly one event happens is P(A) + P(B) + P(C) − 2[P(AB) + P(AC) + P(BC)] + 3P(ABC) = 5/6 − 2 × 1/6 = 1/2.

Q7 (5 points) Suppose θ̂1 is an estimator for θ such that E[θ̂1] = aθ + b, where a ≠ 0. Let θ̂2 = (θ̂1 − b)/a. Then θ̂2 is an unbiased estimator for θ, i.e., E[θ̂2] = θ. True or False?
(A) True
(B) False
Answer: A

Q8 (5 points) Which of the following sentences is FALSE regarding regression?
(A) It discovers causal relationships
(B) It relates inputs to outputs
(C) It is used for prediction
(D) It may output integers
Answer: A

Q9 (5 points) Let x be a continuous random variable with the following PDF:

    p(x) = 2x if 0 ≤ x ≤ 1, and 0 otherwise.

Also, suppose that given x, y has distribution

    P(y | x) = x(1 − x)^(y−1), for y = 1, 2, 3, ...

What is the MAP estimate of x given y = 3?
(A) 1/2
(B) 1
(C) 3/2
(D) 2
Answer: A

Q10 (5 points) Let x1, x2, x3, ..., xn be independent samples from a geometric distribution with unknown parameter θ, i.e., its density is p(x; θ) = (1 − θ)^(x−1) θ. What is the maximum likelihood estimator (MLE) of θ?
(A) (Σ_{i=1}^n X_i^2) / n
(B) n / (Σ_{i=1}^n X_i^2)
(C) (Σ_{i=1}^n X_i) / n
(D) n / (Σ_{i=1}^n X_i)
Answer: D. Setting d/dθ [n ln θ + (Σ_i x_i − n) ln(1 − θ)] = 0 gives θ̂ = n / Σ_i x_i.
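As a quick numeric check of Q9 (a sketch of my own, not part of the original solutions): the MAP estimate maximizes the posterior, which is proportional to the prior times the likelihood, 2x · x(1 − x)^2, over x in [0, 1].

```python
import numpy as np

# Q9 posterior up to a constant: prior p(x) = 2x times
# likelihood P(y = 3 | x) = x * (1 - x)^2, on [0, 1].
x = np.linspace(0.0, 1.0, 100_001)
posterior = 2 * x * x * (1 - x) ** 2

print(x[np.argmax(posterior)])  # 0.5, matching answer (A)
```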
2 Written Questions

Q11 (10 points) Let P denote the probability that the label is positive for the given input (x1, x2). Given a logistic regression model

    ln(P / (1 − P)) = 0.1 + 0.2 x1 + 0.3 x2

and an input x = [x1, x2] = [5, 3], use the model to determine if the input should be classified as positive or negative.

Solution. ln(P / (1 − P)) = 0.1 + 0.2 × 5 + 0.3 × 3 = 2 > 0, so P > 1/2 and the input is classified as positive.
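The same decision in a short Python sketch (function and variable names are my own, not from the exam); comparing the log-odds with 0 is equivalent to comparing P with 1/2:

```python
import math

# Q11 model: log-odds = 0.1 + 0.2*x1 + 0.3*x2.
def classify(x1: float, x2: float) -> str:
    log_odds = 0.1 + 0.2 * x1 + 0.3 * x2
    p = 1 / (1 + math.exp(-log_odds))  # P(label is positive)
    label = "positive" if log_odds > 0 else "negative"
    return f"P = {p:.3f} -> {label}"

print(classify(5, 3))  # log-odds = 2 > 0, so positive (P ≈ 0.881)
```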
Q12 (10 points) The infection rate of some bacteria in the whole population is 0.02. The result of testing is either positive or negative, but the result may be inaccurate. Suppose:

    P(negative | infected) = 0.01,  P(negative | noninfected) = 0.97.

If one is tested positive, what is the probability of him/her being infected? (The solution should be in fraction form, i.e., p/q for two integers p, q.)

Solution. By Bayes' rule,

    P(infected | positive) = P(positive | infected) P(infected) / P(positive).

First, we have

    P(infected) = 0.02,
    P(positive | infected) = 1 − P(negative | infected) = 1 − 0.01 = 0.99.

Then

    P(positive) = P(positive, infected) + P(positive, noninfected)
                = P(positive | infected) P(infected) + P(positive | noninfected) P(noninfected)
                = P(positive | infected) P(infected) + (1 − P(negative | noninfected))(1 − P(infected))
                = 0.99 × 0.02 + (1 − 0.97) × (1 − 0.02).

So

    P(infected | positive) = (0.99 × 0.02) / (0.99 × 0.02 + (1 − 0.97) × (1 − 0.02))
                           = (99 × 2) / (99 × 2 + 3 × 98)
                           = 33/82.
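The arithmetic can be verified with Python's exact fractions (a sketch of my own, not part of the original solution):

```python
from fractions import Fraction

# Quantities given in Q12, kept as exact fractions.
p_infected = Fraction(2, 100)
p_pos_given_inf = 1 - Fraction(1, 100)   # 1 - P(negative | infected)
p_pos_given_non = 1 - Fraction(97, 100)  # 1 - P(negative | noninfected)

# Bayes' rule; the denominator uses the law of total probability.
p_positive = p_pos_given_inf * p_infected + p_pos_given_non * (1 - p_infected)
print(p_pos_given_inf * p_infected / p_positive)  # 33/82
```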
Q13 (10 points) Consider a linear regression model Y = β0 + β1 X with a training dataset

    (x1, y1) = (2, 0), (x2, y2) = (−1, 1), (x3, y3) = (1, 3), (x4, y4) = (0, 1).

(1) [5 points] Compute the estimated β̂0, β̂1 by minimizing the mean squared error on the data. Hint: take the derivatives of the error with respect to the parameters and set them to 0.

(2) [5 points] Suppose we have a testing dataset

    (x1, y1) = (3, 3), (x2, y2) = (−2, 0), (x3, y3) = (5, 1), (x4, y4) = (−3, 2).

Use the estimated β̂0, β̂1 above to make predictions for each of the data points.

Solution.
(1) The least-squares estimates are β̂1 = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / Σᵢ(xᵢ − x̄)² and β̂0 = ȳ − β̂1 x̄. Here x̄ = 0.5 and ȳ = 1.25, so β̂1 = −0.5/5 = −0.1 and β̂0 = 1.25 − (−0.1)(0.5) = 1.3.
(2) Using ŷ = 1.3 − 0.1x: ŷ1 = 1, ŷ2 = 1.5, ŷ3 = 0.8, ŷ4 = 1.6.
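A minimal sketch of the Q13 computation in plain Python, using the closed-form simple-regression formulas quoted in the solution (variable names are my own):

```python
# Training data from Q13.
xs, ys = [2, -1, 1, 0], [0, 1, 3, 1]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Closed-form least squares: slope, then intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
print(b0, b1)  # ≈ 1.3, -0.1

# Predictions on the test inputs.
for x in [3, -2, 5, -3]:
    print(x, b0 + b1 * x)  # ≈ 1.0, 1.5, 0.8, 1.6
```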
Q14 (10 points) Consider the following dataset about 5 types of mushrooms, describing whether the mushroom is poisonous based on three binary features. We want to use the naive Bayes method to classify a new mushroom with Height=Tall, Body=Slim, and Color=Black. What's the probability of this mushroom being poisonous?

    Height | Body | Color | Poisonous
    Tall   | Fat  | Black | No
    Tall   | Fat  | White | Yes
    Short  | Slim | White | No
    Short  | Slim | Black | Yes
    Tall   | Slim | White | Yes

Solution. From the table, P(Yes) = 3/5 and P(No) = 2/5, with conditionals P(Tall | Yes) = 2/3, P(Slim | Yes) = 2/3, P(Black | Yes) = 1/3 and P(Tall | No) = 1/2, P(Slim | No) = 1/2, P(Black | No) = 1/2. The unnormalized naive Bayes scores are

    Yes: 3/5 × 2/3 × 2/3 × 1/3 = 4/45,  No: 2/5 × 1/2 × 1/2 × 1/2 = 1/20,

so P(Poisonous = Yes | Tall, Slim, Black) = (4/45) / (4/45 + 1/20) = 16/25 = 0.64.
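The same posterior can be computed programmatically; below is a compact sketch (dataset hard-coded from the table above, names are my own):

```python
# Q14 mushroom dataset: (height, body, color, poisonous).
data = [
    ("Tall", "Fat", "Black", "No"),
    ("Tall", "Fat", "White", "Yes"),
    ("Short", "Slim", "White", "No"),
    ("Short", "Slim", "Black", "Yes"),
    ("Tall", "Slim", "White", "Yes"),
]
query = ("Tall", "Slim", "Black")

def score(label: str) -> float:
    """Prior times product of per-feature conditionals (naive independence)."""
    rows = [r for r in data if r[3] == label]
    s = len(rows) / len(data)
    for i, value in enumerate(query):
        s *= sum(r[i] == value for r in rows) / len(rows)
    return s

s_yes, s_no = score("Yes"), score("No")
print(s_yes / (s_yes + s_no))  # ≈ 0.64
```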
Q15 (10 points) Let x1, x2, x3, ..., xn be independent samples from a uniform distribution over the interval (0, θ), where θ is unknown. Find the maximum likelihood estimator (MLE) of θ.

Solution. The density function of the uniform distribution is

    p(x) = 1/θ if 0 ≤ x ≤ θ, and 0 otherwise.

The likelihood of the data is

    p(x1, x2, ..., xn | θ) = ∏_{i=1}^n p(xi | θ)
                           = 1/θ^n if 0 ≤ x1, x2, ..., xn ≤ θ, and 0 otherwise.

Note that 1/θ^n is a decreasing function of θ. Thus, to maximize it, we need to choose the smallest possible value for θ. For i = 1, 2, ..., n, we need to have θ ≥ xi. Thus, the smallest possible value for θ is

    θ̂_MLE = max(x1, x2, ..., xn).
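A tiny simulation illustrating the Q15 result (my own sketch; theta = 4.0 is an arbitrary choice, not from the exam): the sample maximum is the MLE and approaches the true θ as n grows.

```python
import random

theta = 4.0  # hypothetical true parameter for this demo
random.seed(0)

for n in (10, 100, 10_000):
    samples = [random.uniform(0, theta) for _ in range(n)]
    print(n, max(samples))  # MLE = max(x_1, ..., x_n) -> theta as n grows
```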