Understanding Z-Scores, Percentiles, and Confidence Intervals
School: Northeastern University
Course: INSH 5301
Subject: Statistics
Date: Dec 10, 2024
Homework 4
2024-10-06

Question 1

Part A: You get back your exam from problem 3.d of Homework 3, and you got a 45. What is your z score?

    # x = 45
    # mu = 70
    # sd = 10
    z_score <- function(x, mu, sd) {
      z <- (x - mu) / sd
      return(z)
    }
    (z <- z_score(45, 70, 10))
    ## [1] -2.5

Part B: What percentile are you?

    paste(round(pnorm(z) * 100, 4), "%", sep = "")
    ## [1] "0.621%"

Part C: What is the total chance of getting something at least that far from the mean, in either direction? (I.e., the chance of getting 45 or below, or equally far or farther above the mean.)

    paste(round((pnorm(z) * 2) * 100, 2), "%", sep = "")
    ## [1] "1.24%"

Question 2

Part A: Write a script that generates a population of at least 10,000 numbers and samples at random 9 of them.

    set.seed(1)
    population <- rnorm(n = 10000, mean = 75, sd = 10)
    sample_pop <- sample(population, 9, replace = FALSE)
    sample_pop
    ## [1] 53.61391 78.92610 65.22783 93.52996 71.48750 71.99560 94.88486 77.00829
    ## [9] 66.53181

Part B: Calculate by hand the sample mean. Please show your work using proper mathematical notation using latex.

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

$$\sum_{i=1}^{9} x_i = 53.61391 + 78.92610 + 65.22783 + 93.52996 + 71.48750 + 71.99560 + 94.88486 + 77.00829 + 66.53181 = 673.20586$$

$$\bar{x} = \frac{673.20586}{9} = 74.80065$$

    mean(sample_pop)
    ## [1] 74.80065

Part C: Calculate by hand the sample standard deviation.

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad \bar{x} = 74.80065$$

$$s = \sqrt{\frac{1}{9-1}\left[(53.61391 - \bar{x})^2 + (78.92610 - \bar{x})^2 + (65.22783 - \bar{x})^2 + (93.52996 - \bar{x})^2 + \cdots\right]} = 13.24666$$

    sd(sample_pop)
    ## [1] 13.24666

Part D: Calculate by hand the standard error.

$$SE = \frac{s}{\sqrt{N}} = \frac{13.24666}{\sqrt{9}} \approx 4.416$$

    se <- sd(sample_pop) / sqrt(length(sample_pop))
    se
    ## [1] 4.415553

Part E: Calculate by hand the 95% CI using the normal (z) distribution. (You can use R or tables to get the score.)

$$CI = \bar{x} \pm z\,\frac{s}{\sqrt{N}}, \qquad \bar{x} = 74.80065,\; z = 1.96,\; s = 13.24666,\; N = 9$$

$$CI = 74.80065 \pm 1.96 \times \frac{13.24666}{\sqrt{9}} = (66.146,\; 83.455)$$

Part F: Calculate by hand the 95% CI using the t distribution. (You can use R or tables to get the score.)

$$CI = \bar{x} \pm t\,\frac{s}{\sqrt{N}}, \qquad \bar{x} = 74.80065,\; df = N - 1 = 9 - 1 = 8,\; t = 2.306,\; s = 13.24666,\; N = 9$$

$$CI = 74.80065 \pm 2.306 \times \frac{13.24666}{\sqrt{9}} = (64.618,\; 84.983)$$
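The hand calculations in Parts B through F can be cross-checked with a short R sketch. The helper `ci_mean` is a hypothetical name introduced here for illustration and is not part of the original homework. The nine values are typed in as printed in Part A, so results agree with the hand work only up to rounding.

```r
# Nine sampled values as printed in Question 2, Part A (rounded to 5 decimals).
x <- c(53.61391, 78.92610, 65.22783, 93.52996, 71.48750,
       71.99560, 94.88486, 77.00829, 66.53181)

# Hypothetical helper: two-sided CI for the mean, using either the
# t distribution (df = n - 1) or the normal (z) distribution.
ci_mean <- function(x, level = 0.95, dist = c("t", "z")) {
  dist <- match.arg(dist)
  n <- length(x)
  se <- sd(x) / sqrt(n)                 # standard error of the mean
  p <- 1 - (1 - level) / 2              # upper-tail quantile, e.g. 0.975
  crit <- if (dist == "t") qt(p, df = n - 1) else qnorm(p)
  c(lower = mean(x) - crit * se, upper = mean(x) + crit * se)
}

ci_z <- ci_mean(x, dist = "z")
ci_t <- ci_mean(x, dist = "t")
print(round(ci_z, 3))
print(round(ci_t, 3))
```

With only nine observations the t interval should come out wider than the z interval, which is the point of comparing Parts E and F.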
Question 3

Part A: Explain why 2.e is incorrect.

2.e is incorrect because the sample has only 9 observations, which does not meet the usual rule of thumb for relying on the Central Limit Theorem and the normal approximation (n >= 30). With n < 30, the sample mean is more sensitive to outliers, the sample is more likely to be skewed, and the standard error is large, so the z-based interval comes out too narrow; the t distribution used in 2.f accounts for this.

Part B: In a sentence or two each, explain what's wrong with each of the wrong answers in Module 4.4, "Calculating percentiles and scores," and suggest what error in thinking might have led someone to choose that answer. (http://www.nickbeauchamp.com/comp_stats_NB/compstats_04-04.html)

- 3 ± 2 * 1.533: When determining the t value, it takes the tail probability to be 1 minus the confidence level (0.10) instead of splitting that across the two tails. The degrees of freedom are incorrectly equated with n rather than df = n - 1, and the SE is confused with the sample standard deviation.
- 3 ± 1 * 1.533: When determining the t value, it takes the tail probability to be 1 minus the confidence level (0.10) instead of splitting that across the two tails.
- 3 ± 2 * 1.638: The degrees of freedom are correct, but it takes the tail probability to be 1 minus the confidence level (0.10). The SE is confused with the sample standard deviation.
- 3 ± 1 * 2.132: The degrees of freedom are incorrect: it uses n rather than df = n - 1.

Question 4

Part A: Based on 2, calculate how many more individuals you would have to sample from your population to shrink your 95% CI by 1/2 (i.e., reduce the interval to half the size). Please show your work.

    # Current half-width of the 95% t interval (two-sided, so use the 0.975 quantile)
    half_tse <- qt(0.975, 8) * (sd(sample_pop) / sqrt(length(sample_pop)))
    half_tse
    ## [1] 10.18228

$$\frac{10.18228}{2} = 5.09114 = 2.306 \times \frac{13.24666}{\sqrt{N}}$$

$$\sqrt{N} = \frac{2.306 \times 13.24666}{5.09114} = 6.000$$

$$N = 36$$

Holding the t score fixed at 2.306, halving the interval requires quadrupling the sample to N = 36, that is, 36 - 9 = 27 more individuals. (The t score also shrinks as the degrees of freedom grow, so this estimate is slightly conservative.)

Part B: Say you want to know the average income in the US. Previous studies have suggested that the standard deviation of your sample will be $20,000. How many people do you need to survey to get a 95% confidence interval of ± $1,000? How many people do you need to survey to get a 95% CI of ± $100?
$$1.96 \times \frac{s}{\sqrt{n}} = CI, \qquad s = 20000$$

$$CI = 1000: \qquad n = \left(\frac{20000 \times 1.96}{1000}\right)^2$$

    (n <- round((20000 / (1000 / qnorm(0.975)))^2))
    ## [1] 1537

$$CI = 100: \qquad n = \left(\frac{20000 \times 1.96}{100}\right)^2$$

    (n <- round((20000 / (100 / qnorm(0.975)))^2))
    ## [1] 153658

Question 5: Write a script to test the accuracy of the confidence interval calculation as in Module 4.3, but with a few differences: (1) Test the 99% CI, not the 95% CI. (2) Each sample should be only 20 individuals, which means you need to use the t distribution to calculate your 99% CI. (3) Run 1000 complete samples rather than 100. (4) Your population distribution must be something other than a bimodal normal distribution (as used in the lesson), although anything else is fine, including any of the other continuous distributions we've discussed so far.

    nruns <- 1000
    nsamples <- 20
    sample_summary <- matrix(NA, nruns, 3)
    for (j in 1:nruns) {
      sampler <- rep(NA, nsamples)
      for (i in 1:nsamples) {
        # Population: t distribution with 19 degrees of freedom (true mean 0)
        sampler[i] <- rt(1, 19)
      }
      sample_summary[j, 1] <- mean(sampler)
      standard_error <- sd(sampler) / sqrt(nsamples)
      # 99% CI bounds from the t distribution with n - 1 degrees of freedom
      sample_summary[j, 2] <- mean(sampler) - qt(0.995, length(sampler) - 1) * standard_error
      sample_summary[j, 3] <- mean(sampler) + qt(0.995, length(sampler) - 1) * standard_error
    }
    # Count the runs whose CI contains the true mean of 0
    counter <- 0
    for (j in 1:nruns) {
      if (0 > sample_summary[j, 2] && 0 < sample_summary[j, 3]) {
        counter <- counter + 1
      }
    }
    counter / nruns
    ## [1] 0.988
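The coverage simulation above can also be written without explicit loops. The following is a minimal sketch, assuming the same population (a t distribution with 19 degrees of freedom, true mean 0). The seed and the names `covers`, `coverage`, and `true_mean` are assumptions added here for illustration, so the coverage it prints need not match the 0.988 reported above.

```r
set.seed(1)        # assumed seed, only so the sketch is reproducible
nruns <- 1000      # number of complete samples
nsamples <- 20     # individuals per sample
true_mean <- 0     # mean of a t distribution with 19 df

# For each run: draw a sample, build the 99% t interval, and record
# whether the interval contains the true mean.
covers <- replicate(nruns, {
  s <- rt(nsamples, df = 19)
  half_width <- qt(0.995, df = nsamples - 1) * sd(s) / sqrt(nsamples)
  abs(mean(s) - true_mean) <= half_width
})

coverage <- mean(covers)   # share of intervals that contain the true mean
print(coverage)
```

If the interval calculation is sound, the printed share should sit close to 0.99, with some run-to-run variation from sampling noise.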