- School: Acton School of Business
- Course: STAT 421
- Subject: Statistics
- Date: Dec 20, 2024
# Graduate Statistical Analysis Test

## Exam Information

- Name: Final Exam
- Duration: 3 hours
- Total Score: 100 points

## Instructions

- Answer all questions.
- Show all work for calculation-based questions.
- No calculators or electronic devices allowed.
- Use only the space provided for your answers.
- Late submissions will be penalized.
- No questions during the exam.
- This exam contains 25 questions.

## Question 1

Consider the following scenarios. Identify whether each scenario is best suited for a Poisson, Binomial, or Normal distribution, and justify your answer.

a) The number of cars passing through a toll booth per hour.
b) The number of defective items in a batch of 100 produced by a manufacturing process.
c) The heights of adult male humans.

## Question 2

For each of the following hypothesis tests, calculate the test statistic and the corresponding p-value. Assume a significance level of α = 0.05.

a) H0: μ = 100 vs. Ha: μ < 100, given sample mean = 95, sample standard deviation s = 10, and n = 30.
b) H0: μ = 200 vs. Ha: μ > 200, given sample mean = 210, sample standard deviation s = 15, and n = 50.

## Question 3

A simple linear regression model is fit to the data. Interpret the meaning of the slope coefficient in the context of each problem.

a) The relationship between study hours and exam scores.
b) The effect of advertising expenditure on sales.
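
The arithmetic behind the test statistics in Question 2 can be sketched in Python. This is a minimal illustration, assuming the intended statistic is the one-sample t statistic, t = (sample mean − μ0) / (s / √n), since only the sample standard deviation is given. The function name `t_statistic` is illustrative and does not come from the exam.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample mean - hypothesized mean) / standard error."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Part a): left-tailed test, n - 1 = 29 degrees of freedom
t_a = t_statistic(95, 100, 10, 30)
# Part b): right-tailed test, n - 1 = 49 degrees of freedom
t_b = t_statistic(210, 200, 15, 50)

print(round(t_a, 3), round(t_b, 3))  # -2.739 4.714
```

The p-value is the tail area of the t distribution with n − 1 degrees of freedom beyond the statistic, in the direction given by Ha. The standard library has no t distribution, so one option is to read that area from a t table or a statistics package.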

## Question 4

Perform a chi-squared test of independence for the following 2x2 contingency table. Include the expected frequencies in your calculations.

|       | Category A | Category B |
|-------|------------|------------|
| Event | 10         | 20         |
| None  | 30         | 40         |

## Question 5

For the following time series data, calculate the autocorrelation function (ACF) for lags 1 to 4.

a) Weekly sales data for a retail store over the past 20 weeks.

## Question 6

Using R or Python, write the code to perform a principal component analysis (PCA) on a dataset with five continuous variables.

## Question 7

A Bayesian network is built to model the probability of a disease given symptoms. Describe how to update the posterior probability of having the disease after observing a new symptom.

## Question 8

For a given dataset, construct a 95% confidence interval for the population mean when the population standard deviation is known.

## Question 9

A multi-factorial design experiment is conducted with two factors (A and B), each with two levels. Calculate the main effects and interaction effects.

## Question 10

Assume a two-tailed test with n = 100 and α = 0.01. Calculate the critical region and determine if the null hypothesis can be rejected.

## Question 11

Explain the concept of multicollinearity in regression analysis and discuss a method to detect and address it.

## Question 12

Consider the following data: Heights of 30 adult females (cm). Construct a boxplot and interpret the results.

## Question 13

A pharmaceutical company wants to test the effectiveness of a new drug. Describe the process of conducting a clinical trial, including the types of errors that can occur.

## Question 14

Calculate the likelihood function for the parameters of an exponential distribution given a sample of data.

## Question 15

Discuss the advantages and disadvantages of using k-means clustering for customer segmentation.

## Question 16

Given a dataset with n observations, derive the formula for the sample variance.

## Question 17

A study is conducted to test the effectiveness of a new teaching method. Describe how to set up a factorial design to measure the interaction between teacher experience and class size.

## Question 18

Calculate the posterior distribution for a beta-binomial model, given a prior distribution and observed data.

## Question 19

A scatter plot of two variables shows a strong negative correlation. Explain how this can affect the interpretation of the regression coefficient.

## Question 20

Describe the concept of cross-validation in the context of model evaluation, including the steps involved.

## Question 21

A company wants to predict customer churn. Explain how to approach this problem using logistic regression.

## Question 22

Calculate the sample median for the following dataset: [50, 75, 100, 125, 150].

## Question 23

Discuss the role of the design matrix in multiple linear regression.
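
Returning to the contingency table in Question 4, the expected frequencies and the chi-squared statistic can be sketched in plain Python. This is a minimal illustration, assuming the Pearson statistic without a continuity correction. The variable names are illustrative and do not come from the exam.

```python
# Observed counts from the Question 4 table (rows: Event, None; columns: A, B)
observed = [[10, 20],
            [30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)               # [[12.0, 18.0], [28.0, 42.0]]
print(round(chi_squared, 4))  # 0.7937
```

A 2x2 table has (2 − 1)(2 − 1) = 1 degree of freedom, so the statistic is compared against the chi-squared distribution with 1 degree of freedom.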

## Question 24

A study finds that the mean weight of a population is 150 pounds with a standard deviation of 30 pounds. Calculate the probability that a randomly selected individual weighs more than 180 pounds.

## Question 25

A researcher wants to compare the means of two independent populations. Describe the appropriate statistical test to use and explain why.

## Bonus Question

Earn up to 5 extra points by explaining the difference between a Type I and Type II error in hypothesis testing, providing an example for each.

End of Exam
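
The calculation in Question 24 can be sketched with Python's standard library. This is a minimal illustration that assumes weights are normally distributed, which the question does not state. The variable names are illustrative.

```python
from statistics import NormalDist

# Assumption: weights follow a normal distribution with the stated mean and SD
weights = NormalDist(mu=150, sigma=30)

# Standardize the threshold: z = (x - mean) / standard deviation
z = (180 - 150) / 30

# P(X > 180) is the upper-tail area, i.e. 1 minus the CDF at 180
p_over_180 = 1 - weights.cdf(180)

print(z, round(p_over_180, 4))  # 1.0 0.1587
```

Because 180 pounds sits exactly one standard deviation above the mean, the answer is the upper-tail area beyond z = 1 of the standard normal distribution.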