Comprehensive Guide to Graduate Statistics Exam Preparation

School
Capital Community College, Hartford
Course
PHYS 8011
Subject
Statistics
Date
Dec 12, 2024
Pages
5
Uploaded by HighnessFoxPerson1237
---

**Exam Name:** Graduate Statistics Comprehensive Evaluation

**Exam Time:** 3 Hours 15 Minutes

**Total Score:** 150 Points

---

**Instructions:**

1. Please ensure that you have all the required materials before starting the exam.
2. There are a total of 27 questions in this exam, covering multiple choice, open-ended, and calculation problems.
3. Allocate your time appropriately: about 1 minute per multiple-choice question, about 2 minutes per open-ended question, and about 4 minutes per calculation question.
4. Write clearly and legibly. Marks will be deducted for poor handwriting.
5. You are permitted to use a scientific calculator, but no other electronic devices are allowed.

---

**Question 1 (Multiple Choice, 5 Points):**

Which of the following is a characteristic of the normal distribution?

A) The mean is always equal to the median.
B) The skewness is always positive.
C) The variance is always negative.
D) The distribution is symmetric and bell-shaped.

**Question 2 (Multiple Choice, 5 Points):**

In a binomial distribution with parameters \( n \) and \( p \), if \( n \) is fixed, how does the variance of the distribution change as \( p \) increases from 0 to 1?

A) It decreases.
B) It increases.
C) It remains constant.
D) It first increases and then decreases.

**Question 3 (Open-Ended, 7 Points):**

Explain the concept of statistical independence in probability theory and provide a real-world example to illustrate your explanation.
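As a quick check on the binomial variance in Question 2, the sketch below tabulates \( Var(X) = np(1 - p) \) over a grid of \( p \) values. The choice \( n = 10 \), the grid, and the function name are illustrative assumptions, not values from the exam.

```python
def binomial_variance(n: int, p: float) -> float:
    """Variance of a Binomial(n, p) random variable: n * p * (1 - p)."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 0 and 0 <= p <= 1")
    return n * p * (1.0 - p)


if __name__ == "__main__":
    n = 10  # illustrative value only; the exam leaves n unspecified
    for k in range(0, 11):
        p = k / 10
        print(f"p = {p:.1f}  Var(X) = {binomial_variance(n, p):.2f}")
```

For fixed \( n \), the variance is symmetric about \( p = 0.5 \), where it peaks at \( n/4 \).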
---

**Question 4 (Calculation, 10 Points):**

Suppose a Poisson process with a rate of 5 events per hour is observed. What is the probability that there are exactly 2 events in a 3-hour period?

**Question 5 (Open-Ended, 10 Points):**

Discuss the difference between simple linear regression and multiple linear regression. Provide an example where one might be preferred over the other.

---

**Question 6 (Calculation, 10 Points):**

In a one-sample t-test, a sample of 40 observations resulted in a t-value of 1.96. Assuming the population is normally distributed, what is the p-value for this test?

---

**Question 7 (Open-Ended, 10 Points):**

Explain the concept of a factorial design in the context of experimental design. Provide an example of a factorial design and discuss its advantages and disadvantages.

---

**Question 8 (Calculation, 10 Points):**

Suppose we have two random variables, \( X \) and \( Y \), with covariance \( Cov(X, Y) = 5 \) and variances \( Var(X) = 10 \) and \( Var(Y) = 20 \). Calculate the correlation coefficient \( r \) between \( X \) and \( Y \).

---

**Question 9 (Multiple Choice, 5 Points):**

In multivariate statistics, what is the primary purpose of performing a principal component analysis (PCA)?

A) To reduce the dimensionality of the data.
B) To predict the value of one variable based on another.
C) To compare the means of two or more groups.
D) To test the significance of the difference between group means.

**Question 10 (Open-Ended, 10 Points):**

Discuss the difference between time series analysis and cross-sectional analysis. Provide an example of a situation where time series analysis would be more appropriate.

---

**Question 11 (Calculation, 10 Points):**

Suppose you are given a time series data set that exhibits a strong seasonal pattern. Describe how you would decompose this time series into its seasonal, trend, and residual components.

---

**Question 12 (Open-Ended, 10 Points):**

In nonparametric statistics, why might you choose a rank sum test over a parametric t-test? Provide a scenario where this choice would be justified.

---

**Question 13 (Calculation, 10 Points):**

Consider a Bayesian model where the prior distribution for a parameter \( \theta \) is a normal distribution with mean 0 and variance 1. Suppose a sample of 100 observations results in a sample mean of 0.5 and a sample variance of 0.25. Calculate the posterior mean and variance for \( \theta \).

---

**Question 14 (Multiple Choice, 5 Points):**

In the context of data science and analytics, what does the term "data wrangling" refer to?

A) The process of cleaning and transforming raw data into a more usable format.
B) The use of machine learning algorithms to predict outcomes.
C) The analysis of large data sets to uncover patterns and insights.
D) The visualization of data to communicate information effectively.

**Question 15 (Open-Ended, 10 Points):**

Discuss the ethical considerations involved in data science and analytics, particularly in relation to privacy and confidentiality.

---

**Question 16 (Calculation, 10 Points):**

A random sample of 25 observations from a normal population yields a sample mean of 100 and a sample standard deviation of 15. Using the chi-square test, determine if the sample variance is significantly different from the known population variance of 144.
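The arithmetic behind Questions 4, 8, and 16 can be sanity-checked with a short script. This is a minimal sketch rather than an official answer key: the function names are hypothetical, and the 5% significance level and the chi-square critical values for Question 16 are assumptions taken from a standard table, since the exam does not state a level.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def correlation(cov_xy: float, var_x: float, var_y: float) -> float:
    """Correlation coefficient r = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    return cov_xy / math.sqrt(var_x * var_y)


def variance_chi_square_statistic(n: int, sample_sd: float, sigma0_sq: float) -> float:
    """Chi-square statistic (n - 1) * s^2 / sigma0^2, with n - 1 degrees of freedom."""
    return (n - 1) * sample_sd ** 2 / sigma0_sq


if __name__ == "__main__":
    # Question 4: a rate of 5 per hour over 3 hours gives a Poisson mean of 15.
    print(f"Q4  P(X = 2)  = {poisson_pmf(2, 5 * 3):.3e}")

    # Question 8: Cov(X, Y) = 5, Var(X) = 10, Var(Y) = 20.
    print(f"Q8  r         = {correlation(5, 10, 20):.4f}")

    # Question 16: n = 25, s = 15, hypothesized variance 144.
    stat = variance_chi_square_statistic(25, 15, 144)
    # Assumed table values for 24 degrees of freedom (two-sided test, alpha = 0.05).
    lower, upper = 12.401, 39.364
    reject = stat < lower or stat > upper
    print(f"Q16 statistic = {stat:.2f}, reject H0 at the assumed level: {reject}")
```

Under the assumed two-sided 5% level, the Question 16 statistic of 37.5 falls between the two critical values, so the test does not reject. A different significance level or a one-sided alternative can change that conclusion.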
---

**Question 17 (Multiple Choice, 5 Points):**

Which of the following is a key assumption in regression analysis?

A) The error term has a normal distribution.
B) The independent variables are linearly independent.
C) The dependent variable is normally distributed.
D) All of the above.

**Question 18 (Open-Ended, 10 Points):**

Explain the difference between a Type I and Type II error in hypothesis testing. Provide a real-world example for each type of error.

---

**Question 19 (Calculation, 10 Points):**

Consider a two-sample t-test where the sample sizes for the two groups are 16 and 20, and the observed t-value is 2.12. Assuming equal variances, calculate the p-value for this test.

---

**Question 20 (Open-Ended, 10 Points):**

Discuss the concept of experimental bias and its implications in the design of experiments. Provide an example of how bias can be minimized in an experiment.

---

**Question 21 (Calculation, 10 Points):**

Suppose a sample of 30 observations from a normal population results in a sample mean of 80 and a sample standard deviation of 10. Calculate the confidence interval for the population mean at the 95% confidence level.

---

**Question 22 (Multiple Choice, 5 Points):**

In the context of Bayesian statistics, what does the likelihood function represent?

A) The probability of observing the data given the parameter values.
B) The probability of the parameter values given the data.
C) The probability of the data.
D) The probability of the parameter values.
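For the confidence interval in Question 21, a t-based interval has the form: sample mean plus or minus the critical value times \( s / \sqrt{n} \). The sketch below applies that formula. The function name is hypothetical, and the critical value 2.045 (the 0.975 quantile of the t distribution with 29 degrees of freedom) is an assumed table lookup, not something the exam supplies.

```python
import math


def mean_confidence_interval(mean: float, sd: float, n: int, t_crit: float):
    """Two-sided t interval for a population mean: mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Assumed table value: 0.975 quantile of t with n - 1 = 29 degrees of freedom.
    t_crit_29 = 2.045
    low, high = mean_confidence_interval(80, 10, 30, t_crit_29)
    print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Using the normal critical value 1.96 instead would give a slightly narrower interval; with the population standard deviation unknown and \( n = 30 \), the t value is the more cautious choice.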
**Question 23 (Open-Ended, 10 Points):**

Discuss the difference between exploratory data analysis (EDA) and confirmatory data analysis (CDA). Provide an example of a situation where EDA would be more appropriate.

---

**Question 24 (Calculation, 10 Points):**

A random sample of 50 observations from a Poisson distribution with an unknown mean results in a sample mean of 3.5. Calculate the maximum likelihood estimator (MLE) for the Poisson parameter \( \lambda \).

---

**Question 25 (Open-Ended, 10 Points):**

Explain the concept of a confidence interval in statistical inference. How does the width of a confidence interval relate to the confidence level?

---

**Question 26 (Calculation, 10 Points):**

Suppose a random sample of 60 observations from a normal population results in a sample mean of 95 and a sample standard deviation of 12. Calculate the test statistic for a hypothesis test where the null hypothesis states that the population mean is 90.

---

**Question 27 (Open-Ended, 10 Points):**

Discuss the role of data visualization in data science and analytics. Provide an example of a situation where an effective visualization could reveal insights that would be difficult to obtain through raw data analysis.

---

End of Exam
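For the test statistic in Question 26, the usual one-sample form is \( t = (\bar{x} - \mu_0) / (s / \sqrt{n}) \). The sketch below evaluates it with the numbers from the question. The function name is hypothetical, and the choice of a t statistic (rather than z) assumes the population standard deviation is unknown.

```python
import math


def one_sample_t_statistic(sample_mean: float, mu0: float, sample_sd: float, n: int) -> float:
    """One-sample t statistic: (sample_mean - mu0) / (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


if __name__ == "__main__":
    # Question 26: n = 60, sample mean 95, sample sd 12, null mean 90.
    t_stat = one_sample_t_statistic(95, 90, 12, 60)
    print(f"t = {t_stat:.4f} with {60 - 1} degrees of freedom")
```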