Brigham Young University
Course
STAT 121
Subject
Statistics
Date
Dec 18, 2024
Pages
33
Simple Linear Regression - Inference
Research Objective
Research Question: Is the adult height of a student determined by the height of the mother? In other words, what is the relationship between a student’s height and mother’s height for all BYU students?
Population: All BYU students.
Parameter of Interest: The slope between mother’s height and student’s height.
Sample: A convenience sample of 1727 BYU students who are in Stat 121.
Are there any issues with this study setup?
Research Objective
Our model: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.
Considering the research question, what would it mean if $\beta_1 = 0$?
There is no relationship between mother’s height ($x$) and student’s height ($y$).
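Population vs. Sample Slope
Our model: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
Our fitted model: $\hat{y} = 35.653 + 0.503 \times x$
So, doesn’t this mean that $\beta_1 \neq 0$ because $\hat{\beta}_1 = 0.503$?
Not necessarily! $\beta_1 \neq \hat{\beta}_1$
We need to do a test for $\beta_1$.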
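To see why a nonzero sample slope does not by itself show a nonzero population slope, here is a minimal simulation sketch. The population below is hypothetical (the means, spreads, sample size, and seed are made-up values, not the class data): student heights are generated with no dependence on mother's height, so the true $\beta_1$ is exactly 0, yet the fitted slope still comes out nonzero.

```python
import random


def fit_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


random.seed(121)  # fixed seed so the sketch is reproducible

# Hypothetical population where the true slope beta_1 is exactly 0:
# student height is drawn without any reference to mother's height.
mother = [random.gauss(64, 2.5) for _ in range(200)]
student = [random.gauss(68, 3.5) for _ in range(200)]

sample_slope = fit_slope(mother, student)
print(f"true slope = 0, sample slope = {sample_slope:.3f}")
```

The sample slope differs from 0 purely because of sampling variability, which is why a formal test is needed.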
Hypothesis Testing for $\beta_1$
Research Question: Does mother’s height impact a child’s height?
Steps of hypothesis testing:
1. Formulate null and alternative hypotheses.
2. Gather the data and see if our sample data matches (or doesn’t match) the null hypothesis.
3. Draw a conclusion about $H_0$.
Hypothesis Testing for $\beta_1$ - Step 2
Step 2 - Compare our data result with what we expect to see if the null hypothesis is true.
From our sample, we have $\hat{\beta}_1 = 0.503$. Is this “different enough” from 0 to conclude that $H_a: \beta_1 \neq 0$?
Hypothesis Testing for $\beta_1$ - Step 2
First, standardize using the formula (or let the computer do this for you):
$$t = \frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma} / \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = 15.914,$$
where $\beta_1$ is set to its hypothesized value of 0.
Interpret $t$ as the number of standard errors our $\hat{\beta}_1$ is from the hypothesized $\beta_1$.
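For readers who want to see what "the computer does", here is a minimal sketch of the standardization: the slope estimate, its standard error $\hat{\sigma} / \sqrt{\sum (x_i - \bar{x})^2}$, and the resulting $t$. The function name and the five toy data points are hypothetical and chosen only so the arithmetic is easy to follow. They are not the class data.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return (slope estimate, standard error, t statistic) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                # slope estimate
    b0 = y_bar - b1 * x_bar       # intercept estimate

    # Residual standard deviation uses n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))

    se = sigma_hat / math.sqrt(sxx)   # standard error of the slope
    t = (b1 - beta1_null) / se        # standard errors away from the null value
    return b1, se, t


# Toy data (hypothetical, not the class data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se, t = slope_t_statistic(x, y)
print(f"slope = {b1:.3f}, SE = {se:.4f}, t = {t:.2f}")
```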
Hypothesis Testing for $\beta_1$ - Step 2
Theorem. Sampling Distribution of $\hat{\beta}_1$
If the LINE assumptions of the regression model are appropriate, then
$$t = \frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma} / \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
is a standardized statistic and follows a $t$ distribution with center 0, spread 1, and $n - 2$ degrees of freedom.
Note, above we would set $\beta_1 = 0$ because we assume $H_0$ is true unless proven otherwise.
So…what does this mean?
Hypothesis Testing for $\beta_1$ - Step 2
IF the LINE assumptions hold, the values of $t$ that are consistent with the claim $H_0: \beta_1 = 0$ are given by the $t$ distribution (curve):
But we are getting ahead of ourselves because the LINE assumptions have to be true for the above picture to be correct.
Checking LINE Assumptions
Reminder, the LINE assumptions are:
L - Linear relationship between $x$ and $y$
I - Independence (one obs. doesn’t impact the other)
N - Normal residuals (distance from line is normal)
E - Equal variance of residuals (spread about the line is constant)
How would we see if there is a linear relationship between $x$ and $y$?
Scatterplot!
Checking LINE Assumptions
Is this (approximately) linear for the bulk of the data?
Checking LINE Assumptions
How would we see if there is independence? In other words, how can we “check” if one observation doesn’t influence another?
Critical Thinking!
Does it “make sense” that one student’s height would determine another student’s height?
Likely a minimal influence.
Checking LINE Assumptions
How would we see if the residuals are normal?
1. Calculate the residuals as $\epsilon_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$ (don’t worry - the computer will do this for you)
2. Draw a histogram (or density plot) of residuals
Checking LINE Assumptions
Is this approximately normal?
Close enough. Skew = 0.0526991
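As a rough sketch of the residual calculation and of a skewness summary like the one reported above, the code below fits the line, computes each residual as observed minus fitted, and then computes a moment-based skewness. The skewness formula used here (third central moment divided by the 1.5 power of the second) is an assumption, since the slides do not say which formula the course tool uses. The data are hypothetical toy values.

```python
def fit_line(x, y):
    """Return least-squares (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(x, y):
    """Observed y minus fitted y for each observation."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def skewness(values):
    """Moment-based skewness (assumed formula): m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy data (hypothetical, not the class data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

resid = residuals(x, y)
print([round(r, 2) for r in resid])
print(f"skew = {skewness(resid):.4f}")
```

A skewness near 0 is consistent with a roughly symmetric histogram, though it is only one summary and does not replace looking at the plot.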
Checking LINE Assumptions
How would we see if there is “equal spread” of the residuals about the fitted line?
Checking LINE Assumptions
Option 1: Scatterplot with fitted line
Checking LINE Assumptions
Option 2: Fitted values vs. residuals plot (just like a scatterplot with fitted line but made to be easier to see visually)
Is this roughly “equal spread”?
Close enough except for 1 or 2 outliers
Checking LINE Assumptions
Examples of NOT equal spread
Using the Analysis Tool
Melanoma is highly related to sun exposure. Hence, areas with greater sun exposure have a greater risk of melanoma.
Hypothesis Testing for $\beta_1$
Back to Step 2 - gather the data and see if our sample data matches (or doesn’t match) the null hypothesis (note: do this only if LINE assumptions are valid)
Measuring if our data is consistent with the null hypothesis:
1. Standardized test statistic: the number of standard errors away from the hypothesized value our data is. In our height example $t = 15.9137202$.
2. $p$-value: probability of observing our sample result or “more extreme” (as stated by $H_a$) if the null hypothesis is true. Our $p$-value is essentially 0.
Step 3: Draw a conclusion about $H_0: \beta_1 = 0$. Using $\alpha = 0.05$, what do we conclude about $\beta_1$?
Our data is NOT consistent with the null hypothesis so we conclude that the mother’s height does have an effect on the student’s height.
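To make the $p$-value step concrete, here is a sketch that computes a two-sided $p$-value from a $t$ statistic and $n - 2$ degrees of freedom using only the standard library. It is not how the course tool works internally (that is unknown). The tail area is found by numerical integration (Simpson's rule) after the substitution $t = \sqrt{df}\,\tan\theta$, and the function names are made up for illustration.

```python
import math


def t_upper_tail(t0, df, steps=20000):
    """Approximate P(T > t0) for T ~ t(df), with t0 >= 0 and steps even.

    With t = sqrt(df) * tan(theta), the tail area becomes
    C * integral of cos(theta)**(df - 1) over [theta0, pi/2].
    """
    theta0 = math.atan(t0 / math.sqrt(df))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = (math.pi / 2 - theta0) / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)  # Simpson weights
        c = max(math.cos(theta0 + k * h), 0.0)  # guard against tiny negative rounding
        total += weight * c ** (df - 1)
    return const * total * h / 3


def two_sided_p_value(t_stat, df):
    """P(|T| >= |t_stat|) under H0, matching a 'not equal' alternative."""
    return 2 * t_upper_tail(abs(t_stat), df)


# Values reported in the slides: t = 15.9137202 with n = 1727 students.
p = two_sided_p_value(15.9137202, 1727 - 2)
alpha = 0.05
print(f"p-value = {p:.3g}, reject H0: {p < alpha}")
```

The result is far smaller than any display precision, which is why software reports it as 0.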
Using the Analysis Tool
Vagueness of Hypothesis Tests
If we reject $H_0: \beta_1 = 0$ and conclude $H_A: \beta_1 \neq 0$ then we really haven’t concluded anything other than there is an effect.
Use a confidence interval for more informative answers.
Confidence Intervals for $\beta_1$
Using the same ideas for building a confidence interval as before, a C% confidence interval for $\beta_1$ is:
$$\hat{\beta}_1 \pm t^{\star} \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
Don’t worry about the formula, the computer will calculate it for you.
Confidence Intervals for $\beta_1$
Research Question: As the mother’s height increases, what happens to the child’s height?
Answer: A 95% confidence interval for $\beta_1$ is calculated as (0.441, 0.565).
How do we interpret this interval?
We are 95% confident that as the mother’s height goes up by 1 inch, we expect the student’s height to go up by between 0.441 and 0.565 inches.
Notice that the interpretation says expect, NOT will.
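The interval above can be roughly reproduced from numbers reported earlier in the slides. The sketch below makes two assumptions: the standard error is backed out as $\hat{\beta}_1 / t = 0.503 / 15.914$ (the slides do not report it directly), and the critical value $t^{\star}$ is found by bisection on a numerically integrated $t$ tail area. Function names are hypothetical.

```python
import math


def t_upper_tail(t0, df, steps=2000):
    """Approximate P(T > t0) for T ~ t(df), t0 >= 0, by Simpson's rule (steps even)."""
    theta0 = math.atan(t0 / math.sqrt(df))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = (math.pi / 2 - theta0) / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * max(math.cos(theta0 + k * h), 0.0) ** (df - 1)
    return const * total * h / 3


def t_critical(confidence, df):
    """Find t* such that P(-t* < T < t*) = confidence, by bisection."""
    alpha = 1 - confidence
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid   # tail still too large, so t* is further out
        else:
            hi = mid
    return (lo + hi) / 2


def slope_confidence_interval(b1, se, df, confidence=0.95):
    """Estimate plus or minus t* times the standard error."""
    margin = t_critical(confidence, df) * se
    return b1 - margin, b1 + margin


b1 = 0.503                 # slope estimate from the slides
se = 0.503 / 15.914        # assumed: standard error backed out from the reported t
lower, upper = slope_confidence_interval(b1, se, df=1727 - 2)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```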
Using CIs to do Tests
Research Question: If the mother’s height goes up by 1 inch, can we expect the student’s height to change by 1 in?
Answer: A 95% confidence interval for $\beta_1$ is calculated as (0.441, 0.565).
No. At the 0.05 significance level, 1 is not a plausible value for $\beta_1$ because 1 is not in the interval.
Principle: You can use CIs to do 2-sided hypothesis tests (i.e. alternative hypothesis with “$\neq$”)
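The principle amounts to a simple membership check, sketched below with a hypothetical helper name: a two-sided test at level $\alpha$ rejects a hypothesized slope exactly when that value falls outside the $(1 - \alpha)$ confidence interval.

```python
def rejects_with_ci(interval, hypothesized_value):
    """Two-sided test via a CI: reject H0 when the value lies outside the interval."""
    lower, upper = interval
    return not (lower <= hypothesized_value <= upper)


ci_95 = (0.441, 0.565)  # 95% CI for the slope, from the slides

print(rejects_with_ci(ci_95, 1.0))  # H0: beta_1 = 1 -> True (reject)
print(rejects_with_ci(ci_95, 0.0))  # H0: beta_1 = 0 -> True (reject)
print(rejects_with_ci(ci_95, 0.5))  # H0: beta_1 = 0.5 -> False (do not reject)
```

Note that a 95% interval pairs with $\alpha = 0.05$. A different confidence level would correspond to a different significance level.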
Using the Analysis Tool
Nuances of Inference for $\beta_1$
What do we do if the LINE assumptions aren’t quite appropriate?
- Throw out outliers (not recommended)
- Ignore the violations and do inference anyway (but acknowledge that your inferences could be very wrong - not recommended)
- Use more explanatory variables. For example, use father’s height AND shoe size to explain height (we’ll learn this next unit).
- Consult a statistician (or better yet - take more stats classes and we’ll teach you)
Additional Practice:
Measuring possum head size can be difficult. However, measuring total possum length is easier. What is the relationship between possum length and head size? Use a simple linear regression model (and the course app) to answer the following questions:
1. Do the LINE assumptions all hold for this example?
2. Does total length have a linear effect on head length?
3. What would a Type 1 Error be for the hypothesis test in #2?
4. If the total length goes up by 1, how much do we expect the head length to change?
Answers:
1. Yes
2. Yes because the test on the slope rejects at the $\alpha = 0.05$ level.
3. Saying there is a relationship between total and head length when there isn’t.
4. We are 90% confident that, as total length goes up by 1, we expect head length to go up by between 0.6904 and 0.977.
Key Terminology
LINE Assumptions
Hypothesis tests for $\beta_1$
Confidence intervals for $\beta_1$
Checking LINE assumptions