Cambrian College
Course
ACC 1025
Subject
Statistics
Date
Dec 15, 2024
Hypothesis Testing - (b)

Okay, now the real data analytics begins. We will cover single-variable regression in this tab.

Recall the equation for a regression: y = a + bx

y = data value of dependent variable
a = intercept
b = slope
x = data value of independent variable

Regression - Example 1 - single binary variable

Let's begin with a simple 1-variable regression.
In tab "4 - Hypothesis Testing (a)", we used a two-sample t test to test the difference in mean IQ between Males and Females. We can do the same using regression.

Required: regress Grade on the variable Female

Which of these (y, a, b, x) do we already know?
y = data value for the variable Grade
x = data value for the variable Female

Example. The following are data for students 5 and 6.

Student #   Grade   Study Time   Participation   IQ    GMAT   Female   Gender
5           62      10           6               121   668    1        Female
6           95      25           5               135   667    0        Male

The regression equation takes the following form: Grade = a + b(Female)
where:
Grade is the student's final grade
Female equals 1 for female students and 0 for male students

Write out the regression equation for each student:
Student 5: 62 = a + b(1)
Student 6: 95 = a + b(0)

We use regression to estimate the parameters a and b.
Note: a and b will take the same values for both students AND for every student in our sample.

Use the Data Analysis ToolPak to estimate the regression.
You get the following error: regression - linest() function returns error
This is because you have missing values for one of your variables.
Re-run the regression after removing observations with missing values (hint: use a filter to sort the variable Female).

Here is the regression output after excluding the observations with missing data:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.31
R Square            0.10
Adjusted R Square   0.08
Standard Error      11.74
Observations        48

ANOVA
             df   SS        MS       F      Significance F
Regression   1    675.13    675.13   4.90   0.03
Residual     46   6339.68   137.82
Total        47   7014.81

            Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   70.89          2.22             31.95    0.0000    66.43       75.36       66.43         75.36
Female      7.61           3.44             2.21     0.0319    0.69        14.53       0.69          14.53

Recall that a is the intercept. This is the average value of Y (i.e., Grade) when X (i.e., Female) is equal to 0.
Thus, a is the average Grade for Male students. Confirm this by looking at the average grade of Male students in tab "1b - Data by Gender".
The average Grade for Females is the sum of a and b (70.89 + 7.61), which equals 78.5. Confirm this also.

The test of our hypothesis is the coefficient b on Female (i.e., 7.61).
Note the p-value of 0.0319 -> it is below 0.05, so we reject the Null Hypothesis at the 5% level.
The coefficient b on Female tells us that the average Female in our sample has a grade that is 7.61 percentage points higher than the average Male. Again, confirm in tab "1b - Data by Gender". 7.61 is the difference between the average grade of females (i.e., 78.5) and the average grade of males (i.e., 70.89).
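The link between a regression on a 0/1 variable and the two group averages can be checked with a small sketch. The grades below are made up for illustration (they are not the course data set), and `fit_line` is a hypothetical helper that applies the standard least-squares formulas for y = a + bx.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical sample (NOT the course data): Female = 1 for female, 0 for male.
female = [1, 1, 1, 0, 0, 0, 0]
grade = [80, 75, 85, 70, 65, 72, 73]

a, b = fit_line(female, grade)

# Group averages computed directly, for comparison with a and a + b.
male_mean = mean([g for f, g in zip(female, grade) if f == 0])
female_mean = mean([g for f, g in zip(female, grade) if f == 1])

print("intercept a:", round(a, 2), "| male average:", round(male_mean, 2))
print("a + b:", round(a + b, 2), "| female average:", round(female_mean, 2))
```

With a single binary variable, the intercept should match the average of the group coded 0 and the slope should match the gap between the two group averages, which is the same comparison made against tab "1b - Data by Gender".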
Regression - Example 2 - single continuous variable

Let's suppose that I hypothesize that more study time leads to a higher grade.

Null: Among ACCT3171 students, study time does not affect grade.
Alternative: Among ACCT3171 students, more study time leads to a higher grade.

Run the regression of the form: Grade = a + b(StudyTime)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.82
R Square            0.68
Adjusted R Square   0.67
Standard Error      7.00
Observations        48

ANOVA
             df   SS        MS        F       Significance F
Regression   1    4759.05   4759.05   97.05   0.0000
Residual     46   2255.76   49.04
Total        47   7014.81

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    51.931         2.46             21.08    0.0000    46.97       56.89       46.97         56.89
Study Time   1.610          0.16             9.85     0.0000    1.28        1.94        1.28          1.94

Interpretation:

Intercept
The average value of Grade when study time is zero. Thus, a student who does not study at all receives a grade of 52% on average.

Study Time
This coefficient is the test of our hypothesis.
Examine t stat and p-value -> positive and highly statistically significant
Interpretation -> a one-hour increase in study time per week leads to an increase in overall Grade of 1.61 percentage points (see data dictionary tab for variable definitions)
Range -> we are 95% confident that the true effect of a one-hour increase in study time on Grade falls within the range [1.28, 1.94]

We can use this model to estimate the average effect for any person.
Suppose you want to know what grade you would achieve based on the number of hours you study.
We can write it out:
y = a + bx
y = 51.931 + 1.61X

Plug your desired number of study hours into the equation to figure out the corresponding grade:

StudyTime (X)   Grade (Y)
0               51.93
1               53.54
5               59.98
10              68.03
15              76.07
20              84.12
25              92.17
26              93.78
27              95.39
28              97.00
29              98.61
30              100.22
31              101.83
32              103.44

According to the model, a student can achieve a grade of 100% by studying approximately 30 hours.
Studying more than 30 hours is a waste of time, since the model's predictions beyond that point exceed 100%.

Important things to know:
a) A regression estimates the average effect in the sample. Thus, we can draw conclusions about the average student only. Some students can get much higher (or lower) grades with zero study time.
b) A regression's adjusted R2 is a measure of model accuracy -> Our model produces an adjusted R2 of 0.67, meaning study time explains roughly 67% of the variation in Grade.
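The plug-in step for Example 2 can also be written as a short sketch. It uses the equation as printed, y = 51.931 + 1.61X. The function names are hypothetical, and because the slope is rounded to 1.61, results may differ from the table above by a few hundredths.

```python
# Coefficients as reported in the Example 2 regression output (slope rounded).
INTERCEPT = 51.931  # a
SLOPE = 1.61        # b


def predicted_grade(study_hours):
    """Average grade the model predicts for a given number of study hours."""
    return INTERCEPT + SLOPE * study_hours


def hours_for_grade(target_grade):
    """Study hours at which the fitted line reaches target_grade."""
    return (target_grade - INTERCEPT) / SLOPE


for hours in (0, 10, 20, 30):
    print(hours, round(predicted_grade(hours), 2))

print("hours needed for 100%:", round(hours_for_grade(100), 1))
```

Solving the equation for X is one way to see where the "approximately 30 hours" figure comes from. These are predictions for the average student, so an individual's grade can land above or below the line.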