University of California, Irvine
Course
STATS 205
Subject
Statistics
Date
Dec 18, 2024
Pages
3
Uploaded by UndisputedChampDRC
Stats midterm formula sheet
Introduction To Statistics (University of North Carolina at Chapel Hill)
Population: The entire group you’re interested in (e.g., all STOR 155 students).
Sample: A smaller group from the population used for analysis.
Census: Data collected from the entire population.
Survey: Data collected from a sample.
Numerical: A quantifiable characteristic whose values are numbers. These values describe a measurable quantity, such as "how much" or "how many".
N. Continuous: Can take on any value when measured, and you can be as precise as you want. For example, age, which could be 24.563.
N. Discrete: Can only take specific values, usually integers. For example, the number of siblings or year of birth.
Categorical: A variable that has two or more categories.
Regular categorical (nominal): A categorical variable that sorts data into distinct groups with no inherent order (e.g., gender, color, type of cuisine).
Ordinal: A categorical variable that has a meaningful order or ranking among its categories, but the intervals between the categories are not necessarily equal (e.g., satisfaction levels: low, medium, high).
A confounding variable is an unmeasured third variable that influences both the supposed cause and the supposed effect.
(Good) Simple random sampling: Every member of the whole population has an equal chance of being selected.
Systematic sampling: Every member of the population gets a number, and individuals are chosen at regular intervals from a random starting point.
(Good) Stratified sampling: Divide the population into subpopulations (e.g., by age or gender) that may differ in important ways, then sample randomly from each one.
(Good) Cluster sampling: Each subgroup should have similar characteristics to the whole population.
Instead of sampling individuals from each subgroup, you randomly select entire subgroups.
Multistage sampling: Often considered an extended version of cluster sampling. You divide the population into clusters and select some clusters at the first stage, then sample smaller and smaller groups within them at later stages.
σ - standard deviation
μ - mean
x̅ - mean of x values
ȳ - mean of y values
Sx - standard deviation of x values
Sy - standard deviation of y values
r - correlation of x and y
R² is r²
Slope of best-fitting line: m = r × (Sy / Sx)
Intercept = ȳ − (slope × x̅)
Use point-slope form: y − ȳ = m(x − x̅)
Correlation
The closer r is to −1, the more negatively linear the relationship.
The closer r is to +1, the more positively linear the relationship.
To calculate a new average after adding a value: multiply the original mean by the original number of participants, add the new value, and divide by the new number of participants.
x̅/median < 1 (mean below median): skewed left
x̅/median = 1 (mean equals median): symmetric distribution
x̅/median > 1 (mean above median): skewed right
- When data trails off to the right (gets smaller) and has a longer right tail, the shape is said to be right skewed.
- When data trails off to the left and has a longer left tail, the shape is said to be left skewed.
The percentile for a z-score is the proportion of values below it, which is the decimal read from the z-table.
Rule of thumb: Use r², not r, to decide how strong an association is.
There are no units associated with the correlation coefficient.
How to calculate standard deviation (σ, sigma):
Step 1: Find the mean.
Step 2: Subtract the mean from each score.
Step 3: Square each deviation.
Step 4: Add the squared deviations.
Step 5: Divide the sum by the number of scores.
Step 6: Take the square root of the result from Step 5.
High correlation does not imply that the explanatory variable has a causal influence on the response variable.
Mode is the most common value.
Mean is the average.
Median is the middle value when the data are ordered from smallest to largest.
Least squares regression line
ŷ = a + bx
Residual for a value x in the dataset = y − ŷ (observed − predicted)
The mean of a skewed curve is pulled in the direction of the long tail.
For a normal curve, the mean μ is at the center, and the two inflection points are at μ + σ and μ − σ.
- About 68% of the values are within 1 standard deviation of the mean, that is, between μ − σ and μ + σ.
- About 95% of the values are within 2 standard deviations of the mean, that is, between μ − 2σ and μ + 2σ.
- About 99.7% of the values are within 3 standard deviations of the mean, that is, between μ − 3σ and μ + 3σ.
Explanatory variable - used to predict or explain the response variable
Response variable - measures the effect of the explanatory variable
Robust: The median and IQR are robust, meaning they are little affected by outliers.
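The six standard deviation steps and the 68/95/99.7 rule above can be checked with a short Python sketch. This is only an illustration: the function names and the sample scores are made up for the example, and the calculation divides by the number of scores (the population formula from Step 5), which is not the same as the n − 1 version a calculator reports as Sx.

```python
import math
from statistics import NormalDist

def population_sd(scores):
    """Follow the six steps from the sheet (divides by n, not n - 1)."""
    mean = sum(scores) / len(scores)             # Step 1: find the mean
    deviations = [x - mean for x in scores]      # Step 2: subtract the mean
    squared = [d ** 2 for d in deviations]       # Step 3: square each deviation
    total = sum(squared)                         # Step 4: add the squared deviations
    variance = total / len(scores)               # Step 5: divide by the number of scores
    return math.sqrt(variance)                   # Step 6: take the square root

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

# Hypothetical scores chosen so the arithmetic is easy to follow by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(scores))  # 2.0

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # roughly 0.68, 0.95, 0.997
```

The same `NormalDist().cdf(z)` call gives the percentile for any z-score, since it returns the proportion of values below z.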
Steps to find variance on a calculator:
1. STAT EDIT to get to the list.
2. Put all data in a list.
3. Press STAT, then go to CALC, then 1-Var Stats.
4. Make sure it is the correct list, then press ENTER.
5. Sx is the sample standard deviation; square it to find the variance.
Types of charts:
Bar Chart: Compares categorical/ordinal data.
Boxplot: Shows spread and outliers for numerical (continuous) data.
Contingency Table: Shows relationships between categorical variables.
Histogram: Shows the distribution of numerical data (discrete or continuous).
Line Chart: Shows trends over time for numerical (continuous) data.
Mosaic Plot: Compares categorical data.
Pie Chart: Shows proportions of categorical data.
Scatterplot: Shows relationships between two numerical (continuous) variables.
Side-by-Side Bar Chart: Compares subcategories in categorical/ordinal data.
Side-by-Side Bar Plot: Places bars next to each other to directly compare subcategories within the same group.
Stacked Bar Chart: Shows parts of a total for categorical/ordinal data.
Stacked Bar Plot: Shows the contribution of subcategories to a total, stacking them on top of each other.
Standardized Bar Plot: Each bar represents 100%, and sections show the relative proportions of subcategories.
Consider a train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next is 130 minutes, with a standard deviation of 114 minutes. The mean distance traveled from one stop to the next is 106 miles, with a standard deviation of 100 miles. The correlation between travel time and distance is 0.631.
a.) Equation of the regression line, where x is distance in miles and ŷ is predicted travel time in minutes:
ŷ = 53.75 + 0.719x
b.) Interpret the slope in this context.
- For each one-mile increase in distance traveled, we would expect travel time to increase on average by 0.719 minutes.
Interpret the intercept in this context.
- When the distance traveled is 0 miles, the travel time is expected to be on average 53.75 minutes.
c.) Interpret R² in the context of the application.
Approximately 39.8% of the variation in travel time is accounted for by the model.
d.) Residual = observed − predicted, so the residual is 41 minutes.
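As a check on parts a.) and c.), the regression line can be rebuilt from the summary statistics alone using slope = r × (Sy / Sx) and intercept = ȳ − slope × x̅. The sketch below is illustrative; the function and parameter names are invented for the example, and only the five numbers stated in the train problem are used as inputs.

```python
def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Least squares line from summary statistics: returns (slope, intercept, R^2)."""
    slope = r * sd_y / sd_x                # m = r * (Sy / Sx)
    intercept = mean_y - slope * mean_x    # a = y-bar - slope * x-bar
    return slope, intercept, r ** 2        # R^2 is the square of r

def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted

# x = distance between stops (miles), y = travel time between stops (minutes)
slope, intercept, r_squared = regression_from_summary(
    mean_x=106, sd_x=100, mean_y=130, sd_y=114, r=0.631
)
print(round(slope, 3))      # 0.719
print(round(intercept, 2))  # 53.75
print(round(r_squared, 3))  # 0.398
```

The slope and R² match the sheet's answers, and the intercept agrees with the sheet's value up to rounding.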