Data
* Data values or observations are information collected regarding some subject
* Data are often organized into a data table such as the one below

Purchase Order Number | Name | Ship to Province | Price | Area Code | Previous CD Purchase | Gift? | ASIN | Artist
10675489 | Katherine H. | Alberta | 1099 | 403 | Nashville | N | B00000I5Y6 | Kansas
10783489 | Samuel P. | Nova Scotia | 1699 | 902 | Classical | Y | B000002BK9 | Boston
12837593 | Chris G. | Quebec | 1598 | 819 | Hip Hop | N | B000068ZVQ | Chicago
15783947 | Monique D. | Ontario | 1199 | 905 | Reggae | N | B00000IOAA | Garbage
Variable
* The characteristics recorded about each individual, case, or subject are called variables
* Variables are usually shown as the columns of a data table and identify what has been measured

Variables (the columns):
Purchase Order Number | Name | Ship to Province | Price | Area Code | Previous CD Purchase | Gift? | ASIN | Artist
10675489 | Katherine H. | Alberta | 1099 | 403 | Nashville | N | B00000I5Y6 | Kansas
10783489 | Samuel P. | Nova Scotia | 1699 | 902 | Classical | Y | B000002BK9 | Boston
12837593 | Chris G. | Quebec | 1598 | 819 | Hip Hop | N | B000068ZVQ | Chicago
15783947 | Monique D. | Ontario | 1199 | 905 | Reggae | N | B00000IOAA | Garbage
Variable Types: Categorical and Numerical
* When a variable names categories and answers questions about how cases fall into those categories, it is called a categorical variable
* When a variable has measured numerical values with units and the variable tells us about the quantity of what is measured, it is called a quantitative or numerical variable
Categorical variables
Categorical variables ...
* arise from descriptive responses to questions such as “What kind of advertising do you use?”
* may only have two possible values (like “Yes” or “No”)
* may be a number like a telephone area code

Example
Question | Categories or Responses
Do you invest in the stock market? | __ Yes __ No
What kind of advertising do you use? | __ Newspapers __ Internet __ Direct mailings
What is your class at school? | __ Freshman __ Sophomore __ Junior __ Senior
I would recommend this course to another student. | __ Strongly Disagree __ Slightly Disagree __ Slightly Agree __ Strongly Agree
How satisfied are you with this product? | __ Very Unsatisfied __ Unsatisfied __ Satisfied __ Very Satisfied
Numerical Variables
Numerical or quantitative variables have units. The units indicate
* how each value has been measured
* the corresponding scale of measurement
* how much of something we have
* how far apart two values are
FURTHER CLASSIFICATIONS OF VARIABLES
Variables
* Categorical
* Numerical (quantitative)
  * Discrete (counted items). Examples: number of children, defects per hour
  * Continuous (measured characteristics). Examples: weight, voltage
MEASUREMENT LEVELS OF CATEGORICAL VARIABLES: ORDINAL AND NOMINAL
Nominal Data
* In nominal data, numbers are used only for convenience and do not imply any ordering. The responses are words that describe the categories. (Example: female is coded 1 and male is coded 0.)
Ordinal Data
* Ordinal data show a rank ordering of items; as with nominal data, the values are words that describe responses. (Example: product quality rating: 1: poor, 2: average, 3: good.)
EXAMPLE 1
Upon visiting a newly opened Starbucks store, customers were given a brief survey. Is the answer to each of the following questions categorical or numerical? If categorical, give the level of measurement. If numerical, is it discrete or continuous?
a) Is this your first visit to this Starbucks store?
b) On a scale from 1 (very dissatisfied) to 5 (very satisfied), rate your level of satisfaction with today’s purchase.
c) What was the actual cost of your purchase today?
POPULATION VS. SAMPLE
* Population: values calculated using population data are called parameters
* Sample: values computed from sample data are called statistics
DESCRIPTIVE STATISTICS: GRAPHICAL PRESENTATION OF DATA CTD.
Categorical Variables
* Frequency distribution
* Cross table
* Bar chart
* Pie chart
* Pareto diagram
Numerical Variables
* Line chart
* Frequency distribution
* Histogram and ogive
* Scatter plot
DESCRIPTIVE STATISTICS: TABLES AND GRAPHS FOR CATEGORICAL VARIABLES
Categorical Data
Tabulating Data
* Frequency Distribution Table
* Cross Table
Graphing Data
* Bar Chart
* Stacked or Component Bar Chart
* Pie Chart
* Pareto Diagram
DESCRIPTIVE STATISTICS: THE FREQUENCY DISTRIBUTION TABLE FOR CATEGORICAL VARIABLE
Example 3: Hospital Patients by Unit

Hospital Unit | Number of Patients | Percent (rounded)
Cardiac Care | 1,052 | 11.93
Emergency | 2,245 | 25.46
Intensive Care | 340 | 3.86
Maternity | 552 | 6.26
Surgery | 4,630 | 52.50
Total | 8,819 | 100.0

(The hospital units are the categories of a categorical variable; the numbers of patients are the frequencies.)
A NOTE
* Frequency is the number of observations in each category.
* Relative frequency is obtained by dividing each frequency by the total number of observations.
* Percent is obtained by dividing each frequency by the total number of observations and multiplying the resulting proportion by 100%.
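The three definitions above can be checked against the hospital-patient counts from Example 3. The following is a minimal Python sketch; the function and variable names (`relative_frequencies`, `percents`, `patients_by_unit`) are illustrative choices, not part of the course material.

```python
# Frequencies from the hospital-patients table (Example 3).
patients_by_unit = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}


def relative_frequencies(frequencies):
    """Divide each frequency by the total number of observations."""
    total = sum(frequencies.values())
    return {category: count / total for category, count in frequencies.items()}


def percents(frequencies, digits=2):
    """Relative frequency multiplied by 100, rounded to `digits` decimals."""
    return {
        category: round(100 * proportion, digits)
        for category, proportion in relative_frequencies(frequencies).items()
    }


unit_percents = percents(patients_by_unit)
for unit, pct in unit_percents.items():
    print(f"{unit:15s} {patients_by_unit[unit]:>6,d} {pct:6.2f}%")
```

The relative frequencies always sum to 1, which is a quick sanity check on any frequency table.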
DESCRIPTIVE STATISTICS: BAR CHART FOR CATEGORICAL VARIABLE
* When we want to draw attention to the frequency of each category of a categorical variable (as listed in the frequency distribution table), we use a bar chart.
* The height of the rectangle for a category is the frequency of that category, that is, the number of observations in it.
* There is no need for the bars to touch each other.
EXAMPLE 3 CTD.
Bar chart for patient data

Hospital Unit | Number of Patients (frequencies)
Cardiac Care | 1,052
Emergency | 2,245
Intensive Care | 340
Maternity | 552
Surgery | 4,630

[Figure: bar chart "Hospital Patients by Unit"; x-axis: Cardiac Care, Emergency, Intensive Care, Maternity, Surgery; y-axis: number of patients, 0 to 5000]
DESCRIPTIVE STATISTICS: PIE CHARTS FOR CATEGORICAL VARIABLE
* If the goal is to draw attention to the proportion of frequencies in each category of the frequency table, a pie chart is appropriate.
* The circle or pie represents the total.
* The pieces of the pie display shares of the total, frequencies, or percentages for each category of the categorical variable.
EXAMPLE 3 CTD.
Pie chart for patient data

Hospital Unit | Number of Patients | % of Total
Cardiac Care | 1,052 | 11.93
Emergency | 2,245 | 25.46
Intensive Care | 340 | 3.86
Maternity | 552 | 6.26
Surgery | 4,630 | 52.50

[Figure: pie chart "Hospital Patients by Unit"; slices: Cardiac Care 12%, Emergency 25%, Intensive Care 4%, Maternity 6%, Surgery 53%]
(Percentages are rounded to the nearest percent)
EXAMPLE 5 (Q 1.12 ON PAGE 14)
The supervisor of a plant obtained a random sample of employee experience (in months) and times to complete a task (in minutes). Graph the data with a component bar chart.

Experience / Time | Less Than 5 Minutes | 5 Minutes to Less Than 10 Minutes | 10 Minutes to Less Than 15 Minutes
Less than 3 months | 10 | 13 | 25
3 to < 6 months | 10 | 13 | 12
6 to < 9 months | 9 | 22 | 8
9 to < 12 months | 5 | 18 | 19
ANSWER TO EXAMPLE 5
[Figure: component (stacked) bar chart "Employee Performance"; x-axis: Experience (Less than 3 months, 3 to 6 months, 6 to 9 months, 9 to 12 months); y-axis: Number of Employees, 0 to 60; legend: <5 min, 5 to <10 min, 10 to <15 min]
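In a component bar chart each bar is split into segments, one per column of the cross table, stacked on top of each other. The sketch below computes where each segment starts and ends for the Example 5 counts. The helper name `stack_segments` and the stacking order (shortest time class at the bottom) are assumptions made for illustration.

```python
# Cross table from Example 5: rows are experience groups,
# columns are the task-time classes in the order listed below.
time_classes = ["<5 min", "5 to <10 min", "10 to <15 min"]
counts = {
    "Less than 3 months": [10, 13, 25],
    "3 to <6 months": [10, 13, 12],
    "6 to <9 months": [9, 22, 8],
    "9 to <12 months": [5, 18, 19],
}


def stack_segments(row):
    """Return (bottom, top) heights for each segment of one stacked bar."""
    segments = []
    bottom = 0
    for value in row:
        segments.append((bottom, bottom + value))
        bottom += value
    return segments


bar_totals = {group: sum(row) for group, row in counts.items()}
for group, row in counts.items():
    print(f"{group:20s} segments={stack_segments(row)} total={bar_totals[group]}")
```

The top of the last segment equals the bar's total height, so the largest total (48 here) determines how far the vertical axis must extend.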
At Home for extra practice:
* Read example 1.2 on page 10 (Cross table and component bar chart).
* Read example 1.3 on page 11 (Pie chart).
EXAMPLE 6
400 defective items are examined for cause of defect. Display the Pareto Diagram.

Source of Manufacturing Error | Number of Defects
Bad Weld | 34
Poor Alignment | 223
Missing Part | 25
Paint Flaw | 78
Electrical Short | 19
Cracked Case | 21
Total | 400
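A Pareto diagram is a bar chart whose categories are sorted from most to least frequent, usually with a cumulative percentage line drawn over the bars. The following Python sketch prepares the numbers behind such a diagram for the defect data; the function name `pareto_table` is a hypothetical helper, and the plotting step itself is left out.

```python
# Defect counts from Example 6.
defects = {
    "Bad Weld": 34,
    "Poor Alignment": 223,
    "Missing Part": 25,
    "Paint Flaw": 78,
    "Electrical Short": 19,
    "Cracked case": 21,
}


def pareto_table(frequencies):
    """Sort categories by descending frequency and add cumulative percent.

    Returns a list of (category, count, percent, cumulative_percent) tuples.
    """
    total = sum(frequencies.values())
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * count / total, 100 * running / total))
    return rows


pareto_rows = pareto_table(defects)
for category, count, pct, cum_pct in pareto_rows:
    print(f"{category:17s} {count:4d} {pct:6.2f}% {cum_pct:7.2f}%")
```

With these counts, the two most frequent causes (Poor Alignment and Paint Flaw) account for about three quarters of all defects, which is the kind of pattern a Pareto diagram is meant to expose.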
LINE CHART
EXAMPLE 7
[Figure: line chart "Number of Park Visitors by Year"; x-axis: year, 1996 to 2012; y-axis: Thousands of Visitors, up to 350]
REMINDER: CLASSIFICATION OF VARIABLES
Data
* Categorical
* Numerical
  * Discrete
  * Continuous
REMINDER: TABLES AND GRAPHS FOR CATEGORICAL VARIABLES
Categorical variables
Tabulating Data
* Frequency Distribution Table
* Cross Table
Graphing Data
* Bar Chart
* Component Bar Chart
* Pie Chart
* Pareto Diagram
DESCRIPTIVE STATISTICS: GRAPHS TO DESCRIBE NUMERICAL VARIABLES
Numerical Data
Frequency Distributions and Cumulative Distributions
* Histogram
* Ogive
EXAMPLE 8
A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature data:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Find the frequency distribution table for this variable.
ANSWER TO EXAMPLE 8 CTD.
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Interval | Frequency | Relative Frequency | Percentage
10 but less than 20 | 3 | .15 | 15
20 but less than 30 | 6 | .30 | 30
30 but less than 40 | 5 | .25 | 25
40 but less than 50 | 4 | .20 | 20
50 but less than 60 | 2 | .10 | 10
Total | 20 | 1.00 | 100
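The table above can be reproduced by assigning each observation to its interval and counting. This is a minimal sketch that takes the interval start (10), width (10), and number of classes (5) from the table; the function name `frequency_distribution` is illustrative.

```python
# Ordered array of daily high temperatures from Example 8.
temperatures = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
                32, 35, 37, 38, 41, 43, 44, 46, 53, 58]


def frequency_distribution(values, start, width, n_classes):
    """Count values in intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for value in values:
        index = (value - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the chosen intervals")
        counts[index] += 1
    return counts


frequencies = frequency_distribution(temperatures, start=10, width=10, n_classes=5)
n = len(temperatures)
for i, count in enumerate(frequencies):
    lower = 10 + 10 * i
    print(f"{lower} but less than {lower + 10}: "
          f"frequency={count}, relative={count / n:.2f}, percent={100 * count / n:.0f}")
```

Using half-open intervals ("10 but less than 20") guarantees that every observation lands in exactly one class.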
EXAMPLE 8 CTD.: PLOT THE HISTOGRAM FOR THE MANUFACTURER OF INSULATION

Interval | Frequency
10 but less than 20 | 3
20 but less than 30 | 6
30 but less than 40 | 5
40 but less than 50 | 4
50 but less than 60 | 2

[Figure: histogram "Daily High Temperature"; x-axis: Degrees; y-axis: Frequency, 0 to 7]
(No gaps between bars)
A REMINDER
* Frequency is the number of observations in each category.
* Relative frequency is obtained by dividing each frequency by the total number of observations.
* Percent is obtained by dividing each frequency by the total number of observations and multiplying the resulting proportion by 100%.
EXAMPLE 8 CTD.: MANUFACTURER OF INSULATION
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class | Frequency | Percentage | Cumulative Frequency | Cumulative Percentage
10 but less than 20 | 3 | 15 | 3 | 15
20 but less than 30 | 6 | 30 | 9 | 45
30 but less than 40 | 5 | 25 | 14 | 70
40 but less than 50 | 4 | 20 | 18 | 90
50 but less than 60 | 2 | 10 | 20 | 100
Total | 20 | 100 | |
EXAMPLE 8 CTD.: INSULATION MANUFACTURER
Plot the Ogive graph.

Interval | Upper Interval Endpoint | Cumulative Percentage
Less than 10 | 10 | 0
10 but less than 20 | 20 | 15
20 but less than 30 | 30 | 45
30 but less than 40 | 40 | 70
40 but less than 50 | 50 | 90
50 but less than 60 | 60 | 100

Class | Frequency | Percentage | Cumulative Frequency | Cumulative Percentage
Less than 10 | 0 | 0 | 0 | 0
10 but less than 20 | 3 | 15 | 3 | 15
20 but less than 30 | 6 | 30 | 9 | 45
30 but less than 40 | 5 | 25 | 14 | 70
40 but less than 50 | 4 | 20 | 18 | 90
50 but less than 60 | 2 | 10 | 20 | 100
Total | 20 | 100 | |

[Figure: ogive "Daily High Temperature"; x-axis: Upper Interval Endpoint; y-axis: Cumulative Percentage, 0 to 100]
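The cumulative columns and the ogive points follow from running sums of the class frequencies. The sketch below starts from the Example 8 frequencies and builds the (upper endpoint, cumulative percentage) pairs that the ogive connects. The variable names are illustrative, and the starting point (10, 0) corresponds to the "Less than 10" row.

```python
from itertools import accumulate

# Class frequencies for the intervals 10-20, 20-30, ..., 50-60 (Example 8).
frequencies = [3, 6, 5, 4, 2]
upper_endpoints = [20, 30, 40, 50, 60]

# Running totals of the frequencies.
cumulative_frequency = list(accumulate(frequencies))
total = cumulative_frequency[-1]

# Each cumulative frequency expressed as a percentage of all observations.
cumulative_percentage = [100 * c / total for c in cumulative_frequency]

# The ogive starts at 0% at the lower bound of the first class.
ogive_points = [(10, 0.0)] + list(zip(upper_endpoints, cumulative_percentage))

for endpoint, pct in ogive_points:
    print(f"less than {endpoint}: {pct:.0f}%")
```

Each point is plotted at the upper endpoint of its class because the cumulative percentage counts every observation below that value.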
EXAMPLE 9
Suppose we have the following data:
17, 62, 15, 65, 28, 51, 24, 65, 39, 41, 35, 15, 39, 32, 36, 37, 40, 21, 44, 37, 59, 13, 44, 56, 12, 54, 64, 59
* Construct a frequency distribution table.
* Construct a histogram.
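One possible way to approach Example 9 is to pick a class width first and then count. The sketch below assumes a common rule of thumb, width = (largest - smallest) / number of classes rounded up to a convenient multiple of 10, and assumes 6 classes; neither choice is prescribed by the slides, and other reasonable choices give a different but equally valid table. The text bars are only a stand-in for a real histogram.

```python
import math

data = [17, 62, 15, 65, 28, 51, 24, 65, 39, 41, 35, 15, 39, 32,
        36, 37, 40, 21, 44, 37, 59, 13, 44, 56, 12, 54, 64, 59]


def choose_classes(values, k, round_to=10):
    """Pick a class width and starting point (assumed rule of thumb).

    The raw width (max - min) / k is rounded up to a multiple of `round_to`,
    and the first class starts at a multiple of the width at or below min.
    """
    raw_width = (max(values) - min(values)) / k
    width = math.ceil(raw_width / round_to) * round_to
    start = math.floor(min(values) / width) * width
    return start, width


start, width = choose_classes(data, k=6)
n_classes = (max(data) - start) // width + 1

counts = [0] * n_classes
for value in data:
    counts[(value - start) // width] += 1

for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower} but less than {lower + width}: {count:2d} {'#' * count}")
```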
EXAMPLE 12: PLOT THE SCATTER DIAGRAM FOR THE TABLE.
Average SAT scores by state: 1998

State | Verbal | Math
Alabama | 562 | 558
Alaska | 521 | 520
Arizona | 525 | 528
Arkansas | 568 | 555
California | 497 | 516
Colorado | 581 | 542
Connecticut | 510 | 509
Delaware | 501 | 493
D.C. | 488 | 476
Florida | 500 | 501
Georgia | 486 | 482
Hawaii | 483 | 513
... | ... | ...
W.Va. | 525 | [?]13
Wis. | 581 | 594
Wyo. | 548 | 546

[Figure: scatter plot "Average SAT Math vs. Verbal Scores by State"; x-axis: SAT Verbal Score, 450 to 650; y-axis: SAT Math Score]
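A scatter diagram plots one point per case, with one numerical variable on each axis. The following sketch pairs the verbal and math scores listed above into (x, y) points and finds the range each axis has to cover. It uses only the rows whose values are fully legible in this copy, so W.Va. is left out; the variable names are illustrative and no plotting library is assumed.

```python
# (state, verbal, math) for the rows that are fully legible in the table.
sat_scores = [
    ("Alabama", 562, 558), ("Alaska", 521, 520), ("Arizona", 525, 528),
    ("Arkansas", 568, 555), ("California", 497, 516), ("Colorado", 581, 542),
    ("Connecticut", 510, 509), ("Delaware", 501, 493), ("D.C.", 488, 476),
    ("Florida", 500, 501), ("Georgia", 486, 482), ("Hawaii", 483, 513),
    ("Wis.", 581, 594), ("Wyo.", 548, 546),
]

# Each state becomes one point: x = verbal score, y = math score.
points = [(verbal, math_score) for _, verbal, math_score in sat_scores]

# Axis ranges must cover the smallest and largest value on each variable.
x_range = (min(x for x, _ in points), max(x for x, _ in points))
y_range = (min(y for _, y in points), max(y for _, y in points))

print("points:", points)
print("verbal (x) range:", x_range)
print("math (y) range:", y_range)
```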