Faculty of Computer Science and Engineering
Course: CS AI
Subject: Statistics
Date: Dec 16, 2024
Uploaded by BaronMagpie4236
3- Statistical Charts

So far, we have displayed data as averages and counts. Now let's look at some other statistical parameters and how to illustrate them graphically. We have not yet shown anything about variance. I think the first thing to look at beyond averages is variance, that is, how the data are distributed. A good way to look at the distribution of data, especially for a continuous variable, is a histogram. If you want to display more than the average, such as the median and the quartiles, a box plot is a better choice.

Using the teaching evaluation data, we have plotted a histogram of teaching evaluation scores. The mean score is around 4, and scores near the mean are the most frequent. A few instructors received very low scores, and some received fairly high ones. The histogram roughly approximates a normal distribution curve. The mean is 3.99 (about 4) and the standard deviation is 0.55, computed over 463 records. This gives you a good idea of how your data are distributed.

You can also plot multiple histograms to compare subgroups. Here, the histograms for male and female instructors are overlaid. Low evaluation scores occur more often for female instructors, which is likely to pull down the average evaluation score for females relative to males.

A box plot summarizes a distribution as follows. The thick line inside the box represents the median. The top edge of the box is the third quartile (Q3), and the bottom edge is the first quartile (Q1).
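The summary statistics and overlaid histograms described above can be sketched in Python as follows. This is a minimal sketch, not the course's own code: the scores are a small hypothetical sample (not the 463-record teaching evaluation data), and using NumPy's percentile function together with matplotlib's hist on shared bins is an assumption about one reasonable way to do it. The sketch computes the mean, sample standard deviation, median, and quartiles, then overlays one histogram per group.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical evaluation scores on a 1-5 scale (NOT the course dataset)
female = np.array([2.3, 3.1, 3.4, 3.6, 3.8, 3.9, 4.0, 4.1, 4.3, 4.6])
male = np.array([3.0, 3.5, 3.8, 3.9, 4.0, 4.1, 4.2, 4.4, 4.6, 4.9])
scores = np.concatenate([female, male])

def summarize(x):
    """Return the mean, sample standard deviation, and the quartiles a box plot draws."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "n": x.size,
        "mean": x.mean(),
        "std": x.std(ddof=1),  # sample standard deviation (same default as pandas)
        "q1": q1,
        "median": median,
        "q3": q3,
    }

stats = summarize(scores)
print({k: round(float(v), 2) for k, v in stats.items()})

# Overlaid histograms: shared bin edges plus transparency keep the groups comparable
bins = np.linspace(1, 5, 17)  # 16 bins of width 0.25
fig, ax = plt.subplots()
ax.hist(female, bins=bins, alpha=0.5, label="female")
ax.hist(male, bins=bins, alpha=0.5, label="male")
ax.set_xlabel("teaching evaluation score")
ax.set_ylabel("count")
ax.legend()
plt.show()
```

Sharing one set of bin edges across both groups keeps the bars aligned; with independently chosen bins, the two overlaid histograms would be much harder to compare.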
In a box plot, the line (whisker) at the bottom marks the minimum value and the line at the top marks the maximum value. The range between the first and third quartiles is called the interquartile range (IQR). Note that in seaborn's default style the whiskers extend only to the most extreme data points within 1.5 times the IQR of the box, and any points beyond that are drawn individually as outliers.

In this graphic, we have created box plots for the age variable. The median age of male instructors is higher than that of female instructors, and the maximum age of male instructors is also higher than that of female instructors.

To do this in Python, we use the boxplot function from the seaborn library. We put gender on the x-axis and instructor age on the y-axis, which produces vertical box plots. Swapping the x and y arguments gives horizontal box plots; for readability, I prefer vertical ones.

We can also add another dimension. Here we add tenure: untenured instructors are plotted on the left and tenured instructors on the right, with blue representing female instructors and orange representing male instructors. This shows differences between male and female instructors: male tenured instructors are older than male untenured instructors, whereas female tenured instructors are younger than female untenured instructors. To do this in Python, put tenure on the x-axis and pass gender to the hue argument of the boxplot function.

A pie chart is another way of looking at your data. In this graphic, the number of courses taught by male instructors is larger than the number taught by female instructors.

To do this in Python, we use the matplotlib library. First, specify the labels, then get the number of courses taught by male and by female instructors and assign these counts to a sizes variable. Create a subplot, pass the sizes, the labels, and a percentage format with one decimal place to the pie function, and display the pie chart with the show function.
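The Python steps above (a vertical box plot, a box plot with a hue dimension, and a pie chart) can be sketched as follows. This is a hedged illustration rather than the lecture's exact code: the DataFrame and its column names gender, tenure, and age are hypothetical stand-ins for the teaching evaluation data, and using pandas value_counts to build the sizes variable is an assumption.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical records, one row per course (NOT the course dataset)
df = pd.DataFrame({
    "gender": ["female", "male", "male", "female", "male", "male", "female", "male"],
    "tenure": ["yes", "no", "yes", "no", "yes", "no", "yes", "yes"],
    "age":    [38, 45, 62, 51, 58, 40, 44, 66],
})

# 1) Vertical box plots: categorical variable on x, numeric variable on y
fig1, ax1 = plt.subplots()
sns.boxplot(x="gender", y="age", data=df, ax=ax1)

# 2) Extra dimension: tenure on x, gender encoded by colour through hue
fig2, ax2 = plt.subplots()
sns.boxplot(x="tenure", y="age", hue="gender", data=df, ax=ax2)

# 3) Pie chart of the number of courses taught by each gender
counts = df["gender"].value_counts()  # sorted from most to least frequent
labels = counts.index.tolist()
sizes = counts.values.tolist()
fig3, ax3 = plt.subplots()
wedges, texts, autotexts = ax3.pie(sizes, labels=labels, autopct="%1.1f%%")
ax3.axis("equal")  # keep the pie circular
plt.show()
```

The autopct string "%1.1f%%" formats each wedge's share with one decimal place; the doubled %% prints a literal percent sign.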