Master Information Analytics: Key Resources & Study Guide
School
Nanyang Technological University
Course
CS 2400
Subject
Communications
Date
Dec 9, 2024
Wee Kim Wee School of Communication and Information
Foundations of Information Analytics (CS2400)
Academic Year 2024/25 (Semester 1)

General Information
Lee Chu Keong
WKWSCI #02-02 or #04-08
ascklee@ntu.edu.sg
67904715

Lecture Time and Venue
Monday: 4:30 P.M. to 6:00 P.M. (through Microsoft Teams)
Tuesday: 4:30 P.M. to 6:00 P.M. (through Microsoft Teams)

Tutorial Time
Wednesday: 2:30 P.M. to 5:30 P.M. (face-to-face)

“Main” Textbook
None. Statistics textbooks are typically very expensive (> USD100 from Amazon.com), and so I have distilled the most important parts (the examples and the questions) of twelve statistics textbooks into a study guide (a free ebook in PDF format, see below). Print only if necessary.

CS2400 Resources
These are the most important resources for CS2400:

The Study Guide (8th Edition) contains a large collection of questions that are relevant to this course. Working through the questions will help you to gain mastery of the concepts taught in class. The Study Guide focusses on mechanique.

The Compilation of Articles (a folder on NTULearn) contains articles that I refer to during the lectures. To understand how to apply statistics, real-world applications are critical, and the articles are a rich source of details on how the concepts we learn come alive in the world we live in. Many of the articles have to be read several times for the contents to sink in, but it is well worth the effort. The Compilation of Articles focusses on technique.

The New Cambridge Statistical Tables (“NCST”, 2nd ed.) is a compilation of statistical tables that we will be using for the tests of hypothesis we conduct. As the NCST will be used during the examinations, you are encouraged to familiarise yourself with the important tables. A PDF version of the NCST has been uploaded to NTULearn under the folder “Statistical Tables”. (CMIL: QA276.25.L746)

Reference Books
Plenty. I will be assigning readings and questions from the following books. The assigned portions will be scanned and uploaded to NTULearn.

Agresti, A., & Franklin, C. (2013). Statistics: The Art and Science of Learning from Data (3rd ed.). Boston: Pearson Education. [NIE Library: QA276.12 Agr 2013]
Burleson, D.R. (1980). Elementary Statistics. Cambridge, MA: Winthrop Publishers. [NIE Library: QA 276.12 Bur]
Christmann, E.P. (2012). Beyond the Numbers: Making Sense of Statistics. Arlington, VA: NSTA Press. [NTU Library: e-book]
Crilly, T. (2007). 50 Mathematical Ideas You Need to Know. London: Quercus Publishing. (Chapter 34 – Distributions, Chapter 35 – The Normal Curve, and Chapter 36 – Connecting Data only)
Dancey, C.P., & Reidy, J. (2002). Statistics Without Maths for Psychology (2nd ed.). Harlow, England: Pearson Education.
Folks, J.L. (1981). Ideas of Statistics. New York: John Wiley & Sons. (specifically, Chapter 1: Political Arithmetic) [NIE: HA29 Lev]
Johnson, R., & Kuby, P. (2012). Elementary Statistics (11th ed.). New York: Brooks/Cole.
Johnson, S. (2006). The Ghost Map: The Story of London’s Most Terrifying Epidemic and How It Changed Science, Cities, and the Modern World. New York: Riverhead Books.
Kennedy, G. (1983). Invitation to Statistics. Oxford, England: Martin Robertson. (specifically, Chapter 2: From Counting to Statistics) [NIE Library: QA276.12 Ken]
Krieg, E.J. (2014). Statistics and Data Analysis for Social Science. Pearson.
Lane, D.M., Scott, D., Hebl, M., Guerra, R., Osherson, D., & Zimmer, H. (n.d.). Introduction to Statistics. Download PDF version from https://onlinestatbook.com/.
Levin, J. (1977). Elementary Statistics in Social Research (2nd ed.). New York: Harper & Row. [NIE Library: HA29 Lev]
Levin, R.I. (1984). Statistics for Management (3rd ed.). New York: Prentice-Hall.
Levitin, D.J. (2016). A Field Guide to Lies: Critical Thinking in the Information Age. New York: Dutton. [NTU Library: BC177 L666]
Montaña, R.A., & Bantilan, M.M. (2009). Introduction to College Algebra. Manila, Philippines: Rex Book Store.
Montgomery, D.C., Runger, G.C., & Hubele, N.F. (2001). Engineering Statistics (2nd ed.). New York: John Wiley & Sons.
Rugg, G. (2007). Using Statistics: A Gentle Introduction. Berkshire, England: Open University Press. [NTU Library: e-book]
Salkind, N.J. (2015). 100 Questions (and Answers) About Statistics. Los Angeles, CA: Sage.
Spiegel, M.R., & Constable, R.L. (1992). Schaum’s Outline of Theory and Problems of Statistics (2nd ed. in SI Units). London: McGraw-Hill. [NTU Library: HA29.S755 1992]
Steinberg, W.J. (2011). Statistics Alive! (2nd ed.). Thousand Oaks, CA: Sage. [NTU Library: HA29 S819 2011]
Stephens-Davidowitz, S. (2017). Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are. New York: HarperCollins. [NTU Library: QA76.9.D343S832]
Stroud, K.A., & Booth, D.J. (2001). Engineering Mathematics. London: Palgrave.
Triola, M.F. (2008). Elementary Statistics with Multimedia Study Guide (10th ed.). New York: Pearson.
Utts, J.M., & Heckard, R.F. (2015). Mind on Statistics (5th ed.). Stamford, CT: Cengage Learning.
Washington, A.J. (2010). Basic Technical Mathematics with Calculus (9th ed.). Toronto, Canada: Pearson.

Articles
Brasseur, L. (2009). Florence Nightingale’s Visual Rhetoric in the Rose Diagrams. Technical Communication Quarterly, 14(2), 161-182.
Donoho, D. (2017). 50 Years of Data Science. Journal of Computational and Graphical Statistics, 26(4), 745-766.
Ferguson, D. (2013). How supermarkets get your data – and what they do with it. The Guardian.
Gurin, J. (2014). Opening Business Innovation with Open Data. Business Horizon, 12, 42-49.
Hess, A. (May 14, 2017). Open Secrets. The New York Times Magazine. New York: New York Times.

Think before you speak. Read before you think.
Fran Lebowitz

Khan, M.A., Uddin, M.F., & Gupta, N. (2014). Seven V’s of Big Data: Understanding Big Data to Extract Value. Conference of the American Society for Engineering Education.
Loukides, M. (2010). What is Data Science? Sebastopol, CA: O’Reilly.
Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1-23. (Sections 1 to 3 only)

Videos
The Best Stats You've Ever Seen (A TED Talk by Hans Rosling) (https://www.youtube.com/watch?v=hVimVzgtD6w)
The Joy of Stats. By Hans Rosling (http://www.gapminder.org/videos/the-joy-of-stats/)
Trash Trail: Episode 5 (Data) (https://www.channelnewsasia.com/news/video-on-demand/trash-trail/data-7824318)

Reports on Big Data
Hewlett-Packard Development Company. (2015). The Disruptive Power of Big Data.
The Networked Software and Services Initiative (NESSI). (2012). Big Data: A New World of Opportunities.

Websites
University of Glasgow’s Statistics Glossary (http://www.stats.gla.ac.uk/steps/glossary/index.html)
NIST/SEMATECH e-Handbook of Statistical Methods (http://www.itl.nist.gov/div898/handbook/)

Torture numbers, and they will confess to anything.
Gregg Easterbrook

Course Description
Today, many organisations generate, and collect (“harvest”), unimaginable quantities of data of all types. However, merely collecting lots of data is pointless. The critical step is to analyse the data so that it can be transformed into information and action. The key idea is to transform data in such a way that it can be used for business advantage. An important tool that enables this transformation is statistics. This is the subject matter of this course. Statistics will be presented in a mathematically friendly and non-threatening manner. The course emphasises conceptual understanding of the material rather than the exact keystrokes needed to accomplish specific statistical tests.

Course Objectives
• To sensitise you to the fact that data is all around us
• To start you thinking about the opportunities for transforming data into action (new products / processes / …)
• To lay the statistical foundations for data analytics so that you can transform the data into actionable information

NTULearn Website
All course materials will be uploaded to the NTULearn website (http://ntulearn.ntu.edu.sg/).

Lecture and Work Schedule
Week Topic
1    Introduction to Data Analytics
       Data vs Information
       Personal Data
       Sensitive Personal Data
       Relational Data
       Big Data (The Three-Plus-Five Vs)
       Open Data
       Data Exhaust
       Data Products
2    Statistics
       Counting vs Statistics
       Etymology of the Word Statistics
       Variables
       Types of Variables
         Continuous vs Discrete Variables
         Nominal vs Ordinal Variables
         Independent vs Dependent Variables
     Exploratory Data Analysis
       Frequency Polygons
       Stem-and-Leaf Plots
       Histograms
         Lower Class Limit (LCL) and Upper Class Limit (UCL)
         Class Width
         Class Midpoint
         Equal-width vs Unequal-width Histograms
         Frequency Density
       Bar Charts
       Scattergrams / Scatterplots
       Pie Charts
       The Invention of the Rose / Polar Diagrams by Florence Nightingale
       Specialised Diagrams or Charts
         Pareto Charts
         Population Pyramids
         Triangular Graphs
         Chernoff Faces
         Digidot Plots
3    Describing the Data You Find
       Measures of Central Tendency
         Mean
           Arithmetic Mean (and Microsoft Excel’s AVERAGE() Function)
           Harmonic Mean (and Microsoft Excel’s HARMEAN() Function)
           Geometric Mean (and Microsoft Excel’s GEOMEAN() Function)
           Contraharmonic Mean
           Quadratic Mean
           Cubic Mean
         Median (and Microsoft Excel’s MEDIAN() Function)
         Mode
           Uniqueness of the Mode as a Measure of Central Tendency
         Understanding the Sigma (Σ) Notation (and Microsoft Excel’s SUM() and COUNT() Functions)
       Measures of Spread
         Range
         Interquartile Range
         Semi-Interquartile Range
         Variance
           Population Variance (and Microsoft Excel’s VAR.P() Function)
           Sample Variance (and Microsoft Excel’s VAR.S() Function)
         Standard Deviation
           Population Standard Deviation (and Microsoft Excel’s STDEV.P() Function)
           Sample Standard Deviation (and Microsoft Excel’s STDEV.S() Function)
       Measures of Shape
         Skewness (and Microsoft Excel’s SKEW() Function)
         Kurtosis (and Microsoft Excel’s KURT() Function)
4    Probability
       Statistical Experiments
       Sample Space
       Tree Diagram
       Complement Rule
       Conditional Probability
     The Normal Distribution
       Statistical Distributions
       The Process of Normalisation
       Characteristics of the Normal Distribution
       Statistical Tables (specifically, The New Cambridge Statistical Tables, “NCST”)
5    Statistical Hypothesis Testing
       The Logic of Hypothesis Testing
       The [Five/Six/Seven] Steps Involved
       The Acceptance Region
       The Rejection (or Critical) Region
     The Z-Tests
       One-Sample Z-Test
       Two-Sample Z-Test
6    The T-Tests
       Characteristics of the t-Distribution
       Degrees of Freedom (ν or df)
       One-Sample t-Test
       Two-Sample t-Test
         Case I: Equal Variances Can Be Assumed
         Case II: Equal Variances Cannot Be Assumed
       Paired t-Test
7    Five Miscellaneous Topics
       Coefficient of Variation
       Index of Qualitative Variation
       Mean Absolute Deviation
       Chebyshev’s Theorem
       Z-test of Proportions
         Case I: One-Sample Test
         Case II: Two-Sample Test
     Applications of Statistics in Research
     Recess Week (30 September to 4 October 2024)
8    Correlation
       Calculating the Summary Statistics (n, x̄, ȳ, ∑x, ∑y, ∑x², ∑y², ∑xy)
       Calculating Sxx, Syy, and Sxy
       Pearson’s Product Moment Correlation Coefficient (PMCC)
       Spearman’s Rank-Order Correlation Coefficient (ρ)
       Kendall’s Rank Correlation Coefficient (τ)
9    Simple Linear Regression
       Calculating the Regression Coefficients (b and a)
       Understanding the Difference Between Interpolation and Extrapolation
       Calculating the Residual (ϵ_i)
       Coefficient of Determination (r²)
       Coefficient of Non-determination (1 − r²)
       Calculating Leverage (h_i)
       Using the LINEST() Microsoft Excel array function
10   Multiple Regression
       The Necessity of Multiple Regression
       Formulating the Standard Equations
         (i) in the equation format
         (ii) in the matrix format
       Solving the Standard Equations Using Cramer’s Rule
       Calculating the Partial Regression Coefficients
     Polynomial Regression
       Formulating the Standard Equations
         (i) in the equation format
         (ii) in the matrix format
       Solving the Standard Equations Using Cramer’s Rule
       Calculating the Partial Regression Coefficients
11   Characteristics of the Chi-Square Distribution
       Degrees of Freedom (ν or df)
     Chi-Square Analysis
       Chi-Square (χ²) Goodness-of-Fit Test
       Chi-Square (χ²) Test of Independence
12   Non-parametric Statistical Tests
       For two groups:
         Mann-Whitney U Test
         Wilcoxon Test
       For three or more groups:
         Kruskal-Wallis H Test
         Friedman Analysis of Variance by Ranks
13   Concluding Tests
       Runs Test
       Sign Test
       Hypothesis Test for Correlation
     Correlation Coefficients
       Phi (ϕ) Coefficient
       Point Biserial Coefficient (r_pb)
     Analysis of Variance (ANOVA): removed from syllabus.

Assessment Components
Component              Deadline / Date (Day)                                   Weightage
Mid-semester Test      7 October 2024 (Monday)                                 20%
Group Assignment       4 November 2024 (Monday)                                15%
Individual Assignment  11 November 2024 (Monday)                               15%
Final Examination      To be announced by the NTU Office of Academic Services  50%
TOTAL                                                                          100%

For reference, this is the NTU Academic Calendar for this semester:
[Figure: NTU Academic Calendar]

Makeup Lectures
There are no public holidays on Monday and Tuesday this semester.
As I am also the Assistant Chair for Lifelong Education and International Relations, I will be representing the school in several conferences, educational fairs and roadshows, a few of which affect the lectures or tutorials. I will arrange makeups when the dates of the other conferences/meetings are confirmed.

Policy on Plagiarism
All work presented in this class must be the product of your own effort. Your work should not be copied without appropriate citation from any source, including the Internet. Any student caught presenting work which is not his or her own will face disciplinary action, which may include award of zero marks for the assignment, receiving a failing grade for the class, or being expelled from the university. This policy applies to all work submitted, either through oral presentation, or written work, including outlines, briefings, group projects, self-evaluations, etc. You are encouraged to consult me if you have questions concerning the meaning of plagiarism or whether a particular use of sources constitutes plagiarism. Details on academic integrity can be found from http://www.ntu.edu.sg/ai/Pages/index.aspx.

Accessibility and Wellbeing Statement
WKWSCI faculty are committed to creating learning environments that meet the needs of our diverse student body. If you encounter barriers in this class, please let me know. I am happy to consider creative solutions as long as they maintain the intent of the assessment and learning activity. If you have a disability, or think you may have a disability, you may also want to contact NTU’s Accessible Education Unit (email: aeu@ntu.edu.sg). Even if you are unsure about what you might need, AEU is available to discuss and provide support, both academic and non-academic (even assistance with funding for devices or services).
Please be aware that filing a request for official accommodations is necessary if you need accommodations for exams.

Students may also experience a variety of stressors that impact their academic experience and personal well-being (academic pressure, challenges associated with relationships, mental health, alcohol or other drugs, identities, finances, etc.). Seeking help is a courageous thing to do for yourself and those who care about you. If this class is the source of your stressors, please contact me so that we can find reasonable solutions together. If you are experiencing personal issues or have concerns about overall academics, please contact any one of the following well-being resources:

WKWSCI’s AC for Student Life, Nikki Draper (tnldraper@ntu.edu.sg)
WKWSCI’s Student Care Manager, Ms Sumitra (cs-sumitra@ntu.edu.sg)
NTU’s University Counselling Centre

Your conversations with members of the well-being team are confidential and will not be shared with others without your permission. If you are in urgent need of assistance, NTU’s crisis line is 6790 4462 / 6904 7041 (after office hours).