Institute: Faculty of Technology
Program: Bachelor of Technology (Computer Engineering)
Semester: 5
Course Title: Data Science Essentials
Course Code: 01CE0517
Course Credits: 3

Pre-requisite of course: NA

Objective:
1. To provide a strong foundation for data science and its related application areas, and to understand the underlying core concepts and emerging technologies in data science.

Course Outcomes: After completion of this course, students will be able to:
1. Explore the fundamental concepts of data science.
2. Analyse data processing techniques for applications handling large data.
3. Understand the concepts of statistical and exploratory data analysis.
4. Understand various machine learning algorithms used in the data science process.
5. Apply ethical frameworks to help them analyse ethical challenges.

Teaching and Examination Scheme:
Theory Hours: 2
Tutorial Hours: 0
Practical Hours: 2
ESE: 50
IA: 30
CSE: 20
Viva: 25
Term Work: 25

Contents:

Unit 1: Introduction to data science (Contact Hours: 4)
Overview of data science and its applications, Emergence of data science, Outlining the core competencies of a data scientist and data science, Linking data science with big data and AI, Data science workflow and process, Role of Python in data science, Tool for Data Science.

Unit 2: Data Acquisition and Management (Contact Hours: 8)
Introduction to different data formats (structured, unstructured, semi-structured), Overview of data acquisition techniques (surveys, web scraping, APIs), Data cleaning techniques to handle missing data and outliers, Data preprocessing: Issues in high-dimensional data, Dimensionality reduction and feature subset selection.

Digitally signed by (Name of HOD)
Digitally signed by (Name of Dean/ Principal)
Contents (continued):

Unit 3: Data analysis (Contact Hours: 10)
Exploratory Data Analysis: Introduction, Exploring relationships and patterns in data, Feature engineering and selection, Predictive vs. descriptive analytics.
Statistics for Data Analysis: Descriptive statistics (measures of central tendency and variability) and data summarization, Central Limit Theorem, Sampling distribution.

Unit 4: Machine Learning for data science (Contact Hours: 10)
Definition, Types of learning, Evaluation and performance measures, Overfitting and underfitting, Linear Regression: Model, Cost function, Gradient descent, Simplifying models through regularization, Logistic Regression, Naive Bayes, Decision Tree.

Unit 5: Ethical Issues in Data Science (Contact Hours: 4)
Privacy and Data Protection: Overview of privacy concerns in data science, Ethical considerations in data collection and usage.
Bias and Fairness in Data Science: Fairness considerations in machine learning models, Techniques for measuring and mitigating bias in data science.
Ethical Decision-making in Data Science: Frameworks and principles, Ethical dilemmas, Ethical guidelines for next-generation data scientists.

Total Hours: 36

Suggested List of Experiments:

Practical-1 (Contact Hours: 2)
a. Hands-on practical on Jupyter Notebook and Google Colab.
b. Explore and import the features of various packages.

Practical-2 (Contact Hours: 2)
Working with NumPy.

Practical-3 (Contact Hours: 2)
Working with Pandas.

Practical-4 (Contact Hours: 2)
Hands-on practical to clean noisy data using the following techniques:
i. Dropping
ii. Mean
iii. Median
iv. Mode

Practical-5 (Contact Hours: 2)
Hands-on practical with data preprocessing techniques:
a. Handling categorical data
i. Label Encoding
ii. Dummy Encoding
iii. One-hot Encoding

Practical-6 (Contact Hours: 2)
Hands-on practical for feature scaling on a real-world dataset:
a. Normalization
b. Standardization

Practical-7 (Contact Hours: 2)
Implement measures of central tendency and variability on a diabetes dataset to learn and apply statistical analysis.
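As a rough illustration of the techniques named in Practical-4 and Practical-6, the sketch below fills missing values by dropping, mean, median, or mode, and then applies min-max normalization and standardization. It uses only the Python standard library so that the arithmetic stays visible; the `ages` list and the helper names `impute`, `min_max_normalize`, and `standardize` are hypothetical and not part of the syllabus. In the lab sessions, Pandas or scikit-learn would more likely be used for the same steps.

```python
import statistics


def impute(values, strategy="mean"):
    """Handle missing entries (None) by dropping them or filling with a statistic."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present
    # Pick the statistic that will replace every missing entry.
    fill = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](present)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values to the range [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical column with one missing reading.
ages = [22, None, 30, 30, 50]

cleaned = impute(ages, "median")
print(cleaned)
print(min_max_normalize(cleaned))
print(standardize(cleaned))
```

Mean imputation is sensitive to outliers, which is one reason the practical also lists median and mode as alternatives.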
Suggested Theory Distribution:
The suggested theory distribution as per Bloom’s taxonomy is given in the table "Distribution of Theory for course delivery and evaluation". This distribution serves as a guideline for teachers and students to achieve an effective teaching-learning process.

References:
1. Data Science for Dummies, Lillian Pierson, Wiley Publication, 2021
2. Practical Statistics for Data Scientists, Peter Bruce, Andrew Bruce and Peter Gedeck, O’Reilly Publication, 2017
3. Head First Statistics, Dawn Griffiths, O’Reilly Publication, 2008
4. Machine Learning for Absolute Beginners, Oliver Theobald, Scatterplot Press, 2017
5. Python for Data Analysis, Wes McKinney, O’Reilly Publication, 2017

Suggested List of Experiments (continued):

Practical-8 (Contact Hours: 2)
Perform Exploratory Data Analysis (EDA) on a student dataset to analyse student performance.

Practical-9 (Contact Hours: 2)
Hands-on practical with the sklearn package to build a linear regression model on an estate dataset and evaluate it.

Practical-10 (Contact Hours: 2)
Apply the Logistic Regression algorithm on a cancer dataset and perform diagnostic classification.

Practical-11 (Contact Hours: 2)
Apply the Decision Tree algorithm on a weather forecasting dataset to predict humidity and evaluate model performance using accuracy score and mean square error.

Practical-12 (Contact Hours: 2)
Write a Python script:
a. Implement a Naïve Bayes classification model on a real-world dataset.
b. Evaluate model performance using RMSE.

Practical-13 (Contact Hours: 2)
Implement the Support Vector Machine (SVM) algorithm on an insurance dataset for classification tasks.

Practical-14 (Contact Hours: 2)
Conduct a case study to analyse and explore ethical issues in the field of data science.

Total Hours: 28

Textbook:
1. Data Science from Scratch: First Principles with Python, Joel Grus, O’Reilly Publication, 2019
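The linear regression topic (model, cost function, gradient descent) and the evaluation measures named in the experiments (accuracy score, mean square error, RMSE) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the sklearn workflow that Practical-9 calls for: the toy data, the learning rate, the epoch count, and the function names `fit_linear_gd`, `mse`, `rmse`, and `accuracy` are all hypothetical choices made for the example.

```python
import math


def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE cost with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(y_true, y_pred):
    """Mean squared error, for numeric predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def accuracy(labels_true, labels_pred):
    """Fraction of exactly matching class labels, for classification."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


# Hypothetical toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_gd(xs, ys)
predictions = [w * x + b for x in xs]
print(w, b, rmse(ys, predictions))
```

Accuracy compares discrete class labels, while MSE and RMSE measure the size of numeric errors, so the choice of measure depends on whether the model outputs classes or continuous values.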
Supplementary Resources:
1. https://www.coursera.org/programs/milap-faculty-program-mm3kt/browse?collectionId=&productId=_Fk2Gi3cEeiHghIydZ_0lA&productType=s12n&query=data+science&showMiniModal=true&source=search

Instructional Method:
1. The course delivery method will depend upon the requirements of the content and the needs of the students. In addition to conventional blackboard teaching, the teacher may also use tools such as demonstration, role play, quizzes, brainstorming, and MOOCs.
2. Internal evaluation will be based on continuous evaluation of students in the laboratory and classroom.
3. A practical examination will be conducted at the end of the semester to evaluate the performance of students in the laboratory.
4. Students will use supplementary resources such as online videos, NPTEL videos, e-courses, and Virtual Laboratory.

Distribution of Theory for course delivery and evaluation:
Remember / Knowledge: 10.00
Understand: 20.00
Apply: 40.00
Analyze: 30.00
Evaluate: 0.00
Higher order Thinking: 0.00