Kenji Moss
Lab 1.10 Amazon SageMaker
ECPI: CIS 335
09/05/2024

Questions:

1. Explain what the commands appear to be doing in this cell.
   a. The code in this cell imports the pandas library, sets display options for rows, columns, and display width so that a large dataset can be viewed, and reads a CSV file (imports-85.csv) into a DataFrame (df_car). It also assigns column names to the dataset and displays the first 5 rows with the head() function. The info() method then shows the dataset structure, including column names, non-null counts, and data types.

2. Why is it necessary to add the col_names variable to this cell? What would be the result if it were not included?
   a. The col_names variable defines custom column names for the dataset. Without col_names, the DataFrame would have no meaningful headers: pandas would fall back to default integer labels (0, 1, 2, etc.) when header=None is set, or otherwise treat the first row of data as the column names. Either way, the data would be less readable and harder to manipulate. Using col_names improves clarity and usability in subsequent analyses.

3. What column name and type identifies data set elements that are text-based?
   a. Based on the output, columns such as aspiration, num-of-doors, drive-wheels, and num-of-cylinders are text-based and have the data type object. These columns hold categorical data, which needs to be encoded or transformed before being used in machine learning models.

4. What kind of values are found in the num-of-doors and num-of-cylinders columns? What actions should be performed on these values, and why?
   a. The num-of-doors column contains values like "two" and "four," while the num-of-cylinders column has values like "four," "six," and "eight." Since these values are stored as text but represent numeric quantities, they should be converted to numeric values for easier processing. This is done using the replace() method with a mapping dictionary, transforming values like "two" into 2 and "four" into 4.

5. Which cell numbers in the Step 2 section associate the two columns in the previous question with their new format? What are these assignments called, and what Python function are they associated with?
   a. The columns num-of-doors and num-of-cylinders are converted into their numeric counterparts in cell Out[8] using the replace() function, as shown by the mappings door_mapper and cylinder_mapper. This process is called categorical encoding, specifically label encoding, where categories are replaced by numerical values. The function used is replace() from pandas.

6. Why did the drive-wheels attributes need to be assigned binary values instead of numeric values? What might have happened had they not been?
   a. The drive-wheels attribute was encoded using one-hot encoding (pd.get_dummies()), creating binary columns for drive-wheels_4wd, drive-wheels_fwd, and drive-wheels_rwd. This was necessary because drive-wheels is a categorical variable without any inherent order or numeric value. If numeric values were assigned instead (e.g., 4, 1, 0), it might introduce unintended ordinal relationships, leading the model to incorrectly interpret one type of drive-wheel as greater or lesser than another, impacting the model's accuracy.
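
The three preprocessing steps discussed in the answers above (naming columns on load, mapping number words to integers, and one-hot encoding drive-wheels) can be sketched together as follows. This is a minimal illustration rather than the lab notebook's actual code: the sample rows are made up, only four of the dataset's columns are used, the contents of door_mapper and cylinder_mapper are assumed from the values named in the answers, and the explicit astype(int) step is an addition to keep the resulting dtype predictable across pandas versions.

```python
import io

import pandas as pd

# Hypothetical stand-in for imports-85.csv: made-up rows, no header line,
# and only four illustrative columns (not the real file's contents).
SAMPLE_CSV = """audi,four,fwd,four
bmw,two,rwd,six
subaru,four,4wd,four
mercedes-benz,four,rwd,eight
"""

col_names = ["make", "num-of-doors", "drive-wheels", "num-of-cylinders"]

# With header=None and no names, pandas falls back to integer column labels.
df_default = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None)

# Passing names= attaches meaningful labels instead.
df_car = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None, names=col_names)

# Map number words to integers, one mapping dictionary per column.
# The mapper contents here are assumed for illustration.
door_mapper = {"two": 2, "four": 4}
cylinder_mapper = {"four": 4, "six": 6, "eight": 8}
df_car = df_car.replace(
    {"num-of-doors": door_mapper, "num-of-cylinders": cylinder_mapper}
)

# Make the integer dtype explicit instead of relying on replace() to infer it.
df_car = df_car.astype({"num-of-doors": int, "num-of-cylinders": int})

# One-hot encode drive-wheels: one 0/1 column per category, so no ordering
# between 4wd, fwd, and rwd is implied.
df_encoded = pd.get_dummies(df_car, columns=["drive-wheels"], dtype=int)

print(list(df_default.columns))
print(df_encoded)
```

Each row of df_encoded has exactly one of the three drive-wheels columns set to 1, which is what keeps a model from reading the categories as ranked quantities.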