Understanding Ranking Classifiers and ROC Curve Essentials
Toronto Metropolitan University · ITM 618 · Computer Science
1. What is the primary goal of a ranking classifier?
A) To classify data into two categories
B) To assign a probability score to each instance
C) To calculate accuracy
D) To minimize the false positive rate

2. Which line in an ROC curve represents a random classifier?
A) The line that connects all points
B) The diagonal line from (0,0) to (1,1)
C) The line above all points
D) None of the above

3. In the context of overfitting, what does "chance occurrences" refer to?
A) General patterns found in all datasets
B) Irrelevant data points that appear as patterns
C) Errors in data entry
D) Random noise that is ignored by models

4. What does AUC in the ROC curve stand for?
A) Area Under Confusion
B) Area Under Cost
C) Area Under the Curve
D) Area Under Classification

5. Which of the following models is most likely to be overfitted?
A) A model with a high AUC and low accuracy
B) A model with many parameters and high accuracy on training data but low accuracy on test data
C) A model with a low false positive rate
D) A model with consistent accuracy across training and test data
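To ground Q1, Q2, and Q4 above, here is a minimal sketch of a ranking classifier in code, assuming scikit-learn; the synthetic dataset, logistic regression model, and all parameter values are illustrative assumptions, not part of the quiz.

```python
# A ranking classifier assigns a probability score to each instance (Q1);
# the ROC curve is built from those scores, and the diagonal from (0,0) to
# (1,1) is the random-guess baseline (Q2).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)  # synthetic data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]   # one probability score per instance

fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))  # Area Under the Curve (Q4)
# A classifier on the diagonal line would have AUC = 0.5.
```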
6. How does the complexity of a decision tree relate to the number of nodes?
A) Complexity increases as the number of nodes decreases
B) Complexity is independent of the number of nodes
C) Complexity increases as the number of nodes increases
D) Complexity decreases with more nodes

7. What does the "sweet spot" in tree induction represent?
A) The point where training error is highest
B) The optimal trade-off between model complexity and performance
C) The minimum number of nodes in the tree
D) The point where the model completely fits the training data

8. Why is AUC useful in evaluating models?
A) It summarizes the ROC curve with a single value
B) It measures the accuracy of a model
C) It only considers the positive class
D) It only evaluates performance on the training data

9. In which scenario is underfitting most likely to occur?
A) A simple model applied to a complex dataset
B) A complex model applied to a simple dataset
C) A model with a high number of nodes
D) A model with a high AUC

10. What is the primary difference between logistic regression and SVM in terms of decision boundary?
A) Logistic regression can capture non-linear boundaries
B) SVM tries to maximize the margin, whereas logistic regression finds a probability-based boundary
C) SVM always has a linear decision boundary
D) Logistic regression always performs better than SVM
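The tree-induction "sweet spot" in Q7 can be seen empirically. A minimal sketch, again assuming scikit-learn; the dataset and the list of depths are made-up choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deeper trees have more nodes, hence more complexity (Q6). Training accuracy
# keeps climbing with depth, while test accuracy peaks at the sweet spot (Q7)
# and then drops as the tree starts fitting chance occurrences (Q3, Q5).
for depth in [1, 2, 4, 8, 16, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```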
11. What is the main purpose of a confusion matrix in model evaluation?
a) To display the predicted probabilities for each class.
b) To show the relationship between true positive, false positive, true negative, and false negative predictions.
c) To calculate the loss function for the model.
d) To visualize feature importances in a model.

12. If a model has high accuracy but performs poorly on unseen data, what is this a sign of?
a) Underfitting
b) Proper generalization
c) Overfitting
d) Class imbalance

13. What metric would you use to evaluate the proportion of correctly predicted positive cases out of all actual positive cases?
a) Precision
b) Accuracy
c) Recall
d) F1 Score

14. Which of the following statements is true regarding precision and recall?
a) Precision measures the proportion of true positive results among all predictions made by the model.
b) Recall measures the proportion of true negatives among all negative predictions.
c) Precision and recall are unrelated and cannot be used together.
d) Precision is used to measure the overall accuracy of a model.

15. What type of data issue do resampling techniques like SMOTE aim to address?
a) Missing data
b) Data duplication
c) Class imbalance
d) Data normalization
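Q11, Q13, and Q14 come down to reading counts off a confusion matrix. A minimal sketch, assuming scikit-learn; the label vectors are invented for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # made-up ground truth
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # made-up predictions

# For binary labels, ravel() unpacks the 2x2 matrix of true/false
# positives and negatives (Q11).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("precision:", tp / (tp + fp))  # true positives among positive predictions (Q14a)
print("recall:   ", tp / (tp + fn))  # true positives among actual positives (Q13)
print("f1:       ", f1_score(y_true, y_pred))
```

For the class-imbalance issue in Q15, the `SMOTE` class from the imbalanced-learn package (`from imblearn.over_sampling import SMOTE; X_res, y_res = SMOTE().fit_resample(X, y)`) oversamples the minority class by interpolating synthetic examples; the package choice here is one common option, not mandated by the quiz.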
16. In a decision tree, what does the leaf node represent?
a) A decision that requires further branching
b) The starting point of the tree
c) The final output or class label for a path
d) A condition for data splitting

17. What is the main goal of feature selection in model building?
a) To add more features to increase model complexity.
b) To remove irrelevant or redundant features to simplify the model and avoid overfitting.
c) To create synthetic features for more data points.
d) To randomly shuffle the dataset.

18. When analyzing the feature importance of a decision tree, which feature would be considered the most influential?
a) The feature with the highest normalized Gini index.
b) The feature that appears most frequently at the leaf nodes.
c) The feature that results in the highest information gain.
d) The feature that splits the data into equal parts.

19. Which of the following metrics combines both precision and recall into a single value?
a) Accuracy
b) ROC-AUC
c) F1 Score
d) Mean Squared Error

20. What is an advantage of using ROC curves for model evaluation?
a) It focuses only on true positives.
b) It evaluates the classifier's performance across all classification thresholds.
c) It only works for binary classification.
d) It measures the overall computational efficiency of a model.
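The feature-importance idea behind Q18 can be inspected on a fitted tree. A minimal sketch, assuming scikit-learn; the iris dataset and the entropy criterion are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(iris.data, iris.target)

# feature_importances_ is the normalized total impurity reduction (information
# gain, since criterion="entropy") contributed by each feature's splits (Q18);
# leaf nodes themselves hold the final class labels (Q16).
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```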
21. What does the "True Negative" value in a confusion matrix represent?
a) Cases where the model incorrectly predicts the negative class.
b) Cases where the actual class is negative and the model also predicts negative.
c) Cases where the model predicts the positive class correctly.
d) Cases where the model's output is uncertain.

22. If a confusion matrix has a high number of false positives, what metric is likely to be impacted the most?
a) Recall
b) Accuracy
c) Precision
d) F1 Score

23. What does the Area Under the ROC Curve (AUC) indicate?
a) The probability of the model making an incorrect prediction.
b) The model's performance across all possible classification thresholds.
c) The time taken to build the model.
d) The number of false positives divided by true positives.

24. What does an AUC value of 0.5 signify?
a) The model has perfect predictive performance.
b) The model is performing worse than random guessing.
c) The model is performing at the level of random guessing.
d) The model's predictions are 100% correct.

25. Which statement about ROC curves is correct?
a) The closer the curve is to the diagonal, the better the model's performance.
b) The closer the curve is to the top left corner, the better the model's performance.
c) ROC curves only evaluate the performance of regression models.
d) ROC curves only consider true positives and false negatives.

26. In a binary classification scenario, which of the following metrics can be directly derived from a confusion matrix?
a) AUC
b) Precision, recall, and F1 score
c) R-squared
d) Mean Absolute Error

27. What is the primary use of an ROC curve?
a) To measure the accuracy of a classification model.
b) To visualize the trade-off between true positive rate and false positive rate at various thresholds.
c) To compare models based on their training times.
d) To identify feature importance.

28. If a model's ROC curve lies below the diagonal line, what does this imply?
a) The model performs better than random guessing.
b) The model is highly accurate.
c) The model performs worse than random guessing.
d) The model is perfectly predictive.

29. What is the formula for calculating the true positive rate (TPR)?
a) TP / (TP + FP)
b) TN / (TN + FP)
c) TP / (TP + FN)
d) FP / (FP + TN)

30. Why is the AUC a valuable metric for comparing different classifiers?
a) It shows the overall time complexity of each model.
b) It summarizes the model's performance across all classification thresholds.
c) It provides the exact number of true positive predictions.
d) It highlights the computational efficiency of a model.
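As a worked example of the rate formulas behind Q21, Q22, and Q29, the sketch below computes TPR, FPR, and precision straight from confusion-matrix counts; the counts themselves are made up for illustration.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, tn, fn = 40, 10, 35, 15

tpr = tp / (tp + fn)        # true positive rate, i.e. recall (Q29, option c)
fpr = fp / (fp + tn)        # false positive rate, the ROC x-axis
precision = tp / (tp + fp)  # the metric hit hardest by extra false positives (Q22)

print(f"TPR={tpr:.2f}  FPR={fpr:.2f}  precision={precision:.2f}")
# A classifier whose (FPR, TPR) points trace the diagonal has AUC = 0.5,
# i.e. random guessing (Q24); a curve below the diagonal is worse than
# random (Q28), and one hugging the top-left corner is better (Q25).
```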