İhsan Doğramacı Bilkent University
CS464 Introduction to Machine Learning
Fall 2024
Homework 2
Due: November 30, 2024 23:59

Instructions

• For this homework, you may code in any programming language of your choice.
• You are NOT allowed to use any machine learning packages, libraries or toolboxes for this assignment (such as scikit-learn, tensorflow, keras, theano, MATLAB Statistics and Machine Learning Toolbox functions, e1071, nnet, kernlab, etc.) unless otherwise stated.
• Submit a soft copy of your homework to Moodle.
• Upload your code and written answers to the related assignment section on Moodle (.TAR or .ZIP). Submitting hard copy, handwritten or scanned files is NOT allowed.
• The name of your compressed folder must be "CS464HW2Section#FirstnameLastname" (e.g., CS464HW21sheldoncooper). Please do not use any Turkish characters in your compressed folder name.
• Your code should be in a format that is easy to run and must include a driver script serving as an entry point. You must also provide a README file with clear instructions on how to execute your program.
• This is an individual assignment for each student. That is, you are NOT allowed to share your work with your classmates.
• If you do not follow the submission routes, deadlines and specifications (code, report, etc.), this will lead to a significant grade deduction.
• If you have any questions about the assignment, you can contact:
  – ipek.oztas@bilkent.edu.tr
1 PCA Analysis [50 pts]

In this task, you will analyze fake face images using Principal Component Analysis (PCA). Specifically, you will work with a subset of the 140k Real and Fake Faces dataset [1] that contains images generated by StyleGAN. You are required to use only the test set of fake images, provided in the file StyleGAN/fake.zip [2]. This dataset contains 10,000 images of fake faces. For this question, you may not use any libraries to perform the PCA calculations. Instead, you are expected to implement the PCA algorithm yourself. To find eigenvalues and eigenvectors, it is recommended to use the numpy.linalg.eig function as part of your implementation.

The images are resized to 64×64 pixels using bilinear interpolation [3]. Before analysis, each image (originally of size 64×64×3) must be flattened to a 4096×3 matrix. Note that the PIL library reads image files in the uint8 format. Since unsigned integer values cannot be negative, this format may lead to issues in later calculations. To avoid this, consider converting the data type to int or float32.

Note that all images are 3-channel RGB. Create a 3-D array, X, of size 10000×4096×3 by stacking the flattened matrices of the images provided in the dataset. Slice X as Xi = X[:, :, i], where i corresponds to the three channel indexes (0: Red, 1: Green, 2: Blue), to obtain the color channel matrix (10000×4096) of all images for each channel.

Question 1.1 [15 pts] Apply PCA on the Xi's to obtain the first 10 principal components for each Xi. Report the proportion of variance explained (PVE) for each of the principal components and their sum for each Xi. Discuss your results and find the minimum number of principal components required to obtain at least 70% PVE for all channels.

Question 1.2 [15 pts] Using the first 10 principal components found for each color channel, reshape each principal component to a 64×64 matrix. Then normalize the values of each of them to between 0 and 1 using the min-max scaling method [5]. After scaling, stack the corresponding color channels (R, G, and B) of each principal component to obtain 10 RGB images of size 64×64×3, which are the visuals of the eigenvectors. Display all of them and discuss your results.

Question 1.3 [20 pts] Describe how you can reconstruct a face image using the principal components you obtained in Question 1.1. Use the first k principal components to analyze and reconstruct the first image [6] in the dataset, where k ∈ {1, 50, 250, 500, 1000, 4096}. To reconstruct an image, first calculate the dot product between the image and the principal components; then project the data you obtained back onto the original space using the first k eigenvectors. Discuss your results in the report.
Hint: Do not forget to add back the mean values at the end of the reconstruction process if you subtracted them from the data in Question 1.1 when calculating the principal components.

[1] https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces/data
[2] https://drive.google.com/file/d/1NLfnWvmIlP9dvQugOugxAKXgzI2qK_i7/view?usp=sharing
[3] Implemented with PIL.Image.open(imagepath).resize((64, 64), Image.BILINEAR) in the PIL library
[5] https://www.oreilly.com/api/v2/epubs/9781788627306/files/assets/ffb3ac78-fd6f-4340-aa92-cde8ae0322d6.png
[6] The order of the images may differ in different operating systems. The name of the second image is image9577.png
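For reference, the following is a minimal sketch of the per-channel pipeline described above, using only numpy and PIL. The extraction folder fake/ and the function names (load_channel_matrices, pca, reconstruct) are illustrative assumptions, not requirements of the assignment.

    import os
    import numpy as np
    from PIL import Image

    def load_channel_matrices(image_dir):
        """Resize every image to 64x64, flatten it to 4096x3, and stack into X (N x 4096 x 3)."""
        mats = []
        for name in sorted(os.listdir(image_dir)):
            img = Image.open(os.path.join(image_dir, name)).resize((64, 64), Image.BILINEAR)
            arr = np.asarray(img, dtype=np.float32)        # float32 avoids uint8 underflow when centering
            mats.append(arr.reshape(64 * 64, 3))
        return np.stack(mats)

    def pca(Xi, n_components=10):
        """Eigendecomposition of the channel covariance; returns the mean, PVE and components."""
        mean = Xi.mean(axis=0)
        centered = Xi - mean
        cov = np.cov(centered, rowvar=False)               # 4096 x 4096 covariance matrix
        eigvals, eigvecs = np.linalg.eig(cov)
        eigvals, eigvecs = eigvals.real, eigvecs.real      # cov is symmetric, imaginary parts are ~0
        order = np.argsort(eigvals)[::-1]                  # sort by decreasing explained variance
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        pve = eigvals / eigvals.sum()
        return mean, pve[:n_components], eigvecs[:, :n_components]

    def reconstruct(x, mean, components, k):
        """Project one flattened channel onto the first k components and map it back (adding the mean)."""
        coeffs = (x - mean) @ components[:, :k]
        return coeffs @ components[:, :k].T + mean

    X = load_channel_matrices("fake")                      # assumed extraction folder
    for i, channel in enumerate(["Red", "Green", "Blue"]):
        mean, pve, comps = pca(X[:, :, i])
        print(channel, "PVE of first 10 PCs:", pve, "sum:", pve.sum())

For the reconstructions in Question 1.3, the same pca function would be called with n_components large enough (up to 4096) so that reconstruct can use the larger k values.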
2 Logistic Regression [50 pts]

For this question, you are asked to develop a Multinomial Logistic Regression Classifier model to classify fashion images extracted from the FASHION MNIST dataset [7]. The FASHION MNIST database consists of 70,000 grayscale images of fashion articles spanning the 10 classes shown in Figure 1: 60,000 of the examples are training data and the remaining 10,000 are test data. The images are size-normalized and centered in a 28×28 image. Since the dataset only contains training and test data, you must create your own validation dataset by separating the first 10,000 images from your training data together with their corresponding labels. Ultimately, you will have 50,000 training, 10,000 test, and 10,000 validation images. You are provided with a script to load and read this data; please check the script for Question 2 given in Moodle. The corresponding files are as follows:

• train-images-idx3-ubyte
• train-labels-idx1-ubyte
• t10k-images-idx3-ubyte
• t10k-labels-idx1-ubyte

Figure 1: FASHION MNIST Dataset

Since you are asked to implement multinomial classification, you need to turn the labels into their one-hot-encoded version and initialize as many weight vectors as the number of your unique labels. Also, unlike in Binomial Logistic Regression, you will use Softmax as the activation function instead of Sigmoid. The formula for the Softmax activation function is provided in equation (2.1), where z is the input vector, i is the index of the element in the output vector, and K is the total number of classes.

\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}    (2.1)

Following this formulation, the update rule for the weights is based on the derivative of the cross-entropy loss. It can be calculated using the difference between the target values and the softmax outputs. For a detailed mathematical derivation of softmax, please visit the link provided [8]. While updating the weights of your model, remember to add the L2 regularization term given in (2.2). You need to take its derivative and combine it with the gradient in your loss formula.

L2_{reg} = \frac{\lambda}{2} \sum_{i=1}^{N} w_i^2    (2.2)

As in the first question, you need to flatten your images of size 28×28 to get a 784-dimensional vector for each image. Also, in the dataset, feature scales are significantly different from each other. You need to normalize the data to train a model that is not influenced by feature scales. You can apply min-max normalization to scale the features to the range [0, 1]. The formulation of this normalization is provided in equation (2.3). Since the images consist of arrays of integers ranging from 0 to 255, X_{min} here is 0 and X_{max} is 255. So, according to equation (2.3), you need to divide your flattened image arrays by 255 to normalize them.

\hat{x} = \frac{x - X_{min}}{X_{max} - X_{min}}    (2.3)

[7] https://www.kaggle.com/code/ohwhykate/fashion-mnist-classification
[8] https://peterroelants.github.io/posts/cross-entropy-softmax/
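As a rough illustration of the pieces described above (one-hot labels, the softmax of equation (2.1), the cross-entropy gradient combined with the L2 term of equation (2.2), and the scaling of equation (2.3)), a numpy-only sketch might look as follows; the function names are assumptions of this sketch.

    import numpy as np

    def softmax(Z):
        """Row-wise softmax of the score matrix Z (batch_size x K), equation (2.1)."""
        Z = Z - Z.max(axis=1, keepdims=True)       # subtract the row max for numerical stability
        expZ = np.exp(Z)
        return expZ / expZ.sum(axis=1, keepdims=True)

    def one_hot(labels, num_classes=10):
        """Turn integer labels into one-hot encoded rows."""
        Y = np.zeros((labels.shape[0], num_classes))
        Y[np.arange(labels.shape[0]), labels] = 1.0
        return Y

    def gradient(X, Y, W, lam):
        """Gradient of the cross-entropy loss plus the derivative of the L2 term in (2.2).

        X: batch of flattened, scaled images (batch_size x 784)
        Y: one-hot labels (batch_size x 10)
        W: weight matrix (784 x 10)
        """
        P = softmax(X @ W)                         # predicted class probabilities
        grad = X.T @ (P - Y) / X.shape[0]          # difference between softmax outputs and targets
        return grad + lam * W                      # d/dw of (lambda/2) * sum(w^2) is lambda * w

    # Min-max normalization of equation (2.3): pixel values lie in [0, 255], so divide by 255.
    # X_train = X_train.reshape(-1, 784).astype(np.float32) / 255.0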
Note: Use the same data split in all parts of the assignment to perform a fair comparison between classifiers for parameter selection. Also, you should train each model for 100 epochs in your experiments.
Hint: Use vectorized numpy operations as much as possible instead of for loops.

Implement a Logistic Regression Classifier for the aforementioned task. For this part, initialize your weights from a Gaussian distribution, N(μ = 0, σ = 1). Also, you should initialize the batch size as 200, the learning rate as 5×10⁻⁴, and the L2 regularization coefficient (λ) as 10⁻⁴; this will be your default model. Afterward, you will experiment with these hyperparameters to find the best model. While doing your experiments, you will only change the requested hyperparameters and keep the others at their mentioned default values.

Question 2.1 [15 pts] Train the default model described above. Display the test accuracy and the confusion matrix for that case.

Question 2.2 [15 pts] For this part of the question, you will run separate experiments on the hyperparameters mentioned below. Remember, you only need to change the mentioned hyperparameter according to the given values and keep the other defaults. You will change one hyperparameter at a time. Try your model using the hyperparameters given below and compare their performances:

• Batch size: 1, 64, 3000
• Weight initialization technique: zero initialization, uniform distribution, normal distribution
• Learning rate: 0.01, 10⁻³, 10⁻⁴, 10⁻⁵
• Regularization coefficient (λ): 10⁻², 10⁻⁴, 10⁻⁹

For example, you need to evaluate your model performance with batch sizes 1, 64, and 3000 while keeping the other hyperparameters at their default values. After running your model with these values, plot a graph with epochs on the x-axis and the resulting accuracies on the y-axis. Use legends to show the individual performance of each batch size, and do not forget to adjust the titles of the plots accordingly. You should perform this procedure for each hyperparameter given above.

Question 2.3 [5 pts] After you perform the above experiments, you need to select the best values for each of the hyperparameters (and the best-performing initialization technique for the weights) and create the optimal model. Display the test accuracy and confusion matrix for the best model.

Question 2.4 [10 pts] As mentioned in the earlier parts of this section, you have initialized 10 (the number of labels) weight vectors for your classification task. In this part, you need to visualize your finalized weight vectors (after your best model is trained) and print them as images. One line of code is provided for you to visualize your weights; please check the script that you are given in Moodle. Keep in mind that we expect some blurriness in the finalized weight images. After you obtain your results, comment on how they look and what they might represent.

Question 2.5 [5 pts] Using the best model, calculate the precision, recall, F1 score and F2 score for each class. Comment on the results using the confusion matrix you obtained in Question 2.3 and the weight images you obtained in Question 2.4.
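To make the defaults concrete, here is a self-contained sketch of the default training configuration (batch size 200, learning rate 5×10⁻⁴, λ = 10⁻⁴, 100 epochs, N(0, 1) weight initialization), together with the confusion matrix and per-class precision/recall/F-beta computations used in Questions 2.1, 2.3 and 2.5. The function names, the random seed and the absence of a bias term are assumptions of this sketch, not part of the handout.

    import numpy as np

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)                      # numerical stability
        expZ = np.exp(Z)
        return expZ / expZ.sum(axis=1, keepdims=True)

    def train(X, y, num_classes=10, epochs=100, batch_size=200, lr=5e-4, lam=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(0.0, 1.0, size=(X.shape[1], num_classes))  # Gaussian N(0, 1) initialization
        Y = np.eye(num_classes)[y]                                 # one-hot labels
        n = X.shape[0]
        for _ in range(epochs):
            perm = rng.permutation(n)                              # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = perm[start:start + batch_size]
                P = softmax(X[idx] @ W)                            # predicted probabilities
                grad = X[idx].T @ (P - Y[idx]) / idx.size          # cross-entropy gradient
                W -= lr * (grad + lam * W)                         # L2-regularized update
        return W

    def confusion_matrix(X, y, W, num_classes=10):
        """Rows are true classes, columns are predicted classes."""
        preds = np.argmax(X @ W, axis=1)
        cm = np.zeros((num_classes, num_classes), dtype=int)
        for t, p in zip(y, preds):
            cm[t, p] += 1
        return cm

    def per_class_scores(cm, beta=1.0):
        """Per-class precision, recall and F_beta from the confusion matrix (F1: beta=1, F2: beta=2)."""
        tp = np.diag(cm).astype(float)
        precision = tp / np.maximum(cm.sum(axis=0), 1)
        recall = tp / np.maximum(cm.sum(axis=1), 1)
        f_beta = (1 + beta**2) * precision * recall / np.maximum(beta**2 * precision + recall, 1e-12)
        return precision, recall, f_beta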