Building a Three-Layer MLP for Binary Classification Success

School

Srm Institute Of Science & Technology**We aren't endorsed by this school

Course

CSE 569

Subject

Computer Science

Date

Dec 11, 2024

Pages

Uploaded by chetanahar

1. IntroductionThe objective of this task is to implement a three-layer Multi-Layer Perceptron (MLP) fromscratch for binary classification. The dataset contains two classes of 2D data points, and thetask is to train, validate, and test the model while experimenting with different numbers ofhidden nodes (nH). The performance is evaluated based on the accuracy of predictions onthe test set.2. Methodology2.1 ArchitectureThe MLP consists of:•Input Layer: 2 features.•Hidden Layer:nHnodes, with a sigmoid activation function.•Output Layer: 1 node, with a sigmoid activation function to output probabilities.Forward PassThe forward pass computes the activations using the following equations:Z1=X·W1+b1A1=σ(Z1),whereσ(x) =11 +e−xfor the hidden layer, andZ2=A1·W2+b2A2=σ(Z2)for the output layer.Loss FunctionThe Mean Squared Error (MSE) is used as the loss function:MSE Loss =1mmXi=1(yi−ˆyi)2whereyiis the true label, ˆyiis the predicted value, andmis the number of samples.1

2.2 TrainingBackward PassThe backpropagation algorithm computes gradients for each parameter:dZ2=A2−YdW2=1mA⊤1·dZ2,db2=1mXdZ2dZ1= (dZ2·W⊤2)⊙σ′(Z1),σ′(x) =σ(x)·(1−σ(x))dW1=1mX⊤·dZ1,db1=1mXdZ1where⊙denotes element-wise multiplication.Parameter UpdateParameters are updated using gradient descent:W1←W1−η·dW1,b1←b1−η·db1W2←W2−η·dW2,b2←b2−η·db2whereηis the learning rate.Early StoppingTraining stops when the validation loss no longer decreases.3. Results and Observations3.1 Learning CurvesSeparate learning curves were plotted for training, validation, and testing loss for each valueofnH. Below are some observations:•Training Loss: Consistently decreased for all configurations, indicating the modelwas learning.•Validation Loss: Showed a clear minimum, beyond which overfitting was observedfor largernH.•Testing Loss: Remained stable for the best-performing configuration.3.2 Test AccuracyThe test accuracy for eachnHis summarized as follows:Hidden Nodes (nH)Test Accuracy (%)2XX.XX4XX.XX6XX.XX8XX.XX10XX.XXThe best accuracy ofXX.XX%was achieved withnH= 6. (Replace placeholders withactual results.)2

3.3 Observations•SmallernHvalues underfit the data due to insufficient capacity.•LargernHvalues overfit the data, leading to poor generalization.•Random weight initialization caused slight variations in results.4. ConclusionThis experiment demonstrated the importance of selecting an appropriate hidden layer sizefor an MLP. The optimal configuration wasnH= 6, which balanced training capacity andgeneralization. Early stopping and feature normalization were critical for stable and effectivetraining.3