Topics in Deep Learning
Laxman, Asst. Professor, School of C and IT

TOPICS IN DEEP LEARNING
Unit-4: Generative Adversarial Networks
Introduction
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014.

What is GAN?
(The generator and discriminator can each be a convolutional or a fully connected network.)
5 Steps of GAN
1. Define the GAN architecture based on the application.
2. Train the discriminator to distinguish real vs. fake data.
3. Train the generator to produce fake data that can fool the discriminator.
4. Continue discriminator and generator training for multiple epochs.
5. Save the generator model to create new, realistic data.
Note: When training the discriminator, hold the generator values constant; when training the generator, hold the discriminator constant. Each should train against a static adversary.
Applications of GAN
1. Generate fake data for augmenting other machine learning algorithms
2. Generate faces (https://this-person-does-not-exist.com/en)
3. Image-to-image translation
4. Text-to-image translation
5. Super-resolution
Introduction
(GAN completely bypasses the approach on the previous slide.)
Generative Adversarial Network

How to learn this?
Generator and Discriminator

Generative Adversarial Network
GAN - Objective function
The discriminator outputs a value D(x) indicating the chance that x is a real image. Our objective is to maximize the chance of recognizing real images as real and generated images as fake, i.e. the maximum likelihood of the observed data. To measure the loss, we use cross-entropy, as in most deep learning: p log(q). For a real image, p (the true label for real images) equals 1. For generated images, we reverse the label (i.e. one minus the label). So the objective becomes:
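Written out, this is the standard discriminator objective from the original GAN paper (Goodfellow et al., 2014), consistent with the description above:

```latex
\max_D \;\; \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr] \;+\; \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```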
Objective of Generator

Objective of Discriminator
Case 1: the discriminator is trained well. Case 2: the discriminator wrongly classifies. Maximizing this cost function makes the discriminator work optimally.
Combining Objectives of Generator and Discriminator
The training objective of GAN is a "minimax" objective:
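The minimax value function from the original GAN paper, matching the combined objective described above:

```latex
\min_G \max_D \; V(D, G) \;=\; \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr] \;+\; \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```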
Overall Training - Steps

Generative Adversarial Network
GAN - Slightly modified objective
However, we encounter a diminishing-gradient problem for the generator. The discriminator usually wins early against the generator: it is always easier to distinguish the generated images from real images in early training. That makes V approach 0, i.e. -log(1 - D(G(z))) → 0. The gradient for the generator then also vanishes, which makes the gradient-descent optimization very slow. To improve this, GAN provides an alternative function to backpropagate the gradient to the generator.
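The commonly used fix, the non-saturating generator loss (a standard result, consistent with the text above): instead of minimizing log(1 - D(G(z))), the generator maximizes log D(G(z)), which has strong gradients early in training:

```latex
\max_G \; \mathbb{E}_{z \sim p_z}\bigl[\log D(G(z))\bigr]
\qquad \text{instead of} \qquad
\min_G \; \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```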
Generative Adversarial Network - Algorithm
Training Algorithm - GAN
GAN: Architecture
References
1. https://www.youtube.com/watch?v=1ju4qmdtRdY
2. http://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Teaching/pdf/Lecture23.pdf
3. https://jonathan-hui.medium.com/gan-whats-generative-adversarial-networks-and-its-application-f39ed278ef09
Thank You

TOPICS IN DEEP LEARNING
Deep Convolutional Generative Adversarial Networks
Introduction - Recollection of GAN
A GAN (Generative Adversarial Network) has two neural networks, a Generator and a Discriminator, that are pitted against each other and trained simultaneously by an adversarial process.
Generator: learns to generate plausible data that is very similar to the training data. Data generated by the Generator should be indistinguishable from the real data.
Discriminator: its key objective is to distinguish the Generator's fake data from the real data; it is a simple classification network.
A classic analogy: the Generator is a forger who wants to create fake art, while the Discriminator is an investigator/cop whose job is to catch fake art. The Discriminator penalizes the Generator for generating fake data that fails to fool it. The Discriminator's classification loss helps update the Generator's weights through backpropagation, so the Generator gets better at generating data close to the real data.
During training, the generator progressively becomes better at creating images that look real, while the discriminator becomes better at telling them apart. The process reaches equilibrium when the discriminator can no longer distinguish real images from fakes.
DCGAN - Introduction
DCGAN stands for Deep Convolutional Generative Adversarial Network, a type of generative model that uses convolutional neural networks (CNNs) to generate new images similar to those in the training set. The use of deep convolutional networks results in a more stable architecture and better results.
The key difference between DCGAN and the original GAN is the use of convolutional layers in the generator and discriminator networks. This allows the networks to learn spatial features and patterns in the images, which is especially important for generating realistic-looking images.
Why DCGANs over normal GANs?
Spatial feature learning: GANs use fully connected layers, which do not take into account the spatial structure of the image. In contrast, DCGANs use convolutional layers, which can capture spatial features and correlations in the image data. This makes DCGANs better suited for tasks such as image synthesis and style transfer.
Stable training: Training GANs can be challenging, as the generator and discriminator networks are trained adversarially and can become unstable, resulting in mode collapse or other problems. DCGANs follow specific architectural guidelines, such as using strided convolutions instead of max pooling and avoiding fully connected layers in the discriminator, which help make training more stable.
Better-quality results: DCGANs are often able to generate higher-quality images than plain GANs, due to their ability to learn spatial features and correlations in the image data. This has been demonstrated in several benchmarks and applications, where DCGANs have outperformed GANs in terms of image quality and visual fidelity.
Transfer learning: Because DCGANs are based on CNNs, they can be used as pre-trained models for transfer learning on other computer vision tasks, such as object recognition or image segmentation. This makes them reusable beyond pure image generation.
Generator - DCGAN
The generator network in DCGAN typically consists of a series of transposed convolutional layers, also known as "deconvolutional" layers, which gradually increase the spatial resolution of the generated image. The input to the generator is a random noise vector, which is passed through one or more fully connected layers to produce a low-dimensional representation. This representation is then reshaped into a 3D tensor and processed by a series of transposed convolutional layers, which gradually increase the spatial resolution. The output of the generator is an image with the same dimensions as the training images.
(Fig.) This is the DCGAN generator presented in the LSUN scene modeling paper.
Discriminator and Training - DCGAN
The discriminator network in DCGAN is a binary classifier that distinguishes between real and generated images. It consists of a series of convolutional layers that gradually reduce the spatial resolution of the input image, followed by several fully connected layers that output a single scalar value between 0 and 1.
The output of the discriminator represents the probability that the input image is real, with values closer to 0 indicating that the image is generated, and values closer to 1 indicating that the image is real.
To train a DCGAN, the generator and discriminator networks are trained simultaneously using a minimax game. The training process is iterative, with the generator and discriminator updating their weights in opposite directions to minimize their respective losses.
The loss function used in DCGAN is typically the binary cross-entropy loss, which measures the difference between the predicted probability and the true label (0 for generated images, 1 for real images).
DCGAN - Example on MNIST dataset + Code - Setup
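The setup code on this slide is an image in the original deck; a minimal equivalent sketch, following the TensorFlow DCGAN tutorial referenced at the end of this section:

```python
import os
import time

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
```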
DCGAN - Load and prepare the dataset
We will use the MNIST dataset to train the generator and the discriminator. The generator will generate handwritten digits resembling the MNIST data. We load the MNIST dataset, normalise the images, and create the training dataset (the generator itself is fed random noise).
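A sketch of the loading and preparation step, consistent with the TensorFlow tutorial this section follows:

```python
# Load MNIST; labels are unused since training is unsupervised.
(train_images, _), (_, _) = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32')
# Normalise pixel values to [-1, 1] to match the generator's tanh output.
train_images = (train_images - 127.5) / 127.5

BUFFER_SIZE = 60000
BATCH_SIZE = 256

# Shuffle and batch the data.
train_dataset = (tf.data.Dataset.from_tensor_slices(train_images)
                 .shuffle(BUFFER_SIZE)
                 .batch(BATCH_SIZE))
```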
DCGAN - Creation of generator
We pass random noise to the Generator. The random noise is upscaled using Conv2DTranspose. Conv2DTranspose transforms a vector in the opposite direction of a normal convolution (upsampling instead of downsampling).
Use LeakyReLU for all layers except the output layer; for the output layer, use the tanh activation function. LeakyReLU allows gradients to flow better through the model architecture.
Why Leaky ReLU? When a positive input is passed to ReLU or LeakyReLU, the output is a positive value; however, when a negative input is passed, ReLU outputs 0, while Leaky ReLU outputs a small, controlled negative value, so gradients are not zeroed out.
tanh is used in the final output layer instead of sigmoid so that the outputs lie in the range -1 to 1, matching the normalised images, and gradients are not set to zero, which would stop the model from learning.
Batch Normalization is used for faster convergence, as it standardizes the layer inputs and has a stabilizing effect on the training process.
DCGAN - Generator code
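The slide shows this code as an image; below is the generator from the TensorFlow DCGAN tutorial that these slides track (layer sizes as in that tutorial):

```python
def make_generator_model():
    model = tf.keras.Sequential()
    # Project the 100-dim noise vector and reshape it into a 7x7x256 tensor.
    model.add(layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((7, 7, 256)))

    # Upsample 7x7 -> 14x14 -> 28x28 with transposed convolutions.
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1),
                                     padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2),
                                     padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # tanh output keeps pixel values in [-1, 1], matching the normalised data.
    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same',
                                     use_bias=False, activation='tanh'))
    return model
```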
DCGAN - Viewing an image by an untrained generator
We can use the (as yet untrained) generator to create an image from random noise.
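For example:

```python
generator = make_generator_model()

noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)
plt.imshow(generated_image[0, :, :, 0], cmap='gray')
```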
DCGAN - Creation of the Discriminator
The Discriminator is a CNN-based image classifier that distinguishes real images from fake ones. It takes as input fake images generated by the Generator from random noise and real images from the training dataset, and classifies them.
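A sketch of the discriminator, again following the referenced TensorFlow tutorial:

```python
def make_discriminator_model():
    model = tf.keras.Sequential()
    # Strided convolutions downsample instead of max pooling (a DCGAN guideline).
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                            input_shape=[28, 28, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Flatten())
    # Single logit: positive values lean "real", negative values lean "fake".
    model.add(layers.Dense(1))
    return model
```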
DCGAN - Viewing the prediction by an untrained discriminator
Let us use the (as yet untrained) discriminator to classify the generated images as real or fake. The model will be trained to output positive values for real images, and negative values for fake images.
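For example:

```python
discriminator = make_discriminator_model()
decision = discriminator(generated_image)
print(decision)  # one raw logit; untrained, so close to zero
```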
DCGAN - Loss and optimisers - Discriminator loss
This method quantifies how well the discriminator is able to distinguish real images from fakes. It compares the discriminator's predictions on real images to an array of 1s, and its predictions on fake (generated) images to an array of 0s. The Discriminator loss is thus a sum of the loss terms for both the fake and the real data.
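As a sketch (from_logits=True because the discriminator above outputs raw logits, not probabilities):

```python
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    # Real images should be classified as 1, generated images as 0.
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss
```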
DCGAN - Loss and optimisers - Generator loss & optimisers
The generator's loss quantifies how well it was able to trick the discriminator. Intuitively, if the generator is performing well, the discriminator will classify the fake images as real (label 1). Here, we compare the discriminator's decisions on the generated images to an array of 1s.
The loss function used in DCGAN is typically the binary cross-entropy loss, which measures the difference between the predicted probability and the true label. The discriminator and the generator optimizers are different, since the two networks are trained separately.
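The corresponding sketch, with separate Adam optimizers as described:

```python
def generator_loss(fake_output):
    # The generator wants the discriminator to output "real" (1) for its fakes.
    return cross_entropy(tf.ones_like(fake_output), fake_output)

# Separate optimizers, since the two networks are trained separately.
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
```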
DCGAN - Use of checkpoints
Here we use checkpoints to save and restore models, which can be helpful in case a long-running training task is interrupted.
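A sketch of the checkpoint setup (the directory path is illustrative):

```python
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
                                 discriminator_optimizer=discriminator_optimizer,
                                 generator=generator,
                                 discriminator=discriminator)
```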
DCGAN - Training loop
The training loop begins with the generator receiving a random seed as input. That seed is used to produce an image. The discriminator is then used to classify real images (drawn from the training set) and fake images (produced by the generator).
DCGAN - Training loop - Code
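The slide's code image corresponds to the tutorial's train_step; a sketch:

```python
EPOCHS = 50
noise_dim = 100
num_examples_to_generate = 16

# Reuse the same seed over time to visualise progress on a fixed noise batch.
seed = tf.random.normal([num_examples_to_generate, noise_dim])

@tf.function  # compiles the function into a graph for faster execution
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    # One tape per network, so each set of gradients can be computed separately.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(
        gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(
        disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(
        zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(
        zip(gradients_of_discriminator, discriminator.trainable_variables))
```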
DCGAN - Training loop code explanation
Overall, the code above represents one iteration of the training loop for a DCGAN model on the MNIST dataset. During each iteration, a batch of real images is passed through the discriminator, and a batch of random noise is passed through the generator to produce a batch of generated images.
Two gradient tapes are created, one for the generator and one for the discriminator. Gradient tapes record operations during the forward pass so that gradients can be computed for backpropagation during training. The real and generated images are passed through the discriminator to produce a batch of outputs.
The losses for both the generator and discriminator are computed based on the discriminator's output for the real and generated images, and the gradients of the losses with respect to the trainable variables for each network are computed using the gradient tapes.
These gradients are then used to update the trainable variables of the generator and discriminator using the optimizers. The process is repeated for a specified number of iterations or epochs to train the model.
DCGAN - Train function code
Training of the DCGAN starts; a checkpoint is saved every 15 epochs, and images for a GIF are produced as training proceeds.
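A sketch of the train function (generate_and_save_images is defined on the next slide):

```python
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)

        # Produce images for the GIF as we go.
        generate_and_save_images(generator, epoch + 1, seed)

        # Save the model every 15 epochs.
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Time for epoch {} is {} sec'.format(epoch + 1,
                                                   time.time() - start))

    # Generate once more after the final epoch.
    generate_and_save_images(generator, epochs, seed)
```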
DCGAN - Generate and save images
The function displays and saves the images generated by the model passed as a parameter.
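A sketch of this helper:

```python
def generate_and_save_images(model, epoch, test_input):
    # training=False so layers like BatchNorm run in inference mode.
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(4, 4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        # Undo the [-1, 1] normalisation for display.
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')

    plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()
```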
DCGAN - Train the model
Call the train() method defined above to train the generator and discriminator simultaneously. Note, training GANs can be tricky. It's important that the generator and discriminator do not overpower each other (e.g., that they train at a similar rate).
At the beginning of the training, the generated images look like random noise. As training progresses, the generated digits will look increasingly real. After about 50 epochs, they resemble MNIST digits. This may take about one minute per epoch with the default settings on Colab.
Call the train function and restore the checkpoint.
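For example:

```python
train(train_dataset, EPOCHS)

# Restore the latest checkpoint, e.g. after an interruption.
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
```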
DCGAN - Display image at final epoch
Key differences/additions - DCGAN
Summary of DCGAN:
- Replace all max pooling with strided convolutions; neither the Generator nor the Discriminator uses max pooling.
- Use transposed convolutions for upsampling.
- Eliminate fully connected layers.
- Use Batch Normalization, except for the output layer of the generator and the input layer of the discriminator.
- In the original DCGAN paper, the Generator uses ReLU for all layers except the output, which uses tanh; the Discriminator uses Leaky ReLU for all layers. (The implementation above uses Leaky ReLU in the generator as well.)
DCGAN - Key problem and summary
One of the key challenges in training DCGANs is achieving stable convergence. Because the generator and discriminator networks are trained simultaneously, they can sometimes become stuck in a cycle where the generator produces the same images repeatedly and the discriminator learns to distinguish these images from the real images. To prevent this, several techniques have been proposed, including batch normalization, careful weight initialization, and gradient-penalty regularization.
In summary, DCGAN is a type of generative model that uses deep convolutional neural networks to generate new images. By learning hierarchical representations of the image data, DCGANs can capture both low-level and high-level features, producing images that are visually similar to those in the training set. While DCGANs are powerful models, they also require careful tuning and attention to detail to achieve stable convergence and good-quality image generation.
References
https://www.tensorflow.org/tutorials/generative/dcgan
https://machinelearningmastery.com/how-to-develop-a-generative-adversarial-network-for-an-mnist-handwritten-digits-from-scratch-in-keras/
https://jonathan-hui.medium.com/gan-whats-generative-adversarial-networks-and-its-application-f39ed278ef09
Thank You

TOPICS IN DEEP LEARNING
Autoencoder
What is Autoencoder?
An autoencoder is a type of neural network that is commonly used for unsupervised learning, data compression, and feature extraction.
It consists of two parts: an encoder that maps the input data into a lower-dimensional space, and a decoder that reconstructs the original input data from the encoded representation.
The primary objective of an autoencoder is to learn a compressed representation of the input data that captures its most salient features. This compressed representation, also known as the latent space, can be used for a variety of tasks such as data compression, denoising, image generation, and anomaly detection.
One of the key advantages of autoencoders is that they can learn useful features from raw data without the need for explicit labels or supervision. This makes them particularly useful in cases where labeled data is scarce or expensive to obtain. Additionally, autoencoders can be used to pretrain deep neural networks, which can improve the performance of supervised learning tasks.
Autoencoder
The basic idea of an autoencoder is to have an output layer with the same dimensionality as the inputs. The idea is to try to reconstruct each dimension exactly by passing it through the network. An autoencoder replicates the data from the input to the output, and is therefore sometimes referred to as a replicator neural network.
Although reconstructing the data might seem trivial (simply copying the data forward from one layer to another), this is not possible when the number of units in the middle is constricted. In other words, the number of units in each middle layer is typically fewer than that of the input (or output). As a result, these units hold a reduced representation of the data, and the final layer can no longer reconstruct the data exactly (reconstruction loss). Therefore, this type of reconstruction is inherently lossy.
The Basic Autoencoder

Autoencoder - Example 1: Image generation

Autoencoder - Example 2
Autoencoder - Properties
Autoencoders are mainly a dimensionality reduction (or compression) algorithm with a couple of important properties:
1. Data-specific: Autoencoders are only able to meaningfully compress data similar to what they have been trained on. Since they learn features specific to the given training data, they are different from a standard data compression algorithm like gzip. So we can't expect an autoencoder trained on handwritten digits to compress landscape photos.
2. Lossy: The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation. If you want lossless compression, they are not the way to go.
3. Unsupervised: To train an autoencoder we don't need to do anything fancy, just throw the raw input data at it. Autoencoders are considered an unsupervised learning technique since they don't need explicit labels to train on. But to be more precise, they are self-supervised, because they generate their own labels from the training data.
How to Train an Autoencoder?
The encoder takes an input vector x and maps it to a compressed representation z, also known as the latent space, using an encoder function f(x):
z = f(x)
The decoder then takes this compressed representation z and maps it back to the original input space, using a decoder function g(z):
x' = g(z)
The goal of training an autoencoder is to minimize the difference between the original input vector x and the reconstructed vector x'. This is typically done by minimizing the mean squared error between the two vectors:
L(x, x') = ||x - x'||^2
where ||.|| represents the Euclidean norm.
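A minimal Keras sketch of this setup, assuming flattened 784-dimensional inputs (e.g., MNIST) and a 32-dimensional latent space; the layer sizes are illustrative, not from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers

input_dim, latent_dim = 784, 32

# Encoder: z = f(x)
encoder = tf.keras.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(latent_dim, activation='relu'),
])

# Decoder: x' = g(z)
decoder = tf.keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(input_dim, activation='sigmoid'),
])

autoencoder = tf.keras.Sequential([encoder, decoder])
# Minimise ||x - x'||^2 by training with the input as its own target.
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```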
Autoencoder - Tied Weights
An autoencoder with tied weights has decoder weights that are the transpose of the encoder weights; this is a form of parameter sharing, which reduces the number of parameters of the model.
Advantages of tying weights include increased training speed and reduced risk of overfitting, while yielding performance comparable to untied weights in many cases (Li et al., 2019). It is therefore a common practice to tie weights when building a symmetrical autoencoder.
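A hedged sketch of weight tying in Keras via a custom layer that reuses the transpose of a Dense layer's kernel (the class name and structure follow the common "DenseTranspose" pattern, e.g. from the tied-weights article cited in this deck's references, not from these slides):

```python
import tensorflow as tf

class DenseTranspose(tf.keras.layers.Layer):
    """Decoder layer whose kernel is the transpose of a given Dense layer's kernel."""
    def __init__(self, dense, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense = dense  # the encoder layer whose weights are shared
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        # Only the bias is a new parameter; the kernel is shared (tied).
        # The tied Dense layer must already be built at this point.
        self.biases = self.add_weight(name="bias",
                                      shape=[self.dense.input_shape[-1]],
                                      initializer="zeros")
        super().build(batch_input_shape)

    def call(self, inputs):
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.biases)
```

Usage: build the encoder's Dense layers first, then construct each decoder layer as DenseTranspose(encoder_layer, activation=...), mirroring the encoder in reverse order.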
How to Train an Autoencoder?
You need to set 4 hyperparameters before training an autoencoder:
1. Code size: The code size, or the size of the bottleneck, is the most important hyperparameter used to tune the autoencoder. The bottleneck size decides how much the data has to be compressed. It can also act as a regularisation term.
2. Number of layers: As in all neural networks, an important hyperparameter to tune in autoencoders is the depth of the encoder and the decoder. A higher depth increases model complexity, while a lower depth is faster to process.
3. Number of nodes per layer: The number of nodes per layer defines the weights we use per layer. Typically, the number of nodes decreases with each subsequent layer of the encoder, as the input to each of these layers becomes smaller.
4. Reconstruction loss: The loss function we use to train the autoencoder is highly dependent on the type of input and output we want the autoencoder to adapt to. For image data, the most popular reconstruction losses are MSE loss and L1 loss. If the inputs and outputs are within the range [0, 1], as in MNIST, we can also use binary cross-entropy as the reconstruction loss.
Autoencoder - Loss Function
Autoencoder - Application 1
Linear Autoencoders & Principal Component Analysis
One of the main applications of autoencoders is dimensionality reduction, just like Principal Component Analysis (PCA). In fact, if the decoder is linear and the cost function is the mean squared error, an autoencoder learns to span the same subspace as PCA.
Autoencoder - Application 1
t-SNE visualization for clustering on the MNIST dataset (MNIST images are 28×28).
Autoencoder - Application 2
Anomaly detection (figure legend: blue = anomaly, red = non-anomaly).
Autoencoder - Application 3

Autoencoder - Application 4
Autoencoder - Types
1. Undercomplete (feature reduction)
2. Overcomplete (feature enhancement)
Adding noise is a kind of regularization, as in the denoising autoencoder.
Autoencoder - Undercomplete
An autoencoder is said to be undercomplete if the dimensionality of the latent space is smaller than the dimensionality of the input space. In other words, the encoder is forced to learn a compressed representation of the input data that captures only the most essential features.
This can help prevent overfitting and encourage the autoencoder to learn a more general representation of the input data. However, an undercomplete autoencoder may not be able to capture all the essential features of the input data.
Autoencoder - Overcomplete
An autoencoder is said to be overcomplete if the dimensionality of the latent space (i.e., the number of neurons in the encoder's output layer) is greater than the dimensionality of the input space. In other words, there are more hidden units in the encoder than necessary to capture all the essential features of the input data.
This means that an overcomplete autoencoder can learn multiple compressed representations of the input data, each capturing a different aspect of the data. However, this can also lead to overfitting, where the autoencoder learns to simply copy the input data to the latent space without capturing meaningful features.
Regularisation in overcomplete autoencoders
While poor generalization can happen even in undercomplete autoencoders, it is an even more serious problem for overcomplete autoencoders.
The simplest solution is to add an L2-regularization term to the objective function. Another trick is to tie the weights of the encoder and decoder, i.e., W* = W^T. This effectively reduces the capacity of the autoencoder and acts as a regularizer.
Types of Autoencoders
Types of Autoencoders
1. Vanilla Autoencoders: The most basic type of autoencoder. Vanilla autoencoders are composed of an input layer, an output layer, and one hidden layer in between. The hidden layer has fewer nodes than the input layer, which forces the autoencoder to compress the input data.
2. Sparse Autoencoders: Similar to vanilla autoencoders, but they include an additional regularization term that encourages the model to use only a small subset of the hidden nodes. This results in a more compact representation of the input data.
3. Denoising Autoencoders: Similar to vanilla autoencoders, but they are trained using corrupted input data. This forces the autoencoder to learn the structure of the input data and disregard the corruptions.
4. Contractive Autoencoders: Similar to sparse autoencoders, but they include an additional regularization term that encourages the model to learn a sparse but also robust representation of the input data.
5. Convolutional Autoencoders: Autoencoders that use convolutional neural networks (CNNs) to reduce the input dimensionality.
6. Variational Autoencoders: Autoencoders that encode input data as a set of latent variables that are randomly sampled from a specific distribution.
7. Generative Adversarial Autoencoders: Autoencoders that are trained in an adversarial manner to generate new data.
Variants in Autoencoders
Regularized autoencoders:
1. Denoising Autoencoders
2. Sparse Autoencoders
3. Variational Autoencoders
Denoising Autoencoders
Example: if the clean input is 1, 2, 3, 4, 5, 6 and the noisy input is 1.002, 2.009, 3.0067, ..., the network should still output 1, 2, 3, 4, 5, 6. The hidden nodes learn how to remove the noise and capture the underlying data.
Denoising Autoencoders
Keeping the code layer small forced our autoencoder to learn an intelligent representation of the data. There is another way to force the autoencoder to learn useful features: adding random noise to its inputs and making it recover the original noise-free data.
This way the autoencoder can't simply copy the input to its output, because the input also contains random noise. We are asking it to subtract the noise and produce the underlying meaningful data. This is called a denoising autoencoder.
Denoising Autoencoders
Add random Gaussian noise to the inputs; the noisy data becomes the input to the autoencoder. The autoencoder doesn't see the original image at all, but we then expect the autoencoder to regenerate the noise-free original image.
Visualising Autoencoders - % of Noise added
With more corruption, the filters are learned well; however, too much corruption can lead to poor reconstruction.
The vanilla AE does not learn many meaningful patterns. The hidden neurons of the denoising AEs seem to act like pen-stroke detectors (for example, in the highlighted neuron the black region is a stroke that you would expect in a '0', '2', '3', '8' or '9').
As the noise increases, the filters become wider, because each neuron has to rely on more adjacent pixels to feel confident about a stroke.
Visualising Noisy Autoencoders
A method for visualizing autoencoder (AE) representations is computing the inputs that maximally activate specific hidden neurons. The noisy inputs are prepared as follows:

```python
noise_factor = 0.2
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
```

We start by defining a noise factor, which is a hyperparameter. The noise factor is multiplied by a random matrix with a mean of 0.0 and a standard deviation of 1.0; this matrix draws its samples from a normal (Gaussian) distribution.
Why is the clip function used here? To ensure that the final image array values are within the range 0 to 1, we use np.clip. clip is a NumPy function that clips values outside of the min-max range and replaces them with the designated min or max value.
Denoising Autoencoders
A denoising autoencoder is a type of autoencoder that is designed to remove noise from input data. It does this by corrupting the input data with some form of noise (e.g., Gaussian noise or dropout) and then training the autoencoder to reconstruct the original, noise-free data.
A denoising autoencoder simply corrupts the input data using a probabilistic process P(x̃_ij | x_ij) before feeding it to the network. A simple P could be: set x̃_ij = 0 with probability q, and keep x̃_ij = x_ij with probability 1 - q.
Why noise? This helps because the objective is still to reconstruct the original (uncorrupted) x_i. It no longer makes sense for the model to copy the corrupted x̃_i into h(x̃_i) and then into x̂_i (the objective function will not be minimized by doing so). Instead, the model now has to capture the characteristics of the data correctly.
Example of Denoising Autoencoders

Visualising Autoencoders
Certain inputs will respectively cause hidden neurons 1 to n to maximally fire. Let us plot these images (x_i's) which maximally activate the first k neurons of the hidden representations learned by a vanilla autoencoder and by different denoising autoencoders. These x_i's are computed from the weights (W_1, W_2, ..., W_k) learned by the respective autoencoders.
The hidden neurons essentially behave like edge detectors; PCA does not give such edge detectors.
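For reference, the standard result behind this visualisation (a general fact about a neuron with a norm-constrained input, not quoted from the slides): the input that maximally activates hidden neuron k is its normalised weight vector,

```latex
x^{*} \;=\; \arg\max_{\lVert x \rVert_2 = 1} \; W_k^{\top} x \;=\; \frac{W_k}{\lVert W_k \rVert_2}
```

which follows from the Cauchy-Schwarz inequality.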
Uses: Denoising Autoencoders
Denoising autoencoders are useful when dealing with corrupted data; therefore, their main application is to reconstruct corrupted data.
The inputs to the autoencoder are corrupted training records, and the outputs are the uncorrupted data records. As a result, the autoencoder learns to recognize the fact that the input is corrupted and that the true representation of the input needs to be reconstructed. Therefore, even if there is corruption in the test data (as a result of application-specific reasons), the approach is able to reconstruct clean versions of the test data.
Note that the noise in the training data is explicitly added, whereas the noise in the test data is already present as a result of various application-specific reasons.
Applications of Autoencoders

Types of Autoencoders - Deep Autoencoders

Types of Autoencoders - Convolutional Autoencoders
Acknowledgements & References
http://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Handout/Lecture21.pdf
http://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Handout/Lecture7.pdf
https://medium.com/@lmayrandprovencher/building-an-autoencoder-with-tied-weights-in-keras-c4a559c529a2
https://medium.com/@syoya/what-happens-in-sparse-autencoder-b9a5a69da5c6
THANK YOU

TOPICS IN DEEP LEARNING
Types of Autoencoders
(Recap: vanilla, sparse, denoising, and contractive autoencoders, plus convolutional, variational, and generative adversarial autoencoders, as listed in the previous section.)
Sparse Autoencoders
Sparse Autoencoders
Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning. Specifically, we design a neural network architecture such that we impose a bottleneck in the network, which forces a compressed knowledge representation of the original input.
The ideal autoencoder model balances the following:
- Sensitive enough to the inputs to accurately build a reconstruction.
- Insensitive enough to the inputs that the model doesn't simply memorize or overfit the training data.
This trade-off forces the model to maintain only the variations in the data required to reconstruct the input, without holding on to redundancies within the input. For most cases, this involves constructing a loss function where one term encourages the model to be sensitive to the inputs (i.e. a reconstruction loss L(x, x̂)) and a second term discourages memorization/overfitting (i.e. an added regularizer).
Sparse Autoencoders
A generic sparse autoencoder is visualized below, where the opacity of a node corresponds with its level of activation. It is important to note that the individual nodes of a trained model that activate are data-dependent: different inputs will result in activations of different nodes through the network.
Sparse Autoencoders
A sparse autoencoder is simply an autoencoder whose training criterion involves a sparsity penalty. In most cases, we construct the loss function by penalizing activations of hidden layers, so that only a few nodes are encouraged to activate when a single sample is fed into the network.
The intuition behind this method: if a man claims to be an expert in mathematics, computer science, psychology, and classical music, he might have learned only some quite shallow knowledge in these subjects. However, if he claims to be devoted only to mathematics, we would anticipate some useful insights from him. It's the same for the autoencoders we train: fewer nodes activating while still keeping up performance would guarantee that the autoencoder is actually learning latent representations instead of redundant information in the input data.
There are two main ways to construct the sparsity penalty: L1 regularization and KL-divergence.
Sparse Autoencoders - L1 regularization
Sparse Autoencoders
A sparse autoencoder is a type of autoencoder that is designed to learn a compressed representation of the input data that is also sparse. In other words, it aims to learn a representation that contains only a small number of active neurons (i.e., neurons with non-zero outputs) at a time. This can be useful for feature selection and dimensionality reduction, as well as for reducing overfitting.
The sparsity constraint is typically enforced by adding a penalty term to the loss function of the autoencoder that encourages sparsity. The penalty term can be defined in different ways, but one common approach is to use the L1 norm of the hidden layer activations:
L = L(x, x') + λ ||h||_1
where L(x, x') is the reconstruction loss as defined for standard autoencoders, h is the vector of hidden layer activations, ||.||_1 represents the L1 norm, and λ is a hyperparameter that controls the strength of the sparsity penalty.
The L1 penalty encourages the autoencoder to learn a representation where many of the hidden neurons have zero output, which means that the representation is sparse. This can be seen as a form of regularization, as it prevents the autoencoder from learning redundant or irrelevant features.
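A minimal Keras sketch of the L1 sparsity penalty above, using an activity regularizer on the bottleneck layer (the 1e-5 coefficient is an illustrative value for λ, not from the slides):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

input_dim, latent_dim = 784, 64

sparse_autoencoder = tf.keras.Sequential([
    layers.Input(shape=(input_dim,)),
    # L1 penalty on the hidden activations h encourages most of them to be ~0.
    layers.Dense(latent_dim, activation='relu',
                 activity_regularizer=regularizers.l1(1e-5)),
    layers.Dense(input_dim, activation='sigmoid'),
])
# Total loss = reconstruction loss + λ * ||h||_1
# (Keras adds the activity-regularization term automatically).
sparse_autoencoder.compile(optimizer='adam', loss='mse')
```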
Sparse Autoencoders - L1 Regularization
Due to the sparsity induced by L1 regularization, a sparse autoencoder actually learns better representations, and its activations are more sparse, which makes it perform better than the original autoencoder without L1 regularization.
Variational Autoencoders
Variational Autoencoders
Can we do generation with autoencoders? In other words, once the autoencoder is trained, can I remove the encoder, feed a hidden representation h to the decoder, and decode an x̂ from it?
In principle, yes! But in practice there is a problem with this approach: h is a very high dimensional vector, and only a few vectors in this space would actually correspond to meaningful latent representations of our input.
Autoencoders do not have a direct probabilistic interpretation. While they do learn a hidden representation h, they do not learn a probability distribution over the hidden variables given the input data X. Similarly, the decoder in an autoencoder is deterministic and does not learn a distribution over the output variables given the hidden state h.
This means that, in the context of autoencoders, we cannot directly sample from the learned hidden representation to generate new data, nor can we estimate the probability of a given data point under the model.
Low Dimensional Latent Space
1. Determine a low-dimensional latent space which maps to an image.
2. Images are generated by sampling from the latent space and mapping the sample to an output image.
3. Example: MNIST digit generation. Given training data X of MNIST digits, generate digits like those in X but not found in the data set. The latent structure would then be the different strokes that make up a digit, the orientation angle, and the size of the font. A latent vector z (a random vector, possibly with fewer dimensions than the training data) would encode these latent structures. The latent space z can be sampled from the distribution p(z), i.e. p(z|x). (Ideally we want a probability density function for z that is close to that of the training data.)
Low Dimensional Latent Space
The training data (input image) is mapped to a latent space using a neural net. The latent space posterior and prior distributions are modelled as Gaussian. The output of the net is two parameters, the mean and the covariance: the parameters of the posterior distribution.
A random sample from the latent space distribution is assumed to generate the input data. The latent vector is mapped back to an image using another neural network; the reconstructed output is assumed to correspond to the mean of a Gaussian, which leads to the reconstruction loss.
Variational Autoencoders - 3 components
1. Encoder
2. Decoder
3. Regularized loss function

Variational Autoencoders - Regularized loss function
Difference between AE and VAE
Variational Autoencoders - Example
To provide an example, let's suppose we've trained an autoencoder model on a large dataset of faces with an encoding dimension of 6. An ideal autoencoder will learn descriptive attributes of faces, such as skin color or whether or not the person is wearing glasses, in an attempt to describe an observation in some compressed representation.
In this example, we've described the input image in terms of its latent attributes, using a single value to describe each attribute.
Variational Autoencoders
However, we may prefer to represent each latent attribute as a range of possible values. For instance, what single value would you assign for the smile attribute if you feed in a photo of the Mona Lisa? Using a variational autoencoder, we can describe latent attributes in probabilistic terms.
Variational Autoencoders
With this approach, we'll now represent each latent attribute for a given input as a probability distribution. When decoding from the latent state, we'll randomly sample from each latent state distribution to generate a vector as input for our decoder model.
By constructing our encoder model to output a range of possible values from which we'll randomly sample to feed into our decoder model, we're essentially enforcing a continuous, smooth latent space representation. For any sampling of the latent distributions, we expect our decoder model to be able to accurately reconstruct the input. Thus, values which are nearby to one another in latent space should correspond to very similar reconstructions.
Variational Autoencoders
Look at the image on the left: this is generated from an autoencoder, where the encoder has learnt only a single point in the latent space. Contrast this with the image on the right: because the encoder assumes the input comes from some distribution, it generates the parameters of that distribution and therefore yields a continuous local space.
Variational Autoencoders
So each of the attributes has some degree of freedom in a locally continuous space. Ideally, we also want overlap between samples that are not very similar, in order to interpolate between classes.
However, since there are no limits on what values the vectors μ and σ can take on for each sample, the encoder can learn to generate very different μ for different classes, clustering them apart, and it can minimize σ, making sure the encodings themselves don't vary much for the same sample.
Variational Autoencoders
What we ideally want are encodings, all of which are as close as possible to each other while still being distinct, allowing smooth interpolation and enabling the construction of new samples.
Variational Autoencoders
We have to regularise both the covariance matrix and the mean of the distributions returned by the encoder. In practice, this regularisation is done by enforcing the distributions to be close to a standard normal distribution (centred and reduced).
This way, we require the covariance matrices to be close to the identity, preventing punctual (point-like) distributions, and the means to be close to 0, preventing the encoded distributions from being too far apart from each other.
Why Continuous?
The fundamental problem with autoencoders, for generation, is that the latent space they convert their inputs to, and where their encoded vectors lie, may not be continuous or allow easy interpolation.
For example, training an autoencoder on the MNIST dataset and visualizing the encodings from a 2D latent space reveals the formation of distinct clusters. This makes sense, as distinct encodings for each image type make it far easier for the decoder to decode them. This is fine if you're just replicating the same images.
Why Continuous?
But when we're building a generative model, we don't want to simply replicate the image we put in. We want to randomly sample from the latent space, or generate variations on an input image, from a continuous latent space.
If the space has discontinuities (e.g. gaps between clusters) and you sample or generate a variation from there, the decoder will simply generate an unrealistic output, because the decoder has no idea how to deal with that region of the latent space: during training, it never saw encoded vectors coming from that region.
What is Kullback-Leibler divergence?
Kullback-Leibler Divergence (KL Divergence) is a measure of how one probability distribution differs from a second, reference probability distribution.
In VAE, our primary objective is to learn the underlying data distribution so that we can generate new data samples from that distribution. VAE is a parametric model in which we assume a distribution with parameters such as μ and σ, and we try to estimate that distribution.
To estimate a distribution, we need to assume that the data comes from a specific family of distributions, like Gaussian or Bernoulli. Hence, in VAE, the assumption is that the data distribution is Gaussian.
What is Kullback-Leibler divergence?
We train our VAE to minimize the KL divergence between the encoder's distribution P(Z|X) and the prior P(Z). In VAE, P(Z) ~ N(0, 1).
If the encoder outputs an encoding Z far from a standard normal distribution, the KL-divergence loss will penalize it more. The KL-divergence acts as a regularizer, which keeps the encodings Z sufficiently diverse. If we omitted the regularizer, the encoder could learn to cheat and give each data point an encoding in a different region of Euclidean space. In other words, the KL divergence pushes the probability distribution parameters μ and σ to closely resemble those of the unit Gaussian distribution N(0, 1).
Variational Autoencoders
Variational autoencoders (VAEs) are a type of generative model that combines the power of traditional autoencoders with the probabilistic framework of variational inference. VAEs can be used for various tasks, such as data compression, data reconstruction, and data generation.
VAEs differ from traditional autoencoders in that they learn a probability distribution over the latent space instead of a fixed encoding function. This allows VAEs to generate new data points that are similar to the training data, but not identical to it. The key advantage of VAEs over traditional autoencoders is that they can generate new data samples by sampling from the learned probability distribution over the latent space.
Latent State Distributions
Summary of VAE Architecture: 1. Encoder
The main idea behind VAEs is to learn a lower-dimensional representation (latent variable) of high-dimensional data by mapping it to a probability distribution. The architecture consists of two main components: an encoder and a decoder.
The encoder is a neural network that takes an input data point x and maps it to a distribution over latent variables z, which is assumed to be a normal distribution with mean μ and standard deviation σ. The encoder is trained to output these parameters such that the distribution over z captures the most important features of the input data.
The encoder is typically implemented as a neural network with parameters φ, which takes an input x and outputs two vectors μ and σ that represent the mean and standard deviation of the normal distribution over z. These parameters are computed using a deterministic function f(x; φ), which is typically a deep neural network.
Summary of VAE Architecture: 2. Decoder
The decoder is the second part of the variational autoencoder (VAE) architecture, and is responsible for mapping a sample from the latent space back to the original input space. It takes a sample z from the latent space and maps it to a reconstruction x' in the input space. The decoder is also a neural network, with parameters θ, and is typically a mirror image of the encoder network.
The input to the decoder is a latent vector z sampled from the encoder output, and it outputs a reconstruction x' that is expected to be as close as possible to the original input x. The decoder takes the form of a probabilistic generative model, with its parameters learned by minimizing a loss function that measures the reconstruction error between the generated output and the original input.
Summary of VAE Architecture: 3. Loss
There are two loss terms in training a Variational Autoencoder:
1. A reconstruction loss, e.g. mean squared error (MSE), between the input image and the reconstructed image, and
2. A KL divergence between the encoded distribution and the normal distribution with mean 0 and variance 1.0.
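A hedged sketch of these two loss terms in TensorFlow, together with the reparameterization trick used to sample z (function and variable names are illustrative, not from the slides):

```python
import tensorflow as tf

def vae_loss(x, x_recon, mu, logvar):
    # 1. Reconstruction term: squared error between input and reconstruction.
    recon = tf.reduce_sum(tf.square(x - x_recon), axis=-1)
    # 2. KL divergence between N(mu, sigma^2) and N(0, 1), using the closed form
    #    from the KL-loss slide: -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2).
    kl = -0.5 * tf.reduce_sum(1.0 + logvar - tf.square(mu) - tf.exp(logvar),
                              axis=-1)
    return tf.reduce_mean(recon + kl)

def sample_z(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # keeping the sampling step differentiable w.r.t. mu and logvar.
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * logvar) * eps
```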
Revisit of VAE Architecture: 3. Loss
Difference between AE and VAE
Reconstruction Loss
The training of a VAE involves minimizing a loss function that consists of two terms: the reconstruction loss and the KL divergence loss.
Reconstruction loss: The reconstruction loss measures how well the decoder can reconstruct the original input from the sampled latent variable. It is often chosen to be the negative log-likelihood of the input under the reconstructed distribution. Assuming that the likelihood is Gaussian, the reconstruction loss can be defined as follows:
L_recon(x, x') = -log(p(x|x')) = (1/2)*||x - x'||^2 + (1/2)*log(2πσ^2)
where x is the input, x' is the reconstructed input, σ is the standard deviation of the Gaussian likelihood, and ||.||^2 denotes the squared L2 distance.
Topics in Deep Learning
KL Divergence Loss
KL divergence loss: The KL divergence loss measures the difference between the distribution of the latent variables obtained from the encoder and the prior distribution assumed for the latent variables. Assuming that the prior is also Gaussian, with mean 0 and variance 1, the KL divergence loss can be defined as follows:
L_KL = -0.5 * Σ (1 + log(σ²) - μ² - σ²)
where μ and σ are the mean and standard deviation of the distribution over the latent variables obtained from the encoder.
Why the KL loss?
KL divergence loss is used in a VAE to ensure that the distribution over the latent variables produced by the encoder is close to a standard normal distribution. This is important because a standard normal distribution is a convenient prior distribution for the latent variables.
The KL divergence loss measures the difference between two probability distributions, in this case the distribution produced by the encoder and the standard normal distribution. By minimizing the KL divergence loss, the VAE encourages the encoder to produce a distribution over the latent variables that is as close as possible to the standard normal distribution.
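Combining the two terms, a minimal sketch of the VAE training loss in PyTorch could look as follows; it assumes the encoder outputs mu and logvar as in the earlier sketch, uses MSE for the reconstruction term, and implements L_KL = -0.5 * Σ(1 + log σ² - μ² - σ²) for the KL term:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Reconstruction (MSE) + KL(q(z|x) || N(0, I)), summed over the batch."""
    recon = F.mse_loss(x_recon, x, reduction='sum')
    # Closed-form KL divergence between N(mu, sigma^2) and the standard normal
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```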
Topics in Deep Learning
KL Loss Derivation
The Kullback-Leibler (KL) divergence is a measure of the difference between two probability distributions, denoted as P and Q. It is defined as:
KL(P || Q) = Σ_x P(x) log [P(x) / Q(x)]
where x represents the possible outcomes of a random variable.
The KL divergence measures the amount of information lost when using Q to approximate P. It is not a symmetric measure, meaning that KL(P || Q) is not necessarily equal to KL(Q || P).
To derive the KL divergence, we start with the definition of the entropy H(P) of a probability distribution P:
H(P) = -Σ_x P(x) log P(x)
where log is the natural logarithm.
The entropy measures the uncertainty or randomness of a probability distribution. It is maximal when all outcomes are equally likely and minimal when only one outcome is possible.
Next, we introduce a new probability distribution Q and define the cross-entropy H(P, Q) as:
H(P, Q) = -Σ_x P(x) log Q(x)
Topics in Deep Learning
KL Loss Derivation (continued)
The cross-entropy measures the amount of information needed to encode the outcomes of P using Q.
The KL divergence can then be derived by subtracting the entropy of P from the cross-entropy of P and Q:
KL(P || Q) = H(P, Q) - H(P)
Substituting the expressions for H(P) and H(P, Q), we get:
KL(P || Q) = -Σ_x P(x) log Q(x) + Σ_x P(x) log P(x) = Σ_x P(x) log [P(x) / Q(x)]
This is the final expression for the KL divergence.
Note that the KL divergence is always non-negative, and it is zero only when P and Q are identical. Therefore, it can be used as a measure of dissimilarity between two probability distributions.
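As a quick numerical check of the definition KL(P || Q) = Σ_x P(x) log[P(x)/Q(x)], the sketch below evaluates it for two made-up discrete distributions and shows the asymmetry:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x)), for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # illustrative distributions
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))  # ~0.0253
print(kl_divergence(q, p))  # ~0.0258 -- not symmetric
```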
Topics in Deep Learning
Applications: Variational Autoencoders
Topics in Deep Learning
Acknowledgements & References
http://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Handout/Lecture21.pdf
http://www.cse.iitm.ac.in/~miteshk/CS7015/Slides/Handout/Lecture7.pdf
https://medium.com/@lmayrandprovencher/building-an-autoencoder-with-tied-weights-in-keras-c4a559c529a2
https://medium.com/@syoya/what-happens-in-sparse-autencoder-b9a5a69da5c6
https://www.jeremyjordan.me/variational-autoencoders/
https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf
THANK YOU
UE20CS342
Topics in Deep Learning
Prof. V R BADRI PRASAD
Associate Professor, Department of Computer Science & Engineering
badriprasad@pes.edu
Acknowledgement: Prof. Srikanth H R, Prof. Srinivas K S
Department of Computer Science
UE20CS342: Topics in Deep Learning
Prajwal A, TDL Teaching Assistant
Meta Learning
Topics in Deep Learning
Meta Learning
A good machine learning model often requires training with a large number of samples. Humans, in contrast, learn new concepts and skills much faster and more efficiently. Kids who have seen cats and birds only a few times can quickly tell them apart. People who know how to ride a bike can likely discover how to ride a motorcycle quickly, with little or even no demonstration. Is it possible to design a machine learning model with similar properties, learning new concepts and skills fast from a few training examples? That is essentially what meta-learning aims to solve.
The performance of a learning model depends on its training dataset, the algorithm, and the parameters of the algorithm. Many experiments are required to find the best-performing algorithm and its parameters. Meta learning approaches help find these and optimize the number of experiments, resulting in better predictions in less time.
Meta learning algorithms can learn to select and combine the predictions of other machine learning algorithms to make better predictions.
Meta learning can be used across different machine learning settings (e.g. few-shot learning, reinforcement learning, natural language processing, etc.).
Topics in Deep Learning
Meta Learning
Meta-learning, also known as "learning to learn," is a subfield of machine learning that focuses on learning algorithms that can learn from experience to solve new problems quickly with minimal data. In meta-learning, the goal is to develop a learning algorithm that can learn how to learn, meaning that it can adapt to new tasks and new environments quickly and efficiently.
The basic idea behind meta-learning is to train a model on a diverse set of tasks and use this experience to learn how to generalize to new tasks with minimal training.
Meta-learning algorithms typically involve two stages: an inner loop and an outer loop. In the inner loop, the model is trained on a specific task with a limited amount of data. In the outer loop, the model is evaluated on its performance across a range of tasks, and the learning algorithm is updated to improve its ability to generalize to new tasks.
One common approach to meta-learning is to use gradient-based optimization methods, such as gradient descent, to learn a set of model parameters that can be quickly adapted to new tasks. Another approach is to use memory-augmented neural networks, which allow the model to store and retrieve information about previous tasks to facilitate learning on new tasks.
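The inner/outer loop structure can be sketched in code. The skeleton below is an illustrative first-order, MAML-style example of the gradient-based approach, not an algorithm defined in these slides; loss_fn and each task's support/query batches are assumed to be supplied by the caller:

```python
import copy
import torch

def meta_train_step(model, tasks, loss_fn, inner_lr=0.01, outer_lr=0.001):
    """One outer-loop step of a first-order MAML-style meta-learner (sketch)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for task in tasks:                        # outer loop: iterate over tasks
        fast = copy.deepcopy(model)           # task-specific copy of the shared init
        # Inner loop: adapt to this task with a gradient step on its support set
        x_s, y_s = task["support"]
        grads = torch.autograd.grad(loss_fn(fast(x_s), y_s), list(fast.parameters()))
        with torch.no_grad():
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g
        # Evaluate the adapted model on this task's query set
        x_q, y_q = task["query"]
        q_grads = torch.autograd.grad(loss_fn(fast(x_q), y_q), list(fast.parameters()))
        for mg, g in zip(meta_grads, q_grads):
            mg += g                           # first-order approximation: reuse query grads
    with torch.no_grad():                     # outer update of the shared initialization
        for p, mg in zip(model.parameters(), meta_grads):
            p -= outer_lr * mg / len(tasks)
```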
Topics in Deep Learning
Siamese Networks
One popular approach to meta-learning is the use of Siamese networks. Siamese networks are a type of neural network architecture that involves training two identical networks on separate but similar inputs, and then using the learned representations to compare and classify new inputs. This is often used in tasks such as face recognition, where the networks are trained on pairs of images and learn to identify whether they contain the same person or not.
In the context of meta-learning, Siamese networks can be used to learn how to quickly adapt to new tasks. For example, imagine we have a set of tasks, each with a small amount of labeled data. We can use a Siamese network to learn a shared representation of the data across all tasks, such that the distance between two inputs in the shared space reflects their similarity or dissimilarity. This shared representation can then be used to quickly adapt to new tasks with minimal amounts of labeled data.
Topics in Deep Learning
Siamese Networks
Topics in Deep Learning
Siamese Networks - Application: Signature Verification with Siamese Networks
Topics in Deep Learning
Meta Learning
To do this, we first train the Siamese network on a set of tasks using a contrastive loss function that encourages the network to learn similar representations for similar inputs and dissimilar representations for dissimilar inputs. We can then fine-tune the network on a new task by simply updating the final layers of the network with the labeled data from that task while keeping the shared representation fixed. This allows the network to quickly learn a task-specific classifier while leveraging the knowledge learned from previous tasks.
Topics in Deep Learning
Siamese Network
Let's dive deeper into Siamese networks. Simply put, a Siamese network takes two images and outputs a probability of how similar they are.
A Siamese network is a type of neural network architecture that contains two or more identical subnetworks. Each subnetwork has the same architecture and parameters, and they are trained simultaneously on different but related tasks. The subnetworks are then combined to form a single network that can be used for a variety of tasks, such as classification, regression, or similarity matching.
The Siamese network is commonly used in tasks that involve measuring the similarity or dissimilarity between two inputs, such as image or text matching.
The architecture of a Siamese network can be divided into three main components:
1. input processing,
2. shared feature extraction, and
3. distance calculation.
Topics in Deep Learning
Siamese Network
1. Input Processing: The first step in a Siamese network is to process the input data. This can involve any preprocessing steps necessary for the specific task at hand. For example, in an image matching task, the input images may need to be resized or normalized.
2. Shared Feature Extraction: After the input data is processed, it is fed into two or more identical subnetworks. Each subnetwork consists of multiple layers of neurons that are trained to extract meaningful features from the input data. These subnetworks share the same architecture and weights, so they learn the same features from the input data. The feature extraction process can be expressed mathematically as:
h1 = f(x1; θ1)
h2 = f(x2; θ1)
where x1 and x2 are the input data, f is the shared subnetwork with parameters θ1, and h1 and h2 are the extracted features.
Topics in Deep Learning
Siamese Network
3. Distance Calculation: Once the features are extracted from the input data, the distance between the features is calculated to determine their similarity or dissimilarity. The distance metric can vary depending on the specific task at hand, but common choices include Euclidean distance, Manhattan distance, or cosine similarity. The distance calculation process can be expressed mathematically as:
d = g(h1, h2; θ2)
where g is a function that takes the extracted features h1 and h2 as input and outputs the distance d between them. The parameters θ2 are learned during training to optimize the distance metric for the specific task at hand.
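A minimal sketch of components 2 and 3 in PyTorch (layer sizes are illustrative assumptions): a single tower with shared weights embeds both inputs, and the Euclidean distance plays the role of g(h1, h2):

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Shared-weight twin towers plus Euclidean distance (illustrative sketch)."""
    def __init__(self, in_dim=784, emb_dim=64):    # illustrative sizes
        super().__init__()
        self.tower = nn.Sequential(                # shared feature extractor f(.; theta1)
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim))

    def forward(self, x1, x2):
        h1, h2 = self.tower(x1), self.tower(x2)    # identical weights for both inputs
        return torch.norm(h1 - h2, dim=1)          # distance d = g(h1, h2)
```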
Topics in Deep Learning
Output of a Siamese Network
The output of a Siamese network can be a single distance value or a probability score, depending on the specific task at hand. For example, in an image matching task, the output could be a distance value that represents the similarity between two images. In a classification task, the output could be a probability score that indicates the likelihood of two inputs belonging to the same class.
Topics in Deep Learning
Siamese Network
In a Siamese network, backpropagation is used to optimize the parameters of the subnetworks to minimize the loss function. The loss function is typically designed to measure the similarity or dissimilarity between the two input samples.
Let's assume we have a Siamese network with two subnetworks that takes in pairs of input samples (x1 and x2) and outputs their corresponding feature vectors (h1 and h2). The loss function is calculated based on the distance between the feature vectors, as follows:
L(y, ŷ) = y * (1/2) * d² + (1 - y) * (1/2) * max(0, m - d)²
where y is the ground truth label indicating whether the two input samples are similar (1) or dissimilar (0), ŷ is the predicted label, d is the distance between the feature vectors of the two input samples, and m is a margin that controls the separation between similar and dissimilar samples.
Topics in Deep Learning
Siamese Network
The first term in the loss function penalizes the network when similar samples have a large distance between their feature vectors, while the second term penalizes the network when dissimilar samples have a small distance between their feature vectors. The margin m is used to control how much separation is required between the two types of samples.
During training, the parameters of the subnetworks are updated using backpropagation to minimize the loss function. The gradient of the loss with respect to the parameters is calculated using the chain rule, and the weights are updated using an optimizer such as stochastic gradient descent (SGD) or Adam.
The backpropagation algorithm calculates the gradients for each of the weights in the subnetworks. The gradients are then used to update the weights in the direction of steepest descent in order to minimize the loss function. This process is repeated for each batch of training data until the network converges to a minimum of the loss function.
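The loss above translates directly into code. A minimal sketch, assuming d is a batch of distances produced by the network and y holds the 0/1 similarity labels:

```python
import torch

def contrastive_loss(d, y, margin=1.0):
    """L = y * (1/2) * d^2 + (1 - y) * (1/2) * max(0, m - d)^2, averaged over the batch."""
    similar = y * 0.5 * d.pow(2)                   # pulls similar pairs together
    dissimilar = (1 - y) * 0.5 * torch.clamp(margin - d, min=0).pow(2)  # pushes dissimilar pairs apart up to margin m
    return (similar + dissimilar).mean()
```

Because both towers share parameters, backpropagating this loss updates a single set of weights.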
Topics in Deep Learning
Acknowledgements & References
https://distill.pub/2021/gnn-intro/
THANK YOU
UE20CS342
Topics in Deep Learning
UE20CS342
Graph Neural Network - GNN
Topics in Deep Learning
Motivation
Topics in Deep Learning
Motivation
Imagine you are working for a social media company that wants to recommend new friends to its users. We could use other deep learning methods for social media recommendation, but there are problems:
One issue with traditional neural networks is that they are designed to process fixed-size data, such as images or text. Social networks, however, are inherently variable in size and structure, which makes it difficult to use traditional neural networks.
Another issue with traditional neural networks is that they may not be able to effectively capture the relational information that is present in the social network.
Furthermore, social network graphs are usually sparse, meaning that there are many missing connections between nodes.
Topics in Deep Learning
Motivation
One way to do this is to analyze the social network graph: a graph where each node represents a user, and each edge represents a connection between two users (for example, if they are friends or follow each other).
To recommend new friends, you need to analyze the graph and identify users who are similar to each other based on their connections. However, this is not a straightforward task, because the graph is large and complex and the connections between users are constantly changing.
This is where Graph Neural Networks (GNNs) come in. GNNs are a type of machine learning model that can be used to analyze graph-structured data like social network graphs. They can identify patterns in the graph and make predictions based on those patterns.
For example, a GNN can learn to predict which users are likely to be friends based on their shared connections in the graph. It can also learn to recommend new friends to users based on their similarities to other users in the graph.
Topics in Deep Learning
Graph Neural Network - GNN
To start, let's establish what a graph is. A graph represents the relations (edges) between a collection of entities (nodes).
Topics in Deep Learning
GNN
To further describe each node, edge, or the entire graph, we can store information in each of these pieces of the graph. We can additionally specialize graphs by associating directionality to edges (directed, undirected).
Topics in Deep Learning
Graphs and where to find them (some examples)
Images as graphs
Another way to think of images is as graphs with regular structure, where each pixel represents a node and is connected via an edge to adjacent pixels. Each non-border pixel has exactly 8 neighbours, and the information stored at each node is a 3-dimensional vector representing the RGB value of the pixel.
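As a quick illustration of this image-as-a-graph view, the sketch below enumerates the 8-neighbour edges of a small image in plain Python (the 3x3 size is arbitrary):

```python
def pixel_edges(height, width):
    """Edges of the 8-neighbour pixel graph: one node per (row, col) pixel."""
    edges = []
    for r in range(height):
        for c in range(width):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= nr < height and 0 <= nc < width:
                        edges.append(((r, c), (nr, nc)))
    return edges

print(len(pixel_edges(3, 3)))  # 40 directed edges; only the interior pixel has all 8 neighbours
```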
Topics in Deep Learning
Graphs and where to find them (some examples)
https://distill.pub/2021/gnn-intro/
Topics in Deep Learning
Graphs and where to find them (some examples)
Text as graphs
We can digitize text by associating indices to each character, word, or token, and representing text as a sequence of these indices. This creates a simple directed graph, where each character or index is a node and is connected via an edge to the node that follows it.
Topics in Deep Learning
Graphs and where to find them (some examples)
Molecules as graphs
It is a very convenient and common abstraction to describe this 3D object as a graph, where nodes are atoms and edges are covalent bonds.
Topics in Deep Learning
Graph tasks
There are three general types of prediction tasks on graphs: graph-level, node-level, and edge-level.
In a graph-level task, we predict a single property for a whole graph. For a node-level task, we predict some property for each node in a graph. For an edge-level task, we want to predict the property or presence of edges in a graph.
Graph level
In a graph-level task, our goal is to predict the property of an entire graph. For example, for a molecule represented as a graph, we might want to predict what the molecule smells like, or whether it will bind to a receptor implicated in a disease.
Topics in Deep Learning
Three general types of prediction tasks on graphs: graph-level, node-level, and edge-level
Topics in Deep Learning
Graph tasks - Node-level tasks
Topics in Deep Learning
Graph tasks - Edge-level tasks
One example of edge-level inference is image scene understanding. Beyond identifying objects in an image, deep learning models can be used to predict the relationship between them. We can phrase this as an edge-level classification: given nodes that represent the objects in the image, we wish to predict which of these nodes share an edge, or what the value of that edge is. If we wish to discover connections between entities, we could consider the graph fully connected and, based on their predicted values, prune edges to arrive at a sparse graph.
Topics in Deep Learning
Graph tasks - Edge-level tasks
Topics in Deep Learning
GNN
A GNN is an optimizable transformation on all attributes of the graph (nodes, edges, global context) that preserves graph symmetries (permutation invariance).
GNNs adopt a "graph-in, graph-out" architecture, meaning that these models accept a graph as input, with information loaded into its nodes, edges, and global context, and progressively transform these embeddings without changing the connectivity of the input graph.
Here is a simple GNN:
Topics in Deep Learning
GNN - Basic overview of the GNN architecture: the "message passing neural network" framework
A basic GNN architecture consists of the following components:
1. Node Embedding: Each node in the graph is represented as a low-dimensional vector, called a node embedding. These embeddings capture the features of the node, such as its attributes, connections, or position in the graph.
2. Message Passing: The key idea behind GNNs is to update the node embeddings by aggregating information from their neighboring nodes. This is done through a message passing algorithm, where each node sends and receives messages to and from its neighbors, and updates its embedding based on the received messages. The messages typically consist of the embeddings of neighboring nodes and some edge-specific information.
3. Node Aggregation: After receiving messages from their neighbors, each node aggregates the received information into a new embedding vector by applying some aggregation function, such as summation or averaging.
4. Readout: Finally, the aggregated node embeddings are used to make predictions or solve the task at hand. This is done through a readout function that maps the aggregated embeddings to the desired output.
The above steps are typically repeated for multiple iterations or layers, each time updating the node embeddings.
Topics in Deep Learning
GNN - How do Graph Neural Networks work?
1. Node Embedding
Each node in the graph is represented as a low-dimensional vector, called a node embedding. These embeddings capture the features of the node, such as its attributes, connections, or position in the graph.
Topics in Deep Learning
GNN - How do Graph Neural Networks work?
2. Message Passing
The key idea behind GNNs is to update the node embeddings by aggregating information from their neighboring nodes. This is done through a message passing algorithm, where each node sends and receives messages to and from its neighbors, and updates its embedding based on the received messages. The messages typically consist of the embeddings of neighboring nodes and some edge-specific information.
Topics in Deep Learning
GNN - Steps performed on every node
What is happening in the message passing layers? (Similar to a filter in a CNN)
Topics in Deep Learning
GNN - Steps performed on every node
3. Aggregate
Topics in Deep Learning
GNN - Steps performed on every node
4. Update
Topics in Deep Learning
GNN - Steps performed on every node
A single Graph Neural Network (GNN) layer has a set of steps that is performed on every node in the graph:
1. Message Passing
2. Aggregation
3. Update
Together, these form the building blocks that learn over graphs. Innovations in graph deep learning (GDL) mainly involve changes to these 3 steps.
Each node has features which can be represented as a vector in R^d. This vector is either a latent-dimensional embedding or is constructed such that each entry is a different property of the entity. For instance, in a social media graph, a user node has properties such as age, gender, political inclination, relationship status, etc. that can be represented numerically. Likewise, in a molecule graph, an atom node might have chemical properties like affinity to water, forces, energies, etc. that can also be represented numerically.
Formally, every node i has associated node features x_i ∈ R^d and labels y_i.
Topics in Deep Learning
GNN - Message passing
GNNs are known for their ability to learn structural information. Usually, nodes with similar features or properties are connected to each other (this is true in the social media setting). The GNN exploits this fact and learns how and why specific nodes connect to one another while some do not. To do so, the GNN looks at the neighbourhoods of nodes.
Topics in Deep Learning
GNN - Message passing
A person is shaped by the circle they are in. Similarly, a GNN can learn a lot about a node i by looking at the nodes in its neighbourhood N_i. To enable this sharing of information between a source node i and its neighbours j, GNNs engage in message passing.
For a GNN layer, message passing is defined as the process of taking the node features of the neighbours, transforming them, and "passing" them to the source node. This process is repeated, in parallel, for all nodes in the graph. In that way, all neighbourhoods are examined by the end of this step.
Topics in Deep Learning
GNN - Message passing
Let's zoom into node 6 and examine the neighbourhood N6 = {1, 3, 4}. We take each of the node features x1, x3, and x4, and transform them using a function F, which can be a simple neural network (MLP or RNN) or an affine transform.
Simply put, a "message" is the transformed node feature coming in from a source node. F can be a simple affine transform or a neural network. For now, for mathematical convenience, let's say F(x_j) = W x_j, where W x_j is a simple matrix multiplication.
Topics in Deep Learning
GNN - Aggregation
Topics in Deep Learning
GNN - Update
Using these aggregated messages, the GNN layer now has to update the source node i's features. At the end of this update step, the node should not only know about itself but about its neighbours as well. This is ensured by taking the node i's feature vector and combining it with the aggregated messages. Again, a simple addition or concatenation operation takes care of this.
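Putting message passing, aggregation, and update together, here is a minimal sketch of one GNN layer in PyTorch (an illustration, not a specific published architecture). It assumes node features X of shape (num_nodes, d_in) and a dense float adjacency matrix A; messages use the affine transform F(x_j) = W x_j, aggregation is a sum over neighbours via A @ messages, and the update concatenates each node's own features with its aggregated messages:

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One message passing / aggregation / update step (illustrative sketch)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.msg = nn.Linear(d_in, d_out, bias=False)   # message function F(x_j) = W x_j
        self.upd = nn.Linear(d_in + d_out, d_out)       # update on [own features, aggregated messages]

    def forward(self, X, A):
        # X: (num_nodes, d_in) node features
        # A: (num_nodes, num_nodes) float adjacency matrix (1.0 where an edge exists)
        messages = self.msg(X)               # transform every node's features into messages
        agg = A @ messages                   # sum the messages arriving from each node's neighbours
        return torch.relu(self.upd(torch.cat([X, agg], dim=1)))
```

Stacking several such layers lets information from nodes more than one hop away reach each node, as the following slides discuss.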
Topics in Deep Learning
GNN - Message passing
Topics in Deep Learning
GNN - Stacking layers
Topics in Deep Learning
GNN - Problems GNNs can solve
Node Classification
One of the powerful applications of GNNs is adding new information to nodes or filling gaps where information is missing. In node classification, the task is to predict the node embedding for every node in a graph, i.e., to determine the labelling of samples (represented as nodes) by looking at the labels of their neighbours.
This type of problem is usually trained in a semi-supervised way, where only part of the graph is labelled. Typical applications for node classification include citation networks, Reddit posts, YouTube videos, etc.
For example, say you are running a social network and you have spotted a few bot accounts. Now you want to find out if there are other bot accounts in your network. You can train a GNN to classify other users in the social network as "bot" or "not bot" based on how close their graph embeddings are to those of the known bots.
Topics in Deep Learning
GNN - Applications
Graph Neural Networks (GNNs) have a wide range of applications in various domains. Here are some examples:
Social network analysis:
GNNs can be used for various tasks in social network analysis, such as node classification, link prediction, community detection, and recommendation systems. For example, in node classification, the task is to predict the category or label of a node in the network based on its connections to other nodes. GNNs can be used to learn node representations that capture the structural information of the network, which can then be used to classify the nodes. Similarly, in link prediction, the task is to predict whether there will be a link between two nodes in the future. GNNs can be used to learn representations of the nodes and the links between them, which can then be used to predict whether a link will exist or not.
Topics in Deep Learning
GNN - Applications
Chemistry and drug discovery:
GNNs can be used for various tasks in chemistry and drug discovery, such as molecular property prediction, drug design, and chemical reaction prediction. For example, in molecular property prediction, the task is to predict the properties of a molecule, such as its solubility, toxicity, or bioactivity. GNNs can be used to learn representations of the atoms and bonds in the molecule, which can then be used to predict the molecule's properties. Similarly, in drug design, the task is to design new molecules that can bind to a specific target protein. GNNs can be used to generate new molecules by iteratively modifying existing ones and evaluating their properties using a GNN-based molecular property predictor.
Topics in Deep Learning
GNN - Applications: Natural language processing
GNNs can be used for various tasks in natural language processing, such as text classification, sentiment analysis, and machine translation. For example, in text classification, the task is to classify a text document into one or more predefined categories. GNNs can be used to learn representations of the words or phrases in the document, which can then be used to classify the document. Similarly, in sentiment analysis, the task is to determine the sentiment or emotion expressed in a text document. GNNs can be used to learn representations of the words or phrases and their relationships with other words or phrases in the document, which can then be used to predict the sentiment.
Topics in Deep Learning
GNN - Applications: Computer vision and image processing
GNNs can be used for various tasks in computer vision and image processing, such as image segmentation, object detection, and image captioning. For example, in image segmentation, the task is to partition an image into regions that correspond to different objects or parts of objects. GNNs can be used to propagate information between adjacent pixels or regions to refine the segmentation. Similarly, in object detection, the task is to detect the presence and location of objects in an image. GNNs can be used to learn representations of the objects and their relationships with other objects in the image, which can then be used to detect the objects.
Topics in Deep Learning
GNN - Advantages
1. Ability to handle structured data: GNNs are designed to work with structured data such as graphs, which is a common representation for many real-world problems, such as social network analysis, recommendation systems, and molecular design.
2. Incorporation of graph topology: GNNs can capture the graph topology by using the node connections as a way to propagate information through the network, which makes them more powerful than traditional neural networks that treat input data as a set or a sequence.
3. Transfer learning: GNNs can leverage graph representations pre-trained on one task and apply them to another related task. This can save computational resources and improve performance on the target task.
4. Interpretability: The message passing process in GNNs can be interpreted as passing information between nodes in a graph, which makes them more interpretable than traditional neural networks.
Topics in Deep Learning
GNN - Disadvantages
1. Computational complexity: GNNs can be computationally expensive, especially for large graphs with many nodes and edges. This can make training and inference time-consuming and resource-intensive.
2. Limited scalability: GNNs may not be suitable for extremely large graphs due to memory limitations and computational complexity.
3. Difficulty in handling dynamic graphs: GNNs assume that the graph structure is fixed and known a priori. Handling dynamic graphs, where the structure of the graph changes over time, is still an active area of research.
4. Lack of standardization: There is currently no standard architecture or training procedure for GNNs, which can make it difficult to compare results across different studies and applications.
Topics in Deep Learning
Acknowledgements & References
https://distill.pub/2021/gnn-intro/
https://www.youtube.com/watch?v=ABCGCf8cJOE&list=PLV8yxwGOxvvoNkzPfCx2i8an--Tkt7O8Z&index=2
https://rish-16.github.io/posts/gnn-math/
THANK YOU