Notebook credit: Based on the original D2L notebook here.

Often, as we process images, we want to gradually reduce the spatial resolution of our hidden representations, aggregating information so that the higher up we go in the network, the larger the receptive field (in the input) to which each hidden node is sensitive.

Often our ultimate task asks some global question about the image, e.g., does it contain a cat? So typically the units of our final layer should be sensitive to the entire input. By gradually aggregating information, yielding coarser and coarser maps, we accomplish this goal of ultimately learning a global representation, while keeping all of the advantages of convolutional layers at the intermediate layers of processing.

Moreover, when detecting lower-level features, such as edges, we often want our representations to be somewhat invariant to translation. For instance, if we take the image X with a sharp delineation between black and white and shift the whole image by one pixel to the right, i.e., Z[i, j] = X[i, j + 1], then the output for the new image Z might be vastly different: the edge will have shifted by one pixel. In reality, objects hardly ever occur exactly at the same place. In fact, even with a tripod and a stationary object, vibration of the camera due to the movement of the shutter might shift everything by a pixel or so (high-end cameras are loaded with special features to address this problem).

This section introduces pooling layers, which serve the dual purposes of mitigating the sensitivity of convolutional layers to location and of spatially downsampling representations.
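To make this sensitivity concrete, here is a minimal sketch (the one-dimensional signal and the [1, -1] edge kernel are illustrative assumptions, not part of the original notebook): a one-pixel shift moves the detected edge, while a small max pooling window restores some tolerance.

    import tensorflow as tf

    # A 1-D "image" with a black-to-white edge, and the same image shifted
    # by one pixel so that Z[j] = X[j + 1]. (Illustrative toy data.)
    X = tf.constant([0., 0., 0., 1., 1., 1.])
    Z = tf.constant([0., 0., 1., 1., 1., 1.])

    # Cross-correlation with the kernel [1, -1] is just a difference of neighbors.
    edge = lambda A: A[:-1] - A[1:]
    print(edge(X).numpy())  # [ 0.  0. -1.  0.  0.]  -> edge at position 2
    print(edge(Z).numpy())  # [ 0. -1.  0.  0.  0.]  -> edge moved to position 1

    # A width-2 max pooling over the absolute responses tolerates the shift:
    pool = lambda A: tf.maximum(A[:-1], A[1:])
    print(pool(tf.abs(edge(X))).numpy())  # [0. 1. 1. 0.]
    print(pool(tf.abs(edge(Z))).numpy())  # [1. 1. 0. 0.]  -> both fire at position 1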
Pooling

Maximum Pooling and Average Pooling

Like convolutional layers, pooling operators consist of a fixed-shape window that is slid over all regions in the input according to its stride, computing a single output for each location traversed by the fixed-shape window (sometimes known as the pooling window). However, unlike the cross-correlation computation of the inputs and kernels in the convolutional layer, the pooling layer contains no parameters (there is no kernel). Instead, pooling operators are deterministic, typically calculating either the maximum or the average value of the elements in the pooling window. These operations are called maximum pooling (max pooling for short) and average pooling, respectively.

In both cases, as with the cross-correlation operator, we can think of the pooling window as starting from the upper-left of the input tensor and sliding across the input tensor from left to right and top to bottom. At each location that the pooling window hits, it computes the maximum or average value of the input subtensor in the window, depending on whether max or average pooling is employed.

(Figure: max pooling with a 2 × 2 window applied to a 3 × 3 input.)

The output tensor in the figure above has a height of 2 and a width of 2. The four elements are derived from the maximum value in each 2 × 2 pooling window:

max(0, 1, 3, 4) = 4,
max(1, 2, 4, 5) = 5,
max(3, 4, 6, 7) = 7,
max(4, 5, 7, 8) = 8.

A pooling layer with a pooling window shape of 2 × 2 is called a 2 × 2 pooling layer, and the pooling operation is called 2 × 2 pooling.

Let us return to the object edge detection example mentioned at the beginning of this section. Now we will use the output of the convolutional layer as the input for maximum pooling. Set the convolutional layer input as X and the pooling layer output as Y. Whether or not the values of X[i, j] and X[i, j + 1] are different, or X[i, j + 1] and X[i, j + 2] are different, the pooling layer always outputs Y[i, j] = 1. That is to say, using the 2 × 2 maximum pooling layer, we can still detect if the pattern recognized by the convolutional layer moves no more than one element in height or width.

In the code below, we implement the forward propagation of the pooling layer in the pool2d function. This function is similar to the corr2d function. However, here we have no kernel, so we compute the output as either the maximum or the average of each region in the input.

    import tensorflow as tf

    def pool2d(X, pool_size, mode='max'):
        p_h, p_w = pool_size
        Y = tf.Variable(tf.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)))
        for i in range(Y.shape[0]):
            for j in range(Y.shape[1]):
                if mode == 'max':
                    Y[i, j].assign(tf.reduce_max(X[i: i + p_h, j: j + p_w]))
                elif mode == 'avg':
                    Y[i, j].assign(tf.reduce_mean(X[i: i + p_h, j: j + p_w]))
        return Y
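As a quick cross-check (an illustrative addition, not from the original notebook; the random input A is an assumption), pool2d with a 2 × 2 window and its implicit stride of 1 should agree with the framework's built-in MaxPool2D once we set strides=1:

    # Compare our pool2d against Keras's MaxPool2D on a random 4 x 4 input.
    A = tf.random.uniform((4, 4))
    ours = pool2d(A, (2, 2))  # our implementation always uses stride 1
    builtin = tf.keras.layers.MaxPool2D(pool_size=2, strides=1)(tf.reshape(A, (1, 4, 4, 1)))
    print(tf.reduce_all(tf.abs(ours - tf.squeeze(builtin)) < 1e-6).numpy())  # True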
We can construct the input tensor X in the figure above to validate the output of the two-dimensional maximum pooling layer.

    X = tf.constant([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
    pool2d(X, (2, 2))

    <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
    array([[4., 5.],
           [7., 8.]], dtype=float32)>

Also, we experiment with the average pooling layer.

    pool2d(X, (2, 2), 'avg')

    <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
    array([[2., 3.],
           [5., 6.]], dtype=float32)>

Padding and Stride

As with convolutional layers, pooling layers can also change the output shape. And as before, we can alter the operation to achieve a desired output shape by padding the input and adjusting the stride. We can demonstrate the use of padding and strides in pooling layers via the built-in two-dimensional maximum pooling layer from the deep learning framework. We first construct an input tensor X whose shape has four dimensions, where the number of examples (batch size) and the number of channels are both 1.

It is important to note that TensorFlow prefers and is optimized for channels-last input.

    X = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))
    X

    <tf.Tensor: shape=(1, 4, 4, 1), dtype=float32, numpy=
    array([[[[ 0.],
             [ 1.],
             [ 2.],
             [ 3.]],

            [[ 4.],
             [ 5.],
             [ 6.],
             [ 7.]],

            [[ 8.],
             [ 9.],
             [10.],
             [11.]],

            [[12.],
             [13.],
             [14.],
             [15.]]]], dtype=float32)>
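As an aside (a reference formula following the output-shape convention from the earlier convolution sections; the notation here is mine, not the notebook's), pooling obeys the same shape arithmetic as cross-correlation. With input size $n_h \times n_w$, pooling window $p_h \times p_w$, total padding $\mathrm{pad}_h, \mathrm{pad}_w$, and strides $s_h, s_w$, the output shape is

$$\left\lfloor \frac{n_h - p_h + \mathrm{pad}_h + s_h}{s_h} \right\rfloor \times \left\lfloor \frac{n_w - p_w + \mathrm{pad}_w + s_w}{s_w} \right\rfloor.$$

For the 4 × 4 input X just constructed, a 3 × 3 window with no padding and the default stride of 3 gives ⌊(4 − 3 + 0 + 3)/3⌋ = 1 in each dimension, matching the 1 × 1 output below.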
By default, the stride and the pooling window in the instance from the framework's built-in class have the same shape. Below, we use a pooling window of shape (3, 3), so we get a stride shape of (3, 3) by default.

    pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3])
    pool2d(X)

    <tf.Tensor: shape=(1, 1, 1, 1), dtype=float32, numpy=array([[[[10.]]]], dtype=float32)>

The stride and padding can be manually specified. Below we pad one row of zeros above and one column of zeros to the left of X, then apply a (3, 3) pooling window with a stride of 2.

    paddings = tf.constant([[0, 0], [1, 0], [1, 0], [0, 0]])
    X_padded = tf.pad(X, paddings, "CONSTANT")
    X, X_padded

    (<tf.Tensor: shape=(1, 4, 4, 1), dtype=float32, numpy=
     array([[[[ 0.],
              [ 1.],
              [ 2.],
              [ 3.]],

             [[ 4.],
              [ 5.],
              [ 6.],
              [ 7.]],

             [[ 8.],
              [ 9.],
              [10.],
              [11.]],

             [[12.],
              [13.],
              [14.],
              [15.]]]], dtype=float32)>,
     <tf.Tensor: shape=(1, 5, 5, 1), dtype=float32, numpy=
     array([[[[ 0.],
              [ 0.],
              [ 0.],
              [ 0.],
              [ 0.]],

             [[ 0.],
              [ 0.],
              [ 1.],
              [ 2.],
              [ 3.]],

             [[ 0.],
              [ 4.],
              [ 5.],
              [ 6.],
              [ 7.]],

             [[ 0.],
              [ 8.],
              [ 9.],
              [10.],
              [11.]],

             [[ 0.],
              [12.],
              [13.],
              [14.],
              [15.]]]], dtype=float32)>)

    pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid',
                                       strides=2)
    pool2d(X_padded)

    <tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
    array([[[[ 5.],
             [ 7.]],

            [[13.],
             [15.]]]], dtype=float32)>

Multiple Channels

When processing multi-channel input data, the pooling layer pools each input channel separately, rather than summing the inputs up over channels as in a convolutional layer. This means that the number of output channels for the pooling layer is the same as the number of input channels. Below, we will concatenate tensors X and X + 1 on the channel dimension to construct an input with 2 channels.

Note that this will require a concatenation along the last dimension for TensorFlow due to the channels-last syntax.

    X = tf.concat([X, X + 1], 3)  # Concatenate along `dim=3` due to channels-last syntax
    X

    <tf.Tensor: shape=(1, 4, 4, 2), dtype=float32, numpy=
    array([[[[ 0.,  1.],
             [ 1.,  2.],
             [ 2.,  3.],
             [ 3.,  4.]],

            [[ 4.,  5.],
             [ 5.,  6.],
             [ 6.,  7.],
             [ 7.,  8.]],

            [[ 8.,  9.],
             [ 9., 10.],
             [10., 11.],
             [11., 12.]],

            [[12., 13.],
             [13., 14.],
             [14., 15.],
             [15., 16.]]]], dtype=float32)>
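The framework can also handle the padding itself. As a variant (illustrative, not in the original notebook; pool2d_same is a name of my choosing), passing padding='same' lets MaxPool2D pad automatically. Note that TensorFlow places the extra padding on the bottom and right rather than on the top and left as we did manually, so it selects different values:

    # 'same' padding: TF pads bottom/right, and padded cells never win a max.
    pool2d_same = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='same',
                                            strides=2)
    pool2d_same(X)  # shape (1, 2, 2, 1); values [[10., 11.], [14., 15.]],
                    # assuming TF's pad-after convention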
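To see the per-channel behavior directly, here is a quick check (an illustrative addition, not in the original notebook): pooling the two-channel tensor in one call gives the same result as pooling each channel on its own and restacking.

    pool = tf.keras.layers.MaxPool2D(pool_size=2)
    together = pool(X)  # both channels in one call
    separately = tf.concat([pool(X[..., :1]), pool(X[..., 1:])], 3)
    print(tf.reduce_all(together == separately).numpy())  # True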
    paddings = tf.constant([[0, 0], [1, 0], [1, 0], [0, 0]])
    X_padded = tf.pad(X, paddings, "CONSTANT")
    pool2d = tf.keras.layers.MaxPool2D(pool_size=[3, 3], padding='valid',
                                       strides=2)
    pool2d(X_padded)

    <tf.Tensor: shape=(1, 2, 2, 2), dtype=float32, numpy=
    array([[[[ 5.,  6.],
             [ 7.,  8.]],

            [[13., 14.],
             [15., 16.]]]], dtype=float32)>

As we can see, the number of output channels is still 2 after pooling.

Summary

- Taking the input elements in the pooling window, the maximum pooling operation assigns the maximum value as the output, and the average pooling operation assigns the average value as the output.
- One of the major benefits of a pooling layer is to alleviate the excessive sensitivity of the convolutional layer to location.
- We can specify the padding and stride for the pooling layer.
- Maximum pooling, combined with a stride larger than 1, can be used to reduce the spatial dimensions (e.g., width and height).
- The pooling layer's number of output channels is the same as the number of input channels.
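Finally, one more illustrative check (not in the original notebook) of the point made earlier that pooling layers contain no parameters: a built-in pooling layer reports zero trainable weights.

    # Pooling layers are deterministic: nothing here is learned.
    layer = tf.keras.layers.MaxPool2D(pool_size=[3, 3])
    _ = layer(tf.zeros((1, 4, 4, 2)))          # build the layer with a dummy input
    print(len(layer.trainable_weights))        # 0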