These slides focus on neural network layers.
1. Hello and welcome to this lecture on neural network layers.
2. The first layer is always the input layer. It takes the data from the independent variables, typically in batches. TensorFlow can only process data in numerical form. This means that binned or categorical data first needs to be transformed into numbers. You can do this manually, of course, but there are also preprocessing functions that take over this task for you, so that you can feed binned or categorical data directly.
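As a minimal sketch of this idea in TensorFlow, the Keras StringLookup preprocessing layer can encode a categorical feature directly inside the model; the feature name "color" and its vocabulary are made-up examples:

```python
import tensorflow as tf

# Hypothetical categorical feature "color" with a made-up vocabulary.
categorical_input = tf.keras.Input(shape=(1,), dtype=tf.string, name="color")

# StringLookup maps raw category strings to integer indices;
# output_mode="one_hot" turns them directly into numerical vectors.
encoded = tf.keras.layers.StringLookup(
    vocabulary=["red", "green", "blue"], output_mode="one_hot"
)(categorical_input)

model = tf.keras.Model(inputs=categorical_input, outputs=encoded)

# Data is fed in batches; here a batch of two samples.
print(model(tf.constant([["red"], ["blue"]])))
```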
3. The simplest layer type is the dense layer, also called a fully-connected layer: each input node is connected to each output node, and usually a non-linear activation function is applied afterwards.
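A minimal sketch of a dense layer in Keras; the layer sizes (8 inputs, 4 outputs) and the ReLU activation are illustrative choices:

```python
import tensorflow as tf

# Fully-connected layer: every one of the 8 input nodes is connected
# to each of the 4 output nodes, with a non-linear activation (ReLU).
inputs = tf.keras.Input(shape=(8,))
outputs = tf.keras.layers.Dense(4, activation="relu")(inputs)
model = tf.keras.Model(inputs, outputs)

model.summary()  # 8 * 4 weights + 4 biases = 36 parameters
```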
4. A convolutional layer consists of filters. Each filter processes a subset of the input data at a time and sweeps across the input until all nodes of the input layer have been used. In the graph on the right side, you can see that the first node of the convolutional layer uses the first three input points; for the next node, the window shifts and the next three points are used. With many more points, you would sweep through all points of the input layer to produce all nodes of the convolutional layer. Typically this layer is applied several times in convolutional neural networks and is followed by a pooling layer. Both types will be discussed in more depth in the lecture on convolutional neural networks.
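A minimal sketch of this layout for one-dimensional data, mirroring the slide: each filter looks at three neighbouring points at a time (kernel_size=3); the input length, filter count, and pooling size are illustrative:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 1))  # 32 input points, 1 channel

# Each of the 16 filters sweeps a window of 3 points across the input.
conv = tf.keras.layers.Conv1D(filters=16, kernel_size=3, activation="relu")(inputs)

# The typical follow-up: a pooling layer that downsamples the result.
pooled = tf.keras.layers.MaxPooling1D(pool_size=2)(conv)

model = tf.keras.Model(inputs, pooled)
model.summary()  # conv output has 30 positions (32 - 3 + 1); pooling halves it to 15
```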
5. Other layer types will only be covered briefly. The network layout that you can see on the right side represents both types that I will explain on this slide.
a. Recurrent neural networks make use of recurrent cells. These are special because they receive their own output as an input, but with a delay. Why does this make sense? It makes sense whenever the meaning of the current data depends on something that happened some time ago. A good example is words. Take the word pool: it can be an area filled with water, like a swimming pool, or it could mean the game of pool billiards. So you need the context the word is used in to reach the right conclusion.
b. A derived type is long short-term memory (LSTM). These work similarly, but use a memory cell. In the context of natural language processing this means that some previous words are kept in mind. LSTMs are therefore used for temporal sequences, and it is not surprising that this technique is applied to text and speech analysis. There are many more layer types and architectures; a minimal sketch of both recurrent variants follows below.
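The sketch below shows a small recurrent text model; the vocabulary size, sequence length, and unit counts are all made-up assumptions for illustration:

```python
import tensorflow as tf

# A small recurrent model for text, assuming a vocabulary of 10,000 tokens
# and input sequences of 20 words (both illustrative).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=32),
    # SimpleRNN feeds its own (delayed) output back in at every time step;
    # swap in tf.keras.layers.LSTM(32) to get the memory-cell variant.
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.summary()
```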
6. The final layer is always the output layer. Here you get one target variable or multiple target variables. In the table you can see some best practices for the number of nodes and the activation function. If you have a regression problem, you use one output node and a linear activation function. For a multi-target regression problem, you have N output nodes, also with a linear activation function. For binary classification you have one output node and typically a sigmoid activation function. And finally, for multi-class classification you have N output nodes, where N is the number of classes, usually with a softmax activation function.
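As a minimal sketch, the four recipes from the table translate into these Keras output layers (layer definitions only; the node count N=5 is an illustrative choice, and the rest of the model is omitted):

```python
import tensorflow as tf

regression_out = tf.keras.layers.Dense(1, activation="linear")        # regression
multi_regression_out = tf.keras.layers.Dense(5, activation="linear")  # N=5 targets
binary_out = tf.keras.layers.Dense(1, activation="sigmoid")           # binary classification
multiclass_out = tf.keras.layers.Dense(5, activation="softmax")       # N=5 classes
```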
That’s it for this lecture on deep learning layers. Thank you very much for watching and see you in the next one.