Neural Networks / Artificial Neural Networks
A perceptron is a single-layer neural network; a multi-layer perceptron is called a neural network. Neural networks are networks of interconnected artificial neurons, hence the name artificial neural networks (ANNs). Their structure is heavily inspired by the brain’s network of neurons.
A neural network is generally used to create supervised machine learning models for classification, similar to a logistic regression model, and is useful in cases where logistic regression may not provide reasonable accuracy. Neural networks form the basis of many complex machine learning applications and algorithms.
Neural networks are also used in unsupervised learning for compressed representation and dimensionality reduction.
Extended Logistic Regression
Suppose Logistic Regression isn’t working for us, and we are thinking of combining two Logistic Regression models in some way to make a more powerful model.
Let us consider two logistic regression models m1 and m2, connected in series to form a new model.
We have training examples with classification labels. The first model m1 is fed the input of a training example; its output is an intermediate value, which is then fed into the next model m2. m2’s output should match the classification label of the training example.
We already know the inputs for m1 and the desired outputs for m2. The main question now is how to obtain the intermediate values (i.e., the output from m1 that is fed as input to m2) corresponding to the training data.
If we had the intermediate values, both models m1 and m2 could be trained independently. The trick here is to iterate on the parameters of both m1 and m2 in a systematic manner.
The key idea is to tweak the intermediate data fed as input to m2 when minimizing m2’s loss function. That is, from m2’s perspective, the intermediate data is treated as a set of parameters that can be modified, not as fixed inputs.
So when using the gradient descent method, we would have to compute gradients with respect to the intermediate data as well.
If we observe this algorithm carefully, we may discover a way to simultaneously train both m1 and m2 for faster convergence. If we consider the loss function for the combined model, it is a function of both m1’s and m2’s parameters.
So we can compute the gradient of the combined model’s loss function with respect to each of m1’s and m2’s parameters, and use gradient descent to simultaneously update both sets of parameters to minimize the loss.
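As a rough sketch of this idea, the snippet below chains two one-parameter logistic models in NumPy and updates both with gradient descent on a shared squared-error loss; the data, learning rate, and initial values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D training data with binary labels.
x = np.array([0.5, 1.5, 2.5, 3.5])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Parameters of the two chained logistic models.
w1, b1 = 0.1, 0.0   # m1: h = sigmoid(w1 * x + b1)
w2, b2 = 0.1, 0.0   # m2: p = sigmoid(w2 * h + b2)
lr = 0.5

for step in range(5000):
    # Forward pass: m1's output h is the intermediate value fed into m2.
    h = sigmoid(w1 * x + b1)
    p = sigmoid(w2 * h + b2)

    # Backward pass for the combined loss L = 0.5 * mean((p - y)^2):
    # the gradient flows through m2's parameters, then through the
    # intermediate data h, and finally into m1's parameters.
    dz2 = (p - y) * p * (1 - p)
    dw2, db2 = np.mean(dz2 * h), np.mean(dz2)
    dh = dz2 * w2                      # gradient w.r.t. the intermediate data
    dz1 = dh * h * (1 - h)
    dw1, db1 = np.mean(dz1 * x), np.mean(dz1)

    # Simultaneously update both models' parameters.
    w1, b1 = w1 - lr * dw1, b1 - lr * db1
    w2, b2 = w2 - lr * dw2, b2 - lr * db2
```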
What we just learnt is essentially an example of an artificial neural network (ANN) with a single hidden layer!
Visualizing Neural Network Equations
A basic summary of neural networks, term by term:
Neural Network
Neural networks are a class of machine learning algorithms used to model complex patterns in datasets using multiple hidden layers and non-linear activation functions. A neural network takes an input, passes it through multiple layers of hidden neurons (mini-functions with unique coefficients that must be learned), and outputs a prediction representing the combined output of all the neurons.
Neural networks are trained iteratively using optimization techniques like gradient descent. After each cycle of training, an error metric is calculated based on the difference between prediction and target. The derivatives of this error metric are calculated and propagated back through the network using a technique called backpropagation. Each neuron’s coefficients (weights) are then adjusted relative to how much they contributed to the total error. This process is repeated iteratively until the network error drops below an acceptable threshold.
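One common way to express this training cycle is with a framework such as PyTorch; the sketch below is illustrative, and the network shape, data, and stopping threshold are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# A tiny one-hidden-layer classifier; sizes and data are illustrative.
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                                 # error metric
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.rand(32, 2)                                  # hypothetical inputs
y = (X.sum(dim=1) > 1.0).float().unsqueeze(1)          # hypothetical targets

for epoch in range(1000):
    pred = model(X)              # forward pass: prediction
    loss = loss_fn(pred, y)      # difference between prediction and target
    optimizer.zero_grad()
    loss.backward()              # backpropagation: gradients of the error
    optimizer.step()             # adjust weights by their contribution
    if loss.item() < 0.05:       # stop once error is acceptably low
        break
```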
Neuron
A neuron takes a group of weighted inputs, applies an activation function, and returns an output.
Inputs to a neuron can either be features from a training set or outputs from a previous layer’s neurons. Weights are applied to the inputs as they travel along synapses to reach the neuron. The neuron then applies an activation function to the sum of the weighted inputs from all incoming synapses and passes the result on to all the neurons in the next layer.
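A minimal sketch of a single neuron, assuming a sigmoid activation and made-up weights (the bias term, covered below, is included in the sum):

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Sum of weighted inputs plus bias, passed through a sigmoid activation.
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values: three inputs arriving over three weighted synapses.
output = neuron(np.array([0.5, 0.3, 0.2]), np.array([0.4, -0.6, 0.9]), bias=0.1)
print(output)  # a single activation in (0, 1), passed on to the next layer
```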
Synapse
Synapses are like roads in a neural network. They connect inputs to neurons, neurons to neurons, and neurons to outputs. In order to get from one neuron to another, you have to travel along the synapse paying the “toll” (weight) along the way. Each connection between two neurons has a unique synapse with a unique weight attached to it. When we talk about updating weights in a network, we’re really talking about adjusting the weights on these synapses.
Weights
A weight represents the strength of the connection between units.
Bias
Bias terms are additional constants attached to neurons and added to the weighted input before the activation function is applied. Bias terms help models represent patterns that do not necessarily pass through the origin. For example, if all your features were 0, would your output also be zero? Is it possible there is some base value upon which your features have an effect? Bias terms typically accompany weights and must also be learned by your model.
Input Layer
Holds the data your model will train on. Each neuron in the input layer represents a unique attribute in your dataset (e.g. height, hair color, etc.).
Hidden Layer
Sits between the input and output layers and applies an activation function before passing on the results. There are often multiple hidden layers in a network. In traditional networks, hidden layers are typically fully-connected layers: each neuron receives input from all the previous layer’s neurons and sends its output to every neuron in the next layer. This contrasts with convolutional layers, where each neuron sends its output to only some of the neurons in the next layer.
Output Layer
The final layer in a network. It receives input from the previous hidden layer, optionally applies an activation function, and returns an output representing your model’s prediction.
Weighted Input
A neuron’s input equals the sum of weighted outputs from all neurons in the previous layer. Each input is multiplied by the weight associated with the synapse connecting the input to the current neuron. If there are 3 inputs or neurons in the previous layer, each neuron in the current layer will have 3 distinct weights — one for each synapse.
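In vectorized form, the weighted inputs of a whole layer reduce to a matrix-vector product; the shapes and values below are hypothetical:

```python
import numpy as np

# 3 neurons in the previous layer feeding 2 neurons in the current layer:
# each current neuron holds 3 distinct weights, one per incoming synapse.
prev_outputs = np.array([0.2, 0.7, 0.1])     # outputs of the previous layer
W = np.array([[0.5, -0.3, 0.8],              # weights into current neuron 1
              [0.1,  0.9, -0.4]])            # weights into current neuron 2
b = np.array([0.05, -0.02])                  # one bias per current neuron

z = W @ prev_outputs + b                     # weighted input of each neuron
print(z)                                     # two weighted inputs
```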
Activation Functions
Activation functions live inside neural network layers and modify the data they receive before passing it to the next layer. They give neural networks their power: by transforming inputs with non-linear functions, networks can model highly complex relationships between features. Popular activation functions include ReLU and sigmoid (sketched in code after the list of properties below).
Activation functions typically have the following properties:
Non-linear — In linear regression we’re limited to a prediction equation that looks like a straight line. This is nice for simple datasets with a one-to-one relationship between inputs and outputs, but what if the patterns in our dataset were non-linear (e.g. x², sin, log)? To model these relationships we need a non-linear prediction equation. Activation functions provide this non-linearity.
Continuously differentiable — To improve our model with gradient descent, we need our output to have a nice slope so we can compute error derivatives with respect to weights. If our neuron instead outputted 0 or 1 (perceptron), we wouldn’t know in which direction to update our weights to reduce our error.
Fixed Range — Activation functions typically squash the input data into a narrow range that makes training the model more stable and efficient.
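As a concrete illustration of the popular choices mentioned above, here are minimal NumPy versions of ReLU and sigmoid; note how sigmoid squashes inputs into the fixed range (0, 1):

```python
import numpy as np

def relu(z):
    # Passes positive inputs through unchanged; outputs 0 otherwise.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real input into the fixed range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # values strictly between 0 and 1
```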
Loss Functions
A loss function, or cost function, is a wrapper around our model’s predict function that tells us “how good” the model is at making predictions for a given set of parameters. The loss function has its own curve and its own derivatives: the slope of this curve tells us how to change our parameters to make the model more accurate. We use the model to make predictions and the cost function to update our parameters. Loss functions can take a variety of forms; popular choices include MSE (L2) and cross-entropy loss.
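As a sketch, assuming NumPy arrays of true labels and predicted probabilities, the two popular losses mentioned above could be written as:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error (L2 loss): average squared difference.
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy; clipping guards against log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])   # made-up labels
y_pred = np.array([0.9, 0.2, 0.7])   # made-up predicted probabilities
print(mse(y_true, y_pred), cross_entropy(y_true, y_pred))
```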
Steps in Designing a Neural Network
Neural networks have many possible topologies. Our neural network design depends on the following factors:
• Number of layers.
• Number of nodes in each layer, which could be a different value for each layer.
• Level of connectivity between nodes of each adjacent pair of layers (fully connected means every node in one layer is directly connected to every node in the next layer).
• The activation function used in each node (nodes in the input layer have no activation function). An activation function decides whether a neuron should be activated by computing the weighted sum of its inputs and adding a bias to it. Choices include, but are not limited to, ReLU, sigmoid, and tanh; softmax is a common choice for the output layer.
• Whether some of the weights are shared between multiple connections (e.g., CNNs share a lot of weights).
• Any feedback edges within the network. Recurrent Neural Networks (RNNs) make use of such feedback.
• The loss function (also known as the cost function), which measures the prediction error that training seeks to reduce. A common choice is the negative log-likelihood.
Several of the above characteristics define the size, complexity, and overall architecture of the network. Such parameters are called hyperparameters.
One must try out different combinations of these hyperparameters and decide which set is well suited for a model based on the training data, available compute capabilities, and other factors.
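As a concrete (and entirely hypothetical) illustration, such a set of hyperparameters might be collected into a configuration like this:

```python
# Hypothetical hyperparameter set for a small fully-connected classifier.
config = {
    "layer_sizes": [784, 128, 64, 10],           # input, two hidden, output
    "activations": ["relu", "relu", "softmax"],  # one per non-input layer
    "fully_connected": True,    # every node connected to the next layer
    "weight_sharing": False,    # no shared weights (unlike a CNN)
    "recurrent": False,         # no feedback edges (unlike an RNN)
    "loss": "cross_entropy",    # a negative log-likelihood style loss
    "learning_rate": 0.01,
    "epochs": 20,
}
```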