Summary: The `neuralnet` package requires an all-numeric input data.frame or matrix. You control the hidden layers with `hidden=`, which can be a vector for multiple hidden layers. To predict with your neural network, use the `compute` function, since there is no predict function.

**Tutorial Time**: 40 minutes

**Libraries Needed**: neuralnet

This tutorial does not spend much time explaining the concepts behind neural networks. See the method page on the basics of neural networks for more information before getting into this tutorial.

**Data Needed**: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing (bank.Zip)

    neuralnet(formula, data, hidden = 1, threshold = 0.01,
              stepmax = 1e+05, rep = 1, startweights = NULL,
              learningrate.limit = NULL,
              learningrate.factor = list(minus = 0.5, plus = 1.2),
              learningrate = NULL, lifesign = "none",
              lifesign.step = 1000, algorithm = "rprop+",
              err.fct = "sse", act.fct = "logistic",
              linear.output = TRUE, exclude = NULL,
              constant.weights = NULL, likelihood = FALSE)

## Data Understanding

It’s important to note that **the neuralnet package requires numeric inputs** and does not play nicely with factor variables. As a result, we need to investigate which variables need to be transformed.

We can get an overall structure of our data by using `str()`.

    str(bnk)
    'data.frame': 4521 obs. of 17 variables:
     $ age      : int 30 33 35 30 59 35 36 39 41 43 ...
     $ job      : Factor w/ 12 levels "admin.","blue-collar",..: 11 8 5 5 2 5 7 10 3 8 ...
     $ marital  : Factor w/ 3 levels "divorced","married",..: 2 2 3 2 2 3 2 2 2 2 ...
     $ education: Factor w/ 4 levels "secondary","primary",..: 2 1 3 3 1 3 3 1 3 2 ...
     $ default  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
     $ balance  : int 1787 4789 1350 1476 0 747 307 147 221 -88 ...
     $ housing  : Factor w/ 2 levels "no","yes": 1 2 2 2 2 1 2 2 2 2 ...
     $ loan     : Factor w/ 2 levels "no","yes": 1 2 1 2 1 1 1 1 1 2 ...
     $ month    : Factor w/ 12 levels "apr","aug","dec",..: 11 9 1 7 9 4 9 9 9 1 ...
     $ campaign : int 1 1 1 4 1 2 1 2 2 1 ...
     $ poutcome : Factor w/ 4 levels "failure","other",..: 4 1 1 4 4 1 2 4 4 1 ...
     $ y        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

Because neural networks use bounded activation functions (the default logistic function outputs values between 0 and 1, and tanh between -1 and +1), it’s important to scale your variables down. Otherwise, the neural network will have to spend training iterations doing that scaling for you.

    #Min Max Normalization
    bnk$balance  <- (bnk$balance-min(bnk$balance)) / (max(bnk$balance)-min(bnk$balance))
    bnk$age      <- (bnk$age-min(bnk$age)) / (max(bnk$age)-min(bnk$age))
    bnk$previous <- (bnk$previous-min(bnk$previous)) / (max(bnk$previous)-min(bnk$previous))
    bnk$campaign <- (bnk$campaign-min(bnk$campaign)) / (max(bnk$campaign)-min(bnk$campaign))
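Since the same min-max formula is repeated for every column, it can be wrapped in a small helper. This is only a sketch: `min_max` and the `toy` data frame are illustrative names with made-up values, not part of the bank data.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy stand-in for the bank data (made-up values).
toy <- data.frame(age = c(30, 33, 59), balance = c(1787, -88, 0))

# Apply the helper to every numeric column we want to scale.
num_cols <- c("age", "balance")
toy[num_cols] <- lapply(toy[num_cols], min_max)
toy
```

Note that a constant column would divide by zero here, so it is worth checking for those before scaling.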

### neuralnet and the model.matrix function

In order to represent factor variables, we need to convert them into dummy variables. Dummy coding takes a factor with N distinct values and converts it into N-1 variables. We use N-1 because the remaining value (the base case) is represented by all dummy values set to zero.

For any given row either one or none of the dummy variables will be active with a one (1) or inactive with zero (0).

In our bank data set, the variable education has four distinct values with “primary” being the base case (i.e. the first level).

    table(bnk$education)
    #  primary secondary  tertiary   unknown
    #      678      2306      1350       187
    levels(bnk$education)
    #[1] "primary"   "secondary" "tertiary"  "unknown"

In order to make this factor variable useful for the neuralnet package, we need to use the `model.matrix()` function.

    head(model.matrix(~education, data=bnk))
    #  (Intercept) educationsecondary educationtertiary educationunknown
    #1           1                  0                 0                0
    #2           1                  1                 0                0
    #3           1                  0                 1                0

The `model.matrix` function splits the education variable into **all possible values except the base case**. It also adds an intercept column, which we will eventually drop. If you want a different value to be the base case (so that the current base case gets its own column), you need to `relevel()` your data.

    bnk$education <- relevel(bnk$education, ref = "secondary")
    head(model.matrix(~education, data=bnk))
    #  (Intercept) educationprimary educationtertiary educationunknown
    #1           1                1                 0                0
    #2           1                0                 0                0
    #3           1                0                 1                0
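If you don't have the bank data loaded, the same behaviour can be reproduced on a tiny made-up factor. The `toy` data frame and `edu` column below are illustrative only.

```r
# Toy factor (made-up values) to show how the base level is dropped.
toy <- data.frame(
  edu = factor(c("primary", "secondary", "tertiary", "secondary"))
)

# Default: the first level ("primary") is the base case and gets no column.
mm_default <- model.matrix(~edu, data = toy)
colnames(mm_default)

# After relevel(), "secondary" becomes the base case instead.
toy$edu <- relevel(toy$edu, ref = "secondary")
mm_relevel <- model.matrix(~edu, data = toy)
colnames(mm_relevel)
```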

Once we have decided on all of our numeric and factor variables, we can call the function one last time.

    bnk_matrix <- model.matrix(~age+job+marital+education
                               +default+balance+housing
                               +loan+poutcome+campaign
                               +previous+y, data=bnk)

We now have a matrix with 28 columns (27 excluding the intercept). Using this new variable, we can start building our neural networks.

Before we get to model building, we need to make sure all of the column names are acceptable model inputs. It looks like two columns have special characters and we need to fix that before entering it into a model.

    colnames(bnk_matrix)
    # [1] "(Intercept)"     "age"              "jobblue-collar"
    # [4] "jobentrepreneur" "jobhousemaid"     "jobmanagement"
    # [7] "jobretired"      "jobself-employed" "jobservices"
    colnames(bnk_matrix)[3] <- "jobbluecollar"
    colnames(bnk_matrix)[8] <- "jobselfemployed"
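Renaming by position works, but it breaks if the column order changes. A pattern-based cleanup is one alternative. The sketch below runs on a few of the names from the output above rather than on the real matrix, and the choice of which characters to keep is my assumption.

```r
# A few of the column names from the output above.
nms <- c("(Intercept)", "age", "jobblue-collar", "jobself-employed")

# Drop every character that is not a letter, digit, dot, or underscore.
clean <- gsub("[^A-Za-z0-9._]", "", nms)
clean
```

On the real data this would be `colnames(bnk_matrix) <- gsub("[^A-Za-z0-9._]", "", colnames(bnk_matrix))`. It also strips the parentheses from `(Intercept)`, which is harmless here because that column is dropped anyway.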

Now that we have all of the column names cleaned up, we need to get it into a formula. We have to combine the column names (separated by a plus symbol) and then tack on the response variable.

    col_list <- paste(c(colnames(bnk_matrix[,-c(1,28)])),collapse="+")
    col_list <- paste(c("yyes~",col_list),collapse="")
    f <- formula(col_list)
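Base R also ships `reformulate()`, which builds the same kind of formula without manual string pasting. This is a sketch with a shortened, hypothetical predictor list standing in for `colnames(bnk_matrix[,-c(1,28)])`.

```r
# Hypothetical subset of predictor names (stand-in for the real column names).
predictors <- c("age", "jobbluecollar", "balance")

# reformulate() joins the terms with "+" and attaches the response.
f_alt <- reformulate(termlabels = predictors, response = "yyes")
f_alt
```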

Finally, we’re ready to use this formula in our models.

## Modeling

With the complexity of neural networks, there are lots of options to explore in the neuralnet package. We’ll start with the default parameters and then explore a few different options.

### Default Neural Network With One Hidden Node (RPROP+)

    library(neuralnet)
    set.seed(7896129)
    nmodel <- neuralnet(f, data=bnk_matrix, hidden=1,
                        threshold = 0.01,
                        learningrate.limit = NULL,
                        learningrate.factor = list(minus = 0.5, plus = 1.2),
                        algorithm = "rprop+")

- By default, you’re using the Resilient Backpropagation algorithm (RPROP+).
- This uses a `learningrate.limit` and `learningrate.factor`.
  - The limit sets the lower and upper bounds that the learning rate can reach.
  - The rate factor is the multiplier that the rate will change by if…
    - `minus`: the model has jumped over the local minimum.
    - `plus`: the model is going in the right direction.

RPROP is a fast algorithm and doesn’t require as much tuning as classic backpropagation since you’re not setting a static learning rate. Instead, the step size is adapted during training using the `learningrate.factor`.

But as usual, you can accept the default parameters and your code will work!
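To make the `minus` / `plus` idea concrete, here is a stripped-down, one-parameter sketch of a sign-based update in the spirit of RPROP. It is not the package's implementation (it has no weight backtracking, and `rprop_1d` and its arguments are hypothetical names); it only shows the step growing while the gradient keeps its sign and shrinking after an overshoot.

```r
# Simplified Rprop-style minimiser for a single parameter (illustrative only).
rprop_1d <- function(grad, w, step = 0.1, minus = 0.5, plus = 1.2,
                     step_min = 1e-6, step_max = 50, iters = 100) {
  prev_g <- 0
  for (i in seq_len(iters)) {
    g <- grad(w)
    if (g * prev_g > 0) {
      # Same direction as last time: speed up.
      step <- min(step * plus, step_max)
    } else if (g * prev_g < 0) {
      # Sign flipped, so we jumped over the minimum: slow down.
      step <- max(step * minus, step_min)
    }
    # Only the sign of the gradient is used, not its magnitude.
    w <- w - sign(g) * step
    prev_g <- g
  }
  w
}

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_hat <- rprop_1d(function(w) 2 * (w - 3), w = 0)
w_hat
```

Because only the sign of the gradient matters, there is no single learning rate to tune, which is the practical appeal described above.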

### Changing Hidden Nodes and Backprop

A neural network with a single hidden node isn’t anything better than a linear combination really. In order to change the number of hidden nodes, we simply use the `hidden` parameter.

    ##### Varying RPROP Hidden Nodes and Repetitions
    set.seed(7896129)
    nn5 <- neuralnet(f, data=bnk_matrix, hidden=5)

Easy.

Now what if we wanted to change the algorithm? That requires adjusting some of the parameters.

    #This results in an error!!
    nn_backprop <- neuralnet(f, data=bnk_matrix,
                             algorithm = "backprop")
    #Error: 'learningrate' must be a numeric value, if the
    #backpropagation algorithm is used

    #This works (but likely won't converge)!
    nn_backprop <- neuralnet(f, data=bnk_matrix,
                             algorithm = "backprop",
                             learningrate = 0.0001)

The important parameter for backprop is the `learningrate`, which is called “alpha” in the classic description of backprop.

An important note is that you need to set the learningrate small enough or you will run into this error:

    Error in if (reached.threshold < min.reached.threshold) { : missing value where TRUE/FALSE needed

The best solution I have found is to just keep the learningrate very small.
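One plausible reading of that error (my assumption, not something confirmed by the package documentation) is that a large learning rate makes the weights blow up until they become `NaN`, and an `if` on a `NaN` comparison fails with exactly that message. The toy gradient descent below, with the hypothetical helper `gd`, reproduces the effect outside of neuralnet.

```r
# Plain gradient descent on f(w) = w^2 (gradient 2 * w); illustrative only.
gd <- function(lr, steps = 2000, w = 1) {
  for (i in seq_len(steps)) w <- w - lr * 2 * w
  w
}

w_small <- gd(lr = 0.1)   # shrinks towards the minimum at 0
w_large <- gd(lr = 1.5)   # overshoots further every step, ends up NaN

# Comparing NaN inside if() raises the same kind of error message.
msg <- tryCatch(if (w_large < 1) "ok",
                error = function(e) conditionMessage(e))
msg
```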

### Multiple Hidden Layers

Deep Learning is partially about having multiple hidden layers in a neural network. The neuralnet package allows you to change the `hidden` parameter to a vector.

    #RPROP Multiple layers
    set.seed(1973549813)
    nn_rprop_multi <- neuralnet(f, data=bnk_matrix,
                                algorithm = "rprop+",
                                hidden=c(10,3),
                                threshold=0.1,
                                stepmax = 1e+06)

If your model converges or you have some parallel processing abilities, it's worth experimenting with multi-level neural networks.

### Some Advanced Features

There are a handful of other parameters that are worth looking at.

- `threshold`: By default, neuralnet keeps training until the partial derivatives of the error function fall below 0.01, at which point it stops.
- `stepmax` will control how long your neural network trains. By default, it uses 100,000 iterations.
- `startweights` is a vector of weights you want to start from. You could use this as a way of using an existing neural network and updating the weights.
- `lifesign` and `lifesign.step` provide an update for you as you sit and wait for your model to finish.

The "full" lifesign looks like this...

    hidden: 10    thresh: 0.1    rep: 1/1    steps:    1000  min thresh: 0.5792730128
                                                       2000  min thresh: 0.4859884745
                                                       3000  min thresh: 0.4276816262
                                                       4000  min thresh: 0.4276816262
                                                       5000  min thresh: 0.4267513944
    ...

Another parameter that I wanted to explain in a little more detail is the `rep` parameter. The rep allows you to repeat your training with different starting weights (assuming you haven't defined them already).

By defining rep=5, you will get the results from 5 different neural networks.

    #Creating a neural network stopped at threshold = 0.5
    nn_rprop_rep5 <- neuralnet(f, data=bnk_matrix,
                               algorithm = "rprop+",
                               hidden=c(15),
                               threshold=0.5,
                               rep=5,
                               stepmax = 1e+06)
    nn_rprop_rep5

    5 repetitions were calculated.

            Error Reached Threshold Steps
    4 166.1777017      0.4936484239  5613
    2 166.5174888      0.4444659596  6887
    5 169.9497032      0.4721955880  4167
    3 173.3583823      0.4495817644  3043
    1 175.6160528      0.4841837841  3205

If you go to visualize your neural network that used repetitions, be warned that, by default, it will print every repetition that converged. You can control the plotting by using the rep parameter. `plot(nn_rprop_rep5, rep = 4)`

### How to Predict With neuralnet (use compute)

For some reason, the authors of this package decided to ignore the standard predict function and use `compute` instead.

    output <- compute(nmodel, bnk_matrix[,-c(1,28)], rep=1)
    summary(output)
    #           Length Class  Mode
    #neurons       2   -none- list
    #net.result 4521   -none- numeric

Important notes on the `compute` function.

- Do not pass any extra data when predicting. Otherwise you'll receive the error `Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments`.
  - Make sure you set up your train and test data in a similar way.
  - In this example, I had to remove the first and 28th column to make it match the training data.
- If you used multiple repetitions (`rep=`) when training, you can specify which result to use. Otherwise it defaults to the first.
- The output of compute produces...
  - `neurons`: A list containing the neurons' output for each layer of the network.
  - `net.result`: The predictions for each test input. This is what you really want.
    - If your neural network has multiple outputs, you'll receive a matrix with a column for each output node.
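Once you have `net.result`, you usually want class labels and a quick accuracy check. The sketch below uses made-up numbers in place of `output$net.result` and the `yyes` column, and the 0.5 cutoff is an assumption you may want to tune.

```r
# Made-up predictions and labels standing in for output$net.result
# and the yyes column of bnk_matrix.
net_result <- c(0.05, 0.80, 0.40, 0.65)
actual     <- c(0, 1, 1, 1)

# Turn the continuous outputs into 0/1 predictions with a 0.5 cutoff.
predicted <- as.integer(net_result > 0.5)

# Cross-tabulate and compute the share of correct predictions.
confusion <- table(actual = actual, predicted = predicted)
accuracy  <- mean(predicted == actual)
confusion
accuracy
```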

### Training a Multi-Class Neural Network

In all honesty, I had to google this and I saw this StackOverflow post and I wanted to expand on it slightly.

If you want to predict multiple classes with one neural network, you simply have to define your formula and create dummy variables for each class. Here is my quick and dirty solution.

    data(iris)
    #Add a "fake" class to allow for all factors
    levels(iris$Species) <- c(levels(iris$Species),"fake")
    #Relevel to make the fake class the factor
    iris$Species <- relevel(iris$Species,ref = "fake")
    #Create dummy variables and remove the intercept
    iris_multinom <- model.matrix(~Species+Sepal.Length+
                                  Petal.Length+Petal.Width,
                                  data=iris)[,-1]
    colnames(iris_multinom)[1:3] <- c("setosa","versicolor","virginica")
    nn_multi <- neuralnet(setosa+versicolor+virginica~
                          Sepal.Length+Petal.Length+Petal.Width,
                          data=iris_multinom,hidden=5,lifesign = "minimum",
                          stepmax = 1e+05,rep=10)
    res <- compute(nn_multi, iris_multinom[,-c(1:3)])
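An alternative to the "fake" level trick is to drop the intercept in the formula with `0 +`, which makes `model.matrix` emit one column per level of the first factor. The sketch below also shows one way to turn a multi-column `net.result` into class labels. The `scores` matrix is made up and stands in for `res$net.result`; `iris2`, `dummies`, and `pred_class` are illustrative names.

```r
# Work on a fresh copy so the releveled iris above is not involved.
iris2 <- datasets::iris

# "0 +" removes the intercept, so every Species level gets its own column.
dummies <- model.matrix(~ 0 + Species + Sepal.Length +
                          Petal.Length + Petal.Width, data = iris2)
colnames(dummies)[1:3] <- levels(iris2$Species)
head(dummies, 3)

# Made-up scores standing in for res$net.result (one column per class).
scores <- rbind(c(0.9, 0.1, 0.0),
                c(0.2, 0.3, 0.7))

# Pick the class whose output node has the highest score in each row.
pred_class <- levels(iris2$Species)[max.col(scores, ties.method = "first")]
pred_class
```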

## Interpreting neuralnet Output

Lastly, there are some attributes you might want to keep around after you have built a neural network model.

- `model$weights` is a list that contains a nested list of matrices where each level is one layer. The list contains a nested list for every repetition.
- `model$result.matrix` is a collapsed version of the weight results with names like "variablex.to.1layhid1". There is a column for every repetition.

### That's It!

You now know just about everything on using the neuralnet package in R. There are lots of different parameters to mess around with and you can generate quite a few complicated neural network layouts with a few simple commands.