Name Prediction with RNN using PyTorch – Part I

What is PyTorch

According to the official documentation, PyTorch is a Python-based scientific computing package targeted at two sets of audiences:

  • a replacement for NumPy that uses the power of GPUs
  • a deep learning research platform that provides maximum flexibility and speed, built on a tape-based automatic differentiation system

It is primarily developed by Facebook’s AI Research group. PyTorch can be used from Python as well as C++, though the Python interface is more polished.

Note: This article assumes that you have a basic understanding of deep learning concepts.

Why PyTorch

PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.

Unlike other popular deep learning frameworks such as TensorFlow, which use static computation graphs, PyTorch builds its computation graph dynamically, which allows greater flexibility when building complex architectures. PyTorch also relies on core Python concepts such as classes, data structures, and loops, which makes the code familiar and intuitive to understand.

What is language modeling

Language modeling is a technique where, given a sequence of words, the model predicts the next word.

The Problem

Let’s take the problem of identifying the country of origin of a person based on their name.

We’ll use an RNN to approach the problem, but instead of a word-level RNN, we’ll use a character-level RNN. A character-level RNN reads a word as a series of characters, outputting a prediction and a “hidden state” at each step, as in a normal RNN architecture.

The previous hidden state is fed into each next step. Once all the letters have been fed into the network, we take the final hidden state as the output. This output is passed through a softmax function, which converts the scores into probabilities between 0 and 1 that sum to one.

Specifically, we’ll train on a few thousand surnames from different languages of origin, and predict which language a name is from based on its spelling.

Data Preparation

The data is available as text files, with one file per language (or country).

We need to make sure that all the characters are ASCII, so we’ll convert any Unicode characters into their ASCII equivalents.
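A minimal sketch of this conversion, using Python’s standard `unicodedata` module. The names `all_letters`, `n_letters`, and `unicode_to_ascii` are my choices for illustration, not fixed by the dataset:

```python
import string
import unicodedata

# The set of characters we allow; n_letters will be used later
# when one-hot encoding each character.
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

def unicode_to_ascii(s):
    # Decompose accented characters (e.g. 'é' -> 'e' + combining accent),
    # drop the combining marks ('Mn'), and keep only allowed characters.
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn' and c in all_letters
    )
```

For example, `unicode_to_ascii('Ślusàrski')` yields `'Slusarski'`.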

Next, we’ll create a list and a dictionary.

The list will store all the available categories; these are essentially the file names from the dataset. The dictionary will hold each category as a key and the list of all names under that category as its value.

There are 18 categories of names, stored in a list called all_categories. category_lines is a dictionary mapping each category to its list of names.
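One way to build these two structures is to glob the data directory and use each file name as the category label. This is a sketch, assuming a layout like `data/names/English.txt`; the helper name `load_data` is hypothetical:

```python
import glob
import os

def load_data(path_pattern):
    # path_pattern is a glob such as 'data/names/*.txt' -- one file per
    # category, where the file name (minus extension) is the category.
    all_categories = []   # list of category names
    category_lines = {}   # category -> list of names
    for filename in sorted(glob.glob(path_pattern)):
        category = os.path.splitext(os.path.basename(filename))[0]
        all_categories.append(category)
        with open(filename, encoding='utf-8') as f:
            names = [line.strip() for line in f if line.strip()]
        category_lines[category] = names
    return all_categories, category_lines
```

In practice you would also run each name through the Unicode-to-ASCII conversion while reading the files.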

Turn Names into Tensors

To process text in PyTorch we need to convert the names into numbers, i.e., tensors.

PyTorch expects data to be fed in as tensors. For our example we’ll use a batch size of 1, so each letter becomes a two-dimensional tensor of shape <1 x n_letters>. All the letters will be one-hot encoded.

We’ll write two helper functions that convert a letter to a tensor and a word to a tensor, respectively.
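The two helpers might look like this. The function names and the exact allowed-character set are my assumptions; a word becomes a <line_length x 1 x n_letters> tensor, i.e., a sequence of one-hot letter tensors with batch size 1:

```python
import string
import torch

all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

def letter_to_tensor(letter):
    # One-hot <1 x n_letters> tensor for a single character.
    tensor = torch.zeros(1, n_letters)
    tensor[0][all_letters.find(letter)] = 1
    return tensor

def line_to_tensor(line):
    # <line_length x 1 x n_letters>: one one-hot row per character.
    tensor = torch.zeros(len(line), 1, n_letters)
    for i, letter in enumerate(line):
        tensor[i][0][all_letters.find(letter)] = 1
    return tensor
```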

Create a Simple Recurrent Neural Network

As discussed above, we’ll use a vanilla RNN, which takes an input at every time step and calculates a hidden state. This hidden state is fed back into the network along with the next letter at the next time step.

The standard RNN update computes the new hidden state as a function of the current input and the previous hidden state, h_t = f(W_ih * x_t + W_hh * h_{t-1} + b), with the output derived from h_t. In our example, instead of keeping two separate weight matrices for the input and the hidden state, we’ll use a single weight matrix: we combine the input and hidden state into one vector. The hidden state is initialized to zero for every new word being trained, and the weight matrix keeps updating (learning) through gradient descent.

From this combined (input + hidden) vector we’ll compute both the output (i2o) and the new hidden state (i2h).
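A sketch of such a network as an `nn.Module`. The layer names `i2h` and `i2o` follow the text; the log-softmax on the output is an assumption that pairs naturally with the negative log likelihood loss used later:

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        # One linear layer each for the new hidden state (i2h) and the
        # output (i2o), both fed the concatenated (input + hidden) vector.
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.softmax(self.i2o(combined))
        return output, hidden

    def init_hidden(self):
        # The hidden state is reset to zeros for every new word.
        return torch.zeros(1, self.hidden_size)
```

One forward call processes a single letter; the returned hidden state is passed back in with the next letter.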

Training the Network

We can break down training the network into three steps:

  1. Get a random training example.
  2. Create a function to train on one word (calling the RNN class created earlier). This function will define the loss function and perform backpropagation. It will also store the loss in a list for visualization.
  3. Run the train function on a bunch of examples (e.g., 100,000).

Random Training Example

We’ll define a function that randomly picks one category from the list of categories, and then picks one random word from that category’s list of words in the dictionary.

As a quick check, we can use it to print random names from 10 random categories.
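A minimal sketch of the picker, assuming `all_categories` and `category_lines` were built as described in the data preparation step; the function name is my choice:

```python
import random

def random_training_example(all_categories, category_lines):
    # Pick a random category, then a random name from that category.
    category = random.choice(all_categories)
    line = random.choice(category_lines[category])
    return category, line
```

In the full pipeline this function would also return the corresponding category and line tensors built with the helpers above.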

Function to train a single sample

We need to define the loss function the network will use to calculate gradients. Negative log likelihood loss suits this example well, as it pairs naturally with a log-softmax output.

We then develop a train function; each training loop will:

  • Create input and target tensors
  • Create a zeroed initial hidden state
  • Read each letter in, keeping the hidden state for the next letter
  • Compare final output to target
  • Back-propagate
  • Return the output and loss
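The steps above can be sketched as follows. This is an illustrative implementation, not the definitive one: it repeats a compact version of the RNN class for self-containment, and the learning rate and the plain manual SGD update are assumptions:

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    # Same vanilla RNN as before: one combined (input + hidden) matrix
    # feeds both the output layer (i2o) and the next hidden state (i2h).
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        return self.softmax(self.i2o(combined)), self.i2h(combined)

criterion = nn.NLLLoss()  # pairs with the log-softmax output

def train(rnn, category_tensor, line_tensor, learning_rate=0.005):
    hidden = torch.zeros(1, rnn.hidden_size)  # zeroed initial hidden state
    rnn.zero_grad()
    # Feed the letters through one at a time, carrying the hidden state.
    for i in range(line_tensor.size(0)):
        output, hidden = rnn(line_tensor[i], hidden)
    # Compare the final output to the target and back-propagate.
    loss = criterion(output, category_tensor)
    loss.backward()
    # Plain SGD: nudge each parameter against its gradient.
    with torch.no_grad():
        for p in rnn.parameters():
            p -= learning_rate * p.grad
    return output, loss.item()
```

The caller collects the returned losses into a list for later visualization.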

Function to run the training on a bunch of examples

Finally, run the train function over thousands of different examples. On each iteration you can print the loss to the screen if needed.
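A generic driver for this loop might look like the sketch below. To keep it self-contained it takes the per-sample training step as a callable and draws examples from a pre-built list; the names `run_training`, `n_iters`, and `print_every` are my choices:

```python
import random

def run_training(train_step, examples, n_iters=100000, print_every=5000):
    # train_step(category_tensor, line_tensor) -> (output, loss),
    # e.g. a closure over the train function defined earlier.
    current_loss = 0.0
    all_losses = []  # average loss per window, kept for visualization
    for it in range(1, n_iters + 1):
        category_tensor, line_tensor = random.choice(examples)
        _, loss = train_step(category_tensor, line_tensor)
        current_loss += loss
        if it % print_every == 0:
            all_losses.append(current_loss / print_every)
            print(f'iter {it}  avg loss {current_loss / print_every:.4f}')
            current_loss = 0.0
    return all_losses
```

Plotting `all_losses` afterwards gives a quick view of whether the loss is trending downward.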

Conclusion

In this post, we’ve seen an approach to creating and training an RNN with PyTorch to classify names by language. In the next post we’ll focus more on the code and on evaluating the model.