My experiments with an NLP Question Answering system on Google Cloud

After many failed attempts to train it on my laptop, I decided to train a Question Answering model somewhere else. I have just wrapped up the experiment after running the model for 7 days on Google Cloud. Here’s a summary.

What is a Question Answering model?

Question Answering is a Natural Language Processing task in which a system answers a question based on a given context.

Q&A models can be broadly classified as open domain or closed domain. An open-domain model has access to a knowledge repository, which it taps when answering an incoming query. A closed-domain model, on the other hand, doesn’t rely on pre-existing knowledge; it requires a context to answer a query. It expects a paragraph of text as context along with a question, and it tries to answer the question based on that context. Some models can also answer “No” when the context contains no answer.

The base model used here is a closed-domain one, and it can answer only factoid questions that begin with “who”, “where” and “when”. It can’t answer non-factoid questions such as “why” or “how”, or those that involve mathematical calculations, ranking, sorting, etc.


SQuAD – The Stanford Question Answering Dataset is a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage; alternatively, the question might be unanswerable.
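
To make the "answer is a span of the context" idea concrete, here is a minimal sketch of what one such record could look like. The class name, field names, and the toy sentence are all made up for illustration; this is not the actual SQuAD file schema or the code used in this experiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAExample:
    """Hypothetical container for one context/question pair."""
    context: str
    question: str
    answer_start: Optional[int]  # character offset into context; None if unanswerable
    answer_text: Optional[str]   # the span itself; None if unanswerable


def extract_answer(example: QAExample) -> str:
    """Slice the answer span out of the context, or return "N/A" if there is none."""
    if example.answer_start is None or example.answer_text is None:
        return "N/A"
    end = example.answer_start + len(example.answer_text)
    return example.context[example.answer_start:end]


# Toy context, invented for this sketch.
context = "Ada moved to Paris in 1999."

answerable = QAExample(
    context=context,
    question="Where did Ada move?",
    answer_start=context.index("Paris"),
    answer_text="Paris",
)
unanswerable = QAExample(
    context=context,
    question="Who did Ada meet?",
    answer_start=None,
    answer_text=None,
)

print(extract_answer(answerable))
print(extract_answer(unanswerable))
```

The point is only that an answer is never generated freely: it is either a slice of the context or an explicit "no answer".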

Compute Engine Summary on Google Cloud

Cost : Rs 5,500 (~ $80)

Training Duration : 7 days (~168 Hrs)

Configuration : n1-highmem-8 (8 vCPUs, 52 GB memory)
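
For a rough feel of the running cost, the totals above can be turned into an average hourly rate. This is back-of-the-envelope arithmetic on the approximate figures quoted here, not Google Cloud's actual price list.

```python
# Approximate totals from the summary above.
total_cost_inr = 5500
total_cost_usd = 80
hours = 7 * 24  # 7 days of continuous training

cost_per_hour_inr = total_cost_inr / hours
cost_per_hour_usd = total_cost_usd / hours

print(f"Average cost: Rs {cost_per_hour_inr:.2f} per hour")
print(f"Average cost: $ {cost_per_hour_usd:.2f} per hour")
```

That works out to roughly Rs 33 (about half a dollar) per hour, averaged over the whole run.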

Model Performance

Training Dataset : 120k examples

Examples Trained : 2.5 Million

Epochs : 20

Dev NLL : 3.25

F1 : 59.01

EM : 55.74
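
For readers unfamiliar with EM and F1, here is a minimal sketch of how these two scores are commonly computed for a single prediction in SQuAD-style evaluation. It is modelled on the usual normalisation (lowercase, strip punctuation and articles), but the function names are my own and it is an assumption that the evaluation used in this run behaves the same way.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalised strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the predicted and the true answer."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # An empty string stands for "no answer": full credit only if both are empty.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))
print(f1_score("in Paris France", "Paris"))
```

EM is all-or-nothing, while F1 gives partial credit when the predicted span overlaps the true one. The dataset-level numbers are averages of these per-question scores, expressed as percentages.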

The loss (negative log-likelihood, NLL) on the training set decreased from ~8.45 to ~2.4, and it continues to decrease with more epochs.
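
As a rough illustration of what an NLL value means, the sketch below assumes the model predicts one probability distribution over start positions and another over end positions, and sums the negative log-probabilities of the true start and end. That span-prediction setup is an assumption on my part (it is common for SQuAD-style models), and the probabilities are made up.

```python
import math


def span_nll(start_probs, end_probs, true_start, true_end):
    """Negative log-likelihood of the true (start, end) span.

    start_probs / end_probs: probabilities over context positions (hypothetical inputs).
    """
    return -(math.log(start_probs[true_start]) + math.log(end_probs[true_end]))


# Toy distributions over a 3-token context; the true span runs from token 1 to token 2.
start_probs = [0.1, 0.8, 0.1]
end_probs = [0.2, 0.2, 0.6]

loss = span_nll(start_probs, end_probs, true_start=1, true_end=2)
print(f"NLL = {loss:.4f}")
```

The more probability the model puts on the correct positions, the closer this value gets to zero, which is why a falling NLL is read as the model fitting the data better.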

The loss (NLL) on the dev set decreases until about 1 million examples and then starts increasing, which indicates overfitting.
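
One common way to act on this pattern is to keep the checkpoint with the lowest dev NLL and stop once the dev loss has not improved for a while. The sketch below shows the idea; the helper names, the `patience` parameter, and every number in the history are hypothetical, not the measured curve from this run.

```python
def best_checkpoint(dev_history):
    """Return the (examples_seen, dev_nll) entry with the lowest dev NLL."""
    return min(dev_history, key=lambda entry: entry[1])


def should_stop(dev_history, patience=3):
    """True if the last `patience` evaluations did not beat the earlier best dev NLL."""
    if len(dev_history) <= patience:
        return False
    losses = [nll for _, nll in dev_history]
    best_before = min(losses[:-patience])
    return min(losses[-patience:]) >= best_before


# Hypothetical (examples_seen, dev_nll) pairs, invented for illustration only.
history = [
    (250_000, 3.60),
    (500_000, 3.10),
    (1_000_000, 2.90),
    (1_500_000, 3.00),
    (2_000_000, 3.15),
    (2_500_000, 3.30),
]

print(best_checkpoint(history))
print(should_stop(history, patience=3))
```

With a rule like this, training would have been cut short soon after the dev loss turned upward, saving compute and keeping the checkpoint that generalises best.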

Findings on Question Answering

After training on 2.5 million examples over 20 epochs, the model is able to return correct answers for a good number of questions. It is also able to return NA for questions that have no answer in the context. See examples below:

Next Steps

  • Improving model performance
  • Hosting the model on the cloud with a simple UI to play around with