Question answering (Q&A) is the task of extracting an answer to a given question from one or more paragraphs of text. Q&A systems come in two types: closed and open. A closed system extracts the answer from a given paragraph or document, while an open system retrieves it from a large corpus of documents, such as Wikipedia. Question answering is one of the most important tasks in natural language processing, and systems can be developed using several different approaches.
NLP today is dominated by incredible models called transformers. In practice, transformers are implemented almost exclusively using a library also called transformers, built by Hugging Face. The library is open-source and has an incredibly active community, and with it we can begin using state-of-the-art models from Google, OpenAI, and others.
We can install transformers for Python easily with:
pip install transformers
The transformers library also needs a deep learning backend such as TensorFlow, which we can install with:
pip install tensorflow
or, using conda:
conda install tensorflow
After that, we can find the two models we will be testing in this article: deepset/bert-base-cased-squad2 and deepset/electra-base-squad2. Both models were built by deepset, hence the deepset/ prefix, and both have been pre-trained for Q&A on the SQuAD 2.0 dataset, as the squad2 at the end denotes.
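Either model can be fetched by name from the Hugging Face model hub. As a minimal sketch, using the generic Auto classes (which pick the right architecture for a given model name):

from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# swap one model name for the other to compare the two
model_name = 'deepset/electra-base-squad2'
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)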
Our question-answering process involves three steps:
Model and tokenizer initialization
Query tokenization
Pipeline and prediction
Model and Tokenizer Initialization - this is our very first step; here we import transformers and initialize our model and tokenizer using deepset/bert-base-cased-squad2.
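A minimal sketch of this step, assuming a PyTorch backend (with a TensorFlow backend we would reach for the TF-prefixed equivalents), might look like:

from transformers import BertForQuestionAnswering, BertTokenizer

# the pre-trained Q&A model we will load from the Hugging Face model hub
model_name = 'deepset/bert-base-cased-squad2'

# download (or load from cache) the model weights and the matching tokenizer
model = BertForQuestionAnswering.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)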
Input Data Tokenization - we’ve initialized our tokenizer, and now it’s time to feed it some text to convert into Bert-readable token IDs. Our Bert tokenizer is in charge of converting human-readable text into Bert-friendly data called token IDs. First, the tokenizer takes a string and splits it into tokens; it then maps each token to an integer ID from Bert’s vocabulary.
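With the tokenizer initialized above, a quick illustration (the input string here is just a placeholder):

# split a human-readable string into Bert tokens
tokens = tokenizer.tokenize("Hello, how are you?")
# ['Hello', ',', 'how', 'are', 'you', '?']

# map the same string straight to token IDs, adding the special
# [CLS] and [SEP] tokens that Bert expects around an input
token_ids = tokenizer.encode("Hello, how are you?")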
Pipeline and Prediction - now that we’ve initialized our model and tokenizer and understood how the tokenizer converts our human-readable strings into lists of token IDs, we can integrate everything and begin asking questions; the transformers pipeline is genuinely all we need to code a Q&A model, as the sketch below shows. However, let’s break down this pipeline function a little so we can understand what is actually happening. First, we pass the question and the context. When feeding these into a Q&A pipeline, a specific format is expected, which looks like:
{'question': question, 'context': context}
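A minimal sketch of the whole step, with a placeholder question and context chosen purely for illustration:

from transformers import pipeline

# wrap the model and tokenizer from the initialization step in a Q&A pipeline
qa = pipeline('question-answering', model=model, tokenizer=tokenizer)

context = ('The transformers library, built by Hugging Face, gives us easy '
           'access to pre-trained models such as Bert and Electra.')
question = 'Who built the transformers library?'

# the pipeline expects the question/context dictionary shown above
answer = qa({'question': question, 'context': context})
print(answer['answer'])  # e.g. 'Hugging Face'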
At last, the pipeline pulls out the relevant token IDs according to Bert’s start-end token index prediction. These token IDs are then fed back into our tokenizer to be decoded into human-readable text, and there we have it: our answer.
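Under the hood, the pipeline is doing something like the following simplified sketch (reusing the model, tokenizer, question, and context from above):

import torch

# encode question and context together into Bert's expected input format
inputs = tokenizer(question, context, return_tensors='pt')

# the model scores, for every token, how likely it is to start or end the answer
with torch.no_grad():
    outputs = model(**inputs)

# take the most likely start and end positions of the answer span
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1

# decode the selected token IDs back into human-readable text
answer = tokenizer.decode(inputs['input_ids'][0][start:end])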
This is how transformers work as a question-answering system. A lot of research is happening in this field right now, and there is much more for us to explore in the future.