Real-World AI

Fine-Tune Transformer Models For Question Answering On Custom Data

A tutorial on fine-tuning the Hugging Face RoBERTa QA Model on custom data and obtaining significant performance boosts

Skanda Vivek
Jan 18, 2023

Extractive Question Answering | Skanda Vivek

Question Answering and Transformers

BERT is a transformer model that took the world by storm in 2019. It was pretrained on unlabeled text by masking words and training the model to predict the masked words from their context. BERT was then fine-tuned on a range of tasks and achieved state-of-the-art performance on many language benchmarks. In particular, BERT was fine-tuned on 100k+ question-answer pairs from the SQuAD dataset, which consists of questions posed on Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding passage.
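To make the "answer is a span" idea concrete, here is a minimal sketch of how an extractive QA head typically turns per-token start and end scores into an answer. The scores below are made up for illustration, and `best_span` is a hypothetical helper, not a function from any library.

```python
from typing import List, Tuple


def best_span(start_scores: List[float],
              end_scores: List[float],
              max_answer_len: int = 15) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end score.

    Only spans with start <= end and at most max_answer_len tokens
    are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy passage, split on whitespace instead of a real tokenizer.
tokens = "The film was shot in New Zealand over two years".split()

# Made-up scores standing in for what a QA model would output
# for the question "Where was the film shot?".
start_scores = [0.1, 0.0, 0.0, 0.2, 0.3, 4.0, 1.0, 0.1, 0.0, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.0, 0.1, 1.0, 5.0, 0.2, 0.1, 0.3]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

A real model scores subword tokens rather than whitespace-separated words, but the selection step follows the same pattern.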

BERT Transformer Architecture from https://arxiv.org/abs/1810.04805

The RoBERTa model, released soon after, built on BERT by modifying key hyperparameters and improving the training procedure. The model we are interested in is the RoBERTa model fine-tuned for question answering and released on Hugging Face by deepset, which was downloaded 1M+ times last month.
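As a rough sketch of how such a model can be queried, the snippet below wraps the Hugging Face `pipeline` API in a small function. I am assuming the checkpoint meant here is `deepset/roberta-base-squad2`; `answer_question` is a hypothetical helper name, and the `transformers` package must be installed for the call to work.

```python
# Assumption: this is the deepset RoBERTa QA checkpoint the post refers to.
MODEL_NAME = "deepset/roberta-base-squad2"


def answer_question(question: str, context: str) -> dict:
    """Run extractive QA and return the pipeline's result dict."""
    # Imported lazily so that defining this function does not
    # require transformers to be installed.
    from transformers import pipeline

    # Downloads the model on first use (network access needed).
    qa = pipeline("question-answering",
                  model=MODEL_NAME,
                  tokenizer=MODEL_NAME)
    return qa(question=question, context=context)
```

Calling `answer_question("Where was the film shot?", "The film was shot in New Zealand.")` should return a dict with `answer`, `score`, `start`, and `end` keys, where `start` and `end` are character offsets into the context.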

As an example, let’s use data from the SubjQA dataset, which contains 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants.

In particular, since I’m illustrating the power of fine-tuning, I’m going to go with questions and answers generated from movie reviews. These are conveniently split into two CSV files, one for training (train.csv) and one for testing (test.csv).
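Before fine-tuning, rows like these usually need to be reshaped into SQuAD-style examples (question, context, answer text, and answer start offset). The sketch below shows one way to do that with the standard library. The column names `question`, `review`, and `human_ans_indices` are assumptions about the CSV layout, so check them against the actual header. The sample row is invented, and rows without an answer are not handled.

```python
import ast
import csv
import io
from typing import Dict, List, TextIO


def row_to_squad(row: Dict[str, str]) -> dict:
    """Convert one CSV row into a SQuAD-style example.

    Assumes human_ans_indices holds a "(start, end)" string of
    character offsets into the review text.
    """
    start, end = ast.literal_eval(row["human_ans_indices"])
    context = row["review"]
    return {
        "question": row["question"],
        "context": context,
        "answers": {
            "text": [context[start:end]],
            "answer_start": [start],
        },
    }


def load_examples(handle: TextIO) -> List[dict]:
    """Read every row from an open CSV file handle."""
    return [row_to_squad(row) for row in csv.DictReader(handle)]


# Invented one-row sample standing in for train.csv.
sample_csv = io.StringIO(
    "question,review,human_ans_indices\n"
    'How is the acting?,'
    '"The plot drags, but the acting is superb throughout.",'
    '"(34, 40)"\n'
)

examples = load_examples(sample_csv)
print(examples[0]["answers"])
```

For the real files, the same function can be fed an open handle, for example `open("train.csv", newline="", encoding="utf-8")`.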
