How I approached Shopee Code League — Address Elements Extraction
Shopee Code League 2021 consisted of three competitions, one of which was Address Elements Extraction.
GitHub repository
chingfhen/Shopee-Address-Elements-Extraction — -Kaggle-competition (github.com)
Problem Statement
Given an Indonesian address, find its point of interest (POI) and street name.
Caveats: Observe that the POI and street are spans of the raw_address. However, some raw addresses contain no POI or no street.
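To make the "span" caveat concrete, here is a minimal sketch of locating an element inside an address. The helper name `find_span` and the example address are made up for illustration and are not taken from the competition data.

```python
from typing import Optional, Tuple


def find_span(raw_address: str, element: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of element inside raw_address.

    Returns None when the element is empty (the address has no POI/street)
    or when it is not a literal substring of the address.
    """
    if not element:
        return None
    start = raw_address.find(element)
    if start == -1:
        return None
    return start, start + len(element)


# Made-up example row (an assumption, not real competition data).
raw_address = "jl. melati 12 toko sinar jaya"
street_span = find_span(raw_address, "jl. melati")
poi_span = find_span(raw_address, "toko sinar jaya")
missing_span = find_span(raw_address, "")  # address without that element
```

Slicing the address with a returned span gives back the element, which is the property a question answering model relies on when it predicts start and end positions.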
Approach
We can break the problem down into 2 tasks: Text Classification and Question Answering.
Text Classification
Since some addresses do not have a POI or street, I fine-tuned a BERT-style model to predict whether the POI and the street name exist in the raw_address. If the street/POI does not exist, we can simply return an empty string for it.
The starting point was w11wo/indonesian-roberta-base-sentiment-classifier, a fine-tuned text classification model shared on HuggingFace. Because it was trained for sentiment classification, it needs further fine-tuning on our dataset.
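As a rough sketch of what this step could look like, the snippet below builds binary "does this element exist?" labels and loads the checkpoint with a fresh two-class head. The row layout (`raw_address`, `poi`, `street` keys), the helper names, and the choice of one classifier per element are assumptions for illustration; the actual dataset preparation lives in the repository.

```python
from typing import Dict, List, Tuple

MODEL_NAME = "w11wo/indonesian-roberta-base-sentiment-classifier"


def build_classification_examples(
    rows: List[Dict[str, str]], element: str
) -> List[Tuple[str, int]]:
    """Turn rows into (raw_address, label) pairs; label 1 means the element exists."""
    return [(row["raw_address"], int(bool(row[element]))) for row in rows]


def load_classifier(model_name: str = MODEL_NAME):
    """Load the tokenizer and model with a new 2-class head, ready for fine-tuning."""
    # Imported lazily so the label helper above works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=2,
        # The sentiment head may have a different number of labels,
        # so allow it to be replaced by a freshly initialised one.
        ignore_mismatched_sizes=True,
    )
    return tokenizer, model


# Made-up rows (assumed layout: empty string means "element not present").
rows = [
    {"raw_address": "jl. melati 12 toko sinar jaya", "poi": "toko sinar jaya", "street": "jl. melati"},
    {"raw_address": "kel. sukamaju rt 03", "poi": "", "street": ""},
]
poi_examples = build_classification_examples(rows, "poi")
```

Calling `load_classifier()` downloads the checkpoint, after which the model can be fine-tuned on the `(text, label)` pairs with any standard training loop.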
Question Answering
On the other hand, if the street/POI exists, we use a question answering model to predict the answer span by asking two questions:
1. apa gunanya minat? (what is the point of interest?)
2. Siapa nama jalannya? (what is the street name?)
The model I used was cahya/bert-base-indonesian-tydiqa, a fine-tuned question answering model shared on HuggingFace. As with the classifier, further fine-tuning on our dataset is needed.
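Putting the two stages together, inference could be sketched as follows. The function names and the idea of passing the two models in as callables are assumptions made to keep the example small; `make_hf_answerer` wraps the HuggingFace question-answering pipeline and would normally point at the fine-tuned checkpoint rather than the base one.

```python
from typing import Callable

QA_MODEL_NAME = "cahya/bert-base-indonesian-tydiqa"

# The two questions from the post, keyed by the element they target.
QUESTIONS = {
    "poi": "apa gunanya minat?",
    "street": "Siapa nama jalannya?",
}


def extract_element(
    raw_address: str,
    element: str,
    has_element: Callable[[str, str], bool],
    answer_question: Callable[[str, str], str],
) -> str:
    """Stage 1: classifier gate. Stage 2: question answering over the address."""
    if not has_element(raw_address, element):
        return ""  # the classifier says this address has no such element
    return answer_question(QUESTIONS[element], raw_address)


def make_hf_answerer(model_name: str = QA_MODEL_NAME) -> Callable[[str, str], str]:
    """Build an answer_question callable backed by a HuggingFace QA pipeline."""
    # Imported lazily so the gating logic can be tried without transformers installed.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_name)

    def answer(question: str, context: str) -> str:
        return qa(question=question, context=context)["answer"]

    return answer


# Stand-ins for the two fine-tuned models, used only to show the control flow.
def stub_has_element(raw_address: str, element: str) -> bool:
    return element == "street"


def stub_answer(question: str, context: str) -> str:
    return "jl. melati"


street = extract_element("jl. melati 12", "street", stub_has_element, stub_answer)
poi = extract_element("jl. melati 12", "poi", stub_has_element, stub_answer)
```

With real models, `stub_answer` would be replaced by `make_hf_answerer()` and `stub_has_element` by a wrapper around the fine-tuned classifier.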
Results
Using this approach, I achieved a private leaderboard score of 57% accuracy, which ranked 150th out of 1000 competitors.
Alternative approach
Rather than breaking the problem into two, this problem can also be solved using only question answering. This can be done by simply training the question answering model to predict an empty string if the POI/street does not exist.
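One common way to express "no answer" for a question answering model is the SQuAD v2 convention, where unanswerable questions carry empty answer lists. The sketch below builds such examples; the row layout and the helper name `build_qa_examples` are assumptions, and this is not necessarily how the repository structures its data.

```python
from typing import Dict, List

QUESTIONS = {
    "poi": "apa gunanya minat?",
    "street": "Siapa nama jalannya?",
}


def build_qa_examples(row: Dict[str, str]) -> List[dict]:
    """Create one SQuAD v2 style example per element for a single address row."""
    examples = []
    for element, question in QUESTIONS.items():
        answer = row[element]
        # An empty element, or one that is not a literal span, becomes "no answer".
        start = row["raw_address"].find(answer) if answer else -1
        found = start != -1
        examples.append(
            {
                "context": row["raw_address"],
                "question": question,
                "answers": {
                    "text": [answer] if found else [],
                    "answer_start": [start] if found else [],
                },
            }
        )
    return examples


# Made-up row with a street but no POI.
row = {"raw_address": "jl. melati 12", "poi": "", "street": "jl. melati"}
examples = build_qa_examples(row)
```

A model trained on examples like these learns to return no span for the POI question on this address, which is then mapped to the empty string in the submission.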
Final words
If you are interested in how I created the text classification and question answering datasets, or how I fine-tuned the two models, refer to my GitHub!