How I approached Shopee Code League — Address Elements Extraction

Ching Fhen
Jul 15, 2021

Shopee Code League 2021 had three competitions, one of which was Address Elements Extraction.

GitHub repository

chingfhen/Shopee-Address-Elements-Extraction — -Kaggle-competition (github.com)

Problem Statement

Given an Indonesian address, find its point of interest (POI) and street name.

Address Elements Extraction Dataset

Caveats: Observe that the POI and street are spans of the raw_address. However, some raw_address entries have no POI or no street.
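The span observation can be checked directly. The following is a minimal sketch, assuming each label is expected to appear verbatim in the address; the helper name find_span and the sample address are made up for illustration and do not come from the dataset.

```python
from typing import Optional, Tuple


def find_span(raw_address: str, element: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of element in raw_address, or None."""
    if not element:
        return None  # the address has no such element
    start = raw_address.find(element)
    if start == -1:
        return None  # the label is not a verbatim span of the address
    return start, start + len(element)


# Hypothetical address, made up for illustration.
raw_address = "jalan mawar 12 toko sinar jaya"
street_span = find_span(raw_address, "jalan mawar")
poi_span = find_span(raw_address, "toko sinar jaya")
missing_span = find_span(raw_address, "")
print(street_span, poi_span, missing_span)
```

The offsets are what a question answering model is later trained to predict, so rows where the lookup returns None need separate handling.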

Approach

We can break the problem down into 2 tasks: Text Classification and Question Answering.

Text Classification

Since some addresses do not have a POI or street, I fine-tuned a BERT-style model to predict whether the POI and the street name each exist in the raw_address. If the street or POI does not exist, we can simply return an empty string for it.

Training data for text classification for street name
Training data for text classification for POI
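One way such training data could be derived is sketched below, assuming each row carries raw_address, POI, and street fields, with an empty string when the element is absent. The field names, the helper build_classification_examples, and the sample rows are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def build_classification_examples(
    rows: List[Dict[str, str]], element: str
) -> List[Tuple[str, int]]:
    """Pair each address with 1 if the element is present, else 0."""
    return [(row["raw_address"], int(bool(row[element]))) for row in rows]


# Hypothetical rows, made up for illustration.
rows = [
    {
        "raw_address": "jalan mawar 12 toko sinar jaya",
        "POI": "toko sinar jaya",
        "street": "jalan mawar",
    },
    {"raw_address": "toko sinar jaya blok b", "POI": "toko sinar jaya", "street": ""},
]

street_examples = build_classification_examples(rows, "street")
poi_examples = build_classification_examples(rows, "POI")
print(street_examples)
print(poi_examples)
```

Building one labelled set per element keeps the two classification tasks independent of each other.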

I started from w11wo/indonesian-roberta-base-sentiment-classifier, a fine-tuned text classification model shared on Hugging Face. However, it still needs further fine-tuning on our dataset.

Question Answering

On the other hand, if the street or POI exists, we use a question answering model to predict the answer span by asking two questions: (1) "apa gunanya minat?" (what is the point of interest?) and (2) "Siapa nama jalannya?" (what is the street name?).

Predictions made by question answering model
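The two-stage flow can be sketched as follows. This is an illustration under stated assumptions, not the code from the repository: the classifier and the question answering model are passed in as plain callables, and the two stub functions are crude stand-ins for the fine-tuned models.

```python
from typing import Callable

# The two questions quoted in the text, keyed by element.
QUESTIONS = {
    "POI": "apa gunanya minat?",
    "street": "Siapa nama jalannya?",
}


def extract_element(
    raw_address: str,
    element: str,
    has_element: Callable[[str, str], bool],
    answer: Callable[[str, str], str],
) -> str:
    """Stage 1: classify presence. Stage 2: ask the QA model only if present."""
    if not has_element(raw_address, element):
        return ""  # element predicted absent, so return an empty string
    return answer(QUESTIONS[element], raw_address)


# Stub stand-ins for the fine-tuned models (hypothetical, for illustration only).
def stub_has_element(raw_address: str, element: str) -> bool:
    keyword = {"POI": "toko", "street": "jalan"}[element]
    return keyword in raw_address


def stub_answer(question: str, context: str) -> str:
    return " ".join(context.split()[:2])  # pretend the span is the first two words


street = extract_element("jalan mawar 12", "street", stub_has_element, stub_answer)
poi = extract_element("jalan mawar 12", "POI", stub_has_element, stub_answer)
print(repr(street), repr(poi))
```

Passing the models in as callables keeps the control flow testable without loading any weights.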

I started from cahya/bert-base-indonesian-tydiqa, a fine-tuned question answering model shared on Hugging Face. However, it still needs further fine-tuning on our dataset.
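Fine-tuning an extractive question answering model needs examples with a context, a question, and the answer's start offset. The sketch below assumes the SQuAD-style layout (answers holding text and answer_start lists); the helper build_qa_example and the sample address are hypothetical.

```python
from typing import Dict, Optional


def build_qa_example(
    raw_address: str, question: str, answer_text: str
) -> Optional[Dict]:
    """Build one SQuAD-style example, or None if there is no usable span."""
    if not answer_text:
        return None  # handled by the classifier in the two-stage approach
    start = raw_address.find(answer_text)
    if start == -1:
        return None  # label is not a verbatim span, so skip it
    return {
        "context": raw_address,
        "question": question,
        "answers": {"text": [answer_text], "answer_start": [start]},
    }


# Hypothetical address, made up for illustration.
example = build_qa_example(
    "jalan mawar 12 toko sinar jaya", "Siapa nama jalannya?", "jalan mawar"
)
skipped = build_qa_example("toko sinar jaya blok b", "Siapa nama jalannya?", "")
print(example)
print(skipped)
```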

Results

Using this approach, I achieved a private leaderboard accuracy of 57%, which ranked 150th out of 1000 competitors.
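For reference, accuracy here can be read as the fraction of addresses predicted exactly right. The sketch below assumes exact string matching on a combined "POI/street" answer per address; that scoring format is an assumption and is not stated in this post, and the sample strings are made up.

```python
from typing import List


def exact_match_accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of predictions that equal their target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical "POI/street" strings, made up for illustration.
predictions = ["toko sinar jaya/jalan mawar", "/jalan melati", "/"]
targets = ["toko sinar jaya/jalan mawar", "/jalan melati raya", "/"]
accuracy = exact_match_accuracy(predictions, targets)
print(accuracy)
```

Under exact matching, a nearly correct span such as the second prediction earns no credit.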

Alternative approach

Rather than breaking the problem into two, this problem can also be solved using only question answering. This can be done by simply training the question answering model to predict an empty string if the POI/street does not exist.
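A sketch of how this single-model variant might look, assuming the SQuAD v2 convention in which an unanswerable question has empty text and answer_start lists, and the common decoding rule that compares a "no answer" score against the best span score. The function names and the threshold are illustrative assumptions.

```python
from typing import Dict


def build_qa_example_with_empty(
    raw_address: str, question: str, answer_text: str
) -> Dict:
    """Build a SQuAD v2-style example, with empty answers when there is no span."""
    start = raw_address.find(answer_text) if answer_text else -1
    if start == -1:
        # No element, or the label is not a verbatim span: mark as unanswerable.
        answers = {"text": [], "answer_start": []}
    else:
        answers = {"text": [answer_text], "answer_start": [start]}
    return {"context": raw_address, "question": question, "answers": answers}


def pick_answer(
    best_span_text: str,
    best_span_score: float,
    null_score: float,
    threshold: float = 0.0,
) -> str:
    """Return an empty string when "no answer" beats the best span by the threshold."""
    return "" if null_score - best_span_score > threshold else best_span_text


unanswerable = build_qa_example_with_empty(
    "toko sinar jaya blok b", "Siapa nama jalannya?", ""
)
kept = pick_answer("jalan mawar", best_span_score=5.0, null_score=1.0)
dropped = pick_answer("blok b", best_span_score=1.0, null_score=4.0)
print(unanswerable["answers"], repr(kept), repr(dropped))
```

The threshold trades off missed elements against spurious spans and would need tuning on held-out data.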

Final words

If you are interested in how I created the text classification and question answering datasets, or how I fine-tuned the two models, refer to my GitHub repository!

Ching Fhen

An aspiring Data Scientist. An undergraduate student at Nanyang Technological University majoring in Data Science and Artificial Intelligence.