Introduction
In this work, we used the BERT language model to identify idiomatic phrases in English, and we also published a dataset for idiom detection. The problem is modelled as token classification: each word in a sentence is tagged as either part of an idiom or not. The approach was evaluated on public datasets and achieved accuracy above 90% in seven different experiments.
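The token-classification framing can be illustrated with a minimal sketch. It is not the paper's code. It assumes binary labels (1 = inside an idiom, 0 = outside), a single contiguous idiom span per sentence, and the "label only the first subword" convention used in common Hugging Face token-classification examples. `word_labels`, `align_labels` and `token_accuracy` are hypothetical helpers. The `word_ids` list imitates what a Hugging Face fast tokenizer's `word_ids()` returns, but here it is written by hand rather than produced by a real tokenizer. The paper may use a different label scheme (for example BIO tags) and may compute accuracy differently.

```python
from typing import List, Optional

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so tokens
# labelled with it do not contribute to the training loss.
IGNORE_INDEX = -100


def word_labels(words: List[str], idiom_start: int, idiom_len: int) -> List[int]:
    """Hypothetical helper: tag each word 1 if it lies inside the idiom span, else 0.

    Assumes one contiguous idiom per sentence, given by its start word index and length.
    """
    return [1 if idiom_start <= i < idiom_start + idiom_len else 0
            for i in range(len(words))]


def align_labels(labels: List[int], word_ids: List[Optional[int]]) -> List[int]:
    """Hypothetical helper: spread word-level labels onto subword tokens.

    word_ids maps each token to the index of its source word, with None for
    special tokens such as [CLS] and [SEP]. Only the first subword of a word
    keeps the word's label; special tokens and continuation subwords get
    IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(labels[wid])
        previous = wid
    return aligned


def token_accuracy(preds: List[int], gold: List[int]) -> float:
    """Fraction of correctly predicted labels, skipping IGNORE_INDEX positions."""
    pairs = [(p, g) for p, g in zip(preds, gold) if g != IGNORE_INDEX]
    return sum(p == g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    words = ["He", "kicked", "the", "bucket", "yesterday"]
    labels = word_labels(words, idiom_start=1, idiom_len=3)
    # Hand-written word_ids: [CLS], He, kicked, the, buck, ##et, yesterday, [SEP]
    word_ids = [None, 0, 1, 2, 3, 3, 4, None]
    gold = align_labels(labels, word_ids)
    preds = [1, 0, 1, 1, 0, 1, 0, 0]  # made-up model output for illustration
    print(labels)
    print(gold)
    print(token_accuracy(preds, gold))
```

Labelling only the first subword means each word receives exactly one prediction. This keeps word-level tagging and word-level evaluation aligned even when BERT's WordPiece tokenizer splits a word into several pieces.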
Framework
The following figure depicts the idiom detection pipeline.
Technologies and areas
BERT, Hugging Face, Token Classification, Python
Team
Gihan Gamage (me), Assoc. Prof. Daswin De Silva, Achini Adikari, Prof. Damminda Alahakoon
Publications
A BERT-based Idiom Detection Model
2022 15th International Conference on Human System Interaction (HSI)
G Gamage, D De Silva, A Adikari, D Alahakoon