Introduction

In this work, we used the BERT language model to identify idiomatic phrases in English. We also published a dataset for idiom detection. The problem is modelled as token classification: each word is tagged according to whether or not it is part of an idiom. The approach was evaluated on public datasets and achieved accuracies above 90% in seven different experiments.
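Framing idiom detection as token classification means every word receives a label, and those word labels must then be mapped onto BERT's subword tokens before training. The sketch below shows one common way to do this. It assumes a binary scheme (1 = part of an idiom, 0 = not) and a hypothetical span format; the function names are illustrative and the paper's actual labelling may differ. The `word_ids` list mirrors what Hugging Face fast tokenizers return from `word_ids()`. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import List, Optional, Tuple

# Assumed binary label scheme (hypothetical, not necessarily the paper's):
IDIOM, LITERAL = 1, 0
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def tag_idiom_words(words: List[str], idiom_spans: List[Tuple[int, int]]) -> List[int]:
    """Return one label per word; each span is a [start, end) range of word indices."""
    labels = [LITERAL] * len(words)
    for start, end in idiom_spans:
        for i in range(start, end):
            labels[i] = IDIOM
    return labels


def align_labels_to_subwords(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Project word labels onto subword tokens.

    word_ids has one entry per subword token, with None for special tokens
    such as [CLS] and [SEP]. Only the first subword of each word keeps the
    word's label; later subwords and special tokens get IGNORE_INDEX so they
    are excluded from the loss.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[wid])
        previous = wid
    return aligned


def token_accuracy(predictions: List[int], labels: List[int]) -> float:
    """Accuracy over positions that are not IGNORE_INDEX."""
    pairs = [(p, l) for p, l in zip(predictions, labels) if l != IGNORE_INDEX]
    if not pairs:
        return 0.0
    return sum(p == l for p, l in pairs) / len(pairs)


if __name__ == "__main__":
    words = ["He", "kicked", "the", "bucket", "last", "year"]
    word_labels = tag_idiom_words(words, [(1, 4)])  # "kicked the bucket"
    # Hypothetical tokenization in which "bucket" splits into two subwords.
    word_ids = [None, 0, 1, 2, 3, 3, 4, 5, None]
    print(word_labels)
    print(align_labels_to_subwords(word_ids, word_labels))
```

Labelling only the first subword of each word keeps evaluation at the word level, so a model is not rewarded or penalised twice for a word that the tokenizer happens to split.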

Framework

The following figure depicts the idiom detection pipeline.

Technologies and areas

BERT, Hugging Face, Token Classification, Python

Team

Gihan Gamage (me), A. Prof. Daswin De Silva, Achini Adikari, Prof. Damminda Alahakoon

Publications

A BERT-based Idiom Detection Model
2022 15th International Conference on Human System Interaction (HSI)
G Gamage, D De Silva, A Adikari, D Alahakoon

See the project on GitHub