Comparison of Conversational Corpus and News Corpus on Gender Bias in Indonesian-English Transformer Model Translation
Gender bias in machine translation is a significant issue that affects both translation quality and gender perception, often leading to misunderstandings such as the tendency to default to male pronouns. For example, the Indonesian pronoun "dia" is gender-neutral, yet it is often translated as "he" rather than "she," even when the context suggests otherwise, as in sentences about President Megawati. Reducing this bias requires ongoing research, particularly into how different training corpora affect gender accuracy in translation.
Studies have shown that formal news corpora, which contain less gender bias, produce different results than more informal conversational corpora, which exhibit more gender bias. This research uses an Indonesian-English conversational parallel corpus from OpenSubtitles as a training dataset, which contains many gendered pronouns. Additionally, a news corpus from Tanzil, with fewer gendered words, was also used.
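As an illustration, OPUS distributes both corpora as aligned plain-text (Moses-format) files, one file per language. A minimal loading sketch might look like the following; the file names are placeholders for the actual downloads, not paths given in the study.

```python
# Minimal sketch: read aligned OPUS Moses-format files into (Indonesian, English) pairs.
# The file names below are placeholders for the corpus downloads described above.
from pathlib import Path

def load_parallel(src_path: str, tgt_path: str):
    """Pair up sentences line-by-line from two aligned plain-text files."""
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    # Aligned corpora have the same number of lines; zip stops at the shorter file.
    return list(zip(src_lines, tgt_lines))

# Conversational corpus (OpenSubtitles) and news corpus (Tanzil), both Indonesian-English.
conversational = load_parallel("OpenSubtitles.id-en.id", "OpenSubtitles.id-en.en")
news = load_parallel("Tanzil.id-en.id", "Tanzil.id-en.en")
print(len(conversational), len(news))
```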
Both corpora were sourced from OPUS, a collection widely used by previous researchers. For the testing dataset, biographies of female presidents were used, since popular machine translation systems often translate them with masculine pronouns by default. Each corpus was then used to train a Transformer model, producing one translation model per corpus.
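The article does not specify the exact model configuration. As a rough sketch under the standard "Transformer base" hyperparameters, the encoder-decoder could be set up in PyTorch roughly as follows; the vocabulary size is an assumption, and positional encoding, masking, and the training loop are omitted.

```python
# Sketch of a Transformer encoder-decoder with the common "base" hyperparameters.
# The actual configuration used in the study is not given; these values are assumptions.
import torch.nn as nn

VOCAB_SIZE = 32000  # assumed shared subword vocabulary size
D_MODEL = 512

class TranslationModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.tgt_embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, dropout=0.1,
            batch_first=True,
        )
        self.generator = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # Positional encodings and attention masks are omitted for brevity.
        src_emb = self.src_embed(src_ids)
        tgt_emb = self.tgt_embed(tgt_ids)
        out = self.transformer(src_emb, tgt_emb)
        return self.generator(out)
```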
The gender of each sentence in the generated translations was then detected and compared with the gender of the corresponding sentence in the test data to evaluate accuracy. The results showed that the gender translation accuracy of the model trained on the conversational corpus was 84%, while the model trained on the news corpus achieved an accuracy of 8%.
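The evaluation step, detecting the gender of each output sentence and comparing it with the reference, could be sketched with a simple rule-based check like the one below. The pronoun lists, the tie-breaking rule, and the accuracy formula are assumptions for illustration; the article does not state the exact detection method used.

```python
# Sketch of rule-based gender detection on English output and accuracy against references.
# The pronoun lists and tie-breaking rules are assumptions, not the study's exact method.
import re

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def detect_gender(sentence: str) -> str:
    """Label a sentence by which set of gendered pronouns dominates."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    male_hits = sum(tok in MALE for tok in tokens)
    female_hits = sum(tok in FEMALE for tok in tokens)
    if male_hits > female_hits:
        return "male"
    if female_hits > male_hits:
        return "female"
    return "neutral"  # no gendered pronouns, or a tie

def gender_accuracy(hypotheses, references) -> float:
    """Fraction of sentences whose detected gender matches the reference's gender."""
    matches = sum(
        detect_gender(hyp) == detect_gender(ref)
        for hyp, ref in zip(hypotheses, references)
    )
    return matches / len(references)

# Example: a reference about a female president vs. a masculine-defaulting output.
refs = ["She served as the fifth president of Indonesia."]
hyps = ["He served as the fifth president of Indonesia."]
print(gender_accuracy(hyps, refs))  # 0.0 for this single mismatched pair
```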