Overcoming rare-language discrimination in multi-lingual sentiment analysis

Lampert, Jasmin; Lampert, Christoph

Lampert J, Lampert C. 2022. Overcoming rare-language discrimination in multi-lingual sentiment analysis. 2021 IEEE International Conference on Big Data. Big Data: International Conference on Big Data, 5185–5192.

DOI

10.1109/bigdata52589.2021.9672003

Conference Paper | Published | English

Author

Lampert, Jasmin; Lampert, Christoph (ISTA)

Department

Lampert Group

Abstract

The digitalization of almost all aspects of our everyday lives has led to unprecedented amounts of data being freely available on the Internet. In particular, social media platforms provide rich sources of user-generated data, though typically in unstructured form and with high diversity, for example written in many different languages. Automatically identifying meaningful information in such big data resources and extracting it efficiently is one of the ongoing challenges of our time. A common step for this is sentiment analysis, which forms the foundation for tasks such as opinion mining or trend prediction. Unfortunately, publicly available tools for this task are almost exclusively available for English-language texts. Consequently, a large fraction of Internet users, who do not communicate in English, are ignored in automated studies, a phenomenon called rare-language discrimination. In this work we propose a technique to overcome this problem by a truly multi-lingual model, which can be trained automatically without linguistic knowledge or even the ability to read the many target languages. The main step is to combine self-annotation, specifically the use of emoticons as a proxy for labels, with multi-lingual sentence representations. To evaluate our method we curated several large datasets from data obtained via the free Twitter streaming API. The results show that our proposed multi-lingual training is able to achieve sentiment predictions at the same quality level for rare languages as for frequent ones, and in particular clearly better than what mono-lingual training achieves on the same data.
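The self-annotation step mentioned in the abstract could look roughly like the following minimal Python sketch. It is an illustration only, not the authors' implementation: the emoticon lists, the function name `self_annotate`, the rule of discarding texts with mixed or missing emoticons, and the removal of emoticons from the text are all assumptions made here. The multi-lingual sentence representation step is not shown.

```python
import re

# Hypothetical emoticon sets, chosen for illustration only; this record
# does not state which emoticons the paper actually uses.
POSITIVE = (":)", ":-)", ":D", "😀", "😊")
NEGATIVE = (":(", ":-(", "😞", "😢")

# Match longer emoticons first so that multi-character ones are removed whole.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(POSITIVE + NEGATIVE, key=len, reverse=True))
)


def self_annotate(text):
    """Return (cleaned_text, label) with label 1 = positive, 0 = negative.

    Returns None when the text carries no emoticon or a mix of positive
    and negative ones (an assumed filtering rule, not taken from the paper).
    """
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:  # neither present, or both present
        return None
    # Strip the emoticons so a downstream classifier cannot simply read the label.
    cleaned = " ".join(_EMOTICON_RE.sub("", text).split())
    return cleaned, (1 if has_pos else 0)


if __name__ == "__main__":
    for tweet in ["Guten Morgen :)", "quel dommage :(", "no emoticon here"]:
        print(repr(tweet), "->", self_annotate(tweet))
```

Because the labeling rule looks only at emoticons, the same function applies to text in any language, which is what lets labels be produced without being able to read the target languages.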

Publishing Year

2022

Date Published

2022-01-13

Proceedings Title

2021 IEEE International Conference on Big Data

Page

5185-5192

Conference

Big Data: International Conference on Big Data

Conference Location

Orlando, FL, United States

Conference Date

2021-12-15 – 2021-12-18

ISBN

9781665439022

IST-REx-ID

10752

Cite this

Lampert J, Lampert C. Overcoming rare-language discrimination in multi-lingual sentiment analysis. In: 2021 IEEE International Conference on Big Data. IEEE; 2022:5185-5192. doi:10.1109/bigdata52589.2021.9672003

Lampert, J., & Lampert, C. (2022). Overcoming rare-language discrimination in multi-lingual sentiment analysis. In 2021 IEEE International Conference on Big Data (pp. 5185–5192). Orlando, FL, United States: IEEE. https://doi.org/10.1109/bigdata52589.2021.9672003

Lampert, Jasmin, and Christoph Lampert. “Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis.” In 2021 IEEE International Conference on Big Data, 5185–92. IEEE, 2022. https://doi.org/10.1109/bigdata52589.2021.9672003.

J. Lampert and C. Lampert, “Overcoming rare-language discrimination in multi-lingual sentiment analysis,” in 2021 IEEE International Conference on Big Data, Orlando, FL, United States, 2022, pp. 5185–5192.

Lampert, Jasmin, and Christoph Lampert. “Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis.” 2021 IEEE International Conference on Big Data, IEEE, 2022, pp. 5185–92, doi:10.1109/bigdata52589.2021.9672003.
