Can You Guess the Title? Generating Emoji Sequences for Movies

Authors

  • Anna BAJCSI Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania Email address: anna.bajcsi@stud.ubbcluj.ro
  • Barbara BOTOS Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania Email address: barbara.botos@stud.ubbcluj.ro
  • Péter BAJKÓ Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania Email address: peter.bajko@stud.ubbcluj.ro
  • Zalán BODÓ Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania Email address: zbodo@cs.ubbcluj.ro https://orcid.org/0000-0002-4857-878X

DOI:

https://doi.org/10.24193/subbi.2022.1.01

Keywords:

natural language processing, emoji, keyword extraction, movie scripts, lexical matching, word embedding.

Abstract

Emojis play an important role in present-day written and typed communication, their primary function being to supplement words with emotional cues. Although emojis can be interpreted, and thus used, differently across cultures, a small set of emojis have a clear meaning and strong sentiment polarity. In this work we study how to map natural language texts to emoji sequences; more precisely, we automatically assign emojis to movie subtitles/scripts. The pipeline of the proposed method is as follows: first, the most relevant words are extracted from the movie subtitle, and then these are mapped to emojis. To perform the mapping, three methods are proposed: a lexical matching-based, a word embedding-based, and a combined approach. To demonstrate the viability of the approach, we list some of the generated emojis for a randomly selected subset of movies, and we also show the deficiencies of the method in generating guessable sequences. Evaluation is performed via quizzes completed by human participants.
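
The two-stage pipeline described in the abstract (keyword extraction, then mapping keywords to emojis) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the extraction method, the emoji lexicon, or the embedding model, so everything here is an assumption made for illustration. Keywords are picked by plain word frequency, lexical matching is an exact lookup against emoji names, the embedding-based variant uses cosine similarity over tiny hand-made vectors, and the combined variant tries the lexical match first and falls back to the embedding match. The names STOPWORDS, EMOJI_NAMES, TOY_VECTORS and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy resources (assumptions, not taken from the paper).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}
EMOJI_NAMES = {"ship": "🚢", "heart": "❤️", "ice": "🧊"}
TOY_VECTORS = {
    "ship": (1.0, 0.0, 0.0),
    "heart": (0.0, 1.0, 0.0),
    "love": (0.1, 0.9, 0.0),
    "ice": (0.0, 0.0, 1.0),
    "iceberg": (0.1, 0.0, 0.9),
}


def extract_keywords(text, k=3):
    """Return the k most frequent non-stopword tokens (a stand-in extractor)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]


def lexical_match(word):
    """Exact match of the keyword against an emoji name, or None."""
    return EMOJI_NAMES.get(word)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_match(word, threshold=0.7):
    """Emoji whose name vector is closest to the keyword vector, or None."""
    if word not in TOY_VECTORS:
        return None
    best_name, best_sim = None, threshold
    for name in EMOJI_NAMES:
        if name not in TOY_VECTORS:
            continue
        sim = cosine(TOY_VECTORS[word], TOY_VECTORS[name])
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return EMOJI_NAMES[best_name] if best_name else None


def combined_match(word):
    """Try the lexical match first, then fall back to the embedding match."""
    return lexical_match(word) or embedding_match(word)


def emojis_for(text, k=3):
    """Map the top-k keywords of a text to emojis, skipping unmatched words."""
    matches = (combined_match(w) for w in extract_keywords(text, k))
    return [e for e in matches if e]


if __name__ == "__main__":
    print(emojis_for("The ship hit the iceberg. Love kept the ship afloat, love and the ship."))
```

In this sketch a keyword with neither a lexical nor an embedding match is simply dropped, which is one plausible reason a generated sequence may be hard to guess; the threshold value of 0.7 is likewise an arbitrary choice.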

Received by the editors: 3 November 2021.

2010 Mathematics Subject Classification. 68T50, 68T30.

1998 CR Categories and Descriptors. I.2.7 [ARTIFICIAL INTELLIGENCE]: Natural Language Processing – Text analysis; I.2.m [ARTIFICIAL INTELLIGENCE]: Miscellaneous.

Published

2022-07-03

How to Cite

BAJCSI, A., BOTOS, B., BAJKÓ, P., & BODÓ, Z. (2022). Can You Guess the Title? Generating Emoji Sequences for Movies. Studia Universitatis Babeș-Bolyai Informatica, 67(1), 5–20. https://doi.org/10.24193/subbi.2022.1.01

Section

Articles