Recognizing multimodal entailment

Ilharco, Cesar; Shirazi, Afsaneh; Gopalan, Arjun; Nagrani, Arsha; Bratanič, Blaž; Bregler, Chris; Liu, Christina; Ferreira, Felipe; Barcik, Gabriel; Ilharco, Gabriel; Osang, Georg F; Bulian, Jannis; Frank, Jared; Smaira, Lucas; Cao, Qin; Marino, Ricardo; Patel, Roma; Leung, Thomas; Imbrasaite, Vaiva

Ilharco C, Shirazi A, Gopalan A, Nagrani A, Bratanič B, Bregler C, Liu C, Ferreira F, Barcik G, Ilharco G, Osang GF, Bulian J, Frank J, Smaira L, Cao Q, Marino R, Patel R, Leung T, Imbrasaite V. 2021. Recognizing multimodal entailment. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts. ACL: Association for Computational Linguistics ; IJCNLP: International Joint Conference on Natural Language Processing, 29–30.

Download

2021_ACL_Ilharco.pdf 1.23 MB

Download (ext.)

https://aclanthology.org/2021.acl-tutorials.6/

DOI

10.18653/v1/2021.acl-tutorials.6

Conference Paper | Published | English

Scopus indexed

Author

Ilharco, Cesar; Shirazi, Afsaneh; Gopalan, Arjun; Nagrani, Arsha; Bratanič, Blaž; Bregler, Chris; Liu, Christina; Ferreira, Felipe; Barcik, Gabriel; Ilharco, Gabriel; Osang, Georg F (ISTA); Bulian, Jannis; Frank, Jared; Smaira, Lucas; Cao, Qin; Marino, Ricardo; Patel, Roma; Leung, Thomas; Imbrasaite, Vaiva

Department

Edelsbrunner Group

Abstract

How information is created, shared and consumed has changed rapidly in recent decades, in part thanks to new social platforms and technologies on the web. With ever-larger amounts of unstructured data and limited labels, organizing and reconciling information from different sources and modalities is a central challenge in machine learning. This cutting-edge tutorial aims to introduce the multimodal entailment task, which can be useful for detecting semantic alignments when a single modality alone does not suffice to understand the content as a whole. Starting with a brief overview of natural language processing, computer vision, structured data and neural graph learning, we lay the foundations for the multimodal sections to follow. We then discuss recent multimodal learning literature covering visual, audio and language streams, and explore case studies focusing on tasks that require fine-grained understanding of visual and linguistic semantics: question answering, veracity and hatred classification. Finally, we introduce a new dataset for recognizing multimodal entailment, exploring it in a hands-on collaborative section. Overall, this tutorial gives an overview of multimodal learning, introduces a multimodal entailment dataset, and encourages future research on the topic.

Publishing Year

2021

Date Published

2021-08-01

Proceedings Title

59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts

Acknowledgement

We would like to thank Abby Schantz, Abe Ittycheriah, Aliaksei Severyn, Allan Heydon, Aly Grealish, Andrey Vlasov, Arkaitz Zubiaga, Ashwin Kakarla, Chen Sun, Clayton Williams, Cong Yu, Cordelia Schmid, Da-Cheng Juan, Dan Finnie, Dani Valevski, Daniel Rocha, David Price, David Sklar, Devi Krishna, Elena Kochkina, Enrique Alfonseca, Françoise Beaufays, Isabelle Augenstein, Jialu Liu, John Cantwell, John Palowitch, Jordan Boyd-Graber, Lei Shi, Luis Valente, Maria Voitovich, Mehmet Aktuna, Mogan Brown, Mor Naaman, Natalia P, Nidhi Hebbar, Pete Aykroyd, Rahul Sukthankar, Richa Dixit, Steve Pucci, Tania Bedrax-Weiss, Tobias Kaufmann, Tom Boulos, Tu Tsao, Vladimir Chtchetkine, Yair Kurzion, Yifan Xu and Zach Hynes.

Page

29-30

Conference

ACL: Association for Computational Linguistics ; IJCNLP: International Joint Conference on Natural Language Processing

Conference Location

Bangkok, Thailand

Conference Date

2021-08-01 – 2021-08-06

ISBN

978-1-954085-57-2

IST-REx-ID

10367

Cite this

Ilharco C, Shirazi A, Gopalan A, et al. Recognizing multimodal entailment. In: 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts. Association for Computational Linguistics; 2021:29-30. doi:10.18653/v1/2021.acl-tutorials.6

Ilharco, C., Shirazi, A., Gopalan, A., Nagrani, A., Bratanič, B., Bregler, C., … Imbrasaite, V. (2021). Recognizing multimodal entailment. In 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts (pp. 29–30). Bangkok, Thailand: Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-tutorials.6

Ilharco, Cesar, Afsaneh Shirazi, Arjun Gopalan, Arsha Nagrani, Blaž Bratanič, Chris Bregler, Christina Liu, et al. “Recognizing Multimodal Entailment.” In 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts, 29–30. Association for Computational Linguistics, 2021. https://doi.org/10.18653/v1/2021.acl-tutorials.6.

C. Ilharco et al., “Recognizing multimodal entailment,” in 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts, Bangkok, Thailand, 2021, pp. 29–30.

Ilharco, Cesar, et al. “Recognizing Multimodal Entailment.” 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Tutorial Abstracts, Association for Computational Linguistics, 2021, pp. 29–30, doi:10.18653/v1/2021.acl-tutorials.6.

All files available under the following license(s):

Creative Commons Attribution 4.0 International Public License (CC-BY 4.0):