Critical Digital Literacy
The promotion of digital literacy combined with critical thinking (SubProject#2) is arguably the most effective way to tackle “fake news” and disinformation online. The approach has worked in Finland, where, five years after introducing such programs, the country declared victory in its fight against “fake news”. Inspired by Finland’s example, this sub-project aims to promote critical digital literacy in Qatar. We plan to achieve this through a general media literacy platform that will teach citizens and residents of Qatar how to recognize “fake news” and propaganda techniques. The platform will offer lessons and exploration capabilities. It will feature tools to analyze news, social media posts, or any custom text in Arabic and English, and it will make explicit the propaganda and persuasion techniques used in the content under discussion.
The tool will look for persuasion techniques such as appeals to emotion (e.g., fear, prejudice, smears) as well as logical fallacies (e.g., black-and-white fallacies, bandwagon). By interacting with the platform, users will become aware of the ways they can be manipulated by “fake news”; as a result, they will be less likely to act on it and less likely to share it further, which is critical for limiting the potential impact of organized disinformation campaigns online. We will further study the role of critical digital literacy in people’s resilience to online manipulation and influence, whether legitimate, e.g., in e-commerce, or malicious, e.g., in social engineering and phishing. This literacy will also cover the influence of the algorithms and designs used in digital media; that is, we will go beyond teaching how to recognize and respond to threats, toward understanding the underlying mechanics of influence and deception online.
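To illustrate how the platform’s analysis back end could surface such techniques, below is a minimal Python sketch of multi-label technique detection with a fine-tuned transformer. The checkpoint path is a placeholder rather than the project’s released model, and the sketch assumes the model was trained with one sigmoid output per technique label.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder path: assumes a checkpoint fine-tuned on a propaganda-technique corpus
# with a multi-label head (one sigmoid output per technique).
CHECKPOINT = "path/to/fine-tuned-propaganda-model"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)

def detect_techniques(text: str, threshold: float = 0.5) -> list[str]:
    """Return the persuasion techniques whose predicted probability exceeds the threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)
    return [model.config.id2label[i] for i, p in enumerate(probs) if p >= threshold]

print(detect_techniques("They will destroy everything we love unless you act now!"))

In the platform, the returned labels would drive the highlighting and the explanations shown to the user for each analyzed text.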
There is a valid argument, typically made by social media companies, that people should be primarily responsible for managing their own traits, weaknesses, worries, stress, and jealousy, whether in the physical or the online world. Generally, self-regulation is expected from users of social media. However, we argue that social media design can become too immersive and, at times, addictive, and that platforms should therefore reduce the triggers that lead to a loss of control over usage. Digital addiction is associated with reduced productivity and disrupted sleep. Fear of missing out (FoMO) is one manifestation of how users become overly preoccupied with online spaces. We have argued that a thoughtful design process should equip users with tools to manage it, e.g., creative versions of auto-reply, coloring schemes, and filters. Such a design can benefit those who are highly susceptible to peer pressure and have low impulse control.
Objectives
To build a high-quality corpus annotated with propaganda and its techniques.
To develop a system for detecting the use of propaganda and its techniques in text in Arabic and English with a focus on Qatar and social media.
To develop an online platform for teaching critical digital literacy and then use the platform to study the role of critical digital literacy on people’s resilience to online manipulation and influence.
Meet the Critical Digital Literacy team members...
FIROJ ALAM
Critical Digital Literacy
WAJDI ZAGHOUANI
GEORGE MIKROS
GIOVANNI DA SAN MARTINO
MARAM HASANAIN
FATEMA AHMAD
ELISA SARTORI
University of Padova
MUAADH NOMAN
Publications
Alam, Firoj; Hasnat, Abul; Ahmad, Fatema; Hasan, Md. Arid; Hasanain, Maram
ArMeme: Propagandistic Content in Arabic Memes Proceedings Article
In: Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (Ed.): Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 21071–21090, Association for Computational Linguistics, Miami, Florida, USA, 2024.
@inproceedings{alam-etal-2024-armeme,
title = {ArMeme: Propagandistic Content in Arabic Memes},
author = {Firoj Alam and Abul Hasnat and Fatema Ahmad and Md. Arid Hasan and Maram Hasanain},
editor = {Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen},
url = {https://aclanthology.org/2024.emnlp-main.1173},
year = {2024},
date = {2024-11-01},
urldate = {2024-11-01},
booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
pages = {21071–21090},
publisher = {Association for Computational Linguistics},
address = {Miami, Florida, USA},
abstract = {With the rise of digital communication memes have become a significant medium for cultural and political expression that is often used to mislead audience. Identification of such misleading and persuasive multimodal content become more important among various stakeholders, including social media platforms, policymakers, and the broader society as they often cause harm to the individuals, organizations and/or society. While there has been effort to develop AI based automatic system for resource rich languages (e.g., English), it is relatively little to none for medium to low resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ∼6K Arabic memes collected from various social media platforms, which is a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We made the dataset publicly available for the community.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Ahmad, Fatema; Alam, Firoj
Large Language Models for Propaganda Span Annotation Proceedings Article
In: Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (Ed.): Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 14522–14532, Association for Computational Linguistics, Miami, Florida, USA, 2024.
@inproceedings{hasanain-etal-2024-large,
title = {Large Language Models for Propaganda Span Annotation},
author = {Maram Hasanain and Fatema Ahmad and Firoj Alam},
editor = {Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen},
url = {https://aclanthology.org/2024.findings-emnlp.850},
year = {2024},
date = {2024-11-01},
urldate = {2024-11-01},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024},
pages = {14522–14532},
publisher = {Association for Computational Linguistics},
address = {Miami, Florida, USA},
abstract = {The use of propagandistic techniques in online content has increased in recent years aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of textual spans where propaganda techniques are used, are essential for more informed content consumption. Automatic systems targeting the task over lower resourced languages are limited, usually obstructed by lack of large scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation context to GPT-4 within prompts improves its performance compared to human annotators. Moreover, when serving as an expert annotator (consolidator), the model provides labels that have higher agreement with expert annotators, and lead to specialized models that achieve state-of-the-art over an unseen Arabic testing set. Finally, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for propagandistic spans detection task prompting it with annotations from human annotators with limited expertise. All scripts and annotations will be shared with the community.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Hasan, Md. Arid; Ahmad, Fatema; Suwaileh, Reem; Biswas, Md. Rafiul; Zaghouani, Wajdi; Alam, Firoj
ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content Proceedings Article
In: Habash, Nizar; Bouamor, Houda; Eskander, Ramy; Tomeh, Nadi; Farha, Ibrahim Abu; Abdelali, Ahmed; Touileb, Samia; Hamed, Injy; Onaizan, Yaser; Alhafni, Bashar; Antoun, Wissam; Khalifa, Salam; Haddad, Hatem; Zitouni, Imed; AlKhamissi, Badr; Almatham, Rawan; Mrini, Khalil (Ed.): Proceedings of The Second Arabic Natural Language Processing Conference, pp. 456–466, Association for Computational Linguistics, Bangkok, Thailand, 2024.
@inproceedings{hasanain-etal-2024-araieval,
title = {ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content},
author = {Maram Hasanain and Md. Arid Hasan and Fatema Ahmad and Reem Suwaileh and Md. Rafiul Biswas and Wajdi Zaghouani and Firoj Alam},
editor = {Nizar Habash and Houda Bouamor and Ramy Eskander and Nadi Tomeh and Ibrahim Abu Farha and Ahmed Abdelali and Samia Touileb and Injy Hamed and Yaser Onaizan and Bashar Alhafni and Wissam Antoun and Salam Khalifa and Hatem Haddad and Imed Zitouni and Badr AlKhamissi and Rawan Almatham and Khalil Mrini},
url = {https://aclanthology.org/2024.arabicnlp-1.44},
year = {2024},
date = {2024-08-01},
urldate = {2024-08-01},
booktitle = {Proceedings of The Second Arabic Natural Language Processing Conference},
pages = {456–466},
publisher = {Association for Computational Linguistics},
address = {Bangkok, Thailand},
abstract = {We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community. We hope this will enable further research on these important tasks in Arabic.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Dimitrov, Dimitar; Alam, Firoj; Hasanain, Maram; Hasnat, Abul; Silvestri, Fabrizio; Nakov, Preslav; Martino, Giovanni Da San
SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes Proceedings Article
In: Ojha, Atul Kr.; Doğruöz, A. Seza; Madabushi, Harish Tayyar; Martino, Giovanni Da San; Rosenthal, Sara; Rosá, Aiala (Ed.): Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pp. 2009–2026, Association for Computational Linguistics, Mexico City, Mexico, 2024.
@inproceedings{dimitrov-etal-2024-semevalb,
title = {SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes},
author = {Dimitar Dimitrov and Firoj Alam and Maram Hasanain and Abul Hasnat and Fabrizio Silvestri and Preslav Nakov and Giovanni Da San Martino},
editor = {Atul Kr. Ojha and A. Seza Doğruöz and Harish Tayyar Madabushi and Giovanni Da San Martino and Sara Rosenthal and Aiala Rosá},
url = {https://aclanthology.org/2024.semeval-1.275},
doi = {https://doi.org/10.18653/v1/2024.semeval-1.275},
year = {2024},
date = {2024-06-01},
urldate = {2024-06-01},
booktitle = {Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)},
pages = {2009–2026},
publisher = {Association for Computational Linguistics},
address = {Mexico City, Mexico},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Ahmad, Fatema; Alam, Firoj
Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles Proceedings Article
In: Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.): Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 2724–2744, ELRA and ICCL, Torino, Italia, 2024.
@inproceedings{hasanain-etal-2024-gpt,
title = {Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles},
author = {Maram Hasanain and Fatema Ahmad and Firoj Alam},
editor = {Nicoletta Calzolari and Min-Yen Kan and Veronique Hoste and Alessandro Lenci and Sakriani Sakti and Nianwen Xue},
url = {https://aclanthology.org/2024.lrec-main.244},
year = {2024},
date = {2024-05-01},
urldate = {2024-05-01},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
pages = {2724–2744},
publisher = {ELRA and ICCL},
address = {Torino, Italia},
abstract = {The use of propaganda has spiked on mainstream and social media, aiming to manipulate or mislead users. While efforts to automatically detect propaganda techniques in textual, visual, or multimodal content have increased, most of them primarily focus on English content. The majority of the recent initiatives targeting medium to low-resource languages produced relatively small annotated datasets, with a skewed distribution, posing challenges for the development of sophisticated propaganda detection models. To address this challenge, we carefully develop the largest propaganda dataset to date, ArPro, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text. Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text. Compared to models fine-tuned on the dataset for propaganda detection at different classification granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a dataset consisting of six other languages for span detection, and results suggest that the model struggles with the task across languages. We made the dataset publicly available for the community.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gurgun, Selin; Cemiloglu, Deniz; Arden-Close, Emily; Phalp, Keith; Nakov, Preslav; Ali, Raian
Why do we not stand up to misinformation? Factors influencing the likelihood of challenging misinformation on social media and the role of demographics Journal Article
In: Technology in Society, vol. 76, pp. 102444, 2024.
@article{gurgun2024we,
title = {Why do we not stand up to misinformation? Factors influencing the likelihood of challenging misinformation on social media and the role of demographics},
author = {Selin Gurgun and Deniz Cemiloglu and Emily Arden-Close and Keith Phalp and Preslav Nakov and Raian Ali},
url = {https://www.sciencedirect.com/science/article/pii/S0160791X2300249X },
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
journal = {Technology in Society},
volume = {76},
pages = {102444},
publisher = {Elsevier},
abstract = {This study investigates the barriers to challenging others who post misinformation on social media platforms. We conducted a survey amongst U.K. Facebook users (143 (57.2 %) women, 104 (41.6 %) men) to assess the extent to which the barriers to correcting others, as identified in literature across disciplines, apply to correcting misinformation on social media. We also group the barriers into factors and explore demographic differences amongst them. It has been suggested that users are generally hesitant to challenge misinformation. We found that most of our participants (58.8 %) were reluctant to challenge misinformation. We also identified moderating roles of age and gender in the likelihood of challenging misinformation. Older people were more likely to challenge misinformation compared to young adults while, men demonstrated a slightly greater likelihood to challenge compared to women. The 20 barriers influencing the decision to challenge misinformation, were then grouped into four main factors: social concerns, effort/interest considerations, prosocial intents, and content-related factors. We found that, controlling for age and gender, “social concerns” and “effort/interest considerations” have the significant impact on likelihood to challenge. Identified four factors were analysed in terms of demographic differences. Men ranked “effort/interest considerations” higher than women, while women placed higher importance on “content-related factors”. Moreover, older individuals were found to be more resilient to “social concerns”. The influence of educational background was most prominent in ranking “content-related factors”. Our findings provide important insights for the design of future interventions aimed at encouraging the challenging of misinformation on social media platforms, highlighting the need for tailored, demographically sensitive approaches.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Gurgun, Selin; Noman, Muaadh; Arden-Close, Emily; Phalp, Keith; Ali, Raian
How Would I Be Perceived If I Challenge Individuals Sharing Misinformation? Exploring Misperceptions in the UK and Arab Samples and the Potential for the Social Norms Approach Proceedings Article
In: International Conference on Persuasive Technology, pp. 133–150, Springer 2024.
@inproceedings{gurgun2024would,
title = {How Would I Be Perceived If I Challenge Individuals Sharing Misinformation? Exploring Misperceptions in the UK and Arab Samples and the Potential for the Social Norms Approach},
author = {Selin Gurgun and Muaadh Noman and Emily Arden-Close and Keith Phalp and Raian Ali},
url = {https://www.researchgate.net/publication/379718115_How_Would_I_Be_Perceived_If_I_Challenge_Individuals_Sharing_Misinformation_Exploring_Misperceptions_in_the_UK_and_Arab_Samples_and_the_Potential_for_the_Social_Norms_Approach},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {International Conference on Persuasive Technology},
pages = {133–150},
organization = {Springer},
abstract = {Research conducted in the UK explored the presence of misperceptions, revealing that people anticipated more negative consequences for challenging misinformation on social media. These misperceptions include the anticipation of harming relationships, causing embarrassment and offense to others, the belief that challenging may not yield success and the perception that such behaviour is unacceptable. As the UK culture is characterised as individualistic, we replicated this investigation in a collectivistic culture – Arab societies. Our aim is to explore the differences and similarities of these misperceptions across cultures and to examine whether applying the social norms approach can be a solution to address the inaction towards challenging misinformation. Comparing the UK (N=250) and Arabs (N=212), we showed that, in both cultures there are misperceptions towards challenging misinformation. While misperceptions regarding relationship costs and futility remain consistent across cultures, the concerns about causing harm to others and the acceptability of the behaviour differ. Participants in the UK show a higher concern about offense or embarrassment; in contrast, participants in Arab countries exhibit higher misperceptions about injunctive norms, perceiving challenging misinformation as less socially acceptable than it actually is. This study also shows that participants’ likelihood to challenge misinformation is influenced by their misperceptions about potential harm to others and perceived injunctive norms. These findings present an opportunity to apply the social norms approach to behaviour change by addressing these misperceptions. Messages emphasising social acceptability of correcting misinformation and highlighting that people appreciate being corrected could serve as powerful tools to encourage users to challenge misinformation.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Suwaileh, Reem; Weering, Sanne; Li, Chengkai; Caselli, Tommaso; Zaghouani, Wajdi; Barrón-Cedeño, Alberto; Nakov, Preslav; Alam, Firoj
Overview of the CLEF-2024 CheckThat! Lab Task 1 on Check-Worthiness Estimation of Multigenre Content Proceedings Article
In: Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum, Grenoble, France, 2024.
@inproceedings{clef-checkthat:2024:task1,
title = {Overview of the CLEF-2024 CheckThat! Lab Task 1 on Check-Worthiness Estimation of Multigenre Content},
author = {Maram Hasanain and Reem Suwaileh and Sanne Weering and Chengkai Li and Tommaso Caselli and Wajdi Zaghouani and Alberto Barrón-Cedeño and Preslav Nakov and Firoj Alam},
url = {https://research.rug.nl/en/publications/overview-of-the-clef-2024-checkthat-lab-task-1-on-check-worthines},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum},
address = {Grenoble, France},
series = {CLEF~2024},
abstract = {We present an overview of the CheckThat! Lab 2024 Task 1, part of CLEF 2024. Task 1 involves determining whether a text item is check-worthy, with a special emphasis on COVID-19, political news, and political debates and speeches. It is conducted in three languages: Arabic, Dutch, and English. Additionally, Spanish was offered for extra training data during the development phase. A total of 75 teams registered, with 37 teams submitting 236 runs and 17 teams submitting system description papers. Out of these, 13, 15 and 26 teams participated for Arabic, Dutch and English, respectively. Among these teams, the use of transformer pre-trained language models (PLMs) was the most frequent. A few teams also employed Large Language Models (LLMs). We provide a description of the dataset, the task setup, including evaluation settings, and a brief overview of the participating systems. As is customary in the CheckThat! Lab, we release all the datasets as well as the evaluation scripts to the research community. This will enable further research on identifying relevant check-worthy content that can assist various stakeholders, such as fact-checkers, journalists, and policymakers.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Struß, Julia Maria; Ruggeri, Federico; Barrón-Cedeño, Alberto; Alam, Firoj; Dimitrov, Dimitar; Galassi, Andrea; Pachov, Georgi; Koychev, Ivan; Nakov, Preslav; Siegel, Melanie; Wiegand, Michael; Hasanain, Maram; Suwaileh, Reem; Zaghouani, Wajdi
Overview of the CLEF-2024 CheckThat! Lab Task 2 on Subjectivity in News Articles Proceedings Article
In: Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum, Grenoble, France, 2024.
@inproceedings{clef-checkthat:2024:task2,
title = {Overview of the CLEF-2024 CheckThat! Lab Task 2 on Subjectivity in News Articles},
author = {Julia Maria Struß and Federico Ruggeri and Alberto Barrón-Cedeño and Firoj Alam and Dimitar Dimitrov and Andrea Galassi and Georgi Pachov and Ivan Koychev and Preslav Nakov and Melanie Siegel and Michael Wiegand and Maram Hasanain and Reem Suwaileh and Wajdi Zaghouani},
url = {https://cris.unibo.it/handle/11585/980321},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum},
address = {Grenoble, France},
series = {CLEF~2024},
abstract = {We present an overview of Task 2 of the seventh edition of the CheckThat! lab at the 2024 iteration of the Conference and Labs of the Evaluation Forum (CLEF). The task focuses on subjectivity detection in news articles and was offered in five languages: Arabic, Bulgarian, English, German, and Italian, as well as in a multilingual setting. The datasets for each language were carefully curated and annotated, comprising over 10,000 sentences from news articles. The task challenged participants to develop systems capable of distinguishing between subjective statements (reflecting personal opinions or biases) and objective ones (presenting factual information) at the sentence level. A total of 15 teams participated in the task, submitting 36 valid runs across all language tracks. The participants used a variety of approaches, with transformer-based models being the most popular choice. Strategies included fine-tuning monolingual and multilingual models, and leveraging English models with automatic translation for the non-English datasets. Some teams also explored ensembles, feature engineering, and innovative techniques such as few-shot learning and in-context learning with large language models. The evaluation was based on macro-averaged F1 score. The results varied across languages, with the best performance achieved for Italian and German, followed by English. The Arabic track proved particularly challenging, with no team surpassing an F1 score of 0.50. This task contributes to the broader goal of enhancing the reliability of automated content analysis in the context of misinformation detection and fact-checking. The paper provides detailed insights into the datasets, participant approaches, and results, offering a benchmark for the current state of subjectivity detection across multiple languages. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Alam, Firoj; Biswas, Md. Rafiul; Shah, Uzair; Zaghouani, Wajdi; Mikros, Georgios
Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs Journal Article
In: Proceedings of The 25th International Web Information Systems Engineering Conference (WISE), Doha, Qatar, 2024.
@article{alam2024propagandahatemultimodalanalysis,
title = {Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs},
author = {Firoj Alam and Md. Rafiul Biswas and Uzair Shah and Wajdi Zaghouani and Georgios Mikros},
url = {https://arxiv.org/abs/2409.07246},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Proceedings of The 25th International Web Information Systems Engineering Conference (WISE)},
address = {Doha, Qatar},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Barrón-Cedeño, Alberto; Alam, Firoj; Struß, Julia Maria; Nakov, Preslav; Chakraborty, Tanmoy; Elsayed, Tamer; Przybyła, Piotr; Caselli, Tommaso; Martino, Giovanni Da San; Haouari, Fatima; Li, Chengkai; Piskorski, Jakub; Ruggeri, Federico; Song, Xingyi; Suwaileh, Reem
Overview of the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities and Adversarial Robustness Proceedings Article
In: Goeuriot, Lorraine; Mulhem, Philippe; Quénot, Georges; Schwab, Didier; Soulier, Laure; Nunzio, Giorgio Maria Di; Galuščáková, Petra; de Herrera, Alba García Seco; Faggioli, Guglielmo; Ferro, Nicola (Ed.): Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), 2024.
@inproceedings{clef-checkthat:2024-lncs,
title = {Overview of the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities and Adversarial Robustness},
author = {Alberto Barrón-Cedeño and Firoj Alam and Julia Maria Struß and Preslav Nakov and Tanmoy Chakraborty and Tamer Elsayed and Piotr Przybyła and Tommaso Caselli and Giovanni Da San Martino and Fatima Haouari and Chengkai Li and Jakub Piskorski and Federico Ruggeri and Xingyi Song and Reem Suwaileh},
editor = {Lorraine Goeuriot and Philippe Mulhem and Georges Quénot and Didier Schwab and Laure Soulier and Giorgio Maria Di Nunzio and Petra Galuščáková and Alba García Seco de Herrera and Guglielmo Faggioli and Nicola Ferro},
url = {https://ceur-ws.org/Vol-3740/paper-24.pdf},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024)},
abstract = {We present an overview of the CheckThat! Lab 2024 Task 1, part of CLEF 2024. Task 1 involves determining whether a text item is check-worthy, with a special emphasis on COVID-19, political news, and political debates and speeches. It is conducted in three languages: Arabic, Dutch, and English. Additionally, Spanish was offered for extra training data during the development phase. A total of 75 teams registered, with 37 teams submitting 236 runs and 17 teams submitting system description papers. Out of these, 13, 15 and 26 teams participated for Arabic, Dutch and English, respectively. Among these teams, the use of transformer pre-trained language models (PLMs) was the most frequent. A few teams also employed Large Language Models (LLMs). We provide a description of the dataset, the task setup, including evaluation settings, and a brief overview of the participating systems. As is customary in the CheckThat! Lab, we release all the datasets as well as the evaluation scripts to the research community. This will enable further research on identifying relevant check-worthy content that can assist various stakeholders, such as fact-checkers, journalists, and policymakers.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Piskorski, Jakub; Stefanovitch, Nicolas; Alam, Firoj; Campos, Ricardo; Dimitrov, Dimitar; Jorge, Alípio; Pollak, Senja; Ribin, Nikolay; Fijavž, Zoran; Hasanain, Maram; Guimarães, Nuno; Pacheco, Ana Filipa; Sartori, Elisa; Silvano, Purificação; Zwitter, Ana Vitez; Koychev, Ivan; Yu, Nana; Nakov, Preslav; Martino, Giovanni Da San
Overview of the CLEF-2024 CheckThat! Lab Task 3 on Persuasion Techniques Proceedings Article
In: Working Notes of CLEF 2024 – Conference and Labs of the Evaluation Forum, Grenoble, France, 2024.
@inproceedings{clef-checkthat:2024:task3,
title = {Overview of the CLEF-2024 CheckThat! Lab Task 3 on Persuasion Techniques},
author = {Jakub Piskorski and Nicolas Stefanovitch and Firoj Alam and Ricardo Campos and Dimitar Dimitrov and Alípio Jorge and Senja Pollak and Nikolay Ribin and Zoran Fijavž and Maram Hasanain and Nuno Guimarães and Ana Filipa Pacheco and Elisa Sartori and Purificação Silvano and Ana Vitez Zwitter and Ivan Koychev and Nana Yu and Preslav Nakov and Giovanni Da San Martino},
url = {https://ceur-ws.org/Vol-3740/paper-26.pdf},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum},
address = {Grenoble, France},
series = {CLEF~2024},
abstract = {We present an overview of CheckThat! Lab’s 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely, Arabic, Bulgarian, English, Portuguese, and Slovene, and highly-debated topics in the media, e.g., the Israeli–Palestinian conflict, the Russia–Ukraine war, climate change, COVID-19, abortion, etc. A total of 23 teams registered for the task, and two of them submitted system responses which were compared against a baseline and a task organizers’ system, which used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets accompanied with the evaluation scripts are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Gurgun, Selin; Cemiloglu, Deniz; Arden-Close, Emily; Phalp, Keith; Ali, Raian; Nakov, Preslav
Challenging Misinformation on Social Media: Users’ Perceptions and Misperceptions and Their Impact on the Likelihood to Challenge Journal Article
In: Available at SSRN 4600006, 2023.
@article{gurgun4600006challenging,
title = {Challenging Misinformation on Social Media: Users' Perceptions and Misperceptions and Their Impact on the Likelihood to Challenge},
author = {Selin Gurgun and Deniz Cemiloglu and Emily Arden-Close and Keith Phalp and Raian Ali and Preslav Nakov},
url = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4600006},
year = {2023},
date = {2023-10-19},
journal = {Available at SSRN 4600006},
abstract = {Despite being an effective way to mitigate the spread of misinformation, people on social media tend to avoid correcting others when they come across misinformation. Users’ perceptions and attitudes regarding challenging misinformation remains an underexplored area. To address this research gap, drawing on data from 250 UK-based social media users, this study aimed to identify the factors that contribute to users’ reluctance to challenge misinformation.The study found that people have misperceptions about the negative consequences of challenging misinformation and the acceptability of the behaviour. The negative consequences were categorized into three categories: relationship consequences (i.e., negative effects on the relationships due to challenging), negative impact on others (i.e., harm caused to others when challenging), and futility (i.e., belief that challenging misinformation is ineffective or pointless). Participants perceived that when they challenge others, those others may view their relationships more negatively compared to when they are challenged by others. Participants also perceive challenging others are more futile than being corrected. attempting to challenge or confront others is seen as less effective or less likely to produce a positive outcome compared to being corrected themselves. Those who believed that others think challenging misinformation is more socially acceptable than themselves were more likely to challenge. Moreover, age, injunctive norms and perceived negative impact on others have an impact on likelihood to challenge.Overall, the study underscores the significance of understanding the role of perceptions and misperceptions in challenging misinformation. Developing features on social media that facilitate challenging misinformation or fostering social norms that endorse it can address these misperceptions. To develop the right approach understanding of users and their motivations is crucial. Our study paves the way for the development of effective user-centric countermeasures by shedding light about user’s attitudes, perceptions and misperceptions. },
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Hasanain, Maram; El-Shangiti, Ahmed; Nandi, Rabindra Nath; Nakov, Preslav; Alam, Firoj
QCRI at SemEval-2023 Task 3: News Genre, Framing and Persuasion Techniques Detection Using Multilingual Models Proceedings Article
In: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pp. 1237–1244, Association for Computational Linguistics, Toronto, Canada, 2023.
@inproceedings{hasanain-etal-2023-qcri,
title = {QCRI at SemEval-2023 Task 3: News Genre, Framing and Persuasion Techniques Detection Using Multilingual Models},
author = {Maram Hasanain and Ahmed El-Shangiti and Rabindra Nath Nandi and Preslav Nakov and Firoj Alam},
url = {https://aclanthology.org/2023.semeval-1.172},
doi = {10.18653/v1/2023.semeval-1.172},
year = {2023},
date = {2023-07-01},
urldate = {2023-07-01},
booktitle = {Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)},
pages = {1237–1244},
publisher = {Association for Computational Linguistics},
address = {Toronto, Canada},
abstract = {Misinformation spreading in mainstream and social media has been misleading users in different ways. Manual detection and verification efforts by journalists and fact-checkers can no longer cope with the great scale and quick spread of misleading information. This motivated research and industry efforts to develop systems for analyzing and verifying news spreading online. The SemEval-2023 Task 3 is an attempt to address several subtasks under this overarching problem, targeting writing techniques used in news articles to affect readers' opinions. The task addressed three subtasks with six languages, in addition to three ``surprise'' test languages, resulting in 27 different test setups. This paper describes our participating system to this task. Our team is one of the 6 teams that successfully submitted runs for all setups. The official results show that our system is ranked among the top 3 systems for 10 out of the 27 setups.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Galassi, Andrea; Ruggeri, Federico; Barrón-Cedeño, Alberto; Alam, Firoj; Caselli, Tommaso; Kutlu, Mucahid; Struss, Julia Maria; Antici, Francesco; Hasanain, Maram; Köhler, Juliane; Korre, Katerina; Leistra, Folkert; Muti, Arianna; Siegel, Melanie; Deniz, Turkmen. Mehmet; Wiegand, Michael; Zaghouani, Wajdi
Overview of the CLEF-2023 CheckThat! Lab Task 2 on Subjectivity in News Articles Proceedings Article
In: Aliannejadi, Mohammad; Faggioli, Guglielmo; Ferro, Nicola; Vlachos, Michalis (Ed.): Working Notes of CLEF 2023 – Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 2023.
@inproceedings{clef-checkthat:2023:task2,
title = {Overview of the CLEF-2023 CheckThat! Lab Task 2 on Subjectivity in News Articles},
author = {Andrea Galassi and Federico Ruggeri and Alberto Barrón-Cedeño and Firoj Alam and Tommaso Caselli and Mucahid Kutlu and Julia Maria Struss and Francesco Antici and Maram Hasanain and Juliane Köhler and Katerina Korre and Folkert Leistra and Arianna Muti and Melanie Siegel and Turkmen. Mehmet Deniz and Michael Wiegand and Wajdi Zaghouani},
editor = {Mohammad Aliannejadi and Guglielmo Faggioli and Nicola Ferro and Michalis Vlachos},
url = {https://ceur-ws.org/Vol-3497/paper-020.pdf},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Working Notes of CLEF 2023–Conference and Labs of the Evaluation Forum},
address = {Thessaloniki, Greece},
series = {CLEF~2023},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Martino, Giovanni Da San; Alam, Firoj; Hasanain, Maram; Nandi, Rabindra Nath; Azizov, Dilshod; Nakov, Preslav
Overview of the CLEF-2023 CheckThat! Lab Task 3 on Political Bias of News Articles and News Media Proceedings Article
In: Aliannejadi, Mohammad; Faggioli, Guglielmo; Ferro, Nicola; Vlachos, Michalis (Ed.): Working Notes of CLEF 2023 – Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 2023.
@inproceedings{clef-checkthat:2023:task3,
title = {Overview of the CLEF-2023 CheckThat! Lab Task 3 on Political Bias of News Articles and News Media},
author = {Giovanni Da San Martino and Firoj Alam and Maram Hasanain and Rabindra Nath Nandi and Dilshod Azizov and Preslav Nakov},
editor = {Mohammad Aliannejadi and Guglielmo Faggioli and Nicola Ferro and Michalis Vlachos},
url = {https://ceur-ws.org/Vol-3497/paper-021.pdf},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Working Notes of CLEF 2023–Conference and Labs of the Evaluation Forum},
address = {Thessaloniki, Greece},
series = {CLEF~2023},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Nakov, Preslav; Alam, Firoj; Martino, Giovanni Da San; Hasanain, Maram; Nandi, Rabindra Nath; Azizov, Dilshod; Panayotov, Panayot
Overview of the CLEF-2023 CheckThat! Lab Task 4 on Factuality of Reporting of News Media Proceedings Article
In: Aliannejadi, Mohammad; Faggioli, Guglielmo; Ferro, Nicola; Vlachos, Michalis (Ed.): Working Notes of CLEF 2023 – Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 2023.
@inproceedings{clef-checkthat:2023:task4,
title = {Overview of the CLEF-2023 CheckThat! Lab Task 4 on Factuality of Reporting of News Media},
author = {Preslav Nakov and Firoj Alam and Giovanni Da San Martino and Maram Hasanain and Rabindra Nath Nandi and Dilshod Azizov and Panayot Panayotov},
editor = {Mohammad Aliannejadi and Guglielmo Faggioli and Nicola Ferro and Michalis Vlachos},
url = {https://ceur-ws.org/Vol-3497/paper-022.pdf},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
booktitle = {Working Notes of CLEF 2023–Conference and Labs of the Evaluation Forum},
address = {Thessaloniki, Greece},
series = {CLEF~2023},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Abdelali, Ahmed; Mubarak, Hamdy; Chowdhury, Shammur Absar; Hasanain, Maram; Mousi, Basel; Boughorbel, Sabri; Kheir, Yassine El; Izham, Daniel; Dalvi, Fahim; Hawasly, Majd; others,
Benchmarking Arabic AI with Large Language Models Journal Article
In: arXiv preprint arXiv:2305.14982, 2023.
@article{abdelali2023benchmarking,
title = {Benchmarking Arabic AI with Large Language Models},
author = {Ahmed Abdelali and Hamdy Mubarak and Shammur Absar Chowdhury and Maram Hasanain and Basel Mousi and Sabri Boughorbel and Yassine El Kheir and Daniel Izham and Fahim Dalvi and Majd Hawasly and others},
url = {https://aclanthology.org/2024.eacl-long.30/},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
journal = {arXiv preprint arXiv:2305.14982},
abstract = {Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tagging and content classification across different domains. We utilized models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM, employing zero and few-shot learning techniques to tackle 33 distinct tasks across 61 publicly available datasets. This involved 98 experimental setups, encompassing ~296K data points, ~46 hours of speech, and 30 sentences for Text-to-Speech (TTS). This effort resulted in 330+ sets of experiments. Our analysis focused on measuring the performance gap between SOTA models and LLMs. The overarching trend observed was that SOTA models generally outperformed LLMs in zero-shot learning, with a few exceptions. Notably, larger computational models with few-shot learning techniques managed to reduce these performance gaps. Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Dalvi, Fahim; Hasanain, Maram; Boughorbel, Sabri; Mousi, Basel; Abdaljalil, Samir; Nazar, Nizi; Abdelali, Ahmed; Chowdhury, Shammur Absar; Mubarak, Hamdy; Ali, Ahmed; others,
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking Journal Article
In: arXiv preprint arXiv:2308.04945, 2023.
@article{dalvi2023llmebench,
title = {LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking},
author = {Fahim Dalvi and Maram Hasanain and Sabri Boughorbel and Basel Mousi and Samir Abdaljalil and Nizi Nazar and Ahmed Abdelali and Shammur Absar Chowdhury and Hamdy Mubarak and Ahmed Ali and others},
url = {https://arxiv.org/pdf/2308.04945},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
journal = {arXiv preprint arXiv:2308.04945},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Noman, Muaadh; Gurgun, Selin; Phalp, Keith; Nakov, Preslav; Ali, Raian
Challenging others when posting misinformation: a UK vs. Arab cross-cultural comparison on the perception of negative consequences and injunctive norms Journal Article
In: Behaviour & Information Technology, pp. 1–21, 2023.
@article{noman2023challengingb,
title = {Challenging others when posting misinformation: a UK vs. Arab cross-cultural comparison on the perception of negative consequences and injunctive norms},
author = {Muaadh Noman and Selin Gurgun and Keith Phalp and Preslav Nakov and Raian Ali},
url = {https://www.tandfonline.com/doi/epdf/10.1080/0144929X.2023.2298306?needAccess=true},
year = {2023},
date = {2023-01-01},
urldate = {2023-01-01},
journal = {Behaviour & Information Technology},
pages = {1–21},
publisher = {Taylor & Francis},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Educational Material
Introduction to Critical Digital Literacy
Download the booklet: Introduction to Critical Digital Literacy
Critical digital literacy is essential in today’s world, where the internet and social media are the primary sources of information and communication. As mentioned before, there is a great deal of harmful online content that can sway opinions and actions. Hence, fostering critical digital literacy skills is vital in combating the spread of fake news, harmful stereotypes, and divisive narratives. Learn more about it in the attached booklet.
Propaganda
Download the booklet: Propaganda
Learning what propaganda is forms an important part of critical digital literacy, helping to create a safer online space in which we engage with the digital world critically.
Conferences
Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content
Read the full paper here
Find the presentation slides here
Find the poster here
Abstract: We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community. We hope this will enable further research on these important tasks in Arabic.
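As the overview notes, fine-tuning transformer models such as AraBERT was at the core of most participating systems. The sketch below shows what such a baseline could look like for the binary propagandistic vs. non-propagandistic text task, using the publicly released AraBERT checkpoint; the toy data and hyperparameters are illustrative only, not those of any particular submission.

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "aubmindlab/bert-base-arabertv02"  # public AraBERT v0.2 checkpoint

# Toy examples standing in for the shared-task training split.
train = Dataset.from_dict({
    "text": ["نص يستخدم لغة مشحونة وتخويفاً", "تقرير إخباري محايد عن حدث محلي"],
    "label": [1, 0],  # 1 = propagandistic, 0 = non-propagandistic
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
train = train.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
                  batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="araieval-baseline",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=train,
    tokenizer=tokenizer,  # enables dynamic padding of batches
)
trainer.train()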
Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness
Read the full paper here
Find the presentation slides here
Abstract: We present an overview of the CheckThat! Lab 2024 Task 1, part of CLEF 2024. Task 1 involves determining whether a text item is check-worthy, with a special emphasis on COVID-19, political news, and political debates and speeches. It is conducted in three languages: Arabic, Dutch, and English. Additionally, Spanish was offered for extra training data during the development phase. A total of 75 teams registered, with 37 teams submitting 236 runs and 17 teams submitting system description papers. Out of these, 13, 15 and 26 teams participated for Arabic, Dutch and English, respectively. Among these teams, the use of transformer pre-trained language models (PLMs) was the most frequent. A few teams also employed Large Language Models (LLMs). We provide a description of the dataset, the task setup, including evaluation settings, and a brief overview of the participating systems. As is customary in the CheckThat! Lab, we release all the datasets as well as the evaluation scripts to the research community. This will enable further research on identifying relevant check-worthy content that can assist various stakeholders, such as fact-checkers, journalists, and policymakers.
ArMeme: Propagandistic Content in Arabic Memes
Read the full paper here
Find the presentation here
Download the dataset from here
Abstract: With the rise of digital communication memes have become a significant medium for cultural and political expression that is often used to mislead audience. Identification of such misleading and persuasive multimodal content become more important among various stakeholders, including social media platforms, policymakers, and the broader society as they often cause harm to the individuals, organizations and/or society. While there has been effort to develop AI based automatic system for resource rich languages (e.g., English), it is relatively little to none for medium to low resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ∼6K Arabic memes collected from various social media platforms, which is a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We made the dataset publicly available for the community.
Large Language Models for Propaganda Span Annotation
Read the full paper here
Find the poster here
Download the dataset from here
Abstract: The use of propagandistic techniques in online content has increased in recent years aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of textual spans where propaganda techniques are used, are essential for more informed content consumption. Automatic systems targeting the task over lower resourced languages are limited, usually obstructed by lack of large scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation context to GPT-4 within prompts improves its performance compared to human annotators. Moreover, when serving as an expert annotator (consolidator), the model provides labels that have higher agreement with expert annotators, and lead to specialized models that achieve state-of-the-art over an unseen Arabic testing set. Finally, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for propagandistic spans detection task prompting it with annotations from human annotators with limited expertise. All scripts and annotations will be shared with the community.
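A minimal sketch of the kind of prompting workflow described above, asking an OpenAI chat model to propose propagandistic spans for one paragraph. The prompt wording and the JSON output format are illustrative assumptions, not the paper’s exact annotation protocol, and real runs would need validation of the returned JSON.

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You annotate propaganda techniques. For the paragraph below, return a JSON list of "
    "objects with 'technique', 'start', 'end', and 'text' for every propagandistic span "
    "you find, or [] if there is none.\n\nParagraph: {paragraph}"
)

def annotate_spans(paragraph: str, model: str = "gpt-4o") -> list[dict]:
    """Ask the model for span-level propaganda annotations on a single paragraph."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(paragraph=paragraph)}],
        temperature=0,
    )
    # In practice the reply should be validated; models sometimes return malformed JSON.
    return json.loads(response.choices[0].message.content)

print(annotate_spans("Only a fool would trust them; they have always lied to us."))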
Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles
Read the full paper here
Find the poster here
Download the dataset from here
Abstract: The use of propaganda has spiked on mainstream and social media, aiming to manipulate or mislead users. While efforts to automatically detect propaganda techniques in textual, visual, or multimodal content have increased, most of them primarily focus on English content. The majority of the recent initiatives targeting medium to low-resource languages produced relatively small annotated datasets, with a skewed distribution, posing challenges for the development of sophisticated propaganda detection models. To address this challenge, we carefully develop the largest propaganda dataset to date, ArPro, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text. Results showed that GPT-4’s performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text. Compared to models fine-tuned on the dataset for propaganda detection at different classification granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a dataset consisting of six other languages for span detection, and results suggest that the model struggles with the task across languages. We made the dataset publicly available for the community.
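To give a feel for span-level evaluation, the sketch below computes character-overlap precision, recall, and F1 between predicted and gold spans. It is a simplified illustration, not the official scorer used in the paper or the shared tasks.

def char_coverage(spans: list[tuple[int, int]]) -> set[int]:
    """Expand (start, end) spans into the set of character offsets they cover."""
    covered: set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered

def span_prf(predicted: list[tuple[int, int]], gold: list[tuple[int, int]]):
    """Character-overlap precision, recall, and F1 between two span lists."""
    pred_chars, gold_chars = char_coverage(predicted), char_coverage(gold)
    if not pred_chars or not gold_chars:
        return 0.0, 0.0, 0.0
    overlap = len(pred_chars & gold_chars)
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Example: the system found one span; the gold annotation contains two.
print(span_prf(predicted=[(10, 25)], gold=[(12, 25), (40, 55)]))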
Persuasion Techniques and Disinformation Detection in Arabic Text
Find the poster here
Find the presentation here
Abstract: We present an overview of CheckThat! Lab’s 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely, Arabic, Bulgarian, English, Portuguese, and Slovene, and highly-debated topics in the media, e.g., the Israeli–Palestinian conflict, the Russia–Ukraine war, climate change, COVID-19, abortion, etc. A total of 23 teams registered for the task, and two of them submitted system responses which were compared against a baseline and a task organizers’ system, which used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets accompanied with the evaluation scripts are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts.
Workshops
Critique What You Read!
Find the presentation here
On 8 September 2024, the Critical Digital Literacy team and team MARSAD (SP#1), in collaboration with QNL, held a public workshop to empower people to critique what they read. The workshop focused on ways to improve the consumption of news and online content, and equipped attendees with tools to verify news and to identify the possible use of propagandistic techniques.
ArMeme
Download the dataset here
Read the paper here
ArMeme is the first multimodal Arabic memes dataset that includes both text and images, collected from various social media platforms. It serves as the first resource dedicated to Arabic multimodal research. While the dataset has been annotated to identify propaganda in memes, it is versatile and can be utilized for a wide range of other research purposes, including sentiment analysis, hate speech detection, cultural studies, meme generation, and cross-lingual transfer learning. The dataset opens new avenues for exploring the intersection of language, culture, and visual communication.
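A minimal sketch of how the released annotations could be explored once downloaded. The file name and the 'text'/'label' field names below are assumptions made for illustration; check the released files for the actual schema.

import json
from collections import Counter
from pathlib import Path

# Hypothetical local copy of the ArMeme annotations (JSONL, one meme per line).
DATA_FILE = Path("armeme/train.jsonl")

def load_memes(path: Path) -> list[dict]:
    """Load one meme record per line from a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

memes = load_memes(DATA_FILE)
label_counts = Counter(meme["label"] for meme in memes)  # assumed field name
print(f"{len(memes)} memes, label distribution: {dict(label_counts)}")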
LLM_Propaganda Annotation
Download the dataset here
Read the paper here
Our study investigates whether large language models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. In this repo we release full human annotations, consolidated gold labels, and annotations provided by GPT-4 in different annotator roles.
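One straightforward way to use the released files is to measure how closely the GPT-4 labels agree with the consolidated gold labels, for example with Cohen's kappa at the paragraph level. The file names and column names below are placeholders, not the repository's actual layout.

import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Placeholder file and column names; adapt them to the files actually released in the repo.
gold = pd.read_csv("gold_labels.csv")        # expected columns: paragraph_id, technique
gpt4 = pd.read_csv("gpt4_annotations.csv")   # expected columns: paragraph_id, technique

merged = gold.merge(gpt4, on="paragraph_id", suffixes=("_gold", "_gpt4"))
kappa = cohen_kappa_score(merged["technique_gold"], merged["technique_gpt4"])
print(f"Cohen's kappa between gold and GPT-4 labels: {kappa:.3f}")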