The Perspective Argument Retrieval Shared Task


August 15, 2024

Co-located with The 11th Workshop on Argument Mining in Bangkok, Thailand


Latest News:

The leaderboards for the first evaluation round are out! The deadline for the final submission (all test sets, all scenarios) has been moved by one week, to the 17th of May 2024.

Registration and more:

Registration

You can register for the shared task using this registration form

Data Download

You can download the data and evaluation script from this repository

Communication

Join our Slack channel here for information or to ask questions

Or write us an e-mail at perspectiveargretrieval@gmail.com


About

The "Perspective Argument Retrieval" task addresses the often-overlooked challenge of incorporating socio-cultural factors (such as political views, occupation, age, gender) in argument retrieval. By focusing on these aspects, we acknowledge their potential latent influence on argumentation.

With this shared task, we invite the community to develop methods that concentrate on this crucial area and advance state-of-the-art retrieval models by considering the perspective of societal diversity.

Task Description

Argument retrieval is the task of retrieving the top-k relevant arguments from a corpus for a given query. With this shared task, we formulate perspective argument retrieval as an extension of argument retrieval that takes socio-cultural factors into account. Concretely, the task proposes three scenarios of varying difficulty for considering socio-cultural profiles. In doing so, we want to foster approaches that take into account latent aspects of argumentation beyond semantic features, such as personal attitude. We consider these aspects both during retrieval and evaluation.

Retrieval Scenarios

Scenario 1: Baseline retrieval of relevant arguments from a given corpus for a specific query. This scenario evaluates the general ability of a system to retrieve relevant arguments.

Example query: Are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?


Example relevant candidates: The reduction of sugar in food should be pushed. Not every food needs additional sugar as a supplement.


Scenario 2: Explicit perspectivism extends the baseline task by explicitly adding socio-cultural information to the query and the corpus and limiting the relevant candidates to arguments from authors matching the corresponding socio-cultural background. With this second scenario, we test whether a retrieval system can consider socio-cultural properties when explicitly mentioned in the query and the candidates.

Example query: Given a left attitude, are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?

Example relevant candidates: With a left attitude, reducing sugar in food should be pushed. Not every food needs additional sugar as a supplement.


Scenario 3: Implicit perspectivism is similar to explicit perspectivism, but socio-cultural information is added only to the query. This scenario tests the ability of a retrieval system to account for socio-cultural information that is latently encoded within the argument.

Example query: Given a liberal attitude, are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?


Example relevant candidates: Eating is an individual decision. It doesn't need a nanny state.
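The difference between the three scenarios can be sketched in a few lines of Python. This is only an illustration: the Argument class, the "attitude" key and the is_relevant helper are hypothetical names, not part of the released data or evaluation script. The sketch assumes that a candidate counts as relevant when it addresses the queried issue and, for the perspectivism scenarios, when its author matches the socio-cultural profile given in the query.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    text: str
    issue: str  # political issue the argument responds to
    author: dict = field(default_factory=dict)  # hypothetical socio-cultural profile


def is_relevant(arg, issue, perspective=None):
    """Scenario 1: same issue. Scenarios 2 and 3: same issue and matching profile."""
    if arg.issue != issue:
        return False
    if perspective is None:
        return True
    return all(arg.author.get(key) == value for key, value in perspective.items())


corpus = [
    Argument("The reduction of sugar in food should be pushed.",
             "sugar tax", {"attitude": "left"}),
    Argument("Eating is an individual decision. It doesn't need a nanny state.",
             "sugar tax", {"attitude": "liberal"}),
]

# Scenario 1: every argument on the issue is relevant.
baseline = [a for a in corpus if is_relevant(a, "sugar tax")]

# Scenarios 2 and 3: only arguments whose author matches the queried profile.
liberal = [a for a in corpus if is_relevant(a, "sugar tax", {"attitude": "liberal"})]

print(len(baseline), len(liberal))
```

In this reading, scenarios 2 and 3 share the same relevance criterion and differ only in what the retrieval system gets to see: in scenario 2 the profile is written into the candidate text, in scenario 3 it is not.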

Retrieval Evaluation

For all three scenarios, we employ a two-fold evaluation to measure retrieval quality comprehensively. Concretely, we distinguish between relevance and diversity / fairness:



With relevance, we focus on the ability of a retrieval system to select the relevant candidates, for example, all arguments addressing the queried issue for the baseline scenario or arguments that additionally match specific demographic properties for explicit or implicit perspectivism.


With diversity, we account for the influence of perspectivism in the evaluation by measuring to what extent a retrieval system diversifies the relevant arguments regarding stance distribution or other socio-cultural factors, such as age or education.
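The relevance side of the evaluation is reported on the leaderboards as precision@k and ndcg@k. The following is a minimal sketch of both metrics with binary relevance labels, using the standard textbook definitions. The official evaluation script in the repository may differ in details, and the function names and toy identifiers here are illustrative.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved arguments that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids) / k


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # An ideal ranking places all relevant arguments (at most k) at the top.
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # system output, best first
relevant = {"a", "c", "e"}      # gold relevant arguments for the query

p4 = precision_at_k(ranked, relevant, 4)   # 2 of the 4 retrieved are relevant
n4 = ndcg_at_k(ranked, relevant, 4)
print(p4, round(n4, 4))
```

Precision only counts hits, whereas nDCG also rewards placing relevant arguments near the top, which is why the two columns can rank teams differently.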

Data

The data and evaluation script can be downloaded from this repository

This shared task is based on the x-stance dataset (Vamvas & Sennrich, 2020), which provides arguments annotated with their stance on different political issues, gathered from the voting recommendation platform https://www.smartvote.ch/. This platform provides voting suggestions based on a questionnaire that politicians and voters fill out. Therein, politicians can argue why they are in favor of or against specific political issues.


We use the arguments covering the 2019 Swiss Federal elections as a corpus and the political issues as queries. We then enrich these arguments with eight socio-cultural properties, either provided by the voting platform itself (gender, age, party, …) or derived from the politicians' filled-out questionnaires (political attitude, important political aspects, …). This collection comprises 26,335 arguments for 45 political aspects in German, French, and Italian.


We generate the train and development splits by using 35 political aspects for training and 10 for development, while the same argument corpus is used for both sets. In addition to the queries for the baseline scenario, we also provide queries for the perspectivism scenarios, which include socio-cultural information. Because the x-stance dataset is publicly available, the final evaluation data consists of secret test sets.

Submission Policy

You may use any external data source for pre-training your models. However, we do not accept submissions using proprietary LLMs (e.g., GPT-4). To avoid data leakage, please do not input any of the data into the ChatGPT online interface. You are allowed to use any open-source LLMs. Have a look at this website for a list of frequently used open-source LLMs in case you want to use those.

There are three test sets for the evaluation of the shared task. The first test set is taken from the 2019 election, the second from 2023, and the third is a surprise test set. The final evaluation will be on the 8th of May. You can submit all final predictions until the 7th of May, 11.59 pm UTC -12h (“anywhere on Earth”). If you upload predictions earlier, you will see their results on the leaderboard (once on the 24th of April and once on the 30th of April). You can change the predictions for all test sets and all scenarios until the final deadline. You can also submit partial results; however, keep in mind that the final ranking considers all predictions (averaged across all test sets and scenarios).

Important Dates

All deadlines are 11.59 pm UTC -12h (“anywhere on Earth”).

Leaderboards

First Evaluation Round (Test set 1, election 2019)

Relevance - Scenario Baseline

k ndcg@k precision@k team team_rank
4 0.990189 0.988889 sbert_baseline 1
4 0.990189 0.988889 GESIS-DSM 1
4 0.982115 0.983333 Twente-BMS-NLP 2
4 0.716184 0.683333 bm25_baseline 3
8 0.987593 0.986111 sbert_baseline 1
8 0.987593 0.986111 GESIS-DSM 1
8 0.986639 0.988889 Twente-BMS-NLP 2
8 0.671677 0.636111 bm25_baseline 3
16 0.988446 0.990278 Twente-BMS-NLP 1
16 0.983255 0.980556 sbert_baseline 2
16 0.983255 0.980556 GESIS-DSM 2
16 0.619426 0.579167 bm25_baseline 3
20 0.988492 0.990000 Twente-BMS-NLP 1
20 0.981093 0.977778 sbert_baseline 2
20 0.981093 0.977778 GESIS-DSM 2
20 0.596877 0.553333 bm25_baseline 3
avg 0.986423 0.988125 Twente-BMS-NLP 1
avg 0.985532 0.983333 sbert_baseline 2
avg 0.985532 0.983333 GESIS-DSM 2
avg 0.651041 0.612986 bm25_baseline 3
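The avg rows appear to be the arithmetic mean of the per-k rows for k = 4, 8, 16 and 20. A quick check in Python, using the Twente-BMS-NLP rows of the table above (the variable names are illustrative):

```python
from statistics import mean

# ndcg@k and precision@k for Twente-BMS-NLP at k = 4, 8, 16, 20
ndcg = [0.982115, 0.986639, 0.988446, 0.988492]
precision = [0.983333, 0.988889, 0.990278, 0.990000]

avg_ndcg = round(mean(ndcg), 6)
avg_precision = round(mean(precision), 6)
print(avg_ndcg, avg_precision)
```

The printed values match the team's avg row (0.986423 and 0.988125), which suggests that the team ranks in the avg rows are based on this unweighted mean over the four cutoffs.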

Diversity - Scenario Baseline

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.901997 0.154536 GESIS-DSM 1 political_spectrum Open Foreign Policy
4 0.901690 0.155147 sbert_baseline 2 political_spectrum Open Foreign Policy
4 0.880867 0.174923 Twente-BMS-NLP 3 political_spectrum Open Foreign Policy
4 0.672682 0.152301 bm25_baseline 4 political_spectrum gender
8 0.908962 0.139420 GESIS-DSM 1 political_spectrum gender
8 0.908762 0.139904 sbert_baseline 2 political_spectrum gender
8 0.899820 0.158746 Twente-BMS-NLP 3 political_spectrum gender
8 0.643187 0.136052 bm25_baseline 4 political_spectrum gender
16 0.924070 0.106170 GESIS-DSM 1 education gender
16 0.923998 0.106429 sbert_baseline 2 education gender
16 0.923574 0.123654 Twente-BMS-NLP 3 education gender
16 0.608648 0.102753 bm25_baseline 4 education gender
20 0.931639 0.113165 Twente-BMS-NLP 1 education gender
20 0.929635 0.097030 GESIS-DSM 2 education gender
20 0.929557 0.097267 sbert_baseline 3 education gender
20 0.592760 0.093548 bm25_baseline 4 education gender
avg 0.916166 0.124289 GESIS-DSM 1 education gender
avg 0.916002 0.124687 sbert_baseline 2 education gender
avg 0.908975 0.142622 Twente-BMS-NLP 3 political_spectrum Open Foreign Policy
avg 0.629319 0.121164 bm25_baseline 4 education gender
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.
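To make the αNDCG@k and klDiv@k columns easier to read, here is a minimal sketch of both quantities for a single socio-cultural variable. It rests on assumptions that the page does not confirm: each value of the variable (for example a gender or an education level) is treated as one group, α-nDCG follows the common formulation with a greedy ideal ranking, and the KL divergence compares the group distribution in the top-k against that of all relevant arguments. The official evaluation script may use different definitions, and all function names are illustrative.

```python
import math
from collections import Counter


def alpha_dcg(groups, k, alpha=0.5):
    """groups: group label of each ranked item, or None if the item is not relevant."""
    seen = Counter()
    score = 0.0
    for rank, group in enumerate(groups[:k], start=1):
        if group is None:
            continue
        # Each repeated group contributes less: gain shrinks by (1 - alpha) per repeat.
        score += (1 - alpha) ** seen[group] / math.log2(rank + 1)
        seen[group] += 1
    return score


def alpha_ndcg(groups, relevant_pool, k, alpha=0.5):
    """Normalise by a greedy ideal ranking built from all relevant items' groups."""
    remaining = Counter(relevant_pool)
    seen = Counter()
    ideal = []
    while len(ideal) < k and remaining:
        group = min(remaining, key=lambda g: seen[g])  # least-seen group gains most
        ideal.append(group)
        seen[group] += 1
        remaining[group] -= 1
        if remaining[group] == 0:
            del remaining[group]
    ideal_score = alpha_dcg(ideal, k, alpha)
    return alpha_dcg(groups, k, alpha) / ideal_score if ideal_score > 0 else 0.0


def kl_divergence(top_groups, reference_groups, eps=1e-9):
    """KL(top-k distribution || reference distribution) over group labels."""
    p, q = Counter(top_groups), Counter(reference_groups)
    n_p, n_q = sum(p.values()), sum(q.values())
    return sum(
        (count / n_p) * math.log((count / n_p) / max(q[label] / n_q, eps))
        for label, count in p.items()
    )


pool = ["f", "f", "m", "m"]          # groups of all relevant arguments
mixed = ["f", "m", "f", "m"]         # alternating ranking
clustered = ["f", "f", "m", "m"]     # same items, grouped together
print(alpha_ndcg(mixed, pool, 4), alpha_ndcg(clustered, pool, 4))
print(kl_divergence(["f", "f", "f", "m"], pool))
```

Under these assumptions, the alternating ranking reaches the ideal score while the clustered one scores lower, and a top-k whose group distribution is skewed relative to the reference yields a positive divergence.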

Relevance - Scenario Explicit

k ndcg@k precision@k team team_rank
4 0.853129 0.784139 Twente-BMS-NLP 1
4 0.217063 0.216921 GESIS-DSM 2
4 0.210929 0.209182 sbert_baseline 3
8 0.826973 0.699215 Twente-BMS-NLP 1
8 0.220313 0.215066 GESIS-DSM 2
8 0.210107 0.206796 sbert_baseline 3
16 0.806304 0.613947 Twente-BMS-NLP 1
16 0.219608 0.206292 GESIS-DSM 2
16 0.211977 0.205126 sbert_baseline 3
20 0.798233 0.584441 Twente-BMS-NLP 1
20 0.220142 0.204156 GESIS-DSM 2
20 0.211292 0.201972 sbert_baseline 3
avg 0.821159 0.670436 Twente-BMS-NLP 1
avg 0.219281 0.210609 GESIS-DSM 2
avg 0.211076 0.205769 sbert_baseline 3

Diversity - Scenario Explicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.803502 0.201835 Twente-BMS-NLP 1 age Open Foreign Policy
4 0.198266 0.161001 GESIS-DSM 2 political_spectrum Open Foreign Policy
4 0.193978 0.169943 sbert_baseline 3 political_spectrum gender
8 0.795288 0.188889 Twente-BMS-NLP 1 political_spectrum Open Foreign Policy
8 0.203975 0.145292 GESIS-DSM 2 political_spectrum gender
8 0.196950 0.153803 sbert_baseline 3 political_spectrum gender
16 0.788913 0.161965 Twente-BMS-NLP 1 education gender
16 0.207447 0.112282 GESIS-DSM 2 education gender
16 0.202131 0.120015 sbert_baseline 3 education gender
20 0.784744 0.154346 Twente-BMS-NLP 1 education gender
20 0.208944 0.103003 GESIS-DSM 2 education gender
20 0.202647 0.110547 sbert_baseline 3 education gender
avg 0.793112 0.176759 Twente-BMS-NLP 1 political_spectrum Open Foreign Policy
avg 0.204658 0.130394 GESIS-DSM 2 political_spectrum gender
avg 0.198927 0.138577 sbert_baseline 3 education gender
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k ndcg@k precision@k team team_rank
4 0.197831 0.198685 GESIS-DSM 1
4 0.195035 0.196141 sbert_baseline 2
8 0.199804 0.200594 sbert_baseline 1
8 0.198524 0.197360 GESIS-DSM 2
16 0.205488 0.201150 GESIS-DSM 1
16 0.205072 0.202502 sbert_baseline 2
20 0.208600 0.204114 sbert_baseline 1
20 0.207223 0.201124 GESIS-DSM 2
avg 0.202267 0.199580 GESIS-DSM 1
avg 0.202128 0.200838 sbert_baseline 2

Diversity - Scenario Implicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.182819 0.160637 GESIS-DSM 1 political_spectrum gender
4 0.179801 0.155224 sbert_baseline 2 political_spectrum gender
8 0.186279 0.144410 GESIS-DSM 1 political_spectrum gender
8 0.186115 0.139378 sbert_baseline 2 political_spectrum gender
16 0.194678 0.110474 GESIS-DSM 1 education gender
16 0.193679 0.106891 sbert_baseline 2 education gender
20 0.197447 0.097982 sbert_baseline 1 education gender
20 0.197116 0.100929 GESIS-DSM 2 education gender
avg 0.190223 0.129112 GESIS-DSM 1 political_spectrum gender
avg 0.189261 0.124869 sbert_baseline 2 education gender

Second Evaluation Round (Test set 2, election 2023)

Relevance - Scenario Baseline

k ndcg@k precision@k team team_rank
4 0.903527 0.912500 boulder_NLP 1
4 0.883713 0.887500 sbert_baseline 2
4 0.851954 0.850000 GESIS-DSM 3
4 0.775601 0.775000 bm_35_baseline 4
4 0.768397 0.781250 TWENTE-BMS-NLP 5
8 0.892774 0.893750 boulder_NLP 1
8 0.863010 0.856250 sbert_baseline 2
8 0.840808 0.834375 GESIS-DSM 3
8 0.768952 0.775000 TWENTE-BMS-NLP 4
8 0.764368 0.759375 bm_35_baseline 5
16 0.874400 0.867188 boulder_NLP 1
16 0.837641 0.823438 sbert_baseline 2
16 0.811359 0.795312 GESIS-DSM 3
16 0.755486 0.751563 TWENTE-BMS-NLP 4
16 0.717200 0.693750 bm_35_baseline 5
20 0.869493 0.861250 boulder_NLP 1
20 0.836021 0.823750 sbert_baseline 2
20 0.803148 0.786250 GESIS-DSM 3
20 0.755566 0.752500 TWENTE-BMS-NLP 4
20 0.692677 0.661250 bm_35_baseline 5
avg 0.885048 0.883672 boulder_NLP 1
avg 0.855096 0.847734 sbert_baseline 2
avg 0.826817 0.816484 GESIS-DSM 3
avg 0.762100 0.765078 TWENTE-BMS-NLP 4
avg 0.737462 0.722344 bm_35_baseline 5

Diversity - Scenario Baseline

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.843193 0.190554 boulder_NLP 1 gender Open Foreign Policy
4 0.825927 0.183359 sbert_baseline 2 gender Liberal Society
4 0.802393 0.189029 GESIS-DSM 3 gender Expanded Welfare State
4 0.729309 0.194654 bm_35_baseline 4 gender Expanded Welfare State
4 0.715027 0.175067 TWENTE-BMS-NLP 5 gender Enhanced Environmental Protection
8 0.835625 0.184543 boulder_NLP 1 civil_status Open Foreign Policy
8 0.810864 0.175428 sbert_baseline 2 civil_status Liberal Society
8 0.793244 0.181590 GESIS-DSM 3 civil_status Expanded Welfare State
8 0.720015 0.186390 bm_35_baseline 4 civil_status Expanded Welfare State
8 0.717539 0.167862 TWENTE-BMS-NLP 5 civil_status Enhanced Environmental Protection
16 0.835437 0.170560 boulder_NLP 1 education Open Foreign Policy
16 0.803294 0.158082 sbert_baseline 2 education Liberal Society
16 0.780678 0.163216 GESIS-DSM 3 education Expanded Welfare State
16 0.719560 0.150388 TWENTE-BMS-NLP 4 education Enhanced Environmental Protection
16 0.693180 0.163362 bm_35_baseline 5 education Expanded Welfare State
20 0.836555 0.166120 boulder_NLP 1 education Open Foreign Policy
20 0.806363 0.153383 sbert_baseline 2 education Liberal Society
20 0.778455 0.157177 GESIS-DSM 3 education Expanded Welfare State
20 0.724297 0.144618 TWENTE-BMS-NLP 4 education Enhanced Environmental Protection
20 0.676614 0.154637 bm_35_baseline 5 education Expanded Welfare State
avg 0.837703 0.177944 boulder_NLP 1 civil_status Open Foreign Policy
avg 0.811612 0.167563 sbert_baseline 2 civil_status Liberal Society
avg 0.788693 0.172753 GESIS-DSM 3 civil_status Enhanced Environmental Protection
avg 0.719106 0.159484 TWENTE-BMS-NLP 4 civil_status Enhanced Environmental Protection
avg 0.704779 0.174761 bm_35_baseline 5 civil_status Enhanced Environmental Protection

Relevance - Scenario Explicit

k ndcg@k precision@k team team_rank
4 0.778944 0.693322 sövereign 1
4 0.746718 0.664001 TWENTE-BMS-NLP 2
4 0.719577 0.634820 GESIS-DSM 3
4 0.152074 0.149411 sbert_baseline 4
8 0.753616 0.601010 sövereign 1
8 0.717243 0.567691 TWENTE-BMS-NLP 2
8 0.688722 0.539913 GESIS-DSM 3
8 0.147092 0.140853 sbert_baseline 4
16 0.723055 0.494353 sövereign 1
16 0.688206 0.466716 TWENTE-BMS-NLP 2
16 0.661889 0.445391 GESIS-DSM 3
16 0.145617 0.134961 sbert_baseline 4
20 0.713903 0.457632 sövereign 1
20 0.680463 0.433698 TWENTE-BMS-NLP 2
20 0.654631 0.413215 GESIS-DSM 3
20 0.145981 0.133109 sbert_baseline 4
avg 0.742379 0.561579 sövereign 1
avg 0.708157 0.533026 TWENTE-BMS-NLP 2
avg 0.681205 0.508335 GESIS-DSM 3
avg 0.147691 0.139583 sbert_baseline 4

Diversity - Scenario Explicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.762587 0.204173 sövereign 1 civil_status Enhanced Environmental Protection
4 0.731383 0.201999 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
4 0.708314 0.201885 GESIS-DSM 3 civil_status Enhanced Environmental Protection
4 0.148394 0.188384 sbert_baseline 4 civil_status Open Foreign Policy
8 0.742925 0.195974 sövereign 1 political_spectrum Enhanced Environmental Protection
8 0.707982 0.193850 TWENTE-BMS-NLP 2 political_spectrum Enhanced Environmental Protection
8 0.683388 0.193435 GESIS-DSM 3 political_spectrum Enhanced Environmental Protection
8 0.144145 0.180716 sbert_baseline 4 political_spectrum Open Foreign Policy
16 0.720318 0.176849 sövereign 1 political_spectrum Enhanced Environmental Protection
16 0.685955 0.174981 TWENTE-BMS-NLP 2 education Enhanced Environmental Protection
16 0.662275 0.174012 GESIS-DSM 3 education Enhanced Environmental Protection
16 0.143560 0.163211 sbert_baseline 4 political_spectrum Open Foreign Policy
20 0.712741 0.170729 sövereign 1 political_spectrum Enhanced Environmental Protection
20 0.679465 0.169039 TWENTE-BMS-NLP 2 political_spectrum Enhanced Environmental Protection
20 0.655853 0.167952 GESIS-DSM 3 education Enhanced Environmental Protection
20 0.143953 0.157795 sbert_baseline 4 political_spectrum Open Foreign Policy
avg 0.734643 0.186931 sövereign 1 political_spectrum Enhanced Environmental Protection
avg 0.701196 0.184967 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
avg 0.677457 0.184321 GESIS-DSM 3 civil_status Enhanced Environmental Protection
avg 0.145013 0.172527 sbert_baseline 4 civil_status Open Foreign Policy
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k ndcg@k precision@k team team_rank
4 0.138857 0.134961 sbert_baseline 1
4 0.135854 0.135943 TWENTE-BMS-NLP 2
4 0.133892 0.133418 GESIS-DSM 3
8 0.135941 0.131453 sbert_baseline 1
8 0.133384 0.131594 TWENTE-BMS-NLP 2
8 0.131639 0.127736 GESIS-DSM 3
16 0.134325 0.126578 sbert_baseline 1
16 0.134060 0.127666 TWENTE-BMS-NLP 2
16 0.131696 0.123422 GESIS-DSM 3
20 0.135666 0.127301 TWENTE-BMS-NLP 1
20 0.134688 0.124804 sbert_baseline 2
20 0.132429 0.122447 GESIS-DSM 3
avg 0.135953 0.129449 sbert_baseline 1
avg 0.134741 0.130626 TWENTE-BMS-NLP 2
avg 0.132414 0.126755 GESIS-DSM 3

Diversity - Scenario Implicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.136044 0.187736 sbert_baseline 1 civil_status Liberal Society
4 0.130388 0.187460 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
4 0.128952 0.191469 GESIS-DSM 3 civil_status Expanded Welfare State
8 0.133246 0.179940 sbert_baseline 1 civil_status Liberal Economic Policy
8 0.128856 0.180906 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
8 0.127276 0.183187 GESIS-DSM 3 civil_status Expanded Welfare State
16 0.132503 0.162401 sbert_baseline 1 civil_status Liberal Economic Policy
16 0.130391 0.166958 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
16 0.128243 0.164174 GESIS-DSM 3 civil_status Expanded Welfare State
20 0.132824 0.156980 sbert_baseline 1 civil_status Liberal Economic Policy
20 0.131905 0.163094 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
20 0.129197 0.158113 GESIS-DSM 3 civil_status Expanded Welfare State
avg 0.133654 0.171764 sbert_baseline 1 civil_status Liberal Economic Policy
avg 0.130385 0.174605 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
avg 0.128417 0.174236 GESIS-DSM 3 civil_status Enhanced Environmental Protection
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Organizing Committee

References

Policy

We abide by the ACL anti-harassment policy.