The Perspective Argument Retrieval Shared Task


August 15, 2024

Co-located with The 11th Workshop on Argument Mining in Bangkok, Thailand


Latest News:

Last chance for participation: The deadline for the final submission is the 17th of May, 11.59 pm UTC -12h (“anywhere on Earth”). Final results will be posted soon after the submission deadline. Submission Link for the System Description Paper: OpenReview

Registration and more:

Registration

You can register for the shared task using this registration form

Data Download

You can download the data and evaluation script from this repository

Communication

Join our Slack channel here for information or to ask questions

Or write us an e-mail at perspectiveargretrieval@gmail.com


About

The "Perspective Argument Retrieval" task addresses the often-overlooked challenge of incorporating socio-cultural factors (such as political views, occupation, age, gender) in argument retrieval. By focusing on these aspects, we acknowledge their potential latent influence on argumentation.

With this shared task, we invite the community to develop methods that concentrate on this crucial area and advance state-of-the-art retrieval models by considering the perspective of societal diversity.

Task Description

Argument retrieval is the task of retrieving the top-k relevant arguments from a corpus for a given query. With this shared task, we formulate perspective argument retrieval as an extension of argument retrieval that considers socio-cultural factors. Concretely, the task proposes three scenarios of varying difficulty for taking socio-cultural profiles into account. In doing so, we want to foster approaches that consider latent aspects of argumentation beyond semantic features, such as personal attitude. We consider these aspects both during retrieval and evaluation.
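As a minimal sketch of what top-k retrieval means here, the following ranks a toy corpus by bag-of-words cosine similarity to the query. The similarity function is only a stand-in for a real encoder, and the names `bow`, `cosine`, and `retrieve_top_k`, as well as the three-argument corpus, are hypothetical and not part of the shared task code.

```python
import math
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words vector (toy stand-in for a real text encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, corpus, k):
    """Return the ids of the k arguments most similar to the query."""
    q = bow(query)
    scored = [(cosine(q, bow(text)), arg_id) for arg_id, text in corpus.items()]
    # Highest score first; ties are broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [arg_id for _, arg_id in scored[:k]]

# Hypothetical toy corpus: argument id -> argument text.
corpus = {
    "a1": "a sugar tax reduces sugar in food",
    "a2": "eating is an individual decision",
    "a3": "the army needs new aircraft",
}
top = retrieve_top_k("are you in favor of a sugar tax on food", corpus, k=2)
```

Only `a1` shares words with the query, so it is ranked first; a lexical model like this cannot see that `a2` is also about the sugar tax, which is the kind of gap the perspectivism scenarios probe further.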

Retrieval Scenarios

Scenario 1: Baseline retrieval of relevant arguments from a given corpus for a specific query. This scenario evaluates the general ability of a system to retrieve relevant arguments.

Example query: Are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?


Example relevant candidates: The reduction of sugar in food should be pushed. Not every food needs additional sugar as a supplement.


Scenario 2: Explicit perspectivism extends the baseline task by explicitly adding socio-cultural information to the query and the corpus and limiting the relevant candidates to arguments from authors matching the corresponding socio-cultural background. With this second scenario, we test whether a retrieval system can consider socio-cultural properties when explicitly mentioned in the query and the candidates.

Example query: Given a left attitude, are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?

Example relevant candidates: With a left attitude, reducing sugar in food should be pushed. Not every food needs additional sugar as a supplement.


Scenario 3: Implicit perspectivism is similar to explicit perspectivism, but socio-cultural information is added only to the query. This scenario tests the ability of a retrieval system to account for socio-cultural information that is latently encoded in the arguments.

Example query: Given a liberal attitude, are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?


Example relevant candidates: Eating is an individual decision. It doesn't need a nanny state.
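The difference between the scenarios can be read as a difference in which arguments count as relevant. The sketch below assumes each argument is a dictionary with hypothetical fields `id`, `topic`, and `profile`; these field names and the toy data are illustrative and are not the format of the released data.

```python
def relevant_ids(arguments, topic, perspective=None):
    """Ids of arguments on `topic` whose author profile matches every entry of `perspective`.

    With perspective=None this is the baseline scenario: every argument on the
    queried topic is relevant. With a perspective, only arguments from authors
    with a matching socio-cultural profile remain relevant.
    """
    perspective = perspective or {}
    return [
        arg["id"]
        for arg in arguments
        if arg["topic"] == topic
        and all(arg["profile"].get(key) == value for key, value in perspective.items())
    ]

# Hypothetical toy arguments; field names are assumptions for illustration.
arguments = [
    {"id": "a1", "topic": "sugar_tax", "profile": {"political_spectrum": "left"}},
    {"id": "a2", "topic": "sugar_tax", "profile": {"political_spectrum": "liberal"}},
    {"id": "a3", "topic": "army", "profile": {"political_spectrum": "left"}},
]
baseline = relevant_ids(arguments, "sugar_tax")
left_only = relevant_ids(arguments, "sugar_tax", {"political_spectrum": "left"})
```

Under this reading, the explicit and implicit scenarios would share the same relevant set for a given query; what changes is whether the profile is visible in the candidate text or has to be inferred from the argument itself.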

Retrieval Evaluation

For all three scenarios, we employ a two-fold evaluation to measure retrieval quality comprehensively. Concretely, we distinguish between relevance and diversity / fairness:



With relevance, we focus on the ability of a retrieval system to select the relevant candidates, for example, all arguments addressing the queried issue for the baseline scenario or arguments that additionally match specific demographic properties for explicit or implicit perspectivism.


With diversity, we account for the influence of perspectivism in the evaluation by measuring to what extent a retrieval system diversifies the relevant arguments regarding stance distribution or other socio-cultural factors, such as age or education.
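The relevance side of this evaluation is reported on the leaderboards as precision@k and ndcg@k. The official evaluation script in the repository is the reference; what follows is only a sketch of the textbook definitions with binary relevance, and the function names and toy ids are hypothetical.

```python
import math

def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    return sum(1 for arg_id in ranked_ids[:k] if arg_id in relevant) / k

def ndcg_at_k(ranked_ids, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1 / math.log2(rank + 1)
        for rank, arg_id in enumerate(ranked_ids[:k], start=1)
        if arg_id in relevant
    )
    # An ideal ranking places all relevant arguments (at most k of them) first.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

relevant = {"a1", "a2"}
mixed = ["a1", "a9", "a2", "a8"]    # relevant arguments at ranks 1 and 3
perfect = ["a1", "a2", "a9", "a8"]  # relevant arguments at ranks 1 and 2
```

Both rankings have the same precision@4, but nDCG@4 rewards `perfect` for placing the relevant arguments earlier, which is why the two columns can order teams differently.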

Data

The data and evaluation script can be downloaded from this repository

This shared task is grounded on the x-stance dataset (Vamvas & Sennrich, 2020), providing arguments annotated with their stance regarding different political issues gathered from the voting recommendation platform https://www.smartvote.ch/. This platform provides voting suggestions based on a questionnaire that politicians and voters fill out. Therein, politicians can argue why they are in favor of or against specific political issues.


We use the arguments covering the 2019 Swiss Federal elections as a corpus and the political issues as queries. Afterward, we enrich these arguments with eight socio-cultural properties, either provided by the voting platform itself (gender, age, party, …) or derived from the filled-out questionnaire of the politicians (political attitude, important political aspects, …). This collection encapsulates 26,335 arguments for 45 political aspects in German, French, and Italian.


We generate the train and development splits by considering 35 political aspects for training and 10 for development, while the argument corpus is used for both sets. Apart from the queries for the baseline scenario, we will also provide queries for the perspectivism scenarios, including socio-cultural information. As the x-stance dataset is publicly available, final evaluation data consist of secret test sets.
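A split by query rather than by argument can be sketched as follows. How the organisers selected the 10 development issues is not stated here, so the seeded random shuffle is an assumption, and `split_queries` and the issue ids are hypothetical.

```python
import random

def split_queries(query_ids, n_dev, seed=0):
    """Split query ids into (train, dev); the argument corpus itself is shared by both."""
    ids = sorted(query_ids)
    # Assumption: a seeded random shuffle decides which issues go to development.
    random.Random(seed).shuffle(ids)
    return ids[n_dev:], ids[:n_dev]

issue_ids = [f"issue_{i}" for i in range(45)]  # hypothetical ids for the 45 political issues
train_ids, dev_ids = split_queries(issue_ids, n_dev=10)
```

Because only the queries are split, a system sees the same candidate arguments during training and development but is evaluated on issues it has not been tuned on.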

Submission Policy

You may use any external data source for pre-training your models. However, we do not accept submissions using proprietary LLMs (e.g., GPT-4). To avoid data leakage, please do not enter any of the data into the ChatGPT online interface. You are allowed to use any open-source LLMs. See this website for a list of frequently used open-source LLMs.

There are three test sets for the evaluation of the shared task. The first test set is taken from the 2019 election, the second from 2023, and the third is a surprise test set. The final evaluation will be on the 8th of May. You can submit all final predictions until the 7th of May, 11.59 pm UTC -12h (“anywhere on Earth”). If you upload predictions earlier, you will see their results on the leaderboard (once on the 24th of April and once on the 30th of April). You can change the predictions for all test sets and all scenarios until the final deadline. You can also submit partial results; however, keep in mind that the final ranking considers all predictions (averaged across all test sets and scenarios).

Important Dates

All deadlines are 11.59 pm UTC -12h (“anywhere on Earth”).

System Description Papers

Any system that has submitted runs, including partial ones, is eligible to submit a system description paper. The paper will undergo peer review and, upon acceptance, be published in the Proceedings of the Argument Mining Workshop at ACL. Note that registration for the workshop is not required for publication. However, if you do register, you will have the opportunity to present your paper at the workshop, either in person or virtually. For guidance on the content of a system description paper, please refer to the SemEval Shared Tasks guidelines: https://semeval.github.io/paper-requirements.html (see "contents").

Page Limitation

Papers submitted for the Shared Task should not exceed 4 pages, plus a maximum of 1 page for Ethical Considerations, and unlimited pages for references and Appendices. Upon acceptance, authors will be granted an additional page to address reviewers' comments.

Reviewing

We are using the OpenReview platform for the submission and reviewing process. Please register on the platform if you do not have an account yet. If you submit a paper, you are also asked to review other shared task papers. If you already have an account, please send us an e-mail with the address registered at OpenReview so that we can register you as a reviewer. Submission Link for the System Description Paper: OpenReview

Leaderboards

First Evaluation Circle (Test set 1, election 2019)

Relevance - Scenario Baseline

k ndcg@k precision@k team team_rank
4 0.990189 0.988889 sbert_baseline 1
4 0.990189 0.988889 GESIS-DSM 1
4 0.982115 0.983333 Twente-BMS-NLP 2
4 0.716184 0.683333 bm25_baseline 3
8 0.987593 0.986111 sbert_baseline 1
8 0.987593 0.986111 GESIS-DSM 1
8 0.986639 0.988889 Twente-BMS-NLP 2
8 0.671677 0.636111 bm25_baseline 3
16 0.988446 0.990278 Twente-BMS-NLP 1
16 0.983255 0.980556 sbert_baseline 2
16 0.983255 0.980556 GESIS-DSM 2
16 0.619426 0.579167 bm25_baseline 3
20 0.988492 0.990000 Twente-BMS-NLP 1
20 0.981093 0.977778 sbert_baseline 2
20 0.981093 0.977778 GESIS-DSM 2
20 0.596877 0.553333 bm25_baseline 3
avg 0.986423 0.988125 Twente-BMS-NLP 1
avg 0.985532 0.983333 sbert_baseline 2
avg 0.985532 0.983333 GESIS-DSM 2
avg 0.651041 0.612986 bm25_baseline 3

Diversity - Scenario Baseline

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.901997 0.154536 GESIS-DSM 1 political_spectrum Open Foreign Policy
4 0.901690 0.155147 sbert_baseline 2 political_spectrum Open Foreign Policy
4 0.880867 0.174923 Twente-BMS-NLP 3 political_spectrum Open Foreign Policy
4 0.672682 0.152301 bm25_baseline 4 political_spectrum gender
8 0.908962 0.139420 GESIS-DSM 1 political_spectrum gender
8 0.908762 0.139904 sbert_baseline 2 political_spectrum gender
8 0.899820 0.158746 Twente-BMS-NLP 3 political_spectrum gender
8 0.643187 0.136052 bm25_baseline 4 political_spectrum gender
16 0.924070 0.106170 GESIS-DSM 1 education gender
16 0.923998 0.106429 sbert_baseline 2 education gender
16 0.923574 0.123654 Twente-BMS-NLP 3 education gender
16 0.608648 0.102753 bm25_baseline 4 education gender
20 0.931639 0.113165 Twente-BMS-NLP 1 education gender
20 0.929635 0.097030 GESIS-DSM 2 education gender
20 0.929557 0.097267 sbert_baseline 3 education gender
20 0.592760 0.093548 bm25_baseline 4 education gender
avg 0.916166 0.124289 GESIS-DSM 1 education gender
avg 0.916002 0.124687 sbert_baseline 2 education gender
avg 0.908975 0.142622 Twente-BMS-NLP 3 political_spectrum Open Foreign Policy
avg 0.629319 0.121164 bm25_baseline 4 education gender
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.
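
The αNDCG@k and klDiv@k columns can be read with the following sketch in mind. It follows the textbook definitions rather than the official evaluation script, and it assumes that each retrieved argument carries exactly one value of the socio-cultural variable of interest (for example gender), that alpha is 0.5, and that the KL divergence compares the top-k distribution with a reference distribution such as the whole corpus. All function names are hypothetical.

```python
import math
from collections import Counter

def alpha_dcg(groups, alpha=0.5):
    """alpha-DCG of a ranked list of group labels; None marks a non-relevant item."""
    seen = Counter()
    score = 0.0
    for rank, group in enumerate(groups, start=1):
        if group is None:
            continue
        # Each repeat of an already covered group is worth (1 - alpha) times less.
        score += (1 - alpha) ** seen[group] / math.log2(rank + 1)
        seen[group] += 1
    return score

def greedy_ideal(groups, k):
    """Greedy ideal ordering: always pick the group that has been shown least so far."""
    remaining = Counter(g for g in groups if g is not None)
    seen = Counter()
    ideal = []
    while remaining and len(ideal) < k:
        best = min(remaining, key=lambda g: (seen[g], g))
        ideal.append(best)
        seen[best] += 1
        remaining[best] -= 1
        if remaining[best] == 0:
            del remaining[best]
    return ideal

def alpha_ndcg_at_k(ranked_groups, pool_groups, k, alpha=0.5):
    """alpha-nDCG@k: alpha-DCG of the ranking relative to a greedy ideal built from the pool."""
    ideal = alpha_dcg(greedy_ideal(pool_groups, k), alpha)
    return alpha_dcg(ranked_groups[:k], alpha) / ideal if ideal else 0.0

def kl_divergence(top_groups, reference_groups, eps=1e-9):
    """KL(top-k distribution || reference distribution) over group labels."""
    p, q = Counter(top_groups), Counter(reference_groups)
    total = 0.0
    for label, count in p.items():
        p_i = count / len(top_groups)
        q_i = max(q[label] / len(reference_groups), eps)  # eps avoids division by zero
        total += p_i * math.log(p_i / q_i)
    return total

pool = ["f", "f", "m", "m"]           # hypothetical gender labels of the relevant arguments
alternating = ["f", "m", "f", "m"]
clustered = ["f", "f", "m", "m"]
```

In this sketch, a higher αNDCG@k means the ranking covers the groups earlier, while a lower klDiv@k means the group distribution of the top k is closer to the reference distribution.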

Relevance - Scenario Explicit

k ndcg@k precision@k team team_rank
4 0.853129 0.784139 Twente-BMS-NLP 1
4 0.217063 0.216921 GESIS-DSM 2
4 0.210929 0.209182 sbert_baseline 3
8 0.826973 0.699215 Twente-BMS-NLP 1
8 0.220313 0.215066 GESIS-DSM 2
8 0.210107 0.206796 sbert_baseline 3
16 0.806304 0.613947 Twente-BMS-NLP 1
16 0.219608 0.206292 GESIS-DSM 2
16 0.211977 0.205126 sbert_baseline 3
20 0.798233 0.584441 Twente-BMS-NLP 1
20 0.220142 0.204156 GESIS-DSM 2
20 0.211292 0.201972 sbert_baseline 3
avg 0.821159 0.670436 Twente-BMS-NLP 1
avg 0.219281 0.210609 GESIS-DSM 2
avg 0.211076 0.205769 sbert_baseline 3

Diversity - Scenario Explicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.803502 0.201835 Twente-BMS-NLP 1 age Open Foreign Policy
4 0.198266 0.161001 GESIS-DSM 2 political_spectrum Open Foreign Policy
4 0.193978 0.169943 sbert_baseline 3 political_spectrum gender
8 0.795288 0.188889 Twente-BMS-NLP 1 political_spectrum Open Foreign Policy
8 0.203975 0.145292 GESIS-DSM 2 political_spectrum gender
8 0.196950 0.153803 sbert_baseline 3 political_spectrum gender
16 0.788913 0.161965 Twente-BMS-NLP 1 education gender
16 0.207447 0.112282 GESIS-DSM 2 education gender
16 0.202131 0.120015 sbert_baseline 3 education gender
20 0.784744 0.154346 Twente-BMS-NLP 1 education gender
20 0.208944 0.103003 GESIS-DSM 2 education gender
20 0.202647 0.110547 sbert_baseline 3 education gender
avg 0.793112 0.176759 Twente-BMS-NLP 1 political_spectrum Open Foreign Policy
avg 0.204658 0.130394 GESIS-DSM 2 political_spectrum gender
avg 0.198927 0.138577 sbert_baseline 3 education gender
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k ndcg@k precision@k team team_rank
4 0.197831 0.198685 GESIS-DSM 1
4 0.195035 0.196141 sbert_baseline 2
8 0.199804 0.200594 sbert_baseline 1
8 0.198524 0.197360 GESIS-DSM 2
16 0.205488 0.201150 GESIS-DSM 1
16 0.205072 0.202502 sbert_baseline 2
20 0.208600 0.204114 sbert_baseline 1
20 0.207223 0.201124 GESIS-DSM 2
avg 0.202267 0.199580 GESIS-DSM 1
avg 0.202128 0.200838 sbert_baseline 2

Diversity - Scenario Implicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.182819 0.160637 GESIS-DSM 1 political_spectrum gender
4 0.179801 0.155224 sbert_baseline 2 political_spectrum gender
8 0.186279 0.144410 GESIS-DSM 1 political_spectrum gender
8 0.186115 0.139378 sbert_baseline 2 political_spectrum gender
16 0.194678 0.110474 GESIS-DSM 1 education gender
16 0.193679 0.106891 sbert_baseline 2 education gender
20 0.197447 0.097982 sbert_baseline 1 education gender
20 0.197116 0.100929 GESIS-DSM 2 education gender
avg 0.190223 0.129112 GESIS-DSM 1 political_spectrum gender
avg 0.189261 0.124869 sbert_baseline 2 education gender

Second Evaluation Circle (Test set 2, election 2023)

Relevance - Scenario Baseline

k ndcg@k precision@k team team_rank
4 0.903527 0.912500 boulder_NLP 1
4 0.883713 0.887500 sbert_baseline 2
4 0.851954 0.850000 GESIS-DSM 3
4 0.775601 0.775000 bm_35_baseline 4
4 0.768397 0.781250 TWENTE-BMS-NLP 5
8 0.892774 0.893750 boulder_NLP 1
8 0.863010 0.856250 sbert_baseline 2
8 0.840808 0.834375 GESIS-DSM 3
8 0.768952 0.775000 TWENTE-BMS-NLP 4
8 0.764368 0.759375 bm_35_baseline 5
16 0.874400 0.867188 boulder_NLP 1
16 0.837641 0.823438 sbert_baseline 2
16 0.811359 0.795312 GESIS-DSM 3
16 0.755486 0.751563 TWENTE-BMS-NLP 4
16 0.717200 0.693750 bm_35_baseline 5
20 0.869493 0.861250 boulder_NLP 1
20 0.836021 0.823750 sbert_baseline 2
20 0.803148 0.786250 GESIS-DSM 3
20 0.755566 0.752500 TWENTE-BMS-NLP 4
20 0.692677 0.661250 bm_35_baseline 5
avg 0.885048 0.883672 boulder_NLP 1
avg 0.855096 0.847734 sbert_baseline 2
avg 0.826817 0.816484 GESIS-DSM 3
avg 0.762100 0.765078 TWENTE-BMS-NLP 4
avg 0.737462 0.722344 bm_35_baseline 5

Diversity - Scenario Baseline

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.843193 0.190554 boulder_NLP 1 gender Open Foreign Policy
4 0.825927 0.183359 sbert_baseline 2 gender Liberal Society
4 0.802393 0.189029 GESIS-DSM 3 gender Expanded Welfare State
4 0.729309 0.194654 bm_35_baseline 4 gender Expanded Welfare State
4 0.715027 0.175067 TWENTE-BMS-NLP 5 gender Enhanced Environmental Protection
8 0.835625 0.184543 boulder_NLP 1 civil_status Open Foreign Policy
8 0.810864 0.175428 sbert_baseline 2 civil_status Liberal Society
8 0.793244 0.181590 GESIS-DSM 3 civil_status Expanded Welfare State
8 0.720015 0.186390 bm_35_baseline 4 civil_status Expanded Welfare State
8 0.717539 0.167862 TWENTE-BMS-NLP 5 civil_status Enhanced Environmental Protection
16 0.835437 0.170560 boulder_NLP 1 education Open Foreign Policy
16 0.803294 0.158082 sbert_baseline 2 education Liberal Society
16 0.780678 0.163216 GESIS-DSM 3 education Expanded Welfare State
16 0.719560 0.150388 TWENTE-BMS-NLP 4 education Enhanced Environmental Protection
16 0.693180 0.163362 bm_35_baseline 5 education Expanded Welfare State
20 0.836555 0.166120 boulder_NLP 1 education Open Foreign Policy
20 0.806363 0.153383 sbert_baseline 2 education Liberal Society
20 0.778455 0.157177 GESIS-DSM 3 education Expanded Welfare State
20 0.724297 0.144618 TWENTE-BMS-NLP 4 education Enhanced Environmental Protection
20 0.676614 0.154637 bm_35_baseline 5 education Expanded Welfare State
avg 0.837703 0.177944 boulder_NLP 1 civil_status Open Foreign Policy
avg 0.811612 0.167563 sbert_baseline 2 civil_status Liberal Society
avg 0.788693 0.172753 GESIS-DSM 3 civil_status Enhanced Environmental Protection
avg 0.719106 0.159484 TWENTE-BMS-NLP 4 civil_status Enhanced Environmental Protection
avg 0.704779 0.174761 bm_35_baseline 5 civil_status Enhanced Environmental Protection

Relevance - Scenario Explicit

k ndcg@k precision@k team team_rank
4 0.778944 0.693322 sövereign 1
4 0.746718 0.664001 TWENTE-BMS-NLP 2
4 0.719577 0.634820 GESIS-DSM 3
4 0.152074 0.149411 sbert_baseline 4
8 0.753616 0.601010 sövereign 1
8 0.717243 0.567691 TWENTE-BMS-NLP 2
8 0.688722 0.539913 GESIS-DSM 3
8 0.147092 0.140853 sbert_baseline 4
16 0.723055 0.494353 sövereign 1
16 0.688206 0.466716 TWENTE-BMS-NLP 2
16 0.661889 0.445391 GESIS-DSM 3
16 0.145617 0.134961 sbert_baseline 4
20 0.713903 0.457632 sövereign 1
20 0.680463 0.433698 TWENTE-BMS-NLP 2
20 0.654631 0.413215 GESIS-DSM 3
20 0.145981 0.133109 sbert_baseline 4
avg 0.742379 0.561579 sövereign 1
avg 0.708157 0.533026 TWENTE-BMS-NLP 2
avg 0.681205 0.508335 GESIS-DSM 3
avg 0.147691 0.139583 sbert_baseline 4

Diversity - Scenario Explicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.762587 0.204173 sövereign 1 civil_status Enhanced Environmental Protection
4 0.731383 0.201999 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
4 0.708314 0.201885 GESIS-DSM 3 civil_status Enhanced Environmental Protection
4 0.148394 0.188384 sbert_baseline 4 civil_status Open Foreign Policy
8 0.742925 0.195974 sövereign 1 political_spectrum Enhanced Environmental Protection
8 0.707982 0.193850 TWENTE-BMS-NLP 2 political_spectrum Enhanced Environmental Protection
8 0.683388 0.193435 GESIS-DSM 3 political_spectrum Enhanced Environmental Protection
8 0.144145 0.180716 sbert_baseline 4 political_spectrum Open Foreign Policy
16 0.720318 0.176849 sövereign 1 political_spectrum Enhanced Environmental Protection
16 0.685955 0.174981 TWENTE-BMS-NLP 2 education Enhanced Environmental Protection
16 0.662275 0.174012 GESIS-DSM 3 education Enhanced Environmental Protection
16 0.143560 0.163211 sbert_baseline 4 political_spectrum Open Foreign Policy
20 0.712741 0.170729 sövereign 1 political_spectrum Enhanced Environmental Protection
20 0.679465 0.169039 TWENTE-BMS-NLP 2 political_spectrum Enhanced Environmental Protection
20 0.655853 0.167952 GESIS-DSM 3 education Enhanced Environmental Protection
20 0.143953 0.157795 sbert_baseline 4 political_spectrum Open Foreign Policy
avg 0.734643 0.186931 sövereign 1 political_spectrum Enhanced Environmental Protection
avg 0.701196 0.184967 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
avg 0.677457 0.184321 GESIS-DSM 3 civil_status Enhanced Environmental Protection
avg 0.145013 0.172527 sbert_baseline 4 civil_status Open Foreign Policy
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k ndcg@k precision@k team team_rank
4 0.138857 0.134961 sbert_baseline 1
4 0.135854 0.135943 TWENTE-BMS-NLP 2
4 0.133892 0.133418 GESIS-DSM 3
8 0.135941 0.131453 sbert_baseline 1
8 0.133384 0.131594 TWENTE-BMS-NLP 2
8 0.131639 0.127736 GESIS-DSM 3
16 0.134325 0.126578 sbert_baseline 1
16 0.134060 0.127666 TWENTE-BMS-NLP 2
16 0.131696 0.123422 GESIS-DSM 3
20 0.135666 0.127301 TWENTE-BMS-NLP 1
20 0.134688 0.124804 sbert_baseline 2
20 0.132429 0.122447 GESIS-DSM 3
avg 0.135953 0.129449 sbert_baseline 1
avg 0.134741 0.130626 TWENTE-BMS-NLP 2
avg 0.132414 0.126755 GESIS-DSM 3

Diversity - Scenario Implicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.136044 0.187736 sbert_baseline 1 civil_status Liberal Society
4 0.130388 0.187460 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
4 0.128952 0.191469 GESIS-DSM 3 civil_status Expanded Welfare State
8 0.133246 0.179940 sbert_baseline 1 civil_status Liberal Economic Policy
8 0.128856 0.180906 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
8 0.127276 0.183187 GESIS-DSM 3 civil_status Expanded Welfare State
16 0.132503 0.162401 sbert_baseline 1 civil_status Liberal Economic Policy
16 0.130391 0.166958 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
16 0.128243 0.164174 GESIS-DSM 3 civil_status Expanded Welfare State
20 0.132824 0.156980 sbert_baseline 1 civil_status Liberal Economic Policy
20 0.131905 0.163094 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
20 0.129197 0.158113 GESIS-DSM 3 civil_status Expanded Welfare State
avg 0.133654 0.171764 sbert_baseline 1 civil_status Liberal Economic Policy
avg 0.130385 0.174605 TWENTE-BMS-NLP 2 civil_status Enhanced Environmental Protection
avg 0.128417 0.174236 GESIS-DSM 3 civil_status Enhanced Environmental Protection
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Third Evaluation Circle (Surprise Test Set, election 2023, User Study)

Relevance - Scenario Baseline

k ndcg@k precision@k team team_rank
4 0.751645 0.777778 TWENTE-BMS-NLP 1
4 0.669450 0.694444 GESIS-DSM 2
4 0.669450 0.694444 sbert_baseline 2
4 0.549329 0.546296 sövereign 3
4 0.358807 0.361111 bm25_baseline 4
8 0.729739 0.731481 TWENTE-BMS-NLP 1
8 0.676268 0.689815 GESIS-DSM 2
8 0.676268 0.689815 sbert_baseline 2
8 0.545443 0.541667 sövereign 3
8 0.366001 0.370370 bm25_baseline 4
16 0.706378 0.696759 TWENTE-BMS-NLP 1
16 0.609884 0.590278 GESIS-DSM 2
16 0.609884 0.590278 sbert_baseline 2
16 0.540876 0.537037 sövereign 3
16 0.378635 0.386574 bm25_baseline 4
20 0.690208 0.674074 TWENTE-BMS-NLP 1
20 0.590431 0.564815 GESIS-DSM 2
20 0.590431 0.564815 sbert_baseline 2
20 0.537909 0.533333 sövereign 3
20 0.369074 0.370370 bm25_baseline 4
avg 0.719492 0.720023 TWENTE-BMS-NLP 1
avg 0.636508 0.634838 GESIS-DSM 2
avg 0.636508 0.634838 sbert_baseline 2
avg 0.543389 0.539583 sövereign 3
avg 0.368129 0.372106 bm25_baseline 4

Diversity - Scenario Baseline

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.692650 0.199453 TWENTE-BMS-NLP 1 education Enhanced Environmental Protection
4 0.614806 0.193470 sbert_baseline 2 education Liberal Society
4 0.614261 0.189161 GESIS-DSM 3 education Liberal Society
4 0.514943 0.198622 sövereign 4 education Expanded Welfare State
4 0.332209 0.207754 bm25_baseline 5 education Law & Order
8 0.698982 0.191964 TWENTE-BMS-NLP 1 education Enhanced Environmental Protection
8 0.642488 0.180104 GESIS-DSM 2 education Liberal Society
8 0.642160 0.183726 sbert_baseline 3 age Liberal Society
8 0.527252 0.187721 sövereign 4 education Expanded Welfare State
8 0.348938 0.191614 bm25_baseline 5 stance Law & Order
16 0.693610 0.171857 TWENTE-BMS-NLP 1 civil_status Enhanced Environmental Protection
16 0.601084 0.155062 GESIS-DSM 2 stance Liberal Society
16 0.600932 0.157257 sbert_baseline 3 age Liberal Society
16 0.530549 0.165135 sövereign 4 education Expanded Welfare State
16 0.365794 0.158984 bm25_baseline 5 civil_status Law & Order
20 0.682328 0.165512 TWENTE-BMS-NLP 1 civil_status Enhanced Environmental Protection
20 0.586336 0.147475 GESIS-DSM 2 stance Liberal Society
20 0.586076 0.149359 sbert_baseline 3 age Liberal Society
20 0.530000 0.158783 sövereign 4 education Expanded Welfare State
20 0.359814 0.149078 bm25_baseline 5 civil_status Law & Order
avg 0.691892 0.182196 TWENTE-BMS-NLP 1 education Enhanced Environmental Protection
avg 0.611042 0.167951 GESIS-DSM 2 age Liberal Society
avg 0.610994 0.170953 sbert_baseline 3 age Liberal Society
avg 0.525686 0.177565 sövereign 4 education Enhanced Environmental Protection
avg 0.351689 0.176857 bm25_baseline 5 education Law & Order
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Explicit

k ndcg@k precision@k team team_rank
4 0.640162 0.593964 GESIS-DSM 1
4 0.590100 0.590535 TWENTE-BMS-NLP 2
4 0.559131 0.526406 sövereign 3
4 0.381226 0.378258 sbert_baseline 4
8 0.619106 0.582476 TWENTE-BMS-NLP 1
8 0.603002 0.487311 GESIS-DSM 2
8 0.567702 0.472394 sövereign 3
8 0.389244 0.356824 sbert_baseline 4
16 0.686393 0.542267 TWENTE-BMS-NLP 1
16 0.606888 0.399691 sövereign 2
16 0.579259 0.372771 GESIS-DSM 3
16 0.417007 0.315758 sbert_baseline 4
20 0.725216 0.523937 TWENTE-BMS-NLP 1
20 0.623785 0.371399 sövereign 2
20 0.574316 0.336831 GESIS-DSM 3
20 0.437995 0.305487 sbert_baseline 4
avg 0.655204 0.559804 TWENTE-BMS-NLP 1
avg 0.599185 0.447719 GESIS-DSM 2
avg 0.589377 0.442473 sövereign 3
avg 0.406368 0.339082 sbert_baseline 4

Diversity - Scenario Explicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.650068 0.228261 GESIS-DSM 1 age Liberal Society
4 0.576809 0.239758 TWENTE-BMS-NLP 2 stance political_spectrum
4 0.552310 0.215621 sövereign 3 stance Liberal Society
4 0.377455 0.203101 sbert_baseline 4 stance political_spectrum
8 0.623374 0.219347 GESIS-DSM 1 age Liberal Society
8 0.607758 0.234161 TWENTE-BMS-NLP 2 stance political_spectrum
8 0.563101 0.207237 sövereign 3 stance Liberal Society
8 0.387433 0.190764 sbert_baseline 4 stance political_spectrum
16 0.659803 0.222237 TWENTE-BMS-NLP 1 stance political_spectrum
16 0.601287 0.199812 GESIS-DSM 2 gender Liberal Society
16 0.594174 0.188411 sövereign 3 stance political_spectrum
16 0.408144 0.162935 sbert_baseline 4 stance political_spectrum
20 0.689067 0.218854 TWENTE-BMS-NLP 1 stance political_spectrum
20 0.606706 0.182806 sövereign 2 stance political_spectrum
20 0.596274 0.193914 GESIS-DSM 3 stance Liberal Society
20 0.423671 0.154293 sbert_baseline 4 stance political_spectrum
avg 0.633359 0.228753 TWENTE-BMS-NLP 1 stance political_spectrum
avg 0.617751 0.210334 GESIS-DSM 2 age Liberal Society
avg 0.579073 0.198518 sövereign 3 stance Liberal Society
avg 0.399176 0.177773 sbert_baseline 4 stance political_spectrum
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k ndcg@k precision@k team team_rank
4 0.441651 0.456447 GESIS-DSM 1
4 0.382125 0.392661 sbert_baseline 2
4 0.346715 0.342593 sövereign 3
8 0.456150 0.428841 GESIS-DSM 1
8 0.397490 0.372942 sbert_baseline 2
8 0.364934 0.337791 sövereign 3
16 0.484986 0.373714 GESIS-DSM 1
16 0.422964 0.334019 sövereign 2
16 0.420502 0.322874 sbert_baseline 3
20 0.502046 0.352675 GESIS-DSM 1
20 0.452568 0.330247 sövereign 2
20 0.436431 0.306447 sbert_baseline 3
avg 0.471208 0.402919 GESIS-DSM 1
avg 0.409137 0.348731 sbert_baseline 2
avg 0.396795 0.336163 sövereign 3

Diversity - Scenario Implicit

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α)
4 0.423761 0.198571 GESIS-DSM 1 stance political_spectrum
4 0.369611 0.199280 sbert_baseline 2 stance political_spectrum
4 0.339328 0.196251 sövereign 3 stance residence
8 0.443519 0.188251 GESIS-DSM 1 stance political_spectrum
8 0.388724 0.187431 sbert_baseline 2 stance political_spectrum
8 0.359010 0.186613 sövereign 3 stance political_spectrum
16 0.466246 0.162230 GESIS-DSM 1 stance political_spectrum
16 0.407494 0.159313 sbert_baseline 2 stance political_spectrum
16 0.403965 0.165996 sövereign 3 stance political_spectrum
20 0.478868 0.154204 GESIS-DSM 1 stance political_spectrum
20 0.426963 0.159973 sövereign 2 stance political_spectrum
20 0.419468 0.149905 sbert_baseline 3 stance political_spectrum
avg 0.453098 0.175814 GESIS-DSM 1 stance political_spectrum
avg 0.396324 0.173982 sbert_baseline 2 stance political_spectrum
avg 0.382317 0.177208 sövereign 3 stance political_spectrum
socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Leaderboards

The following two tables show the final results for each track: relevance and diversity. We averaged the metric over the k values and then over the test sets. For comparison, we also averaged over the rank instead of the metric; runs that were not submitted were assigned the last possible rank.

Relevance

Team Mean Rank Mean (NDCG) Rank
twente-bms-nlp 1.33 0.707 1
sövereign 2.22 0.632 2
GESIS-DSM 3.44 0.607 3
sbert_baseline 4.44 0.518 4
turiya 5.0 0.445 5
team031 5.44 0.417 6
boulderNLP 6.44 0.292 7
bm25_baseline 7.67 0.195 8
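
The aggregation behind the Mean (NDCG) column can be sketched as below: average over k within each test set and scenario, then average over those cells. Counting a missing cell as 0.0 is an assumption made for this sketch, and `final_scores` and the toy rows are hypothetical; the numbers are not taken from the leaderboard.

```python
from collections import defaultdict
from statistics import mean

def final_scores(rows, cells, missing=0.0):
    """Average a metric over k per (test_set, scenario) cell, then over all cells.

    rows: iterable of (team, test_set, scenario, k, value).
    cells: every (test_set, scenario) pair that counts towards the final score.
    missing: value used for cells a team did not submit (an assumption of this sketch).
    """
    per_cell = defaultdict(list)
    for team, test_set, scenario, _k, value in rows:
        per_cell[(team, test_set, scenario)].append(value)
    teams = sorted({row[0] for row in rows})
    scores = {}
    for team in teams:
        cell_means = [
            mean(per_cell[(team, *cell)]) if (team, *cell) in per_cell else missing
            for cell in cells
        ]
        scores[team] = mean(cell_means)
    return scores

# Hypothetical toy rows: team A submitted both cells, team B only one.
rows = [
    ("A", "test1", "baseline", 4, 0.8),
    ("A", "test1", "baseline", 8, 0.6),
    ("A", "test2", "baseline", 4, 0.5),
    ("B", "test1", "baseline", 4, 0.9),
]
cells = [("test1", "baseline"), ("test2", "baseline")]
scores = final_scores(rows, cells)
```

Team B has the best single cell but ends up below team A because of the cell it did not submit, which mirrors how partial submissions are penalised in a ranking that considers all test sets and scenarios.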
Detailed Results for Relevance

k ndcg@k precision@k team team_rank test_set scenario
4 1.0 1.0 sövereign 1 test1 baseline
4 0.990189057472516 0.9888888888888888 sbert_baseline 2 test1 baseline
4 0.990189057472516 0.9888888888888888 GESIS-DSM 2 test1 baseline
4 0.9881901089435936 0.9833333333333332 turiya 3 test1 baseline
4 0.9858515013614926 0.9888888888888888 boulderNLP 4 test1 baseline
4 0.9821153338888012 0.9833333333333332 twente-bms-nlp 5 test1 baseline
4 0.9168462271127248 0.9222222222222224 team031 6 test1 baseline
4 0.7161839498029191 0.6833333333333333 bm25_baseline 7 test1 baseline
8 1.0 1.0 sövereign 1 test1 baseline
8 0.9875927496432856 0.9861111111111112 sbert_baseline 2 test1 baseline
8 0.9875927496432856 0.9861111111111112 GESIS-DSM 2 test1 baseline
8 0.9866386041983832 0.9888888888888888 twente-bms-nlp 3 test1 baseline
8 0.9833098934306858 0.9833333333333332 boulderNLP 4 test1 baseline
8 0.9792047038130544 0.9722222222222222 turiya 5 test1 baseline
8 0.9072481161687932 0.9055555555555556 team031 6 test1 baseline
8 0.6716770078215716 0.6361111111111111 bm25_baseline 7 test1 baseline
16 0.998038122956649 0.9972222222222222 sövereign 1 test1 baseline
16 0.9894021441383424 0.9916666666666668 twente-bms-nlp 2 test1 baseline
16 0.9832545277354618 0.9805555555555556 GESIS-DSM 3 test1 baseline
16 0.9832545277354618 0.9805555555555556 sbert_baseline 3 test1 baseline
16 0.9773249241880808 0.975 boulderNLP 4 test1 baseline
16 0.9751721136524104 0.9694444444444444 turiya 5 test1 baseline
16 0.899859643525079 0.8958333333333334 team031 6 test1 baseline
16 0.6194260623649784 0.5791666666666667 bm25_baseline 7 test1 baseline
20 0.9982984713489932 0.9977777777777778 sövereign 1 test1 baseline
20 0.990078186412697 0.9922222222222222 twente-bms-nlp 2 test1 baseline
20 0.9810926808166834 0.9777777777777776 GESIS-DSM 3 test1 baseline
20 0.9810926808166834 0.9777777777777776 sbert_baseline 3 test1 baseline
20 0.972582822889201 0.9666666666666668 turiya 4 test1 baseline
20 0.9714870761972928 0.9666666666666668 boulderNLP 5 test1 baseline
20 0.8924909218930537 0.8855555555555555 team031 6 test1 baseline
20 0.5968774789265284 0.5533333333333333 bm25_baseline 7 test1 baseline
avg 0.9990841485764106 0.99875 sövereign 1 test1 baseline
avg 0.987058567159556 0.9890277777777776 twente-bms-nlp 2 test1 baseline
avg 0.9855322539169866 0.9833333333333334 GESIS-DSM 3 test1 baseline
avg 0.9855322539169866 0.9833333333333334 sbert_baseline 3 test1 baseline
avg 0.979493348794388 0.9784722222222222 boulderNLP 4 test1 baseline
avg 0.9787874373245647 0.9729166666666668 turiya 5 test1 baseline
avg 0.9041112271749128 0.9022916666666668 team031 6 test1 baseline
avg 0.6510411247289993 0.6129861111111111 bm25_baseline 7 test1 baseline
4 0.9001480461649514 0.8242154368108566 twente-bms-nlp 1 test1 explicit
4 0.8938215896369928 0.8176420695504665 sövereign 2 test1 explicit
4 0.8640193442830918 0.7926208651399491 GESIS-DSM 3 test1 explicit
4 0.7100640009053408 0.6593511450381679 turiya 4 test1 explicit
4 0.2175003870916797 0.2168150975402883 sbert_baseline 5 test1 explicit
4 0.1805931791924541 0.1805555555555555 team031 6 test1 explicit
8 0.8922772927899614 0.7459711620016963 twente-bms-nlp 1 test1 explicit
8 0.8802957150793026 0.7360581000848176 sövereign 2 test1 explicit
8 0.8413687834128213 0.7104007633587787 GESIS-DSM 3 test1 explicit
8 0.6893920594193227 0.595742962155176 turiya 4 test1 explicit
8 0.2195557791900342 0.2172391857506361 sbert_baseline 5 test1 explicit
8 0.1802815307736844 0.1776399491094147 team031 6 test1 explicit
16 0.8933848950285854 0.6624522900763359 twente-bms-nlp 1 test1 explicit
16 0.8716583627924369 0.650657336726039 sövereign 2 test1 explicit
16 0.8216202968972766 0.623717133163698 GESIS-DSM 3 test1 explicit
16 0.670775601164894 0.5372995801908015 turiya 4 test1 explicit
16 0.2258025677535813 0.2189620441051738 sbert_baseline 5 test1 explicit
16 0.1819053163856849 0.175917090754877 team031 6 test1 explicit
20 0.8951817256118378 0.6338634435962681 twente-bms-nlp 1 test1 explicit
20 0.8681664385182296 0.6220101781170484 sövereign 2 test1 explicit
20 0.8143315303846864 0.5946953633022336 GESIS-DSM 3 test1 explicit
20 0.6641266387944684 0.5193389539436657 turiya 4 test1 explicit
20 0.2270783113113688 0.2175784563189143 sbert_baseline 5 test1 explicit
20 0.1825981750938157 0.1745547073791348 team031 6 test1 explicit
avg 0.895247989898834 0.7166255831212892 twente-bms-nlp 1 test1 explicit
avg 0.8784855265067405 0.7065919211195928 sövereign 2 test1 explicit
avg 0.835334988744469 0.6803585312411649 GESIS-DSM 3 test1 explicit
avg 0.6835895750710065 0.5779331603319527 turiya 4 test1 explicit
avg 0.222484261336666 0.2176486959287531 sbert_baseline 5 test1 explicit
avg 0.1813445503614097 0.1771668256997454 team031 6 test1 explicit
4 0.2054552287288432 0.2055767599660729 sövereign 1 test1 implicit
4 0.1953762100874915 0.1957167090754877 GESIS-DSM 2 test1 implicit
4 0.1953215904929799 0.1956106870229007 twente-bms-nlp 3 test1 implicit
4 0.1950354227210964 0.1961407972858354 sbert_baseline 4 test1 implicit
4 0.1805931791924541 0.1805555555555555 team031 5 test1 implicit
8 0.2095966886054688 0.2090224766751484 sövereign 1 test1 implicit
8 0.2006420981831635 0.2008587786259542 twente-bms-nlp 2 test1 implicit
8 0.1998037256574479 0.2005937234944868 sbert_baseline 3 test1 implicit
8 0.1995275545197261 0.199374469889737 GESIS-DSM 4 test1 implicit
8 0.1802815307736844 0.1776399491094147 team031 5 test1 implicit
16 0.2172998625323523 0.214350084817642 sövereign 1 test1 implicit
16 0.2070910886627467 0.2043575063613231 twente-bms-nlp 2 test1 implicit
16 0.2050723911465899 0.2025021204410517 sbert_baseline 3 test1 implicit
16 0.2049944998931236 0.2017069550466497 GESIS-DSM 4 test1 implicit
16 0.1819053163856849 0.175917090754877 team031 5 test1 implicit
20 0.2197860194050664 0.2147794741306191 sövereign 1 test1 implicit
20 0.2108761109974972 0.2066581849024597 twente-bms-nlp 2 test1 implicit
20 0.2085997304828043 0.2041136556403732 sbert_baseline 3 test1 implicit
20 0.2082150355571272 0.2034563189143342 GESIS-DSM 4 test1 implicit
20 0.1825981750938157 0.1745547073791348 team031 5 test1 implicit
avg 0.2130344498179326 0.2109321988973705 sövereign 1 test1 implicit
avg 0.2034827220840968 0.2018712892281594 twente-bms-nlp 2 test1 implicit
avg 0.2021278175019846 0.2008375742154367 sbert_baseline 3 test1 implicit
avg 0.2020283250143671 0.2000636132315521 GESIS-DSM 4 test1 implicit
avg 0.1813445503614097 0.1771668256997454 team031 5 test1 implicit
4 0.9506765622181236 0.95 twente-bms-nlp 1 test2 baseline
4 0.9370083183448402 0.9375 turiya 2 test2 baseline
4 0.9061575597185184 0.9 sövereign 3 test2 baseline
4 0.9035266261886546 0.9125 boulderNLP 4 test2 baseline
4 0.8837131779057522 0.8875 sbert_baseline 5 test2 baseline
4 0.8837131779057522 0.8875 GESIS-DSM 5 test2 baseline
4 0.8172701854045675 0.8125 team031 6 test2 baseline
4 0.7756012468754936 0.775 bm25_baseline 7 test2 baseline
8 0.9438893622828144 0.940625 twente-bms-nlp 1 test2 baseline
8 0.9302213830137418 0.928125 turiya 2 test2 baseline
8 0.8977474997836907 0.890625 sövereign 3 test2 baseline
8 0.8927744682895797 0.89375 boulderNLP 4 test2 baseline
8 0.8630103992747806 0.85625 sbert_baseline 5 test2 baseline
8 0.8630103992747806 0.85625 GESIS-DSM 5 test2 baseline
8 0.807023471012481 0.8 team031 6 test2 baseline
8 0.7643677571905212 0.759375 bm25_baseline 7 test2 baseline
16 0.926180193487849 0.9171875 twente-bms-nlp 1 test2 baseline
16 0.9099072816558096 0.9 turiya 2 test2 baseline
16 0.8898112362771199 0.8828125 sövereign 3 test2 baseline
16 0.8743999301440972 0.8671875 boulderNLP 4 test2 baseline
16 0.8376410923352875 0.8234375 sbert_baseline 5 test2 baseline
16 0.8376410923352875 0.8234375 GESIS-DSM 5 test2 baseline
16 0.799430458369072 0.7921875 team031 6 test2 baseline
16 0.7171996891582305 0.69375 bm25_baseline 7 test2 baseline
20 0.9227059842814548 0.91375 twente-bms-nlp 1 test2 baseline
20 0.9028388865527066 0.89125 turiya 2 test2 baseline
20 0.8853640741967123 0.8775000000000001 sövereign 3 test2 baseline
20 0.8694925639825994 0.8612500000000001 boulderNLP 4 test2 baseline
20 0.8360211803043445 0.8237500000000001 sbert_baseline 5 test2 baseline
20 0.8360211803043445 0.8237500000000001 GESIS-DSM 5 test2 baseline
20 0.8019289404650657 0.7975 team031 6 test2 baseline
20 0.6926774698203287 0.66125 bm25_baseline 7 test2 baseline
avg 0.9358630255675604 0.930390625 twente-bms-nlp 1 test2 baseline
avg 0.9199939673917744 0.91421875 turiya 2 test2 baseline
avg 0.8947700924940104 0.887734375 sövereign 3 test2 baseline
avg 0.8850483971512327 0.883671875 boulderNLP 4 test2 baseline
avg 0.8550964624550412 0.847734375 sbert_baseline 5 test2 baseline
avg 0.8550964624550412 0.847734375 GESIS-DSM 5 test2 baseline
avg 0.8064132638127965 0.800546875 team031 6 test2 baseline
avg 0.7374615407611435 0.72234375 bm25_baseline 7 test2 baseline
4 0.8401994013581225 0.7487373737373737 sövereign 1 test2 explicit
4 0.821466482568112 0.7370931537598204 twente-bms-nlp 2 test2 explicit
4 0.7597256206685896 0.6770482603815937 GESIS-DSM 3 test2 explicit
4 0.7181658241944868 0.670314253647587 turiya 4 test2 explicit
4 0.1520743812311075 0.1494107744107744 sbert_baseline 5 test2 explicit
4 0.1291370189622331 0.127665544332211 team031 6 test2 explicit
8 0.8311386022544768 0.6627384960718294 sövereign 1 test2 explicit
8 0.8034594278677059 0.647516835016835 twente-bms-nlp 2 test2 explicit
8 0.7304752871670588 0.5797558922558923 GESIS-DSM 3 test2 explicit
8 0.6865914880186403 0.6023809523809524 turiya 4 test2 explicit
8 0.1470920609146963 0.1408529741863075 sbert_baseline 5 test2 explicit
8 0.1289305941676711 0.1257014590347923 team031 6 test2 explicit
16 0.8138006848030526 0.558641975308642 sövereign 1 test2 explicit
16 0.7867159997762964 0.5454545454545454 twente-bms-nlp 2 test2 explicit
16 0.7021023592617129 0.4772774036662925 GESIS-DSM 3 test2 explicit
16 0.6508330940798412 0.5271792928358585 turiya 4 test2 explicit
16 0.1456174257912896 0.1349607182940516 sbert_baseline 5 test2 explicit
16 0.1326741155169621 0.1244739057239057 team031 6 test2 explicit
20 0.8087703608546549 0.5222222222222223 sövereign 1 test2 explicit
20 0.7818119173753864 0.5101290684624018 twente-bms-nlp 2 test2 explicit
20 0.6943552440639758 0.4434530490086045 GESIS-DSM 3 test2 explicit
20 0.6408987980659909 0.5031557783460844 turiya 4 test2 explicit
20 0.1459810843316855 0.1331088664421998 sbert_baseline 5 test2 explicit
20 0.1353178895431881 0.1253086419753086 team031 6 test2 explicit
avg 0.8234772623175767 0.6230850168350168 sövereign 1 test2 explicit
avg 0.7983634568968752 0.6100484006734006 twente-bms-nlp 2 test2 explicit
avg 0.7216646277903342 0.5443836513280957 GESIS-DSM 3 test2 explicit
avg 0.6741223010897399 0.5757575693026206 turiya 4 test2 explicit
avg 0.1476912380671947 0.1395833333333333 sbert_baseline 5 test2 explicit
avg 0.1315149045475136 0.1257873877665544 team031 6 test2 explicit
4 0.1491400488763286 0.1473063973063973 twente-bms-nlp 1 test2 implicit
4 0.1402859410706227 0.1404320987654321 GESIS-DSM 2 test2 implicit
4 0.138856828432963 0.1349607182940516 sbert_baseline 3 test2 implicit
4 0.1359238732277851 0.1352413019079685 sövereign 4 test2 implicit
4 0.1291370189622331 0.127665544332211 team031 5 test2 implicit
8 0.1487113747836123 0.1463243546576879 twente-bms-nlp 1 test2 implicit
8 0.1377485887860268 0.1367845117845118 sövereign 2 test2 implicit
8 0.136982480497495 0.1346801346801346 GESIS-DSM 3 test2 implicit
8 0.1359412289049273 0.1314534231200897 sbert_baseline 4 test2 implicit
8 0.1289305941676711 0.1257014590347923 team031 5 test2 implicit
16 0.1489668236672465 0.1416596520763187 twente-bms-nlp 1 test2 implicit
16 0.140683237693298 0.1360830527497194 sövereign 2 test2 implicit
16 0.1376507499622861 0.1309624017957351 GESIS-DSM 3 test2 implicit
16 0.1343245457902445 0.1265782828282828 sbert_baseline 4 test2 implicit
16 0.1326741155169621 0.1244739057239057 team031 5 test2 implicit
20 0.1508884226776939 0.1408249158249158 twente-bms-nlp 1 test2 implicit
20 0.1421854720014271 0.1354938271604938 sövereign 2 test2 implicit
20 0.1395176107479813 0.1308361391694725 GESIS-DSM 3 test2 implicit
20 0.1353178895431881 0.1253086419753086 team031 4 test2 implicit
20 0.1346876902929752 0.1248035914702581 sbert_baseline 5 test2 implicit
avg 0.1494266675012203 0.1440288299663299 twente-bms-nlp 1 test2 implicit
avg 0.1391352929271342 0.1359006734006733 sövereign 2 test2 implicit
avg 0.1386091955695962 0.1342276936026935 GESIS-DSM 3 test2 implicit
avg 0.1359525733552775 0.1294490039281705 sbert_baseline 4 test2 implicit
avg 0.1315149045475136 0.1257873877665544 team031 5 test2 implicit
4 0.955733702777924 0.9537037037037036 twente-bms-nlp 1 test3 baseline
4 0.8012470305565794 0.8240740740740741 turiya 2 test3 baseline
4 0.7723299898164238 0.7962962962962963 boulderNLP 3 test3 baseline
4 0.6694504532875712 0.6944444444444444 sbert_baseline 4 test3 baseline
4 0.6694504532875712 0.6944444444444444 GESIS-DSM 4 test3 baseline
4 0.6676689810638864 0.6666666666666666 sövereign 5 test3 baseline
4 0.6290818353689971 0.6388888888888888 team031 6 test3 baseline
4 0.3588072149065744 0.3611111111111111 bm25_baseline 7 test3 baseline
8 0.954941801480141 0.9537037037037036 twente-bms-nlp 1 test3 baseline
8 0.7971576499451849 0.8055555555555556 turiya 2 test3 baseline
8 0.7765162640957356 0.7870370370370371 boulderNLP 3 test3 baseline
8 0.6762677388062868 0.6898148148148148 sbert_baseline 4 test3 baseline
8 0.6762677388062868 0.6898148148148148 GESIS-DSM 4 test3 baseline
8 0.6394523697942182 0.625 sövereign 5 test3 baseline
8 0.6043411978764253 0.5972222222222222 team031 6 test3 baseline
8 0.3660008931850136 0.3703703703703703 bm25_baseline 7 test3 baseline
16 0.938689557839632 0.9305555555555556 twente-bms-nlp 1 test3 baseline
16 0.7549540209028123 0.7523148148148148 boulderNLP 2 test3 baseline
16 0.7290344996287841 0.7037037037037037 turiya 3 test3 baseline
16 0.6099855894459848 0.5902777777777778 sövereign 4 test3 baseline
16 0.6098835373261119 0.5902777777777778 sbert_baseline 5 test3 baseline
16 0.6098835373261119 0.5902777777777778 GESIS-DSM 5 test3 baseline
16 0.567728112759321 0.5486111111111112 team031 6 test3 baseline
16 0.378634673193704 0.386574074074074 bm25_baseline 7 test3 baseline
20 0.9259681562666584 0.9129629629629628 twente-bms-nlp 1 test3 baseline
20 0.7383461732610088 0.7277777777777777 boulderNLP 2 test3 baseline
20 0.7022747547006197 0.6685185185185185 turiya 3 test3 baseline
20 0.5953882623344432 0.5722222222222222 sövereign 4 test3 baseline
20 0.5904311320364634 0.5648148148148148 sbert_baseline 5 test3 baseline
20 0.5904311320364634 0.5648148148148148 GESIS-DSM 5 test3 baseline
20 0.5687193359245571 0.5537037037037037 team031 6 test3 baseline
20 0.3690742436025672 0.3703703703703703 bm25_baseline 7 test3 baseline
avg 0.9438333045910888 0.9377314814814814 twente-bms-nlp 1 test3 baseline
avg 0.7605366120189951 0.7658564814814816 boulderNLP 2 test3 baseline
avg 0.757428483707792 0.750462962962963 turiya 3 test3 baseline
avg 0.6365082153641083 0.6348379629629629 sbert_baseline 4 test3 baseline
avg 0.6365082153641083 0.6348379629629629 GESIS-DSM 4 test3 baseline
avg 0.6281238006596331 0.6135416666666667 sövereign 5 test3 baseline
avg 0.5924676204823252 0.5846064814814815 team031 6 test3 baseline
avg 0.3681292562219648 0.3721064814814814 bm25_baseline 7 test3 baseline
4 0.8198554563236097 0.7767489711934157 twente-bms-nlp 1 test3 explicit
4 0.7005811735273945 0.6722679469593049 turiya 2 test3 explicit
4 0.6988445277827787 0.6529492455418381 sövereign 3 test3 explicit
4 0.6557260465824912 0.6069958847736625 GESIS-DSM 4 test3 explicit
4 0.3812260923239645 0.3782578875171468 sbert_baseline 5 test3 explicit
4 0.3744073973289669 0.3710562414266118 team031 6 test3 explicit
8 0.7970654428040227 0.664437585733882 twente-bms-nlp 1 test3 explicit
8 0.6759496125041157 0.5493827160493827 sövereign 2 test3 explicit
8 0.6547858097440934 0.5804036841073877 turiya 3 test3 explicit
8 0.621075561362604 0.504286694101509 GESIS-DSM 4 test3 explicit
8 0.3892442896858138 0.3568244170096022 sbert_baseline 5 test3 explicit
8 0.3885170845229115 0.3621399176954732 team031 6 test3 explicit
16 0.7872409801663298 0.5279492455418381 twente-bms-nlp 1 test3 explicit
16 0.6621420160395799 0.4297839506172839 sövereign 2 test3 explicit
16 0.6214336086761723 0.4928626357638703 turiya 3 test3 explicit
16 0.5962899192860883 0.3859739368998628 GESIS-DSM 4 test3 explicit
16 0.4317931596800455 0.3436213991769547 team031 5 test3 explicit
16 0.4170067980409585 0.3157578875171468 sbert_baseline 6 test3 explicit
20 0.7866482366998089 0.4811385459533607 twente-bms-nlp 1 test3 explicit
20 0.6559260234324775 0.3853909465020576 sövereign 2 test3 explicit
20 0.6147280552525008 0.4653386916207184 turiya 3 test3 explicit
20 0.5911325176447614 0.3486968449931413 GESIS-DSM 4 test3 explicit
20 0.4577675273363525 0.3371056241426611 team031 5 test3 explicit
20 0.4379952789940077 0.3054869684499314 sbert_baseline 6 test3 explicit
avg 0.7977025289984427 0.6125685871056241 twente-bms-nlp 1 test3 explicit
avg 0.6732155449397379 0.5043767146776407 sövereign 2 test3 explicit
avg 0.6478821618000402 0.5527182396128203 turiya 3 test3 explicit
avg 0.6160560112189862 0.4614883401920439 GESIS-DSM 4 test3 explicit
avg 0.413121292217069 0.3534807956104252 team031 5 test3 explicit
avg 0.4063681147611861 0.3390817901234567 sbert_baseline 6 test3 explicit
4 0.5901002912696925 0.5905349794238683 twente-bms-nlp 1 test3 implicit
4 0.4416508936048149 0.4564471879286694 GESIS-DSM 2 test3 implicit
4 0.3922386671544742 0.3861454046639231 sövereign 3 test3 implicit
4 0.3821253012184295 0.3926611796982167 sbert_baseline 4 test3 implicit
4 0.3744073973289669 0.3710562414266118 team031 5 test3 implicit
8 0.6191061939721301 0.5824759945130316 twente-bms-nlp 1 test3 implicit
8 0.4561503447358589 0.428840877914952 GESIS-DSM 2 test3 implicit
8 0.4073474290111054 0.3736282578875171 sövereign 3 test3 implicit
8 0.3974903953664807 0.3729423868312757 sbert_baseline 4 test3 implicit
8 0.3885170845229115 0.3621399176954732 team031 5 test3 implicit
16 0.6863927918686149 0.5422668038408779 twente-bms-nlp 1 test3 implicit
16 0.4849864909233201 0.3737139917695473 GESIS-DSM 2 test3 implicit
16 0.4592592159817502 0.3548525377229081 sövereign 3 test3 implicit
16 0.4317931596800455 0.3436213991769547 team031 4 test3 implicit
16 0.4205015577091178 0.3228737997256515 sbert_baseline 5 test3 implicit
20 0.7252156135941026 0.5239368998628258 twente-bms-nlp 1 test3 implicit
20 0.5020455177182803 0.3526748971193416 GESIS-DSM 2 test3 implicit
20 0.4849894963542152 0.3441700960219478 sövereign 3 test3 implicit
20 0.4577675273363525 0.3371056241426611 team031 4 test3 implicit
20 0.4364310782867878 0.3064471879286694 sbert_baseline 5 test3 implicit
avg 0.655203722676135 0.5598036694101509 twente-bms-nlp 1 test3 implicit
avg 0.4712083117455685 0.4029192386831275 GESIS-DSM 2 test3 implicit
avg 0.4359587021253862 0.364699074074074 sövereign 3 test3 implicit
avg 0.413121292217069 0.3534807956104252 team031 4 test3 implicit
avg 0.4091370831452039 0.3487311385459533 sbert_baseline 5 test3 implicit

Diversity

Leaderboard Diversity
team            mean_rank  mean(αNDCG)  rank
twente-bms-nlp  1.67       0.672        1
sövereign       2.22       0.601        2
GESIS-DSM       3.44       0.579        3
sbert_baseline  4.67       0.495        4
turiya          5.0        0.419        5
team031         5.78       0.394        6
boulderNLP      6.56       0.271        7
bm25_baseline   8.0        0.185        8
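For readers who want to work with the detailed diversity rows themselves, here is a minimal Python sketch of one way to aggregate them per team. It assumes the rows are whitespace-separated as shown on this page, and that a team's mean rank is the average of its team_rank over the "avg" rows of each test set and scenario. That aggregation rule is an assumption rather than something stated here, and the sketch does not model how teams absent from a scenario are ranked. The function names parse_row and mean_rank_per_team are illustrative only.

```python
from collections import defaultdict

# A few rows copied from the detailed diversity table (test1 only).
SAMPLE_ROWS = [
    "avg 0.9221315126930748 0.1428533247090426 sövereign 1 education gender test1 baseline",
    "avg 0.9096927755063242 0.1420877591267401 twente-bms-nlp 5 political_spectrum gender test1 baseline",
    "avg 0.8520458038436926 0.1811938241371546 twente-bms-nlp 1 political_spectrum Open Foreign Policy test1 explicit",
    "avg 0.8437955152294315 0.180618853897901 sövereign 2 political_spectrum Open Foreign Policy test1 explicit",
    "4 0.843143049430011 0.206811326664089 twente-bms-nlp 1 age Open Foreign Policy test1 explicit",
]


def parse_row(line):
    """Parse one row of the detailed diversity table.

    The two socioVar columns can contain spaces ("Open Foreign Policy"),
    so only the first five and the last two fields are read by position.
    """
    parts = line.split()
    return {
        "k": parts[0],                # "4", "8", "16", "20" or "avg"
        "alpha_ndcg": float(parts[1]),
        "kl_div": float(parts[2]),
        "team": parts[3],
        "team_rank": int(parts[4]),
        "test_set": parts[-2],
        "scenario": parts[-1],
    }


def mean_rank_per_team(lines):
    """Average team_rank over the 'avg' rows (assumed aggregation rule)."""
    ranks = defaultdict(list)
    for line in lines:
        row = parse_row(line)
        if row["k"] == "avg":
            ranks[row["team"]].append(row["team_rank"])
    return {team: sum(r) / len(r) for team, r in ranks.items()}


if __name__ == "__main__":
    print(mean_rank_per_team(SAMPLE_ROWS))
```

Run over only the sample rows, this averages two settings per team; reproducing the leaderboard values would require all rows for every test set and scenario.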
Final Results Diversity

Detailed Results for Diversity

k αNDCG@k klDiv@k team team_rank socioVar(lowest_α) socioVar(highest_α) test_set scenario
4 0.9019970009344244 0.1545358338681438 GESIS-DSM 1 political_spectrum Open Foreign Policy test1 baseline
4 0.901690015804786 0.1551472473081518 sbert_baseline 2 political_spectrum Open Foreign Policy test1 baseline
4 0.9001443511545081 0.1735484649913478 sövereign 3 political_spectrum gender test1 baseline
4 0.8974267304082445 0.1597936041696669 boulderNLP 4 age Expanded Welfare State test1 baseline
4 0.8902245883191952 0.1748546359011083 turiya 5 political_spectrum gender test1 baseline
4 0.881596836723969 0.1744801535869937 twente-bms-nlp 6 political_spectrum Open Foreign Policy test1 baseline
4 0.8364699126677203 0.1541542514654867 team031 7 civil_status gender test1 baseline
4 0.6726820197638872 0.1523011191459042 bm25_baseline 8 political_spectrum gender test1 baseline
8 0.913536118382498 0.1583148521004212 sövereign 1 political_spectrum gender test1 baseline
8 0.9089622315243664 0.1394203854946891 GESIS-DSM 2 political_spectrum gender test1 baseline
8 0.908762417677734 0.139903804902831 sbert_baseline 3 political_spectrum gender test1 baseline
8 0.90804629536743 0.1426355833585971 boulderNLP 4 political_spectrum Open Foreign Policy test1 baseline
8 0.8999893260995866 0.1582434862398295 twente-bms-nlp 5 political_spectrum gender test1 baseline
8 0.8963954326730816 0.1580407037821194 turiya 6 political_spectrum gender test1 baseline
8 0.8395067256513395 0.1380091227682019 team031 7 political_spectrum Liberal Society test1 baseline
8 0.6431872932833367 0.1360515419120727 bm25_baseline 8 political_spectrum gender test1 baseline
16 0.9334435269621736 0.1244070719171076 sövereign 1 education gender test1 baseline
16 0.924272584352686 0.1230319761817763 twente-bms-nlp 2 education gender test1 baseline
16 0.924070105264207 0.1061699588125747 GESIS-DSM 3 education gender test1 baseline
16 0.9239978191796326 0.106429241058285 sbert_baseline 4 education gender test1 baseline
16 0.9215736888963632 0.107496322829392 boulderNLP 5 education Open Foreign Policy test1 baseline
16 0.9134880134828716 0.1222881690885618 turiya 6 education gender test1 baseline
16 0.8499053702539047 0.1029188979728469 team031 7 education Liberal Society test1 baseline
16 0.6086479469897234 0.1027533544093416 bm25_baseline 8 education gender test1 baseline
20 0.9414020542731196 0.1151429098272939 sövereign 1 education gender test1 baseline
20 0.932912354849055 0.1125954204983612 twente-bms-nlp 2 education gender test1 baseline
20 0.929634808717093 0.0970301722088382 GESIS-DSM 3 education gender test1 baseline
20 0.9295570582523096 0.0972666613453144 sbert_baseline 4 education gender test1 baseline
20 0.923571804546576 0.0978047524775676 boulderNLP 5 education Open Foreign Policy test1 baseline
20 0.919242030402476 0.1112468329139299 turiya 6 education gender test1 baseline
20 0.8508566258944938 0.0927611297043296 team031 7 education Liberal Society test1 baseline
20 0.5927598871171057 0.0935480838185558 bm25_baseline 8 education gender test1 baseline
avg 0.9221315126930748 0.1428533247090426 sövereign 1 education gender test1 baseline
avg 0.9161660366100228 0.1242890875960614 GESIS-DSM 2 education gender test1 baseline
avg 0.9160018277286156 0.1246867386536455 sbert_baseline 3 education gender test1 baseline
avg 0.9126546298046534 0.1269325657088059 boulderNLP 4 education Open Foreign Policy test1 baseline
avg 0.9096927755063242 0.1420877591267401 twente-bms-nlp 5 political_spectrum gender test1 baseline
avg 0.904837516219406 0.1416075854214298 turiya 6 political_spectrum gender test1 baseline
avg 0.8441846586168645 0.1219608504777162 team031 7 education gender test1 baseline
avg 0.6293192867885132 0.1211635248214685 bm25_baseline 8 education gender test1 baseline
4 0.843143049430011 0.206811326664089 twente-bms-nlp 1 age Open Foreign Policy test1 explicit
4 0.8389643364349352 0.2062025477390115 sövereign 2 political_spectrum Open Foreign Policy test1 explicit
4 0.814501349866188 0.2019475524427494 GESIS-DSM 3 age Open Foreign Policy test1 explicit
4 0.6723072323331305 0.195715049435303 turiya 4 age gender test1 explicit
4 0.1988500982088648 0.1699434119445776 sbert_baseline 5 political_spectrum gender test1 explicit
4 0.1678368307523686 0.1541460741310489 team031 6 civil_status gender test1 explicit
8 0.8483257755666023 0.1937447559212075 twente-bms-nlp 1 political_spectrum Open Foreign Policy test1 explicit
8 0.8419795433215165 0.1931365771629827 sövereign 2 political_spectrum Open Foreign Policy test1 explicit
8 0.8090017369108808 0.1891929693342114 GESIS-DSM 3 political_spectrum Open Foreign Policy test1 explicit
8 0.667602303255406 0.1813940535047481 turiya 4 political_spectrum gender test1 explicit
8 0.2040679201709478 0.1538031471988926 sbert_baseline 5 political_spectrum gender test1 explicit
8 0.1701345417762519 0.1380034529038456 team031 6 education gender test1 explicit
16 0.8572370964146122 0.1660187434028118 twente-bms-nlp 1 education Open Foreign Policy test1 explicit
16 0.8470379390612589 0.1655245690369987 sövereign 2 education Open Foreign Policy test1 explicit
16 0.803539135546609 0.1626582524156065 GESIS-DSM 3 education Open Foreign Policy test1 explicit
16 0.6618105993488821 0.150733262478512 turiya 4 education gender test1 explicit
16 0.2127694884642832 0.1200146960388071 sbert_baseline 5 education gender test1 explicit
16 0.1739887825464037 0.1029195802371271 team031 6 education gender test1 explicit
20 0.8594772939635447 0.1582004705605105 twente-bms-nlp 1 education Open Foreign Policy test1 explicit
20 0.8472002421000155 0.1576117216526112 sövereign 2 education Open Foreign Policy test1 explicit
20 0.8000801802116122 0.1551870652724488 GESIS-DSM 3 education Open Foreign Policy test1 explicit
20 0.6586734598980795 0.1419128868928617 turiya 4 education gender test1 explicit
20 0.2149343712386582 0.1105472062194354 sbert_baseline 5 education gender test1 explicit
20 0.1753115166608778 0.0927666777509362 team031 6 education gender test1 explicit
avg 0.8520458038436926 0.1811938241371546 twente-bms-nlp 1 political_spectrum Open Foreign Policy test1 explicit
avg 0.8437955152294315 0.180618853897901 sövereign 2 political_spectrum Open Foreign Policy test1 explicit
avg 0.8067806006338225 0.177246459866254 GESIS-DSM 3 political_spectrum Open Foreign Policy test1 explicit
avg 0.6650983987088745 0.1674388130778561 turiya 4 political_spectrum gender test1 explicit
avg 0.2076554695206885 0.1385771153504281 sbert_baseline 5 political_spectrum gender test1 explicit
avg 0.1718179179339755 0.1219589462557394 team031 6 education gender test1 explicit
4 0.1883944407371643 0.1662334296479237 sövereign 1 political_spectrum gender test1 implicit
4 0.1804283405800343 0.1540634630028475 GESIS-DSM 2 political_spectrum gender test1 implicit
4 0.1803229481745985 0.1541849972369482 twente-bms-nlp 3 political_spectrum gender test1 implicit
4 0.1798008941970487 0.1552243541769757 sbert_baseline 4 political_spectrum gender test1 implicit
4 0.1678368307523686 0.1541460741310489 team031 5 civil_status gender test1 implicit
8 0.1945992841774712 0.1506246755581603 sövereign 1 political_spectrum gender test1 implicit
8 0.1867422790987336 0.1390363162004441 twente-bms-nlp 2 political_spectrum gender test1 implicit
8 0.1861153373975238 0.1393779815541836 sbert_baseline 3 political_spectrum gender test1 implicit
8 0.186044488827547 0.1389492008070555 GESIS-DSM 4 political_spectrum gender test1 implicit
8 0.1701345417762519 0.1380034529038456 team031 5 education gender test1 implicit
16 0.2044708160928928 0.1169708662516098 sövereign 1 education gender test1 implicit
16 0.1953274585532708 0.1061332388430287 twente-bms-nlp 2 education gender test1 implicit
16 0.193697029870305 0.1058750763274714 GESIS-DSM 3 education gender test1 implicit
16 0.1936789912736826 0.1068908937580698 sbert_baseline 4 education gender test1 implicit
16 0.1739887825464037 0.1029195802371271 team031 5 education gender test1 implicit
20 0.2075851107416589 0.1076349985074671 sövereign 1 education gender test1 implicit
20 0.1993615280044656 0.0971124604070055 twente-bms-nlp 2 education gender test1 implicit
20 0.1974470597394484 0.0979820296831711 sbert_baseline 3 education gender test1 implicit
20 0.1972990058244904 0.0967778383028306 GESIS-DSM 4 education gender test1 implicit
20 0.1753115166608778 0.0927666777509362 team031 5 education gender test1 implicit
avg 0.1987624129372968 0.1353659924912902 sövereign 1 education gender test1 implicit
avg 0.1904385534577671 0.1241167531718566 twente-bms-nlp 2 education gender test1 implicit
avg 0.1893672162755941 0.1239163946100512 GESIS-DSM 3 education gender test1 implicit
avg 0.1892605706519259 0.1248688147931 sbert_baseline 4 education gender test1 implicit
avg 0.1718179179339755 0.1219589462557394 team031 5 education gender test1 implicit
4 0.8596691448400074 0.1435006656471971 twente-bms-nlp 1 gender Expanded Welfare State test2 baseline
4 0.8439015332463649 0.1514892083342836 turiya 2 gender Expanded Welfare State test2 baseline
4 0.8150526930285257 0.1539618236581867 boulderNLP 3 gender Expanded Welfare State test2 baseline
4 0.8116776686203651 0.1644556389412739 sövereign 4 gender Expanded Welfare State test2 baseline
4 0.7914032818077227 0.1486246850822561 sbert_baseline 5 gender Enhanced Environmental Protection test2 baseline
4 0.7914032818077227 0.1486246850822561 GESIS-DSM 6 gender Enhanced Environmental Protection test2 baseline
4 0.742380242951213 0.1518197265065421 team031 7 gender Expanded Welfare State test2 baseline
4 0.7002734965293811 0.1555105770050212 bm25_baseline 8 gender Expanded Welfare State test2 baseline
8 0.8681035217911394 0.1288537766567586 twente-bms-nlp 1 civil_status Expanded Welfare State test2 baseline
8 0.8543248675626207 0.1357647713122422 turiya 2 civil_status Expanded Welfare State test2 baseline
8 0.8198267217302753 0.1495308767256653 sövereign 3 civil_status Expanded Welfare State test2 baseline
8 0.8190315478761963 0.1383028778057128 boulderNLP 4 civil_status Expanded Welfare State test2 baseline
8 0.792509114975895 0.1320143736633611 sbert_baseline 5 civil_status Enhanced Environmental Protection test2 baseline
8 0.792507734956548 0.1320143736633611 GESIS-DSM 6 civil_status Enhanced Environmental Protection test2 baseline
8 0.7468215781005241 0.1351293292872165 team031 7 civil_status Expanded Welfare State test2 baseline
8 0.7037460286662772 0.1383383132480902 bm25_baseline 8 civil_status Expanded Welfare State test2 baseline
16 0.8746004728307052 0.0983103476562093 twente-bms-nlp 1 education Expanded Welfare State test2 baseline
16 0.860004904444542 0.1032616457482554 turiya 2 education Expanded Welfare State test2 baseline
16 0.8353194789929267 0.1184975195317492 sövereign 3 education Expanded Welfare State test2 baseline
16 0.825617365881163 0.1056594539569529 boulderNLP 4 education Expanded Welfare State test2 baseline
16 0.7925623055675046 0.0990963527045277 sbert_baseline 5 education Enhanced Environmental Protection test2 baseline
16 0.7925226856312766 0.0990917075203069 GESIS-DSM 6 education Enhanced Environmental Protection test2 baseline
16 0.7577701557590385 0.10090614928308 team031 7 education Expanded Welfare State test2 baseline
16 0.6846931483708592 0.1022053665741982 bm25_baseline 8 education Expanded Welfare State test2 baseline
20 0.8792657028220507 0.0896169083360676 twente-bms-nlp 1 education Expanded Welfare State test2 baseline
20 0.8611917878841686 0.0940871105379612 turiya 2 education Expanded Welfare State test2 baseline
20 0.8393628295034755 0.1093269998474775 sövereign 3 education Expanded Welfare State test2 baseline
20 0.8284238464054676 0.0962846509830787 boulderNLP 4 education Expanded Welfare State test2 baseline
20 0.7970344209471913 0.0904525730200898 GESIS-DSM 5 education Enhanced Environmental Protection test2 baseline
20 0.7970037191362768 0.0904550778161378 sbert_baseline 6 education Enhanced Environmental Protection test2 baseline
20 0.7656635421934611 0.0915341509681246 team031 7 education Expanded Welfare State test2 baseline
20 0.670001414383212 0.0922582352817289 bm25_baseline 8 education Expanded Welfare State test2 baseline
avg 0.8704097105709756 0.1150704245740581 twente-bms-nlp 1 civil_status Expanded Welfare State test2 baseline
avg 0.854855773284424 0.1211506839831856 turiya 2 civil_status Expanded Welfare State test2 baseline
avg 0.8265466747117607 0.1354527587615414 sövereign 3 civil_status Expanded Welfare State test2 baseline
avg 0.8220313632978382 0.1235522016009827 boulderNLP 4 civil_status Expanded Welfare State test2 baseline
avg 0.7933696053718498 0.1175476223165706 sbert_baseline 5 civil_status Enhanced Environmental Protection test2 baseline
avg 0.7933670308356847 0.1175458348215034 GESIS-DSM 6 civil_status Enhanced Environmental Protection test2 baseline
avg 0.7531588797510592 0.1198473390112408 team031 7 education Expanded Welfare State test2 baseline
avg 0.6896785219874324 0.1220781230272596 bm25_baseline 8 civil_status Expanded Welfare State test2 baseline
4 0.7929190001856363 0.1899376264586104 sövereign 1 civil_status Expanded Welfare State test2 explicit
4 0.7784182051472814 0.1889393159352049 twente-bms-nlp 2 civil_status Expanded Welfare State test2 explicit
4 0.7205589695235514 0.1867786313250642 GESIS-DSM 3 civil_status Expanded Welfare State test2 explicit
4 0.6860395691066639 0.1843875930866665 turiya 4 civil_status Expanded Welfare State test2 explicit
4 0.1427502082247471 0.1546644515612977 sbert_baseline 5 civil_status Expanded Welfare State test2 explicit
4 0.1211223870063154 0.1517016327007973 team031 6 residence Expanded Welfare State test2 explicit
8 0.7979022943300329 0.1777119422929806 sövereign 1 political_spectrum Expanded Welfare State test2 explicit
8 0.7746522830610651 0.176547534384226 twente-bms-nlp 2 political_spectrum Expanded Welfare State test2 explicit
8 0.7077334881305005 0.1745630961227365 GESIS-DSM 3 political_spectrum Expanded Welfare State test2 explicit
8 0.6722091045460789 0.1718995120510371 turiya 4 political_spectrum Expanded Welfare State test2 explicit
8 0.1411414998589877 0.1388524011251983 sbert_baseline 5 political_spectrum Expanded Welfare State test2 explicit
8 0.1229367115305565 0.1350837375198916 team031 6 political_spectrum Expanded Welfare State test2 explicit
16 0.7938226928593625 0.1519399235810757 sövereign 1 education Expanded Welfare State test2 explicit
16 0.7679247369615424 0.1511405244524261 twente-bms-nlp 2 education Expanded Welfare State test2 explicit
16 0.6914898029416457 0.1490039833580322 GESIS-DSM 3 political_spectrum stance test2 explicit
16 0.6509426544799443 0.1467837377412502 turiya 4 political_spectrum Expanded Welfare State test2 explicit
16 0.141853618763209 0.1060758328469115 sbert_baseline 5 political_spectrum Expanded Welfare State test2 explicit
16 0.1274992112929576 0.1009351873862183 team031 6 political_spectrum age test2 explicit
20 0.7914850377926509 0.1446822242273996 sövereign 1 political_spectrum Expanded Welfare State test2 explicit
20 0.7649914048158978 0.1441680606758572 twente-bms-nlp 2 education Expanded Welfare State test2 explicit
20 0.6859634979815318 0.1418695271609526 GESIS-DSM 3 political_spectrum stance test2 explicit
20 0.6440424751343595 0.1400439556270465 turiya 4 political_spectrum Expanded Welfare State test2 explicit
20 0.1424850706547802 0.0967405350877738 sbert_baseline 5 political_spectrum Expanded Welfare State test2 explicit
20 0.1300528050458294 0.0915591547872168 team031 6 political_spectrum age test2 explicit
avg 0.7940322562919206 0.1660679291400165 sövereign 1 civil_status Expanded Welfare State test2 explicit
avg 0.7714966574964467 0.1651988588619285 twente-bms-nlp 2 civil_status Expanded Welfare State test2 explicit
avg 0.7014364396443074 0.1630538094916963 GESIS-DSM 3 civil_status Expanded Welfare State test2 explicit
avg 0.6633084508167616 0.1607786996265 turiya 4 political_spectrum Expanded Welfare State test2 explicit
avg 0.142057599375431 0.1240833051552953 sbert_baseline 5 civil_status Expanded Welfare State test2 explicit
avg 0.1254027787189147 0.119819928098531 team031 6 political_spectrum Expanded Welfare State test2 explicit
4 0.139705387432964 0.1528616531800585 twente-bms-nlp 1 civil_status Expanded Welfare State test2 implicit
4 0.1308316585212354 0.152955124174716 sbert_baseline 2 civil_status Enhanced Environmental Protection test2 implicit
4 0.129996724876443 0.1495210966741525 GESIS-DSM 3 civil_status Enhanced Environmental Protection test2 implicit
4 0.1261766066723165 0.1522607920091116 sövereign 4 civil_status Expanded Welfare State test2 implicit
4 0.1211223870063154 0.1517016327007973 team031 5 residence Expanded Welfare State test2 implicit
8 0.1416882698296257 0.1360492441914399 twente-bms-nlp 1 civil_status Expanded Welfare State test2 implicit
8 0.1304867250898901 0.1372173655384356 sbert_baseline 2 civil_status Enhanced Environmental Protection test2 implicit
8 0.1301752198826576 0.1329377136336857 GESIS-DSM 3 civil_status Enhanced Environmental Protection test2 implicit
8 0.1296490126302001 0.1384695655926301 sövereign 4 civil_status Expanded Welfare State test2 implicit
8 0.1229367115305565 0.1350837375198916 team031 5 political_spectrum Expanded Welfare State test2 implicit
16 0.1438225542489765 0.1025865368387708 twente-bms-nlp 1 civil_status Liberal Economic Policy test2 implicit
16 0.1342452062627152 0.1094855428326242 sövereign 2 political_spectrum Liberal Economic Policy test2 implicit
16 0.1326183205722217 0.0999199234579163 GESIS-DSM 3 civil_status Liberal Economic Policy test2 implicit
16 0.1309450108709812 0.103861235235479 sbert_baseline 4 civil_status Enhanced Environmental Protection test2 implicit
16 0.1274992112929576 0.1009351873862183 team031 5 political_spectrum age test2 implicit
20 0.145637931583728 0.0938155982677546 twente-bms-nlp 1 civil_status Liberal Economic Policy test2 implicit
20 0.1360512624569728 0.1013573502040684 sövereign 2 political_spectrum Liberal Economic Policy test2 implicit
20 0.1344431683787441 0.0912305045109255 GESIS-DSM 3 civil_status Liberal Economic Policy test2 implicit
20 0.1314880488736023 0.0946881388644727 sbert_baseline 4 civil_status Enhanced Environmental Protection test2 implicit
20 0.1300528050458294 0.0915591547872168 team031 5 political_spectrum age test2 implicit
avg 0.1427135357738235 0.1213282581195059 twente-bms-nlp 1 civil_status Expanded Welfare State test2 implicit
avg 0.1318083584275166 0.11840230956917 GESIS-DSM 2 civil_status Liberal Economic Policy test2 implicit
avg 0.1315305220055511 0.1253933126596085 sövereign 3 civil_status Expanded Welfare State test2 implicit
avg 0.1309378608389272 0.1221804659532758 sbert_baseline 4 civil_status Enhanced Environmental Protection test2 implicit
avg 0.1254027787189147 0.119819928098531 team031 5 political_spectrum Expanded Welfare State test2 implicit
4 0.838573720660759 0.245924222883813 twente-bms-nlp 1 education Open Foreign Policy test3 baseline
4 0.7062034091878469 0.1981405921630443 turiya 2 education Enhanced Environmental Protection test3 baseline
4 0.668843694725362 0.2099744898227741 boulderNLP 3 education Enhanced Environmental Protection test3 baseline
4 0.5958230137740126 0.1867553722037808 sövereign 4 education Enhanced Environmental Protection test3 baseline
4 0.5842710757331174 0.1843420948958528 sbert_baseline 5 education Enhanced Environmental Protection test3 baseline
4 0.5807451792475774 0.1819401437218415 GESIS-DSM 6 education Enhanced Environmental Protection test3 baseline
4 0.5436666519490195 0.1986322397092366 team031 7 civil_status Enhanced Environmental Protection test3 baseline
4 0.3173606982686766 0.1863613314643273 bm25_baseline 8 education Enhanced Environmental Protection test3 baseline
8 0.884603239340912 0.2300043782965247 twente-bms-nlp 1 education Enhanced Environmental Protection test3 baseline
8 0.7414192287778192 0.1823366260899847 turiya 2 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
8 0.7132469494696095 0.194039519977546 boulderNLP 3 education Enhanced Environmental Protection test3 baseline
8 0.6218871176004452 0.1691557934866204 sbert_baseline 4 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
8 0.6215732813548899 0.1671033644716565 GESIS-DSM 5 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
8 0.6033834251372624 0.1746287777699152 sövereign 6 education Enhanced Environmental Protection test3 baseline
8 0.5586184745404498 0.181328073516437 team031 7 age Enhanced Environmental Protection test3 baseline
8 0.3389441491101829 0.1683568504455137 bm25_baseline 8 stance Enhanced Environmental Protection test3 baseline
16 0.9006475848662722 0.1938279483630162 twente-bms-nlp 1 education Enhanced Environmental Protection test3 baseline
16 0.719450717369105 0.1588100607377114 boulderNLP 2 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
16 0.7084268640623587 0.1502352699010248 turiya 3 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
16 0.5959901639914733 0.1457405183765326 sövereign 4 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
16 0.5895481594110684 0.1346325594523309 sbert_baseline 5 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
16 0.5893366282285519 0.1333701743091621 GESIS-DSM 6 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
16 0.547826537851428 0.145647288202764 team031 7 gender Enhanced Environmental Protection test3 baseline
16 0.3586846476945486 0.1314771732715052 bm25_baseline 8 civil_status Enhanced Environmental Protection test3 baseline
20 0.8960997242257639 0.1833046610255597 twente-bms-nlp 1 education Enhanced Environmental Protection test3 baseline
20 0.7107502438947255 0.1487424492145065 boulderNLP 2 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
20 0.6906575872768002 0.1404262919395938 turiya 3 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
20 0.586181668663252 0.1367752280479396 sövereign 4 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
20 0.5767143070819051 0.124776552690353 sbert_baseline 5 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
20 0.5766504736775157 0.1237109402251022 GESIS-DSM 6 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
20 0.5517461538184697 0.135267172769409 team031 7 gender Enhanced Environmental Protection test3 baseline
20 0.3537741480402498 0.1210055081175744 bm25_baseline 8 civil_status Enhanced Environmental Protection test3 baseline
avg 0.8799810672734267 0.2132653026422284 twente-bms-nlp 1 education Enhanced Environmental Protection test3 baseline
avg 0.7116767723262063 0.1677846950234119 turiya 2 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
avg 0.7030729013647006 0.1778916299381345 boulderNLP 3 education Enhanced Environmental Protection test3 baseline
avg 0.5953445678915001 0.160974974099542 sövereign 4 education Enhanced Environmental Protection test3 baseline
avg 0.5931051649566341 0.1532267501312892 sbert_baseline 5 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
avg 0.5920763906271337 0.1515311556819406 GESIS-DSM 6 Restrictive Immigration Policy Enhanced Environmental Protection test3 baseline
avg 0.5504644545398417 0.1652186935494616 team031 7 age Enhanced Environmental Protection test3 baseline
avg 0.3421909107784145 0.1518002158247301 bm25_baseline 8 education Enhanced Environmental Protection test3 baseline
4 0.7996306387079166 0.269793699300543 twente-bms-nlp 1 education Law & Order test3 explicit
4 0.6916271116432543 0.2409603804524061 turiya 2 stance Liberal Economic Policy test3 explicit
4 0.6873816606887692 0.2414749213689018 sövereign 3 age Liberal Economic Policy test3 explicit
4 0.6544835515192873 0.2388165755249987 GESIS-DSM 4 education Liberal Economic Policy test3 explicit
4 0.374824204910662 0.1945566043271698 sbert_baseline 5 stance political_spectrum test3 explicit
4 0.3643597175216282 0.1968215666976877 team031 6 stance residence test3 explicit
8 0.7940492522789729 0.2625567977951125 twente-bms-nlp 1 education Law & Order test3 explicit
8 0.6786006312566784 0.2310641790901396 sövereign 2 age Liberal Economic Policy test3 explicit
8 0.6663649758103218 0.2278067762664941 turiya 3 stance Expanded Welfare State test3 explicit
8 0.6347465803306451 0.2278933963337442 GESIS-DSM 4 age Liberal Economic Policy test3 explicit
8 0.3876115408337925 0.1790408765818324 sbert_baseline 5 stance political_spectrum test3 explicit
8 0.3827839799050314 0.1825351890017625 team031 6 stance political_spectrum test3 explicit
16 0.7898573099009154 0.2482207784164836 twente-bms-nlp 1 education Law & Order test3 explicit
16 0.670160293045864 0.2085165888647449 sövereign 2 stance Liberal Economic Policy test3 explicit
16 0.641645063755123 0.1979783046324642 turiya 3 stance Expanded Welfare State test3 explicit
16 0.6147447558167222 0.2048740258050405 GESIS-DSM 4 gender education test3 explicit
16 0.4185064486233371 0.1507247976846235 team031 5 stance political_spectrum test3 explicit
16 0.4099401957592709 0.1444574677927211 sbert_baseline 6 stance political_spectrum test3 explicit
20 0.7897402318053337 0.2441724769627611 twente-bms-nlp 1 education Law & Order test3 explicit
20 0.6651831469115177 0.2016538145874079 sövereign 2 stance Liberal Economic Policy test3 explicit
20 0.6358453902909823 0.1889384022334226 turiya 3 stance age test3 explicit
20 0.6100692803760382 0.1982938659682189 GESIS-DSM 4 gender education test3 explicit
20 0.4388359116306663 0.1401321376837655 team031 5 stance education test3 explicit
20 0.4257042513604356 0.1340496468113222 sbert_baseline 6 stance political_spectrum test3 explicit
avg 0.7933193581732846 0.256185938118725 twente-bms-nlp 1 education Law & Order test3 explicit
avg 0.6753314329757074 0.2206773759777985 sövereign 2 stance Liberal Economic Policy test3 explicit
avg 0.6588706353749204 0.2139209658961967 turiya 3 stance Expanded Welfare State test3 explicit
avg 0.6285110420106732 0.2174694659080006 GESIS-DSM 4 gender Liberal Economic Policy test3 explicit
avg 0.4011215144201658 0.1675534227669598 team031 5 stance political_spectrum test3 explicit
avg 0.3995200482160402 0.1630261488782614 sbert_baseline 6 stance political_spectrum test3 explicit
4 0.5755541372802974 0.2148056473370537 twente-bms-nlp 1 stance political_spectrum test3 implicit
4 0.4201438874604862 0.1897275909642647 GESIS-DSM 2 stance political_spectrum test3 implicit
4 0.3845465571284123 0.1889371223684448 sövereign 3 stance residence test3 implicit
4 0.3673911690294011 0.1900036696005424 sbert_baseline 4 stance political_spectrum test3 implicit
4 0.3643597175216282 0.1968215666976877 team031 5 stance residence test3 implicit
8 0.6103716977626453 0.2016624842908093 twente-bms-nlp 1 stance political_spectrum test3 implicit
8 0.4446600935366286 0.1736534408164564 GESIS-DSM 2 stance political_spectrum test3 implicit
8 0.4048372964665679 0.1737114535443616 sövereign 3 stance political_spectrum test3 implicit
8 0.3899244680065674 0.1738337241513623 sbert_baseline 4 stance political_spectrum test3 implicit
8 0.3827839799050314 0.1825351890017625 team031 5 stance political_spectrum test3 implicit
16 0.6645327220194603 0.1738079614532192 twente-bms-nlp 1 stance political_spectrum test3 implicit
16 0.4686411687348337 0.1382828642007774 GESIS-DSM 2 stance political_spectrum test3 implicit
16 0.4455751480354461 0.1437362823383521 sövereign 3 stance political_spectrum test3 implicit
16 0.4185064486233371 0.1507247976846235 team031 4 stance political_spectrum test3 implicit
16 0.4097824464274244 0.1385120668628046 sbert_baseline 5 stance political_spectrum test3 implicit
20 0.6940811563261714 0.1655064601759661 twente-bms-nlp 1 stance political_spectrum test3 implicit
20 0.4814478114960005 0.1281961532665239 GESIS-DSM 2 stance political_spectrum test3 implicit
20 0.4654077766944569 0.1347446335565081 sövereign 3 stance political_spectrum test3 implicit
20 0.4388359116306663 0.1401321376837655 team031 4 stance education test3 implicit
20 0.4219442250692948 0.1277438130032451 sbert_baseline 5 stance political_spectrum test3 implicit
avg 0.6361349283471436 0.1889456383142621 twente-bms-nlp 1 stance political_spectrum test3 implicit
avg 0.4537232403069873 0.1574650123120056 GESIS-DSM 2 stance political_spectrum test3 implicit
avg 0.4250916945812208 0.1602823729519166 sövereign 3 stance political_spectrum test3 implicit
avg 0.4011215144201658 0.1675534227669598 team031 4 stance political_spectrum test3 implicit
avg 0.3972605771331719 0.1575233184044886 sbert_baseline 5 stance political_spectrum test3 implicit
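
The rows above are whitespace-separated, so the multi-word labels (for example "Expanded Welfare State") cannot be recovered with a plain split. The following is a minimal sketch of how one such row could be parsed. It rests on assumptions read off the rows themselves, not on the organizers' tooling: each row holds a cutoff (or "avg"), two scores, a single-token team name, a rank, two labels, a test set, and a scenario. The field names (`score_a`, `label_a`, and so on) are hypothetical placeholders, since the column headers are not repeated here.

```python
from dataclasses import dataclass

# Multi-word labels that occur in the rows above (assumed to be the full set).
# Single-token labels such as "stance" or "age" need no special handling.
MULTI_WORD_LABELS = (
    "Expanded Welfare State",
    "Enhanced Environmental Protection",
    "Liberal Economic Policy",
    "Restrictive Immigration Policy",
    "Open Foreign Policy",
    "Law & Order",
)


@dataclass
class Row:
    # Field names are hypothetical; the source table's headers are not shown here.
    k: str            # "4", "8", "16", "20" or "avg"
    score_a: float
    score_b: float
    team: str
    rank: int
    label_a: str
    label_b: str
    test_set: str
    scenario: str


def split_labels(tokens):
    """Group tokens into labels, joining known multi-word labels."""
    labels = []
    i = 0
    while i < len(tokens):
        for label in MULTI_WORD_LABELS:
            words = label.split()
            if tokens[i:i + len(words)] == words:
                labels.append(label)
                i += len(words)
                break
        else:
            # No multi-word label starts here: treat the token as a label itself.
            labels.append(tokens[i])
            i += 1
    return labels


def parse_row(line):
    """Parse one leaderboard row under the layout assumed above."""
    tokens = line.split()
    k, score_a, score_b, team, rank = tokens[:5]
    test_set, scenario = tokens[-2:]
    labels = split_labels(tokens[5:-2])
    if len(labels) != 2:
        raise ValueError(f"expected two labels, got {labels!r}")
    return Row(k, float(score_a), float(score_b), team, int(rank),
               labels[0], labels[1], test_set, scenario)
```

Calling `parse_row` on a row copied from the table returns a `Row` whose two label fields keep multi-word names intact. Rows with a label outside `MULTI_WORD_LABELS` or a team name containing spaces would need the sketch to be extended.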

Organizing Committee

Policy

We abide by the ACL anti-harassment policy.