Argument Mining Shared Task 2024

Latest News:

Last chance to participate: the deadline for final submissions is 17 May, 11:59 pm UTC-12 (“anywhere on Earth”). Final results will be posted shortly after the submission deadline. Submission link for the system description paper: OpenReview

Registration and more:

Registration

You can register for the shared task using this registration form

Data Download

You can download the data and evaluation script from this repository

Communication

Join our Slack channel for information or to ask questions here

Or write us an e-mail at perspectiveargretrieval@gmail.com

About

The "Perspective Argument Retrieval" task addresses the often-overlooked challenge of incorporating socio-cultural factors (such as political views, occupation, age, gender) into argument retrieval. By focusing on these aspects, we acknowledge their potential latent influence on argumentation.

With this shared task, we invite the community to develop methods that concentrate on this crucial area and advance state-of-the-art retrieval models by considering the perspective of societal diversity.

Task Description

Argument retrieval is the task of retrieving the top-k relevant arguments from a corpus for a given query. With this shared task, we formulate perspective argument retrieval as an extension of argument retrieval that takes socio-cultural factors into account. Concretely, the task defines three scenarios of varying difficulty for considering socio-cultural profiles. With them, we want to foster approaches that take into account latent aspects of argumentation beyond semantic features, such as personal attitude. We consider these aspects both during retrieval and during evaluation.
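To make top-k retrieval concrete, the sketch below ranks a toy corpus with Okapi BM25, a classic lexical retrieval function. It is not the bm25_baseline from the leaderboards: the tokenization, the IDF variant, the parameters k1 = 1.5 and b = 0.75, and the second (transport) argument in the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def bm25_top_k(query: str, corpus: list[str], k: int = 4,
               k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return the corpus indices of the k best-scoring arguments under Okapi BM25."""
    docs = [tokenize(d) for d in corpus]
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Document frequency: number of arguments containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms & set(tf):
            # Smoothed IDF; the "+ 1" keeps it positive for very common terms.
            idf = math.log((len(docs) - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((score, idx))
    # Highest score first; ties keep corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


corpus = [
    "The reduction of sugar in food should be pushed. Not every food needs additional sugar as a supplement.",
    "Public transport should be expanded in rural areas.",
    "Eating is an individual decision. It doesn't need a nanny state.",
]
query = "Are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?"
print(bm25_top_k(query, corpus, k=1))  # [0]
```

In this toy run, the third argument (the implicit-scenario example further down) ranks below the unrelated transport argument because it shares almost no words with the query, which illustrates why purely lexical matching struggles with latently expressed positions.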

Retrieval Scenarios

Scenario 1: Baseline. Retrieve the relevant arguments for a given query from a given corpus. This scenario evaluates the general ability of a system to retrieve relevant arguments.

Example query: Are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?

Example relevant candidates: The reduction of sugar in food should be pushed. Not every food needs additional sugar as a supplement.

Scenario 2: Explicit perspectivism. This scenario extends the baseline task by explicitly adding socio-cultural information to both the query and the corpus, and by limiting the relevant candidates to arguments whose authors match the queried socio-cultural background. It tests whether a retrieval system can take socio-cultural properties into account when they are explicitly mentioned in the query and the candidates.

Example query: Given a left attitude, are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?

Example relevant candidates: With a left attitude, reducing sugar in food should be pushed. Not every food needs additional sugar as a supplement.
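The relevance criterion of the explicit scenario can be sketched as a filter: an argument counts as relevant if it addresses the queried issue and its author matches every socio-cultural property named in the query. The record layout (Argument, issue_id, demographics), the property name political_spectrum, the toy arguments, and the query template are hypothetical illustrations modeled on the examples above, not the official data format.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    # Hypothetical record layout; the released files may use different field names.
    arg_id: str
    issue_id: str  # the political issue (query) the argument addresses
    text: str
    demographics: dict[str, str] = field(default_factory=dict)


def explicit_query(question: str, attitude: str) -> str:
    """Prefix a baseline question with the queried attitude (template modeled on the example)."""
    return f"Given a {attitude} attitude, {question[0].lower()}{question[1:]}"


def explicit_relevant_ids(arguments, issue_id, profile):
    """IDs of arguments on issue_id whose author matches every queried property."""
    return [
        a.arg_id
        for a in arguments
        if a.issue_id == issue_id
        and all(a.demographics.get(key) == value for key, value in profile.items())
    ]


arguments = [
    Argument("a1", "sugar_tax", "Reducing sugar in food should be pushed.", {"political_spectrum": "left"}),
    Argument("a2", "sugar_tax", "Eating is an individual decision.", {"political_spectrum": "right"}),
    Argument("a3", "transport", "Public transport should be expanded.", {"political_spectrum": "left"}),
]
print(explicit_query("Are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?", "left"))
print(explicit_relevant_ids(arguments, "sugar_tax", {"political_spectrum": "left"}))  # ['a1']
```

In the implicit scenario, the same relevance judgment applies, but the demographics would not be visible to the retrieval system in the candidate texts.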

Scenario 3: Implicit perspectivism. This scenario is similar to explicit perspectivism, but socio-cultural information is added only to the query. It therefore tests whether a retrieval system can account for socio-cultural information that is encoded only latently in the arguments.

Example query: Given a liberal attitude, are you in favor of the introduction of a tax on foods containing sugar (sugar tax)?

Example relevant candidates: Eating is an individual decision. It doesn't need a nanny state.

Retrieval Evaluation

For all three scenarios, we employ a two-fold evaluation to measure retrieval quality comprehensively. Concretely, we distinguish between relevance and diversity/fairness:

Relevance measures the ability of a retrieval system to select the relevant candidates, that is, all arguments addressing the queried issue in the baseline scenario, or the arguments that additionally match the queried demographic properties in the explicit and implicit perspectivism scenarios. The leaderboards report it as nDCG@k and precision@k.

Diversity accounts for the influence of perspectivism in the evaluation by measuring to what extent a retrieval system diversifies the relevant arguments with respect to their stance distribution or other socio-cultural factors, such as age or education. The leaderboards report it as α-NDCG@k and klDiv@k.
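The leaderboards on this page report relevance as nDCG@k and precision@k and diversity as α-NDCG@k and klDiv@k. The following is a minimal, unofficial sketch of how such metrics are commonly computed. It assumes binary relevance, α = 0.5, one "nugget" per argument (its author's value for a single socio-cultural variable, e.g. gender), a greedily built ideal ranking, and KL divergence measured from a target distribution (for example, the variable's distribution among all relevant arguments) to the top-k distribution. All function names are hypothetical; the evaluation script in the data repository is authoritative and may differ in any of these choices.

```python
import math
from collections import Counter


def precision_at_k(ranked, relevant, k):
    """Share of the top-k retrieved argument IDs that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary gains and a log2(position + 1) discount."""
    dcg = sum(1 / math.log2(i + 2) for i, doc in enumerate(ranked[:k]) if doc in relevant)
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0


def alpha_dcg(labels, k, alpha):
    """alpha-DCG@k where each relevant item covers one nugget (its label).

    None marks a non-relevant item: it takes up a rank but earns no gain.
    """
    seen, score = Counter(), 0.0
    for i, label in enumerate(labels[:k]):
        if label is None:
            continue
        # Each repeat of an already-seen label is discounted by (1 - alpha).
        score += (1 - alpha) ** seen[label] / math.log2(i + 2)
        seen[label] += 1
    return score


def alpha_ndcg_at_k(labels, pool_labels, k, alpha=0.5):
    """Normalize alpha-DCG by a greedy ideal ranking over the relevant pool."""
    remaining, seen, ideal = list(pool_labels), Counter(), []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda lab: (1 - alpha) ** seen[lab])
        remaining.remove(best)
        ideal.append(best)
        seen[best] += 1
    best_score = alpha_dcg(ideal, k, alpha)
    return alpha_dcg(labels, k, alpha) / best_score if best_score else 0.0


def kl_at_k(labels, target, k, eps=1e-6):
    """KL(target || top-k label distribution); 0 means the top-k matches the target."""
    counts = Counter(lab for lab in labels[:k] if lab is not None)
    total = sum(counts.values())
    return sum(
        p * math.log(p / max(counts[lab] / total if total else 0.0, eps))
        for lab, p in target.items()
        if p > 0
    )
```

Higher α-NDCG and lower KL divergence both indicate a top-k list whose socio-cultural make-up is more balanced; the eps floor merely keeps the divergence finite when a group is missing from the top-k.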

Data

The data and evaluation script can be downloaded from this repository

This shared task is based on the x-stance dataset (Vamvas & Sennrich, 2020), which provides arguments on various political issues, annotated with their stance and collected from the voting advice platform https://www.smartvote.ch/. This platform gives voting recommendations based on a questionnaire that both politicians and voters fill out. In it, politicians can explain why they are in favor of or against specific political issues.

We use the arguments from the 2019 Swiss federal elections as the corpus and the political issues as queries. We then enrich these arguments with eight socio-cultural properties, either provided by the voting platform itself (gender, age, party, …) or derived from the politicians' questionnaire answers (political attitude, important political aspects, …). The resulting collection comprises 26,335 arguments on 45 political issues in German, French, and Italian.

We create the training and development splits from 35 political issues for training and 10 for development, while the same argument corpus is shared by both splits. Besides the queries for the baseline scenario, we also provide queries for the perspectivism scenarios that include socio-cultural information. Since the x-stance dataset is publicly available, the final evaluation uses secret test sets.

Submission Policy

You may use any external data source for pre-training your models. However, we do not accept submissions that use proprietary LLMs (e.g., GPT-4). To avoid data leakage, please do not enter any of the data into the ChatGPT online interface. You may use any open-source LLM; have a look at this website for a list of frequently used open-source LLMs.

There are three test sets for the evaluation of the shared task: the first is drawn from the 2019 election, the second from the 2023 election, and the third is a surprise test set. The final evaluation will take place on 18 May. You can submit all final predictions until 17 May, 11:59 pm UTC-12 (“anywhere on Earth”). If you upload predictions earlier, you will see their results on the leaderboard after each evaluation cycle (see Important Dates). You can change your predictions for all test sets and scenarios until the final deadline. You can also submit partial results; keep in mind, however, that the final ranking considers all predictions, averaged across all test sets and scenarios.

Important Dates

Release Training Data: 4 March
Release Test Set 1: 17 March
First Evaluation Cycle (Test set 1): 24 April
Second Evaluation Cycle (Test set 2): 2 May
Third Evaluation Cycle (Surprise Test set): 13 May
Submission Final Run: 17 May
Evaluation Final Run: 18 May
Submission System Description Paper: 27 May
Camera-ready papers due: 1 July
Workshop / Shared Task: 15 August 2024

All deadlines are at 11:59 pm UTC-12 (“anywhere on Earth”).
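Because "anywhere on Earth" is the fixed offset UTC-12, a deadline has passed everywhere only at 11:59 am UTC on the following day. A small sketch with Python's standard datetime module; the helper name deadline_in_utc is hypothetical, and the year 2024 is assumed for all dates in the list above.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12))


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of an 11:59 pm AoE deadline on the given date."""
    return datetime(year, month, day, 23, 59, tzinfo=AOE).astimezone(timezone.utc)


# Final run submission (17 May; the year 2024 is assumed).
print(deadline_in_utc(2024, 5, 17))  # 2024-05-18 11:59:00+00:00
```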

System Description Papers

Any team that has submitted runs, including partial ones, is eligible to submit a system description paper. Papers will undergo peer review and, upon acceptance, will be published in the Proceedings of the Argument Mining Workshop at ACL. Note that registration for the workshop is not required for publication. However, if you do register, you will have the opportunity to present your paper at the workshop, either in person or virtually. For guidance on the content of a system description paper, please refer to the SemEval shared task guidelines: https://semeval.github.io/paper-requirements.html (see "contents").

Page Limitation

Papers submitted to the shared task must not exceed 4 pages, plus at most 1 page for ethical considerations; references and appendices are unlimited. Upon acceptance, authors will be granted one additional page to address the reviewers' comments.

Reviewing

We use OpenReview for submission and reviewing. Please register on the platform if you do not have an account yet. If you submit a paper, you are also expected to review other shared task papers. If you already have an account, please send us an e-mail with the address registered on OpenReview so that we can register you as a reviewer in advance. Submission link for the system description paper: OpenReview

Leaderboards

First Evaluation Cycle (Test set 1, election 2019)

Relevance - Scenario Baseline

k	ndcg@k	precision@k	team	team_rank
4	0.990189	0.988889	sbert_baseline	1
4	0.990189	0.988889	GESIS-DSM	1
4	0.982115	0.983333	Twente-BMS-NLP	2
4	0.716184	0.683333	bm25_baseline	3
8	0.987593	0.986111	sbert_baseline	1
8	0.987593	0.986111	GESIS-DSM	1
8	0.986639	0.988889	Twente-BMS-NLP	2
8	0.671677	0.636111	bm25_baseline	3
16	0.988446	0.990278	Twente-BMS-NLP	1
16	0.983255	0.980556	sbert_baseline	2
16	0.983255	0.980556	GESIS-DSM	2
16	0.619426	0.579167	bm25_baseline	3
20	0.988492	0.990000	Twente-BMS-NLP	1
20	0.981093	0.977778	sbert_baseline	2
20	0.981093	0.977778	GESIS-DSM	2
20	0.596877	0.553333	bm25_baseline	3
avg	0.986423	0.988125	Twente-BMS-NLP	1
avg	0.985532	0.983333	sbert_baseline	2
avg	0.985532	0.983333	GESIS-DSM	2
avg	0.651041	0.612986	bm25_baseline	3

Diversity - Scenario Baseline

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.901997	0.154536	GESIS-DSM	1	political_spectrum	Open Foreign Policy
4	0.901690	0.155147	sbert_baseline	2	political_spectrum	Open Foreign Policy
4	0.880867	0.174923	Twente-BMS-NLP	3	political_spectrum	Open Foreign Policy
4	0.672682	0.152301	bm25_baseline	4	political_spectrum	gender
8	0.908962	0.139420	GESIS-DSM	1	political_spectrum	gender
8	0.908762	0.139904	sbert_baseline	2	political_spectrum	gender
8	0.899820	0.158746	Twente-BMS-NLP	3	political_spectrum	gender
8	0.643187	0.136052	bm25_baseline	4	political_spectrum	gender
16	0.924070	0.106170	GESIS-DSM	1	education	gender
16	0.923998	0.106429	sbert_baseline	2	education	gender
16	0.923574	0.123654	Twente-BMS-NLP	3	education	gender
16	0.608648	0.102753	bm25_baseline	4	education	gender
20	0.931639	0.113165	Twente-BMS-NLP	1	education	gender
20	0.929635	0.097030	GESIS-DSM	2	education	gender
20	0.929557	0.097267	sbert_baseline	3	education	gender
20	0.592760	0.093548	bm25_baseline	4	education	gender
avg	0.916166	0.124289	GESIS-DSM	1	education	gender
avg	0.916002	0.124687	sbert_baseline	2	education	gender
avg	0.908975	0.142622	Twente-BMS-NLP	3	political_spectrum	Open Foreign Policy
avg	0.629319	0.121164	bm25_baseline	4	education	gender

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Explicit

k	ndcg@k	precision@k	team	team_rank
4	0.853129	0.784139	Twente-BMS-NLP	1
4	0.217063	0.216921	GESIS-DSM	2
4	0.210929	0.209182	sbert_baseline	3
8	0.826973	0.699215	Twente-BMS-NLP	1
8	0.220313	0.215066	GESIS-DSM	2
8	0.210107	0.206796	sbert_baseline	3
16	0.806304	0.613947	Twente-BMS-NLP	1
16	0.219608	0.206292	GESIS-DSM	2
16	0.211977	0.205126	sbert_baseline	3
20	0.798233	0.584441	Twente-BMS-NLP	1
20	0.220142	0.204156	GESIS-DSM	2
20	0.211292	0.201972	sbert_baseline	3
avg	0.821159	0.670436	Twente-BMS-NLP	1
avg	0.219281	0.210609	GESIS-DSM	2
avg	0.211076	0.205769	sbert_baseline	3

Diversity - Scenario Explicit

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.803502	0.201835	Twente-BMS-NLP	1	age	Open Foreign Policy
4	0.198266	0.161001	GESIS-DSM	2	political_spectrum	Open Foreign Policy
4	0.193978	0.169943	sbert_baseline	3	political_spectrum	gender
8	0.795288	0.188889	Twente-BMS-NLP	1	political_spectrum	Open Foreign Policy
8	0.203975	0.145292	GESIS-DSM	2	political_spectrum	gender
8	0.196950	0.153803	sbert_baseline	3	political_spectrum	gender
16	0.788913	0.161965	Twente-BMS-NLP	1	education	gender
16	0.207447	0.112282	GESIS-DSM	2	education	gender
16	0.202131	0.120015	sbert_baseline	3	education	gender
20	0.784744	0.154346	Twente-BMS-NLP	1	education	gender
20	0.208944	0.103003	GESIS-DSM	2	education	gender
20	0.202647	0.110547	sbert_baseline	3	education	gender
avg	0.793112	0.176759	Twente-BMS-NLP	1	political_spectrum	Open Foreign Policy
avg	0.204658	0.130394	GESIS-DSM	2	political_spectrum	gender
avg	0.198927	0.138577	sbert_baseline	3	education	gender

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k	ndcg@k	precision@k	team	team_rank
4	0.197831	0.198685	GESIS-DSM	1
4	0.195035	0.196141	sbert_baseline	2
8	0.199804	0.200594	sbert_baseline	1
8	0.198524	0.197360	GESIS-DSM	2
16	0.205488	0.201150	GESIS-DSM	1
16	0.205072	0.202502	sbert_baseline	2
20	0.208600	0.204114	sbert_baseline	1
20	0.207223	0.201124	GESIS-DSM	2
avg	0.202267	0.199580	GESIS-DSM	1
avg	0.202128	0.200838	sbert_baseline	2

Diversity - Scenario Implicit

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.182819	0.160637	GESIS-DSM	1	political_spectrum	gender
4	0.179801	0.155224	sbert_baseline	2	political_spectrum	gender
8	0.186279	0.144410	GESIS-DSM	1	political_spectrum	gender
8	0.186115	0.139378	sbert_baseline	2	political_spectrum	gender
16	0.194678	0.110474	GESIS-DSM	1	education	gender
16	0.193679	0.106891	sbert_baseline	2	education	gender
20	0.197447	0.097982	sbert_baseline	1	education	gender
20	0.197116	0.100929	GESIS-DSM	2	education	gender
avg	0.190223	0.129112	GESIS-DSM	1	political_spectrum	gender
avg	0.189261	0.124869	sbert_baseline	2	education	gender

Second Evaluation Cycle (Test set 2, election 2023)

Relevance - Scenario Baseline

k	ndcg@k	precision@k	team	team_rank
4	0.903527	0.912500	boulder_NLP	1
4	0.883713	0.887500	sbert_baseline	2
4	0.851954	0.850000	GESIS-DSM	3
4	0.775601	0.775000	bm25_baseline	4
4	0.768397	0.781250	TWENTE-BMS-NLP	5
8	0.892774	0.893750	boulder_NLP	1
8	0.863010	0.856250	sbert_baseline	2
8	0.840808	0.834375	GESIS-DSM	3
8	0.768952	0.775000	TWENTE-BMS-NLP	4
8	0.764368	0.759375	bm25_baseline	5
16	0.874400	0.867188	boulder_NLP	1
16	0.837641	0.823438	sbert_baseline	2
16	0.811359	0.795312	GESIS-DSM	3
16	0.755486	0.751563	TWENTE-BMS-NLP	4
16	0.717200	0.693750	bm25_baseline	5
20	0.869493	0.861250	boulder_NLP	1
20	0.836021	0.823750	sbert_baseline	2
20	0.803148	0.786250	GESIS-DSM	3
20	0.755566	0.752500	TWENTE-BMS-NLP	4
20	0.692677	0.661250	bm25_baseline	5
avg	0.885048	0.883672	boulder_NLP	1
avg	0.855096	0.847734	sbert_baseline	2
avg	0.826817	0.816484	GESIS-DSM	3
avg	0.762100	0.765078	TWENTE-BMS-NLP	4
avg	0.737462	0.722344	bm25_baseline	5

Diversity - Scenario Baseline

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.843193	0.190554	boulder_NLP	1	gender	Open Foreign Policy
4	0.825927	0.183359	sbert_baseline	2	gender	Liberal Society
4	0.802393	0.189029	GESIS-DSM	3	gender	Expanded Welfare State
4	0.729309	0.194654	bm25_baseline	4	gender	Expanded Welfare State
4	0.715027	0.175067	TWENTE-BMS-NLP	5	gender	Enhanced Environmental Protection
8	0.835625	0.184543	boulder_NLP	1	civil_status	Open Foreign Policy
8	0.810864	0.175428	sbert_baseline	2	civil_status	Liberal Society
8	0.793244	0.181590	GESIS-DSM	3	civil_status	Expanded Welfare State
8	0.720015	0.186390	bm25_baseline	4	civil_status	Expanded Welfare State
8	0.717539	0.167862	TWENTE-BMS-NLP	5	civil_status	Enhanced Environmental Protection
16	0.835437	0.170560	boulder_NLP	1	education	Open Foreign Policy
16	0.803294	0.158082	sbert_baseline	2	education	Liberal Society
16	0.780678	0.163216	GESIS-DSM	3	education	Expanded Welfare State
16	0.719560	0.150388	TWENTE-BMS-NLP	4	education	Enhanced Environmental Protection
16	0.693180	0.163362	bm25_baseline	5	education	Expanded Welfare State
20	0.836555	0.166120	boulder_NLP	1	education	Open Foreign Policy
20	0.806363	0.153383	sbert_baseline	2	education	Liberal Society
20	0.778455	0.157177	GESIS-DSM	3	education	Expanded Welfare State
20	0.724297	0.144618	TWENTE-BMS-NLP	4	education	Enhanced Environmental Protection
20	0.676614	0.154637	bm25_baseline	5	education	Expanded Welfare State
avg	0.837703	0.177944	boulder_NLP	1	civil_status	Open Foreign Policy
avg	0.811612	0.167563	sbert_baseline	2	civil_status	Liberal Society
avg	0.788693	0.172753	GESIS-DSM	3	civil_status	Enhanced Environmental Protection
avg	0.719106	0.159484	TWENTE-BMS-NLP	4	civil_status	Enhanced Environmental Protection
avg	0.704779	0.174761	bm25_baseline	5	civil_status	Enhanced Environmental Protection

Relevance - Scenario Explicit

k	ndcg@k	precision@k	team	team_rank
4	0.778944	0.693322	sövereign	1
4	0.746718	0.664001	TWENTE-BMS-NLP	2
4	0.719577	0.634820	GESIS-DSM	3
4	0.152074	0.149411	sbert_baseline	4
8	0.753616	0.601010	sövereign	1
8	0.717243	0.567691	TWENTE-BMS-NLP	2
8	0.688722	0.539913	GESIS-DSM	3
8	0.147092	0.140853	sbert_baseline	4
16	0.723055	0.494353	sövereign	1
16	0.688206	0.466716	TWENTE-BMS-NLP	2
16	0.661889	0.445391	GESIS-DSM	3
16	0.145617	0.134961	sbert_baseline	4
20	0.713903	0.457632	sövereign	1
20	0.680463	0.433698	TWENTE-BMS-NLP	2
20	0.654631	0.413215	GESIS-DSM	3
20	0.145981	0.133109	sbert_baseline	4
avg	0.742379	0.561579	sövereign	1
avg	0.708157	0.533026	TWENTE-BMS-NLP	2
avg	0.681205	0.508335	GESIS-DSM	3
avg	0.147691	0.139583	sbert_baseline	4

Diversity - Scenario Explicit

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.762587	0.204173	sövereign	1	civil_status	Enhanced Environmental Protection
4	0.731383	0.201999	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
4	0.708314	0.201885	GESIS-DSM	3	civil_status	Enhanced Environmental Protection
4	0.148394	0.188384	sbert_baseline	4	civil_status	Open Foreign Policy
8	0.742925	0.195974	sövereign	1	political_spectrum	Enhanced Environmental Protection
8	0.707982	0.193850	TWENTE-BMS-NLP	2	political_spectrum	Enhanced Environmental Protection
8	0.683388	0.193435	GESIS-DSM	3	political_spectrum	Enhanced Environmental Protection
8	0.144145	0.180716	sbert_baseline	4	political_spectrum	Open Foreign Policy
16	0.720318	0.176849	sövereign	1	political_spectrum	Enhanced Environmental Protection
16	0.685955	0.174981	TWENTE-BMS-NLP	2	education	Enhanced Environmental Protection
16	0.662275	0.174012	GESIS-DSM	3	education	Enhanced Environmental Protection
16	0.143560	0.163211	sbert_baseline	4	political_spectrum	Open Foreign Policy
20	0.712741	0.170729	sövereign	1	political_spectrum	Enhanced Environmental Protection
20	0.679465	0.169039	TWENTE-BMS-NLP	2	political_spectrum	Enhanced Environmental Protection
20	0.655853	0.167952	GESIS-DSM	3	education	Enhanced Environmental Protection
20	0.143953	0.157795	sbert_baseline	4	political_spectrum	Open Foreign Policy
avg	0.734643	0.186931	sövereign	1	political_spectrum	Enhanced Environmental Protection
avg	0.701196	0.184967	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
avg	0.677457	0.184321	GESIS-DSM	3	civil_status	Enhanced Environmental Protection
avg	0.145013	0.172527	sbert_baseline	4	civil_status	Open Foreign Policy

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k	ndcg@k	precision@k	team	team_rank
4	0.138857	0.134961	sbert_baseline	1
4	0.135854	0.135943	TWENTE-BMS-NLP	2
4	0.133892	0.133418	GESIS-DSM	3
8	0.135941	0.131453	sbert_baseline	1
8	0.133384	0.131594	TWENTE-BMS-NLP	2
8	0.131639	0.127736	GESIS-DSM	3
16	0.134325	0.126578	sbert_baseline	1
16	0.134060	0.127666	TWENTE-BMS-NLP	2
16	0.131696	0.123422	GESIS-DSM	3
20	0.135666	0.127301	TWENTE-BMS-NLP	1
20	0.134688	0.124804	sbert_baseline	2
20	0.132429	0.122447	GESIS-DSM	3
avg	0.135953	0.129449	sbert_baseline	1
avg	0.134741	0.130626	TWENTE-BMS-NLP	2
avg	0.132414	0.126755	GESIS-DSM	3

Diversity - Scenario Implicit

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.136044	0.187736	sbert_baseline	1	civil_status	Liberal Society
4	0.130388	0.187460	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
4	0.128952	0.191469	GESIS-DSM	3	civil_status	Expanded Welfare State
8	0.133246	0.179940	sbert_baseline	1	civil_status	Liberal Economic Policy
8	0.128856	0.180906	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
8	0.127276	0.183187	GESIS-DSM	3	civil_status	Expanded Welfare State
16	0.132503	0.162401	sbert_baseline	1	civil_status	Liberal Economic Policy
16	0.130391	0.166958	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
16	0.128243	0.164174	GESIS-DSM	3	civil_status	Expanded Welfare State
20	0.132824	0.156980	sbert_baseline	1	civil_status	Liberal Economic Policy
20	0.131905	0.163094	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
20	0.129197	0.158113	GESIS-DSM	3	civil_status	Expanded Welfare State
avg	0.133654	0.171764	sbert_baseline	1	civil_status	Liberal Economic Policy
avg	0.130385	0.174605	TWENTE-BMS-NLP	2	civil_status	Enhanced Environmental Protection
avg	0.128417	0.174236	GESIS-DSM	3	civil_status	Enhanced Environmental Protection

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Third Evaluation Cycle (Surprise Test Set, election 2023, User Study)

Relevance - Scenario Baseline

k	ndcg@k	precision@k	team	team_rank
4	0.751645	0.777778	TWENTE-BMS-NLP	1
4	0.669450	0.694444	GESIS-DSM	2
4	0.669450	0.694444	sbert_baseline	2
4	0.549329	0.546296	sövereign	3
4	0.358807	0.361111	bm25_baseline	4
8	0.729739	0.731481	TWENTE-BMS-NLP	1
8	0.676268	0.689815	GESIS-DSM	2
8	0.676268	0.689815	sbert_baseline	2
8	0.545443	0.541667	sövereign	3
8	0.366001	0.370370	bm25_baseline	4
16	0.706378	0.696759	TWENTE-BMS-NLP	1
16	0.609884	0.590278	GESIS-DSM	2
16	0.609884	0.590278	sbert_baseline	2
16	0.540876	0.537037	sövereign	3
16	0.378635	0.386574	bm25_baseline	4
20	0.690208	0.674074	TWENTE-BMS-NLP	1
20	0.590431	0.564815	GESIS-DSM	2
20	0.590431	0.564815	sbert_baseline	2
20	0.537909	0.533333	sövereign	3
20	0.369074	0.370370	bm25_baseline	4
avg	0.719492	0.720023	TWENTE-BMS-NLP	1
avg	0.636508	0.634838	GESIS-DSM	2
avg	0.636508	0.634838	sbert_baseline	2
avg	0.543389	0.539583	sövereign	3
avg	0.368129	0.372106	bm25_baseline	4

Diversity - Scenario Baseline

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.692650	0.199453	TWENTE-BMS-NLP	1	education	Enhanced Environmental Protection
4	0.614806	0.193470	sbert_baseline	2	education	Liberal Society
4	0.614261	0.189161	GESIS-DSM	3	education	Liberal Society
4	0.514943	0.198622	sövereign	4	education	Expanded Welfare State
4	0.332209	0.207754	bm25_baseline	5	education	Law & Order
8	0.698982	0.191964	TWENTE-BMS-NLP	1	education	Enhanced Environmental Protection
8	0.642488	0.180104	GESIS-DSM	2	education	Liberal Society
8	0.642160	0.183726	sbert_baseline	3	age	Liberal Society
8	0.527252	0.187721	sövereign	4	education	Expanded Welfare State
8	0.348938	0.191614	bm25_baseline	5	stance	Law & Order
16	0.693610	0.171857	TWENTE-BMS-NLP	1	civil_status	Enhanced Environmental Protection
16	0.601084	0.155062	GESIS-DSM	2	stance	Liberal Society
16	0.600932	0.157257	sbert_baseline	3	age	Liberal Society
16	0.530549	0.165135	sövereign	4	education	Expanded Welfare State
16	0.365794	0.158984	bm25_baseline	5	civil_status	Law & Order
20	0.682328	0.165512	TWENTE-BMS-NLP	1	civil_status	Enhanced Environmental Protection
20	0.586336	0.147475	GESIS-DSM	2	stance	Liberal Society
20	0.586076	0.149359	sbert_baseline	3	age	Liberal Society
20	0.530000	0.158783	sövereign	4	education	Expanded Welfare State
20	0.359814	0.149078	bm25_baseline	5	civil_status	Law & Order
avg	0.691892	0.182196	TWENTE-BMS-NLP	1	education	Enhanced Environmental Protection
avg	0.611042	0.167951	GESIS-DSM	2	age	Liberal Society
avg	0.610994	0.170953	sbert_baseline	3	age	Liberal Society
avg	0.525686	0.177565	sövereign	4	education	Enhanced Environmental Protection
avg	0.351689	0.176857	bm25_baseline	5	education	Law & Order

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Explicit

k	ndcg@k	precision@k	team	team_rank
4	0.640162	0.593964	GESIS-DSM	1
4	0.590100	0.590535	TWENTE-BMS-NLP	2
4	0.559131	0.526406	sövereign	3
4	0.381226	0.378258	sbert_baseline	4
8	0.619106	0.582476	TWENTE-BMS-NLP	1
8	0.603002	0.487311	GESIS-DSM	2
8	0.567702	0.472394	sövereign	3
8	0.389244	0.356824	sbert_baseline	4
16	0.686393	0.542267	TWENTE-BMS-NLP	1
16	0.606888	0.399691	sövereign	2
16	0.579259	0.372771	GESIS-DSM	3
16	0.417007	0.315758	sbert_baseline	4
20	0.725216	0.523937	TWENTE-BMS-NLP	1
20	0.623785	0.371399	sövereign	2
20	0.574316	0.336831	GESIS-DSM	3
20	0.437995	0.305487	sbert_baseline	4
avg	0.655204	0.559804	TWENTE-BMS-NLP	1
avg	0.599185	0.447719	GESIS-DSM	2
avg	0.589377	0.442473	sövereign	3
avg	0.406368	0.339082	sbert_baseline	4

Diversity - Scenario Explicit

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.650068	0.228261	GESIS-DSM	1	age	Liberal Society
4	0.576809	0.239758	TWENTE-BMS-NLP	2	stance	political_spectrum
4	0.552310	0.215621	sövereign	3	stance	Liberal Society
4	0.377455	0.203101	sbert_baseline	4	stance	political_spectrum
8	0.623374	0.219347	GESIS-DSM	1	age	Liberal Society
8	0.607758	0.234161	TWENTE-BMS-NLP	2	stance	political_spectrum
8	0.563101	0.207237	sövereign	3	stance	Liberal Society
8	0.387433	0.190764	sbert_baseline	4	stance	political_spectrum
16	0.659803	0.222237	TWENTE-BMS-NLP	1	stance	political_spectrum
16	0.601287	0.199812	GESIS-DSM	2	gender	Liberal Society
16	0.594174	0.188411	sövereign	3	stance	political_spectrum
16	0.408144	0.162935	sbert_baseline	4	stance	political_spectrum
20	0.689067	0.218854	TWENTE-BMS-NLP	1	stance	political_spectrum
20	0.606706	0.182806	sövereign	2	stance	political_spectrum
20	0.596274	0.193914	GESIS-DSM	3	stance	Liberal Society
20	0.423671	0.154293	sbert_baseline	4	stance	political_spectrum
avg	0.633359	0.228753	TWENTE-BMS-NLP	1	stance	political_spectrum
avg	0.617751	0.210334	GESIS-DSM	2	age	Liberal Society
avg	0.579073	0.198518	sövereign	3	stance	Liberal Society
avg	0.399176	0.177773	sbert_baseline	4	stance	political_spectrum

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Relevance - Scenario Implicit

k	ndcg@k	precision@k	team	team_rank
4	0.441651	0.456447	GESIS-DSM	1
4	0.382125	0.392661	sbert_baseline	2
4	0.346715	0.342593	sövereign	3
8	0.456150	0.428841	GESIS-DSM	1
8	0.397490	0.372942	sbert_baseline	2
8	0.364934	0.337791	sövereign	3
16	0.484986	0.373714	GESIS-DSM	1
16	0.422964	0.334019	sövereign	2
16	0.420502	0.322874	sbert_baseline	3
20	0.502046	0.352675	GESIS-DSM	1
20	0.452568	0.330247	sövereign	2
20	0.436431	0.306447	sbert_baseline	3
avg	0.471208	0.402919	GESIS-DSM	1
avg	0.409137	0.348731	sbert_baseline	2
avg	0.396795	0.336163	sövereign	3

Diversity - Scenario Implicit

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)
4	0.423761	0.198571	GESIS-DSM	1	stance	political_spectrum
4	0.369611	0.199280	sbert_baseline	2	stance	political_spectrum
4	0.339328	0.196251	sövereign	3	stance	residence
8	0.443519	0.188251	GESIS-DSM	1	stance	political_spectrum
8	0.388724	0.187431	sbert_baseline	2	stance	political_spectrum
8	0.359010	0.186613	sövereign	3	stance	political_spectrum
16	0.466246	0.162230	GESIS-DSM	1	stance	political_spectrum
16	0.407494	0.159313	sbert_baseline	2	stance	political_spectrum
16	0.403965	0.165996	sövereign	3	stance	political_spectrum
20	0.478868	0.154204	GESIS-DSM	1	stance	political_spectrum
20	0.426963	0.159973	sövereign	2	stance	political_spectrum
20	0.419468	0.149905	sbert_baseline	3	stance	political_spectrum
avg	0.453098	0.175814	GESIS-DSM	1	stance	political_spectrum
avg	0.396324	0.173982	sbert_baseline	2	stance	political_spectrum
avg	0.382317	0.177208	sövereign	3	stance	political_spectrum

socioVar(lowest_α) and socioVar(highest_α) are the socio-cultural variables with the lowest and highest α-ndcg values, respectively.

Final Leaderboards

The following two tables show the final results for each track: relevance and diversity. We first averaged each metric over the k values and then over the test sets. For the final comparison, we averaged the ranks instead of the metric values; runs that were not submitted were assigned the last possible rank.
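The rank aggregation can be sketched as follows. The Mean Rank values in the table below are all multiples of 1/9, which is consistent with ranking teams separately for each of the 3 test sets × 3 scenarios; this sketch assumes so. It further assumes dense ranking (tied scores share a rank, as in the detailed tables) and that the "last possible rank" equals the number of teams. The function names and the toy teams A, B, and C are hypothetical, and the organizers' script may differ.

```python
from statistics import mean


def dense_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank teams by score (1 = best); equal scores share a rank."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {score: pos for pos, score in enumerate(distinct, start=1)}
    return {team: rank_of[score] for team, score in scores.items()}


def mean_rank(results: dict[tuple[str, str], dict[str, float]],
              teams: list[str]) -> dict[str, float]:
    """Average each team's rank over all (test_set, scenario) combinations.

    results maps (test_set, scenario) to {team: metric already averaged over k}.
    A team without a run for a combination receives the last possible rank,
    assumed here to be the number of teams.
    """
    last_rank = len(teams)
    ranks = {team: [] for team in teams}
    for scores in results.values():
        combo_ranks = dense_ranks(scores)
        for team in teams:
            ranks[team].append(combo_ranks.get(team, last_rank))
    return {team: float(mean(r)) for team, r in ranks.items()}


# Toy example with hypothetical teams A, B, C.
results = {
    ("test1", "baseline"): {"A": 0.9, "B": 0.9, "C": 0.5},
    ("test1", "explicit"): {"A": 0.8, "B": 0.6},  # C submitted no explicit run
}
print(mean_rank(results, ["A", "B", "C"]))  # {'A': 1.0, 'B': 1.5, 'C': 2.5}
```

Averaging ranks rather than raw scores keeps one scenario with unusually large score differences (such as explicit perspectivism) from dominating the final ordering.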

Relevance


Team	Mean Rank	Mean (NDCG)	Rank
twente-bms-nlp	1.33	0.707	1
sövereign	2.22	0.632	2
GESIS-DSM	3.44	0.607	3
turiya	4.44	0.518	4
sbert_baseline	5.0	0.445	5
team031	5.44	0.417	6
boulderNLP	6.44	0.292	7
bm25_baseline	7.67	0.195	8

Detailed Results for Relevance



k	ndcg@k	precision@k	team	team_rank	test_set	scenario
4	1.0	1.0	sövereign	1	test1	baseline
4	0.990189057472516	0.9888888888888888	sbert_baseline	2	test1	baseline
4	0.990189057472516	0.9888888888888888	GESIS-DSM	2	test1	baseline
4	0.9881901089435936	0.9833333333333332	turiya	3	test1	baseline
4	0.9858515013614926	0.9888888888888888	boulderNLP	4	test1	baseline
4	0.9821153338888012	0.9833333333333332	twente-bms-nlp	5	test1	baseline
4	0.9168462271127248	0.9222222222222224	team031	6	test1	baseline
4	0.7161839498029191	0.6833333333333333	bm25_baseline	7	test1	baseline
8	1.0	1.0	sövereign	1	test1	baseline
8	0.9875927496432856	0.9861111111111112	sbert_baseline	2	test1	baseline
8	0.9875927496432856	0.9861111111111112	GESIS-DSM	2	test1	baseline
8	0.9866386041983832	0.9888888888888888	twente-bms-nlp	3	test1	baseline
8	0.9833098934306858	0.9833333333333332	boulderNLP	4	test1	baseline
8	0.9792047038130544	0.9722222222222222	turiya	5	test1	baseline
8	0.9072481161687932	0.9055555555555556	team031	6	test1	baseline
8	0.6716770078215716	0.6361111111111111	bm25_baseline	7	test1	baseline
16	0.998038122956649	0.9972222222222222	sövereign	1	test1	baseline
16	0.9894021441383424	0.9916666666666668	twente-bms-nlp	2	test1	baseline
16	0.9832545277354618	0.9805555555555556	GESIS-DSM	3	test1	baseline
16	0.9832545277354618	0.9805555555555556	sbert_baseline	3	test1	baseline
16	0.9773249241880808	0.975	boulderNLP	4	test1	baseline
16	0.9751721136524104	0.9694444444444444	turiya	5	test1	baseline
16	0.899859643525079	0.8958333333333334	team031	6	test1	baseline
16	0.6194260623649784	0.5791666666666667	bm25_baseline	7	test1	baseline
20	0.9982984713489932	0.9977777777777778	sövereign	1	test1	baseline
20	0.990078186412697	0.9922222222222222	twente-bms-nlp	2	test1	baseline
20	0.9810926808166834	0.9777777777777776	GESIS-DSM	3	test1	baseline
20	0.9810926808166834	0.9777777777777776	sbert_baseline	3	test1	baseline
20	0.972582822889201	0.9666666666666668	turiya	4	test1	baseline
20	0.9714870761972928	0.9666666666666668	boulderNLP	5	test1	baseline
20	0.8924909218930537	0.8855555555555555	team031	6	test1	baseline
20	0.5968774789265284	0.5533333333333333	bm25_baseline	7	test1	baseline
avg	0.9990841485764106	0.99875	sövereign	1	test1	baseline
avg	0.987058567159556	0.9890277777777776	twente-bms-nlp	2	test1	baseline
avg	0.9855322539169866	0.9833333333333334	GESIS-DSM	3	test1	baseline
avg	0.9855322539169866	0.9833333333333334	sbert_baseline	3	test1	baseline
avg	0.979493348794388	0.9784722222222222	boulderNLP	4	test1	baseline
avg	0.9787874373245647	0.9729166666666668	turiya	5	test1	baseline
avg	0.9041112271749128	0.9022916666666668	team031	6	test1	baseline
avg	0.6510411247289993	0.6129861111111111	bm25_baseline	7	test1	baseline
4	0.9001480461649514	0.8242154368108566	twente-bms-nlp	1	test1	explicit
4	0.8938215896369928	0.8176420695504665	sövereign	2	test1	explicit
4	0.8640193442830918	0.7926208651399491	GESIS-DSM	3	test1	explicit
4	0.7100640009053408	0.6593511450381679	turiya	4	test1	explicit
4	0.2175003870916797	0.2168150975402883	sbert_baseline	5	test1	explicit
4	0.1805931791924541	0.1805555555555555	team031	6	test1	explicit
8	0.8922772927899614	0.7459711620016963	twente-bms-nlp	1	test1	explicit
8	0.8802957150793026	0.7360581000848176	sövereign	2	test1	explicit
8	0.8413687834128213	0.7104007633587787	GESIS-DSM	3	test1	explicit
8	0.6893920594193227	0.595742962155176	turiya	4	test1	explicit
8	0.2195557791900342	0.2172391857506361	sbert_baseline	5	test1	explicit
8	0.1802815307736844	0.1776399491094147	team031	6	test1	explicit
16	0.8933848950285854	0.6624522900763359	twente-bms-nlp	1	test1	explicit
16	0.8716583627924369	0.650657336726039	sövereign	2	test1	explicit
16	0.8216202968972766	0.623717133163698	GESIS-DSM	3	test1	explicit
16	0.670775601164894	0.5372995801908015	turiya	4	test1	explicit
16	0.2258025677535813	0.2189620441051738	sbert_baseline	5	test1	explicit
16	0.1819053163856849	0.175917090754877	team031	6	test1	explicit
20	0.8951817256118378	0.6338634435962681	twente-bms-nlp	1	test1	explicit
20	0.8681664385182296	0.6220101781170484	sövereign	2	test1	explicit
20	0.8143315303846864	0.5946953633022336	GESIS-DSM	3	test1	explicit
20	0.6641266387944684	0.5193389539436657	turiya	4	test1	explicit
20	0.2270783113113688	0.2175784563189143	sbert_baseline	5	test1	explicit
20	0.1825981750938157	0.1745547073791348	team031	6	test1	explicit
avg	0.895247989898834	0.7166255831212892	twente-bms-nlp	1	test1	explicit
avg	0.8784855265067405	0.7065919211195928	sövereign	2	test1	explicit
avg	0.835334988744469	0.6803585312411649	GESIS-DSM	3	test1	explicit
avg	0.6835895750710065	0.5779331603319527	turiya	4	test1	explicit
avg	0.222484261336666	0.2176486959287531	sbert_baseline	5	test1	explicit
avg	0.1813445503614097	0.1771668256997454	team031	6	test1	explicit
4	0.2054552287288432	0.2055767599660729	sövereign	1	test1	implicit
4	0.1953762100874915	0.1957167090754877	GESIS-DSM	2	test1	implicit
4	0.1953215904929799	0.1956106870229007	twente-bms-nlp	3	test1	implicit
4	0.1950354227210964	0.1961407972858354	sbert_baseline	4	test1	implicit
4	0.1805931791924541	0.1805555555555555	team031	5	test1	implicit
8	0.2095966886054688	0.2090224766751484	sövereign	1	test1	implicit
8	0.2006420981831635	0.2008587786259542	twente-bms-nlp	2	test1	implicit
8	0.1998037256574479	0.2005937234944868	sbert_baseline	3	test1	implicit
8	0.1995275545197261	0.199374469889737	GESIS-DSM	4	test1	implicit
8	0.1802815307736844	0.1776399491094147	team031	5	test1	implicit
16	0.2172998625323523	0.214350084817642	sövereign	1	test1	implicit
16	0.2070910886627467	0.2043575063613231	twente-bms-nlp	2	test1	implicit
16	0.2050723911465899	0.2025021204410517	sbert_baseline	3	test1	implicit
16	0.2049944998931236	0.2017069550466497	GESIS-DSM	4	test1	implicit
16	0.1819053163856849	0.175917090754877	team031	5	test1	implicit
20	0.2197860194050664	0.2147794741306191	sövereign	1	test1	implicit
20	0.2108761109974972	0.2066581849024597	twente-bms-nlp	2	test1	implicit
20	0.2085997304828043	0.2041136556403732	sbert_baseline	3	test1	implicit
20	0.2082150355571272	0.2034563189143342	GESIS-DSM	4	test1	implicit
20	0.1825981750938157	0.1745547073791348	team031	5	test1	implicit
avg	0.2130344498179326	0.2109321988973705	sövereign	1	test1	implicit
avg	0.2034827220840968	0.2018712892281594	twente-bms-nlp	2	test1	implicit
avg	0.2021278175019846	0.2008375742154367	sbert_baseline	3	test1	implicit
avg	0.2020283250143671	0.2000636132315521	GESIS-DSM	4	test1	implicit
avg	0.1813445503614097	0.1771668256997454	team031	5	test1	implicit
4	0.9506765622181236	0.95	twente-bms-nlp	1	test2	baseline
4	0.9370083183448402	0.9375	turiya	2	test2	baseline
4	0.9061575597185184	0.9	sövereign	3	test2	baseline
4	0.9035266261886546	0.9125	boulderNLP	4	test2	baseline
4	0.8837131779057522	0.8875	sbert_baseline	5	test2	baseline
4	0.8837131779057522	0.8875	GESIS-DSM	5	test2	baseline
4	0.8172701854045675	0.8125	team031	6	test2	baseline
4	0.7756012468754936	0.775	bm25_baseline	7	test2	baseline
8	0.9438893622828144	0.940625	twente-bms-nlp	1	test2	baseline
8	0.9302213830137418	0.928125	turiya	2	test2	baseline
8	0.8977474997836907	0.890625	sövereign	3	test2	baseline
8	0.8927744682895797	0.89375	boulderNLP	4	test2	baseline
8	0.8630103992747806	0.85625	sbert_baseline	5	test2	baseline
8	0.8630103992747806	0.85625	GESIS-DSM	5	test2	baseline
8	0.807023471012481	0.8	team031	6	test2	baseline
8	0.7643677571905212	0.759375	bm25_baseline	7	test2	baseline
16	0.926180193487849	0.9171875	twente-bms-nlp	1	test2	baseline
16	0.9099072816558096	0.9	turiya	2	test2	baseline
16	0.8898112362771199	0.8828125	sövereign	3	test2	baseline
16	0.8743999301440972	0.8671875	boulderNLP	4	test2	baseline
16	0.8376410923352875	0.8234375	sbert_baseline	5	test2	baseline
16	0.8376410923352875	0.8234375	GESIS-DSM	5	test2	baseline
16	0.799430458369072	0.7921875	team031	6	test2	baseline
16	0.7171996891582305	0.69375	bm25_baseline	7	test2	baseline
20	0.9227059842814548	0.91375	twente-bms-nlp	1	test2	baseline
20	0.9028388865527066	0.89125	turiya	2	test2	baseline
20	0.8853640741967123	0.8775000000000001	sövereign	3	test2	baseline
20	0.8694925639825994	0.8612500000000001	boulderNLP	4	test2	baseline
20	0.8360211803043445	0.8237500000000001	sbert_baseline	5	test2	baseline
20	0.8360211803043445	0.8237500000000001	GESIS-DSM	5	test2	baseline
20	0.8019289404650657	0.7975	team031	6	test2	baseline
20	0.6926774698203287	0.66125	bm25_baseline	7	test2	baseline
avg	0.9358630255675604	0.930390625	twente-bms-nlp	1	test2	baseline
avg	0.9199939673917744	0.91421875	turiya	2	test2	baseline
avg	0.8947700924940104	0.887734375	sövereign	3	test2	baseline
avg	0.8850483971512327	0.883671875	boulderNLP	4	test2	baseline
avg	0.8550964624550412	0.847734375	sbert_baseline	5	test2	baseline
avg	0.8550964624550412	0.847734375	GESIS-DSM	5	test2	baseline
avg	0.8064132638127965	0.800546875	team031	6	test2	baseline
avg	0.7374615407611435	0.72234375	bm25_baseline	7	test2	baseline
4	0.8401994013581225	0.7487373737373737	sövereign	1	test2	explicit
4	0.821466482568112	0.7370931537598204	twente-bms-nlp	2	test2	explicit
4	0.7597256206685896	0.6770482603815937	GESIS-DSM	3	test2	explicit
4	0.7181658241944868	0.670314253647587	turiya	4	test2	explicit
4	0.1520743812311075	0.1494107744107744	sbert_baseline	5	test2	explicit
4	0.1291370189622331	0.127665544332211	team031	6	test2	explicit
8	0.8311386022544768	0.6627384960718294	sövereign	1	test2	explicit
8	0.8034594278677059	0.647516835016835	twente-bms-nlp	2	test2	explicit
8	0.7304752871670588	0.5797558922558923	GESIS-DSM	3	test2	explicit
8	0.6865914880186403	0.6023809523809524	turiya	4	test2	explicit
8	0.1470920609146963	0.1408529741863075	sbert_baseline	5	test2	explicit
8	0.1289305941676711	0.1257014590347923	team031	6	test2	explicit
16	0.8138006848030526	0.558641975308642	sövereign	1	test2	explicit
16	0.7867159997762964	0.5454545454545454	twente-bms-nlp	2	test2	explicit
16	0.7021023592617129	0.4772774036662925	GESIS-DSM	3	test2	explicit
16	0.6508330940798412	0.5271792928358585	turiya	4	test2	explicit
16	0.1456174257912896	0.1349607182940516	sbert_baseline	5	test2	explicit
16	0.1326741155169621	0.1244739057239057	team031	6	test2	explicit
20	0.8087703608546549	0.5222222222222223	sövereign	1	test2	explicit
20	0.7818119173753864	0.5101290684624018	twente-bms-nlp	2	test2	explicit
20	0.6943552440639758	0.4434530490086045	GESIS-DSM	3	test2	explicit
20	0.6408987980659909	0.5031557783460844	turiya	4	test2	explicit
20	0.1459810843316855	0.1331088664421998	sbert_baseline	5	test2	explicit
20	0.1353178895431881	0.1253086419753086	team031	6	test2	explicit
avg	0.8234772623175767	0.6230850168350168	sövereign	1	test2	explicit
avg	0.7983634568968752	0.6100484006734006	twente-bms-nlp	2	test2	explicit
avg	0.7216646277903342	0.5443836513280957	GESIS-DSM	3	test2	explicit
avg	0.6741223010897399	0.5757575693026206	turiya	4	test2	explicit
avg	0.1476912380671947	0.1395833333333333	sbert_baseline	5	test2	explicit
avg	0.1315149045475136	0.1257873877665544	team031	6	test2	explicit
4	0.1491400488763286	0.1473063973063973	twente-bms-nlp	1	test2	implicit
4	0.1402859410706227	0.1404320987654321	GESIS-DSM	2	test2	implicit
4	0.138856828432963	0.1349607182940516	sbert_baseline	3	test2	implicit
4	0.1359238732277851	0.1352413019079685	sövereign	4	test2	implicit
4	0.1291370189622331	0.127665544332211	team031	5	test2	implicit
8	0.1487113747836123	0.1463243546576879	twente-bms-nlp	1	test2	implicit
8	0.1377485887860268	0.1367845117845118	sövereign	2	test2	implicit
8	0.136982480497495	0.1346801346801346	GESIS-DSM	3	test2	implicit
8	0.1359412289049273	0.1314534231200897	sbert_baseline	4	test2	implicit
8	0.1289305941676711	0.1257014590347923	team031	5	test2	implicit
16	0.1489668236672465	0.1416596520763187	twente-bms-nlp	1	test2	implicit
16	0.140683237693298	0.1360830527497194	sövereign	2	test2	implicit
16	0.1376507499622861	0.1309624017957351	GESIS-DSM	3	test2	implicit
16	0.1343245457902445	0.1265782828282828	sbert_baseline	4	test2	implicit
16	0.1326741155169621	0.1244739057239057	team031	5	test2	implicit
20	0.1508884226776939	0.1408249158249158	twente-bms-nlp	1	test2	implicit
20	0.1421854720014271	0.1354938271604938	sövereign	2	test2	implicit
20	0.1395176107479813	0.1308361391694725	GESIS-DSM	3	test2	implicit
20	0.1353178895431881	0.1253086419753086	team031	4	test2	implicit
20	0.1346876902929752	0.1248035914702581	sbert_baseline	5	test2	implicit
avg	0.1494266675012203	0.1440288299663299	twente-bms-nlp	1	test2	implicit
avg	0.1391352929271342	0.1359006734006733	sövereign	2	test2	implicit
avg	0.1386091955695962	0.1342276936026935	GESIS-DSM	3	test2	implicit
avg	0.1359525733552775	0.1294490039281705	sbert_baseline	4	test2	implicit
avg	0.1315149045475136	0.1257873877665544	team031	5	test2	implicit
4	0.955733702777924	0.9537037037037036	twente-bms-nlp	1	test3	baseline
4	0.8012470305565794	0.8240740740740741	turiya	2	test3	baseline
4	0.7723299898164238	0.7962962962962963	boulderNLP	3	test3	baseline
4	0.6694504532875712	0.6944444444444444	sbert_baseline	4	test3	baseline
4	0.6694504532875712	0.6944444444444444	GESIS-DSM	4	test3	baseline
4	0.6676689810638864	0.6666666666666666	sövereign	5	test3	baseline
4	0.6290818353689971	0.6388888888888888	team031	6	test3	baseline
4	0.3588072149065744	0.3611111111111111	bm25_baseline	7	test3	baseline
8	0.954941801480141	0.9537037037037036	twente-bms-nlp	1	test3	baseline
8	0.7971576499451849	0.8055555555555556	turiya	2	test3	baseline
8	0.7765162640957356	0.7870370370370371	boulderNLP	3	test3	baseline
8	0.6762677388062868	0.6898148148148148	sbert_baseline	4	test3	baseline
8	0.6762677388062868	0.6898148148148148	GESIS-DSM	4	test3	baseline
8	0.6394523697942182	0.625	sövereign	5	test3	baseline
8	0.6043411978764253	0.5972222222222222	team031	6	test3	baseline
8	0.3660008931850136	0.3703703703703703	bm25_baseline	7	test3	baseline
16	0.938689557839632	0.9305555555555556	twente-bms-nlp	1	test3	baseline
16	0.7549540209028123	0.7523148148148148	boulderNLP	2	test3	baseline
16	0.7290344996287841	0.7037037037037037	turiya	3	test3	baseline
16	0.6099855894459848	0.5902777777777778	sövereign	4	test3	baseline
16	0.6098835373261119	0.5902777777777778	sbert_baseline	5	test3	baseline
16	0.6098835373261119	0.5902777777777778	GESIS-DSM	5	test3	baseline
16	0.567728112759321	0.5486111111111112	team031	6	test3	baseline
16	0.378634673193704	0.386574074074074	bm25_baseline	7	test3	baseline
20	0.9259681562666584	0.9129629629629628	twente-bms-nlp	1	test3	baseline
20	0.7383461732610088	0.7277777777777777	boulderNLP	2	test3	baseline
20	0.7022747547006197	0.6685185185185185	turiya	3	test3	baseline
20	0.5953882623344432	0.5722222222222222	sövereign	4	test3	baseline
20	0.5904311320364634	0.5648148148148148	sbert_baseline	5	test3	baseline
20	0.5904311320364634	0.5648148148148148	GESIS-DSM	5	test3	baseline
20	0.5687193359245571	0.5537037037037037	team031	6	test3	baseline
20	0.3690742436025672	0.3703703703703703	bm25_baseline	7	test3	baseline
avg	0.9438333045910888	0.9377314814814814	twente-bms-nlp	1	test3	baseline
avg	0.7605366120189951	0.7658564814814816	boulderNLP	2	test3	baseline
avg	0.757428483707792	0.750462962962963	turiya	3	test3	baseline
avg	0.6365082153641083	0.6348379629629629	sbert_baseline	4	test3	baseline
avg	0.6365082153641083	0.6348379629629629	GESIS-DSM	4	test3	baseline
avg	0.6281238006596331	0.6135416666666667	sövereign	5	test3	baseline
avg	0.5924676204823252	0.5846064814814815	team031	6	test3	baseline
avg	0.3681292562219648	0.3721064814814814	bm25_baseline	7	test3	baseline
4	0.8198554563236097	0.7767489711934157	twente-bms-nlp	1	test3	explicit
4	0.7005811735273945	0.6722679469593049	turiya	2	test3	explicit
4	0.6988445277827787	0.6529492455418381	sövereign	3	test3	explicit
4	0.6557260465824912	0.6069958847736625	GESIS-DSM	4	test3	explicit
4	0.3812260923239645	0.3782578875171468	sbert_baseline	5	test3	explicit
4	0.3744073973289669	0.3710562414266118	team031	6	test3	explicit
8	0.7970654428040227	0.664437585733882	twente-bms-nlp	1	test3	explicit
8	0.6759496125041157	0.5493827160493827	sövereign	2	test3	explicit
8	0.6547858097440934	0.5804036841073877	turiya	3	test3	explicit
8	0.621075561362604	0.504286694101509	GESIS-DSM	4	test3	explicit
8	0.3892442896858138	0.3568244170096022	sbert_baseline	5	test3	explicit
8	0.3885170845229115	0.3621399176954732	team031	6	test3	explicit
16	0.7872409801663298	0.5279492455418381	twente-bms-nlp	1	test3	explicit
16	0.6621420160395799	0.4297839506172839	sövereign	2	test3	explicit
16	0.6214336086761723	0.4928626357638703	turiya	3	test3	explicit
16	0.5962899192860883	0.3859739368998628	GESIS-DSM	4	test3	explicit
16	0.4317931596800455	0.3436213991769547	team031	5	test3	explicit
16	0.4170067980409585	0.3157578875171468	sbert_baseline	6	test3	explicit
20	0.7866482366998089	0.4811385459533607	twente-bms-nlp	1	test3	explicit
20	0.6559260234324775	0.3853909465020576	sövereign	2	test3	explicit
20	0.6147280552525008	0.4653386916207184	turiya	3	test3	explicit
20	0.5911325176447614	0.3486968449931413	GESIS-DSM	4	test3	explicit
20	0.4577675273363525	0.3371056241426611	team031	5	test3	explicit
20	0.4379952789940077	0.3054869684499314	sbert_baseline	6	test3	explicit
avg	0.7977025289984427	0.6125685871056241	twente-bms-nlp	1	test3	explicit
avg	0.6732155449397379	0.5043767146776407	sövereign	2	test3	explicit
avg	0.6478821618000402	0.5527182396128203	turiya	3	test3	explicit
avg	0.6160560112189862	0.4614883401920439	GESIS-DSM	4	test3	explicit
avg	0.413121292217069	0.3534807956104252	team031	5	test3	explicit
avg	0.4063681147611861	0.3390817901234567	sbert_baseline	6	test3	explicit
4	0.5901002912696925	0.5905349794238683	twente-bms-nlp	1	test3	implicit
4	0.4416508936048149	0.4564471879286694	GESIS-DSM	2	test3	implicit
4	0.3922386671544742	0.3861454046639231	sövereign	3	test3	implicit
4	0.3821253012184295	0.3926611796982167	sbert_baseline	4	test3	implicit
4	0.3744073973289669	0.3710562414266118	team031	5	test3	implicit
8	0.6191061939721301	0.5824759945130316	twente-bms-nlp	1	test3	implicit
8	0.4561503447358589	0.428840877914952	GESIS-DSM	2	test3	implicit
8	0.4073474290111054	0.3736282578875171	sövereign	3	test3	implicit
8	0.3974903953664807	0.3729423868312757	sbert_baseline	4	test3	implicit
8	0.3885170845229115	0.3621399176954732	team031	5	test3	implicit
16	0.6863927918686149	0.5422668038408779	twente-bms-nlp	1	test3	implicit
16	0.4849864909233201	0.3737139917695473	GESIS-DSM	2	test3	implicit
16	0.4592592159817502	0.3548525377229081	sövereign	3	test3	implicit
16	0.4317931596800455	0.3436213991769547	team031	4	test3	implicit
16	0.4205015577091178	0.3228737997256515	sbert_baseline	5	test3	implicit
20	0.7252156135941026	0.5239368998628258	twente-bms-nlp	1	test3	implicit
20	0.5020455177182803	0.3526748971193416	GESIS-DSM	2	test3	implicit
20	0.4849894963542152	0.3441700960219478	sövereign	3	test3	implicit
20	0.4577675273363525	0.3371056241426611	team031	4	test3	implicit
20	0.4364310782867878	0.3064471879286694	sbert_baseline	5	test3	implicit
avg	0.655203722676135	0.5598036694101509	twente-bms-nlp	1	test3	implicit
avg	0.4712083117455685	0.4029192386831275	GESIS-DSM	2	test3	implicit
avg	0.4359587021253862	0.364699074074074	sövereign	3	test3	implicit
avg	0.413121292217069	0.3534807956104252	team031	4	test3	implicit
avg	0.4091370831452039	0.3487311385459533	sbert_baseline	5	test3	implicit

Diversity

Leaderboard Diversity

team	mean_rank	mean(αNDCG)	rank
twente-bms-nlp	1.67	0.672	1
sövereign	2.22	0.601	2
GESIS-DSM	3.44	0.579	3
turiya	4.67	0.495	4
sbert_baseline	5.0	0.419	5
team031	5.78	0.394	6
boulderNLP	6.56	0.271	7
bm25_baseline	8.0	0.185	8
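The diversity leaderboard ranks teams by αNDCG. This is a diversity-aware variant of nDCG: an argument's gain for a given "nugget" is discounted by a factor of (1 − α) for each argument ranked above it that already covered that nugget. The following is a minimal Python sketch of α-NDCG@k as it is commonly defined in the IR literature. It is not the shared task's evaluation script. It assumes the following: each argument carries a set of nuggets (here, hypothetical socio-cultural attribute values such as a political leaning), α = 0.5, and the ideal ranking is approximated greedily because computing it exactly is NP-hard. The names `alpha_dcg`, `ideal_ranking`, and `alpha_ndcg` are illustrative only. The official script in the data repository may differ in its choice of α, its nugget definition, and how it builds the ideal ranking.

```python
import math
from typing import Dict, Iterable, List, Set


def alpha_dcg(ranking: List[str], nuggets: Dict[str, Set[str]], k: int, alpha: float = 0.5) -> float:
    """alpha-DCG@k: the gain for each nugget shrinks by (1 - alpha)
    every time an earlier document in the ranking already covered it."""
    seen: Dict[str, int] = {}
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for nugget in nuggets.get(doc, set()):
            gain += (1 - alpha) ** seen.get(nugget, 0)
            seen[nugget] = seen.get(nugget, 0) + 1
        score += gain / math.log2(rank + 1)  # standard log2 position discount
    return score


def ideal_ranking(candidates: Iterable[str], nuggets: Dict[str, Set[str]], k: int, alpha: float = 0.5) -> List[str]:
    """Greedy approximation of the ideal ordering: repeatedly pick the
    document with the largest marginal (novelty-discounted) gain."""
    remaining = list(candidates)
    seen: Dict[str, int] = {}
    ideal: List[str] = []
    while remaining and len(ideal) < k:
        def marginal(doc: str) -> float:
            return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, set()))

        best = max(remaining, key=marginal)
        remaining.remove(best)
        ideal.append(best)
        for n in nuggets.get(best, set()):
            seen[n] = seen.get(n, 0) + 1
    return ideal


def alpha_ndcg(ranking: List[str], nuggets: Dict[str, Set[str]], k: int, alpha: float = 0.5) -> float:
    """alpha-NDCG@k = alpha-DCG@k of the system ranking / alpha-DCG@k of the ideal ranking."""
    ideal = ideal_ranking(nuggets.keys(), nuggets, k, alpha)
    denom = alpha_dcg(ideal, nuggets, k, alpha)
    return alpha_dcg(ranking, nuggets, k, alpha) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: two arguments share one perspective, one covers another.
    nuggets = {"a": {"left"}, "b": {"left"}, "c": {"right"}}
    print(alpha_ndcg(["a", "c", "b"], nuggets, k=3))  # 1.0 (diverse order is ideal)
    print(alpha_ndcg(["a", "b", "c"], nuggets, k=3))  # ≈ 0.965 (redundant argument ranked second)
```

In this toy example, placing a second "left" argument before the "right" one lowers the score. That redundancy penalty is what separates αNDCG from plain nDCG, which would score both orderings identically when all three arguments are equally relevant.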

Final Results Diversity

Detailed Results for Diversity

k	αNDCG@k	klDiv@k	team	team_rank	socioVar(lowest_α)	socioVar(highest_α)	test_set	scenario
4	0.9019970009344244	0.1545358338681438	GESIS-DSM	1	political_spectrum	Open Foreign Policy	test1	baseline
4	0.901690015804786	0.1551472473081518	sbert_baseline	2	political_spectrum	Open Foreign Policy	test1	baseline
4	0.9001443511545081	0.1735484649913478	sövereign	3	political_spectrum	gender	test1	baseline
4	0.8974267304082445	0.1597936041696669	boulderNLP	4	age	Expanded Welfare State	test1	baseline
4	0.8902245883191952	0.1748546359011083	turiya	5	political_spectrum	gender	test1	baseline
4	0.881596836723969	0.1744801535869937	twente-bms-nlp	6	political_spectrum	Open Foreign Policy	test1	baseline
4	0.8364699126677203	0.1541542514654867	team031	7	civil_status	gender	test1	baseline
4	0.6726820197638872	0.1523011191459042	bm25_baseline	8	political_spectrum	gender	test1	baseline
8	0.913536118382498	0.1583148521004212	sövereign	1	political_spectrum	gender	test1	baseline
8	0.9089622315243664	0.1394203854946891	GESIS-DSM	2	political_spectrum	gender	test1	baseline
8	0.908762417677734	0.139903804902831	sbert_baseline	3	political_spectrum	gender	test1	baseline
8	0.90804629536743	0.1426355833585971	boulderNLP	4	political_spectrum	Open Foreign Policy	test1	baseline
8	0.8999893260995866	0.1582434862398295	twente-bms-nlp	5	political_spectrum	gender	test1	baseline
8	0.8963954326730816	0.1580407037821194	turiya	6	political_spectrum	gender	test1	baseline
8	0.8395067256513395	0.1380091227682019	team031	7	political_spectrum	Liberal Society	test1	baseline
8	0.6431872932833367	0.1360515419120727	bm25_baseline	8	political_spectrum	gender	test1	baseline
16	0.9334435269621736	0.1244070719171076	sövereign	1	education	gender	test1	baseline
16	0.924272584352686	0.1230319761817763	twente-bms-nlp	2	education	gender	test1	baseline
16	0.924070105264207	0.1061699588125747	GESIS-DSM	3	education	gender	test1	baseline
16	0.9239978191796326	0.106429241058285	sbert_baseline	4	education	gender	test1	baseline
16	0.9215736888963632	0.107496322829392	boulderNLP	5	education	Open Foreign Policy	test1	baseline
16	0.9134880134828716	0.1222881690885618	turiya	6	education	gender	test1	baseline
16	0.8499053702539047	0.1029188979728469	team031	7	education	Liberal Society	test1	baseline
16	0.6086479469897234	0.1027533544093416	bm25_baseline	8	education	gender	test1	baseline
20	0.9414020542731196	0.1151429098272939	sövereign	1	education	gender	test1	baseline
20	0.932912354849055	0.1125954204983612	twente-bms-nlp	2	education	gender	test1	baseline
20	0.929634808717093	0.0970301722088382	GESIS-DSM	3	education	gender	test1	baseline
20	0.9295570582523096	0.0972666613453144	sbert_baseline	4	education	gender	test1	baseline
20	0.923571804546576	0.0978047524775676	boulderNLP	5	education	Open Foreign Policy	test1	baseline
20	0.919242030402476	0.1112468329139299	turiya	6	education	gender	test1	baseline
20	0.8508566258944938	0.0927611297043296	team031	7	education	Liberal Society	test1	baseline
20	0.5927598871171057	0.0935480838185558	bm25_baseline	8	education	gender	test1	baseline
avg	0.9221315126930748	0.1428533247090426	sövereign	1	education	gender	test1	baseline
avg	0.9161660366100228	0.1242890875960614	GESIS-DSM	2	education	gender	test1	baseline
avg	0.9160018277286156	0.1246867386536455	sbert_baseline	3	education	gender	test1	baseline
avg	0.9126546298046534	0.1269325657088059	boulderNLP	4	education	Open Foreign Policy	test1	baseline
avg	0.9096927755063242	0.1420877591267401	twente-bms-nlp	5	political_spectrum	gender	test1	baseline
avg	0.904837516219406	0.1416075854214298	turiya	6	political_spectrum	gender	test1	baseline
avg	0.8441846586168645	0.1219608504777162	team031	7	education	gender	test1	baseline
avg	0.6293192867885132	0.1211635248214685	bm25_baseline	8	education	gender	test1	baseline
4	0.843143049430011	0.206811326664089	twente-bms-nlp	1	age	Open Foreign Policy	test1	explicit
4	0.8389643364349352	0.2062025477390115	sövereign	2	political_spectrum	Open Foreign Policy	test1	explicit
4	0.814501349866188	0.2019475524427494	GESIS-DSM	3	age	Open Foreign Policy	test1	explicit
4	0.6723072323331305	0.195715049435303	turiya	4	age	gender	test1	explicit
4	0.1988500982088648	0.1699434119445776	sbert_baseline	5	political_spectrum	gender	test1	explicit
4	0.1678368307523686	0.1541460741310489	team031	6	civil_status	gender	test1	explicit
8	0.8483257755666023	0.1937447559212075	twente-bms-nlp	1	political_spectrum	Open Foreign Policy	test1	explicit
8	0.8419795433215165	0.1931365771629827	sövereign	2	political_spectrum	Open Foreign Policy	test1	explicit
8	0.8090017369108808	0.1891929693342114	GESIS-DSM	3	political_spectrum	Open Foreign Policy	test1	explicit
8	0.667602303255406	0.1813940535047481	turiya	4	political_spectrum	gender	test1	explicit
8	0.2040679201709478	0.1538031471988926	sbert_baseline	5	political_spectrum	gender	test1	explicit
8	0.1701345417762519	0.1380034529038456	team031	6	education	gender	test1	explicit
16	0.8572370964146122	0.1660187434028118	twente-bms-nlp	1	education	Open Foreign Policy	test1	explicit
16	0.8470379390612589	0.1655245690369987	sövereign	2	education	Open Foreign Policy	test1	explicit
16	0.803539135546609	0.1626582524156065	GESIS-DSM	3	education	Open Foreign Policy	test1	explicit
16	0.6618105993488821	0.150733262478512	turiya	4	education	gender	test1	explicit
16	0.2127694884642832	0.1200146960388071	sbert_baseline	5	education	gender	test1	explicit
16	0.1739887825464037	0.1029195802371271	team031	6	education	gender	test1	explicit
20	0.8594772939635447	0.1582004705605105	twente-bms-nlp	1	education	Open Foreign Policy	test1	explicit
20	0.8472002421000155	0.1576117216526112	sövereign	2	education	Open Foreign Policy	test1	explicit
20	0.8000801802116122	0.1551870652724488	GESIS-DSM	3	education	Open Foreign Policy	test1	explicit
20	0.6586734598980795	0.1419128868928617	turiya	4	education	gender	test1	explicit
20	0.2149343712386582	0.1105472062194354	sbert_baseline	5	education	gender	test1	explicit
20	0.1753115166608778	0.0927666777509362	team031	6	education	gender	test1	explicit
avg	0.8520458038436926	0.1811938241371546	twente-bms-nlp	1	political_spectrum	Open Foreign Policy	test1	explicit
avg	0.8437955152294315	0.180618853897901	sövereign	2	political_spectrum	Open Foreign Policy	test1	explicit
avg	0.8067806006338225	0.177246459866254	GESIS-DSM	3	political_spectrum	Open Foreign Policy	test1	explicit
avg	0.6650983987088745	0.1674388130778561	turiya	4	political_spectrum	gender	test1	explicit
avg	0.2076554695206885	0.1385771153504281	sbert_baseline	5	political_spectrum	gender	test1	explicit
avg	0.1718179179339755	0.1219589462557394	team031	6	education	gender	test1	explicit
4	0.1883944407371643	0.1662334296479237	sövereign	1	political_spectrum	gender	test1	implicit
4	0.1804283405800343	0.1540634630028475	GESIS-DSM	2	political_spectrum	gender	test1	implicit
4	0.1803229481745985	0.1541849972369482	twente-bms-nlp	3	political_spectrum	gender	test1	implicit
4	0.1798008941970487	0.1552243541769757	sbert_baseline	4	political_spectrum	gender	test1	implicit
4	0.1678368307523686	0.1541460741310489	team031	5	civil_status	gender	test1	implicit
8	0.1945992841774712	0.1506246755581603	sövereign	1	political_spectrum	gender	test1	implicit
8	0.1867422790987336	0.1390363162004441	twente-bms-nlp	2	political_spectrum	gender	test1	implicit
8	0.1861153373975238	0.1393779815541836	sbert_baseline	3	political_spectrum	gender	test1	implicit
8	0.186044488827547	0.1389492008070555	GESIS-DSM	4	political_spectrum	gender	test1	implicit
8	0.1701345417762519	0.1380034529038456	team031	5	education	gender	test1	implicit
16	0.2044708160928928	0.1169708662516098	sövereign	1	education	gender	test1	implicit
16	0.1953274585532708	0.1061332388430287	twente-bms-nlp	2	education	gender	test1	implicit
16	0.193697029870305	0.1058750763274714	GESIS-DSM	3	education	gender	test1	implicit
16	0.1936789912736826	0.1068908937580698	sbert_baseline	4	education	gender	test1	implicit
16	0.1739887825464037	0.1029195802371271	team031	5	education	gender	test1	implicit
20	0.2075851107416589	0.1076349985074671	sövereign	1	education	gender	test1	implicit
20	0.1993615280044656	0.0971124604070055	twente-bms-nlp	2	education	gender	test1	implicit
20	0.1974470597394484	0.0979820296831711	sbert_baseline	3	education	gender	test1	implicit
20	0.1972990058244904	0.0967778383028306	GESIS-DSM	4	education	gender	test1	implicit
20	0.1753115166608778	0.0927666777509362	team031	5	education	gender	test1	implicit
avg	0.1987624129372968	0.1353659924912902	sövereign	1	education	gender	test1	implicit
avg	0.1904385534577671	0.1241167531718566	twente-bms-nlp	2	education	gender	test1	implicit
avg	0.1893672162755941	0.1239163946100512	GESIS-DSM	3	education	gender	test1	implicit
avg	0.1892605706519259	0.1248688147931	sbert_baseline	4	education	gender	test1	implicit
avg	0.1718179179339755	0.1219589462557394	team031	5	education	gender	test1	implicit
4	0.8596691448400074	0.1435006656471971	twente-bms-nlp	1	gender	Expanded Welfare State	test2	baseline
4	0.8439015332463649	0.1514892083342836	turiya	2	gender	Expanded Welfare State	test2	baseline
4	0.8150526930285257	0.1539618236581867	boulderNLP	3	gender	Expanded Welfare State	test2	baseline
4	0.8116776686203651	0.1644556389412739	sövereign	4	gender	Expanded Welfare State	test2	baseline
4	0.7914032818077227	0.1486246850822561	sbert_baseline	5	gender	Enhanced Environmental Protection	test2	baseline
4	0.7914032818077227	0.1486246850822561	GESIS-DSM	6	gender	Enhanced Environmental Protection	test2	baseline
4	0.742380242951213	0.1518197265065421	team031	7	gender	Expanded Welfare State	test2	baseline
4	0.7002734965293811	0.1555105770050212	bm25_baseline	8	gender	Expanded Welfare State	test2	baseline
8	0.8681035217911394	0.1288537766567586	twente-bms-nlp	1	civil_status	Expanded Welfare State	test2	baseline
8	0.8543248675626207	0.1357647713122422	turiya	2	civil_status	Expanded Welfare State	test2	baseline
8	0.8198267217302753	0.1495308767256653	sövereign	3	civil_status	Expanded Welfare State	test2	baseline
8	0.8190315478761963	0.1383028778057128	boulderNLP	4	civil_status	Expanded Welfare State	test2	baseline
8	0.792509114975895	0.1320143736633611	sbert_baseline	5	civil_status	Enhanced Environmental Protection	test2	baseline
8	0.792507734956548	0.1320143736633611	GESIS-DSM	6	civil_status	Enhanced Environmental Protection	test2	baseline
8	0.7468215781005241	0.1351293292872165	team031	7	civil_status	Expanded Welfare State	test2	baseline
8	0.7037460286662772	0.1383383132480902	bm25_baseline	8	civil_status	Expanded Welfare State	test2	baseline
16	0.8746004728307052	0.0983103476562093	twente-bms-nlp	1	education	Expanded Welfare State	test2	baseline
16	0.860004904444542	0.1032616457482554	turiya	2	education	Expanded Welfare State	test2	baseline
16	0.8353194789929267	0.1184975195317492	sövereign	3	education	Expanded Welfare State	test2	baseline
16	0.825617365881163	0.1056594539569529	boulderNLP	4	education	Expanded Welfare State	test2	baseline
16	0.7925623055675046	0.0990963527045277	sbert_baseline	5	education	Enhanced Environmental Protection	test2	baseline
16	0.7925226856312766	0.0990917075203069	GESIS-DSM	6	education	Enhanced Environmental Protection	test2	baseline
16	0.7577701557590385	0.10090614928308	team031	7	education	Expanded Welfare State	test2	baseline
16	0.6846931483708592	0.1022053665741982	bm25_baseline	8	education	Expanded Welfare State	test2	baseline
20	0.8792657028220507	0.0896169083360676	twente-bms-nlp	1	education	Expanded Welfare State	test2	baseline
20	0.8611917878841686	0.0940871105379612	turiya	2	education	Expanded Welfare State	test2	baseline
20	0.8393628295034755	0.1093269998474775	sövereign	3	education	Expanded Welfare State	test2	baseline
20	0.8284238464054676	0.0962846509830787	boulderNLP	4	education	Expanded Welfare State	test2	baseline
20	0.7970344209471913	0.0904525730200898	GESIS-DSM	5	education	Enhanced Environmental Protection	test2	baseline
20	0.7970037191362768	0.0904550778161378	sbert_baseline	6	education	Enhanced Environmental Protection	test2	baseline
20	0.7656635421934611	0.0915341509681246	team031	7	education	Expanded Welfare State	test2	baseline
20	0.670001414383212	0.0922582352817289	bm25_baseline	8	education	Expanded Welfare State	test2	baseline
avg	0.8704097105709756	0.1150704245740581	twente-bms-nlp	1	civil_status	Expanded Welfare State	test2	baseline
avg	0.854855773284424	0.1211506839831856	turiya	2	civil_status	Expanded Welfare State	test2	baseline
avg	0.8265466747117607	0.1354527587615414	sövereign	3	civil_status	Expanded Welfare State	test2	baseline
avg	0.8220313632978382	0.1235522016009827	boulderNLP	4	civil_status	Expanded Welfare State	test2	baseline
avg	0.7933696053718498	0.1175476223165706	sbert_baseline	5	civil_status	Enhanced Environmental Protection	test2	baseline
avg	0.7933670308356847	0.1175458348215034	GESIS-DSM	6	civil_status	Enhanced Environmental Protection	test2	baseline
avg	0.7531588797510592	0.1198473390112408	team031	7	education	Expanded Welfare State	test2	baseline
avg	0.6896785219874324	0.1220781230272596	bm25_baseline	8	civil_status	Expanded Welfare State	test2	baseline
4	0.7929190001856363	0.1899376264586104	sövereign	1	civil_status	Expanded Welfare State	test2	explicit
4	0.7784182051472814	0.1889393159352049	twente-bms-nlp	2	civil_status	Expanded Welfare State	test2	explicit
4	0.7205589695235514	0.1867786313250642	GESIS-DSM	3	civil_status	Expanded Welfare State	test2	explicit
4	0.6860395691066639	0.1843875930866665	turiya	4	civil_status	Expanded Welfare State	test2	explicit
4	0.1427502082247471	0.1546644515612977	sbert_baseline	5	civil_status	Expanded Welfare State	test2	explicit
4	0.1211223870063154	0.1517016327007973	team031	6	residence	Expanded Welfare State	test2	explicit
8	0.7979022943300329	0.1777119422929806	sövereign	1	political_spectrum	Expanded Welfare State	test2	explicit
8	0.7746522830610651	0.176547534384226	twente-bms-nlp	2	political_spectrum	Expanded Welfare State	test2	explicit
8	0.7077334881305005	0.1745630961227365	GESIS-DSM	3	political_spectrum	Expanded Welfare State	test2	explicit
8	0.6722091045460789	0.1718995120510371	turiya	4	political_spectrum	Expanded Welfare State	test2	explicit
8	0.1411414998589877	0.1388524011251983	sbert_baseline	5	political_spectrum	Expanded Welfare State	test2	explicit
8	0.1229367115305565	0.1350837375198916	team031	6	political_spectrum	Expanded Welfare State	test2	explicit
16	0.7938226928593625	0.1519399235810757	sövereign	1	education	Expanded Welfare State	test2	explicit
16	0.7679247369615424	0.1511405244524261	twente-bms-nlp	2	education	Expanded Welfare State	test2	explicit
16	0.6914898029416457	0.1490039833580322	GESIS-DSM	3	political_spectrum	stance	test2	explicit
16	0.6509426544799443	0.1467837377412502	turiya	4	political_spectrum	Expanded Welfare State	test2	explicit
16	0.141853618763209	0.1060758328469115	sbert_baseline	5	political_spectrum	Expanded Welfare State	test2	explicit
16	0.1274992112929576	0.1009351873862183	team031	6	political_spectrum	age	test2	explicit
20	0.7914850377926509	0.1446822242273996	sövereign	1	political_spectrum	Expanded Welfare State	test2	explicit
20	0.7649914048158978	0.1441680606758572	twente-bms-nlp	2	education	Expanded Welfare State	test2	explicit
20	0.6859634979815318	0.1418695271609526	GESIS-DSM	3	political_spectrum	stance	test2	explicit
20	0.6440424751343595	0.1400439556270465	turiya	4	political_spectrum	Expanded Welfare State	test2	explicit
20	0.1424850706547802	0.0967405350877738	sbert_baseline	5	political_spectrum	Expanded Welfare State	test2	explicit
20	0.1300528050458294	0.0915591547872168	team031	6	political_spectrum	age	test2	explicit
avg	0.7940322562919206	0.1660679291400165	sövereign	1	civil_status	Expanded Welfare State	test2	explicit
avg	0.7714966574964467	0.1651988588619285	twente-bms-nlp	2	civil_status	Expanded Welfare State	test2	explicit
avg	0.7014364396443074	0.1630538094916963	GESIS-DSM	3	civil_status	Expanded Welfare State	test2	explicit
avg	0.6633084508167616	0.1607786996265	turiya	4	political_spectrum	Expanded Welfare State	test2	explicit
avg	0.142057599375431	0.1240833051552953	sbert_baseline	5	civil_status	Expanded Welfare State	test2	explicit
avg	0.1254027787189147	0.119819928098531	team031	6	political_spectrum	Expanded Welfare State	test2	explicit
4	0.139705387432964	0.1528616531800585	twente-bms-nlp	1	civil_status	Expanded Welfare State	test2	implicit
4	0.1308316585212354	0.152955124174716	sbert_baseline	2	civil_status	Enhanced Environmental Protection	test2	implicit
4	0.129996724876443	0.1495210966741525	GESIS-DSM	3	civil_status	Enhanced Environmental Protection	test2	implicit
4	0.1261766066723165	0.1522607920091116	sövereign	4	civil_status	Expanded Welfare State	test2	implicit
4	0.1211223870063154	0.1517016327007973	team031	5	residence	Expanded Welfare State	test2	implicit
8	0.1416882698296257	0.1360492441914399	twente-bms-nlp	1	civil_status	Expanded Welfare State	test2	implicit
8	0.1304867250898901	0.1372173655384356	sbert_baseline	2	civil_status	Enhanced Environmental Protection	test2	implicit
8	0.1301752198826576	0.1329377136336857	GESIS-DSM	3	civil_status	Enhanced Environmental Protection	test2	implicit
8	0.1296490126302001	0.1384695655926301	sövereign	4	civil_status	Expanded Welfare State	test2	implicit
8	0.1229367115305565	0.1350837375198916	team031	5	political_spectrum	Expanded Welfare State	test2	implicit
16	0.1438225542489765	0.1025865368387708	twente-bms-nlp	1	civil_status	Liberal Economic Policy	test2	implicit
16	0.1342452062627152	0.1094855428326242	sövereign	2	political_spectrum	Liberal Economic Policy	test2	implicit
16	0.1326183205722217	0.0999199234579163	GESIS-DSM	3	civil_status	Liberal Economic Policy	test2	implicit
16	0.1309450108709812	0.103861235235479	sbert_baseline	4	civil_status	Enhanced Environmental Protection	test2	implicit
16	0.1274992112929576	0.1009351873862183	team031	5	political_spectrum	age	test2	implicit
20	0.145637931583728	0.0938155982677546	twente-bms-nlp	1	civil_status	Liberal Economic Policy	test2	implicit
20	0.1360512624569728	0.1013573502040684	sövereign	2	political_spectrum	Liberal Economic Policy	test2	implicit
20	0.1344431683787441	0.0912305045109255	GESIS-DSM	3	civil_status	Liberal Economic Policy	test2	implicit
20	0.1314880488736023	0.0946881388644727	sbert_baseline	4	civil_status	Enhanced Environmental Protection	test2	implicit
20	0.1300528050458294	0.0915591547872168	team031	5	political_spectrum	age	test2	implicit
avg	0.1427135357738235	0.1213282581195059	twente-bms-nlp	1	civil_status	Expanded Welfare State	test2	implicit
avg	0.1318083584275166	0.11840230956917	GESIS-DSM	2	civil_status	Liberal Economic Policy	test2	implicit
avg	0.1315305220055511	0.1253933126596085	sövereign	3	civil_status	Expanded Welfare State	test2	implicit
avg	0.1309378608389272	0.1221804659532758	sbert_baseline	4	civil_status	Enhanced Environmental Protection	test2	implicit
avg	0.1254027787189147	0.119819928098531	team031	5	political_spectrum	Expanded Welfare State	test2	implicit
4	0.838573720660759	0.245924222883813	twente-bms-nlp	1	education	Open Foreign Policy	test3	baseline
4	0.7062034091878469	0.1981405921630443	turiya	2	education	Enhanced Environmental Protection	test3	baseline
4	0.668843694725362	0.2099744898227741	boulderNLP	3	education	Enhanced Environmental Protection	test3	baseline
4	0.5958230137740126	0.1867553722037808	sövereign	4	education	Enhanced Environmental Protection	test3	baseline
4	0.5842710757331174	0.1843420948958528	sbert_baseline	5	education	Enhanced Environmental Protection	test3	baseline
4	0.5807451792475774	0.1819401437218415	GESIS-DSM	6	education	Enhanced Environmental Protection	test3	baseline
4	0.5436666519490195	0.1986322397092366	team031	7	civil_status	Enhanced Environmental Protection	test3	baseline
4	0.3173606982686766	0.1863613314643273	bm25_baseline	8	education	Enhanced Environmental Protection	test3	baseline
8	0.884603239340912	0.2300043782965247	twente-bms-nlp	1	education	Enhanced Environmental Protection	test3	baseline
8	0.7414192287778192	0.1823366260899847	turiya	2	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
8	0.7132469494696095	0.194039519977546	boulderNLP	3	education	Enhanced Environmental Protection	test3	baseline
8	0.6218871176004452	0.1691557934866204	sbert_baseline	4	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
8	0.6215732813548899	0.1671033644716565	GESIS-DSM	5	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
8	0.6033834251372624	0.1746287777699152	sövereign	6	education	Enhanced Environmental Protection	test3	baseline
8	0.5586184745404498	0.181328073516437	team031	7	age	Enhanced Environmental Protection	test3	baseline
8	0.3389441491101829	0.1683568504455137	bm25_baseline	8	stance	Enhanced Environmental Protection	test3	baseline
16	0.9006475848662722	0.1938279483630162	twente-bms-nlp	1	education	Enhanced Environmental Protection	test3	baseline
16	0.719450717369105	0.1588100607377114	boulderNLP	2	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
16	0.7084268640623587	0.1502352699010248	turiya	3	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
16	0.5959901639914733	0.1457405183765326	sövereign	4	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
16	0.5895481594110684	0.1346325594523309	sbert_baseline	5	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
16	0.5893366282285519	0.1333701743091621	GESIS-DSM	6	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
16	0.547826537851428	0.145647288202764	team031	7	gender	Enhanced Environmental Protection	test3	baseline
16	0.3586846476945486	0.1314771732715052	bm25_baseline	8	civil_status	Enhanced Environmental Protection	test3	baseline
20	0.8960997242257639	0.1833046610255597	twente-bms-nlp	1	education	Enhanced Environmental Protection	test3	baseline
20	0.7107502438947255	0.1487424492145065	boulderNLP	2	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
20	0.6906575872768002	0.1404262919395938	turiya	3	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
20	0.586181668663252	0.1367752280479396	sövereign	4	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
20	0.5767143070819051	0.124776552690353	sbert_baseline	5	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
20	0.5766504736775157	0.1237109402251022	GESIS-DSM	6	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
20	0.5517461538184697	0.135267172769409	team031	7	gender	Enhanced Environmental Protection	test3	baseline
20	0.3537741480402498	0.1210055081175744	bm25_baseline	8	civil_status	Enhanced Environmental Protection	test3	baseline
avg	0.8799810672734267	0.2132653026422284	twente-bms-nlp	1	education	Enhanced Environmental Protection	test3	baseline
avg	0.7116767723262063	0.1677846950234119	turiya	2	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
avg	0.7030729013647006	0.1778916299381345	boulderNLP	3	education	Enhanced Environmental Protection	test3	baseline
avg	0.5953445678915001	0.160974974099542	sövereign	4	education	Enhanced Environmental Protection	test3	baseline
avg	0.5931051649566341	0.1532267501312892	sbert_baseline	5	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
avg	0.5920763906271337	0.1515311556819406	GESIS-DSM	6	Restrictive Immigration Policy	Enhanced Environmental Protection	test3	baseline
avg	0.5504644545398417	0.1652186935494616	team031	7	age	Enhanced Environmental Protection	test3	baseline
avg	0.3421909107784145	0.1518002158247301	bm25_baseline	8	education	Enhanced Environmental Protection	test3	baseline
4	0.7996306387079166	0.269793699300543	twente-bms-nlp	1	education	Law & Order	test3	explicit
4	0.6916271116432543	0.2409603804524061	turiya	2	stance	Liberal Economic Policy	test3	explicit
4	0.6873816606887692	0.2414749213689018	sövereign	3	age	Liberal Economic Policy	test3	explicit
4	0.6544835515192873	0.2388165755249987	GESIS-DSM	4	education	Liberal Economic Policy	test3	explicit
4	0.374824204910662	0.1945566043271698	sbert_baseline	5	stance	political_spectrum	test3	explicit
4	0.3643597175216282	0.1968215666976877	team031	6	stance	residence	test3	explicit
8	0.7940492522789729	0.2625567977951125	twente-bms-nlp	1	education	Law & Order	test3	explicit
8	0.6786006312566784	0.2310641790901396	sövereign	2	age	Liberal Economic Policy	test3	explicit
8	0.6663649758103218	0.2278067762664941	turiya	3	stance	Expanded Welfare State	test3	explicit
8	0.6347465803306451	0.2278933963337442	GESIS-DSM	4	age	Liberal Economic Policy	test3	explicit
8	0.3876115408337925	0.1790408765818324	sbert_baseline	5	stance	political_spectrum	test3	explicit
8	0.3827839799050314	0.1825351890017625	team031	6	stance	political_spectrum	test3	explicit
16	0.7898573099009154	0.2482207784164836	twente-bms-nlp	1	education	Law & Order	test3	explicit
16	0.670160293045864	0.2085165888647449	sövereign	2	stance	Liberal Economic Policy	test3	explicit
16	0.641645063755123	0.1979783046324642	turiya	3	stance	Expanded Welfare State	test3	explicit
16	0.6147447558167222	0.2048740258050405	GESIS-DSM	4	gender	education	test3	explicit
16	0.4185064486233371	0.1507247976846235	team031	5	stance	political_spectrum	test3	explicit
16	0.4099401957592709	0.1444574677927211	sbert_baseline	6	stance	political_spectrum	test3	explicit
20	0.7897402318053337	0.2441724769627611	twente-bms-nlp	1	education	Law & Order	test3	explicit
20	0.6651831469115177	0.2016538145874079	sövereign	2	stance	Liberal Economic Policy	test3	explicit
20	0.6358453902909823	0.1889384022334226	turiya	3	stance	age	test3	explicit
20	0.6100692803760382	0.1982938659682189	GESIS-DSM	4	gender	education	test3	explicit
20	0.4388359116306663	0.1401321376837655	team031	5	stance	education	test3	explicit
20	0.4257042513604356	0.1340496468113222	sbert_baseline	6	stance	political_spectrum	test3	explicit
avg	0.7933193581732846	0.256185938118725	twente-bms-nlp	1	education	Law & Order	test3	explicit
avg	0.6753314329757074	0.2206773759777985	sövereign	2	stance	Liberal Economic Policy	test3	explicit
avg	0.6588706353749204	0.2139209658961967	turiya	3	stance	Expanded Welfare State	test3	explicit
avg	0.6285110420106732	0.2174694659080006	GESIS-DSM	4	gender	Liberal Economic Policy	test3	explicit
avg	0.4011215144201658	0.1675534227669598	team031	5	stance	political_spectrum	test3	explicit
avg	0.3995200482160402	0.1630261488782614	sbert_baseline	6	stance	political_spectrum	test3	explicit
4	0.5755541372802974	0.2148056473370537	twente-bms-nlp	1	stance	political_spectrum	test3	implicit
4	0.4201438874604862	0.1897275909642647	GESIS-DSM	2	stance	political_spectrum	test3	implicit
4	0.3845465571284123	0.1889371223684448	sövereign	3	stance	residence	test3	implicit
4	0.3673911690294011	0.1900036696005424	sbert_baseline	4	stance	political_spectrum	test3	implicit
4	0.3643597175216282	0.1968215666976877	team031	5	stance	residence	test3	implicit
8	0.6103716977626453	0.2016624842908093	twente-bms-nlp	1	stance	political_spectrum	test3	implicit
8	0.4446600935366286	0.1736534408164564	GESIS-DSM	2	stance	political_spectrum	test3	implicit
8	0.4048372964665679	0.1737114535443616	sövereign	3	stance	political_spectrum	test3	implicit
8	0.3899244680065674	0.1738337241513623	sbert_baseline	4	stance	political_spectrum	test3	implicit
8	0.3827839799050314	0.1825351890017625	team031	5	stance	political_spectrum	test3	implicit
16	0.6645327220194603	0.1738079614532192	twente-bms-nlp	1	stance	political_spectrum	test3	implicit
16	0.4686411687348337	0.1382828642007774	GESIS-DSM	2	stance	political_spectrum	test3	implicit
16	0.4455751480354461	0.1437362823383521	sövereign	3	stance	political_spectrum	test3	implicit
16	0.4185064486233371	0.1507247976846235	team031	4	stance	political_spectrum	test3	implicit
16	0.4097824464274244	0.1385120668628046	sbert_baseline	5	stance	political_spectrum	test3	implicit
20	0.6940811563261714	0.1655064601759661	twente-bms-nlp	1	stance	political_spectrum	test3	implicit
20	0.4814478114960005	0.1281961532665239	GESIS-DSM	2	stance	political_spectrum	test3	implicit
20	0.4654077766944569	0.1347446335565081	sövereign	3	stance	political_spectrum	test3	implicit
20	0.4388359116306663	0.1401321376837655	team031	4	stance	education	test3	implicit
20	0.4219442250692948	0.1277438130032451	sbert_baseline	5	stance	political_spectrum	test3	implicit
avg	0.6361349283471436	0.1889456383142621	twente-bms-nlp	1	stance	political_spectrum	test3	implicit
avg	0.4537232403069873	0.1574650123120056	GESIS-DSM	2	stance	political_spectrum	test3	implicit
avg	0.4250916945812208	0.1602823729519166	sövereign	3	stance	political_spectrum	test3	implicit
avg	0.4011215144201658	0.1675534227669598	team031	4	stance	political_spectrum	test3	implicit
avg	0.3972605771331719	0.1575233184044886	sbert_baseline	5	stance	political_spectrum	test3	implicit
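
Each `avg` row in the leaderboard appears to be the arithmetic mean of that team's k = 4, 8, 16 and 20 rows, and ranks follow the first score column in descending order. The Python sketch below reproduces this for three teams of Test set 3 (baseline scenario), using values copied from the rows above. It relies only on those two observations. The helper names `average_over_k` and `rank` are hypothetical and are not part of the shared task's evaluation script. Only the first four tab-separated fields (k, two scores, team) are read.

```python
from collections import defaultdict
from statistics import mean

# Per-k rows copied from the leaderboard (Test set 3, baseline scenario).
# Assumption: field 1 is k, fields 2-3 are the two scores, field 4 is the team.
ROWS = """\
4\t0.838573720660759\t0.245924222883813\ttwente-bms-nlp
4\t0.7062034091878469\t0.1981405921630443\tturiya
4\t0.668843694725362\t0.2099744898227741\tboulderNLP
8\t0.884603239340912\t0.2300043782965247\ttwente-bms-nlp
8\t0.7414192287778192\t0.1823366260899847\tturiya
8\t0.7132469494696095\t0.194039519977546\tboulderNLP
16\t0.9006475848662722\t0.1938279483630162\ttwente-bms-nlp
16\t0.7084268640623587\t0.1502352699010248\tturiya
16\t0.719450717369105\t0.1588100607377114\tboulderNLP
20\t0.8960997242257639\t0.1833046610255597\ttwente-bms-nlp
20\t0.6906575872768002\t0.1404262919395938\tturiya
20\t0.7107502438947255\t0.1487424492145065\tboulderNLP
"""


def average_over_k(rows: str) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: mean of both score columns per team over all k cutoffs."""
    scores: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for line in rows.strip().splitlines():
        k, s1, s2, team = line.split("\t")[:4]
        if k == "avg":  # skip precomputed average rows if they are included
            continue
        scores[team].append((float(s1), float(s2)))
    return {
        team: (mean(a for a, _ in vals), mean(b for _, b in vals))
        for team, vals in scores.items()
    }


def rank(avgs: dict[str, tuple[float, float]]) -> list[tuple[int, str, float, float]]:
    """Hypothetical helper: order teams by the first averaged score, rank 1 = highest."""
    ordered = sorted(avgs.items(), key=lambda item: item[1][0], reverse=True)
    return [(i, team, s1, s2) for i, (team, (s1, s2)) in enumerate(ordered, start=1)]


# Print rows in the same tab-separated layout as the leaderboard's avg rows.
for r, team, s1, s2 in rank(average_over_k(ROWS)):
    print(f"avg\t{s1:.4f}\t{s2:.4f}\t{team}\t{r}")
```

With these values the sketch yields first-score averages of about 0.8800 (twente-bms-nlp), 0.7117 (turiya) and 0.7031 (boulderNLP), matching the table's avg rows. It also shows that boulderNLP, ahead of turiya at k = 16, still ranks below it once the four cutoffs are averaged.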

Organizing Committee

References

Vamvas, J., & Sennrich, R. (2020, June). X-stance: A Multilingual Multi-Target Dataset for Stance Detection. In 5th SwissText & 16th KONVENS Joint Conference 2020 (p. 9). CEUR-WS.org.

Policy

We abide by the ACL anti-harassment policy.