SEAM Leaderboard

The leaderboard below reports results on each dataset included in SEAM, under the following configuration:

config.json
Tasks: Summarization (MultiNews, OpenASP, FuseReviews), Question Answering (MusiQue), Coreference Resolution (SciCo, ECB+)

Model            MultiNews     OpenASP       FuseReviews   MusiQue       SciCo         ECB+
                 Mean   Std    Mean   Std    Mean   Std    Mean   Std    Mean   Std    Mean   Std
Llama3-8B        19.5   0.9     5.6   0.5    75.9   1.3    49.4   1.9    24.1   0.8    21.9   1.9
Llama3-70B       20.9   0.5     9.0   2.0    76.8   1.6    57.3   1.3    24.3   1.7    22.3   3.4
Mistral-7B       20.1   0.6    11.8   0.6    77.5   1.7    11.5   4.8    31.1   1.3    20.1   0.9
Mixtral-8x7B     21.4   0.7    11.5   0.5    65.7   5.5     5.6   1.2    17.8   3.4    14.1   3.8
Mixtral-8x22B*   20.3   0.4    11.1   0.3    65.0   2.4    45.8   3.8    21.5   2.7    12.3   2.5
Gemma1.1-2B       8.7   1.0     2.9   0.4    59.6   4.6    17.2   2.8     3.4   0.8     4.3   1.8
Gemma1.1-7B       2.4   0.2     2.5   0.3    38.5   3.0     0.8   0.4     3.3   0.8      NA    NA

* Mixtral-8x22B is 4-bit quantized due to computational constraints.
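
Each leaderboard cell pairs a mean with a standard deviation. As a minimal sketch of how such a pair could be computed from repeated evaluation runs, the Python snippet below aggregates a list of per-run scores. The function name `summarize_runs`, the example scores, and the use of the sample (n-1) standard deviation are all assumptions made for illustration; they are not taken from the SEAM code or from config.json.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, std) of per-run scores, rounded to one decimal place.

    Assumption: std is the sample standard deviation (n - 1 denominator).
    The benchmark itself may use a different estimator.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return round(mean(scores), 1), round(stdev(scores), 1)


# Hypothetical per-run scores for one model on one dataset
# (illustrative values only, not leaderboard results).
example_runs = [18.0, 20.0, 22.0]
avg, std = summarize_runs(example_runs)
print(f"{avg} ± {std}")
```

A larger standard deviation relative to the mean suggests that the score is more sensitive to run-to-run variation, which is worth keeping in mind when comparing rows whose means are close.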

To contribute to the SEAM leaderboard, please contact us at seam.benchmark@gmail.com.