Below we present the leaderboard for the different datasets included in SEAM, for the following configuration:
| Model (mean ± std) | MultiNews (Summarization) | OpenASP (Summarization) | FuseReviews (Summarization) | MuSiQue (Question Answering) | SciCo (Coreference Resolution) | ECB+ (Coreference Resolution) |
|---|---|---|---|---|---|---|
| Llama3-8B | 19.5 ± 0.9 | 5.6 ± 0.5 | 75.9 ± 1.3 | 49.4 ± 1.9 | 24.1 ± 0.8 | 21.9 ± 1.9 |
| Llama3-70B | 20.9 ± 0.5 | 9.0 ± 2.0 | 76.8 ± 1.6 | 57.3 ± 1.3 | 24.3 ± 1.7 | 22.3 ± 3.4 |
| Mistral-7B | 20.1 ± 0.6 | 11.8 ± 0.6 | 77.5 ± 1.7 | 11.5 ± 4.8 | 31.1 ± 1.3 | 20.1 ± 0.9 |
| Mixtral-8x7B | 21.4 ± 0.7 | 11.5 ± 0.5 | 65.7 ± 5.5 | 5.6 ± 1.2 | 17.8 ± 3.4 | 14.1 ± 3.8 |
| Mixtral-8x22B* | 20.3 ± 0.4 | 11.1 ± 0.3 | 65.0 ± 2.4 | 45.8 ± 3.8 | 21.5 ± 2.7 | 12.3 ± 2.5 |
| Gemma1.1-2B | 8.7 ± 1.0 | 2.9 ± 0.4 | 59.6 ± 4.6 | 17.2 ± 2.8 | 3.4 ± 0.8 | 4.3 ± 1.8 |
| Gemma1.1-7B | 2.4 ± 0.2 | 2.5 ± 0.3 | 38.5 ± 3.0 | 0.8 ± 0.4 | 3.3 ± 0.8 | NA |
* Mixtral-8x22B is 4-bit quantized due to computational constraints.
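For readers who want to query the leaderboard programmatically, the sketch below transcribes the mean scores from the table (standard deviations omitted) into a plain Python dictionary and reports the top model per dataset. It is an illustration only: the names `MEANS`, `DATASETS`, and `best_model` are hypothetical, and it assumes that a higher score is better on every dataset. The missing ECB+ entry for Gemma1.1-7B is stored as `None` and skipped.

```python
# Mean scores transcribed from the SEAM leaderboard table (std omitted).
# None marks the missing (NA) ECB+ entry for Gemma1.1-7B.
DATASETS = ["MultiNews", "OpenASP", "FuseReviews", "MuSiQue", "SciCo", "ECB+"]

MEANS = {
    "Llama3-8B":      dict(zip(DATASETS, [19.5, 5.6, 75.9, 49.4, 24.1, 21.9])),
    "Llama3-70B":     dict(zip(DATASETS, [20.9, 9.0, 76.8, 57.3, 24.3, 22.3])),
    "Mistral-7B":     dict(zip(DATASETS, [20.1, 11.8, 77.5, 11.5, 31.1, 20.1])),
    "Mixtral-8x7B":   dict(zip(DATASETS, [21.4, 11.5, 65.7, 5.6, 17.8, 14.1])),
    "Mixtral-8x22B":  dict(zip(DATASETS, [20.3, 11.1, 65.0, 45.8, 21.5, 12.3])),
    "Gemma1.1-2B":    dict(zip(DATASETS, [8.7, 2.9, 59.6, 17.2, 3.4, 4.3])),
    "Gemma1.1-7B":    dict(zip(DATASETS, [2.4, 2.5, 38.5, 0.8, 3.3, None])),
}


def best_model(dataset: str) -> str:
    """Return the model with the highest mean score on `dataset`.

    Assumes higher is better (hypothetical convention for this sketch)
    and skips models whose score is missing (None).
    """
    scored = {m: s[dataset] for m, s in MEANS.items() if s[dataset] is not None}
    return max(scored, key=scored.get)


if __name__ == "__main__":
    for ds in DATASETS:
        print(f"{ds}: {best_model(ds)}")
```

Keeping one dictionary per model makes it easy to add a new leaderboard row without touching the lookup logic; note that the mean alone ignores the run-to-run spread reported in the table.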
To contribute to the SEAM leaderboard, please contact us at seam.benchmark@gmail.com.