Below we provide the prompt paraphrases used for each dataset in SEAM, as well as a pool of 5 few-shot examples.
In each evaluation run, SEAM resamples three things: a single instruction paraphrase, a subset of the few-shot examples, and a random order for those examples.
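
As a rough illustration of the per-run resampling described above, a minimal Python sketch follows. It is not SEAM's actual implementation: the function name `sample_prompt_config`, its arguments, and the choice to draw the subset size uniformly between 0 and the pool size are all assumptions made for the example.

```python
import random


def sample_prompt_config(paraphrases, few_shot_pool, rng):
    """Draw one prompt configuration for a single evaluation run.

    Hypothetical helper: returns (instruction, ordered_examples).
    """
    # One instruction paraphrase, chosen uniformly at random.
    instruction = rng.choice(paraphrases)

    # Assumption: the number of few-shot examples is uniform in
    # [0, len(few_shot_pool)]; the source does not state the distribution.
    k = rng.randint(0, len(few_shot_pool))

    # random.sample draws k distinct items and returns them in selection
    # order, so the subset and its random ordering come from one call.
    examples = rng.sample(few_shot_pool, k)

    return instruction, examples


if __name__ == "__main__":
    # Placeholder data, not taken from any SEAM dataset.
    paraphrases = ["instruction A", "instruction B", "instruction C"]
    pool = ["ex1", "ex2", "ex3", "ex4", "ex5"]

    rng = random.Random(0)  # seeding per run keeps the draw reproducible
    instruction, examples = sample_prompt_config(paraphrases, pool, rng)
    print(instruction, examples)
```

Passing an explicit `random.Random` instance rather than using the module-level generator means each evaluation run can be reproduced from its seed.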