syntaxgym
syntaxgym is a Python package which provides easy, standardized, reproducible access to targeted syntactic evaluations of language models. It replicates the core behavior of the SyntaxGym website.
Quick example
You can define targeted syntactic evaluations using our standard JSON format. Here’s a simple one-item evaluation which tests language models’ knowledge of subject–verb number agreement:
{
  "meta": {"name": "Sample subject--verb suite", "metric": "sum"},
  "predictions": [{"type": "formula", "formula": "(2;%mismatch%) > (2;%match%)"}],
  "region_meta": {"1": "Subject NP", "2": "Verb", "3": "Continuation"},
  "items": [
    {
      "item_number": 1,
      "conditions": [
        {"condition_name": "match",
         "regions": [{"region_number": 1, "content": "The woman"},
                     {"region_number": 2, "content": "plays"},
                     {"region_number": 3, "content": "the guitar"}]},
        {"condition_name": "mismatch",
         "regions": [{"region_number": 1, "content": "The woman"},
                     {"region_number": 2, "content": "play"},
                     {"region_number": 3, "content": "the guitar"}]}
      ]
    }
  ]
}
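To make the prediction formula concrete, here is a minimal sketch of the comparison it expresses. It assumes that (2;%mismatch%) denotes the surprisal of region 2 in the mismatch condition, and that the "sum" metric adds up token-level surprisals within a region. The surprisal values, the subword split, and the region_surprisal helper are hypothetical and are not part of the syntaxgym API.

```python
# Hypothetical per-token surprisals (in bits) for region 2, the verb.
# These numbers are invented for illustration; they are not model output.
token_surprisals = {
    "match": {2: [3.0, 1.5]},   # "plays", imagined as two subword tokens
    "mismatch": {2: [7.5]},     # "play", imagined as one token
}


def region_surprisal(condition, region, metric=sum):
    """Aggregate the token-level surprisals of one region in one condition."""
    return metric(token_surprisals[condition][region])


# Counterpart of the formula "(2;%mismatch%) > (2;%match%)": the prediction
# holds when the verb is more surprising in the ungrammatical condition.
result = region_surprisal("mismatch", 2) > region_surprisal("match", 2)
print(result)
```

With these made-up values the prediction holds, because the mismatching verb has the higher summed surprisal.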
You can then use syntaxgym to evaluate a language model’s performance on this test. Our tool is integrated with the LM Zoo, so you can instantly use any of the models available in the Zoo.
Below, we evaluate GPT-2’s performance on the test suite:
$ syntaxgym run gpt2 my_suite.json
...
suite                       prediction_id  item_number  result
Sample subject--verb suite  0              1            True
We can do the same thing using a Python API:
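from lm_zoo import get_registry
from syntaxgym import compute_surprisals, evaluate

# Look up GPT-2 in the LM Zoo registry.
model = get_registry()["gpt2"]
# Compute region-level surprisals for every item in the suite.
suite = compute_surprisals(model, "my_suite.json")
# Check each prediction in the suite against those surprisals.
results = evaluate(suite)
print(results.to_csv(sep="\t"))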
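For suites with more than one item, the per-item results can be summarized as an accuracy score. The sketch below uses only the Python standard library and assumes the output is tab-separated with the columns shown in the example output above. The three-row sample data and the accuracy_by_prediction helper are hypothetical and are not part of syntaxgym.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated results for a three-item suite. Only the
# column names are taken from the example output; the rows are invented.
raw = (
    "suite\tprediction_id\titem_number\tresult\n"
    "Sample subject--verb suite\t0\t1\tTrue\n"
    "Sample subject--verb suite\t0\t2\tFalse\n"
    "Sample subject--verb suite\t0\t3\tTrue\n"
)


def accuracy_by_prediction(tsv_text):
    """Return {prediction_id: fraction of items whose result is True}."""
    outcomes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        outcomes[row["prediction_id"]].append(row["result"] == "True")
    return {pid: sum(vals) / len(vals) for pid, vals in outcomes.items()}


accuracy = accuracy_by_prediction(raw)
print(accuracy)
```

Prediction identifiers are kept as strings here, since that is how they are read from the text output.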
Acknowledgements
LM Zoo is maintained by the MIT Computational Psycholinguistics Laboratory.
If you use the website or command-line tools in your research, we ask that you please cite the ACL 2020 system demonstration paper:
@inproceedings{gauthier-etal-2020-syntaxgym,
title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-demos.10",
pages = "70--76",
}
If you use the original test suites, models, or results presented on the website, please cite the ACL 2020 long paper:
@inproceedings{hu-etal-2020-systematic,
title = "A Systematic Assessment of Syntactic Generalization in Neural Language Models",
author = "Hu, Jennifer and Gauthier, Jon and Qian, Peng and Wilcox, Ethan and Levy, Roger",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.158",
pages = "1725--1744",
}