5 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| lewtun | 26184f71ae | Refactor evaluation (#6) | 2025-01-24 23:46:34 +01:00 |
| Edward Beeching | 9c398973e8 | Adds Math-500 and AIME24 evals (#4): adds evals; up max model len. Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> | 2025-01-24 23:09:07 +01:00 |
| Leandro von Werra | 52aefc29e2 | Update README.md | 2025-01-24 22:06:22 +01:00 |
| lewtun | 6acc9a0aa0 | Add configs and stuff (#2) | 2025-01-24 20:05:18 +01:00 |
| lewtun | 83f9c6c8da | Initial commit | 2025-01-24 16:44:12 +01:00 |