71 Commits

Author SHA1 Message Date
lewtun
7564de2c24
Add diagram (#16) 2025-01-25 11:20:17 +01:00
Quentin Gallouédec
742cc008b2
Pin main for transformers and trl 2025-01-25 11:07:17 +01:00
Agus
33795e1b5a
Add math-verify to check accuracy of completions on GRPO (#14)
* Add math-verify to check accuracy of completions on GRPO

* Handle make_conversation

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix quality

* Remove unnecessary item access in parsed answer

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-25 11:03:58 +01:00
Loubna Ben Allal
2ceba252a3
Add SFT command to the readme (#15) 2025-01-25 10:56:33 +01:00
Gabriel Martín Blázquez
692e075715
Fix generate.slurm (#10) 2025-01-25 02:11:53 +01:00
elie
9987bb8995
use liger kernel 2025-01-25 01:51:39 +01:00
Gabriel Martín Blázquez
02bed5308c
Add synthetic data generation script (#9)
* Add synthetic data generation script

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>

* Fix format

* Fix imports sorting

---------

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>
2025-01-25 01:42:24 +01:00
Quentin Gallouédec
05496dcdab
System prompt; Fix readme command 2025-01-25 00:21:21 +00:00
elie
9ae671a75e
fix slurm (#8) 2025-01-25 01:02:26 +01:00
Quentin Gallouédec
b47b1d058b
GRPO script (#3)
* initial commit

* with reward func

* fix box extract

* example line

* don't break when answer malformed

* command and logging

* holly simplicity

* move grpo

* reverse readme

* instructions
2025-01-25 00:19:38 +01:00
Lewis Tunstall
e660e43610
Fix configs 2025-01-24 23:13:20 +00:00
lewtun
ca8f35c143
REFACTOR TO THE MAX (#7) 2025-01-25 00:12:25 +01:00
lewtun
26184f71ae
Refactor evaluation (#6) 2025-01-24 23:46:34 +01:00
Edward Beeching
9c398973e8
Adds Math-500 and AIME24 evals (#4)
* adds evals

* up max model len

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-01-24 23:09:07 +01:00
elie
c421bc893b
Improve sft (#5)
* first commit

* working training

* change model_id

* Update scripts/training/sft.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-24 22:23:49 +01:00
Leandro von Werra
52aefc29e2
Update README.md 2025-01-24 22:06:22 +01:00
lewtun
6acc9a0aa0
Add configs and stuff (#2) 2025-01-24 20:05:18 +01:00
Quentin Gallouédec
a4bf90465f
Update setup.py (#1) 2025-01-24 19:13:04 +01:00
Lewis Tunstall
697c119dd8
Add data 2025-01-24 16:51:03 +00:00
Lewis Tunstall
2ff66e6cde
Add skeleton 2025-01-24 16:50:13 +00:00
lewtun
83f9c6c8da
Initial commit 2025-01-24 16:44:12 +01:00