GitHub_collection_open-r1

Author	SHA1	Message	Date
lewtun	7564de2c24	Add diagram (#16)	2025-01-25 11:20:17 +01:00
Quentin Gallouédec	742cc008b2	Pin main for transformers and trl	2025-01-25 11:07:17 +01:00
Agus	33795e1b5a	Add math-verify to check accuracy of completions on GRPO (#14) * Add math-verify to check accuracy of completions on GRPO * Handle make_conversation * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix quality * Remove unnecesary item access in parsed answer --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-25 11:03:58 +01:00
Loubna Ben Allal	2ceba252a3	Add SFT command to the readme (#15)	2025-01-25 10:56:33 +01:00
Gabriel Martín Blázquez	692e075715	Fix `generate.slurm` (#10)	2025-01-25 02:11:53 +01:00
elie	9987bb8995	use liger kernel	2025-01-25 01:51:39 +01:00
Gabriel Martín Blázquez	02bed5308c	Add synthetic data generation script (#9) * Add synthetic data generation script Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com> * Fix format * Fix imports sorting --------- Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com>	2025-01-25 01:42:24 +01:00
Quentin Gallouédec	05496dcdab	System prompt; Fix readme command	2025-01-25 00:21:21 +00:00
elie	9ae671a75e	fix slurm (#8)	2025-01-25 01:02:26 +01:00
Quentin Gallouédec	b47b1d058b	GRPO script (#3) * inital commit * with reward func * fix box extract * example line * don't break when answer malformed * command and logging * holly simplicity * move grpo * reverse readme * instructions	2025-01-25 00:19:38 +01:00
Lewis Tunstall	e660e43610	Fix configs	2025-01-24 23:13:20 +00:00
lewtun	ca8f35c143	REFACTOR TO THE MAX (#7)	2025-01-25 00:12:25 +01:00
lewtun	26184f71ae	Refactor evaluation (#6)	2025-01-24 23:46:34 +01:00
Edward Beeching	9c398973e8	Adds Math-500 and AIME24 evals (#4) * adds evals * up max model len --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-01-24 23:09:07 +01:00
elie	c421bc893b	Improve sft (#5) * first commit * working training * change model_id * Update scripts/training/sft.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-24 22:23:49 +01:00
Leandro von Werra	52aefc29e2	Update README.md	2025-01-24 22:06:22 +01:00
lewtun	6acc9a0aa0	Add configs and stuff (#2)	2025-01-24 20:05:18 +01:00
Quentin Gallouédec	a4bf90465f	Update setup.py (#1)	2025-01-24 19:13:04 +01:00
Lewis Tunstall	697c119dd8	Add data	2025-01-24 16:51:03 +00:00
Lewis Tunstall	2ff66e6cde	Add skeleton	2025-01-24 16:50:13 +00:00
lewtun	83f9c6c8da	Initial commit	2025-01-24 16:44:12 +01:00

71 Commits