GitHub_collection_open-r1

Author	SHA1	Message	Date
Dongwei Jiang	736b59f9a3	Update grpo.py (#171 )	2025-02-06 00:25:08 +01:00
Edward Beeching	3fd56dc7b4	fix uv env path + details (#188 ) * fix uv env path + details * Update slurm/grpo.slurm --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-05 23:59:25 +01:00
Lewis	138df0ca44	chore(setup.py): bump vllm>=0.7.1 (#181 ) See https://github.com/huggingface/trl/pull/2766.	2025-02-05 09:53:31 +01:00
Edward Beeching	5aff57c919	GRPO training args fixes (#177 ) * grpo training args fixes * style --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-02-05 09:09:45 +01:00
Matt	1fc8d425a9	Fix code quality after adding puzzles (#178 )	2025-02-04 15:30:25 +00:00
Matt	b24ce903cb	Fix quality after adding puzzles	2025-02-04 15:18:23 +00:00
Matt	9ca4ed5a6a	Add puzzles	2025-02-04 15:09:21 +00:00
Matt	b2d7ba2f1d	Add puzzles	2025-02-04 14:34:57 +00:00
Kashif Rasul	a0d61ccece	use ruff (#137 ) * use ruff * reformat * re-run * update deps * undo * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix help strings * fix ruff version * fix formatting --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-31 13:36:08 +01:00
Jingze Shi	e450a6fbc4	Recipes for optimzing training scripts (#120 ) * Add recipe configs to optimize scripts (#73) * remove small models * Add README for recipes * Add README for recipes * Attempt to resolve conflicts * Optimize src scripts * Update recipe of DeepSeek-R1-Distill-Qwen-7B * Update recipe of Qwen2.5-1.5B * Updated recipe readme for qwen * Update training command for recipes * Update README.md Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * Update preprocessing_num_workers from 36 to 8 * Add small language model recipes for quickly verify R1 * Fix src code quality * Add back the Slurm job command * Remove recipe of doge * Fix torch_dtype is not used * fix grpo yaml * fix grpo yaml * fix deprecation warning * fix config folder location * Remove duplicate variables in grpo.py * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-31 12:41:53 +01:00
Dongwei Jiang	22512e62bc	Update README.md (#132 )	2025-01-31 11:27:17 +01:00
Edward Beeching	99d1083b7c	Adds async model push to PushToHubRevisionCallback (#124 ) * adds async model push * style	2025-01-31 09:17:10 +01:00
Quentin Gallouédec	6820d395be	Fix message removal in GRPO script (#131 )	2025-01-30 21:51:18 +01:00
Sam Schorb	356f6a5c4f	Add Table of Contents to README for easier navigation (#125 ) * Update README.md * Update README.md	2025-01-30 16:32:13 +01:00
Kashif Rasul	c0b53fae29	Grpo slurm scripts (#112 ) * initial grpo.slurm script * initial zero3 yaml using 1 less gpu * add completion and promp length * initial doc * use main * fix typo * remove num_processes * use vllm 0.7.0 * remove double module load * update math-verify * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * overwrite num_procs in the slurm script * add vllm args to readme * update readme --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-30 10:22:45 +01:00
Edward Beeching	972e47eff0	Adds auto eval callbacks (#115 ) * adds auto eval callbacks * updates training scripts with callbacks * style * date * update gitignore with logs, eval results, etc * remove unused imports * nits	2025-01-30 09:39:47 +01:00
A Taylor	f7b8e527e8	Fix help text for --retries to match actual default value (#103 ) The argparse help text for --retries stated the default was 3, but the actual default set in code is 0. This update corrects the help text to prevent confusion.	2025-01-30 09:38:30 +01:00
Lewis	fb1b4c4e3f	docs(README): note about CUDA 12.1 (#121 ) will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1 - fixes #106 - fixes #117	2025-01-30 08:42:43 +01:00
Edward Beeching	bd0e15bfb5	Update README.md (#93 )	2025-01-30 00:42:29 +01:00
Mayur Pagote	7a7682b6a4	Corrected Typos in README.md (#110 )	2025-01-30 00:38:47 +01:00
Deborah Shekinah Jacob	971294b018	Modified: pip install --upgrade pip (#99 )	2025-01-29 19:55:54 +01:00
María Grandury	401a219575	fix typos (#40 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-29 12:37:11 +01:00
Hynek Kydlíček	e2235cf978	Improve repoduction of r1 reported score (#92 ) * bump up deps, fix aime24 evals, make grpo more strict * minor fixes * 🤨 fmt * bump lighteval + set boxed to match first * remove dead code * bump lighteval * add ed's tp branch swtich --------- Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>	2025-01-29 11:29:05 +01:00
Andrés Marafioti	c4fdb69940	Change conda for uv (#91 ) * Change conda for uv * quentin's magical path	2025-01-28 21:16:48 +01:00
Gabriel Martín Blázquez	c5941ed5e4	Add `--timeout`, `--retries`, `--prompt-template` and TP and PP by slurm variables (#94 ) * Set TP and PP using slurm variables * Add `--timeout` argument * add `--prompt-template` argument * Group generations * Add `--retries` argument	2025-01-28 18:30:04 +01:00
dependabot[bot]	e2e403daff	Bump actions/setup-python from 2 to 5 (#75 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](https://github.com/actions/setup-python/compare/v2...v5) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-27 15:17:10 +01:00
dependabot[bot]	29015d7e4c	Bump actions/checkout from 2 to 4 (#76 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v2...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-27 15:15:07 +01:00
Claudiu	9ab30a1ebe	Added dependabot integration (#70 )	2025-01-27 15:11:49 +01:00
Gabriel Martín Blázquez	b03480d868	Add `--input-batch-size`, `--client-replicas` args and download Ray logs (#71 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-27 14:58:13 +01:00
Quentin Gallouédec	67764bc6ae	Update README.md (#72 )	2025-01-27 13:24:25 +01:00
María Grandury	d6f1a179a5	Implement make evaluate command (#41 ) * implement evaluate make command * add example usage of make evaluate to readme	2025-01-27 10:45:56 +01:00
CharlesCNorton	8d37c5c27f	docs: fix grammar and phrasing issues (1, 2, 3) (#62 ) 1. Insert missing article "a": - Original: "This repo is work in progress..." - Revised: "This repo is a work in progress..." Rationale: The article "a" is needed before "work in progress" to make the sentence grammatically correct. 2. Add "as well as" for parallelism: - Original: "...scripts to train and evaluate models as well generate synthetic data..." - Revised: "...scripts to train and evaluate models as well as generate synthetic data..." Rationale: "As well as" is the correct conjunction to link multiple verbs or verb phrases, improving clarity. 3. Clarify GPU resource phrasing: - Original: "we used 2 nodes of 8xH100 each one..." - Revised: "we used 2 nodes, each with 8×H100 GPUs..." Rationale: This rewording removes redundant language ("each one") and more clearly states that each node has eight H100 GPUs.	2025-01-27 10:45:34 +01:00
Edward Beeching	feb59d2b42	Update evaluate.slurm typo in eval slurm	2025-01-27 09:43:02 +01:00
Ikko Eltociear Ashimine	7c01b59c44	docs: update README.md (#54 ) scipts -> scripts Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-26 18:48:07 +01:00
Hynek Kydlíček	90b0947382	Reward verification and evaluation fixes (#55 ) * bump up deps, fix aime24 evals, make grpo more strict * minor fixes * 🤨 fmt * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-26 18:35:48 +01:00
Anton Lozhkov	15df4fb134	vllm speed tweaks (#43 )	2025-01-26 01:59:50 +01:00
Agus	d98862a5c8	Add example of generating data with deepseek r1 and distilled models (#29 )	2025-01-25 17:34:21 +01:00
Quentin Gallouédec	e3a2864658	Simplify sft	2025-01-25 15:46:29 +00:00
elie	64c0ed2254	fix evaluate.slurm (#27 )	2025-01-25 15:28:49 +01:00
elie	f169d2cd8e	add evaluate.slurm (#26 )	2025-01-25 15:23:16 +01:00
Manuel Romero	c27a974b99	Fix typo (#25 )	2025-01-25 15:16:56 +01:00
Gabriel Martín Blázquez	a90b99686a	Fix passing `vLLM` server URL (#21 ) * Use head node ip as vLLM server url * Pass correct server url * Add num_generations argument * Fix style * Remove `select` --------- Co-authored-by: plaguss <agustin@argilla.io>	2025-01-25 15:01:15 +01:00
Quentin Gallouédec	f844eac629	Update README.md	2025-01-25 14:49:49 +01:00
elie	43cb6a0e0f	fix sft.slurm	2025-01-25 14:48:45 +01:00
Quentin Gallouédec	d0265f94ce	Copyrights (#20 ) * handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad chekpoint * drop grad checkpoint * except * copyrights --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-25 14:41:46 +01:00
Quentin Gallouédec	ff34d8f651	Handle error in verifier + deepspeed command (#17 ) * handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad chekpoint * drop grad checkpoint * except --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-25 13:58:04 +01:00
lewtun	2580fd8c1b	Fix Slurm SFT and gather Slurm scripts (#19 ) * Fix slurm * Fix generate * Fix install * Fix c	2025-01-25 13:47:52 +01:00
lewtun	64b4927a33	Update README.md	2025-01-25 13:10:13 +01:00
lewtun	13d8392b78	Fix eval comamnds (#18 )	2025-01-25 12:31:40 +01:00
Lewis Tunstall	5ecc11b50a	Scale image	2025-01-25 10:22:38 +00:00

71 Commits