GitHub_collection_open-r1

Author	SHA1	Message	Date
Jingze Shi	e450a6fbc4	Recipes for optimzing training scripts (#120) * Add recipe configs to optimize scripts (#73) * remove small models * Add README for recipes * Add README for recipes * Attempt to resolve conflicts * Optimize src scripts * Update recipe of DeepSeek-R1-Distill-Qwen-7B * Update recipe of Qwen2.5-1.5B * Updated recipe readme for qwen * Update training command for recipes * Update README.md Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * Update preprocessing_num_workers from 36 to 8 * Add small language model recipes for quickly verify R1 * Fix src code quality * Add back the Slurm job command * Remove recipe of doge * Fix torch_dtype is not used * fix grpo yaml * fix grpo yaml * fix deprecation warning * fix config folder location * Remove duplicate variables in grpo.py * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-31 12:41:53 +01:00
Dongwei Jiang	22512e62bc	Update README.md (#132)	2025-01-31 11:27:17 +01:00
Sam Schorb	356f6a5c4f	Add Table of Contents to README for easier navigation (#125) * Update README.md * Update README.md	2025-01-30 16:32:13 +01:00
Kashif Rasul	c0b53fae29	Grpo slurm scripts (#112) * initial grpo.slurm script * initial zero3 yaml using 1 less gpu * add completion and promp length * initial doc * use main * fix typo * remove num_processes * use vllm 0.7.0 * remove double module load * update math-verify * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * overwrite num_procs in the slurm script * add vllm args to readme * update readme --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-30 10:22:45 +01:00
Lewis	fb1b4c4e3f	docs(README): note about CUDA 12.1 (#121) will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1 - fixes #106 - fixes #117	2025-01-30 08:42:43 +01:00
Edward Beeching	bd0e15bfb5	Update README.md (#93)	2025-01-30 00:42:29 +01:00
Mayur Pagote	7a7682b6a4	Corrected Typos in README.md (#110)	2025-01-30 00:38:47 +01:00
Deborah Shekinah Jacob	971294b018	Modified: pip install --upgrade pip (#99)	2025-01-29 19:55:54 +01:00
María Grandury	401a219575	fix typos (#40) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-29 12:37:11 +01:00
Andrés Marafioti	c4fdb69940	Change conda for uv (#91) * Change conda for uv * quentin's magical path	2025-01-28 21:16:48 +01:00
Gabriel Martín Blázquez	b03480d868	Add `--input-batch-size`, `--client-replicas` args and download Ray logs (#71) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-27 14:58:13 +01:00
Quentin Gallouédec	67764bc6ae	Update README.md (#72)	2025-01-27 13:24:25 +01:00
María Grandury	d6f1a179a5	Implement make evaluate command (#41) * implement evaluate make command * add example usage of make evaluate to readme	2025-01-27 10:45:56 +01:00
CharlesCNorton	8d37c5c27f	docs: fix grammar and phrasing issues (1, 2, 3) (#62) 1. Insert missing article "a": - Original: "This repo is work in progress..." - Revised: "This repo is a work in progress..." Rationale: The article "a" is needed before "work in progress" to make the sentence grammatically correct. 2. Add "as well as" for parallelism: - Original: "...scripts to train and evaluate models as well generate synthetic data..." - Revised: "...scripts to train and evaluate models as well as generate synthetic data..." Rationale: "As well as" is the correct conjunction to link multiple verbs or verb phrases, improving clarity. 3. Clarify GPU resource phrasing: - Original: "we used 2 nodes of 8xH100 each one..." - Revised: "we used 2 nodes, each with 8×H100 GPUs..." Rationale: This rewording removes redundant language ("each one") and more clearly states that each node has eight H100 GPUs.	2025-01-27 10:45:34 +01:00
Ikko Eltociear Ashimine	7c01b59c44	docs: update README.md (#54) scipts -> scripts Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-26 18:48:07 +01:00
Anton Lozhkov	15df4fb134	vllm speed tweaks (#43)	2025-01-26 01:59:50 +01:00
Agus	d98862a5c8	Add example of generating data with deepseek r1 and distilled models (#29)	2025-01-25 17:34:21 +01:00
Manuel Romero	c27a974b99	Fix typo (#25)	2025-01-25 15:16:56 +01:00
Quentin Gallouédec	f844eac629	Update README.md	2025-01-25 14:49:49 +01:00
Quentin Gallouédec	ff34d8f651	Handle error in verifier + deepspeed command (#17) * handle error in verification * command with zero2 and catch more error in verifier * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * deepseek distill and remove grad chekpoint * drop grad checkpoint * except --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-25 13:58:04 +01:00
lewtun	2580fd8c1b	Fix Slurm SFT and gather Slurm scripts (#19) * Fix slurm * Fix generate * Fix install * Fix c	2025-01-25 13:47:52 +01:00
lewtun	64b4927a33	Update README.md	2025-01-25 13:10:13 +01:00
lewtun	13d8392b78	Fix eval comamnds (#18)	2025-01-25 12:31:40 +01:00
Lewis Tunstall	5ecc11b50a	Scale image	2025-01-25 10:22:38 +00:00
lewtun	7564de2c24	Add diagram (#16)	2025-01-25 11:20:17 +01:00
Loubna Ben Allal	2ceba252a3	Add SFT command to the readme (#15)	2025-01-25 10:56:33 +01:00
Gabriel Martín Blázquez	02bed5308c	Add synthetic data generation script (#9) * Add synthetic data generation script Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com> * Fix format * Fix imports sorting --------- Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com>	2025-01-25 01:42:24 +01:00
Quentin Gallouédec	05496dcdab	System prompt; Fix readme command	2025-01-25 00:21:21 +00:00
Quentin Gallouédec	b47b1d058b	GRPO script (#3) * inital commit * with reward func * fix box extract * example line * don't break when answer malformed * command and logging * holly simplicity * move grpo * reverse readme * instructions	2025-01-25 00:19:38 +01:00
lewtun	ca8f35c143	REFACTOR TO THE MAX (#7)	2025-01-25 00:12:25 +01:00
lewtun	26184f71ae	Refactor evaluation (#6)	2025-01-24 23:46:34 +01:00
Edward Beeching	9c398973e8	Adds Math-500 and AIME24 evals (#4) * adds evals * up max model len --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-01-24 23:09:07 +01:00
Leandro von Werra	52aefc29e2	Update README.md	2025-01-24 22:06:22 +01:00
lewtun	6acc9a0aa0	Add configs and stuff (#2)	2025-01-24 20:05:18 +01:00
lewtun	83f9c6c8da	Initial commit	2025-01-24 16:44:12 +01:00

35 Commits