Dongwei Jiang
736b59f9a3
Update grpo.py (#171)
2025-02-06 00:25:08 +01:00
Edward Beeching
3fd56dc7b4
fix uv env path + details (#188)
* fix uv env path + details
* Update slurm/grpo.slurm
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-05 23:59:25 +01:00
Lewis
138df0ca44
chore(setup.py): bump vllm>=0.7.1 (#181)
See https://github.com/huggingface/trl/pull/2766.
2025-02-05 09:53:31 +01:00
Edward Beeching
5aff57c919
GRPO training args fixes (#177)
* grpo training args fixes
* style
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-02-05 09:09:45 +01:00
Matt
1fc8d425a9
Fix code quality after adding puzzles (#178)
2025-02-04 15:30:25 +00:00
Matt
b24ce903cb
Fix quality after adding puzzles
2025-02-04 15:18:23 +00:00
Matt
9ca4ed5a6a
Add puzzles
2025-02-04 15:09:21 +00:00
Matt
b2d7ba2f1d
Add puzzles
2025-02-04 14:34:57 +00:00
Kashif Rasul
a0d61ccece
use ruff (#137)
* use ruff
* reformat
* re-run
* update deps
* undo
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix help strings
* fix ruff version
* fix formatting
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-31 13:36:08 +01:00
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)
* remove small models
* Add README for recipes
* Add README for recipes
* Attempt to resolve conflicts
* Optimize src scripts
* Update recipe of DeepSeek-R1-Distill-Qwen-7B
* Update recipe of Qwen2.5-1.5B
* Updated recipe readme for qwen
* Update training command for recipes
* Update README.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Update preprocessing_num_workers from 36 to 8
* Add small language model recipes to quickly verify R1
* Fix src code quality
* Add back the Slurm job command
* Remove recipe of doge
* Fix torch_dtype not being used
* fix grpo yaml
* fix grpo yaml
* fix deprecation warning
* fix config folder location
* Remove duplicate variables in grpo.py
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
Dongwei Jiang
22512e62bc
Update README.md (#132)
2025-01-31 11:27:17 +01:00
Edward Beeching
99d1083b7c
Adds async model push to PushToHubRevisionCallback (#124)
* adds async model push
* style
2025-01-31 09:17:10 +01:00
Quentin Gallouédec
6820d395be
Fix message removal in GRPO script (#131)
2025-01-30 21:51:18 +01:00
Sam Schorb
356f6a5c4f
Add Table of Contents to README for easier navigation (#125)
* Update README.md
* Update README.md
2025-01-30 16:32:13 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts (#112)
* initial grpo.slurm script
* initial zero3 yaml using 1 less gpu
* add completion and prompt length
* initial doc
* use main
* fix typo
* remove num_processes
* use vllm 0.7.0
* remove double module load
* update math-verify
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* overwrite num_procs in the slurm script
* add vllm args to readme
* update readme
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Edward Beeching
972e47eff0
Adds auto eval callbacks (#115)
* adds auto eval callbacks
* updates training scripts with callbacks
* style
* date
* update gitignore with logs, eval results, etc
* remove unused imports
* nits
2025-01-30 09:39:47 +01:00
A Taylor
f7b8e527e8
Fix help text for --retries to match actual default value (#103)
The argparse help text for --retries stated the default was 3, but the actual default set in code is 0. This update corrects the help text to prevent confusion.
2025-01-30 09:38:30 +01:00
Lewis
fb1b4c4e3f
docs(README): note about CUDA 12.1 (#121)
will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1
- fixes #106
- fixes #117
2025-01-30 08:42:43 +01:00
Edward Beeching
bd0e15bfb5
Update README.md (#93)
2025-01-30 00:42:29 +01:00
Mayur Pagote
7a7682b6a4
Corrected Typos in README.md (#110)
2025-01-30 00:38:47 +01:00
Deborah Shekinah Jacob
971294b018
Modified: pip install --upgrade pip (#99)
2025-01-29 19:55:54 +01:00
María Grandury
401a219575
fix typos (#40)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-29 12:37:11 +01:00
Hynek Kydlíček
e2235cf978
Improve reproduction of r1 reported score (#92)
* bump up deps, fix aime24 evals, make grpo more strict
* minor fixes
* 🤨 fmt
* bump lighteval + set boxed to match first
* remove dead code
* bump lighteval
* add ed's tp branch switch
---------
Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
2025-01-29 11:29:05 +01:00
Andrés Marafioti
c4fdb69940
Change conda for uv (#91)
* Change conda for uv
* quentin's magical path
2025-01-28 21:16:48 +01:00
Gabriel Martín Blázquez
c5941ed5e4
Add --timeout
, --retries
, --prompt-template
and TP and PP by slurm variables ( #94 )
...
* Set TP and PP using slurm variables
* Add `--timeout` argument
* add `--prompt-template` argument
* Group generations
* Add `--retries` argument
2025-01-28 18:30:04 +01:00
dependabot[bot]
e2e403daff
Bump actions/setup-python from 2 to 5 (#75)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v2...v5)
---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-27 15:17:10 +01:00
dependabot[bot]
29015d7e4c
Bump actions/checkout from 2 to 4 (#76)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v4)
---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-27 15:15:07 +01:00
Claudiu
9ab30a1ebe
Added dependabot integration (#70)
2025-01-27 15:11:49 +01:00
Gabriel Martín Blázquez
b03480d868
Add `--input-batch-size`, `--client-replicas` args and download Ray logs (#71)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-27 14:58:13 +01:00
Quentin Gallouédec
67764bc6ae
Update README.md (#72)
2025-01-27 13:24:25 +01:00
María Grandury
d6f1a179a5
Implement make evaluate command (#41)
* implement evaluate make command
* add example usage of make evaluate to readme
2025-01-27 10:45:56 +01:00
CharlesCNorton
8d37c5c27f
docs: fix grammar and phrasing issues (1, 2, 3) (#62)
1. Insert missing article "a":
- Original: "This repo is work in progress..."
- Revised: "This repo is a work in progress..."
Rationale:
The article "a" is needed before "work in progress" to make the sentence grammatically correct.
2. Add "as well as" for parallelism:
- Original: "...scripts to train and evaluate models as well generate synthetic data..."
- Revised: "...scripts to train and evaluate models as well as generate synthetic data..."
Rationale:
"As well as" is the correct conjunction to link multiple verbs or verb phrases, improving clarity.
3. Clarify GPU resource phrasing:
- Original: "we used 2 nodes of 8xH100 each one..."
- Revised: "we used 2 nodes, each with 8×H100 GPUs..."
Rationale:
This rewording removes redundant language ("each one") and more clearly states that each node has eight H100 GPUs.
2025-01-27 10:45:34 +01:00
Edward Beeching
feb59d2b42
Update evaluate.slurm
typo in eval slurm
2025-01-27 09:43:02 +01:00
Ikko Eltociear Ashimine
7c01b59c44
docs: update README.md (#54)
scipts -> scripts
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:48:07 +01:00
Hynek Kydlíček
90b0947382
Reward verification and evaluation fixes (#55)
* bump up deps, fix aime24 evals, make grpo more strict
* minor fixes
* 🤨 fmt
* Update src/open_r1/grpo.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:35:48 +01:00
Anton Lozhkov
15df4fb134
vllm speed tweaks (#43)
2025-01-26 01:59:50 +01:00
Agus
d98862a5c8
Add example of generating data with deepseek r1 and distilled models (#29)
2025-01-25 17:34:21 +01:00
Quentin Gallouédec
e3a2864658
Simplify sft
2025-01-25 15:46:29 +00:00
elie
64c0ed2254
fix evaluate.slurm (#27)
2025-01-25 15:28:49 +01:00
elie
f169d2cd8e
add evaluate.slurm (#26)
2025-01-25 15:23:16 +01:00
Manuel Romero
c27a974b99
Fix typo (#25)
2025-01-25 15:16:56 +01:00
Gabriel Martín Blázquez
a90b99686a
Fix passing vLLM server URL (#21)
* Use head node ip as vLLM server url
* Pass correct server url
* Add num_generations argument
* Fix style
* Remove `select`
---------
Co-authored-by: plaguss <agustin@argilla.io>
2025-01-25 15:01:15 +01:00
Quentin Gallouédec
f844eac629
Update README.md
2025-01-25 14:49:49 +01:00
elie
43cb6a0e0f
fix sft.slurm
2025-01-25 14:48:45 +01:00
Quentin Gallouédec
d0265f94ce
Copyrights (#20)
* handle error in verification
* command with zero2 and catch more error in verifier
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* deepseek distill and remove grad checkpoint
* drop grad checkpoint
* except
* copyrights
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-25 14:41:46 +01:00
Quentin Gallouédec
ff34d8f651
Handle error in verifier + deepspeed command (#17)
* handle error in verification
* command with zero2 and catch more error in verifier
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* deepseek distill and remove grad checkpoint
* drop grad checkpoint
* except
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-25 13:58:04 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts (#19)
* Fix slurm
* Fix generate
* Fix install
* Fix c
2025-01-25 13:47:52 +01:00
lewtun
64b4927a33
Update README.md
2025-01-25 13:10:13 +01:00
lewtun
13d8392b78
Fix eval commands (#18)
2025-01-25 12:31:40 +01:00
Lewis Tunstall
5ecc11b50a
Scale image
2025-01-25 10:22:38 +00:00