71 Commits

Author SHA1 Message Date
Dongwei Jiang
736b59f9a3
Update grpo.py (#171) 2025-02-06 00:25:08 +01:00
Edward Beeching
3fd56dc7b4
fix uv env path + details (#188)
* fix uv env path + details

* Update slurm/grpo.slurm

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-05 23:59:25 +01:00
Lewis
138df0ca44
chore(setup.py): bump vllm>=0.7.1 (#181)
See https://github.com/huggingface/trl/pull/2766.
2025-02-05 09:53:31 +01:00
Edward Beeching
5aff57c919
GRPO training args fixes (#177)
* grpo training args fixes

* style

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-02-05 09:09:45 +01:00
Matt
1fc8d425a9
Fix code quality after adding puzzles (#178) 2025-02-04 15:30:25 +00:00
Matt
b24ce903cb Fix quality after adding puzzles 2025-02-04 15:18:23 +00:00
Matt
9ca4ed5a6a Add puzzles 2025-02-04 15:09:21 +00:00
Matt
b2d7ba2f1d Add puzzles 2025-02-04 14:34:57 +00:00
Kashif Rasul
a0d61ccece
use ruff (#137)
* use ruff

* reformat

* re-run

* update deps

* undo

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix help strings

* fix ruff version

* fix formatting

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-31 13:36:08 +01:00
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)

* remove small models

* Add README for recipes

* Add README for recipes

* Attempt to resolve conflicts

* Optimize src scripts

* Update recipe of DeepSeek-R1-Distill-Qwen-7B

* Update recipe of Qwen2.5-1.5B

* Updated recipe readme for qwen

* Update training command for recipes

* Update README.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* Update preprocessing_num_workers from 36 to 8

* Add small language model recipes to quickly verify R1

* Fix src code quality

* Add back the Slurm job command

* Remove recipe of doge

* Fix torch_dtype is not used

* fix grpo yaml

* fix grpo yaml

* fix deprecation warning

* fix config folder location

* Remove duplicate variables in grpo.py

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
Dongwei Jiang
22512e62bc
Update README.md (#132) 2025-01-31 11:27:17 +01:00
Edward Beeching
99d1083b7c
Adds async model push to PushToHubRevisionCallback (#124)
* adds async model push

* style
2025-01-31 09:17:10 +01:00
Quentin Gallouédec
6820d395be
Fix message removal in GRPO script (#131) 2025-01-30 21:51:18 +01:00
Sam Schorb
356f6a5c4f
Add Table of Contents to README for easier navigation (#125)
* Update README.md

* Update README.md
2025-01-30 16:32:13 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts (#112)
* initial grpo.slurm script

* initial zero3 yaml using 1 less gpu

* add completion and prompt length

* initial doc

* use main

* fix typo

* remove num_processes

* use vllm 0.7.0

* remove double module load

* update math-verify

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* overwrite num_procs in the slurm script

* add vllm args to readme

* update readme

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Edward Beeching
972e47eff0
Adds auto eval callbacks (#115)
* adds auto eval callbacks

* updates training scripts with callbacks

* style

* date

* update gitignore with logs, eval results, etc

* remove unused imports

* nits
2025-01-30 09:39:47 +01:00
A Taylor
f7b8e527e8
Fix help text for --retries to match actual default value (#103)
The argparse help text for --retries stated the default was 3, but the actual default set in code is 0. This update corrects the help text to prevent confusion.
2025-01-30 09:38:30 +01:00
Lewis
fb1b4c4e3f
docs(README): note about CUDA 12.1 (#121)
will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1

- fixes #106 
- fixes #117
2025-01-30 08:42:43 +01:00
Edward Beeching
bd0e15bfb5
Update README.md (#93) 2025-01-30 00:42:29 +01:00
Mayur Pagote
7a7682b6a4
Corrected Typos in README.md (#110) 2025-01-30 00:38:47 +01:00
Deborah Shekinah Jacob
971294b018
Modified: pip install --upgrade pip (#99) 2025-01-29 19:55:54 +01:00
María Grandury
401a219575
fix typos (#40)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-29 12:37:11 +01:00
Hynek Kydlíček
e2235cf978
Improve reproduction of r1 reported score (#92)
* bump up deps, fix aime24 evals, make grpo more strict

* minor fixes

* 🤨 fmt

* bump lighteval + set boxed to match first

* remove dead code

* bump lighteval

* add ed's tp branch switch

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
2025-01-29 11:29:05 +01:00
Andrés Marafioti
c4fdb69940
Change conda for uv (#91)
* Change conda for uv

* quentin's magical path
2025-01-28 21:16:48 +01:00
Gabriel Martín Blázquez
c5941ed5e4
Add --timeout, --retries, --prompt-template and TP and PP by slurm variables (#94)
* Set TP and PP using slurm variables

* Add `--timeout` argument

* add `--prompt-template` argument

* Group generations

* Add `--retries` argument
2025-01-28 18:30:04 +01:00
dependabot[bot]
e2e403daff
Bump actions/setup-python from 2 to 5 (#75)
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v2...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-27 15:17:10 +01:00
dependabot[bot]
29015d7e4c
Bump actions/checkout from 2 to 4 (#76)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-27 15:15:07 +01:00
Claudiu
9ab30a1ebe
Added dependabot integration (#70) 2025-01-27 15:11:49 +01:00
Gabriel Martín Blázquez
b03480d868
Add --input-batch-size, --client-replicas args and download Ray logs (#71)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-27 14:58:13 +01:00
Quentin Gallouédec
67764bc6ae
Update README.md (#72) 2025-01-27 13:24:25 +01:00
María Grandury
d6f1a179a5
Implement make evaluate command (#41)
* implement evaluate make command

* add example usage of make evaluate to readme
2025-01-27 10:45:56 +01:00
CharlesCNorton
8d37c5c27f
docs: fix grammar and phrasing issues (1, 2, 3) (#62)
1. Insert missing article "a":
   - Original: "This repo is work in progress..."
   - Revised:  "This repo is a work in progress..."
   Rationale:
     The article "a" is needed before "work in progress" to make the sentence grammatically correct.

2. Add "as well as" for parallelism:
   - Original: "...scripts to train and evaluate models as well generate synthetic data..."
   - Revised:  "...scripts to train and evaluate models as well as generate synthetic data..."
   Rationale:
     "As well as" is the correct conjunction to link multiple verbs or verb phrases, improving clarity.

3. Clarify GPU resource phrasing:
   - Original: "we used 2 nodes of 8xH100 each one..."
   - Revised:  "we used 2 nodes, each with 8×H100 GPUs..."
   Rationale:
     This rewording removes redundant language ("each one") and more clearly states that each node has eight H100 GPUs.
2025-01-27 10:45:34 +01:00
Edward Beeching
feb59d2b42
Update evaluate.slurm
typo in eval slurm
2025-01-27 09:43:02 +01:00
Ikko Eltociear Ashimine
7c01b59c44
docs: update README.md (#54)
scipts -> scripts

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:48:07 +01:00
Hynek Kydlíček
90b0947382
Reward verification and evaluation fixes (#55)
* bump up deps, fix aime24 evals, make grpo more strict

* minor fixes

* 🤨 fmt

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:35:48 +01:00
Anton Lozhkov
15df4fb134
vllm speed tweaks (#43) 2025-01-26 01:59:50 +01:00
Agus
d98862a5c8
Add example of generating data with deepseek r1 and distilled models (#29) 2025-01-25 17:34:21 +01:00
Quentin Gallouédec
e3a2864658 Simplify sft 2025-01-25 15:46:29 +00:00
elie
64c0ed2254
fix evaluate.slurm (#27) 2025-01-25 15:28:49 +01:00
elie
f169d2cd8e
add evaluate.slurm (#26) 2025-01-25 15:23:16 +01:00
Manuel Romero
c27a974b99
Fix typo (#25) 2025-01-25 15:16:56 +01:00
Gabriel Martín Blázquez
a90b99686a
Fix passing vLLM server URL (#21)
* Use head node ip as vLLM server url

* Pass correct server url

* Add num_generations argument

* Fix style

* Remove `select`

---------

Co-authored-by: plaguss <agustin@argilla.io>
2025-01-25 15:01:15 +01:00
Quentin Gallouédec
f844eac629
Update README.md 2025-01-25 14:49:49 +01:00
elie
43cb6a0e0f
fix sft.slurm 2025-01-25 14:48:45 +01:00
Quentin Gallouédec
d0265f94ce
Copyrights (#20)
* handle error in verification

* command with zero2 and catch more error in verifier

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* deepseek distill and remove grad checkpoint

* drop grad checkpoint

* except

* copyrights

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-25 14:41:46 +01:00
Quentin Gallouédec
ff34d8f651
Handle error in verifier + deepspeed command (#17)
* handle error in verification

* command with zero2 and catch more error in verifier

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* deepseek distill and remove grad checkpoint

* drop grad checkpoint

* except

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-25 13:58:04 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts (#19)
* Fix slurm

* Fix generate

* Fix install

* Fix c
2025-01-25 13:47:52 +01:00
lewtun
64b4927a33
Update README.md 2025-01-25 13:10:13 +01:00
lewtun
13d8392b78
Fix eval commands (#18) 2025-01-25 12:31:40 +01:00
Lewis Tunstall
5ecc11b50a Scale image 2025-01-25 10:22:38 +00:00