35 Commits

Author SHA1 Message Date
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)

* remove small models

* Add README for recipes

* Add README for recipes

* Attempt to resolve conflicts

* Optimize src scripts

* Update recipe of DeepSeek-R1-Distill-Qwen-7B

* Update recipe of Qwen2.5-1.5B

* Updated recipe readme for qwen

* Update training command for recipes

* Update README.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* Update preprocessing_num_workers from 36 to 8

* Add small language model recipes to quickly verify R1

* Fix src code quality

* Add back the Slurm job command

* Remove recipe of doge

* Fix torch_dtype is not used

* fix grpo yaml

* fix grpo yaml

* fix deprecation warning

* fix config folder location

* Remove duplicate variables in grpo.py

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
Dongwei Jiang
22512e62bc
Update README.md (#132) 2025-01-31 11:27:17 +01:00
Sam Schorb
356f6a5c4f
Add Table of Contents to README for easier navigation (#125)
* Update README.md

* Update README.md
2025-01-30 16:32:13 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts (#112)
* initial grpo.slurm script

* initial zero3 yaml using 1 less gpu

* add completion and prompt length

* initial doc

* use main

* fix typo

* remove num_processes

* use vllm 0.7.0

* remove double module load

* update math-verify

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* overwrite num_procs in the slurm script

* add vllm args to readme

* update readme

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Lewis
fb1b4c4e3f
docs(README): note about CUDA 12.1 (#121)
may segfault on CUDA versions other than 12.1 under certain conditions; instructions are specific to 12.1

- fixes #106 
- fixes #117
2025-01-30 08:42:43 +01:00
Edward Beeching
bd0e15bfb5
Update README.md (#93) 2025-01-30 00:42:29 +01:00
Mayur Pagote
7a7682b6a4
Corrected Typos in README.md (#110) 2025-01-30 00:38:47 +01:00
Deborah Shekinah Jacob
971294b018
Modified: pip install --upgrade pip (#99) 2025-01-29 19:55:54 +01:00
María Grandury
401a219575
fix typos (#40)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-29 12:37:11 +01:00
Andrés Marafioti
c4fdb69940
Change conda for uv (#91)
* Change conda for uv

* quentin's magical path
2025-01-28 21:16:48 +01:00
Gabriel Martín Blázquez
b03480d868
Add --input-batch-size, --client-replicas args and download Ray logs (#71)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-27 14:58:13 +01:00
Quentin Gallouédec
67764bc6ae
Update README.md (#72) 2025-01-27 13:24:25 +01:00
María Grandury
d6f1a179a5
Implement make evaluate command (#41)
* implement evaluate make command

* add example usage of make evaluate to readme
2025-01-27 10:45:56 +01:00
CharlesCNorton
8d37c5c27f
docs: fix grammar and phrasing issues (1, 2, 3) (#62)
1. Insert missing article "a":
   - Original: "This repo is work in progress..."
   - Revised:  "This repo is a work in progress..."
   Rationale:
     The article "a" is needed before "work in progress" to make the sentence grammatically correct.

2. Add "as well as" for parallelism:
   - Original: "...scripts to train and evaluate models as well generate synthetic data..."
   - Revised:  "...scripts to train and evaluate models as well as generate synthetic data..."
   Rationale:
     "As well as" is the correct conjunction to link multiple verbs or verb phrases, improving clarity.

3. Clarify GPU resource phrasing:
   - Original: "we used 2 nodes of 8xH100 each one..."
   - Revised:  "we used 2 nodes, each with 8×H100 GPUs..."
   Rationale:
     This rewording removes redundant language ("each one") and more clearly states that each node has eight H100 GPUs.
2025-01-27 10:45:34 +01:00
Ikko Eltociear Ashimine
7c01b59c44
docs: update README.md (#54)
scipts -> scripts

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:48:07 +01:00
Anton Lozhkov
15df4fb134
vllm speed tweaks (#43) 2025-01-26 01:59:50 +01:00
Agus
d98862a5c8
Add example of generating data with deepseek r1 and distilled models (#29) 2025-01-25 17:34:21 +01:00
Manuel Romero
c27a974b99
Fix typo (#25) 2025-01-25 15:16:56 +01:00
Quentin Gallouédec
f844eac629
Update README.md 2025-01-25 14:49:49 +01:00
Quentin Gallouédec
ff34d8f651
Handle error in verifier + deepspeed command (#17)
* handle error in verification

* command with zero2 and catch more error in verifier

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* deepseek distill and remove grad checkpoint

* drop grad checkpoint

* except

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-25 13:58:04 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts (#19)
* Fix slurm

* Fix generate

* Fix install

* Fix c
2025-01-25 13:47:52 +01:00
lewtun
64b4927a33
Update README.md 2025-01-25 13:10:13 +01:00
lewtun
13d8392b78
Fix eval commands (#18) 2025-01-25 12:31:40 +01:00
Lewis Tunstall
5ecc11b50a
Scale image 2025-01-25 10:22:38 +00:00
lewtun
7564de2c24
Add diagram (#16) 2025-01-25 11:20:17 +01:00
Loubna Ben Allal
2ceba252a3
Add SFT command to the readme (#15) 2025-01-25 10:56:33 +01:00
Gabriel Martín Blázquez
02bed5308c
Add synthetic data generation script (#9)
* Add synthetic data generation script

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>

* Fix format

* Fix imports sorting

---------

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>
2025-01-25 01:42:24 +01:00
Quentin Gallouédec
05496dcdab
System prompt; Fix readme command 2025-01-25 00:21:21 +00:00
Quentin Gallouédec
b47b1d058b
GRPO script (#3)
* initial commit

* with reward func

* fix box extract

* example line

* don't break when answer malformed

* command and logging

* holy simplicity

* move grpo

* reverse readme

* instructions
2025-01-25 00:19:38 +01:00
lewtun
ca8f35c143
REFACTOR TO THE MAX (#7) 2025-01-25 00:12:25 +01:00
lewtun
26184f71ae
Refactor evaluation (#6) 2025-01-24 23:46:34 +01:00
Edward Beeching
9c398973e8
Adds Math-500 and AIME24 evals (#4)
* adds evals

* up max model len

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-01-24 23:09:07 +01:00
Leandro von Werra
52aefc29e2
Update README.md 2025-01-24 22:06:22 +01:00
lewtun
6acc9a0aa0
Add configs and stuff (#2) 2025-01-24 20:05:18 +01:00
lewtun
83f9c6c8da
Initial commit 2025-01-24 16:44:12 +01:00