Paper: DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation (arXiv:2511.06307)
A complete training pipeline that fine-tunes a code LLM to translate VB6 to C#, using supervised fine-tuning (SFT) followed by GRPO.
Qwen2.5-Coder-7B-Instruct
↓
SFT (360 examples, 3 epochs)
↓
simooo21/vb6-to-cs-qwen2.5-coder-7b-sft
↓
GRPO (reward: syntax + format + length, 2 epochs)
↓
simooo21/vb6-to-cs-qwen2.5-coder-7b-grpo
Dataset: simooo21/vb6-to-csharp-translation (chat format: [{"role": "user"}, {"role": "assistant"}])

| Category | Conversion | Key Mappings |
|---|---|---|
| variables | Dim → type declaration | Dim x As Integer → int x; |
| control_flow | If/Select Case → if/switch | Select Case → switch |
| loops | For/Do/For Each → for/while/foreach | For Each → foreach |
| functions | Sub/Function → methods | ByRef → ref, Optional → default params |
| arrays | 1-based → 0-based arrays | Dim arr(1 To 5) → int[] arr = new int[5] |
| strings | VB6 string funcs → C# methods | UCase/Trim/Len → ToUpper/Trim/Length |
| file_io | Open/Close → StreamReader/Writer | FreeFile → using statement |
| error_handling | On Error → try/catch | On Error GoTo → try { } catch |
| gui_events | Event subs → event handlers | cmd_Click() → cmd_Click(object, EventArgs) |
| gui_controls | VB6 controls → WinForms | ComboBox.AddItem → Items.Add |
| form_designer | .frm code → C# InitializeComponent | VB6 form layout → C# programmatic UI |
| database | ADO → SqlClient | ADODB.Connection → SqlConnection |
| api_calls | Declare → DllImport | Declare Function → [DllImport] |
| classes | VB6 class → C# class | Property Get/Let → C# properties |
| structs | Type → struct | Public Type → public struct |
| collections | Collection → Dictionary | Scripting.Dictionary → Dictionary<K,V> |
| regex | Like → Regex | Like "[A-Z]###" → Regex.IsMatch |
| enums | VB6 Enum → C# enum | Direct mapping with values |
| events | RaiseEvent → event invocation | RaiseEvent → ?.Invoke |
| interfaces | Implements → : interface | Implements INotifyPropertyChanged → : INotifyPropertyChanged |
| modules | Module-level → static class | Public Const → public const |
| dialogs | MsgBox/InputBox → MessageBox | vbModal → ShowDialog() |
| async | DoEvents → async/await | DoEvents → Application.DoEvents() / await |
| datetime | Date functions → DateTime | Now/DateAdd/DateDiff → DateTime.Now/AddDays |
| math | Rnd → Random | Rnd → Random.Next |
| formatting | Format → ToString | FormatCurrency → .ToString("C2") |
| conversions | CStr/CInt → Convert | CStr/CInt/CDate → Convert.ToString/Int32/DateTime |
| type_checks | IsNumeric → TryParse | IsNumeric → int.TryParse |
| operators | IIf → ternary | IIf(condition, a, b) → condition ? a : b |
| networking | Winsock → TcpClient | Winsock → TcpClient/NetworkStream |
| com_interop | CreateObject → Activator | CreateObject("Word.Application") → Activator.CreateInstance |
| printing | Printer → PrintDocument | Printer.Print → e.Graphics.DrawString |
| graphics | LoadPicture → Image.FromFile | Direct mapping |
| system | SendKeys → SendKeys.SendWait | Direct mapping |
| application | App.Path → Assembly | App.Path → Application.ExecutablePath |
| entry_point | Sub Main → static void Main | With [STAThread] attribute |
| null_checks | IsNull/Nothing → null/DBNull | Is Nothing → == null |
| access_modifiers | Friend → internal | Friend → internal |
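
To make the chat format concrete, here is a minimal sketch of what one record might look like, built from the `variables` row of the table. The `messages` and `category` field names, the prompt wording, and the `make_example` helper are assumptions for illustration; the actual dataset schema may differ.

```python
import json

FENCE = "`" * 3  # triple backtick, built programmatically to keep this listing clean


def make_example(vb6_code: str, csharp_code: str, category: str) -> dict:
    """Build one chat-format example (user turn = VB6, assistant turn = C#).

    Hypothetical helper: field names and prompt text are assumed, not taken
    from the published dataset.
    """
    return {
        "category": category,
        "messages": [
            {"role": "user", "content": f"Translate this VB6 code to C#:\n{vb6_code}"},
            {"role": "assistant", "content": f"{FENCE}csharp\n{csharp_code}\n{FENCE}"},
        ],
    }


example = make_example("Dim x As Integer", "int x;", "variables")
print(json.dumps(example, indent=2))
```

Wrapping the assistant answer in a `csharp` fence mirrors what the GRPO format reward checks for later in the pipeline.
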
Base model: Qwen/Qwen2.5-Coder-7B-Instruct
python train_sft.py
Hyperparameters (from OpenCodeInstruct):
| Parameter | Value | Rationale |
|---|---|---|
| lr | 5e-6 | Low LR for stable convergence on small dataset |
| epochs | 3 | Full coverage of 360 examples |
| batch | 1 × 8 accum | Effective batch 8 per GPU |
| seq_len | 1024 | All examples fit (max 570 tokens) |
| warmup | 50 steps | Smooth start |
| scheduler | cosine (default) | Proven for code tasks |
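
The table implies a fairly short run, which is easy to check with a little arithmetic. The sketch below assumes a single GPU and that all 360 examples are used for training (no held-out split); neither is stated in this card.

```python
import math

# Values from the SFT hyperparameter table.
num_examples = 360
epochs = 3
per_device_batch = 1
grad_accum = 8
warmup_steps = 50

# Assumption: one GPU. Multiply by the device count for multi-GPU runs.
num_gpus = 1

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(f"effective batch size:      {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps:     {total_steps}")
print(f"warmup share of training:  {warmup_fraction:.0%}")
```

Under these assumptions the 50 warmup steps cover more than a third of the run, so the peak learning rate of 5e-6 is only reached shortly after the first epoch ends.
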
Base model: simooo21/vb6-to-cs-qwen2.5-coder-7b-sft
python train_grpo.py
Hyperparameters (from DRIVE / DeepSeekMath):
| Parameter | Value | Rationale |
|---|---|---|
| lr | 1e-6 | Very low for RL stability |
| epochs | 2 | Don't overfit on small data |
| num_generations | 8 | Group size per prompt |
| max_completion | 2048 | Long enough for code |
| rewards | syntax + format + length | Multi-objective optimization |
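
The `num_generations` row sets the group size GRPO uses to score each completion relative to its siblings. As a rough sketch of the group-relative advantage described in DeepSeekMath, each reward is centered on the group mean and scaled by the group standard deviation. The reward values below are made up, summing the three reward terms with equal weight is an assumption, and so is the use of the population standard deviation with a small `eps`.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of one prompt's completions against their own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; libraries may differ
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical total rewards (syntax + format + length, each 0-1) for 8 completions.
rewards = [2.4, 1.0, 3.0, 2.4, 0.5, 2.9, 1.8, 2.0]
advantages = group_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.2f}")
```

If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason to combine several reward terms.
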
syntax_reward (0-1): Checks C# syntax patterns
format_reward (0-1): Checks for ```csharp code blocks
length_reward (0-1): Rewards reasonable code length (3-100 lines)
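
The three rewards could be sketched as simple heuristics like the ones below. This is an illustration only: the specific checks (balanced braces, leftover VB6 keywords, and so on), the `extract_csharp` helper, and the all-or-nothing length score are assumptions, not the contents of `train_grpo.py`. The functions also score a single string, so they would need wrapping if the trainer expects batch-level reward functions.

```python
import re
from typing import Optional

FENCE = "`" * 3  # triple backtick, built programmatically to keep this listing clean


def extract_csharp(completion: str) -> Optional[str]:
    """Return the body of the first csharp-fenced block, or None if there is none."""
    match = re.search(FENCE + r"csharp\s*\n(.*?)" + FENCE, completion, re.DOTALL)
    return match.group(1) if match else None


def format_reward(completion: str) -> float:
    """1.0 if the answer contains a csharp-fenced block, else 0.0."""
    return 1.0 if extract_csharp(completion) is not None else 0.0


def syntax_reward(completion: str) -> float:
    """Fraction of cheap pattern checks that pass (not a real C# parser)."""
    code = extract_csharp(completion) or completion
    checks = [
        code.count("{") == code.count("}"),  # balanced braces
        code.count("(") == code.count(")"),  # balanced parentheses
        ";" in code,                         # at least one statement terminator
        not re.search(r"\b(Dim|End Sub|End Function|End If)\b", code),  # no VB6 leftovers
    ]
    return sum(checks) / len(checks)


def length_reward(completion: str, min_lines: int = 3, max_lines: int = 100) -> float:
    """1.0 if the code has between min_lines and max_lines non-empty lines, else 0.0."""
    code = extract_csharp(completion) or completion
    n = len([line for line in code.splitlines() if line.strip()])
    return 1.0 if min_lines <= n <= max_lines else 0.0


sample = FENCE + "csharp\nint x = 0;\nif (x == 0)\n{\n    x++;\n}\n" + FENCE
total = syntax_reward(sample) + format_reward(sample) + length_reward(sample)
print(f"total reward: {total}")
```

Pattern checks like these are cheap enough to run on every generation, but they cannot confirm that the C# compiles or preserves the VB6 behavior.
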
accelerate launch
python evaluate_model.py --model simooo21/vb6-to-cs-qwen2.5-coder-7b-grpo
Reports:
Per the literature review: