Syn-2.6M

Summary

Task: Text-Generation
Total training time: 1.8 hours
Inputs: text
Outputs: text
Params: 2,604,210
Final Loss: 2.37
Important Benchmark Scores:
   1. ARC Easy - 32.11%
   2. BLiMP - 65.33%
   3. HellaSwag - 27.03%
Framework: PyTorch, transformers
Authors: Paul Courneya, Jonathan LY

Description

Syn is a Tiny Language Model (TLM) trained on 2.7 billion tokens of synthetic data. The name Syn is an abbreviation of synthetic, reflecting the type of data the model was trained on.

Model Details

Architecture: Qwen3.5
Hidden Size: 140
Number of Layers: 9
Intermediate Size: 392 (a 2.8x expansion)
Number of Attention Heads: 4
Number of KV Heads: 1
Head Dim: 35
Vocab Size: 3584
Max Position Embeddings: 640
Total Parameters: 2,604,210
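
The figures in this list can be cross-checked against each other. The snippet below is a minimal sketch that uses only the values listed above; the embedding estimate assumes a single token-embedding matrix of shape vocab size by hidden size, which this card does not state explicitly.

```python
# Values copied from the Model Details list.
hidden_size = 140
intermediate_size = 392
num_attention_heads = 4
num_kv_heads = 1
head_dim = 35
vocab_size = 3584
total_params = 2_604_210

# Attention width: heads * head_dim matches the hidden size for this config.
attn_width = num_attention_heads * head_dim

# MLP expansion ratio, quoted as 2.8x in the list.
expansion = intermediate_size / hidden_size

# Number of query heads that share each KV head.
queries_per_kv = num_attention_heads // num_kv_heads

# ASSUMPTION: one token-embedding matrix of shape (vocab_size, hidden_size).
embedding_params = vocab_size * hidden_size
embedding_share = embedding_params / total_params

print(f"attention width      : {attn_width}")
print(f"MLP expansion        : {expansion:.1f}x")
print(f"query heads per KV   : {queries_per_kv}")
print(f"embedding parameters : {embedding_params:,} ({embedding_share:.1%} of total)")
```

Under that assumption, the embedding table alone accounts for roughly a fifth of the parameter budget, which is typical for models this small.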

Training

Training Details

Maximum Learning Rate: 3e-3
Minimum Learning Rate: 0
Number of Epochs: 1
Sequence Length: 256 (yes, we accidentally trained at 256 instead of 640)
Batch Size: 428
Eval Split Ratio: 0.006
Gradient Accumulation Steps: 2
Gradient Checkpointing: True
Gradient Clipping: 1.0
Torch Compile: True
Torch Compile Mode: max-autotune-no-cudagraphs
AdamW Betas: (0.9, 0.95)
WSD Warmup Ratio: 0.02
WSD Stable Ratio: 0.78
WSD Decay Ratio: 0.20
DType: float16
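
The three WSD (warmup-stable-decay) ratios above split training into a warmup phase, a constant phase at the maximum learning rate, and a decay phase down to the minimum. The function below is a minimal sketch of such a schedule, not the training code used for this model: the linear warmup and linear decay shapes, the function name `wsd_lr`, and the `total_steps` argument are all assumptions made for illustration.

```python
def wsd_lr(step, total_steps, max_lr=3e-3, min_lr=0.0,
           warmup_ratio=0.02, stable_ratio=0.78, decay_ratio=0.20):
    """Learning rate at a 0-indexed `step` under a warmup-stable-decay schedule."""
    # The three phases are expected to cover the whole run.
    assert abs(warmup_ratio + stable_ratio + decay_ratio - 1.0) < 1e-9

    warmup_steps = round(total_steps * warmup_ratio)
    stable_steps = round(total_steps * stable_ratio)
    decay_steps = max(total_steps - warmup_steps - stable_steps, 1)

    if step < warmup_steps:
        # ASSUMPTION: linear warmup that reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the maximum learning rate.
        return max_lr
    # ASSUMPTION: linear decay that reaches min_lr on the final step.
    done = step - warmup_steps - stable_steps + 1
    progress = min(done / decay_steps, 1.0)
    return max_lr + (min_lr - max_lr) * progress


if __name__ == "__main__":
    total = 1000  # hypothetical step count, chosen only for the printout
    for s in (0, 19, 20, 500, 799, 800, 999):
        print(f"step {s:4d}: lr = {wsd_lr(s, total):.6f}")
```

With a hypothetical 1,000-step run, the phases would cover 20, 780, and 200 steps respectively.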

Dataset

Dataset	Bytes	Size	Share
FinePhrase	8,000,000,000	8.000 GB	61.61%
Tiny-Strange-Textbooks	4,000,000,000	4.000 GB	30.81%
TinyStoriesv2	700,000,000	0.700 GB	5.39%
LongPage	284,000,000	0.284 GB	2.19%

Note: the byte counts are rounded.
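
The Share column follows directly from the byte counts. The sketch below recomputes it and, as a rough sanity check, estimates the average bytes per token using the 2.7 billion token figure from the Description. Both inputs are rounded, and the estimate assumes the whole mixture was tokenized for the single epoch, so treat the result as approximate.

```python
# Byte counts copied from the dataset table (rounded, 1 GB = 10**9 bytes).
datasets = {
    "FinePhrase": 8_000_000_000,
    "Tiny-Strange-Textbooks": 4_000_000_000,
    "TinyStoriesv2": 700_000_000,
    "LongPage": 284_000_000,
}

total_bytes = sum(datasets.values())

# Percentage share of each dataset in the mixture.
shares = {name: 100 * size / total_bytes for name, size in datasets.items()}

for name, size in datasets.items():
    print(f"{name:<24} {size / 1e9:6.3f} GB  {shares[name]:5.2f}%")
print(f"{'Total':<24} {total_bytes / 1e9:6.3f} GB")

# ASSUMPTION: the 2.7B tokens quoted in the Description cover this whole mixture.
approx_tokens = 2_700_000_000
bytes_per_token = total_bytes / approx_tokens
print(f"Approx. bytes per token: {bytes_per_token:.2f}")
```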

Final Eval and Train Loss

Train: 2.37
Val: 2.357

Hardware

GPU: NVIDIA RTX 2060 (used for training)
CPU: AMD Ryzen 5 2600 (used for tokenization)

Benchmarks

Task	Value
BLiMP	65.33%
ARC Easy	32.11%
ARC Challenge	20.39%
HellaSwag	27.03%
SWAG	33.38%
PiQA	53.48%
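
To put these scores in context, it can help to compare them with random guessing. The sketch below subtracts a nominal chance level from each score. The chance levels are assumptions based on the usual number of answer options per task (two for BLiMP and PiQA, four for the others; ARC has a few questions with three or five options), not figures reported by the authors.

```python
# Scores copied from the benchmark table (percent).
scores = {
    "BLiMP": 65.33,
    "ARC Easy": 32.11,
    "ARC Challenge": 20.39,
    "HellaSwag": 27.03,
    "SWAG": 33.38,
    "PiQA": 53.48,
}

# ASSUMPTION: nominal random-guess accuracy per task (percent).
chance = {
    "BLiMP": 50.0,
    "ARC Easy": 25.0,
    "ARC Challenge": 25.0,
    "HellaSwag": 25.0,
    "SWAG": 25.0,
    "PiQA": 50.0,
}

# Margin over chance in percentage points; negative means below chance.
margins = {task: round(scores[task] - chance[task], 2) for task in scores}

for task, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:<14} {scores[task]:6.2f}%  ({margin:+.2f} pts vs. chance)")
```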

ArithMark-2.0:

Ops = 1	Ops = 2	Ops = 3	Avg
25.04%	30.13%	24.60%	26.48%

For a comparison with other small language models like this one, go here.

Generation Sample

Prompt : 'Artificial intelligence is'
------------------------------------------------------------
Generated:
 a form of artificial intelligence that involves creating an attractive and reliable source of information. This involves using advanced technology to create interactive, user-friendly platforms for users who are looking to use them in their daily lives.
## II. Why Use Applications?
To enhance the user experience, it's important to have the opportunity to learn how to write and understand your content properly. To ensure that you are using new tools, take some time to read, then let go of yourself or others through conversations with the user as they look at them. Additionally, by practicing self-assessment, users can gain more control over their own ideas.
### IV. Conclusion
In this lesson, we learned about the different types of apps used for frames, their importance, and its applications. We also discussed the practical aspects of the application and practical applications of Frameworks. By understanding these concepts, readers can apply these skills to various scenarios in their careers.

Use Cases

Educational work and research
Fine-tuning for downstream use
Deployment on edge devices
Or just for fun.

Limitations

Cannot chat, reason, code, or answer questions
Output is almost always factually incorrect
No long-context handling (trained at a sequence length of 256)

License

Before using, distributing, selling, or modifying this software, you must read the license here.

Inference

#!/usr/bin/env python3

from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, PreTrainedTokenizerFast

# Hugging Face repo ID (or local directory) for the model and tokenizer.
MODEL_DIR = "fromziro/Syn-2.6M"
TOKENIZER_PATH = MODEL_DIR

# Default prompt and sampling settings.
PROMPT = "Artificial intelligence is"
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.7
TOP_P = 0.95
TOP_K = 30
REPETITION_PENALTY = 1.2
DO_SAMPLE = True

device = (
    "cuda" if torch.cuda.is_available() else
    "mps" if torch.backends.mps.is_available() else
    "cpu"
)
print(f"Device : {device}")

def load_tokenizer(path_or_repo: str):
    p = Path(path_or_repo)

    if p.exists() and p.is_file() and p.suffix.lower() == ".json":
        tok = PreTrainedTokenizerFast(tokenizer_file=str(p.resolve()))
    else:
        tok = AutoTokenizer.from_pretrained(path_or_repo, use_fast=True)

    if tok.bos_token is None:
        tok.add_special_tokens({"bos_token": "<|bos|>"})
    if tok.eos_token is None:
        tok.add_special_tokens({"eos_token": "<|eos|>"})
    if tok.unk_token is None:
        tok.add_special_tokens({"unk_token": "<|unk|>"})
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token if tok.eos_token is not None else "<|pad|>"

    tok.padding_side = "left"
    return tok

print("Loading tokenizer...")
tokenizer = load_tokenizer(TOKENIZER_PATH)
print(f"  Vocab size : {len(tokenizer)}")
print(f"  BOS        : {tokenizer.bos_token!r}")
print(f"  EOS        : {tokenizer.eos_token!r}")
print(f"  PAD        : {tokenizer.pad_token!r}  (id={tokenizer.pad_token_id})")

print(f"\nLoading model from {MODEL_DIR} ...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    low_cpu_mem_usage=True,
)

model.eval()
model.to(device)
model.config.use_cache = False
if hasattr(model, "generation_config") and model.generation_config is not None:
    model.generation_config.use_cache = False

total_params = sum(p.numel() for p in model.parameters())
print(f"  Parameters : {total_params:,}")

def generate(
    prompt: str = PROMPT,
    max_new_tokens: int = MAX_NEW_TOKENS,
    temperature: float = TEMPERATURE,
    top_p: float = TOP_P,
    top_k: int = TOP_K,
    repetition_penalty: float = REPETITION_PENALTY,
    do_sample: bool = DO_SAMPLE,
) -> str:
    bos = tokenizer.bos_token or ""
    full_prompt = bos + prompt

    inputs = tokenizer(
        full_prompt,
        return_tensors="pt",
        add_special_tokens=False,
    ).to(device)

    inputs.pop("token_type_ids", None)

    gen_kwargs = dict(
        max_new_tokens=max_new_tokens,
        do_sample=do_sample,
        repetition_penalty=repetition_penalty,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        use_cache=False,
    )

    if do_sample:
        gen_kwargs["temperature"] = temperature
        gen_kwargs["top_p"] = top_p
        gen_kwargs["top_k"] = top_k

    with torch.inference_mode():
        output_ids = model.generate(**inputs, **gen_kwargs)

    prompt_len = inputs["input_ids"].shape[-1]
    new_ids = output_ids[0][prompt_len:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)

if __name__ == "__main__":
    print(f"\nPrompt : {PROMPT!r}")
    print("-" * 60)
    output = generate(PROMPT)
    print("Generated:")
    print(output)

Copyright

Copyright (c) 2026 FromZero  
Copyright (c) 2026 Paul Courneya
Copyright (c) 2026 Jonathan LY

Citation

@misc{syn2.6m,
  title     = {Syn-2.6M: A Tiny Language Model trained on 2.7B tokens of Synthetic Data},
  author    = {FromZero},
  year      = {2026},
  url       = {https://huggingface.co/fromziro/Syn-2.6M}
}
