- lllyx/Qwen3-1.7B-SFT
  Text Generation • 2B • Updated • 700 • 4
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Paper • 2604.13016 • Published • 109
- lllyx/Qwen3-4B-Base-GRPO
  Text Generation • 4B • Updated • 411 • 3
- lllyx/OpenThought3-Qwen3-4B
  Viewer • Updated • 305k • 184 • 2
Collections
Collections including paper arxiv:2604.13016
- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 61
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 45
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 64

- Code as Agent Harness
  Paper • 2605.18747 • Published • 213
- SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
  Paper • 2605.12500 • Published • 191
- From Context to Skills: Can Language Models Learn from Context Skillfully?
  Paper • 2604.27660 • Published • 166
- PhysBrain 1.0 Technical Report
  Paper • 2605.15298 • Published • 143

- Visual Spatial Tuning
  Paper • 2511.05491 • Published • 53
- Adam's Law: Textual Frequency Law on Large Language Models
  Paper • 2604.02176 • Published • 506
- Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
  Paper • 2604.10098 • Published • 82
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Paper • 2604.13016 • Published • 109