Hugging Face
In a Training Loop
131.8
TFLOPS
NullSense
0 followers · 26 following
NullSense
AI & ML interests
None yet
Recent Activity
liked a model 9 days ago
Comfy-Org/Krea-2
reacted to lbourdois's post with 🔥 10 days ago
We introduce FAT5 (Flash Attention T5) ⚡ An implementation of T5 in PyTorch with the UL2 objective, optimized for GPGPU for both training and inference thanks to 13 different optimizations. The main one is a CUDA kernel we designed to extend Flash Attention by @tridao with RPE biases; it also supports other positional encodings such as RoPE, ALiBi or FIRE. The resulting kernel is 2 times faster than an SDPA implementation. We also use Triton kernels to optimize certain parts of the architecture, such as the cross-entropy and RMSNorm layers. The various kernels have been carefully built to be compatible with BF16 and torch.compile to go even faster and achieve efficient pretraining. All other optimizations are described in a blog post available on @huggingface 🤗: https://huggingface.co/spaces/CATIE-AQ/FAT5-report. This methodology enabled us to efficiently pretrain, as a proof of concept, a FAT5 with 147M parameters in French in a reasonable time (1,461 hours for 419B tokens), with limited resources (1 A100, i.e. a computational budget of ~€1,900) and a low carbon footprint (13.5 kg CO2 eq). The model's weights are also available on Hugging Face: https://huggingface.co/CATIE-AQ/FAT5-small. It is not very useful in practice: it's a PoC and not an instructed model (that is planned for later). All the code is available on GitHub if you want to pretrain your own model in your own language or for a specific domain: https://github.com/catie-aq/flashT5 ⭐ Finally, this was a joint project with @BorisAlbar at hf.co/CATIE-AQ.
reacted to lbourdois's post with ❤️ 10 days ago
Organizations
NullSense's activity
New activity in
utter-project/EuroMoE-2.6B-A0.6B-Instruct-2512
11 days ago
Please consider adding benchmarks
1
#3 opened 11 days ago by NullSense
New activity in
yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF
15 days ago
Thank you and there is interest!
5
#8 opened 20 days ago by NullSense
New activity in
nvidia/parakeet-tdt-0.6b-v2
about 1 year ago
ONNX conversion
16
#9 opened about 1 year ago by Discipulab
quantized model?
6
#26 opened about 1 year ago by alansrobotlab