CCoE: A Compact LLM with Collaboration of Experts
Abstract
The CCoE architecture combines multiple domain-specific LLMs through a CoE layer to improve a base LLM's performance while using fewer training and inference resources.
Large language models (LLMs) demonstrate significant capabilities in natural language understanding and generation. As the need to apply LLMs across various domains grows, an open research question is how to efficiently train and build a model that has expertise in different domains at a low training cost. We propose the CCoE architecture, a framework that easily couples multiple strong domain experts into one large LLM and provides a collective way of utilizing the different domain expert LLMs. Training a large collaboration of multiple expert LLMs places high demands on training resources; CCoE bypasses this problem by isolating the experts from one another and training each expert separately. CCoE assembles multiple expert LLMs through the CoE (Collaboration of Experts) layer. Each CoE layer can have one or more expert LLMs. The expert LLMs have different numbers of layers and have been well trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with five experts in the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that the CCoE framework can easily and efficiently improve performance over the original base model by nearly 10%-20% in different domains while using fewer resources for training as well as inference.
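The structure described in the abstract (a CoE layer holding one or more experts, experts of different depths, each expert kept isolated from the others) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The names `Expert`, `CoELayer`, `add_expert`, and `forward` are hypothetical, the "layers" are toy functions over lists of floats, and selecting an expert by an explicit domain label is an assumption, since the abstract does not say how an expert is chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


@dataclass
class Expert:
    """Hypothetical stand-in for one domain expert (a stack of layers)."""
    domain: str
    layers: List[Layer]  # experts may have different numbers of layers

    def forward(self, hidden: Vector) -> Vector:
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


@dataclass
class CoELayer:
    """Hypothetical CoE layer holding one or more isolated experts."""
    experts: Dict[str, Expert] = field(default_factory=dict)

    def add_expert(self, expert: Expert) -> None:
        # Registering an expert does not modify any existing expert,
        # mirroring the idea that experts are trained separately.
        self.experts[expert.domain] = expert

    def forward(self, hidden: Vector, domain: Optional[str]) -> Vector:
        # Assumption: the caller supplies the domain label explicitly.
        expert = self.experts.get(domain) if domain is not None else None
        if expert is None:
            return hidden  # no matching expert: pass the input through
        return expert.forward(hidden)


def scale(factor: float) -> Layer:
    """Toy layer: multiply every element by a constant."""
    return lambda h: [x * factor for x in h]


def shift(offset: float) -> Layer:
    """Toy layer: add a constant to every element."""
    return lambda h: [x + offset for x in h]


coe = CoELayer()
coe.add_expert(Expert("math", [scale(2.0), scale(3.0)]))  # two layers
coe.add_expert(Expert("code", [shift(1.0)]))              # one layer

math_out = coe.forward([1.0, 2.0], "math")
code_out = coe.forward([1.0, 2.0], "code")
base_out = coe.forward([1.0, 2.0], None)
```

In this sketch, adding or replacing one expert leaves the others untouched, which is the property the abstract relies on to keep training cost low; how the real system selects an expert and merges its output with the base model is not specified in the abstract.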