My observations from experimenting with model merging, evaluation, and fine-tuning
Let’s continue our learning journey of ’s , which is pure gold for the community. This time, we will focus on model merging and evaluation.
Maxime has a great article titled . I highly recommend you check it out first. We won’t repeat the steps he has already laid out in his article, but we will explore some details I came across that may be helpful to you.
We’re going to experiment with model merging and model evaluation in the following steps:
- Merge two models from the Hugging Face Hub.
- Run AutoEval on the base model.
- Run AutoEval on the merged model.
- Fine-tune the merged model with a customized instruction dataset.
- Run AutoEval on the fine-tuned model.
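Before walking through the steps, it helps to see what "merging" actually computes. One of the most common merge methods is SLERP (spherical linear interpolation), which interpolates along the arc between two weight tensors rather than along the straight line. Below is a minimal, self-contained sketch of SLERP in NumPy; the function name, epsilon, and the fallback threshold for nearly parallel vectors are my own illustrative choices, not any particular library's API:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between tensors v0 and v1 at fraction t in [0, 1]."""
    # Normalize copies only to measure the angle between the two tensors
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.sum(v0_n * v1_n), -1.0, 1.0)

    # Nearly parallel tensors: fall back to plain linear interpolation
    if abs(dot) > 0.9995:
        return (1 - t) * v0 + t * v1

    theta = np.arccos(dot)          # angle between the tensors
    sin_theta = np.sin(theta)
    w0 = np.sin((1 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return w0 * v0 + w1 * v1
```

At `t=0` this returns the first model's weights and at `t=1` the second's; in a real merge, an interpolation like this is applied layer by layer across the two checkpoints.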
Let’s dive in.
First, how do we select which models to merge?
Determining whether two or more models can be merged involves evaluating several key attributes and considerations:
- Model Architecture: Model architecture is a critical consideration when merging models. Ensure the models share a compatible architecture (e.g., both transformer-based). Merging dissimilar architectures is generally challenging. The Hugging Face model card usually details a model’s architecture. If you can’t find the model architecture information, you can use trial and error with Maxime’s , which we will explore later. If you encounter an error, it’s usually because the model architectures are incompatible.
- Dependencies and Libraries: Ensure that…
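As a lightweight sanity check on architecture compatibility, one option is to compare the standard fields of each model's `config.json` (which every Hugging Face model repo ships) before attempting a merge. The helper below is hypothetical, written for illustration only; the field names are standard Hugging Face config keys, but the exact set worth comparing can vary by architecture:

```python
# Standard Hugging Face config.json fields that should normally match
# for a weight-level merge of two checkpoints to make sense.
ARCH_KEYS = ("model_type", "hidden_size", "num_hidden_layers",
             "num_attention_heads", "vocab_size")

def architectures_compatible(cfg_a: dict, cfg_b: dict, keys=ARCH_KEYS):
    """Return (ok, mismatches), where mismatches maps key -> (a_value, b_value)."""
    mismatches = {k: (cfg_a.get(k), cfg_b.get(k))
                  for k in keys if cfg_a.get(k) != cfg_b.get(k)}
    return (not mismatches), mismatches
```

In practice you could load each config dict from the hub (for example via `huggingface_hub.hf_hub_download(repo_id, "config.json")` and `json.load`) and feed the two dicts to this check before spending time on a merge run.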