Exploring mergekit for Model Merge and AutoEval for Model Evaluation | by Wenqi Glantz | Jan, 2024


My observations from experimenting with model merge, evaluation, and fine-tuning

Wenqi Glantz

Towards Data Science
Image generated by DALL-E 3 by the author

Let’s continue our learning journey through Maxime Labonne’s llm-course, which is pure gold for the community. This time, we will focus on model merge and evaluation.

Maxime has a great article titled Merge Large Language Models with mergekit. I highly recommend you check it out first. We will not repeat the steps he has already laid out in his article, but we will explore some details I came across that may be helpful to you.

We’re going to experiment with model merge and model evaluation in the following steps:

  • Using LazyMergekit, we merge two models from the Hugging Face hub, mistralai/Mistral-7B-Instruct-v0.2 and jan-hq/trinity-v1.
  • Run AutoEval on the base model mistralai/Mistral-7B-Instruct-v0.2.
  • Run AutoEval on the merged model MistralTrinity-7b-slerp.
  • Fine-tune the merged model with a customized instruction dataset.
  • Run AutoEval on the fine-tuned model.
Diagram by author
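
The "slerp" suffix in the merged model's name refers to spherical linear interpolation, a common way to blend two sets of weights. The snippet below is a minimal sketch of the idea on plain Python lists, not mergekit's actual implementation. The function name `slerp`, the fallback thresholds, and the toy vectors are assumptions made for illustration only.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length weight vectors.

    t=0 returns v0, t=1 returns v1. This is an illustrative sketch,
    not the code mergekit runs internally.
    """
    if len(v0) != len(v1):
        raise ValueError("vectors must have the same length")

    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp():
        # Plain linear interpolation, used when the angle is undefined or tiny.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 < eps or norm1 < eps:
        return lerp()

    # Cosine of the angle between the two vectors, clamped for safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))

    # Nearly parallel or anti-parallel vectors: sin(theta) is close to zero.
    if abs(dot) > 1.0 - 1e-6:
        return lerp()

    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy example: halfway between two orthogonal unit vectors.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain average, the midpoint of two orthogonal unit vectors keeps unit length here. That is the usual motivation for interpolating along the arc rather than the straight line between the two weight vectors.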

Let’s dive in.

First, how do we select which models to merge?

Determining whether two or more models can be merged involves evaluating several key attributes and considerations:

  1. Model Architecture: Model architecture is a crucial consideration when merging models. Ensure the models share a compatible architecture (e.g., both transformer-based). Merging dissimilar architectures is often challenging. The Hugging Face model card usually details a model’s architecture. If you can’t find the model architecture info, you can use trial and error with Maxime’s LazyMergekit, which we will explore later. If you encounter an error, it’s usually because of the incompatibility of the model architectures.
  2. Dependencies and Libraries: Be certain that…
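
The architecture check in item 1 can be roughed out before attempting a merge by comparing a few fields from each model's configuration. The sketch below works on plain dictionaries. The helper `config_mismatches`, the chosen keys, and the values in `model_a` and `model_b` are hypothetical placeholders, not the real configurations of the models named above, and matching fields do not guarantee that a merge will succeed.

```python
# Fields that commonly need to agree for a weight-level merge (an assumption,
# not an exhaustive or official list).
KEYS_TO_COMPARE = (
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "vocab_size",
)


def config_mismatches(cfg_a, cfg_b, keys=KEYS_TO_COMPARE):
    """Return {key: (value_a, value_b)} for every compared key that differs."""
    return {
        key: (cfg_a.get(key), cfg_b.get(key))
        for key in keys
        if cfg_a.get(key) != cfg_b.get(key)
    }


# Hypothetical configs for illustration only.
model_a = {
    "model_type": "mistral",
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "vocab_size": 32000,
}
model_b = dict(model_a, hidden_size=5120)

diff = config_mismatches(model_a, model_b)
if diff:
    print("Possible incompatibility:", diff)
else:
    print("No mismatches in the compared fields.")
```

In practice, the dictionaries could be filled from each model's `config.json` on the Hugging Face hub. An empty result only means the compared fields agree.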
