Hello everyone! Welcome to Virtual Protocol (VP). You are here today because you have been invited to be a contributor on our platform. Currently, the characters on Virtual Protocol have 3 cores: the Cognitive Core, the Audio Core and the Visual Core. Today we will be focusing only on the Cognitive Core, which is largely made up of LLM datasets and models.

Being a contributor on VP means you will be uploading datasets or LLM models to our platform. The process itself can be intellectually stimulating because you will experience the full end-to-end LLM workflow: collecting datasets, pre-processing them, and finally fine-tuning a functional model.
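
To make the "collecting datasets" part concrete, here is a minimal sketch of what a training sample can look like. LLaMA Factory accepts several dataset formats; the Alpaca-style JSON below is a common one, and the file name, instructions and outputs here are purely illustrative:

```python
import json

# Two illustrative Alpaca-style samples (instruction / input / output).
# Check LLaMA Factory's data README for the exact format it expects.
samples = [
    {
        "instruction": "Introduce yourself in one sentence.",
        "input": "",
        "output": "Hi, I am a friendly character on Virtual Protocol!",
    },
    {
        "instruction": "Answer the user's question.",
        "input": "What are the three cores of a VP character?",
        "output": "The Cognitive Core, the Audio Core and the Visual Core.",
    },
]

# Save as a JSON file that can later be registered as a custom dataset.
with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```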

Do note that this is an instructional guide to get you started; I will not cover much about how LLMs work or why we use certain steps. There is a lot of conceptual learning material that can be found here.

Let’s get started!

(Remember, this is an instructional guide. If you wish to learn more in depth, feel free to ask in the Discord channel.)

LLaMA Factory: WebUI for fine-tuning

  1. Visit the LLaMA Factory repository: https://github.com/hiyouga/LLaMA-Factory
  2. Have a quick read through the README, then choose the Colab option. Link: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
  3. In Google Colab, connect to the free T4 GPU runtime (a quick GPU sanity check is shown after this list).
  4. Run the “Install Dependencies” cell.
  5. Skip the “Log in with Hugging Face account” step for now. You will only need your Hugging Face token if you want to upload your model later (a login snippet is shown after this list).
  6. Head to the section “Fine-tune model via LLaMA Board” and change the code from create_ui().queue().launch(share=True) to create_ui().queue().launch(share=True, debug=True). Adding debug=True helps with future troubleshooting (the full cell is shown after this list).
  7. Click Run and wait for it to complete.
  8. Click the public URL, which looks like https://xxxxxxxxxxxx.gradio.live/
  9. Now you should see the web UI in a new tab.
  10. To make sure everything works, start with basic settings: for example, a small base model, the LoRA fine-tuning method and one of the bundled demo datasets.
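
For step 3, you can confirm that Colab actually attached a T4 with a quick cell. This is just a sanity check; it assumes PyTorch, which Colab runtimes ship with by default:

```python
import torch

# Sanity check: is a GPU attached to this runtime?
if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU found. Go to Runtime > Change runtime type and select T4.")
```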
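For step 5, when you eventually want to upload your model, the usual way to authenticate in a notebook is through the huggingface_hub library (installed alongside LLaMA Factory's dependencies). You can create a token at https://huggingface.co/settings/tokens:

```python
from huggingface_hub import notebook_login

# Opens an interactive prompt in the notebook where you paste your token.
# Only needed if you plan to push your fine-tuned model to the Hub.
notebook_login()
```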
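And for step 6, the modified launch cell should end up looking roughly like this. The import path shown is an assumption that varies by LLaMA Factory version (older releases exposed create_ui from the llmtuner package), so keep whatever import the notebook already has:

```python
# Keep the import the Colab notebook already uses; this path is from
# recent LLaMA Factory releases and may differ in older versions.
from llamafactory.webui.interface import create_ui

# share=True exposes the public *.gradio.live URL;
# debug=True keeps the cell running so errors show up in the Colab output.
create_ui().queue().launch(share=True, debug=True)
```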