MIXTRAL 8x7B - Mixture of Experts¶
This will not run on the free T4 GPU from Google Colab. You will need an A100 to run it.
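A quick way to confirm which GPU you have been allocated (a minimal check, assuming CUDA is available):
import torch
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.0f} GB of GPU memory")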
Install Required Packages¶
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets scipy
!pip install -q trl
!pip install flash-attn --no-build-isolation
Loading the Base Model¶
Load the model in 4-bit, with double quantization and bfloat16 as the compute dtype.
In this case we are starting from the base Mixtral checkpoint (see model_id below). You could also start from the instruct-tuned variant; keep in mind that fine-tuning a base model will generally need a lot more data!
Load the Dataset for Fine-tuning¶
For this tutorial, we will fine-tune Mixtral 8x7B for code generation.
We will be using this dataset, which is curated by TokenBender (e/xperiments) and is an excellent data source for fine-tuning models for code generation. It follows the Alpaca style of instructions, which is an excellent starting point for this task. The dataset structure should resemble the following:
{
"instruction": "Create a function to calculate the sum of a sequence of integers.",
"input": "[1, 2, 3, 4, 5]",
"output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum"
}
model_id = "mistralai/Mixtral-8x7B-v0.1"
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
nf4_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map='auto',
quantization_config=nf4_config,
use_cache=False,
attn_implementation="flash_attention_2"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
Let's examine how well the model does at this task currently:
def generate_response(prompt, model):
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encoded_input.to('cuda')
    generated_ids = model.generate(**model_inputs,
                                   max_new_tokens=512,
                                   do_sample=True,
                                   pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.batch_decode(generated_ids)
    return decoded_output[0].replace(prompt, "")
prompt="""[INST]Use the provided input to create an instruction that could have been used to generate the response with an LLM. \nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.[\INST]"""
generate_response(prompt, model)
print(model)
from datasets import load_dataset
dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")
dataset
df = dataset.to_pandas()
df.head(10)
Instruction Fine-tuning - prepare the dataset in a "prompt" format so the model can better understand the task:
- the generate_prompt function takes the instruction and output and generates a full prompt
- shuffle the dataset
- tokenize the dataset
Formatting the Dataset¶
Now, let's format the dataset in the instruction format that Mixtral expects (the same [INST] ... [/INST] template used by Mistral-7B-Instruct-v0.1).
Many tutorials and blogs skip over this part, but I feel this is a really important step.
We'll put each instruction and input pair between [INST] and [/INST], with the output after that, like this:
<s>[INST] What is your favorite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavor to whatever I'm cooking up in the kitchen!</s>
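As an optional sanity check, you can compare this hand-rolled template against the tokenizer's built-in chat template, when one is available. This is a minimal sketch; the base Mixtral tokenizer may not ship a chat template (the instruct-tuned checkpoints do):
# Render the same exchange with the tokenizer's chat template, if it has one
messages = [
    {"role": "user", "content": "What is your favorite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
]
if tokenizer.chat_template is not None:
    print(tokenizer.apply_chat_template(messages, tokenize=False))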
You can use the following code to process your dataset and build the "prompt" column in the correct format:
def generate_prompt(data_point):
    """Generate input text based on a prompt, task instruction, (context info.), and answer
    :param data_point: dict: Data point
    :return: str: formatted prompt
    """
    prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
                  'appropriately completes the request.\n\n'
    # Samples with additional context info.
    if data_point['input']:
        text = f"""<s>[INST]{prefix_text} {data_point["instruction"]} here are the inputs {data_point["input"]} [/INST]{data_point["output"]}</s>"""
    # Samples without additional context
    else:
        text = f"""<s>[INST]{prefix_text} {data_point["instruction"]} [/INST]{data_point["output"]} </s>"""
    return text
# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in dataset]
dataset = dataset.add_column("prompt", text_column)
dataset = dataset.shuffle(seed=1234) # Shuffle dataset here
dataset = dataset.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
dataset = dataset.train_test_split(test_size=0.2)
train_data = dataset["train"]
test_data = dataset["test"]
train_data
train_data["input_ids"][:10]
After formatting, we should get something like this¶
{
  "text": "<s>[INST] Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] [/INST] # Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum</s>",
  "instruction": "Create a function to calculate the sum of a sequence of integers",
  "input": "[1, 2, 3, 4, 5]",
  "output": "# Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum",
  "prompt": "<s>[INST] Create a function to calculate the sum of a sequence of integers. here are the inputs [1, 2, 3, 4, 5] [/INST] # Python code def sum_sequence(sequence): sum = 0 for num in sequence: sum += num return sum</s>"
}
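You can print one formatted example from your own split to confirm it matches this layout:
# Quick check: inspect a single formatted training prompt
print(train_data["prompt"][0])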
While using SFT (the Supervised Fine-tuning Trainer), we will only pass in the "prompt" column of the dataset for fine-tuning.
print(test_data)
Setting up the Training¶
We will be using the Hugging Face and peft libraries!
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
peft_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
target_modules=[
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
"lm_head",
],
task_type="CAUSAL_LM"
)
We need to prepare the model to be trained in 4-bit, so we will use the prepare_model_for_kbit_training function from peft.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
print_trainable_parameters(model)
Model after Adding LoRA Config¶
print(model)
Hyper-parameters for Training¶
These parameters will depend on how long you want to run training for. The most important to consider:
- num_train_epochs/max_steps: how many passes over the data you want to do. Be careful, don't try too many or you will over-fit!
- learning_rate: controls the speed of convergence
if torch.cuda.device_count() > 1: # If more than 1 GPU
    print(torch.cuda.device_count())
    model.is_parallelizable = True
    model.model_parallel = True
from transformers import TrainingArguments
args = TrainingArguments(
output_dir = "Mixtral_Alpace_v3",
#num_train_epochs=5,
max_steps = 100, # comment out this line if you want to train in epochs
per_device_train_batch_size = 32,
warmup_ratio = 0.03, # fraction of steps used for learning-rate warmup
logging_steps=10,
save_strategy="epoch",
#evaluation_strategy="epoch",
evaluation_strategy="steps",
eval_steps=10, # comment out this line if you want to evaluate at the end of each epoch
learning_rate=2.5e-5,
bf16=True,
# lr_scheduler_type='constant',
)
Setting up the trainer:
- max_seq_length: context window size
from trl import SFTTrainer
max_seq_length = 1024
trainer = SFTTrainer(
model=model,
peft_config=peft_config,
max_seq_length=max_seq_length,
tokenizer=tokenizer,
packing=True,
args=args,
dataset_text_field="prompt",
train_dataset=train_data,
eval_dataset=test_data,
)
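As a rough sanity check on training volume (a back-of-the-envelope estimate, assuming packing fills every sequence to max_seq_length):
# Approximate tokens seen per optimizer step, per GPU, with packing enabled
tokens_per_step = args.per_device_train_batch_size * max_seq_length
print(tokens_per_step)  # 32 * 1024 = 32768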
trainer.train()
trainer.save_model("Mixtral_Alpace_v2")
Save Model and Push to Hub¶
# !pip install huggingface-hub -qU
# from huggingface_hub import notebook_login
# notebook_login()
# trainer.push_to_hub("Promptengineering/mistral-instruct-generation")
merged_model = model.merge_and_unload()
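If you want to keep the merged weights around for later use, you can save them together with the tokenizer. The directory name below is just an example:
# Save the merged model and tokenizer (directory name is arbitrary)
merged_model.save_pretrained("Mixtral_Alpace_v3_merged")
tokenizer.save_pretrained("Mixtral_Alpace_v3_merged")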
def generate_response(prompt, model):
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encoded_input.to('cuda')
    generated_ids = model.generate(**model_inputs,
                                   max_new_tokens=150,
                                   do_sample=True,
                                   pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.batch_decode(generated_ids)
    return decoded_output[0]
prompt = "[INST]Use the provided input to create an instruction that could have been used to generate the response with an LLM.\nThere are more than 12,000 species of grass. The most common is Kentucky Bluegrass, because it grows quickly, easily, and is soft to the touch. Rygrass is shiny and bright green colored. Fescues are dark green and shiny. Bermuda grass is harder but can grow in drier soil.[/INST]"
generate_response(prompt, merged_model)
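Alternatively, instead of merging, you can reload the saved LoRA adapter on top of a freshly loaded base model later on (for example, in a new session). A minimal sketch, assuming trainer.save_model wrote the adapter to the "Mixtral_Alpace_v2" directory used above:
from peft import PeftModel

# Reload the quantized base model, then attach the fine-tuned adapter
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=nf4_config,
    device_map="auto",
)
finetuned_model = PeftModel.from_pretrained(base_model, "Mixtral_Alpace_v2")
generate_response(prompt, finetuned_model)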