ChatML Inference
In [1]:
!pip install -qU transformers accelerate
In [2]:
from transformers import AutoTokenizer
import transformers
import torch

# Hugging Face model ID for the DPO-merged Hindi model
model = "TokenBender/navaran_hindi_dpo_merged"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the messages with the model's ChatML chat template; keep the prompt
# as plain text and append the assistant header so the model completes the turn
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in float16 and place it across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens with moderate-temperature nucleus sampling
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
<|im_start|>user
What is a large language model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence algorithm that is designed to understand and generate human language. These models are trained on vast amounts of text data, allowing them to learn patterns and relationships within language. Large language models are used in various applications, such as natural language processing, machine translation, and chatbots. They can understand and generate text in a way that is similar to how humans do, making them a powerful tool for language understanding and generation.
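The output above shows the ChatML layout that `apply_chat_template` produced. As a rough sketch of that format (not the tokenizer's actual Jinja template), each message is wrapped in `<|im_start|>role ... <|im_end|>` markers, with an open assistant turn appended when `add_generation_prompt=True`:

```python
def apply_chatml(messages, add_generation_prompt=True):
    """Render a message list in ChatML: <|im_start|>role\\ncontent<|im_end|>."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [{"role": "user", "content": "What is a large language model?"}]
print(apply_chatml(messages))
```

This is only to illustrate the structure; in practice, always use the tokenizer's own chat template so the markers match what the model was trained on.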
In [15]:
# Hindi prompt: "What is the difference between virtual reality and augmented reality?"
messages = [{"role": "user", "content": "वर्चुअल रियलिटी और ऑगमेंटेड रियलिटी में क्या अंतर है?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Lower temperature (0.1) for a more deterministic answer; allow up to 1024 new tokens
outputs = pipeline(prompt, max_new_tokens=1024, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py:1123: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
<|im_start|>user
वर्चुअल रियलिटी और ऑगमेंटेड रियलिटी में क्या अंतर है?<|im_end|>
<|im_start|>assistant
उत्तर: वीडियो गेम विकास का इतिहास और वीडियो गेम उद्योग पर इसका प्रभाव। यह लेख वीडियो गेम विकास के विकास और वीडियो गेम उद्योग पर इसके प्रभाव की पड़ताल करता है। यह वीडियो गेम विकास के विभिन्न चरणों और विभिन्न प्लेटफार्मों पर इसके प्रभाव पर चर्चा करता है। जबकि यह वीडियो गेम विकास के बारे में मूल्यवान जानकारी प्रदान करता है, यह विशेष रूप से वीडियो गेम विकास के लिए उपयोग किए जाने वाले उन्नत प्रोग्रामिंग भाषाओं पर ध्यान केंद्रित नहीं करता है।

(Translation: "Answer: The history of video game development and its impact on the video game industry. This article explores the evolution of video game development and its impact on the video game industry. It discusses the different stages of video game development and its impact on various platforms. While it provides valuable information about video game development, it does not specifically focus on the advanced programming languages used for video game development.")
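Note that `generated_text` includes the full rendered prompt as well as the completion. A small helper (hypothetical, assuming the ChatML markers shown in the outputs above) can isolate just the assistant's reply:

```python
def extract_assistant_reply(generated_text: str) -> str:
    """Return the text after the last assistant marker, up to <|im_end|>."""
    marker = "<|im_start|>assistant"
    # Take everything after the last assistant header ...
    reply = generated_text.rsplit(marker, 1)[-1]
    # ... and cut at the end-of-turn marker, if the model emitted one
    return reply.split("<|im_end|>", 1)[0].strip()

sample = ("<|im_start|>user\nHi<|im_end|>\n"
          "<|im_start|>assistant\nHello there!<|im_end|>")
print(extract_assistant_reply(sample))  # → Hello there!
```

Alternatively, `return_full_text=False` can be passed to the pipeline call so that only newly generated text is returned.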