ChatML Inference
In [1]:
!pip install -qU transformers accelerate
In [2]:
from transformers import AutoTokenizer
import transformers
import torch

# Hugging Face model ID for the DPO-merged Hindi model
model = "TokenBender/navaran_hindi_dpo_merged"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the messages with the model's ChatML chat template; keep the prompt
# as plain text and append the assistant header so the model completes the turn
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in float16 and place it across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens with moderate-temperature nucleus sampling
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
<|im_start|>user
What is a large language model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence algorithm that is designed to understand and generate human language. These models are trained on vast amounts of text data, allowing them to learn patterns and relationships within language. Large language models are used in various applications, such as natural language processing, machine translation, and chatbots. They can understand and generate text in a way that is similar to how humans do, making them a powerful tool for language understanding and generation.
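The output above shows the ChatML layout that `apply_chat_template` produced. As a rough sketch of that format (not the tokenizer's actual Jinja template), each message is wrapped in `<|im_start|>role ... <|im_end|>` markers, with an open assistant turn appended when `add_generation_prompt=True`:

```python
def apply_chatml(messages, add_generation_prompt=True):
    """Render a message list in ChatML: <|im_start|>role\\ncontent<|im_end|>."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [{"role": "user", "content": "What is a large language model?"}]
print(apply_chatml(messages))
```

This is only to illustrate the structure; in practice, always use the tokenizer's own chat template so the markers match what the model was trained on.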
In [15]:
# Hindi prompt: "What is the difference between virtual reality and augmented reality?"
messages = [{"role": "user", "content": "वर्चुअल रियलिटी और ऑगमेंटेड रियलिटी में क्या अंतर है?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Lower temperature (0.1) for a more deterministic answer; allow up to 1024 new tokens
outputs = pipeline(prompt, max_new_tokens=1024, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py:1123: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
<|im_start|>user
वर्चुअल रियलिटी और ऑगमेंटेड रियलिटी में क्या अंतर है?<|im_end|>
<|im_start|>assistant
उत्तर: वीडियो गेम विकास का इतिहास और वीडियो गेम उद्योग पर इसका प्रभाव। यह लेख वीडियो गेम विकास के विकास और वीडियो गेम उद्योग पर इसके प्रभाव की पड़ताल करता है। यह वीडियो गेम विकास के विभिन्न चरणों और विभिन्न प्लेटफार्मों पर इसके प्रभाव पर चर्चा करता है। जबकि यह वीडियो गेम विकास के बारे में मूल्यवान जानकारी प्रदान करता है, यह विशेष रूप से वीडियो गेम विकास के लिए उपयोग किए जाने वाले उन्नत प्रोग्रामिंग भाषाओं पर ध्यान केंद्रित नहीं करता है।

(Translation: "Answer: The history of video game development and its impact on the video game industry. This article explores the evolution of video game development and its impact on the video game industry. It discusses the different stages of video game development and its impact on various platforms. While it provides valuable information about video game development, it does not specifically focus on the advanced programming languages used for video game development.")
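Note that `generated_text` includes the full rendered prompt as well as the completion. A small helper (hypothetical, assuming the ChatML markers shown in the outputs above) can isolate just the assistant's reply:

```python
def extract_assistant_reply(generated_text: str) -> str:
    """Return the text after the last assistant marker, up to <|im_end|>."""
    marker = "<|im_start|>assistant"
    # Take everything after the last assistant header ...
    reply = generated_text.rsplit(marker, 1)[-1]
    # ... and cut at the end-of-turn marker, if the model emitted one
    return reply.split("<|im_end|>", 1)[0].strip()

sample = ("<|im_start|>user\nHi<|im_end|>\n"
          "<|im_start|>assistant\nHello there!<|im_end|>")
print(extract_assistant_reply(sample))  # → Hello there!
```

Alternatively, `return_full_text=False` can be passed to the pipeline call so that only newly generated text is returned.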