AI Solutions · 14 min read

Building Custom AI Chatbots with Custom Training Data

Create domain-specific AI assistants trained on your own data for superior accuracy.

March 4, 2024
#AI #Chatbot #NLP

Approaches to Custom AI


1. Fine-tuning

Retrain a base model on your data. Powerful but expensive.
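To make this concrete, here is a sketch of preparing one training example in the OpenAI-style chat fine-tuning format (a JSONL file with one conversation per line). The system prompt and product name are placeholders, and the exact schema varies by provider:

```typescript
// One training example in OpenAI-style chat fine-tuning format:
// each JSONL line holds a full conversation ending with the ideal reply.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

function toFineTuneRecord(question: string, idealAnswer: string): string {
  const messages: Message[] = [
    { role: 'system', content: 'You are a support assistant for Acme Corp.' }, // placeholder
    { role: 'user', content: question },
    { role: 'assistant', content: idealAnswer },
  ];
  return JSON.stringify({ messages });
}
```

You typically need hundreds of such examples before fine-tuning outperforms the alternatives, which is part of why it is the expensive option.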

2. Retrieval-Augmented Generation (RAG)

Keep your data separate and augment prompts with relevant context. More cost-effective.

3. Prompt Engineering

Guide the model with detailed instructions. Simplest but less precise.
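As an illustration, a minimal sketch of the prompt-engineering approach: everything the model needs is packed into a detailed instruction, with no retraining. The wording here is only an example:

```typescript
// A detailed system prompt constrains the model without any retraining.
const SYSTEM_PROMPT = `You are a support assistant for our product.
Answer only from the documentation provided below.
If the answer is not in the documentation, say "I don't know".

Documentation:
{context}`;

// Substitute the relevant documentation into the instruction.
function buildPrompt(context: string): string {
  return SYSTEM_PROMPT.replace('{context}', context);
}
```

The limitation is the context window: you can only paste so much documentation into a single prompt, which is exactly the problem RAG solves.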

Implementing RAG


Step 1: Prepare Your Data


```typescript
// Split documents into overlapping chunks for embedding
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // max characters per chunk
  chunkOverlap: 200, // characters shared between adjacent chunks
});

const docs = await splitter.createDocuments([
  'Your product documentation here...',
]);
```
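To see what chunkSize and chunkOverlap actually do, here is a standalone sketch of overlapping character chunking (a simplification of what RecursiveCharacterTextSplitter does; the real splitter also prefers paragraph and sentence boundaries):

```typescript
// Naive overlapping chunker: each chunk shares `overlap` trailing
// characters with the next one so context isn't cut mid-thought.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap;
  }
  return chunks;
}

// chunkText('abcdefghij', 4, 2) → ['abcd', 'cdef', 'efgh', 'ghij']
```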

Step 2: Create Embeddings


```typescript
import { OpenAIEmbeddings } from 'langchain/embeddings';
import { PineconeStore } from 'langchain/vectorstores';

const embeddings = new OpenAIEmbeddings({
  openaiApiKey: process.env.OPENAI_API_KEY,
});

// pineconeIndex is assumed to be an already-initialized Pinecone index
const vectorStore = await PineconeStore.fromDocuments(
  docs,
  embeddings,
  { pineconeIndex },
);
```
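Retrieval works because semantically similar texts get nearby embedding vectors; the vector store ranks chunks by a similarity metric, typically cosine similarity. A minimal sketch of that metric (not LangChain code):

```typescript
// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```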

Step 3: Build the Chat


```typescript
import { ConversationalRetrievalQAChain } from 'langchain/chains';
import { ChatOpenAI } from 'langchain/chat_models';

const llm = new ChatOpenAI({ temperature: 0 });

// Rewrites a follow-up question into a standalone question
const QUESTION_PROMPT = `Given the conversation below, rephrase the follow-up
question as a standalone question.

Chat history: {chat_history}
Follow-up question: {question}
Standalone question:`;

// Answers from the retrieved context only
const QA_TEMPLATE = `Answer the question using only the context below.
If the answer is not in the context, say you don't know.

Context: {context}
Question: {question}
Answer:`;

const chain = ConversationalRetrievalQAChain.fromLLM(
  llm,
  vectorStore.asRetriever(),
  {
    questionGeneratorTemplate: QUESTION_PROMPT,
    qaTemplate: QA_TEMPLATE,
  },
);

const result = await chain.call({
  question: 'How do I reset my password?',
  chat_history: [],
});
```
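Source citations, one of the best practices below, can be layered on top of the chain output. A sketch, assuming the chain was configured to return its source documents (the SourceDoc shape here is illustrative):

```typescript
// Illustrative shape for retrieved documents with a `source` in metadata.
interface SourceDoc {
  pageContent: string;
  metadata: { source: string };
}

// Append a deduplicated list of sources to the model's answer.
function formatAnswerWithSources(answer: string, docs: SourceDoc[]): string {
  const sources = [...new Set(docs.map((d) => d.metadata.source))];
  if (sources.length === 0) return answer;
  return `${answer}\n\nSources: ${sources.join(', ')}`;
}
```

Showing users where an answer came from both builds trust and makes bad retrievals easy to spot.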

Best Practices


  • Clean and structure your data well
  • Implement proper chunking strategies
  • Add source citations for responses
  • Monitor and iterate based on user feedback

Conclusion

Custom-trained chatbots provide much better answers for domain-specific questions. RAG offers the best balance of accuracy and cost.

