Zero Budget RAG (Part 2)
In Part 2, our primary focus is on reducing API calls.
- Utilize the "all-MiniLM-L6-v2" model to locate the most similar sentence within our dataset.
- Employ the "Mistral-7B-Instruct-v0.2" LLM to generate a more human-like answer based on the Question and Answer.
Currently, we rely on these two Hugging Face inference API calls to achieve these tasks. While this approach is feasible for limited usage, scaling it for a larger user base could become cost-prohibitive, diverging from our "Zero Budget RAG" principle.
To mitigate this, we'll explore a vanilla JavaScript solution for the first API call. GPT-4o generated the JavaScript code for us, and it looks quite accurate.
```javascript
// Search the documents array for the entries most similar to the query,
// using TF-IDF vectors and cosine similarity
function performSearch(query) {
  // Preprocess the documents
  const docTokensArray = documents.map((doc) => tokenize(doc.content.toLowerCase()));

  // Calculate TF-IDF vectors for the documents
  const tfidfVectors = docTokensArray.map((tokens) => calculateTfIdf(tokens, docTokensArray));

  // Tokenize the query (lowercased, so its tokens match the document tokens)
  // and calculate its TF-IDF vector
  const queryTokens = tokenize(query.toLowerCase());
  const queryTfIdf = calculateTfIdf(queryTokens, docTokensArray);

  // Calculate the cosine similarity between the query and each document
  const results = tfidfVectors.map((tfidfVector, index) => {
    const cosineSimilarity = calculateCosineSimilarity(queryTfIdf, tfidfVector);
    return { id: documents[index].id, similarity: cosineSimilarity };
  });

  // Sort the results by similarity in descending order
  results.sort((a, b) => b.similarity - a.similarity);
  return results;
}

// Split text into word tokens, dropping punctuation and empty strings
function tokenize(text) {
  return text.split(/\W+/).filter((token) => token.length > 0);
}

// Build a sparse TF-IDF vector ({ token: weight }) for one token list,
// using the whole corpus to compute document frequencies
function calculateTfIdf(tokens, docTokensArray) {
  const tf = {};
  const df = {};
  const idf = {};

  // Term frequency within this document
  tokens.forEach((token) => {
    tf[token] = (tf[token] || 0) + 1;
  });

  // Document frequency across the corpus
  docTokensArray.forEach((docTokens) => {
    const uniqueTokens = new Set(docTokens);
    uniqueTokens.forEach((token) => {
      df[token] = (df[token] || 0) + 1;
    });
  });

  // Inverse document frequency: rare terms score higher
  Object.keys(tf).forEach((token) => {
    idf[token] = Math.log(docTokensArray.length / (df[token] || 1));
  });

  const tfidf = {};
  Object.keys(tf).forEach((token) => {
    tfidf[token] = tf[token] * idf[token];
  });
  return tfidf;
}

// Cosine similarity between two sparse vectors; 0 when either is empty
function calculateCosineSimilarity(vectorA, vectorB) {
  const dotProduct = Object.keys(vectorA).reduce(
    (sum, key) => sum + vectorA[key] * (vectorB[key] || 0), 0);
  const magnitudeA = Math.sqrt(Object.values(vectorA).reduce((sum, value) => sum + value ** 2, 0));
  const magnitudeB = Math.sqrt(Object.values(vectorB).reduce((sum, value) => sum + value ** 2, 0));
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  return dotProduct / (magnitudeA * magnitudeB);
}
```
We can use it by calling the `performSearch` function. Example:

```javascript
performSearch("Does August Host Offer SEO service?");
```
Fortunately, the top match is the last document in our dataset, whose content is:
> August Host offers a variety of services, including web development, app programming, digital marketing, SEO, and tech consulting. They use modern tools like PHP, JS, Typescript, and Serverless to create custom websites and applications for clients. Additionally, they provide expertise in advertising, SEO, and domain email creation.
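As a sanity check, the same TF-IDF-plus-cosine idea can be verified on a toy corpus. Here is a minimal, self-contained sketch (the `docs` array, the query, and the compact helpers below are made up for illustration, not the article's actual data):

```javascript
// Toy corpus (hypothetical content, for illustration only)
const docs = [
  "we build custom websites with php and js",
  "august host offers seo and digital marketing services",
];

// Lowercase word tokenizer
const tok = (t) => t.toLowerCase().split(/\W+/).filter(Boolean);

// Sparse TF-IDF vector for one token list against the corpus
function vec(tokens, corpus) {
  const v = {};
  for (const t of tokens) {
    const df = corpus.filter((d) => tok(d).includes(t)).length;
    v[t] = (v[t] || 0) + Math.log(corpus.length / (df || 1));
  }
  return v;
}

// Cosine similarity between two sparse vectors
function cosine(a, b) {
  const dot = Object.keys(a).reduce((s, k) => s + a[k] * (b[k] || 0), 0);
  const mag = (x) => Math.sqrt(Object.values(x).reduce((s, n) => s + n * n, 0));
  return mag(a) && mag(b) ? dot / (mag(a) * mag(b)) : 0;
}

const query = vec(tok("do you offer seo services"), docs);
const scores = docs.map((d) => cosine(query, vec(tok(d), docs)));
console.log(scores[1] > scores[0]); // → true: the SEO document scores higher
```

The query shares the terms "seo" and "services" with the second document and nothing with the first, so only the second gets a non-zero similarity.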
For more details, visit this CodePen: link.
In Part 2, we've implemented vanilla JS to find similar sentences, reducing our API usage to a single call for obtaining the final answer. This streamlined approach sends only the final Question and Answer to the 'Mistral-7B-Instruct-v0.2' LLM, significantly cutting down on API usage. Moreover, since Hugging Face offers unlimited calls for free accounts, we've successfully used RAG without any budget constraints.
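That remaining call can be as simple as posting a prompt built from the matched answer. Here is a minimal sketch, assuming the standard Hugging Face Inference API endpoint and an `HF_TOKEN` environment variable; the `buildPrompt` helper and its instruction wording are our own illustration, not the article's exact code:

```javascript
// Build the prompt for Mistral-7B-Instruct: the matched document is
// passed as context, and the model is asked to answer from it.
// The [INST] tags follow Mistral's instruction format.
function buildPrompt(question, context) {
  return `[INST] Answer the question using only this context:\n${context}\nQuestion: ${question} [/INST]`;
}

// Hypothetical call to the Hugging Face Inference API (requires a token)
async function askMistral(question, context) {
  const res = await fetch(
    "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.HF_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: buildPrompt(question, context) }),
    }
  );
  const data = await res.json();
  return data[0]?.generated_text;
}
```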
Check out the final result: https://zerobudgetrag.netlify.app/
Contact us for the source code.
Thank you for reading, and I hope you find this method useful!