Here at dataRecode Labs, we’ve been exploring how AI and Large Language Models (LLMs) can be made more useful in offline, locally controlled environments — especially when working with sensitive data like medical records, financial reports, or confidential company archives. In these cases, privacy isn’t just a preference — it’s a requirement.
Most LLMs, including models like ChatGPT, are trained on vast public datasets. That means they can’t access or respond to your private, local information unless you explicitly provide it. This is where Retrieval-Augmented Generation (RAG) comes in — a method that enriches an LLM’s responses by feeding it your own documents at query time.
Think of it as giving your model a private bookshelf to browse before it speaks. Instead of relying solely on what it “remembers” from training, it can scan your handpicked sources — Excel files, PDFs, notes, datasets — and integrate that knowledge into its answers. The result: responses that are more grounded, relevant, and tailored to your context.
To test this out, we used a locally hosted LLM via Ollama — specifically the qwen2.5:7b model. Running the model locally means everything stays on your machine, with no cloud dependency — ideal for privacy, speed, and control.
We connected a few sample documents to the model using RAG, and used the ellmer R package to inspect how well it retrieved and responded. This mini blog (a reproducible gist) walks through that workflow step by step — from assumptions and setup, to preparing sample local documents, indexing and embedding, implementing RAG, testing and verification, and finally, key takeaways.
Without further ado, let’s dive in.
Fig 1. A high-level flowchart showing how a user query (via R) connects to local documents through an indexing and retrieval layer built in R, then passes to a locally hosted LLM served by Ollama (qwen2.5:7b) to generate a response.
This gist assumes you’ve already set up a local LLM using Ollama and the qwen2.5:7b model. If you’re comfortable running queries locally and have Ollama installed, you’re good to go.
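Before going further, it can help to confirm the Ollama server is reachable from R. One minimal check (assuming Ollama is listening on its default port, 11434) is to hit the root endpoint, which replies with a short status string when the server is up:
# A running Ollama server answers its root endpoint with "Ollama is running"
readLines("http://localhost:11434", warn = FALSE)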
💡 Note
This workflow was tested in an R environment with the following packages loaded:
* ellmer for RAG implementation and response inspection
* dplyr for data wrangling and pipeline operations
Make sure these packages are installed and loaded before running the code chunks.
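Beyond ellmer and dplyr, the code chunks further below also call functions from text2vec, tidytext, and tibble, so a one-time install that covers all of them might look like this:
# Install everything the code chunks below rely on (one-time setup)
install.packages(c("ellmer", "dplyr", "text2vec", "tidytext", "tibble"))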
+-----------------------------+ +-----------------------------+ +-----------------------------+
| Sample Documents |-->| Document Indexing |-->| Local LLM Query |
| (PDFs, Excel, etc) | | (Embedding via R) | | + RAG Retrieval |
+-----------------------------+ +-----------------------------+ +-----------------------------+
|
v
+-----------------------------+ +-----------------------------+ +-----------------------------+
| Local Storage |-->| Ollama LLM Host |-->| Response Output |
| (No Cloud!) | | (qwen2.5:7b) | | via ellmer R |
+-----------------------------+ +-----------------------------+ +-----------------------------+
Privacy, Speed, and Control
The walkthrough includes:
* Preparing sample documents
* Generating a vector index using GloVe embeddings via text2vec
* Implementing RAG with ellmer and querying the local LLM
* Testing and verifying responses
* Reflecting on takeaways and next steps
Create two sample .txt files to simulate local documents for retrieval.
doc1.txt:
Sales Report - Q3 2025
Revenue generated: $7500
Date: 2025-09-10
Key product: Widget A
Notes: Strong demand in the eastern region.
Market analysis shows a 15% growth in sales.
Customer feedback highlights product durability.
Eastern region accounts for 40% of revenue.
New marketing campaign launched last month.
Inventory restocked on 2025-09-05.
Competitor activity remains low in this quarter.
doc2.txt:
Financial Update - Q4 2025
Revenue generated: $9200
Date: 2025-09-12
Key product: Widget B
Notes: Inventory levels are stable.
Profit margins improved by 8% this quarter.
Widget B sales spiked after the trade show.
Western region contributed 35% of revenue.
Supply chain delays resolved by 2025-09-08.
Customer retention rate is at 90%.
Planning a product launch for next month.
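If you would rather create these files from R than by hand, a short writeLines() sketch does the job; doc1.txt is shown below, and doc2.txt follows the same pattern:
# Write doc1.txt into the working directory (repeat analogously for doc2.txt)
doc1_lines <- c(
  "Sales Report - Q3 2025",
  "Revenue generated: $7500",
  "Date: 2025-09-10",
  "Key product: Widget A",
  "Notes: Strong demand in the eastern region.",
  "Market analysis shows a 15% growth in sales.",
  "Customer feedback highlights product durability.",
  "Eastern region accounts for 40% of revenue.",
  "New marketing campaign launched last month.",
  "Inventory restocked on 2025-09-05.",
  "Competitor activity remains low in this quarter."
)
writeLines(doc1_lines, "doc1.txt")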
This section uses three key R packages to build a semantic index of your documents:
* text2vec: Handles tokenization, vocabulary creation, and GloVe-based word embeddings
* dplyr: Powers the data wrangling pipeline for filtering, grouping, and summarizing
* ellmer: Later used to connect your query to the index and inspect retrieval behavior
This gist assumes you’re already familiar with basic R workflows and tidyverse-style syntax. For deeper technical details on text2vec, see: https://cran.r-project.org/package=text2vec
Generate a vector index of document embeddings using text2vec and GloVe, enabling semantic retrieval for RAG.
# Load the packages this chunk depends on
library(dplyr)     # pipes, group_by(), summarise()
library(tibble)    # tibble(), as_tibble(), column_to_rownames()
library(tidytext)  # unnest_tokens(), stop_words
library(text2vec)  # itoken(), create_vocabulary(), create_tcm(), GlobalVectors

# Load and preprocess documents
docs <- tibble(
  id = c("doc1", "doc2"),
  text = c(
    paste(readLines("doc1.txt"), collapse = "\n"),
    paste(readLines("doc2.txt"), collapse = "\n")
  )
)
# Tokenize and filter stop words
tokens <- docs %>%
unnest_tokens(word, text) %>%
filter(!word %in% stop_words$word)
# Reconstruct document texts
docs_by_id <- tokens %>%
group_by(id) %>%
summarise(text = paste(word, collapse = " "), .groups = "drop")
# Create token iterator
it <- itoken(docs_by_id$text, progressbar = TRUE)
# Build vocabulary and vectorizer
vocab <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocab)
# Create term co-occurrence matrix
tcm <- create_tcm(it, vectorizer, skip_grams_window = 2L)
print(dim(tcm))
# Train GloVe model
glove <- GlobalVectors$new(rank = 30, x_max = 10, learning_rate = 0.05)
word_vectors <- glove$fit_transform(tcm, n_iter = 10, convergence_tol = 0.1)
# Create document-level embeddings
doc_embeddings <- tokens %>%
left_join(as_tibble(word_vectors, rownames = "word"), by = "word") %>%
group_by(id) %>%
summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)), .groups = "drop") %>%
column_to_rownames("id")
index <- list(
embeddings = as.matrix(doc_embeddings),
ids = rownames(doc_embeddings),
texts = docs$text
)
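In the queries below, documents are selected by id to keep the walkthrough easy to follow. If you would like the GloVe index itself to drive retrieval, a minimal cosine-similarity sketch along these lines could rank documents against a query (retrieve_doc() is a hypothetical helper, and it assumes at least one query word appears in the vocabulary built above):
# Hypothetical helper: rank indexed documents by cosine similarity to a query
retrieve_doc <- function(query, index, word_vectors, top_n = 1) {
  q_words <- tolower(unlist(strsplit(query, "\\W+")))
  q_words <- intersect(q_words, rownames(word_vectors))
  if (length(q_words) == 0) return(character(0))
  # Average the query terms' word vectors into a single query embedding
  q_vec <- matrix(colMeans(word_vectors[q_words, , drop = FALSE]), nrow = 1)
  sims <- text2vec::sim2(index$embeddings, q_vec, method = "cosine", norm = "l2")
  top_ids <- index$ids[order(sims[, 1], decreasing = TRUE)][seq_len(top_n)]
  index$texts[match(top_ids, index$ids)]
}

# Example: find the document closest to a Q3 revenue question
retrieve_doc("revenue Q3 2025 sales", index, word_vectors)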
Implementing RAG with ellmer
Augment LLM responses with document context for accurate, grounded answers.
# Load ellmer and initialize chat object
library(ellmer)
chat <- chat_ollama(model = "qwen2.5:7b", base_url = "http://localhost:11434")
# Query with doc1 context
query <- "What was the revenue in Q3 2025?"
relevant_doc <- list(id = "doc1", text = index$texts[index$ids == "doc1"])
augmented_prompt <- paste(relevant_doc$text, "\nQuestion:", query)
response <- chat$chat(augmented_prompt)
print(response)
# Query with doc2 context
query <- "What was the revenue in Q4 2025?"
relevant_doc <- list(id = "doc2", text = index$texts[index$ids == "doc2"])
augmented_prompt <- paste(relevant_doc$text, "\nQuestion:", query)
response <- chat$chat(augmented_prompt)
print(response)
# Query with both documents
query <- "What were the revenues for Q3 and Q4 2025?"
relevant_doc1 <- list(id = "doc1", text = index$texts[index$ids == "doc1"])
relevant_doc2 <- list(id = "doc2", text = index$texts[index$ids == "doc2"])
augmented_prompt <- paste(relevant_doc1$text, "\n", relevant_doc2$text, "\nQuestion:", query)
response <- chat$chat(augmented_prompt)
print(response)
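A lightweight way to test and verify the responses is to check whether figures from the source documents actually appear in the model’s answers. The expected values below come directly from doc1.txt and doc2.txt; the check itself is only a sketch, not a substitute for proper evaluation:
# Sanity check: does the combined Q3/Q4 answer echo the revenue figures from the documents?
expected <- c(q3_revenue = "7500", q4_revenue = "9200")
found <- sapply(expected, grepl, x = response)  # 'response' holds the last answer above
print(found)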
This workflow demonstrates how Retrieval-Augmented Generation (RAG) can extend the capabilities of a locally hosted LLM by integrating your own documents and data into its responses. Instead of relying solely on what the model was trained on, RAG allows you to enrich its output with context-specific, private, or domain-relevant information — all without sending anything to the cloud.
The result is a more grounded, responsive, and customizable AI experience — one that respects privacy, adapts to your local needs, and opens new possibilities for offline data interaction.
💡 Note
While this setup works well for prototyping and exploration, production use may require more robust testing, performance tuning, and system resources. Depending on your data size and model complexity, additional RAM, CPU, or GPU capacity might be needed to ensure smooth operation.
Whether you’re exploring Retrieval-Augmented Generation (RAG), fine-tuning LLM workflows, or seeking tailored solutions for your organization, we’d be glad to connect. For inquiries, consultations, or partnership opportunities, please reach out to us at info@datarecode.com.