Building Semantic Search in a Desktop App with Rust and SQLite
A step-by-step walkthrough of how I built offline, ML-powered semantic search for my note-taking app using rust-bert, sqlite-vec, and Tauri.
Most note-taking apps offer keyword search. You type "meeting notes" and get back notes that contain those exact words. But what if you wrote about a "team standup recap" instead? Keyword search fails. Semantic search doesn't.
In this post, I'll walk through how I built fully offline semantic search for Insight Notes, a desktop note-taking app built with Tauri. No API calls, no cloud — everything runs on your machine.
The system has four key pieces:

1. Splitting each note into chunks
2. Embedding each chunk into a vector
3. Storing the vectors in SQLite via sqlite-vec
4. Searching by embedding the query and matching it against the stored vectors
Let's go through each one.
You can't just embed an entire note as one vector. Long documents lose nuance when compressed into a single embedding. Instead, I split each note into chunks using langchain-rust's MarkdownSplitter:
```rust
async fn split_markdown_content(content: &str) -> Vec<String> {
    let options = SplitterOptions {
        chunk_size: 256,
        ..Default::default()
    };
    MarkdownSplitter::new(options)
        .split_text(content)
        .await
        .unwrap()
}
```

The MarkdownSplitter is markdown-aware, so it respects heading boundaries and code blocks rather than splitting mid-sentence. A chunk size of 256 tokens turned out to be a good balance — small enough for precise matching, large enough to retain context.
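To make "markdown-aware" concrete, here is a deliberately simplified, std-only sketch of heading-aware splitting. It only breaks on heading lines and ignores chunk size entirely — the real MarkdownSplitter does much more — and the function name is mine, not from the app:

```rust
/// Simplified illustration of heading-aware splitting: break a markdown
/// document at heading lines so each chunk stays within one section.
/// (The real MarkdownSplitter also enforces chunk_size and keeps code
/// blocks intact; this sketch does neither.)
fn split_on_headings(content: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for line in content.lines() {
        // Start a new chunk whenever a heading begins.
        if line.starts_with('#') && !current.trim().is_empty() {
            chunks.push(current.trim().to_string());
            current.clear();
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current.trim().to_string());
    }
    chunks
}
```

Even this naive version shows why structure-aware splitting beats fixed-size windows: a chunk never straddles two unrelated sections, so each embedding describes one coherent topic.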
Each chunk gets converted into a 384-dimensional vector using the all-MiniLM-L12-v2 model via rust-bert:
```rust
async fn create_sentence_embedding(
    text: &str,
    sentence_encoder: &SentenceEncoder,
) -> Vec<f32> {
    let sentences = [text.to_string()];
    let output = sentence_encoder
        .encode(sentences.to_vec())
        .await
        .unwrap();
    output[0].clone()
}
```

I also compute a centroid embedding for each note — the average of all its chunk embeddings. This is useful for finding notes similar to a given note without comparing every individual chunk:
```rust
fn compute_centroid_from_note_content_chunks(
    chunks: &[NoteChunkToInsert],
) -> Option<Vec<f32>> {
    // Averaging zero vectors is undefined, so bail out early.
    if chunks.is_empty() {
        return None;
    }
    let num_vectors = chunks.len();
    let dimension = chunks[0].sentence_embedding_vector.len();
    let mut sum_vector = vec![0.0; dimension];
    for chunk in chunks {
        for i in 0..dimension {
            sum_vector[i] += chunk.sentence_embedding_vector[i];
        }
    }
    Some(
        sum_vector
            .iter()
            .map(|&sum| sum / num_vectors as f32)
            .collect(),
    )
}
```

Here's where it gets interesting. Instead of reaching for a dedicated vector database like Pinecone or Qdrant, I used sqlite-vec — a SQLite extension that adds vector search capabilities through virtual tables.
The schema:
```sql
CREATE VIRTUAL TABLE vec_note_chunks USING vec0(
    sentence_embedding float[384]
);
```

That's it — a single virtual table declaration gives you a vector index. Embeddings get inserted alongside regular note data:
```sql
WITH related_note_chunks AS (
    SELECT rowid, sentence_embedding
    FROM note_chunks
    WHERE note_chunks.note_id = ?1
)
INSERT INTO vec_note_chunks (
    rowid, sentence_embedding
)
SELECT rowid, sentence_embedding
FROM related_note_chunks;
```

sqlite-vec is loaded as an auto-extension at startup:
```rust
unsafe {
    // SAFETY: sqlite3_vec_init has the extension entry-point
    // signature that sqlite3_auto_extension expects.
    libsqlite3_sys::sqlite3_auto_extension(Some(std::mem::transmute(
        sqlite3_vec_init as *const (),
    )));
}
```

When the user searches, the query text gets embedded into the same 384-dimensional space, then matched against stored vectors:
```sql
WITH matches AS (
    SELECT rowid, distance
    FROM vec_note_chunks
    WHERE sentence_embedding MATCH (?1)
        AND distance < 0.8
    ORDER BY distance
    LIMIT 10
)
SELECT DISTINCT
    notes.id,
    notes.content,
    notes.created_at,
    notes.updated_at
FROM matches
JOIN note_chunks
    ON note_chunks.id = matches.rowid
JOIN notes
    ON notes.id = note_chunks.note_id;
```

The MATCH keyword triggers a vector similarity search. The distance < 0.8 filter keeps only high-confidence matches — with cosine distance, smaller values mean more similar vectors. Results come back ranked by relevance, closest first.
The CTE pattern here is important — it lets sqlite-vec do the expensive vector search first, then joins back to get the full note data. This is much faster than filtering after the join.
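One practical detail the queries above gloss over: the ?1 parameter has to carry the query embedding in a form sqlite-vec understands — either a JSON array string or a compact blob of little-endian f32 bytes. A minimal, std-only sketch of the blob encoding (the embedding_to_blob name is my own, not from the app):

```rust
/// Serialize an embedding into sqlite-vec's compact binary format:
/// each f32 as 4 little-endian bytes, concatenated in order.
fn embedding_to_blob(embedding: &[f32]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(embedding.len() * 4);
    for value in embedding {
        blob.extend_from_slice(&value.to_le_bytes());
    }
    blob
}
```

The resulting Vec<u8> can then be bound as an ordinary blob parameter (for example via rusqlite's params!) wherever the queries above use ?1.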
With this setup, searching for "project deadlines" will surface notes about "timeline pressure" or "delivery dates" even though the query's exact words never appear in those notes. The search feels almost magical compared to Ctrl+F.
The entire pipeline runs in under 100ms on a modern laptop. No network latency, no API costs, no privacy concerns.
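As an aside, the centroid embeddings from earlier make "notes similar to this one" cheap: compare two centroids directly with cosine similarity instead of comparing every chunk pair. A std-only sketch (the cosine_similarity helper is mine, not from the app):

```rust
/// Cosine similarity between two equal-length vectors:
/// dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // Degenerate input: treat as no similarity.
    }
    dot / (norm_a * norm_b)
}
```

Ranking all notes against a target note's centroid with this function is a single linear pass, with no per-chunk comparisons needed.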
A few things I learned along the way:

- A chunk size of 256 tokens is a sweet spot: small enough for precise matching, large enough to retain context.
- Let the vector index do the expensive work first — the CTE-then-join pattern is much faster than filtering after the join.
- A local-first app doesn't need a dedicated vector database; sqlite-vec inside the SQLite file you already ship goes a long way.
If you're building a local-first app and need search that actually understands meaning, this stack — rust-bert + sqlite-vec + Tauri — is surprisingly capable. You get the intelligence of embeddings with the simplicity of SQLite.