Building Semantic Search in a Desktop App with Rust and SQLite
A step-by-step walkthrough of how I built offline, ML-powered semantic search for my note-taking app using rust-bert, sqlite-vec, and Tauri.
Most note-taking apps offer keyword search. You type "meeting notes" and get back notes that contain those exact words. But what if you wrote about a "team standup recap" instead? Keyword search fails. Semantic search doesn't.
In this post, I'll walk through how I built fully offline semantic search for Insight Notes, a desktop note-taking app built with Tauri. No API calls, no cloud — everything runs on your machine.
The system has four key pieces:

1. Splitting each note into chunks
2. Embedding each chunk into a vector
3. Storing the vectors in SQLite via sqlite-vec
4. Searching by embedding the query and matching it against the stored vectors
Let's go through each one.
You can't just embed an entire note as one vector. Long documents lose nuance when compressed into a single embedding. Instead, I split each note into chunks using langchain-rust's MarkdownSplitter:
```rust
async fn split_markdown_content(content: &str) -> Vec<String> {
    let options = SplitterOptions {
        chunk_size: 256,
        ..Default::default()
    };
    MarkdownSplitter::new(options)
        .split_text(content)
        .await
        .unwrap()
}
```

The MarkdownSplitter is markdown-aware, so it respects heading boundaries and code blocks rather than splitting mid-sentence. A chunk size of 256 tokens turned out to be a good balance — small enough for precise matching, large enough to retain context.
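To make "markdown-aware" concrete, here is a deliberately simplified, std-only sketch of heading-aware splitting. It only breaks on heading lines and ignores chunk size entirely — the real MarkdownSplitter does much more — and the function name is mine, not from the app:

```rust
/// Simplified illustration of heading-aware splitting: break a markdown
/// document at heading lines so each chunk stays within one section.
/// (The real MarkdownSplitter also enforces chunk_size and keeps code
/// blocks intact; this sketch does neither.)
fn split_on_headings(content: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    for line in content.lines() {
        // Start a new chunk whenever a heading begins.
        if line.starts_with('#') && !current.trim().is_empty() {
            chunks.push(current.trim().to_string());
            current.clear();
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        chunks.push(current.trim().to_string());
    }
    chunks
}
```

Even this naive version shows why structure-aware splitting beats fixed-size windows: a chunk never straddles two unrelated sections, so each embedding describes one coherent topic.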
Each chunk gets converted into a 384-dimensional vector using the all-MiniLM-L12-v2 model via rust-bert:
```rust
async fn create_sentence_embedding(
    text: &str,
    sentence_encoder: &SentenceEncoder,
) -> Vec<f32> {
    let sentences = [text.to_string()];
    let output = sentence_encoder
        .encode(sentences.to_vec())
        .await
        .unwrap();
    output[0].clone()
}
```

I also compute a centroid embedding for each note — the average of all its chunk embeddings. This is useful for finding notes similar to a given note without comparing every individual chunk:
```rust
fn compute_centroid_from_note_content_chunks(
    chunks: &[NoteChunkToInsert],
) -> Option<Vec<f32>> {
    // Averaging zero vectors is undefined, so bail out early.
    if chunks.is_empty() {
        return None;
    }
    let num_vectors = chunks.len();
    let dimension = chunks[0].sentence_embedding_vector.len();
    let mut sum_vector = vec![0.0; dimension];
    for chunk in chunks {
        for i in 0..dimension {
            sum_vector[i] += chunk.sentence_embedding_vector[i];
        }
    }
    Some(
        sum_vector
            .iter()
            .map(|&sum| sum / num_vectors as f32)
            .collect(),
    )
}
```

Here's where it gets interesting. Instead of reaching for a dedicated vector database like Pinecone or Qdrant, I used sqlite-vec — a SQLite extension that adds vector search capabilities through virtual tables.
The schema:
```sql
CREATE VIRTUAL TABLE vec_note_chunks USING vec0(
    sentence_embedding float[384]
);
```

That's it — a single virtual table declaration gives you a vector index. Embeddings get inserted alongside regular note data:
```sql
WITH related_note_chunks AS (
    SELECT rowid, sentence_embedding
    FROM note_chunks
    WHERE note_chunks.note_id = ?1
)
INSERT INTO vec_note_chunks (
    rowid, sentence_embedding
)
SELECT rowid, sentence_embedding
FROM related_note_chunks;
```

sqlite-vec is loaded as an auto-extension at startup:
```rust
unsafe {
    // SAFETY: sqlite3_vec_init has the extension entry-point
    // signature that sqlite3_auto_extension expects.
    libsqlite3_sys::sqlite3_auto_extension(Some(std::mem::transmute(
        sqlite3_vec_init as *const (),
    )));
}
```

When the user searches, the query text gets embedded into the same 384-dimensional space, then matched against stored vectors:
```sql
WITH matches AS (
    SELECT rowid, distance
    FROM vec_note_chunks
    WHERE sentence_embedding MATCH (?1)
        AND distance < 0.8
    ORDER BY distance
    LIMIT 10
)
SELECT DISTINCT
    notes.id,
    notes.content,
    notes.created_at,
    notes.updated_at
FROM matches
JOIN note_chunks
    ON note_chunks.id = matches.rowid
JOIN notes
    ON notes.id = note_chunks.note_id;
```

The MATCH keyword triggers a vector similarity search. The distance < 0.8 filter keeps only high-confidence matches — with cosine distance, smaller values mean more similar vectors. Results come back ranked by relevance, closest first.
The CTE pattern here is important — it lets sqlite-vec do the expensive vector search first, then joins back to get the full note data. This is much faster than filtering after the join.
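One practical detail the queries above gloss over: the ?1 parameter has to carry the query embedding in a form sqlite-vec understands — either a JSON array string or a compact blob of little-endian f32 bytes. A minimal, std-only sketch of the blob encoding (the embedding_to_blob name is my own, not from the app):

```rust
/// Serialize an embedding into sqlite-vec's compact binary format:
/// each f32 as 4 little-endian bytes, concatenated in order.
fn embedding_to_blob(embedding: &[f32]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(embedding.len() * 4);
    for value in embedding {
        blob.extend_from_slice(&value.to_le_bytes());
    }
    blob
}
```

The resulting Vec<u8> can then be bound as an ordinary blob parameter (for example via rusqlite's params!) wherever the queries above use ?1.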
With this setup, searching for "project deadlines" will surface notes about "timeline pressure" or "delivery dates" even though the query's exact words never appear in those notes. The search feels almost magical compared to Ctrl+F.
The entire pipeline runs in under 100ms on a modern laptop. No network latency, no API costs, no privacy concerns.
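As an aside, the centroid embeddings from earlier make "notes similar to this one" cheap: compare two centroids directly with cosine similarity instead of comparing every chunk pair. A std-only sketch (the cosine_similarity helper is mine, not from the app):

```rust
/// Cosine similarity between two equal-length vectors:
/// dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // Degenerate input: treat as no similarity.
    }
    dot / (norm_a * norm_b)
}
```

Ranking all notes against a target note's centroid with this function is a single linear pass, with no per-chunk comparisons needed.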
A few things I learned along the way:

- A chunk size of 256 tokens is a sweet spot: small enough for precise matching, large enough to retain context.
- Let the vector index do the expensive work first — the CTE-then-join pattern is much faster than filtering after the join.
- A local-first app doesn't need a dedicated vector database; sqlite-vec inside the SQLite file you already ship goes a long way.
If you're building a local-first app and need search that actually understands meaning, this stack — rust-bert + sqlite-vec + Tauri — is surprisingly capable. You get the intelligence of embeddings with the simplicity of SQLite.