Running ML Models in Tauri Without Blocking the UI
How I used a dedicated thread, sync channels, and oneshot responses to run rust-bert inference in a Tauri app without freezing the frontend.
If you've ever tried running a machine learning model inside a desktop app, you've probably hit this problem: the model takes 50-200ms per inference call, and during that time your UI is completely frozen. Users hate it.
Tauri runs on tokio's async runtime, which is great for I/O-bound work but terrible for CPU-bound ML inference. If you await a blocking computation on a tokio worker thread, you starve the entire runtime. Other tasks — including UI event handling — can't make progress.
Here's how I solved this in Insight Notes, a note-taking app that runs AllMiniLM-L12-V2 for on-device semantic search.
Tauri commands run as async functions on tokio's thread pool:
```rust
#[tauri::command]
pub async fn search_notes(
    state: tauri::State<'_, AppState>,
    query: String,
) -> Result<Vec<Note>, String> {
    // This needs to call the ML model...
    // But we can't block here!
}
```

Calling `model.encode()` directly here would block a tokio worker thread. With tokio's default worker-thread count (usually one per CPU core), a few concurrent requests could lock up the entire runtime.
You might think `tokio::task::spawn_blocking` solves this, but there's a catch: the rust-bert model isn't `Send + Sync`. You can't safely move it between threads. The model needs to live on one thread and stay there.
The pattern is straightforward: spawn a dedicated OS thread that owns the model, and communicate with it via channels.
```rust
pub struct SentenceEncoder {
    sender: mpsc::SyncSender<Message>,
}

type Message = (
    Vec<String>,
    oneshot::Sender<Vec<Embedding>>,
);
```

The `SentenceEncoder` is just a handle that holds the sending end of a channel. The actual model lives on a separate thread that nobody else can touch.
```rust
pub fn spawn() -> (
    JoinHandle<anyhow::Result<()>>,
    SentenceEncoder,
) {
    let (sender, receiver) =
        mpsc::sync_channel(100);
    let handle = thread::spawn(
        move || Self::runner(receiver),
    );
    (handle, SentenceEncoder { sender })
}
```

Note that this is `thread::spawn`, not `tokio::spawn`. This is a real OS thread, completely independent of tokio's runtime. The bounded channel (`sync_channel(100)`) provides backpressure: if 100 requests are queued, the sender blocks until space opens up.
```rust
fn runner(
    receiver: mpsc::Receiver<Message>,
) -> anyhow::Result<()> {
    let model = SentenceEmbeddingsBuilder::remote(
        SentenceEmbeddingsModelType::AllMiniLmL12V2,
    )
    .create_model()?;

    while let Ok((texts, sender)) = receiver.recv() {
        let texts: Vec<&str> =
            texts.iter().map(String::as_str).collect();
        let embeddings = model.encode(&texts)?;
        sender
            .send(embeddings)
            .expect("sending embedding results");
    }
    Ok(())
}
```

The model is created once when the thread starts, then sits in a loop processing requests. Since `runner` returns `anyhow::Result`, model-creation and inference errors propagate with `?` instead of panicking. Each request comes with a `oneshot::Sender` for returning the result. When the channel is dropped (app shutdown), `receiver.recv()` returns `Err` and the loop exits cleanly.
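That shutdown behavior is easy to verify in isolation. Below is a std-only sketch (function name and counting logic are mine, not from the app) where a worker loop in the style of `runner` drains its buffered messages and then exits once every sender is gone:

```rust
use std::sync::mpsc;
use std::thread;

// Spawn a worker loop, queue two messages, drop the sender, and
// confirm the loop processed the buffered work before exiting.
pub fn run_worker() -> usize {
    let (tx, rx) = mpsc::sync_channel::<u32>(10);
    let handle = thread::spawn(move || {
        let mut processed = 0;
        // Same shape as runner(): loop until recv() errors.
        while let Ok(_msg) = rx.recv() {
            processed += 1;
        }
        // recv() returned Err: all senders dropped, channel closed.
        processed
    });
    tx.send(1).unwrap();
    tx.send(2).unwrap();
    drop(tx); // simulate app shutdown: the handle goes away
    handle.join().unwrap()
}

fn main() {
    assert_eq!(run_worker(), 2); // both buffered messages were handled
}
```

Buffered messages are delivered before the disconnect error surfaces, so nothing queued is lost on shutdown.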
This is the critical piece — bridging the sync channel with Tauri's async world:
```rust
pub async fn encode(
    &self,
    texts: Vec<String>,
) -> anyhow::Result<Vec<Embedding>> {
    let (sender, receiver) = oneshot::channel();
    task::block_in_place(|| {
        self.sender.send((texts, sender))
    })?;
    Ok(receiver.await?)
}
```

`task::block_in_place` is the key. It tells tokio "I'm about to do something blocking, so move other tasks off this thread." This is different from `spawn_blocking`: it doesn't move the closure to a separate thread, it just signals tokio to be smart about scheduling. (One caveat: `block_in_place` only works on tokio's multi-threaded runtime; it panics on a current-thread runtime.)
The oneshot::channel gives us a future we can .await on the tokio side, while the inference thread sends the result through the sync sender.
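The per-request reply channel is the part worth internalizing, and it works even without async. Here's a std-only sketch (names and the string-reversing "model" are mine): each request carries its own single-use reply sender, the standard-library analogue of the oneshot channel, and a blocking `recv()` stands in for `.await`.

```rust
use std::sync::mpsc;
use std::thread;

// A request pairs the input with a single-use reply sender,
// mirroring the (Vec<String>, oneshot::Sender<_>) Message type.
type Request = (String, mpsc::SyncSender<String>);

pub fn encode_via_worker(text: &str) -> String {
    let (req_tx, req_rx) = mpsc::sync_channel::<Request>(100);
    let worker = thread::spawn(move || {
        while let Ok((text, reply_tx)) = req_rx.recv() {
            // Stand-in for model.encode(): reverse the string.
            let result: String = text.chars().rev().collect();
            let _ = reply_tx.send(result);
        }
    });
    // Caller side: make a reply channel, send the request, wait.
    let (reply_tx, reply_rx) = mpsc::sync_channel(1);
    req_tx.send((text.to_string(), reply_tx)).unwrap();
    let out = reply_rx.recv().unwrap(); // the sync analogue of .await
    drop(req_tx); // close the channel so the worker loop exits
    worker.join().unwrap();
    out
}

fn main() {
    assert_eq!(encode_via_worker("tauri"), "iruat");
}
```

Because every caller owns its own reply channel, results route directly to whoever asked, with no shared result buffer to coordinate.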
At app startup:
```rust
let (_handle, sentence_encoder) =
    SentenceEncoder::spawn();

app.manage(AppState {
    db,
    word_embeddings_db,
    sentence_encoder,
    base_dir: /* ... */,
});
```

The `SentenceEncoder` handle gets stored in Tauri's managed state. Every command that needs embeddings just calls `state.sentence_encoder.encode()`. From the command's perspective, it's a normal async call:
```rust
#[tauri::command]
pub async fn search_notes(
    state: tauri::State<'_, AppState>,
    query: String,
) -> Result<Vec<Note>, String> {
    let output = state
        .sentence_encoder
        .encode(vec![query.clone()])
        .await
        .map_err(|e| e.to_string())?;
    // ... vector search with the embedding
}
```

Since the command already returns `Result<_, String>`, an encode failure is mapped into the error type rather than unwrapped into a panic. The frontend never knows or cares that ML inference is happening on a separate thread. It just gets results.
A few properties make this robust:
Model isolation. The rust-bert model never crosses thread boundaries. It's created on the inference thread and dies on the inference thread. No Arc<Mutex<Model>> headaches.
Natural backpressure. The bounded channel (capacity 100) means the system gracefully handles bursts. If the model can't keep up, senders wait. No unbounded queue growing forever.
Clean shutdown. When the SentenceEncoder handle is dropped, the channel closes, the runner loop exits, and the thread joins. No leaked resources.
Zero-copy responses. Each request gets its own oneshot channel, so results go directly to the caller. No shared result buffer, no contention.
This pattern isn't specific to ML models. Any time you have a resource that:

- is expensive to create,
- can't safely cross thread boundaries (isn't `Send + Sync`),
- needs to serve requests coming from async code,

...you can use the same dedicated-thread-with-channels approach. Database connections, hardware interfaces, video encoders — the pattern applies anywhere.
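As a sketch of that generalization, here's the same shape wrapped around a deliberately non-`Send` resource: an `Rc` (all names here are mine, not from the app). The `Rc` is created inside the worker thread and never crosses a thread boundary; only plain messages do.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::mpsc;
use std::sync::mpsc::SyncSender;
use std::thread;

// A handle to a worker thread that owns a !Send resource.
// Each message is just a single-use reply sender.
pub struct Counter {
    tx: SyncSender<SyncSender<u64>>,
}

impl Counter {
    pub fn spawn() -> Counter {
        let (tx, rx) = mpsc::sync_channel::<SyncSender<u64>>(100);
        thread::spawn(move || {
            // Rc<RefCell<_>> is !Send: it can exist here because it is
            // created on this thread and never leaves it.
            let state: Rc<RefCell<u64>> = Rc::new(RefCell::new(0));
            while let Ok(reply) = rx.recv() {
                *state.borrow_mut() += 1;
                let _ = reply.send(*state.borrow());
            }
        });
        Counter { tx }
    }

    pub fn next(&self) -> u64 {
        let (reply_tx, reply_rx) = mpsc::sync_channel(1);
        self.tx.send(reply_tx).unwrap();
        reply_rx.recv().unwrap()
    }
}

fn main() {
    let c = Counter::spawn();
    assert_eq!(c.next(), 1);
    assert_eq!(c.next(), 2);
}
```

Swap the `Rc` for a database connection or a video encoder and the handle's API barely changes.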
The key insight is that channels are the bridge between the sync and async worlds. You don't need to make everything async. You just need clean boundaries.
The full source for the SentenceEncoder is about 50 lines of Rust. It's one of the smallest files in the project, and one of the most important. Sometimes the best architecture is the simplest one.