FluidInference Parakeet TDT CoreML

A 0.6B-parameter multilingual automatic speech recognition model, optimized for Core ML inference on Apple hardware and supporting 25 European languages.

Currency ID parakeet-tdt-0.6b-v3-coreml

Date Mar 16, 2026

Language English

Signal

FluidInference Parakeet TDT CoreML · Hugging Face (FluidInference/parakeet-tdt-0.6b-v3-coreml) · 2026-03-15

Context

This entry covers a Core ML conversion of NVIDIA's Parakeet TDT 0.6B v3 model. The conversion lets the full automatic speech recognition (ASR) pipeline run on Apple Silicon devices without cloud connectivity. The model is part of the FluidAudio ecosystem, whose batch ASR service uses this variant as its backend. The underlying architecture comes from NVIDIA NeMo: a FastConformer encoder paired with a Token-and-Duration Transducer (TDT) decoder.
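The "TDT" in the model name stands for Token-and-Duration Transducer: at each step the joint network predicts both a token and how many encoder frames to advance, so decoding can skip frames instead of visiting every one. The sketch below is a minimal greedy decoder written under stated assumptions. The joint network is replaced by a hypothetical scripted stub (toy_joint), and the blank id of 0, the duration set [0, 1, 2, 3, 4], and the per-frame emission cap are illustrative choices. This is not FluidAudio's or NeMo's implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical joint-network interface: given the current encoder frame index
# and the tokens emitted so far, return (token scores, duration scores).
# In a real TDT model this combines encoder and prediction-network outputs.
JointFn = Callable[[int, Sequence[int]], Tuple[Sequence[float], Sequence[float]]]

BLANK = 0  # assumed blank-token id, for illustration only


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def tdt_greedy_decode(
    joint: JointFn,
    num_frames: int,
    durations: Sequence[int],
    max_symbols_per_frame: int = 10,
) -> List[int]:
    """Greedy Token-and-Duration Transducer decoding (illustrative sketch)."""
    tokens: List[int] = []
    t = 0
    emitted_here = 0  # non-blank tokens emitted without advancing t
    while t < num_frames:
        token_scores, duration_scores = joint(t, tokens)
        token = argmax(token_scores)
        skip = durations[argmax(duration_scores)]
        if token != BLANK:
            tokens.append(token)
            emitted_here += 1
        elif skip == 0:
            skip = 1  # a blank must consume at least one frame
        if skip == 0 and emitted_here >= max_symbols_per_frame:
            skip = 1  # safety cap: force progress after repeated zero-skips
        if skip > 0:
            emitted_here = 0
        t += skip  # TDT may jump several frames at once
    return tokens


def toy_joint(t: int, history: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Scripted stand-in for a joint network (vocab of 8, 5 duration bins)."""
    tok, dur = [0.0] * 8, [0.0] * 5
    if t == 0 and len(history) == 0:
        tok[5], dur[0] = 1.0, 1.0  # emit token 5, stay on frame 0
    elif t == 0 and len(history) == 1:
        tok[7], dur[2] = 1.0, 1.0  # emit token 7, jump ahead two frames
    else:
        tok[BLANK], dur[1] = 1.0, 1.0  # blank, advance one frame
    return tok, dur


if __name__ == "__main__":
    # prints [5, 7]
    print(tdt_greedy_decode(toy_joint, num_frames=5, durations=[0, 1, 2, 3, 4]))
```

In the toy run, two tokens are emitted on frame 0 (a duration of 0 keeps the decoder in place), and the predicted duration of 2 means frame 1 is never evaluated. That frame skipping is the source of TDT's speed advantage over a conventional transducer that steps one frame at a time.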

Relevance

The model addresses demand for low-latency, privacy-preserving speech processing on edge devices. Converting the 0.6B-parameter model to Core ML lets the runtime schedule inference on the CPU, GPU, or Apple Neural Engine, with the aim of reducing memory footprint and inference latency on consumer hardware. Support for 25 European languages points to regional multilingual capability rather than global coverage. The CC-BY-4.0 license permits redistribution and modification with attribution, in line with open infrastructure principles.
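On-device memory use is driven largely by weight precision, which a back-of-the-envelope calculation makes concrete: weight storage is roughly parameter count times bytes per parameter. The sketch assumes exactly 0.6 billion parameters (the nominal figure, not an exact count) and counts weights only. Which precision the published Core ML files actually use is not stated in this entry, so the rows are illustrative.

```python
# Back-of-the-envelope weight storage: parameter count x bytes per parameter.
# Counts weights only; activations, decoder state, and runtime overhead are ignored.
NOMINAL_PARAMS = 0.6e9  # assumption: the nominal "0.6B", not an exact count

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Return weight storage in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>8}: {weight_gib(NOMINAL_PARAMS, nbytes):.2f} GiB")
```

Under these assumptions the weights come to about 2.24 GiB at float32, 1.12 GiB at float16, and 0.56 GiB at int8, which is why precision choices weigh heavily on base-model Apple Silicon machines with limited unified memory.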

Current State

The model is hosted on Hugging Face, where it has recorded more than 144,000 downloads. The repository includes conversion scripts for reproducible Core ML generation and benchmark data on transcription accuracy. It carries the automatic-speech-recognition and hf-asr-leaderboard tags. The model is designed for offline operation, removing any dependency on external inference APIs.
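Transcription accuracy for ASR models is conventionally reported as word error rate (WER): the word-level edit distance between a reference transcript and the model's hypothesis (substitutions S, deletions D, insertions I) divided by the number of reference words N, i.e. WER = (S + D + I) / N. The sketch below computes it with a standard Levenshtein table. The normalize helper is a deliberately minimal assumption (lowercase, drop punctuation except apostrophes); the benchmark files in the repository may use a different normalizer, so numbers from this sketch are not directly comparable to them.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> List[str]:
    """Minimal normalization: lowercase, replace punctuation with spaces, split."""
    kept = (ch if ch.isalnum() or ch.isspace() or ch == "'" else " " for ch in text.lower())
    return "".join(kept).split()


def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") over six reference words.
    print(wer("The cat sat on the mat.", "the cat sat on a mat"))
```

Because WER divides by the reference length, insertions can push it above 1.0, and results shift noticeably with the choice of text normalizer.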

Open Questions

How are changes to the underlying NeMo architecture carried over into the Core ML conversion, compared with the PyTorch releases?
What is the maintenance cadence for the FluidAudio conversion scripts relative to upstream NVIDIA releases?
How does performance scale across Apple Silicon generations (e.g., M1 vs. M3), particularly under memory constraints?
Can this model be integrated into existing agent frameworks (e.g., OpenClaw, CrewAI) for voice-first workflows?

Connections

This entry is a concrete implementation within the local-inference-baseline circuit, illustrating the shift toward ordinary local infrastructure for multimodal tasks. It sits alongside ollama and lm-studio as another way to run models on personal hardware, though it targets speech audio rather than general text. Its lineage traces back to NVIDIA NeMo, and its specific optimization for transducer-based speech recognition distinguishes it from generic open-weight models.

Openflows