
I Built a Transcription App for My iPad Using Free Apple Tools


I've been running Whisper on my Mac and Windows machines to transcribe course videos. The Mac uses faster-whisper with Python, the Windows machine uses CUDA on a GTX 1080 Ti. Both work great. But I wanted to run Whisper on my iPad too.

Not through a browser. Not by SSH-ing into another machine. Actually running it natively on the iPad itself.

My iPad is an M1 iPad Pro 11-inch with 8GB RAM. The M1 chip is the same architecture Apple uses in their Macs. It can handle local AI inference. The question was whether I could actually build and deploy an app to use it without paying $99/year for an Apple Developer account.

Turns out you can. It's not pretty, but it works.

The Stack

The secret ingredient is whisper.cpp. It's a C/C++ port of OpenAI's Whisper model that runs entirely on-device with no server, no API calls, no internet required. It has first-class support for Apple Silicon through Core ML and the ANE (Apple Neural Engine).

The app itself is SwiftUI (Apple's declarative UI framework). whisper.cpp ships with a working SwiftUI example app. The plan was to take that example, strip it down, and add the features I needed: batch file processing, progress tracking, and text export.

For signing, I'd use a free Apple ID through Xcode. The catch: free signing expires every 7 days. I'll get to that.

The Build

The initial build was straightforward. Clone whisper.cpp, download the small.en model (~466MB, the right balance for 8GB RAM), build the XCFramework, and open the SwiftUI project in Xcode.

I had to install Xcode first. It's free from the Mac App Store but it's a 12GB download. After that, the build steps took about 45 minutes, including writing the custom Swift code.
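The build steps above, roughly as commands. The script and model names below are from the whisper.cpp repo as of this writing; check the repo's README if they've moved:

```shell
# Clone whisper.cpp and fetch the small.en model (~466MB)
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
./models/download-ggml-model.sh small.en

# Build the XCFramework that the SwiftUI example links against
./build-xcframework.sh

# Open the bundled SwiftUI example in Xcode
open examples/whisper.swiftui/whisper.swiftui.xcodeproj
```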

The features I added on top of the stock example:

  • Folder selection that recursively finds all audio files
  • Batch queue with checkboxes so you can skip files
  • Sequential processing (one at a time to respect the 8GB memory limit)
  • Progress bar showing overall and per-file status
  • Auto-save transcripts as .txt files
  • Export all transcripts at once
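The recursive folder scan is just a FileManager enumerator with an extension filter. A minimal sketch (the extension list is illustrative):

```swift
import Foundation

/// Recursively collect audio files under a user-picked folder.
func audioFiles(under folder: URL) -> [URL] {
    let audioExtensions: Set<String> = ["wav", "mp3", "m4a"]
    guard let enumerator = FileManager.default.enumerator(
        at: folder,
        includingPropertiesForKeys: [.isRegularFileKey],
        options: [.skipsHiddenFiles]
    ) else { return [] }

    return enumerator.compactMap { $0 as? URL }
        .filter { audioExtensions.contains($0.pathExtension.lowercased()) }
        .sorted { $0.lastPathComponent < $1.lastPathComponent }
}
```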

I built the app to work with a USB drive workflow. I convert videos to WAV files on my Mac using ffmpeg, copy the WAVs to a USB drive, plug the drive into the iPad, and let it process through the queue.
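For the conversion step, whisper.cpp wants 16kHz mono 16-bit PCM, so the ffmpeg invocation looks something like this (file names are placeholders):

```shell
# Strip video, downmix to mono, resample to 16kHz, encode as 16-bit PCM WAV
ffmpeg -i lecture.mp4 -vn -ac 1 -ar 16000 -c:a pcm_s16le lecture.wav
```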

The Three Bugs That Nearly Killed It

The build compiled clean. The app deployed to the iPad. But the transcription results were blank. Every single file came back empty. What followed was three hours of debugging across three separate bugs.

Bug 1: The Model Wasn't Actually Loading

The status bar showed green. "Model loaded." But the Resources/models folder was empty. The app was showing a success state for a model that didn't exist. The UI was lying to me.

Fix: Actually bundle the ggml-small.en.bin file into the Xcode project, and gate the green status on the model load actually succeeding instead of just setting it to true optimistically.
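The shape of the fix, assuming a WhisperContext wrapper like the one in the whisper.cpp SwiftUI example (the modelState property and its cases are illustrative):

```swift
// Only flip the UI to "Model loaded" after the load actually succeeds.
func loadModel() {
    guard let modelURL = Bundle.main.url(
        forResource: "ggml-small.en", withExtension: "bin"
    ) else {
        modelState = .missing   // the bug: this path used to show green too
        return
    }
    do {
        whisperContext = try WhisperContext.createContext(path: modelURL.path)
        modelState = .loaded    // set only on success
    } catch {
        modelState = .failed(error)
    }
}
```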

Bug 2: Float32 vs Int16

The audio converter was writing WAV files in Float32 PCM format. But whisper.cpp's decoder reads Int16 PCM. So it was interpreting float data as integer data, producing complete garbage, which resulted in empty transcripts.

The fix was a one-line change in the audio format: switch from pcmFormatFloat32 to pcmFormatInt16. One format flag. Hours of head-scratching.
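In AVFoundation terms, the change was in the output format descriptor. A sketch (buffer handling omitted):

```swift
import AVFoundation

// Whisper expects 16kHz mono 16-bit integer PCM. The bug was using
// .pcmFormatFloat32 here, which the Int16 reader then misread as garbage.
let outputFormat = AVAudioFormat(
    commonFormat: .pcmFormatInt16,   // was .pcmFormatFloat32
    sampleRate: 16000,
    channels: 1,
    interleaved: true
)
```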

Bug 3: WAV Header Offset

Even after fixing the format, some files still failed. The decoder was hardcoding byte offset 44 for the audio data start. That's where the standard WAV header ends. But ffmpeg-written WAVs often have extra metadata chunks that push the audio data past offset 44.

Fix: Scan for the actual data chunk in the header instead of assuming it starts at a fixed position.
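A sketch of that scan: walk the RIFF chunks after the 12-byte "RIFF…WAVE" preamble until the "data" chunk turns up, instead of trusting offset 44:

```swift
import Foundation

/// Locate the start and size of the "data" chunk in a RIFF/WAVE file.
/// Returns nil if the bytes aren't a valid WAV. Chunk sizes are
/// little-endian per the RIFF spec.
func findDataChunk(in bytes: Data) -> (offset: Int, size: Int)? {
    guard bytes.count > 12,
          bytes.prefix(4).elementsEqual("RIFF".utf8),
          bytes[8..<12].elementsEqual("WAVE".utf8) else { return nil }

    var offset = 12
    while offset + 8 <= bytes.count {
        let id = bytes[offset..<offset + 4]
        let s = [UInt8](bytes[offset + 4..<offset + 8])
        let size = Int(s[0]) | Int(s[1]) << 8 | Int(s[2]) << 16 | Int(s[3]) << 24
        if id.elementsEqual("data".utf8) {
            return (offset + 8, size)   // audio samples start here
        }
        // Chunks are word-aligned: skip a pad byte after odd-sized chunks.
        offset += 8 + size + (size % 2)
    }
    return nil
}
```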

After those three fixes, the app finally worked. Audio file formats will lie to you constantly. Trust nothing. Verify everything.

The Security-Scope Problem

There was another issue I didn't expect. When the user selects a folder through the iOS document picker, iOS grants temporary security-scoped access to those files. That access doesn't last: unless you explicitly start security-scoped access on the URL and hold it open, reads begin failing shortly after the picker callback returns.

In practice, this meant the app could read the file list and display it in the queue, but by the time transcription started, the access had expired and every file read failed silently.

My first fix was to copy all the WAVs into the app's sandbox before the access expired. That worked, but it meant transcripts saved to the app's Documents folder instead of back to the USB drive. Then I had to worry about flattening the folder structure so files from different directories didn't overwrite each other.

The better fix was to hold the security-scoped access for the entire session. Read directly from USB, write transcripts directly back to USB, no sandbox copying. One startAccessingSecurityScopedResource call held open for the duration of the batch.
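The session-long version looks roughly like this (transcribe is a hypothetical per-file helper standing in for the whisper.cpp call):

```swift
// Hold security-scoped access to the picked folder for the whole batch.
// folderURL comes from the document picker; wavFiles were enumerated from it.
let granted = folderURL.startAccessingSecurityScopedResource()
defer { if granted { folderURL.stopAccessingSecurityScopedResource() } }

for file in wavFiles {
    let text = try transcribe(file)
    // Write the transcript next to the source file, back on the USB drive.
    let out = file.deletingPathExtension().appendingPathExtension("txt")
    try text.write(to: out, atomically: true, encoding: .utf8)
}
```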

The 7-Day Problem

This is the part nobody warns you about. With a free Apple ID, apps you build and deploy yourself expire every 7 days. After that, they simply won't open. The app is still installed, but iOS refuses to launch it.

The fix is to rebuild and redeploy from Xcode. Connect the iPad, press Cmd+R, wait two minutes. Done. The signing refreshes for another 7 days.

For a personal tool that I use in bursts, this is fine. I transcribe for a few days, then don't touch it for a week or two. When I need it again, I rebuild and deploy. The whole process takes about 90 seconds.

If you want to automate the refresh, there's an app called SideStore that handles it in the background. I haven't set that up yet since the manual refresh is fast enough for how I use it.

What It's Good For

The M1 iPad Pro with the small.en model handles single-speaker, clear audio well. Course videos, podcasts, interviews with decent recording quality. A 45-minute video takes roughly 20-30 minutes to transcribe.

Multi-speaker content with varying audio quality and heavy accents will choke on small.en. For that, you'd need the medium or large model, but 8GB RAM gets tight with larger models. The Mac with its 32GB unified memory is better suited for those files.

The iPad shines as a portable transcription station. Plug in a USB drive, let it work through a batch, export the transcripts when it's done. Everything stays on the device. No cloud uploads, no API costs, no internet required.

The Takeaway

You don't need a developer account to build useful native apps for your iPad. whisper.cpp plus SwiftUI plus a free Apple ID gets you a fully functional, on-device AI tool. It takes some MacGyvering (the bugs alone cost me an afternoon), but the result is a transcription tool that runs entirely offline on hardware I already owned.

The 7-day signing limit is annoying but manageable. You also have to pick the right model size for 8GB of RAM. And iOS sandbox security will fight you when you try to read from external drives.

But it works. And that's the part that matters.

If you're building something and keep running into walls, that's exactly the kind of problem I work on.

