PhoTranscriptor - A Free Vietnamese Speech-to-Text Tool for Researchers

Lately, I've been buried under hours of audio recordings, with each file running close to an hour or longer. Naturally, I went hunting for a reliable Vietnamese speech-to-text tool to make my life easier.

That's when I stumbled upon PhoWhisper (yes, it's pronounced "Phở" 😆). It's a fine-tuned version of the original Whisper model by OpenAI, trained on 844 hours of Vietnamese audio covering a wide range of accents. The result? Excellent transcription quality for Vietnamese speech.

But here's the catch:
For non-coders, setting it up and running the model locally can be quite a hassle. Plus, if your computer doesn't have a GPU, the transcription process can be painfully slow.

To remove the technical hurdles, I wrapped PhoWhisper in a clean point-and-click web interface that runs from a Google Colab notebook. Once the notebook is set up, you upload your audio, and the Colab session uses a free GPU to power through it in a fraction of the usual time before returning a neatly formatted transcript. No local installs, no command line: just drag, drop, and let the magic happen.

For context, transcribing a 40-minute audio clip on my GPU-less laptop used to take 4 hours—now it's down to just 10 minutes on Colab!
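To put those numbers in perspective, here's a quick back-of-the-envelope calculation. It only uses the figures quoted above; the "real-time factor" (processing time divided by audio length) is a common way to describe transcription speed, not something the tool itself reports.

```python
# Figures taken from the paragraph above.
audio_minutes = 40        # length of the clip
cpu_minutes = 4 * 60      # 4 hours on a laptop without a GPU
colab_minutes = 10        # the same clip on a Colab GPU

# How many times faster the Colab run is.
speedup = cpu_minutes / colab_minutes

# Real-time factor: processing time / audio duration.
# Below 1.0 means the transcript is ready faster than the audio plays.
rtf_cpu = cpu_minutes / audio_minutes
rtf_colab = colab_minutes / audio_minutes

print(f"Speedup: {speedup:.0f}x")
print(f"Real-time factor on CPU: {rtf_cpu:.2f}")
print(f"Real-time factor on Colab: {rtf_colab:.2f}")
```

In other words, the laptop needed six minutes of computing per minute of audio, while Colab needs about a quarter of a minute.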

Why you'll love this tool:

Free to use
High-quality Vietnamese speech recognition (approx. 10% error rate)
Handles long and large audio files
Secure and private: your audio files are automatically deleted when the session ends
Fast processing with Google Colab (limited by free-tier quotas)
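If you're wondering what a "10% error rate" means in practice, here's a minimal sketch, assuming the figure refers to word error rate (WER), the usual metric for speech recognition. WER counts the substituted, deleted, and inserted words needed to turn the transcript into the correct text, divided by the number of words in the correct text. The function name and the example sentences are mine, for illustration only; for Vietnamese, splitting on spaces effectively compares syllables.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word ("hay" instead of "hai") out of ten.
reference = "hôm nay tôi phỏng vấn hai giáo viên tiểu học"
hypothesis = "hôm nay tôi phỏng vấn hay giáo viên tiểu học"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

So at roughly 10%, you should expect to fix about one word in ten when you review the transcript.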

🛠️ How to Use PhoTranscriptor (Google Colab Version)

Step 1: Open the notebook

Go to this link:
👉 PhoWhisper on Google Colab

Step 2: Make a copy to your Google Drive

Click File → Save a copy in Drive
This lets you run the notebook under your own Google account.

Step 3: Run the setup cells

Press the Play button (▶️) next to each code cell, starting from the top.
Wait for the environment to finish setting up (it will install necessary libraries and load the model).
When complete, a public URL (usually starting with https://) will appear.
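For the curious, here's a rough idea of what "loading the model" can look like in code. This is a sketch under assumptions, not the notebook's actual setup cell: it assumes the Hugging Face `transformers` and `torch` libraries are installed, and it assumes the `vinai/PhoWhisper-small` checkpoint (the notebook may use a different model size). The helper names `load_transcriber` and `transcribe` are made up for illustration.

```python
MODEL_ID = "vinai/PhoWhisper-small"  # assumed checkpoint; the notebook may load another size


def load_transcriber(model_id: str = MODEL_ID):
    """Build a Hugging Face speech-recognition pipeline, on the GPU when one is available."""
    # Imported lazily so that defining this function does not need the heavy libraries.
    import torch
    from transformers import pipeline

    device = 0 if torch.cuda.is_available() else -1  # -1 selects the CPU
    return pipeline("automatic-speech-recognition", model=model_id, device=device)


def transcribe(transcriber, audio_path: str) -> str:
    """Transcribe one audio file and return the text."""
    # chunk_length_s splits long recordings into 30-second windows,
    # the input length Whisper-style models work on.
    result = transcriber(audio_path, chunk_length_s=30)
    return result["text"].strip()
```

With the libraries installed, usage would be along the lines of `transcribe(load_transcriber(), "interview.mp3")`, where `interview.mp3` is a placeholder file name.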

Step 4: Click the public URL to open the interface

This opens the drag-and-drop web interface in a new tab.
You can now interact with the transcription tool visually.

Step 5: Upload your audio file

In the interface, drag and drop your audio file (.mp3, .wav, or another audio format).
The file is stored temporarily and will be deleted once the browser session ends.
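Here's a minimal sketch of how temporary storage like this can work in Python. It is not the notebook's actual code: the extension list, the function names, and the clean-up trigger are assumptions. In particular, this sketch deletes the copy as soon as processing finishes, whereas the tool deletes files when the session ends.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed list of accepted formats, for illustration only.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_supported_audio(filename: str) -> bool:
    """Check the file extension, ignoring upper/lower case."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def process_upload(source_path: str, handler):
    """Copy the upload into a temporary folder, run handler on the copy, then delete it."""
    if not is_supported_audio(source_path):
        raise ValueError(f"Unsupported audio format: {source_path}")

    # TemporaryDirectory removes the folder and everything in it
    # when the with-block exits, even if handler raises an error.
    with tempfile.TemporaryDirectory() as workdir:
        temp_copy = Path(workdir) / Path(source_path).name
        shutil.copy(source_path, temp_copy)
        return handler(temp_copy)
```

Here `handler` stands in for whatever does the real work, such as a transcription function that takes a file path.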

Step 6: Start the transcription

Click the Transcribe button in the interface.
You'll see the transcription progress in real time.
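One simple way to picture the progress bar: Whisper-style models work on 30-second windows of audio, so a long recording can be cut into chunks and the progress reported as chunks finished out of chunks planned. The sketch below shows that idea only. I'm not claiming the interface computes progress this way, the function names are illustrative, and real pipelines often overlap neighbouring chunks slightly, which is left out here.

```python
import math

CHUNK_SECONDS = 30  # window length used by Whisper-style models


def plan_chunks(duration_s, chunk_s=CHUNK_SECONDS):
    """Return (start, end) times in seconds that cover the whole recording."""
    n_chunks = math.ceil(duration_s / chunk_s)
    return [
        (i * chunk_s, min((i + 1) * chunk_s, duration_s))
        for i in range(n_chunks)
    ]


def progress_percent(done: int, total: int) -> int:
    """Percentage of chunks finished, rounded to a whole number."""
    return round(100 * done / total) if total else 100


# A 40-minute recording, as in the example earlier in this post.
chunks = plan_chunks(40 * 60)
for done in (20, 40, 80):
    print(f"{done}/{len(chunks)} chunks: {progress_percent(done, len(chunks))}%")
```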

Step 7 (Optional): Copy and edit the transcript

Once finished, the transcript will appear on the screen.
Copy it into your editor and, if you'd like to clean it up, listen back to the audio while you correct it.

Try it out:

📺 Step-by-step video tutorial:
Watch Video Tutorial

To get started, feel free to make a copy of the Colab notebook in your own Google Drive. The video walks through all of the instructions as well.

Happy transcribing and good luck with your research!