Speech to Text Guide

Overview

Transform any audio recording into text with Fish Audio’s speech recognition. Perfect for transcriptions, subtitles, and voice commands.

Getting Started

Web Interface

Transcribe audio instantly:

Visit Fish Audio

Go to fish.audio and log in

Navigate to Transcribe

Click on “Speech to Text” in your dashboard

Upload Audio

Select your audio file (for example MP3, WAV, or M4A; see Supported Formats for the full list)

Get Transcription

Click “Transcribe” and copy your text

Supported Formats

Audio Files

Accepted formats:

MP3 (recommended)
WAV
M4A
OGG
FLAC
AAC

File requirements:

Maximum size: 20MB
Maximum duration: 60 minutes
Minimum duration: 1 second
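
If you prepare many files, a quick local check before uploading can save a failed attempt. The following is a minimal sketch, not part of Fish Audio's tooling: the function name `check_upload` is hypothetical, the extensions and size limit are copied from the lists above, and "20MB" is assumed to mean 20 × 1024 × 1024 bytes. It does not check duration, which requires decoding the audio.

```python
from pathlib import Path

# Formats and limit copied from this guide; adjust if the service changes them.
ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac"}
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB


def check_upload(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} does not exist or is not a file"]
    problems = []
    if file.suffix.lower() not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")
    if file.stat().st_size > MAX_SIZE_BYTES:
        problems.append("file is larger than 20MB")
    return problems
```

Calling `check_upload("meeting.mp3")` returns an empty list when the file exists, has an accepted extension, and is within the size limit.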

Language Support

Automatic Detection

The system automatically detects the language spoken in your audio. No configuration needed!

Manual Selection

For better accuracy, specify the language.

Major languages:

English (en)
Chinese (zh)
Japanese (ja)

Additional languages will be supported soon!

Audio Quality Tips

For Best Results

Recording Environment:

Quiet room with minimal echo
No background music
Clear, consistent speaking voice
One speaker at a time

Audio Settings:

Sample rate: 16kHz or higher
Bit rate: 128kbps or higher
Mono or stereo (mono preferred)
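
For WAV recordings, you can check the sample rate and channel count locally before uploading. This is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only; the helper name `describe_wav` is hypothetical, and the 16kHz threshold is taken from the list above. Bit rate applies to compressed formats such as MP3 and is not checked here.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # "16kHz or higher" from the settings above


def describe_wav(path: str) -> dict:
    """Read basic properties from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_s": duration,
        "sample_rate_ok": rate >= MIN_SAMPLE_RATE_HZ,
        "is_mono": channels == 1,
    }
```

The `duration_s` value can also be compared against the minimum and maximum durations listed under File requirements.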

Common Issues

Poor transcription quality?

Remove background noise
Increase microphone volume
Speak clearly and not too fast
Avoid multiple speakers talking over each other

Use Cases

Meeting Transcription

Convert recorded meetings into searchable text:

Record your meeting (Zoom, Teams, etc.)
Export the audio file
Upload to Fish Audio
Get a formatted transcription with timestamps

Podcast Transcripts

Create written versions of your podcasts:

Generate show notes automatically
Create searchable content
Improve accessibility
Enable translations

Video Subtitles

Generate subtitles for your videos:

Extract audio from video
Transcribe with Fish Audio
Get timestamped text
Import into video editor

Voice Notes

Convert voice memos to text:

Dictate ideas quickly
Transcribe later for editing
Search through voice notes
Share as text documents

Advanced Features

Timestamps

Get precise timing for each spoken segment:

[00:00:00] Welcome to our podcast.
[00:00:03] Today we're discussing AI technology.
[00:00:07] Let's dive right in.

Perfect for:

Creating subtitles
Navigating long recordings
Synchronizing with video
Building searchable archives
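
If you copy timestamped text in the `[HH:MM:SS] text` form shown in the example above, it can be converted to SRT subtitles with a short script. This is a sketch under stated assumptions, not Fish Audio's export feature: the function name `to_srt` is hypothetical, each segment is assumed to end where the next one starts, and the last segment is given an arbitrary default length of 3 seconds because the example carries start times only.

```python
import re

# Matches lines such as "[00:00:03] Today we're discussing AI technology."
LINE_PATTERN = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+)")


def to_srt(transcript: str, last_duration_s: int = 3) -> str:
    """Convert '[HH:MM:SS] text' lines into SRT subtitle blocks."""
    segments = []
    for line in transcript.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            h, m, s, text = match.groups()
            segments.append((int(h) * 3600 + int(m) * 60 + int(s), text))

    def fmt(seconds: int) -> str:
        # SRT timestamps use the form HH:MM:SS,mmm
        return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d},000"

    blocks = []
    for i, (start, text) in enumerate(segments):
        # Assumption: a segment ends where the next one starts.
        end = segments[i + 1][0] if i + 1 < len(segments) else start + last_duration_s
        blocks.append(f"{i + 1}\n{fmt(start)} --> {fmt(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Lines that do not match the timestamp pattern are skipped, so review the output before importing it into a video editor.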

Speaker Detection

Identify different speakers in conversations:

Speaker 1: "What do you think about the proposal?"
Speaker 2: "I think it has potential."
Speaker 1: "Let's discuss the details."
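
Speaker-labeled text like the example above is easy to reorganize once it is copied out of the transcript. The sketch below groups lines by speaker; it assumes every turn is on its own line in the form `Speaker N: "text"`, which is how the example is written but may not match every transcript. The function name `group_by_speaker` is hypothetical.

```python
import re
from collections import defaultdict

# Matches 'Speaker 1: "text"'; the surrounding quotes are optional.
SPEAKER_PATTERN = re.compile(r'^(Speaker \d+):\s*"?(.*?)"?$')


def group_by_speaker(transcript: str) -> dict[str, list[str]]:
    """Collect each speaker's lines in the order they were spoken."""
    turns = defaultdict(list)
    for line in transcript.splitlines():
        match = SPEAKER_PATTERN.match(line.strip())
        if match:
            speaker, text = match.groups()
            turns[speaker].append(text)
    return dict(turns)
```

Once grouped, generic labels such as "Speaker 1" can be replaced with real names during editing.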

Punctuation & Formatting

Automatic formatting includes:

Sentence capitalization
Punctuation marks
Paragraph breaks
Number formatting

Tips for Different Content

Interviews

Best practices:

Use a good microphone for each speaker
Record in a quiet environment
Speak one at a time
Keep consistent volume levels

Lectures & Presentations

Optimize for:

Clear articulation of technical terms
Pause between topics
Repeat important points
Avoid reading too fast

Phone Calls

Considerations:

Phone audio is lower quality
Expect slightly lower accuracy
Speak clearly and slowly
Avoid speakerphone if possible

Accuracy Expectations

What Affects Accuracy

Positive factors:

Clear audio quality
Native speaker accent
Common vocabulary
Single speaker

Challenging factors:

Heavy accents
Technical jargon
Multiple speakers
Background noise

Typical Accuracy Rates

Professional recording: 95-98%
Clean amateur recording: 90-95%
Phone/video calls: 85-90%
Noisy environments: 75-85%
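
To get a feel for what these percentages mean in practice, you can estimate how many words may need correcting. This is a rough sketch that assumes accuracy means the fraction of words transcribed correctly; the guide does not define the metric, so treat the result as an approximation. The function name `expected_errors` is hypothetical.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Estimate how many words may need correction at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(word_count * (1.0 - accuracy))
```

Under that assumption, a 1,000-word transcript at 95% accuracy leaves about 50 words to review, while the same transcript at 75% leaves about 250.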

Post-Processing Tips

Editing Transcriptions

After transcription:

Review for accuracy - Check names and technical terms
Add formatting - Break into paragraphs
Correct errors - Fix any misheard words
Add context - Include speaker names

Export Options

Save your transcriptions as:

Plain text (.txt)
Word document (.docx)
Subtitle file (.srt)
PDF document

Common Applications

Business

Meeting minutes
Interview transcripts
Call recordings
Training materials

Education

Lecture notes
Research interviews
Student recordings
Language learning

Content Creation

Video scripts
Podcast show notes
Social media captions
Blog post drafts

Accessibility

Hearing impaired support
Multi-language content
Searchable archives
Documentation

Troubleshooting

No Text Output

Check:

Audio file isn’t corrupted
File format is supported
Audio contains speech
Volume is audible

Incorrect Language

Solutions:

Manually select the correct language
Ensure the majority of the audio is in one language
Separate multi-language content

Missing Words

Common causes:

Speaking too fast
Mumbling or unclear speech
Technical terms not recognized
Very quiet sections

Privacy & Security

Your Data

Audio files are processed securely
Transcriptions are private to your account
Files are not used for training
Delete anytime from your account

Sensitive Content

For confidential audio:

Use on-premises solutions if available
Review privacy policy
Consider redacting sensitive information
Download and delete after processing

Best Practices Summary

Start with quality audio - Good input = good output
Choose the right environment - Quiet spaces work best
Speak clearly - Articulate at a consistent pace
Review and edit - All transcriptions benefit from review
Use appropriate tools - Different content needs different approaches

Get Support

Need help with transcription?

Try it free: fish.audio
Community: Discord
Email: support@fish.audio
Status: status.fish.audio
