Transcribe WhatsApp Audio Messages with Whisper AI via Groq


WhatsApp Audio Transcriber Bot

Overview

Automatically transcribe WhatsApp audio messages to text using AI-powered speech recognition. This workflow receives audio messages via webhook, processes them through Groq's Whisper API, and replies with the transcribed text in the same conversation.

Use Cases

  • Accessibility: Help users with hearing impairments access audio content
  • Workplace Communication: Quickly scan audio messages in professional settings
  • Language Learning: Get text versions of audio for better comprehension
  • Meeting Notes: Convert voice messages to searchable text format
  • Multilingual Support: Transcribe audio in Portuguese (configurable for other languages)

How it Works

  1. Message Reception: Webhook receives WhatsApp messages in real-time
  2. Audio Detection: Filters only audio messages using Switch node
  3. Format Conversion: Converts base64 audio to MP3 file format
  4. AI Transcription: Processes audio through Groq API with Whisper Large V3 model
  5. Response Delivery: Sends transcribed text back to the original conversation
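
The five steps above can be sketched in Python as a single handler. This is an illustration under stated assumptions, not the workflow's actual implementation: the payload field names (`data.messageType`, `data.message.base64`, `data.key.remoteJid`) are assumptions about the Evolution API webhook body, and `transcribe` and `send_reply` are hypothetical callables standing in for the Groq request and the Evolution API node.

```python
import base64
from typing import Callable, Optional


def handle_webhook(
    payload: dict,
    transcribe: Callable[[bytes], str],
    send_reply: Callable[[str, str], None],
) -> Optional[str]:
    """Run steps 1-5 for one incoming webhook payload (assumed shape)."""
    data = payload.get("data", {})

    # Step 2: only audio messages continue through the pipeline.
    if data.get("messageType") != "audioMessage":
        return None

    # Step 3: decode the base64 payload into raw audio bytes.
    audio_b64 = data.get("message", {}).get("base64", "")
    audio_bytes = base64.b64decode(audio_b64)

    # Step 4: hand the bytes to the transcription backend.
    text = transcribe(audio_bytes)

    # Step 5: reply in the conversation the audio came from.
    chat_id = data.get("key", {}).get("remoteJid", "")
    send_reply(chat_id, text)
    return text
```

Passing the transcription and reply steps in as callables keeps the sketch testable without network access; in n8n each of these steps is a separate node.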

Key Features

  • Real-time Processing: Transcribes incoming audio messages within seconds of arrival
  • High Accuracy: Uses Whisper Large V3 model for reliable transcription
  • Auto-Reply: Automatically responds in the same WhatsApp conversation
  • Message Quoting: References the original audio message in the reply
  • Portuguese Optimized: Configured for Brazilian Portuguese transcription
  • Self-Message Filtering: Ignores messages sent by the bot itself
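
Self-message filtering and message quoting both rely on the message key carried in the webhook payload. A minimal sketch, assuming the key exposes `fromMe`, `id`, and `remoteJid` fields (assumed names, not confirmed by this document); the exact request body expected by the Evolution API for a quoted reply depends on its version and is not shown here.

```python
def is_own_message(payload: dict) -> bool:
    """True when the message was sent by the connected account itself."""
    key = payload.get("data", {}).get("key", {})
    return bool(key.get("fromMe", False))


def quoted_reference(payload: dict) -> dict:
    """Build a reference to the original message for a quoted reply.

    The returned shape is hypothetical and only illustrates which
    identifiers are needed to point back at the original audio.
    """
    key = payload.get("data", {}).get("key", {})
    return {"key": {"id": key.get("id"), "remoteJid": key.get("remoteJid")}}
```

Skipping messages where `is_own_message` returns true prevents the bot from reacting to its own replies.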

Prerequisites

Required Services

  • Evolution API: WhatsApp integration service
  • Groq API: AI transcription service (Whisper model)
  • n8n Instance: Workflow automation platform

API Keys & Configuration

  • Groq API key (set as environment variable: GROQ_API_KEY)
  • Evolution API instance properly configured
  • Webhook URL configured in Evolution API
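
For scripts that run alongside the workflow, the key can be read from the same environment variable. A small sketch that fails early when the variable is missing; the function name `load_groq_api_key` is hypothetical.

```python
import os
from typing import Mapping


def load_groq_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Groq API key from the environment, or raise if it is unset."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return key
```

Failing at startup gives a clearer signal than an authentication error on the first audio message.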

Setup Instructions

  1. Import Workflow: Import the JSON workflow into your n8n instance
  2. Configure Environment: Set GROQ_API_KEY environment variable
  3. Setup Webhook: Configure Evolution API to send messages to the webhook endpoint
  4. Test Connection: Send a test audio message to verify the workflow
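
Besides sending a real audio message, the webhook can be exercised with a simulated request. The sketch below only builds the request; the payload shape mirrors an assumed Evolution API webhook body, and the phone number, message id, and webhook URL are placeholders to replace with your own values.

```python
import base64
import json
import urllib.request


def build_test_request(webhook_url: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build a POST that imitates an incoming audio message (assumed shape)."""
    payload = {
        "data": {
            "key": {
                "remoteJid": "5511999999999@s.whatsapp.net",  # placeholder number
                "fromMe": False,
                "id": "TEST-1",  # placeholder message id
            },
            "pushName": "Test User",
            "messageType": "audioMessage",
            "message": {"base64": base64.b64encode(audio_bytes).decode("ascii")},
        }
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it:
# urllib.request.urlopen(build_test_request(url, audio_bytes), timeout=10)
```

After sending, the n8n execution log should show whether the message passed the Switch node and reached the transcription step.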

Workflow Nodes

  • Webhook: Receives WhatsApp messages from Evolution API
  • Edit Fields: Extracts relevant data (number, name, message, audio)
  • Switch: Filters only audio messages (audioMessage type)
  • Convert to File: Transforms base64 audio to MP3 format
  • HTTP Request: Sends audio to Groq API for transcription
  • Evolution API: Sends transcribed text back to WhatsApp
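
What the Edit Fields and Switch nodes do can be approximated in a few lines. The sketch assumes the webhook body exposes `data.key.remoteJid`, `data.pushName`, `data.messageType`, and `data.message.base64`; these paths are assumptions, so check them against a real execution in n8n before relying on them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncomingMessage:
    number: str
    name: str
    message_type: str
    message: dict
    audio: Optional[str]  # base64 string, if present


def extract_fields(payload: dict) -> IncomingMessage:
    """Mimic Edit Fields: keep only the values later nodes need."""
    data = payload.get("data", {})
    message = data.get("message", {})
    remote_jid = data.get("key", {}).get("remoteJid", "")
    return IncomingMessage(
        number=remote_jid.split("@", 1)[0],  # drop the "@s.whatsapp.net" suffix
        name=data.get("pushName", ""),
        message_type=data.get("messageType", ""),
        message=message,
        audio=message.get("base64"),
    )


def route(msg: IncomingMessage) -> str:
    """Mimic Switch: only audio messages reach the transcription branch."""
    return "transcribe" if msg.message_type == "audioMessage" else "ignore"
```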

Configuration Options

Groq API Settings

  • Model: whisper-large-v3
  • Language: pt (Portuguese)
  • Temperature: 0 (most deterministic output)
  • Response Format: json
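
These settings travel as form fields next to the audio file in a multipart request. The sketch below builds such a request with the standard library only; it assumes Groq's OpenAI-compatible transcription endpoint and an `audio/mpeg` content type for the uploaded file, and the helper name is hypothetical.

```python
import urllib.request
import uuid

GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str, audio: bytes, filename: str = "audio.mp3"
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the audio and the settings."""
    boundary = uuid.uuid4().hex
    fields = {
        "model": "whisper-large-v3",
        "language": "pt",
        "temperature": "0",
        "response_format": "json",
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The file part carries the raw audio bytes.
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: audio/mpeg\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        GROQ_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

With the response format set to json, the transcription is expected under a `text` key in the response body.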

Customization Options

  • Change language by modifying the language parameter
  • Adjust temperature to allow more variation in decoding (0 keeps output as deterministic as possible)
  • Modify response format to change the output structure (for example json, text, or verbose_json)
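
A small helper can keep these overrides in one place and reject obviously invalid values. This is a sketch: the allowed response formats and the 0 to 1 temperature range are assumptions based on the OpenAI-compatible transcription API, and `customize` is a hypothetical name.

```python
from typing import Optional

DEFAULT_SETTINGS = {
    "model": "whisper-large-v3",
    "language": "pt",
    "temperature": 0.0,
    "response_format": "json",
}

# Assumed set of accepted formats; verify against the Groq documentation.
ALLOWED_FORMATS = {"json", "verbose_json", "text"}


def customize(
    language: Optional[str] = None,
    temperature: Optional[float] = None,
    response_format: Optional[str] = None,
) -> dict:
    """Return a copy of the default settings with validated overrides."""
    settings = dict(DEFAULT_SETTINGS)
    if language is not None:
        settings["language"] = language.lower()
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        settings["temperature"] = temperature
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        settings["response_format"] = response_format
    return settings
```

For example, `customize(language="en")` switches transcription to English while leaving the other defaults untouched.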

Response Format

*Mensagem transcrita automaticamente.*
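[Transcribed text content]

The first line is Portuguese for "Message transcribed automatically."; the surrounding asterisks render it in bold in WhatsApp.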
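
The reply body shown above is a fixed header line followed by the transcription. A minimal sketch of that formatting; `format_reply` is a hypothetical helper name.

```python
HEADER = "*Mensagem transcrita automaticamente.*"


def format_reply(transcription: str) -> str:
    """Prefix the transcription with the fixed header line."""
    return f"{HEADER}\n{transcription.strip()}"
```

Stripping the transcription avoids a leading space or trailing newline in the WhatsApp message.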

Technical Specifications

  • Input: Base64 encoded audio from WhatsApp
  • Output: Plain text transcription
  • Processing Time: Typically 2-5 seconds per audio message
  • Supported Audio: MP3 format (converted from WhatsApp audio)
  • Language: Portuguese (configurable)
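
When debugging the input side, it can help to check what the decoded bytes actually contain. The sketch below inspects the first bytes of the decoded audio using the well-known Ogg and MP3 signatures; which container a given WhatsApp payload uses is not stated in this document, so treat the result as a diagnostic hint only.

```python
import base64


def sniff_audio_container(audio_b64: str) -> str:
    """Guess the container of base64-encoded audio from its leading bytes."""
    head = base64.b64decode(audio_b64)[:4]
    if head.startswith(b"OggS"):
        return "ogg"
    # MP3 files start with an ID3 tag or an 11-bit frame sync (0xFF, 0xE0 mask).
    if head.startswith(b"ID3"):
        return "mp3"
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```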

Troubleshooting

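  • No Response: Verify that GROQ_API_KEY is set and valid, and that Evolution API is sending events to the workflow's webhook URL
  • Poor Transcription: Check the recording quality and confirm the language parameter matches the spoken language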
  • Error Messages: Monitor n8n execution logs for detailed error information
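
When the execution log shows a failed HTTP Request node, the status code usually narrows down the cause. The mapping below is a sketch based on generic HTTP semantics, not on Groq-specific error documentation, and `diagnose` is a hypothetical helper.

```python
def diagnose(status: int) -> str:
    """Map an HTTP status code from the transcription call to a first check."""
    if 200 <= status < 300:
        return "Request succeeded; if no reply arrived, check the webhook and Evolution API configuration."
    if status == 401:
        return "Authentication failed; check that GROQ_API_KEY is set and valid."
    if status == 413:
        return "Payload too large; the audio file may exceed the upload limit."
    if status == 429:
        return "Rate limited; wait and retry."
    if 500 <= status < 600:
        return "Upstream error; retry later."
    return "Unexpected status; inspect the response body in the n8n execution log."
```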

Version History

  • v0.0.1: Initial release with basic transcription functionality
