Vue 3.5 + WebSocket Speech Recognition: An In-Depth Implementation

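This article explains how to build a complete speech recognition feature on the front end with Vue 3.5, WebSocket, and the WangEditor rich text editor. It covers the whole pipeline, from audio capture and format conversion to inserting real-time recognition results into the editor.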

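1. Feature Scope and Technology Selection

1.1 Core Requirements

Real-time speech-to-text: capture audio from the microphone and insert recognition results into the editor as they arrive.
Good user experience: show the recording state visually, support keyboard shortcuts, and surface clear error messages.
Compatibility: support mainstream browsers by combining modern APIs with a fallback path.
Stability: provide connection diagnostics, error handling, and automatic resource cleanup.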

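1.2 Technology Stack

Framework: Vue 3.5 with the Composition API, whose reactivity system manages component state.
Editor: WangEditor, which is lightweight and extensible, so it integrates easily with other features.
Audio processing: AudioWorklet, with ScriptProcessor (deprecated, kept as a fallback), to convert the audio format.
Transport: WebSocket, which carries audio data and recognition results in real time.
UI components: Element Plus, for error messages, dialogs, and other interactive elements.

Key considerations: Vue 3.5's reactivity suits state management, WangEditor's flexible API is easy to extend, WebSocket keeps data transfer real-time, and the two audio processing paths balance compatibility and performance.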

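2. Overall Architecture

The speech recognition feature uses a layered design that gives each module a clear responsibility, which simplifies later maintenance and extension:

UI layer (NoteEditor.vue): provides the user entry points (such as the record button) and the logic that inserts results.
Core control layer (simpleSpeech.js): encapsulates recording control, connection management, and message handling.
Audio processing layer (audioPcmProcessor.js): converts the audio captured from the microphone into the PCM format the recognition service accepts.
Transport layer (WebSocket): exchanges data with the back-end speech recognition service in real time.
Diagnostics layer (speechConfigChecker.js): checks environment compatibility and service availability to help with troubleshooting.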
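
The article does not show speechConfigChecker.js, so the following is only a minimal sketch of what the environment half of such a check might look like. The function name `checkSpeechEnvironment`, its `env` parameter, and the shape of the returned report are assumptions made for illustration; checking service availability is left out.

```javascript
// Hypothetical diagnostics helper (not the article's speechConfigChecker.js).
// `env` defaults to the global object so the function can be exercised with
// a stub object outside a browser.
function checkSpeechEnvironment(env = globalThis) {
  const AudioContextCtor = env.AudioContext || env.webkitAudioContext;
  const checks = {
    // WebSocket is required to reach the recognition service
    webSocket: typeof env.WebSocket === 'function',
    // getUserMedia is only exposed in secure contexts (HTTPS or localhost)
    microphone: typeof env.navigator?.mediaDevices?.getUserMedia === 'function',
    audioContext: typeof AudioContextCtor === 'function',
    // Preferred path; when missing, the ScriptProcessor fallback is needed
    audioWorklet: typeof env.AudioWorkletNode === 'function',
    scriptProcessor:
      typeof AudioContextCtor?.prototype?.createScriptProcessor === 'function',
  };
  // Recording needs the basics plus at least one audio processing path
  const canRecord =
    checks.webSocket &&
    checks.microphone &&
    checks.audioContext &&
    (checks.audioWorklet || checks.scriptProcessor);
  const missing = Object.keys(checks).filter((name) => !checks[name]);
  return { canRecord, checks, missing };
}
```

A caller could run this once before showing the record button and list `missing` in an error dialog when `canRecord` is false.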

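3. Core Module Implementation

3.1 Audio Processing Layer: PCM Conversion (audioPcmProcessor.js)

The speech recognition service expects 16-bit PCM audio, whereas the Web Audio API delivers microphone samples as 32-bit floats in a Float32Array, each in the range [-1, 1]. An AudioWorklet processor therefore converts the samples before they are sent, so the audio matches what the service accepts.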

class AudioPcmProcessor extends AudioWorkletProcessor {
    constructor() {
        super();
        this.bufferSize = 4096; // Buffer size in samples
        this.buffer = new Float32Array(this.bufferSize);
        this.bufferIndex = 0;
    }
    process(inputs, outputs, parameters) {
        const input = inputs[0];
        if (input.length > 0) {
            const inputChannel = input[0];
            // Fill the buffer and flush it each time it becomes full
            for (let i = 0; i < inputChannel.length; i++) {
                this.buffer[this.bufferIndex] = inputChannel[i];
                this.bufferIndex++;
                if (this.bufferIndex >= this.bufferSize) {
                    this.processAndSendPcm();
                    this.bufferIndex = 0;
                }
            }
        }
        return true;
    }
    processAndSendPcm() {
        try {
            // Convert Float32 samples in [-1, 1] to 16-bit signed PCM
            const pcmData = new Int16Array(this.buffer.length);
            for (let i = 0; i < this.buffer.length; i++) {
                const sample = Math.max(-1, Math.min(1, this.buffer[i]));
                pcmData[i] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
            }
            // Transfer the ArrayBuffer (a Transferable) instead of copying it
            this.port.postMessage(
                { type: 'pcmData', data: pcmData.buffer, length: pcmData.length },
                [pcmData.buffer]
            );
        } catch (error) {
            this.port.postMessage({ type: 'error', error: error.message });
        }
    }
}
registerProcessor('audio-pcm-processor', AudioPcmProcessor);

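Highlights:

Samples are buffered and sent in batches, which reduces messaging overhead.
The ArrayBuffer is passed as a Transferable object, so it is moved to the main thread rather than copied.
Samples are clamped to [-1, 1] before conversion, so out-of-range values cannot overflow the 16-bit range and distort the audio.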
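
To make the batching trade-off concrete, the sketch below works out the size and timing of one batch. It assumes the 4096-sample buffer from the processor above, the 16000 Hz default sample rate used later in SimpleSpeech, and the usual Web Audio render quantum of 128 frames per `process()` call. The helper name `describeChunk` is hypothetical.

```javascript
// Frames passed to process() per call under the usual Web Audio render quantum.
const RENDER_QUANTUM = 128;

// Size and timing of one batch emitted by the worklet's buffer.
function describeChunk(bufferSize, sampleRate) {
  return {
    // Milliseconds of audio covered by one message
    durationMs: (bufferSize * 1000) / sampleRate,
    // Payload size: 2 bytes per 16-bit sample
    bytes: bufferSize * Int16Array.BYTES_PER_ELEMENT,
    // Number of process() calls folded into one message
    processCalls: bufferSize / RENDER_QUANTUM,
    messagesPerSecond: sampleRate / bufferSize,
  };
}

const chunk = describeChunk(4096, 16000);
// durationMs: 256, bytes: 8192, processCalls: 32, messagesPerSecond: 3.90625
console.log(chunk);
```

Under these assumptions, one message replaces 32 per-quantum messages, at the cost of up to 256 ms of extra latency before audio reaches the service. A smaller buffer would lower latency and raise message frequency.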

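3.2 Core Control Layer: Speech Recognition Wrapper (simpleSpeech.js)

This module encapsulates recording control, the WebSocket connection, and message handling, and exposes a small API to callers.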

export class SimpleSpeech {
    constructor(options = {}) {
        const baseUrl = import.meta.env.VITE_API_BASE_URL || 'http://localhost:3000';
        // Swap only the leading scheme: http -> ws, https -> wss
        this.serverUrl = `${baseUrl.replace(/^http/, 'ws')}/api/v1/ai/speech-recognition`;
        this.sampleRate = options.sampleRate || 16000;
        this.model = options.model || 'bigmodel';
        this.isRecording = false;
        this.isConnected = false;
        this.isReady = false;
        this.audioContext = null;
        this.audioStream = null;
        this.workletNode = null;
        this.ws = null;
        this.onReady = options.onReady || (() => {});
        this.onPartial = options.onPartial || (() => {});
        this.onFinal = options.onFinal || (() => {});
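        this.onError = options.onError || (() => {});
        // Status callback invoked by connect() when the socket opens or closes
        this.onStatusChange = options.onStatusChange || (() => {});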
    }

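    // Open the WebSocket connection
    async connect() {
        try {
            const url = `${this.serverUrl}?sampleRate=${this.sampleRate}&model=${this.model}`;
            this.ws = new WebSocket(url);
            this.ws.onopen = () => {
                this.isConnected = true;
                this.onStatusChange?.('connected');
            };
            this.ws.onmessage = (event) => {
                let data;
                try {
                    data = JSON.parse(event.data);
                } catch (error) {
                    this.onError('Unparseable server message: ' + error.message);
                    return;
                }
                this.handleMessage(data);
            };
            // Network failures arrive through onerror; the constructor itself
            // only throws for an invalid URL, which the catch below handles
            this.ws.onerror = () => {
                this.onError('Connection failed: WebSocket error');
            };
            this.ws.onclose = () => {
                this.isConnected = false;
                this.isReady = false;
                this.onStatusChange?.('disconnected');
            };
        } catch (error) {
            this.onError('Connection failed: ' + error.message);
        }
    }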

    // Start recording
    async startRecording() {
        if (!this.isConnected || !this.isReady) {
            this.onError('Please connect to the speech recognition service first');
            return;
        }
        try {
            this.audioStream = await navigator.mediaDevices.getUserMedia({ audio: { sampleRate: this.sampleRate, channelCount: 1 } });
            this.audioContext = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: this.sampleRate });
            const audioSource = this.audioContext.createMediaStreamSource(this.audioStream);
            
            try {
                await this.audioContext.audioWorklet.addModule('/audioPcmProcessor.js');
                this.workletNode = new AudioWorkletNode(this.audioContext, 'audio-pcm-processor');
                this.workletNode.port.onmessage = (event) => {
                    if (event.data.type === 'pcmData' && this.ws?.readyState === WebSocket.OPEN) {
                        this.ws.send(event.data.data); // Send the PCM data
                    }
                };
                audioSource.connect(this.workletNode);
            } catch (error) {
                this.createScriptProcessor(audioSource); // Fallback when AudioWorklet is unavailable
            }
            this.isRecording = true;
        } catch (error) {
            this.onError('Failed to start recording: ' + error.message);
        }
    }

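    // Stop recording and release the audio resources
    stopRecording() {
        if (!this.isRecording) return;
        if (this.workletNode) {
            this.workletNode.port.onmessage = null;
            this.workletNode.disconnect();
            this.workletNode = null;
        }
        if (this.audioStream) {
            // Stopping the tracks releases the microphone
            this.audioStream.getTracks().forEach(track => track.stop());
            this.audioStream = null;
        }
        // Close the context independently of the stream so it is never leaked
        if (this.audioContext) {
            this.audioContext.close();
            this.audioContext = null;
        }
        this.isRecording = false;
    }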

    handleMessage(data) {
        switch (data.type) {
            case 'ready':
                this.isReady = true;
                this.onReady();
                break;
            case 'incremental_result':
                this.onPartial({ text: data.text, timestamp: data.timestamp });
                break;
            case 'final_result':
                this.onFinal({ text: data.text, timestamp: data.timestamp });
                break;
            case 'error':
                this.onError(this.formatErrorMsg(data.message));
                break;
        }
    }

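    formatErrorMsg(message) {
        if (typeof message !== 'string') return 'Unknown error';
        // The matched substrings belong to the server's own error text and stay as-is:
        // '缺少配置' means "missing configuration", '连接失败' means "connection failed"
        if (message.includes('缺少配置')) return 'The back end is missing configuration; please set the API key';
        if (message.includes('连接失败')) return 'Cannot reach the service; please check the network';
        return message;
    }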
}
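
The fallback call `this.createScriptProcessor(audioSource)` in `startRecording` has no definition in the listing above. The following is one way such a fallback might look, written as a standalone function so it can be read on its own. The names `floatTo16BitPcm` and `attachScriptProcessor` and the `sendPcm` callback are assumptions, not the author's implementation. The conversion repeats the worklet's rule because the fallback runs on the main thread.

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM (same rule as the worklet).
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Hypothetical stand-in for the createScriptProcessor method referenced above.
// `sendPcm` receives one ArrayBuffer per processed block.
function attachScriptProcessor(audioContext, audioSource, sendPcm, bufferSize = 4096) {
  // One input channel, one output channel
  const node = audioContext.createScriptProcessor(bufferSize, 1, 1);
  node.onaudioprocess = (event) => {
    const input = event.inputBuffer.getChannelData(0);
    sendPcm(floatTo16BitPcm(input).buffer);
  };
  audioSource.connect(node);
  // Some browsers only fire onaudioprocess while the node is connected to a
  // destination. The output buffer is never written here, so it should stay silent.
  node.connect(audioContext.destination);
  // Return a cleanup function for the stop path
  return () => {
    node.onaudioprocess = null;
    audioSource.disconnect(node);
    node.disconnect();
  };
}
```

Inside the class, the method could call this helper with `(buffer) => this.ws.send(buffer)` as `sendPcm` and keep the returned cleanup function for `stopRecording`.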

3.3 UI Layer: Editor Integration (NoteEditor.vue)

The Vue component integrates WangEditor with the core speech recognition logic.

Template

<template>
    <div class="note-editor">
        <div class="editor-toolbar">
            <Toolbar :editor="editorRef" :defaultConfig="toolbarConfig" :mode="mode" />
        </div>
        <div class="editor-content">
            <Editor v-model="valueHtml" :defaultConfig="editorConfig" @onCreated="handleCreated" />
        </div>
        
        <div v-if="isSpeechRecording" class="speech-recording-indicator">
            <div class="recording-animation">
                <div class="pulse-ring"></div>
                <div class="microphone-icon"></div>
            </div>
            <div class="recording-text">Recording...</div>
            <button @click="stopSpeechRecognition" class="stop-btn">Stop recording</button>
            <div class="esc-hint">Press ESC to stop</div>
        </div>
    </div>
</template>

Script

<script setup>
import { ref } from 'vue'

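// Speech recognition state
// (the editor setup used by the template, such as editorRef and handleCreated, is omitted here)
const speechRecognition = ref(null)
const isSpeechRecording = ref(false)

// Create the speech recognition instance
const initSpeechRecognition = () => { /* ... */ }
// Start speech recognition
const startSpeechRecognitionDirect = async () => { /* ... */ }
// Stop speech recognition
const stopSpeechRecognition = () => { /* ... */ }
// Insert the recognized text into the editor
const insertSpeechText = (text) => { /* ... */ }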
</script>
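
The function bodies above are elided in the article. As a rough illustration of how they might fit together, the sketch below wires a SimpleSpeech-like class to an editor in plain JavaScript, so the logic can be followed without Vue. Everything here is an assumption: the `createSpeechBridge` name, its options, and the choice to insert only final results. It also assumes the editor instance offers an `insertText(text)` method, as WangEditor 5 does.

```javascript
// Hypothetical glue between a SimpleSpeech-like class and the editor.
// None of these names come from the article's elided function bodies.
function createSpeechBridge({ SpeechClass, getEditor, onRecordingChange, onError }) {
  let speech = null;
  let recording = false;

  const setRecording = (value) => {
    recording = value;
    onRecordingChange(value); // e.g. (v) => { isSpeechRecording.value = v }
  };

  // Insert recognized text at the editor's current cursor position
  const insertSpeechText = (text) => {
    const editor = getEditor();
    if (!editor || !text) return;
    editor.insertText(text);
  };

  const beginRecording = async () => {
    await speech.startRecording();
    setRecording(speech.isRecording);
  };

  const start = async () => {
    if (recording) return;
    if (!speech) {
      speech = new SpeechClass({
        // Recording may only start after the service reports "ready"
        onReady: beginRecording,
        // Only final results are inserted; partial ones could drive a preview
        onFinal: ({ text }) => insertSpeechText(text),
        onError,
      });
    }
    if (speech.isReady) await beginRecording();
    else await speech.connect();
  };

  const stop = () => {
    if (!recording) return;
    speech.stopRecording();
    setRecording(false);
  };

  // Intended for window.addEventListener('keydown', handleKeydown)
  const handleKeydown = (event) => {
    if (event.key === 'Escape') stop();
  };

  return { start, stop, handleKeydown, insertSpeechText };
}
```

In the component, `getEditor` could return `editorRef.value`, `onRecordingChange` could assign `isSpeechRecording.value`, and `onError` could show an Element Plus message. The keydown listener should be removed when the component unmounts.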

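4. Summary

This article integrated speech recognition into an editor built on Vue 3.5 and WangEditor. The key technical points are:

Efficient audio format conversion, which keeps the audio compatible with the recognition service.
Thorough state management and error handling, which improve the user experience.
An automated configuration diagnostics tool, which reduces troubleshooting cost.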
