JSAI2025

Presentation information

General Session

[4N3-GS-7] Vision, speech media processing

Fri. May 30, 2025 2:00 PM - 3:40 PM Room N (Room 1009)

Chair: Seitaro Shinagawa (SB Intuitions)

2:00 PM - 2:20 PM

[4N3-GS-7-01] Error Correction for Japanese Speech Recognition by Combining N-best Hypotheses and Large Language Models

〇Kengo Fujii¹, Sonoyama Masashi¹, Takahashi Ichiro¹ (1. KONICA MINOLTA, INC.)

Keywords: ASR, Error Correction, LLM

Our company provides services built on speech recognition, and we regard recognition accuracy as essential to deploying those services successfully. Among the many methods for correcting speech recognition output, this study focuses on using large language models (LLMs) to score the N-best hypotheses generated by a speech recognition model, in order to correct Japanese speech recognition results. This scoring improved both WER (word error rate) and CER (character error rate).
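The general idea of N-best rescoring can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the LLM score is computed or combined, so the linear interpolation with the ASR score, the `weight` parameter, the function names, and the toy score table standing in for an LLM are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    llm_score: Callable[[str], float],
    weight: float = 0.5,
) -> str:
    """Return the hypothesis with the best combined score.

    nbest holds (text, asr_log_prob) pairs. The interpolation below is an
    assumed combination rule, chosen only for illustration.
    """
    def combined(item: Tuple[str, float]) -> float:
        text, asr_log_prob = item
        return (1.0 - weight) * asr_log_prob + weight * llm_score(text)

    return max(nbest, key=combined)[0]


# Hypothetical data: a reference transcript and a 3-best list with ASR log-probs.
reference = "今日は晴れです"
nbest = [
    ("今日は腫れです", -1.0),  # ASR 1-best, contains a homophone error
    ("今日は晴れです", -1.2),
    ("京は晴れです", -2.0),
]

# Stand-in for an LLM: a fixed table of made-up sentence log-probabilities.
TOY_LLM_SCORES = {
    "今日は腫れです": -9.0,
    "今日は晴れです": -4.0,
    "京は晴れです": -7.0,
}


def toy_llm_score(text: str) -> float:
    return TOY_LLM_SCORES[text]


best = rescore_nbest(nbest, toy_llm_score)
print("1-best CER:  ", cer(reference, nbest[0][0]))
print("rescored CER:", cer(reference, best))
```

With these toy numbers the rescoring selects the second hypothesis, and the CER against the reference drops from 1/7 to 0. WER can be computed with the same `edit_distance` function applied to token lists instead of characters; for Japanese this requires a word segmenter, which is omitted here.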
