Google Speech-to-Text API with an MP3 file

2

I want to convert an audio file to text. The file is an MP3 and is one hour long, so I am transcribing it asynchronously. What can I do? This is my code:

public class PruebasSpeech {
    public static void main(String... args) throws Exception {  
        asyncRecognizeGcs("gs://cloud_at/2015-077.MP3");
    }

    public static void asyncRecognizeGcs(String gcsUri) throws Exception {


        // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
        try (SpeechClient speech = SpeechClient.create()) {

            // Configure the request for the remote (GCS) file
            RecognitionConfig config =
                RecognitionConfig.newBuilder()
                    .setEncoding(AudioEncoding.FLAC)
                    .setLanguageCode("es-CO")
                    .setSampleRateHertz(8000)
                    .build();
            RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

            // Use non-blocking call for getting file transcription
            OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
                speech.longRunningRecognizeAsync(config, audio);
            while (!response.isDone()) {
                System.out.println("Waiting for response...");
                Thread.sleep(10000);
            }

            List<SpeechRecognitionResult> results = response.get().getResultsList();

            for (SpeechRecognitionResult result : results) {
                // There can be several alternative transcripts for a given chunk of speech. Just use the
                // first (most likely) one here.
                SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
                System.out.printf("Transcription: %s\n", alternative.getTranscript());
            }
        }
    }
}
    
asked by José Daniel Mesa 30.08.2018 at 22:37

1 answer

0

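For the audio to be transcribed correctly, it must use a supported encoding, and its actual encoding and sample rate must match what you declare in the following lines (here the file is an MP3, but the config declares FLAC):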

...
        RecognitionConfig config =
            RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.FLAC)
                .setLanguageCode("es-CO")
                .setSampleRateHertz(8000) // This line
                .build();
...

If the sample rate does not match, the Google SDK will throw an error indicating that the configured value does not match the one found in the file header.

To fix this error, either omit that line or set it to the file's actual sample rate.

For more information on this topic you can check this link: Introduction to Audio Encoding

To see the details of your audio, run the command avprobe 2015-077.MP3, which prints the complete stream information.

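If the above does not work, you can always convert your audio with a tool such as ffmpeg.

For example, this converts the file to 16-bit PCM (WAV), mono, at 8 kHz. With this output, the config should declare AudioEncoding.LINEAR16 rather than FLAC:

ffmpeg -i 2015-077.MP3 -acodec pcm_s16le -ar 8000 -ac 1 2015-077_out.wav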
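
If you would rather run the conversion from the same Java program, a minimal sketch could look like the one below. It assumes ffmpeg is installed and on the PATH; the class name FfmpegConvert, the method buildCommand, and the .wav output name are illustrative choices, not part of the original code.

```java
import java.util.List;

public class FfmpegConvert {

    /**
     * Builds the ffmpeg argument list for a 16-bit PCM, mono conversion
     * at the given sample rate. Hypothetical helper for illustration.
     */
    public static List<String> buildCommand(String input, String output, int sampleRateHz) {
        return List.of(
            "ffmpeg",
            "-y",                      // overwrite the output file if it exists
            "-i", input,               // input file (e.g. the original MP3)
            "-acodec", "pcm_s16le",    // 16-bit little-endian PCM (LINEAR16)
            "-ar", String.valueOf(sampleRateHz),
            "-ac", "1",                // mono
            output);
    }

    public static void main(String[] args) throws Exception {
        List<String> command = buildCommand("2015-077.MP3", "2015-077_out.wav", 8000);

        // Run ffmpeg and forward its console output to this process.
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("ffmpeg exited with code " + exitCode);
        }
    }
}
```

The converted file still has to be uploaded to the bucket before its gs:// URI is passed to asyncRecognizeGcs.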

The Audio Types link will help you if you take this route.
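
Once the audio is in WAV form, its header can also be checked from Java before choosing the value for setSampleRateHertz. The sketch below uses only the JDK's javax.sound.sampled package, which reads WAV, AIFF and AU headers but not MP3 unless an extra provider is installed; the class name AudioInfo and the method describe are illustrative.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class AudioInfo {

    /** Reads the audio format (sample rate, channels, bit depth) from the file header. */
    public static AudioFormat describe(File file) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            return in.getFormat();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the converted file as the first argument.
        AudioFormat format = describe(new File(args[0]));
        System.out.printf(
            "encoding=%s sampleRate=%.0f Hz channels=%d bits=%d%n",
            format.getEncoding(),
            format.getSampleRate(),
            format.getChannels(),
            format.getSampleSizeInBits());
    }
}
```

The sample rate printed here is the value that the config's setSampleRateHertz call should agree with.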

    
answered by 11.10.2018 at 00:54