I am starting a speech recognition project. I want to record myself reading a text aloud and then feed that recording to my program so that it is transcribed into ASCII text.
As a first step, I would like to know how to split the continuous waveform into fragments of n milliseconds each.
From what I have read, this is called applying a time window (or framing the signal), but I have not found out how to implement it.
I want to use these windows to try to extract the phonemes from the speech and then apply some algorithm to them.
I am working in C++, so I would be grateful if you could give me an example.
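To make the question concrete, here is a rough sketch of what I understand by a time window. It assumes the audio has already been decoded into mono floating-point samples in a std::vector<float> with a known sample rate; reading the audio file is not shown. The names frameSignal, frameMs, hopMs and applyHamming are my own placeholders, and the Hamming taper is just one common window choice, not necessarily the right one for phoneme work.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits mono samples into frames of frameMs milliseconds.
// Consecutive frames start hopMs milliseconds apart, so they overlap when
// hopMs < frameMs. Trailing samples that do not fill a whole frame are dropped.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& samples,
                                            int sampleRate,
                                            double frameMs,
                                            double hopMs,
                                            bool applyHamming = true)
{
    // Convert durations in milliseconds to a number of samples.
    const auto frameLen = static_cast<std::size_t>(sampleRate * frameMs / 1000.0);
    const auto hopLen = static_cast<std::size_t>(sampleRate * hopMs / 1000.0);

    std::vector<std::vector<float>> frames;
    if (frameLen == 0 || hopLen == 0 || samples.size() < frameLen) {
        return frames;  // nothing sensible to return
    }

    const double pi = std::acos(-1.0);
    for (std::size_t start = 0; start + frameLen <= samples.size(); start += hopLen) {
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        std::vector<float> frame(first, first + static_cast<std::ptrdiff_t>(frameLen));

        if (applyHamming && frameLen > 1) {
            // Hamming window: tapers the frame edges to reduce spectral leakage.
            for (std::size_t n = 0; n < frameLen; ++n) {
                const double w = 0.54 - 0.46 * std::cos(2.0 * pi * static_cast<double>(n)
                                                        / static_cast<double>(frameLen - 1));
                frame[n] = static_cast<float>(frame[n] * w);
            }
        }
        frames.push_back(std::move(frame));
    }
    return frames;
}
```

For example, calling frameSignal(samples, 16000, 25.0, 10.0) on audio sampled at 16 kHz would give frames of 400 samples each, with a new frame starting every 160 samples. I picked 25 ms and 10 ms because those values seem to be common in speech processing, but I do not know whether they are appropriate here.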
Thanks in advance.