(Hint: Here in this text is the word phone the phonetic phone, and not a phone device. And please do not confuse a phonetic phone with a phonetic phoneme, which is indeed something similar to a phonetic phone)
Today I'm going to tell you about my singable Text-To-Speech engine, which I also integrated into my own DAW. For simplicity's sake, I will primarily explain how MIDI events become the phone information data including f0 fundamental frequency envelopes, which my implementation of the MBROLA algorithm can process directly without re-parsing. And since a company is interested in my technology, where the DAW, which is quite well known, is also implemented in Object Pascal, I will not provide a code snippet this time. Let's start with the explanation of the procedure.
The first step is to extract all lyric text MIDI events from the remaining MIDI events and put them into a separate array, where each lyric text item has three fields, a text UTF8 string, a start time position 64-bit integer and an end time … (read more)