
Google TTS Streaming and ElevenLabs Integration in VoiceGuide


Two quick technical questions regarding TTS in VoiceGuide:

  1. Does VoiceGuide support streaming audio from Google TTS (for example starting playback while audio is still being generated), or is it limited to waiting for the full WAV file returned by SynthesizeSpeech?
    Are there any plans to support Google TTS streaming in the future?

  2. Are there any plans to support direct ElevenLabs TTS integration, especially using their low-latency or streaming API?

Currently it seems VoiceGuide waits for the complete audio file before playback, which introduces noticeable delay for longer prompts.

 


VoiceGuide supports Live Streaming from Google and from MS Azure.

Here is a version that has them both enabled:

https://www.voiceguide.com/release/VoiceGuide_7.7.11_260319_BCHXTWZP.exe

Please refer to the online "Configuring Text to Speech (TTS)" help page for configuration details:

https://www.voiceguide.com/vghelp/source/html/config_tts.htm

The streamed versions were not previously included in public releases of VoiceGuide, but latency in cloud-based TTS services has recently dropped, so those services are now more practical to use.

Please note that in most deployments the voice prompts played to the caller can simply be pre-recorded, and those files are then played with no TTS delay at all. This gives a noticeably faster response and higher user satisfaction. Even free-flowing, speech-recognition-based systems can work with the entire set of answers and variations pre-generated for all situations. A few hundred pre-generated prompts and variations are enough to make such systems sound live and personalised. The callflow logic then just picks the appropriate pre-generated or cached sound file instead of issuing a request to the TTS service. This approach saves money and gives a more responsive system.
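
As a rough illustration of the cached-prompt approach, the Python sketch below looks up a pre-generated WAV file by a hash of the prompt text and reports a miss when no such file exists. This is not VoiceGuide code: the function names, the hashing scheme, and the cache directory layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Optional


def prompt_key(text: str) -> str:
    """Normalise the prompt text and hash it to get a stable file name.

    Hypothetical naming scheme: lower-case, collapse whitespace, SHA-1, 16 hex chars.
    """
    normalised = " ".join(text.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()[:16]


def find_cached_prompt(text: str, cache_dir: Path) -> Optional[Path]:
    """Return the pre-generated WAV for this text, or None if it is not cached."""
    candidate = cache_dir / f"{prompt_key(text)}.wav"
    return candidate if candidate.is_file() else None
```

On a miss (`None`), the callflow would fall back to a live TTS request; on a hit, it would play the returned file directly.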

 

 

To change to a different version of VoiceGuide:

  1. Stop the VoiceGuide Service. It can be stopped by clicking on the VoiceGuide Service Monitor in the Windows icon tray at the bottom right of the taskbar and selecting "Stop".
  2. Exit all VoiceGuide programs. This includes the Service Monitor applet in the icon tray area at the bottom right of the screen, as well as the Script Designer, Line Status Monitor, etc.
  3. Do NOT uninstall the previous VoiceGuide installation.
  4. Stop the Dialogic service using the Dialogic Configuration Manager (DCM) or the Windows Services applet.
  5. Run the VoiceGuide installer and install into the same directory as the existing installation.
  6. Start the VoiceGuide service.

Note: Running a VoiceGuide install over the top of an existing install will NOT overwrite existing configuration or license files (Config.xml, ConfigLine.xml, VG.INI, etc.), and will not remove any of the user's script files, sound files, or log files.

 


I have a follow-up question regarding the streaming-enabled version.

Does installing this version eliminate the creation of the temporary TTS file that was previously accessible via the variable:

$RV[tts_save_filename_last]

With the Live Streaming implementation, is the audio still written to a temporary file on disk, or is it streamed directly without creating a local file?

 


The temporary TTS file is still created.

That temporary file is used if the module times out awaiting input and the TTS-generated message needs to be replayed. The most recently generated temporary sound file is then used for the replay instead of asking the TTS service to regenerate the same voice data.

To confirm, check the vgEngine trace file, which will show that RV being set.
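
The replay behaviour described above can be pictured with a small Python sketch. It is only a conceptual model, not VoiceGuide's implementation: the `PromptPlayer` class and the injected `synthesize` callable are hypothetical names introduced for the example.

```python
from typing import Callable, Optional


class PromptPlayer:
    """Remembers the last generated sound file so a replay needs no new TTS call."""

    def __init__(self, synthesize: Callable[[str], str]) -> None:
        # synthesize(text) is assumed to call the TTS service and
        # return the path of the temporary sound file it wrote.
        self._synthesize = synthesize
        self._last_text: Optional[str] = None
        self._last_file: Optional[str] = None

    def file_for(self, text: str) -> str:
        """Return a sound file for text, reusing the last one on a replay."""
        if text == self._last_text and self._last_file is not None:
            return self._last_file  # replay: no second TTS request
        self._last_file = self._synthesize(text)
        self._last_text = text
        return self._last_file
```

Calling `file_for` twice with the same text triggers only one synthesis request; the second call returns the saved temporary file.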


That’s good news.

I’m generating a prompt library as you previously suggested, so having the temporary TTS file still created is actually beneficial in my case.
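
For anyone building a similar library, here is a minimal Python sketch of the copy step, assuming the path held in $RV[tts_save_filename_last] and the prompt text have already been passed to a script by the callflow (how that hand-off is done is not shown here). The function names and the file-naming scheme are my own assumptions, not part of VoiceGuide.

```python
import re
import shutil
from pathlib import Path


def slugify(text: str, max_len: int = 40) -> str:
    """Turn prompt text into a short, file-system-safe name (hypothetical scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    return slug[:max_len] or "prompt"


def archive_tts_file(temp_wav: Path, text: str, library_dir: Path) -> Path:
    """Copy a temporary TTS file into the library, keeping any existing entry."""
    library_dir.mkdir(parents=True, exist_ok=True)
    target = library_dir / f"{slugify(text)}.wav"
    if not target.exists():
        shutil.copy2(temp_wav, target)  # copy only the first time this prompt is seen
    return target
```

Copying rather than moving leaves the temporary file in place, so a replay that relies on it is unaffected.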


However, I noticed that my other question in this thread has not yet been addressed:

Are there any plans to support direct ElevenLabs TTS integration, especially using their low-latency or streaming API?

I would appreciate any information on this.


Currently there are no plans to add ElevenLabs. ElevenLabs only has official SDKs/libraries for Node/TypeScript and Python, and releases new versions of them often. VoiceGuide is coded in .NET/C++.

If ElevenLabs is required, please contact sales@voiceguide.com to discuss. We would have to use the ElevenLabs API directly, which is more work than if a suitable SDK/library existed, so there would have to be a business case to proceed with this integration.

