
Google TTS Streaming and ElevenLabs Integration in VoiceGuide


Two quick technical questions regarding TTS in VoiceGuide:

  1. Does VoiceGuide support streaming audio from Google TTS (for example starting playback while audio is still being generated), or is it limited to waiting for the full WAV file returned by SynthesizeSpeech?
    Are there any plans to support Google TTS streaming in the future?

  2. Are there any plans to support direct ElevenLabs TTS integration, especially using their low-latency or streaming API?

Currently it seems VoiceGuide waits for the complete audio file before playback, which introduces noticeable delay for longer prompts.

 


VoiceGuide supports Live Streaming from Google and from MS Azure.

Here is a version that has them both enabled:

https://www.voiceguide.com/release/VoiceGuide_7.7.11_260319_BCHXTWZP.exe

Please refer to the online "Configuring Text to Speech (TTS)" help page for configuration details:

https://www.voiceguide.com/vghelp/source/html/config_tts.htm

Streaming was not included in earlier public versions of VoiceGuide, but latency in cloud-based TTS services has recently dropped, so those services are now more practical to use.

Please note that in most deployments the voice prompts played to the caller can simply be pre-recorded, and such files are then played to the caller with no delay at all. This results in a significantly faster response and higher user satisfaction. Even free-flowing, speech-recognition-based systems can work with the entire set of answers/variations for all situations pre-generated. A few hundred pre-generated prompts/variations are enough to make such systems sound live/personalised. The callflow logic then just picks the appropriate pre-generated/cached sound file instead of issuing a request to TTS. This approach saves money and gives a more responsive system.
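
The cached-prompt lookup described above could be sketched roughly as follows. This is an illustration in Python, not VoiceGuide code: the `PromptCache` class, the SHA-256 file naming and the directory layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Optional


class PromptCache:
    """Maps prompt text to a pre-generated sound file (hypothetical helper)."""

    def __init__(self, library_dir: str) -> None:
        self.library_dir = Path(library_dir)

    def path_for(self, text: str, voice: str) -> Path:
        # Collapse runs of whitespace so trivially different spacing shares one file.
        normalised = " ".join(text.split())
        digest = hashlib.sha256(f"{voice}|{normalised}".encode("utf-8")).hexdigest()
        return self.library_dir / f"{digest[:16]}.wav"

    def lookup(self, text: str, voice: str) -> Optional[Path]:
        # Return the cached file if present; None means the caller must fall back to TTS.
        path = self.path_for(text, voice)
        return path if path.is_file() else None
```

A callflow would call `lookup()` first and only issue a TTS request when it returns `None`, optionally saving the newly generated file under `path_for()` so the next call finds it.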

 

 

To change to a different version of VoiceGuide:

  1. Stop the VoiceGuide service. VoiceGuide can be stopped by clicking on the VoiceGuide Service Monitor in the Windows icon tray at the bottom right of the taskbar and selecting "Stop".
  2. Exit all VoiceGuide programs. This includes the Service Monitor applet in the icon tray area at the bottom right of the screen, as well as the Script Designer, Line Status Monitor, etc.
  3. Do NOT uninstall the previous VoiceGuide installation.
  4. Stop the Dialogic service using the Dialogic Configuration Manager (DCM), or the Windows Services applet.
  5. Run the VoiceGuide installer and install into the same directory as the existing installation.
  6. Start the VoiceGuide service.

Note: Running a VoiceGuide install over the top of an existing install will NOT overwrite existing configuration or license files (Config.xml, ConfigLine.xml, VG.INI, etc.), and will not remove any of the user's script files, sound files, log files, etc.

 


I have a follow-up question regarding the streaming-enabled version.

Does installing this version eliminate the creation of the temporary TTS file that was previously accessible via the variable:

$RV[tts_save_filename_last]

With the Live Streaming implementation, is the audio still written to a temporary file on disk, or is it streamed directly without creating a local file?

 


The temporary TTS file is still created.

That temporary file is used if the module times out awaiting input and the TTS-generated message needs to be replayed. The most recently generated temporary sound file is then used for the replay instead of asking the TTS service to regenerate the same voice data.

You will be able to see that RV being set in the vgEngine trace file to confirm.
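
One way to check this is to scan the vgEngine trace for the entry that sets the RV. The sketch below is in Python and assumes the entry has the form `rvns  add   tts_save_filename_last|<path>`; the function name is made up for the example.

```python
import re
from typing import Iterable, Optional

# Assumed trace entry format: "... rvns  add   tts_save_filename_last|<full path to wav>"
_RV_LINE = re.compile(r"\brvns\s+add\s+tts_save_filename_last\|(.+?)\s*$")


def last_tts_filename(trace_lines: Iterable[str]) -> Optional[str]:
    """Return the most recent tts_save_filename_last value found in the trace."""
    found = None
    for line in trace_lines:
        match = _RV_LINE.search(line)
        if match:
            found = match.group(1)  # keep overwriting so the last entry wins
    return found
```

Passing the lines of a trace file (for example `open(path, encoding="utf-8", errors="replace")`) returns the last saved filename, or `None` if the RV was never set.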


That’s good news.

I’m generating a prompt library as you previously suggested, so having the temporary TTS file still created is actually beneficial in my case.


However, I noticed that my other question in this thread has not yet been addressed:

Are there any plans to support direct ElevenLabs TTS integration, especially using their low-latency or streaming API?

I would appreciate any information on this.


Currently there are no plans to add ElevenLabs. ElevenLabs only has official SDKs/libraries for Node/TypeScript and Python, and releases new versions of them often. VoiceGuide is coded in .NET/C++.

If ElevenLabs is required, please contact sales@voiceguide.com to discuss. We would have to use the ElevenLabs API directly, which is more work than if a suitable SDK/library existed, so there would have to be a business case to proceed with this integration.
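
For a sense of what "using the API directly" involves, here is a rough sketch that builds (but does not send) a streaming request. It is written in Python purely for illustration and is not how VoiceGuide would implement it. The endpoint path, the `xi-api-key` header, and the `model_id` and `output_format` values reflect my understanding of the public ElevenLabs REST API and should be treated as assumptions to verify against the current ElevenLabs documentation. The function name is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_stream_request(api_key: str, voice_id: str, text: str,
                         model_id: str = "eleven_flash_v2_5",
                         output_format: str = "ulaw_8000") -> urllib.request.Request:
    """Build a POST request for the (assumed) ElevenLabs streaming TTS endpoint."""
    voice = urllib.parse.quote(voice_id, safe="")
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice}/stream?{query}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

A caller would pass the request to `urllib.request.urlopen()` and read the response in small chunks, handing each chunk to the telephony layer as it arrives instead of waiting for the whole file.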


Unfortunately, after installing this version, Google TTS prompt generation is no longer working on our side. The same configuration works correctly in version 7.7.10, so the issue seems to be specific to this release.

We’re using identical settings, so this looks like a regression. Has anyone else experienced this?

155849.872   20  11   3     1 state [TTS test] Play TTS | 
155849.873   20  11   3     1       rvns  add   tts_save_filename_last|C:\Program Files (x86)\VoiceGuide\temp\tts_11_1.wav
155849.873   20  11   3     1       rvns  add   TTS test_tts_save_filename_last|C:\Program Files (x86)\VoiceGuide\temp\tts_11_1.wav
155849.873   20  11   3     1       tts    start strTtsFname=C:\Program Files (x86)\VoiceGuide\temp\tts_11_1.wav, bMakeTtsFile=True, rv_tts_EngineType=Google
155849.873   20  11   3     1       tts   gcp   start strTtsFname=C:\Program Files (x86)\VoiceGuide\temp\tts_11_1.wav, bMakeTtsFile=True
155849.873   20  11   3     1       tts   gcp   generate start:[Infolinia AI w sprzedaży odbiera i kwalifikuje leady 24/7, zadaje pytania wstępne, zapisuje dane w CRM i od razu kieruje gorące kontakty do handlowca. W marketingu pozwala mierzyć skuteczność kampanii (źródło połączenia, intencja, konwersje), automatycznie umawia konsultacje i zbiera zgody/briefy bez udziału człowieka.] fname:[C:\Program Files (x86)\VoiceGuide\temp\tts_11_1.wav], vgm=45, play_file_idx=1
155849.873   20  11   3     1 state [TTS test] tts gcp generate start (len=320) | 
155849.873   20  11   3     1       tts   gcp   taskGcpSynthesizeSpeech.Start call
155849.874   20  11   3     1       tts   gcp   taskGcpSynthesizeSpeech.Start call returned
155849.874   20  11   3     1       LineEvCallState CONNECTED stats update end
155849.879   33  11   3     1       tts   gcp   taskGcpSynthesizeSpeech beginning
155849.880   33  11   3     1       tts   gcp   taskGcpSynthesizeSpeech ini_GCP_TTS_LanguageCode: pl-PL
155849.880   33  11   3     1       tts   gcp   taskGcpSynthesizeSpeech ini_GCP_TTS_SsmlGender:   Female
155849.880   33  11   3     1       tts   gcp   taskGcpSynthesizeSpeech ini_GCP_TTS_Name:         pl-PL-Wavenet-E
155849.880   33  11   3     1       tts   gcp   AudioEncoding.Alaw as iRecFormatWavDefault=6
155849.880   33  11   3     1       tts   gcp   TaskQue_Scr_Add cmdPlayBytes GCP_PlayBytes_Start lPlayId=1353666821
155849.880   33  11   3     1       q_scr +     cmdPlayBytes lcode=1 scode=[GCP_PlayBytes_Start]
155849.880   20  11   3     1       q_scr run   cmdPlayBytes 1 GCP_PlayBytes_Start action_id=0, crn=0 [1353666821|0|0|0|0][|||||] 00:00:00 max:1|35,078
155849.881   21  11   3     1       q_tel run   cmd_PlayBytes 0 1|GCP_PlayBytes_Start [] 0 max:1|0
155849.906    8  11   3     1       tts   gcp   tts save to tts_filenameC:\Program Files (x86)\VoiceGuide\temp\tts_11_1.wav
155850.005   33  11   3     1 ERROR v7.7.11 - 7.7.9574.26724 (2026-03-19 14:50:38.46) taskGcpSynthesizeSpeech : Grpc.Core.RpcException: Status(StatusCode="InvalidArgument", Detail="Currently, only Chirp 3: HD voices are supported for streaming synthesis.", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1776002329.993000000","description":"Error received from peer ipv4:142.250.181.234:443","file":"..\..\..\src\core\lib\surface\call.cc","file_line":953,"grpc_message":"Currently, only Chirp 3: HD voices are supported for streaming synthesis.","grpc_status":3}") ---> Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1776002329.993000000","description":"Error received from peer ipv4:142.250.181.234:443","file":"..\..\..\src\core\lib\surface\call.cc","file_line":953,"grpc_message":"Currently, only Chirp 3: HD voices are supported for streaming synthesis.","grpc_status":3}
   --- Koniec śladu stosu wyjątków wewnętrznych ---
   w System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   w System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   w Grpc.Core.Internal.ClientResponseStream`2.<MoveNext>d__5.MoveNext()
--- Koniec śladu stosu z poprzedniej lokalizacji, w której wystąpił wyjątek ---
   w System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   w System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   w System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   w vgEngine.srLibUtilityFunctions.<>c__DisplayClass71_0.<<PrepareAndPlayWavAndTts>b__0>d.MoveNext()
155850.928    8   5   1       timer EV_TIMEOUT_AFTERIDLE_ALLOWOUT
155850.928    8   5   1             q_scr +     evScriptEvent lcode=9013 scode=[EV_TIMEOUT_AFTERIDLE_ALLOWOUT]
...
     
155920.859   34  11   3     1 ev    CallState GCEV_DISCONNECTED, crn=600000b, iEvent=0 ,16384,0,64, s1:, s2:, s3:, build_date: 2026-03-19 14:50:38.46

 

 

 


The trace shows that the error returned by Google TTS is:

Currently, only Chirp 3: HD voices are supported for streaming synthesis.

In this version of VoiceGuide the Google TTS audio is streamed, and Google currently supports only certain TTS voices when streaming the sound data.

The other version that you are comparing against, a v7.7.10 class release, does not call Google TTS in streaming mode, so you can use the non-streaming TTS voices in that version.

Please select a 'Chirp 3' category voice and try again. The list of voices provided by Google is downloaded as part of VoiceGuide service startup and saved at the beginning of the vgEngine trace file.
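
As a rough aid for picking a voice from that list, the Python sketch below filters voice names by language and by the "Chirp3-HD" marker. It assumes that Chirp 3: HD voice names contain `Chirp3-HD` (for example `pl-PL-Chirp3-HD-Aoede`). Check this against the voice list in your own trace, as the naming is not confirmed in this thread. Both function names are made up for the example.

```python
from typing import Iterable, List


def is_chirp3_hd(voice_name: str) -> bool:
    # Assumption: Chirp 3: HD voices carry "Chirp3-HD" in their name.
    return "-chirp3-hd-" in voice_name.lower()


def streaming_candidates(voice_names: Iterable[str], language_code: str) -> List[str]:
    """Return the voices for one language that look usable with streaming synthesis."""
    prefix = language_code.lower() + "-"
    return [name for name in voice_names
            if name.lower().startswith(prefix) and is_chirp3_hd(name)]
```

With this filter a Wavenet voice such as `pl-PL-Wavenet-E` is excluded, which matches the error Google returned for it.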


The WAV file generated during TTS synthesis is correct and has good audio quality. However, the audio played via streaming sounds distorted (muffled, “plastic-like”).

This suggests the issue is related specifically to the streaming playback, not the TTS generation itself.

Is there a way to disable streaming via a parameter and instead play the already generated WAV file (non-streaming mode)?

 

 

 

Recording 5.wav


This sounds more like an ALaw / uLaw mismatch. Please post ktTel trace capturing that call.
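
To show why such a mismatch is audible, here is a small Python sketch of the standard G.711 decoders (the widely used reference algorithm, not VoiceGuide or Dialogic code). The function names are chosen for the example. It decodes the same bytes both ways: 0xD5 and 0x55 are near-silence in A-law, but come out as a clearly non-zero signal if they are interpreted as u-law.

```python
def alaw_to_linear(a: int) -> int:
    """Decode one G.711 A-law byte to a signed 16-bit linear sample."""
    a ^= 0x55                      # undo the even-bit inversion used on the wire
    t = (a & 0x0F) << 4            # mantissa
    seg = (a & 0x70) >> 4          # segment (exponent)
    if seg == 0:
        t += 8
    else:
        t += 0x108
        t <<= seg - 1              # no shift for segment 1
    return t if a & 0x80 else -t


def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 u-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                  # u-law bytes are stored inverted
    t = ((u & 0x0F) << 3) + 0x84   # mantissa plus bias
    t <<= (u & 0x70) >> 4          # apply segment shift
    return (0x84 - t) if u & 0x80 else (t - 0x84)


for byte in (0xD5, 0x55):
    print(f"{byte:#04x}: A-law={alaw_to_linear(byte)}  u-law={ulaw_to_linear(byte)}")
# prints:
# 0xd5: A-law=8  u-law=716
# 0x55: A-law=-8  u-law=-716
```

Speech decoded with the wrong law is usually still intelligible but noticeably distorted, which is why the encoding at each stage of the path is worth checking first.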


Hi,

Could you let me know if there is any way to disable logging of specific entries in VoiceGuide?

In particular, I would like to exclude:

  1. Raw audio (ecrec) being passed into the ASR engine, e.g.:
     ecrec 512 : fe fe fe 7e 7e 7e 7e 7e
     ecrec 512 : 7e 7e 7c 7e 7e 7e fe fe
  2. Google TTS streaming traces, e.g.:
     tts gcp audio_content.Length=1595 totalAudioBytes=1595
     cmdPlayBytes GCP_PlayBytes_Data

These entries significantly increase log size and reduce readability.

 

Which logging level or configuration (e.g. in VG.INI) should be used to suppress these entries?

Also, is there any documentation describing what is included in each logging level (1–10) defined in VG.INI?

 


ktTel trace shows sound data is encoded as ALaw:

901 133751.798 160192               audio_accumulator_obj->ReadWithTimeout returned data: d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 d5 
924 133751.829 160192               audio_accumulator_obj->ReadWithTimeout returned data: 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 

and the incoming streamed sound data arrives in time for the read calls from Dialogic:

 

incoming TTS data:

891 133751.798 29524               aa_write: processed 1595 queue 1 total 1595 sizes: [1595]
917 133751.829 29524               aa_write: processed 1920 queue 2 total 2491 sizes: [571, 1920]
941 133751.858 29524               aa_write: processed 1920 queue 2 total 2363 sizes: [443, 1920]
960 133751.889 29524               aa_write: processed 1920 queue 2 total 3259 sizes: [1339, 1920]
979 133751.921 29524               aa_write: processed 1920 queue 3 total 4155 sizes: [315, 1920, 1920]
998 133751.950 29524               aa_write: processed 1920 queue 3 total 5051 sizes: [1211, 1920, 1920]
017 133751.980 29524               aa_write: processed 1920 queue 4 total 5947 sizes: [187, 1920, 1920, 1920]
029 133752.010 29524               aa_write: processed 1920 queue 5 total 7867 sizes: [187, 1920, 1920, 1920, 1920]
041 133752.041 29524               aa_write: processed 1920 queue 6 total 9787 sizes: [187, 1920, 1920, 1920, 1920, 1920]
060 133752.071 29524               aa_write: processed 1920 queue 6 total 10683 sizes: [1083, 1920, 1920, 1920, 1920, 1920]
072 133752.117 29524               aa_write: processed 1920 queue 7 total 12603 sizes: [1083, 1920, 1920, 1920, 1920, 1920, 1920]
084 133752.140 29524               aa_write: processed 1920 queue 8 total 14523 sizes: [1083, 1920, 1920, 1920, 1920, 1920, 1920, 1920]
096 133752.163 29524               aa_write: processed 1920 queue 9 total 16443 sizes: [1083, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920]
etc.

read calls:

895 133751.798 160192         uio   uio_read 11 168CCB88 1024
902 133751.814 160192         uio   uio_read 11 168CB358 1024
925 133751.843 160192         uio   uio_read 11 168CCB88 1024
944 133751.875 160192         uio   uio_read 11 168CB358 1024
963 133751.907 160192         uio   uio_read 11 168CCB88 1024
982 133751.939 160192         uio   uio_read 11 168CB358 1024
001 133751.971 160192         uio   uio_read 11 168CCB88 1024
044 133752.067 160192         uio   uio_read 11 168CB358 1024
111 133752.195 160192         uio   uio_read 11 168CCB88 1024
191 133752.323 160192         uio   uio_read 11 168CB358 1024
246 133752.451 160192         uio   uio_read 11 168CCB88 1024
301 133752.580 160192         uio   uio_read 11 168CB358 1024
368 133752.708 160192         uio   uio_read 11 168CCB88 1024
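
The timing in these two lists can be checked with a short Python sketch. It assumes the trace timestamps are in HHMMSS.mmm form and that the audio is 8 kHz, 8-bit mono (8000 bytes per second, the usual G.711 telephony rate); neither is stated explicitly in the trace. The helper names are made up for the example.

```python
from typing import List, Sequence


def parse_ms(stamp: str) -> int:
    """Convert an HHMMSS.mmm timestamp (assumed format) to milliseconds since midnight."""
    whole, millis = stamp.split(".")
    hours, minutes, seconds = int(whole[0:2]), int(whole[2:4]), int(whole[4:6])
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def gaps_ms(stamps: Sequence[str]) -> List[int]:
    """Milliseconds between consecutive trace entries."""
    times = [parse_ms(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def playback_ms(num_bytes: int, bytes_per_second: int = 8000) -> float:
    """How long a buffer lasts when played at the assumed G.711 rate."""
    return num_bytes * 1000 / bytes_per_second


reads = ["133752.067", "133752.195", "133752.323", "133752.451", "133752.580", "133752.708"]
print(gaps_ms(reads))      # [128, 128, 128, 129, 128]
print(playback_ms(1024))   # 128.0
print(playback_ms(1920))   # 240.0
```

Under these assumptions each 1024-byte read covers 128 ms of audio, which matches the steady spacing of the later uio_read entries, while each 1920-byte write carries 240 ms of audio and the writes arrive roughly every 30 ms. That is consistent with the queue growing and the streamed data arriving ahead of playback.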

 

The posted Recording 5.wav sound file sounds like there is some 'crackling' type sound on playback?

Please post the tts_11_1.wav file that has the saved sound data. It's in: C:\Program Files (x86)\VoiceGuide\temp\

 

 

To suppress "ecrec" traces the vgEngine log level in VG.INI has to be set to 7 or lower. 

To suppress "tts gcp audio_content" traces the vgEngine log level in VG.INI has to be set to 9 or lower. 

To suppress "cmdPlayBytes GCP_PlayBytes_Data" traces the vgEngine log level in VG.INI has to be set to 7 or lower. 
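
Put together, the three thresholds above mean a single level has to satisfy the strictest one. The Python sketch below only restates the numbers given in this post; the dictionary and function names are made up, and no VG.INI key name is implied.

```python
from typing import Iterable

# Highest vgEngine log level at which each trace type is no longer written,
# as listed in this post.
SUPPRESSED_AT_OR_BELOW = {
    "ecrec": 7,
    "tts gcp audio_content": 9,
    "cmdPlayBytes GCP_PlayBytes_Data": 7,
}


def highest_level_suppressing(trace_names: Iterable[str]) -> int:
    """Return the highest log level that still suppresses every named trace type."""
    return min(SUPPRESSED_AT_OR_BELOW[name] for name in trace_names)


print(highest_level_suppressing(SUPPRESSED_AT_OR_BELOW))  # 7
```

So a vgEngine log level of 7 or lower removes all three kinds of entries at once.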


tts_11_1.wav

 

The issue appears to be related only to streaming.

The WAV file generated during TTS synthesis has correct audio quality. When we save this temporary WAV file to our library and play it in a subsequent call, it sounds normal.

The degradation occurs only during streaming playback.

This suggests the problem is specific to the streaming mechanism, not the TTS generation itself.

Our audio format is A-Law (Poland).

 

