Are you using traditional analog lines to place the outgoing calls?
On analog lines the call is detected as being answered only after the call recipient says something (e.g. "Hello").
On VoIP and T1/E1 systems you would know the exact time the handset is picked up (or the 'answer call' button is pressed/swiped, etc.). So on VoIP and T1/E1 systems it is possible to start playing the sound file immediately, while the recipient's telephone handset is still being lifted to their ear and before they have had a chance to say anything.
Distinguishing between an answering machine and a live answer requires that Dialogic/VoiceGuide first wait for something to be said/played by the call recipient.
Once the call recipient says something, Dialogic/VoiceGuide will take about 1 second to determine whether what was said came from a live person or from an answering machine.
If you are seeing 4-6 second delays then most likely the call recipient did not actually say anything after answering the call, and Dialogic/VoiceGuide is still waiting to hear something. Eventually Dialogic/VoiceGuide realises that it can no longer hear the called number ringing, so the call must have been answered (by someone who did not say anything when answering), and the outgoing script is then started.
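
To make the timing easier to reason about, here is a rough sketch in Python of the analog-line behaviour described above. It is only a toy model, not VoiceGuide or Dialogic code: the function name and both constants are made up for illustration. The 1 second classification time comes from the description above, and the 5 second "ringing has stopped" timeout is just an assumed value picked from the middle of the 4-6 second delays you reported.

```python
from typing import Optional

# Assumed values for illustration only; these are not real VoiceGuide settings.
CLASSIFY_S = 1.0          # approx. time to decide live person vs answering machine
RINGBACK_TIMEOUT_S = 5.0  # assumed time to notice that ringing is no longer heard


def analog_script_start_delay(first_speech_s: Optional[float],
                              detect_machine: bool = True) -> float:
    """Estimate seconds from pickup to script start on an analog line.

    first_speech_s: seconds after pickup at which the recipient first speaks,
                    or None if they never say anything.
    detect_machine: whether live/answering machine detection is in use.
    """
    # Nobody spoke (or spoke too late): the call is only treated as answered
    # once the absence of ringing is noticed.
    if first_speech_s is None or first_speech_s >= RINGBACK_TIMEOUT_S:
        return RINGBACK_TIMEOUT_S

    # Speech was heard: the call counts as answered at that moment, plus the
    # classification time if live/machine detection is used.
    return first_speech_s + (CLASSIFY_S if detect_machine else 0.0)


if __name__ == "__main__":
    print(analog_script_start_delay(0.5))   # recipient says "Hello" after 0.5 s
    print(analog_script_start_delay(None))  # recipient stays silent
```

Under these assumptions a recipient who says "Hello" half a second after pickup hears the script about 1.5 seconds after answering, while a silent recipient waits for the full timeout, which matches the longer delays you are seeing.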