The tutor's generated response.
Optional separate text for on-screen display (may differ from the spoken text).
Estimated duration in seconds, if the model provides one.
The text to be synthesised and spoken aloud.
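The fields described above could be grouped into a single record along these lines. This is only a sketch: the class name `TutorResponse`, the field names, and the fallback from display text to spoken text are all assumptions made for illustration, not names taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TutorResponse:
    """The tutor's generated response (hypothetical shape)."""

    # The text to be synthesised and spoken aloud.
    spoken_text: str
    # Optional separate text for on-screen display.
    display_text: Optional[str] = None
    # Estimated duration in seconds, if the model provides one.
    estimated_duration_s: Optional[float] = None

    def text_for_display(self) -> str:
        # Assumed behaviour: fall back to the spoken text when no
        # separate display text was supplied.
        if self.display_text is not None:
            return self.display_text
        return self.spoken_text
```

With this shape, a caller that only sets `spoken_text` still gets something to show on screen, while a response that supplies `display_text` can show a different string from the one it speaks.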