I am not sure how to do that. I think it would really require upstream work, but any pointers to similar handling are welcome.
Note that it does require downloading voice model data during setup. We will also be testing it as part of our I18N Test Week.