Paper: arXiv
Comparison among speech generated by STRAIGHT, WaveNet, WaveRNN and HiNet
A. Natural acoustic features as input for speaker slt.
Natural | STRAIGHT | WaveNet | WaveRNN | HiNet |
---|---|---|---|---|
Example 1 | ||||
Example 2 | ||||
Example 3 | ||||
Example 4 | ||||
Example 5 | ||||
B. Natural acoustic features as input for speaker bdl.
Natural | STRAIGHT | WaveNet | WaveRNN | HiNet |
---|---|---|---|---|
Example 1 | ||||
Example 2 | ||||
Example 3 | ||||
Example 4 | ||||
Example 5 | ||||
C. Predicted acoustic features as input for speaker slt.
STRAIGHT | WaveNet | WaveRNN | HiNet |
---|---|---|---|
Example 1 | |||
Example 2 | |||
Example 3 | |||
Example 4 | |||
Example 5 | |||
D. Predicted acoustic features as input for speaker bdl.
STRAIGHT | WaveNet | WaveRNN | HiNet |
---|---|---|---|
Example 1 | |||
Example 2 | |||
Example 3 | |||
Example 4 | |||
Example 5 | |||
Comparison among speech generated by HiNet, NSF, HiNet-S and HiNet-S-GAN
A. Natural acoustic features as input for speaker slt.
Natural | HiNet | NSF | HiNet-S | HiNet-S-GAN |
---|---|---|---|---|
Example 1 | ||||
Example 2 | ||||
Example 3 | ||||
Example 4 | ||||
Example 5 | ||||
B. Natural acoustic features as input for speaker bdl.
Natural | HiNet | NSF | HiNet-S | HiNet-S-GAN |
---|---|---|---|---|
Example 1 | ||||
Example 2 | ||||
Example 3 | ||||
Example 4 | ||||
Example 5 | ||||
1. Impact of the amount of training data on the HiNet vocoder
Comparison between speech generated by HiNet-S-GAN for CN-S and CN-L
Natural acoustic features as input.
Natural | HiNet-S-GAN (CN-S) | HiNet-S-GAN (CN-L) |
---|---|---|
Example 1 | ||
Example 2 | ||
Example 3 | ||
Example 4 | ||
Example 5 | ||
2. Comparison between the GAN-based ASP and the conventional ASP
Comparison between speech generated by HiNet-S-GAN and STR-ASP+PSP-S-GAN for slt, CN-S and CN-L
A. Natural acoustic features as input for speaker slt.
Natural | HiNet-S-GAN | STR-ASP+PSP-S-GAN |
---|---|---|
Example 1 | ||
Example 2 | ||
Example 3 | ||
Example 4 | ||
Example 5 | ||
B. Natural acoustic features as input for speaker CN-S.
Natural | HiNet-S-GAN | STR-ASP+PSP-S-GAN |
---|---|---|
Example 1 | ||
Example 2 | ||
Example 3 | ||
Example 4 | ||
Example 5 | ||
C. Natural acoustic features as input for speaker CN-L.
Natural | HiNet-S-GAN | STR-ASP+PSP-S-GAN |
---|---|---|
Example 1 | ||
Example 2 | ||
Example 3 | ||
Example 4 | ||
Example 5 | ||
3. Effects of GMN
Comparison between speech generated by HiNet-S and HiNet-S-woGMN
Natural acoustic features as input for speaker slt.
Natural | HiNet-S | HiNet-S-woGMN |
---|---|---|
Example 1 | ||
Example 2 | ||
Example 3 | ||
Example 4 | ||
Example 5 | ||
4. Effects of pre-calculated initial phase
Comparison between speech generated by HiNet-woPCIP and HiNet
Natural acoustic features as input for speaker slt.
Natural | HiNet-woPCIP | HiNet |
---|---|---|
Example 1 | ||
Example 2 | ||
Example 3 | ||
Example 4 | ||
Example 5 | ||
5. Effects of components in loss functions
Comparison among HiNet-L1, HiNet-L2, HiNet-L3 and HiNet
Natural acoustic features as input for speaker slt.
Natural | HiNet-L1 | HiNet-L2 | HiNet-L3 | HiNet |
---|---|---|---|---|
Example 1 | ||||
Example 2 | ||||
Example 3 | ||||
Example 4 | ||||
Example 5 | ||||