Publications

Journal Paper

  1. Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence,''
    APSIPA Transactions on Signal and Information Processing, vol.13, no.1. (2024) [official]
  2. Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation,''
    Speech Communication, vol.132, pp.132-145. (2021) [official] [code]
  3. Shinnosuke Takamichi, Ryosuke Sonobe, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari,
    ``JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research,''
    Acoustical Science and Technology, vol.41, no.5, pp.761-768. (2020) [official]
  4. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Generative moment matching network-based neural double-tracking for synthesized and natural singing voices,''
    IEICE Transactions on Information and Systems, vol.E103.D, pp.639-647. (2020) [official]
  5. Tomoki Koriyama, Takao Kobayashi,
    ``Statistical Parametric Speech Synthesis Using Deep Gaussian Processes,''
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, no.5, pp.948-959. (May 2019) [official] [demo]
  6. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``GPR-based Thai speech synthesis using multi-level duration prediction,''
    Speech Communication, vol.99, pp.114-123. (May 2018) [official]
  7. Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi,
    ``Speaker Adaptation Using Shared Context Clustering for Cross-lingual Speech Synthesis,''
    The IEICE Transactions on Information and Systems (Japanese Edition), vol.J100-D, no.3, pp.385-393. (Mar. 2017) (In Japanese) [official]
  8. Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi,
    ``HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling,''
    Computer Speech & Language, vol.34, no.1, pp.308-322. (Nov. 2015) [official]
  9. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Statistical Parametric Speech Synthesis Based on Gaussian Process Regression,''
    IEEE Journal of Selected Topics in Signal Processing, vol.8, no.2, pp.173-183. (Apr. 2014) [PDF] [official] [demo]
  10. Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka,
    ``Prosodic Variation Enhancement Using Unsupervised Context Labeling for HMM-based Expressive Speech Synthesis,''
    Speech Communication, vol.57, no.3, pp.144-154. (Feb. 2014) [official]
  11. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Extension of Context Set for Generating Diverse Prosodic Variations in HMM-Based Spontaneous Conversational Speech Synthesis,''
    The IEICE Transactions on Information and Systems (Japanese Edition), vol.J95-D, no.3, pp.597-607. (Mar. 2012) (In Japanese) [PDF] [official]

International Conference

  1. Koichi Miyazaki, Masato Murata, Tomoki Koriyama,
    ``An Attribute Interpolation Method in Speech Synthesis by Model Merging,''
    Proc. Interspeech 2024, pp.xxx-xxx. (Sept. 2024) [arXiv]
  2. Tomoki Koriyama,
    ``VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features,''
    Proc. Interspeech 2024, pp.xxx-xxx. (Sept. 2024) [arXiv]
  3. Dong Yang, Tomoki Koriyama, Yuki Saito,
    ``Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech,''
    Proc. Interspeech 2024, pp.xxx-xxx. (Sept. 2024) [arXiv]
  4. Koichi Miyazaki, Masato Murata, Tomoki Koriyama,
    ``Structured State Space Decoder for Speech Recognition and Synthesis,''
    Proc. ICASSP 2023. (May 2023) [official] [arXiv]
  5. Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari,
    ``Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech,''
    Proc. ICASSP 2023. (May 2023) [official] [arXiv]
  6. Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari,
    ``UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022,''
    Proc. Interspeech 2022, pp.4521-4525. (Sept. 2022) [official]
  7. Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
    ``Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis,''
    Proc. Interspeech 2022, pp.4551-4555. (Sept. 2022) [official]
  8. Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari,
    ``Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors,''
    Proc. APSIPA ASC. (Dec. 2021) [official]
  9. Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer,''
    Proc. Interspeech 2021, pp.121-125. (Aug. 2021) [official]
  10. Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis,''
    Proc. Interspeech 2021, pp.1614-1618. (Aug. 2021) [official]
  11. Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator,''
    Proc. Interspeech 2021, pp.2192-2196. (Aug. 2021) [official]
  12. Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari,
    ``Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder,''
    Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), pp.189-194. (Aug. 2021) [official]
  13. Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
    ``Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings,''
    Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), pp.211-215. (Aug. 2021) [official]
  14. Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes,''
    Proc. Interspeech 2020, pp.2032-2036. (Oct. 2020) [official]
  15. Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
    ``Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis,''
    Proc. Interspeech 2020, pp.3201-3205. (Oct. 2020) [official]
  16. Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space,''
    Proc. Interspeech 2020, pp.2947-2951. (Oct. 2020) [official]
  17. Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
    ``DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus,''
    Proc. 12th edition of the Language Resources and Evaluation Conference (LREC 2020), pp.6438-6443. (May 2020) [official]
  18. Tomoki Koriyama, Hiroshi Saruwatari,
    ``Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit,''
    Proc. 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), pp.7249-7253. (May 2020) (In press) [official] [arXiv] [demo] [slide]
  19. Tomoki Koriyama, Shinnosuke Takamichi, Takao Kobayashi,
    ``Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis,''
    Proc. The 10th ISCA Speech Synthesis Workshop (SSW10), pp.149-154. (Sept. 2019) [official] [slide]
  20. Tomoki Koriyama, Takao Kobayashi,
    ``Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model,''
    Proc. Interspeech 2019, pp.4450-4454. (Sept. 2019) [official] [slide]
  21. Tomoki Koriyama, Takao Kobayashi,
    ``A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes,''
    Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.4785-4789. (May 2019) [official] [demo] [slide] [PDF (preprint, copyright©2019 IEEE)]
  22. Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Generative Moment Matching Network-based Random Modulation Post-filter For DNN-based Singing Voice Synthesis And Neural Double-tracking,''
    Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.1975-1979. (May 2019) [official]
  23. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features,''
    Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:47 (4 pages). (Dec. 2017)
  24. Nattapong Kurpukdee, Tomoki Koriyama, Takao Kobayashi, Sawit Kasuriya, Chai Wutiwiwatchai, Poonlap Lamsrichan,
    ``Speech emotion recognition using convolutional long short-term memory neural network and support vector machines,''
    Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:223 (6 pages). (Dec. 2017)
  25. Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
    ``Sampling-Based Speech Parameter Generation Using Moment-Matching Networks,''
    Proc. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), pp.3961-3965. (Aug. 2017) [official]
  26. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``Duration Prediction Using Multiple Gaussian Process Experts For GPR-based Speech Synthesis,''
    Proc. 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), pp.5495-5498. (Mar. 2017) [official]
  27. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis,''
    Proc. 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp.1517-1521. (Sept. 2016) [official]
  28. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``Tone modeling using Gaussian process latent variable model for statistical speech synthesis,''
    Proc. Speech Prosody 2016, pp.1014-1018. (May 2016) [official]
  29. Tomoki Koriyama, Syohei Oshio, Takao Kobayashi,
    ``A Speaker Adaptation Technique For Gaussian Process Regression Based Speech Synthesis Using Feature Space Transform,''
    Proc. 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), pp.5610-5614. (Mar. 2016) [PDF] [official] [demo]
  30. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``Duration prediction using multi-level model for GPR-based speech synthesis,''
    Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp.1591-1595. (Sept. 2015) [official]
  31. Tomoki Koriyama, Takao Kobayashi,
    ``A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data,''
    Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp.3496-3500. (Sept. 2015) [official]
  32. Tomoki Koriyama, Takao Kobayashi,
    ``Prosody Generation Using Frame-based Gaussian Process Regression and Classification for Statistical Parametric Speech Synthesis,''
    Proc. ICASSP 2015, pp.4929-4933. (Apr. 2015) [PDF] [official]
  33. Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
    ``HMM-based Thai speech synthesis using unsupervised stress context labeling,''
    Proc. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:1138. (Dec. 2014)
  34. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Parametric Speech Synthesis Using Local and Global Sparse Gaussian Processes,''
    Proc. The 24th IEEE International Workshop on Machine Learning for Signal Processing. (Sept. 2014) [PDF] [official]
  35. Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi,
    ``Accent Type and Phrase Boundary Estimation Using Acoustic and Language Models for Automatic Prosodic Labeling,''
    Proc. INTERSPEECH 2014, pp.2337-2341. (Sept. 2014) [PDF] [official]
  36. Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi,
    ``Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis,''
    Proc. INTERSPEECH 2014, pp.770-774. (Sept. 2014) [official]
  37. Decha Moungsri, Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Tone Modeling Using Stress Information for HMM-Based Thai Speech Synthesis,''
    Proc. Speech Prosody 2014, pp.1057-1061. (May 2014)
  38. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Parametric Speech Synthesis Based on Gaussian Process Regression Using Global Variance and Hyperparameter Optimization,''
    Proc. ICASSP 2014, pp.3862-3866. (May 2014) [PDF]
  39. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Statistical nonparametric speech synthesis using sparse Gaussian processes,''
    Proc. INTERSPEECH 2013, pp.1072-1076. (Aug. 2013) [PDF]
  40. Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi,
    ``A Style Control Technique for Singing Voice Synthesis Based on Multiple-regression HSMM,''
    Proc. INTERSPEECH 2013, pp.378-382. (Aug. 2013)
  41. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Frame-level Acoustic Modeling Based on Gaussian Process Regression for Statistical Nonparametric Speech Synthesis,''
    Proc. ICASSP 2013, pp.8007-8010. (May 2013) [PDF]
  42. Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka,
    ``HMM-based Expressive Speech Synthesis Based on Phrase-level F0 Context Labeling,''
    Proc. ICASSP 2013, pp.7859-7863. (May 2013)
  43. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Discontinuous Observation HMM for Prosodic-event-based F0 Generation,''
    Proc. INTERSPEECH 2012, pp.462-465. (Sept. 2012) [PDF]
  44. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``An F0 Modeling Technique Based on Prosodic Events for Spontaneous Speech Synthesis,''
    Proc. ICASSP 2012, pp.4589-4593. (May 2012) [PDF]
  45. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``On the Use of Extended Context for HMM-based Spontaneous Conversational Speech Synthesis,''
    Proc. INTERSPEECH 2011, pp.2657-2660. (Aug. 2011) [PDF]
  46. Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
    ``Conversational Spontaneous Speech Synthesis Using Average Voice Model,''
    Proc. INTERSPEECH 2010, pp.853-856. (Sept. 2010)

Review

  1. Tomoki Koriyama,
    ``An introduction of Gaussian processes and deep Gaussian processes and their applications to speech processing,''
    Acoustical Science and Technology, vol.41, no.2, pp.457-464. (2020) (Invited Review) [official]