Publications

Journal Paper

Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, Hiroshi Saruwatari,
``Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence,''
APSIPA Transactions on Signal and Information Processing, vol.13, no.1. (2024) [official]
Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari,
``Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation,''
Speech Communication, vol.132, pp.132-145. (2021) [official] [code]
Shinnosuke Takamichi, Ryosuke Sonobe, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari,
``JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research,''
Acoustical Science and Technology, vol.41, no.5, pp.761-768. (2020) [official]
Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Generative moment matching network-based neural double-tracking for synthesized and natural singing voices,''
IEICE Transactions on Information and Systems, vol.E103.D, pp.639-647. (2020) [official]
Tomoki Koriyama, Takao Kobayashi,
``Statistical Parametric Speech Synthesis Using Deep Gaussian Processes,''
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, no.5, pp.948-959. (May 2019) [official] [demo]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``GPR-based Thai speech synthesis using multi-level duration prediction,''
Speech Communication, vol.99, pp.114-123. (May 2018) [official]
Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi,
``Speaker Adaptation Using Shared Context Clustering for Cross-lingual Speech Synthesis,''
The IEICE Transactions on Information and Systems (Japanese Edition), vol.J100-D, no.3, pp.385-393. (Mar. 2017) (In Japanese) [official]
Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi,
``HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling,''
Computer Speech & Language, vol.34, no.1, pp.308-322. (Nov. 2015) [official]
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Statistical Parametric Speech Synthesis Based on Gaussian Process Regression,''
IEEE Journal of Selected Topics in Signal Processing, vol.8, no.2, pp.173-183. (Apr. 2014) [PDF] [official] [demo]
Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka,
``Prosodic Variation Enhancement Using Unsupervised Context Labeling for HMM-based Expressive Speech Synthesis,''
Speech Communication, vol.57, no.3, pp.144-154. (Feb. 2014) [official]
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Extension of Context Set for Generating Diverse Prosodic Variations in HMM-Based Spontaneous Conversational Speech Synthesis,''
The IEICE Transactions on Information and Systems (Japanese Edition), vol.J95-D, no.3, pp.597-607. (Mar. 2012) (In Japanese) [PDF] [official]

International Conference

Tomoki Koriyama,
``Prosody Labeling with Phoneme-BERT and Speech Foundation Models,''
Proc. 13th edition of the Speech Synthesis Workshop, pp.40-47. (Aug. 2025) [official] [arXiv]
Masato Murata, Koichi Miyazaki, Tomoki Koriyama, Tomoki Toda,
``Eigenvoice Synthesis based on Model Editing for Speaker Generation,''
Proc. Interspeech 2025, pp.5523-5527. (Aug. 2025) [official] [arXiv]
Masato Murata, Koichi Miyazaki, Tomoki Koriyama,
``Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control,''
Proc. Interspeech 2025, pp.4383-4387. (Aug. 2025) [official] [arXiv]
Masato Murata, Koichi Miyazaki, Tomoki Koriyama,
``An Attribute Interpolation Method in Speech Synthesis by Model Merging,''
Proc. Interspeech 2024, pp.3380-3384. (Sept. 2024) [official] [arXiv]
Tomoki Koriyama,
``VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features,''
Proc. Interspeech 2024, pp.3814-3818. (Sept. 2024) [official] [arXiv]
Dong Yang, Tomoki Koriyama, Yuki Saito,
``Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech,''
Proc. Interspeech 2024, pp.4928-4932. (Sept. 2024) [official] [arXiv]
Koichi Miyazaki, Masato Murata, Tomoki Koriyama,
``Structured State Space Decoder for Speech Recognition and Synthesis,''
Proc. ICASSP 2023. (May 2023) [official] [arXiv]
Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari,
``Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech,''
Proc. ICASSP 2023. (May 2023) [official] [arXiv]
Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari,
``UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022,''
Proc. Interspeech 2022, pp.4521-4525. (Sept. 2022) [official]
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis,''
Proc. Interspeech 2022, pp.4551-4555. (Sept. 2022) [official]
Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari,
``Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors,''
Proc. APSIPA ASC. (Dec. 2021) [official]
Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari,
``Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer,''
Proc. Interspeech 2021, pp.121-125. (Aug. 2021) [official]
Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis,''
Proc. Interspeech 2021, pp.1614-1618. (Aug. 2021) [official]
Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari,
``Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator,''
Proc. Interspeech 2021, pp.2192-2196. (Aug. 2021) [official]
Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari,
``Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder,''
Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), pp.189-194. (Aug. 2021) [official]
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings,''
Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), pp.211-215. (Aug. 2021) [official]
Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari,
``Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes,''
Proc. Interspeech 2020, pp.2032-2036. (Oct. 2020) [official]
Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis,''
Proc. Interspeech 2020, pp.3201-3205. (Oct. 2020) [official]
Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space,''
Proc. Interspeech 2020, pp.2947-2951. (Oct. 2020) [official]
Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus,''
Proc. 12th edition of the Language Resources and Evaluation Conference (LREC 2020), pp.6438-6443. (May 2020) [official]
Tomoki Koriyama, Hiroshi Saruwatari,
``Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit,''
Proc. 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), pp.7249-7253. (May 2020) [official] [arXiv] [demo] [slide]
Tomoki Koriyama, Shinnosuke Takamichi, Takao Kobayashi,
``Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis,''
Proc. The 10th ISCA Speech Synthesis Workshop (SSW10), pp.149-154. (Sept. 2019) [official] [slide]
Tomoki Koriyama, Takao Kobayashi,
``Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model,''
Proc. Interspeech 2019, pp.4450-4454. (Sept. 2019) [official] [slide]
Tomoki Koriyama, Takao Kobayashi,
``A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes,''
Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.4785-4789. (May 2019) [official] [demo] [slide] [PDF (preprint, copyright©2019 IEEE)]
Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Generative Moment Matching Network-based Random Modulation Post-filter For DNN-based Singing Voice Synthesis And Neural Double-tracking,''
Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.1975-1979. (May 2019) [official]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features,''
Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:47 (4 pages). (Dec. 2017)
Nattapong Kurpukdee, Tomoki Koriyama, Takao Kobayashi, Sawit Kasuriya, Chai Wutiwiwatchai, Poonlap Lamsrichan,
``Speech emotion recognition using convolutional long short-term memory neural network and support vector machines,''
Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:223 (6 pages). (Dec. 2017)
Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Sampling-Based Speech Parameter Generation Using Moment-Matching Networks,''
Proc. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), pp.3961-3965. (Aug. 2017) [official]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Duration Prediction Using Multiple Gaussian Process Experts For GPR-based Speech Synthesis,''
Proc. 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), pp.5495-5498. (Mar. 2017) [official]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis,''
Proc. 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp.1517-1521. (Sept. 2016) [official]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Tone modeling using Gaussian process latent variable model for statistical speech synthesis,''
Proc. Speech Prosody 2016, pp.1014-1018. (May 2016) [official]
Tomoki Koriyama, Syohei Oshio, Takao Kobayashi,
``A Speaker Adaptation Technique For Gaussian Process Regression Based Speech Synthesis Using Feature Space Transform,''
Proc. 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), pp.5610-5614. (Mar. 2016) [PDF] [official] [demo]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Duration prediction using multi-level model for GPR-based speech synthesis,''
Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp.1591-1595. (Sept. 2015) [official]
Tomoki Koriyama, Takao Kobayashi,
``A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data,''
Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp.3496-3500. (Sept. 2015) [official]
Tomoki Koriyama, Takao Kobayashi,
``Prosody Generation Using Frame-based Gaussian Process Regression and Classification for Statistical Parametric Speech Synthesis,''
Proc. ICASSP 2015, pp.4929-4933. (Apr. 2015) [PDF] [official]
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``HMM-based Thai speech synthesis using unsupervised stress context labeling,''
Proc. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:1138. (Dec. 2014)
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Parametric Speech Synthesis Using Local and Global Sparse Gaussian Processes,''
Proc. The 24th IEEE International Workshop on Machine Learning for Signal Processing. (Sept. 2014) [PDF] [official]
Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi,
``Accent Type and Phrase Boundary Estimation Using Acoustic and Language Models for Automatic Prosodic Labeling,''
Proc. INTERSPEECH 2014, pp.2337-2341. (Sept. 2014) [PDF] [official]
Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi,
``Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis,''
Proc. INTERSPEECH 2014, pp.770-774. (Sept. 2014) [official]
Decha Moungsri, Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Tone Modeling Using Stress Information for HMM-Based Thai Speech Synthesis,''
Proc. Speech Prosody 2014, pp.1057-1061. (May 2014)
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Parametric Speech Synthesis Based on Gaussian Process Regression Using Global Variance and Hyperparameter Optimization,''
Proc. ICASSP 2014, pp.3862-3866. (May 2014) [PDF]
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Statistical nonparametric speech synthesis using sparse Gaussian processes,''
Proc. INTERSPEECH 2013, pp.1072-1076. (Aug. 2013) [PDF]
Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi,
``A Style Control Technique for Singing Voice Synthesis Based on Multiple-regression HSMM,''
Proc. INTERSPEECH 2013, pp.378-382. (Aug. 2013)
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Frame-level Acoustic Modeling Based on Gaussian Process Regression for Statistical Nonparametric Speech Synthesis,''
Proc. ICASSP 2013, pp.8007-8010. (May 2013) [PDF]
Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka,
``HMM-based Expressive Speech Synthesis Based on Phrase-level F0 Context Labeling,''
Proc. ICASSP 2013, pp.7859-7863. (May 2013)
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Discontinuous Observation HMM for Prosodic-event-based F0 Generation,''
Proc. INTERSPEECH 2012, pp.462-465. (Sept. 2012) [PDF]
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``An F0 Modeling Technique Based on Prosodic Events for Spontaneous Speech Synthesis,''
Proc. ICASSP 2012, pp.4589-4593. (May 2012) [PDF]
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``On the Use of Extended Context for HMM-based Spontaneous Conversational Speech Synthesis,''
Proc. INTERSPEECH 2011, pp.2657-2660. (Aug. 2011) [PDF]
Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Conversational Spontaneous Speech Synthesis Using Average Voice Model,''
Proc. INTERSPEECH 2010, pp.853-856. (Sept. 2010)

Review

Tomoki Koriyama,
``An introduction of Gaussian processes and deep Gaussian processes and their applications to speech processing,''
Acoustical Science and Technology, vol.41, no.2, pp.457-464. (2020) (Invited Review) [official]