Publications
Journal Paper
- Xuan Luo, Shinnosuke Takamichi, Yuki Saito, Tomoki Koriyama, Hiroshi Saruwatari,
``Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence,'' APSIPA Transactions on Signal and Information Processing, vol.13, no.1. (2024) [official]
- Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari,
``Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation,'' Speech Communication, vol.132, pp.132-145. (2021) [official] [code]
- Shinnosuke Takamichi, Ryosuke Sonobe, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari,
``JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research,'' Acoustical Science and Technology, vol.41, no.5, pp.761-768. (2020) [official]
- Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Generative moment matching network-based neural double-tracking for synthesized and natural singing voices,'' IEICE Transactions on Information and Systems, vol.E103.D, pp.639-647. (2020) [official]
- Tomoki Koriyama, Takao Kobayashi,
``Statistical Parametric Speech Synthesis Using Deep Gaussian Processes,'' IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, no.5, pp.948-959. (May 2019) [official] [demo]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``GPR-based Thai speech synthesis using multi-level duration prediction,'' Speech Communication, vol.99, pp.114-123. (May 2018) [official]
- Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi,
``Speaker Adaptation Using Shared Context Clustering for Cross-lingual Speech Synthesis,'' The IEICE Transactions on Information and Systems (Japanese Edition), vol.J100-D, no.3, pp.385-393. (Mar. 2017) (In Japanese) [official]
- Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi,
``HMM-based expressive singing voice synthesis with singing style control and robust pitch modeling,'' Computer Speech & Language, vol.34, no.1, pp.308-322. (Nov. 2015) [official]
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Statistical Parametric Speech Synthesis Based on Gaussian Process Regression,'' IEEE Journal of Selected Topics in Signal Processing, vol.8, no.2, pp.173-183. (Apr. 2014) [PDF] [official] [demo]
- Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka,
``Prosodic Variation Enhancement Using Unsupervised Context Labeling for HMM-based Expressive Speech Synthesis,'' Speech Communication, vol.57, no.3, pp.144-154. (Feb. 2014) [official]
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Extension of Context Set for Generating Diverse Prosodic Variations in HMM-Based Spontaneous Conversational Speech Synthesis,'' The IEICE Transactions on Information and Systems (Japanese Edition), vol.J95-D, no.3, pp.597-607. (Mar. 2012) (In Japanese) [PDF] [official]
International Conference
- Koichi Miyazaki, Masato Murata, Tomoki Koriyama,
``An Attribute Interpolation Method in Speech Synthesis by Model Merging,'' Proc. Interspeech 2024, pp.xxx-xxx. (Sept. 2024) [arXiv]
- Tomoki Koriyama,
``VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features,'' Proc. Interspeech 2024, pp.xxx-xxx. (Sept. 2024) [arXiv]
- Dong Yang, Tomoki Koriyama, Yuki Saito,
``Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech,'' Proc. Interspeech 2024, pp.xxx-xxx. (Sept. 2024) [arXiv]
- Koichi Miyazaki, Masato Murata, Tomoki Koriyama,
``Structured State Space Decoder for Speech Recognition and Synthesis,'' Proc. ICASSP 2023. (May 2023) [official] [arXiv]
- Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari,
``Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech,'' Proc. ICASSP 2023. (May 2023) [official] [arXiv]
- Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari,
``UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022,'' Proc. Interspeech 2022, pp.4521-4525. (Sept. 2022) [official]
- Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis,'' Proc. Interspeech 2022, pp.4551-4555. (Sept. 2022) [official]
- Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari,
``Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors,'' Proc. APSIPA ASC. (Dec. 2021) [official]
- Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari,
``Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer,'' Proc. Interspeech 2021, pp.121-125. (Aug. 2021) [official]
- Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis,'' Proc. Interspeech 2021, pp.1614-1618. (Aug. 2021) [official]
- Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari,
``Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator,'' Proc. Interspeech 2021, pp.2192-2196. (Aug. 2021) [official]
- Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari,
``Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder,'' Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), pp.189-194. (Aug. 2021) [official]
- Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings,'' Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), pp.211-215. (Aug. 2021) [official]
- Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari,
``Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes,'' Proc. Interspeech 2020, pp.2032-2036. (Oct. 2020) [official]
- Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis,'' Proc. Interspeech 2020, pp.3201-3205. (Oct. 2020) [official]
- Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space,'' Proc. Interspeech 2020, pp.2947-2951. (Oct. 2020) [official]
- Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari,
``DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus,'' Proc. 12th edition of the Language Resources and Evaluation Conference (LREC 2020), pp.6438-6443. (May 2020) [official]
- Tomoki Koriyama, Hiroshi Saruwatari,
``Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit,'' Proc. 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), pp.7249-7253. (May 2020) (In press) [official] [arXiv] [demo] [slide]
- Tomoki Koriyama, Shinnosuke Takamichi, Takao Kobayashi,
``Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis,'' Proc. The 10th ISCA Speech Synthesis Workshop (SSW10), pp.149-154. (Sept. 2019) [official] [slide]
- Tomoki Koriyama, Takao Kobayashi,
``Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model,'' Proc. Interspeech 2019, pp.4450-4454. (Sept. 2019) [official] [slide]
- Tomoki Koriyama, Takao Kobayashi,
``A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes,'' Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.4785-4789. (May 2019) [official] [demo] [slide] [PDF (preprint, copyright©2019 IEEE)]
- Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Generative Moment Matching Network-based Random Modulation Post-filter For DNN-based Singing Voice Synthesis And Neural Double-tracking,'' Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.1975-1979. (May 2019) [official]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features,'' Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:47 (4 pages). (Dec. 2017)
- Nattapong Kurpukdee, Tomoki Koriyama, Takao Kobayashi, Sawit Kasuriya, Chai Wutiwiwatchai, Poonlap Lamsrichan,
``Speech emotion recognition using convolutional long short-term memory neural network and support vector machines,'' Proc. 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:223 (6 pages). (Dec. 2017)
- Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari,
``Sampling-Based Speech Parameter Generation Using Moment-Matching Networks,'' Proc. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), pp.3961-3965. (Aug. 2017) [official]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Duration Prediction Using Multiple Gaussian Process Experts For GPR-based Speech Synthesis,'' Proc. 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), pp.5495-5498. (Mar. 2017) [official]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis,'' Proc. 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp.1517-1521. (Sept. 2016) [official]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Tone modeling using Gaussian process latent variable model for statistical speech synthesis,'' Proc. Speech Prosody 2016, pp.1014-1018. (May 2016) [official]
- Tomoki Koriyama, Syohei Oshio, Takao Kobayashi,
``A Speaker Adaptation Technique For Gaussian Process Regression Based Speech Synthesis Using Feature Space Transform,'' Proc. 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), pp.5610-5614. (Mar. 2016) [PDF] [official] [demo]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``Duration prediction using multi-level model for GPR-based speech synthesis,'' Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp.1591-1595. (Sept. 2015) [official]
- Tomoki Koriyama, Takao Kobayashi,
``A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data,'' Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), pp.3496-3500. (Sept. 2015) [official]
- Tomoki Koriyama, Takao Kobayashi,
``Prosody Generation Using Frame-based Gaussian Process Regression and Classification for Statistical Parametric Speech Synthesis,'' Proc. ICASSP 2015, pp.4929-4933. (Apr. 2015) [PDF] [official]
- Decha Moungsri, Tomoki Koriyama, Takao Kobayashi,
``HMM-based Thai speech synthesis using unsupervised stress context labeling,'' Proc. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, PID:1138. (Dec. 2014)
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Parametric Speech Synthesis Using Local and Global Sparse Gaussian Processes,'' Proc. The 24th IEEE International Workshop on Machine Learning for Signal Processing. (Sept. 2014) [PDF] [official]
- Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi,
``Accent Type and Phrase Boundary Estimation Using Acoustic and Language Models for Automatic Prosodic Labeling,'' Proc. INTERSPEECH 2014, pp.2337-2341. (Sept. 2014) [PDF] [official]
- Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi,
``Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis,'' Proc. INTERSPEECH 2014, pp.770-774. (Sept. 2014) [official]
- Decha Moungsri, Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Tone Modeling Using Stress Information for HMM-Based Thai Speech Synthesis,'' Proc. Speech Prosody 2014, pp.1057-1061. (May 2014)
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Parametric Speech Synthesis Based on Gaussian Process Regression Using Global Variance and Hyperparameter Optimization,'' Proc. ICASSP 2014, pp.3862-3866. (May 2014) [PDF]
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Statistical nonparametric speech synthesis using sparse Gaussian processes,'' Proc. INTERSPEECH 2013, pp.1072-1076. (Aug. 2013) [PDF]
- Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi,
``A Style Control Technique for Singing Voice Synthesis Based on Multiple-regression HSMM,'' Proc. INTERSPEECH 2013, pp.378-382. (Aug. 2013)
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Frame-level Acoustic Modeling Based on Gaussian Process Regression for Statistical Nonparametric Speech Synthesis,'' Proc. ICASSP 2013, pp.8007-8010. (May 2013) [PDF]
- Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka,
``HMM-based Expressive Speech Synthesis Based on Phrase-level F0 Context Labeling,'' Proc. ICASSP 2013, pp.7859-7863. (May 2013)
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Discontinuous Observation HMM for Prosodic-event-based F0 Generation,'' Proc. INTERSPEECH 2012, pp.462-465. (Sept. 2012) [PDF]
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``An F0 Modeling Technique Based on Prosodic Events for Spontaneous Speech Synthesis,'' Proc. ICASSP 2012, pp.4589-4593. (May 2012) [PDF]
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``On the Use of Extended Context for HMM-based Spontaneous Conversational Speech Synthesis,'' Proc. INTERSPEECH 2011, pp.2657-2660. (Aug. 2011) [PDF]
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi,
``Conversational Spontaneous Speech Synthesis Using Average Voice Model,'' Proc. INTERSPEECH 2010, pp.853-856. (Sept. 2010)
Review
- Tomoki Koriyama,
``An introduction of Gaussian processes and deep Gaussian processes and their applications to speech processing,'' Acoustical Science and Technology, vol.41, no.2, pp.457-464. (2020) (Invited Review) [official]
Copyright (C) 2013 Tomoki Koriyama All Rights Reserved.