FVTTS: Face Based Voice Synthesis for Text-to-Speech
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Minyoung | - |
dc.contributor.author | Park, Eunil | - |
dc.contributor.author | Hong, Sungeun | - |
dc.date.accessioned | 2025-01-21T01:00:20Z | - |
dc.date.available | 2025-01-21T01:00:20Z | - |
dc.date.issued | 2024 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarx.skku.edu/handle/2021.sw.skku/119906 | - |
dc.description.abstract | A face expresses individual identity and is used in studies such as identification, authentication, and personalization. A voice likewise expresses individual identity, and personalized voice synthesis conditioned on a reference voice is an active research area. However, voice-based methods are limited by their dependence on voice samples. We propose Face-based Voice synthesis for Text-To-Speech (FVTTS), which synthesizes voice from face images, which are more expressive of personal identity than voice samples. A major challenge for face-based TTS methods is extracting, from a face image, distinct features that are strongly related to voice. Our face encoder tackles this by integrating global facial attributes with voice-related features to represent personalized characteristics. FVTTS shows superiority on various metrics and adaptability across data domains, establishing a new standard in face-based TTS and leading the way in personalized voice synthesis. © 2024 International Speech Communication Association. All rights reserved. | - |
dc.format.extent | 5 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | International Speech Communication Association | - |
dc.title | FVTTS: Face Based Voice Synthesis for Text-to-Speech | - |
dc.type | Article | - |
dc.identifier.doi | 10.21437/Interspeech.2024-140 | - |
dc.identifier.scopusid | 2-s2.0-85214828819 | - |
dc.identifier.wosid | 001331850105013 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4953 - 4957 | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.startPage | 4953 | - |
dc.citation.endPage | 4957 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.subject.keywordAuthor | end-to-end TTS | - |
dc.subject.keywordAuthor | face to speech | - |
dc.subject.keywordAuthor | face voice conversion | - |
dc.subject.keywordAuthor | face-based TTS | - |