FVTTS: Face Based Voice Synthesis for Text-to-Speech

Full metadata record
DC Field: Value
dc.contributor.author: Lee, Minyoung
dc.contributor.author: Park, Eunil
dc.contributor.author: Hong, Sungeun
dc.date.accessioned: 2025-01-21T01:00:20Z
dc.date.available: 2025-01-21T01:00:20Z
dc.date.issued: 2024
dc.identifier.issn: 2308-457X
dc.identifier.uri: https://scholarx.skku.edu/handle/2021.sw.skku/119906
dc.description.abstract: A face is expressive of individual identity and used in various studies such as identification, authentication, and personalization. Similarly, a voice is a means of expressing individuals, and personalized voice synthesis based on voice reference is active. However, the voice-based method confronts voice sample dependency limitations. We propose Face-based Voice synthesis for Text-To-Speech (FVTTS) to synthesize voice from face images that are more expressive of personal identity than voice samples. A major challenge in face-based TTS methods is extracting distinct voice features highly related to voice from the face image. Our face encoder is designed to tackle this by integrating global facial attributes with voice-related features to represent personalized characteristics. FVTTS has shown superiority in various metrics and adaptability across different data domains. We establish a new standard in face-based TTS, leading the way in personalized voice synthesis. © 2024 International Speech Communication Association. All rights reserved.
dc.format.extent: 5
dc.language: English
dc.language.iso: ENG
dc.publisher: International Speech Communication Association
dc.title: FVTTS: Face Based Voice Synthesis for Text-to-Speech
dc.type: Article
dc.identifier.doi: 10.21437/Interspeech.2024-140
dc.identifier.scopusid: 2-s2.0-85214828819
dc.identifier.wosid: 001331850105013
dc.identifier.bibliographicCitation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp 4953 - 4957
dc.citation.title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.startPage: 4953
dc.citation.endPage: 4957
dc.type.docType: Proceedings Paper
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence
dc.subject.keywordAuthor: end-to-end TTS
dc.subject.keywordAuthor: face to speech
dc.subject.keywordAuthor: face voice conversion
dc.subject.keywordAuthor: face-based TTS
Appears in Collections: Computing and Informatics > Convergence > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
