FinTab-LLaVA: Finance Domain-Specific Table Understanding Multimodal LLM Using FinTMD

Abstract

Table data are essential in many fields, particularly the financial domain, where they support tasks such as financial statement analysis. While large language models (LLMs) have advanced text-based research, they struggle with image-format data. Recent multimodal large language models (multimodal LLMs) can process both text and images, but their performance on table images remains limited and lacks domain specificity. Here, we introduce FinTab-LLaVA, a multimodal LLM designed for effective financial table processing. FinTab-LLaVA is instruction-tuned on FinTMD, a financial domain-specific dataset comprising table images and textual data, which supports tasks such as finance table question answering (FTQA), finance table fact verification (FTFV), and finance table description (FTD). Domain knowledge training can enhance the model's mathematical reasoning and financial expertise. By adopting a curriculum learning approach, FinTab-LLaVA extends Table-LLaVA to handle the unique requirements of financial table data. Experimental results show that FinTab-LLaVA outperforms existing models on financial table-based tasks and demonstrates strong generalization capabilities. Our findings emphasize the potential of domain-specific multimodal LLMs in processing financial data and significantly expand LLM applications in the financial sector. The code and data are available at https://github.com/Emilia0608/FinTab-LLaVA. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords

Domain Knowledge Training; Finance Domain; FinTMD; Multimodal LLM; Table Understanding
Title
FinTab-LLaVA: Finance Domain-Specific Table Understanding Multimodal LLM Using FinTMD
Authors
Park, Hyejin; Lee, Jiyoon; Oh, Hayoung
DOI
10.1007/978-981-96-8186-0_19
Publication date
2025-06
Type
Proceedings Paper
Journal
Lecture Notes in Computer Science
15874 LNCS
Pages
235-246