- Park, Hyejin;
- Lee, Jiyoon;
- Oh, Hayoung
Web of Science: 0
Scopus: 0
Abstract
Table data are essential in many fields, particularly the financial domain, for tasks such as financial statement analysis. While large language models (LLMs) have advanced text-based research, they struggle with image-format data. Recent multimodal large language models (multimodal LLMs) can process both text and images, but their performance on table images remains limited and lacks domain specificity. Here, we introduce FinTab-LLaVA, a multimodal LLM designed for effective financial table processing. FinTab-LLaVA is instruction-tuned on FinTMD, a financial domain-specific dataset comprising table images and textual data, supporting tasks such as finance table question answering (FTQA), finance table fact verification (FTFV), and finance table description (FTD). Domain knowledge training can enhance its mathematical reasoning and financial expertise. By adopting a curriculum learning approach, FinTab-LLaVA extends Table-LLaVA to handle the unique requirements of financial table data. Experimental results show that FinTab-LLaVA outperforms existing models on financial table-based tasks and demonstrates strong generalization capabilities. Our findings emphasize the potential of domain-specific multimodal LLMs in processing financial data and significantly expand LLM applications in the financial sector. The code and data are available at https://github.com/Emilia0608/FinTab-LLaVA. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
- Title: FinTab-LLaVA: Finance Domain-Specific Table Understanding Multimodal LLM Using FinTMD
- Authors: Park, Hyejin; Lee, Jiyoon; Oh, Hayoung
- Publication date: 2025-06
- Type: Proceedings Paper
- Volume: 15874 LNCS
- Pages: 235-246