Multimodal AI System for Real-Time Expense Analysis and Forecasting
1. Saurabh Kokare 2. Yogesh Deokar 3. Aditya Bagal 4. Prof. Richa Agarwal
Information Technology BE Student, Trinity College of Engineering and Research, Pune, India.
Guide, Trinity College of Engineering and Research, Pune, India
Emails: 1.kokaresaurabh19@gmail.com 2. adityabagal9021@gmail.com
3.yogeshdeokar0103@gmail.com 4.richaagarwal.tcoer.@kjei.edu.in
Abstract—In complex digital financial systems, expense monitoring has become difficult because expense information arrives from many sources, such as receipts, SMS messages, digital payments, and voice transactions. In this paper, we propose a Multimodal AI-Based Expense Tracking and Forecasting System that combines OCR, NLP, speech recognition, and location-based expense tracking to categorize expenses in real time and to provide predictive expense analytics. We design a hybrid Transformer-LSTM model with an attention mechanism to improve accuracy, and it outperforms most of the traditional methods discussed in this paper. We support this claim with extensive experiments, which yielded the expected results in terms of classification accuracy, forecasting speed, and forecasting precision.
Managing personal and enterprise financial records in the digital economy is increasingly difficult because of the variety of data sources, including receipts, SMS messages, mobile wallets, and voice notes. Most expense tracking software is limited to a single type of input and does not offer predictive analysis. This paper describes a Multimodal AI-Based System for Real-Time Expense Analysis and Forecasting. The system integrates disparate data inputs, such as OCR scanning of receipts, automatic SMS bundling, and GPS-linked expense location tracking, into one intelligent system.
Index Terms—Expense Tracking, Forecasting, OCR, Voice Input, SMS Sync, NLP, Multimodal AI, Transformer-LSTM