Fusion Based Video Summarization: Integrating Transcripts and Keyframes for YouTube Content Analysis





Create Date 18/03/2026
Last Updated 18/03/2026



NADENDLA HARSHA VARDHAN*1, SHAIK REENA KOWSAR2, SHAIK NEHA3, PAMUJULA PAVAN NAGA SAI4, VASIREDDY SWATHI5

1Student, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India

2Student, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India

3Student, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India

4Student, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India

5Assistant Professor, Department of CSE(AIML), Bapatla Engineering College, Bapatla 522101, AP, India.

Abstract— The rapid growth of video-sharing platforms has produced a flood of long-form multimedia content. Long videos are common in educational and technical domains, but users often struggle to extract relevant information from them efficiently. Most existing transcript-based summarization approaches rely mainly on textual features and neglect viewer engagement cues that indicate which video segments matter. This work presents a novel fusion-based multimodal YouTube video summarization pipeline that combines transcript analysis, engagement analysis, and generative AI insights. Our framework uses the TextRank algorithm with a TF-IDF-based similarity measure to rank the sentences of the video transcript. An engagement fusion model combines user engagement signals, such as retention rates, engagement, and sentiment scores, with the sentence rank scores to identify the important sentences in the video. Because there are multiple engagement signals, we apply dimensionality reduction with PCA to reduce computational complexity. We also generate summaries with generative AI models and use them as a benchmark for our extractive summaries. We built a Streamlit UI in which a user enters the URL of a YouTube video and views its summary along with other details. Our results show that adding engagement-aware signals helps generate better, more context-rich summaries than traditional methods that consider only the video transcript.

Keywords— Video Summarization, TextRank, Engagement Fusion, Natural Language Processing, YouTube Analytics, Generative AI
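
The ranking and fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fusion formula, the weights, or the PCA configuration, so everything specific here is an assumption. In particular, the helper names (tfidf_vectors, textrank, first_principal_component, fuse), the fusion weight alpha = 0.6, the use of only the first principal component, and the sample sentences and engagement values are all hypothetical. The sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF dict per sentence (each sentence is a document)."""
    docs = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF keeps weights positive even for terms found in every sentence.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def textrank(sentences, damping=0.85, iters=50):
    """PageRank-style power iteration over the sentence similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]  # total edge weight leaving each sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def first_principal_component(rows, iters=100):
    """Project each row onto the top PCA direction, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / max(n - 1, 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance in the signals; projections will all be 0
            break
        v = [x / norm for x in w]
    if sum(v) < 0:  # assumed sign convention: higher signals give a higher score
        v = [-x for x in v]
    return [sum(c[k] * v[k] for k in range(d)) for c in centered]


def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def fuse(rank_scores, engagement_rows, alpha=0.6):
    """Assumed fusion rule: weighted sum of normalized rank and engagement."""
    rank = minmax(rank_scores)
    eng = minmax(first_principal_component(engagement_rows))
    return [alpha * r + (1 - alpha) * e for r, e in zip(rank, eng)]


sentences = [
    "TextRank builds a graph of transcript sentences.",
    "Edges in the graph are weighted by TF-IDF cosine similarity.",
    "Thanks for watching and please subscribe.",
    "Sentences with high graph centrality are selected for the summary.",
]
# Hypothetical per-sentence signals: [retention, engagement, sentiment] in [0, 1].
engagement = [[0.9, 0.7, 0.6], [0.8, 0.6, 0.5], [0.2, 0.1, 0.4], [0.7, 0.8, 0.7]]

fused = fuse(textrank(sentences), engagement)
top = sorted(range(len(sentences)), key=lambda i: fused[i], reverse=True)[:2]
summary = [sentences[i] for i in sorted(top)]  # keep original transcript order
print(summary)
```

In this toy input the sign-off sentence shares almost no vocabulary with the others and has the weakest engagement signals, so it receives the lowest fused score and is left out of the two-sentence summary. A real pipeline would need per-sentence engagement values aligned to transcript timestamps, which this sketch simply takes as given.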
