GlobalLearn is an educational application that summarizes educational videos into text in multiple languages and lets users ask questions about the video content.
The app uses audio and image processing techniques to extract text and generate a concise summary. Additionally, a Retrieval-Augmented Generation (RAG) model is implemented to answer user queries based on the extracted text.
- Automatic Video Summarization: Converts video content into structured text.
- Multimodal Data Extraction:
  - Audio-based transcript extraction
  - Image-based text extraction (when audio is unavailable)
- Multi-Language Support: Summarized text can be translated into various languages.
- Question-Answering System: Users can ask questions and receive answers based on the video content.
- Efficient Processing: Generates summaries quickly, even in CPU runtime environments.
- Collect videos from a single subject.
- Convert video into audio and image frames.
- Extract transcripts from audio.
- Extract text from images (when audio is muted).
- Clean and structure the extracted text.
- Combine text from audio & images.
- Create a Retrieval-Augmented Generation (RAG) corpus to facilitate question answering.
- Generate a concise summary of the extracted text.
- Provide translations into multiple languages.
- Build an interface to upload videos & prompt questions.
- Deploy a server to run all models and functionalities.
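The second pipeline step above (converting a video into audio and image frames) is typically delegated to a tool such as ffmpeg. A minimal sketch of how those commands can be assembled, assuming ffmpeg is on the PATH; the file names, sample rate, and frame rate are illustrative placeholders, not the project's actual configuration:

```python
# Sketch of the media-extraction step. The commands are only built here;
# uncomment the subprocess call to actually run ffmpeg.

def audio_extract_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that extracts the audio track as 16 kHz mono WAV."""
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path]

def frame_extract_cmd(video_path: str, pattern: str, fps: float = 0.5) -> list[str]:
    """Build an ffmpeg command that samples one image frame every 1/fps seconds."""
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", pattern]

cmd = audio_extract_cmd("lecture.mp4", "lecture.wav")
# import subprocess; subprocess.run(cmd, check=True)
```

The extracted WAV then feeds the transcript step, and the sampled frames feed the image-text step when the audio is muted.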
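The RAG-corpus step can be pictured as a simple retriever: split the combined transcript into chunks and return the chunk most similar to the question. A pure-Python sketch under that assumption; a real RAG pipeline would score chunks with an embedding model, and plain term overlap stands in for it here (the sample sentences and chunk size are illustrative only):

```python
import re
from collections import Counter

def chunk_text(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of roughly `size` words each."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str]) -> str:
    """Return the chunk with the largest term overlap with the question."""
    q = tokenize(question)
    return max(chunks, key=lambda c: sum((q & tokenize(c)).values()))

corpus = chunk_text(
    "Photosynthesis converts light energy into chemical energy. "
    "Mitochondria produce ATP through cellular respiration.",
    size=7,
)
answer_context = retrieve("How is ATP produced?", corpus)
```

The retrieved chunk is what a generation model would then condition on to answer the user's question.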
- TF-IDF Similarity Score: 0.46
  - Ensures key terms are used without copying the transcript's structure.
- Semantic Similarity Score: 0.82
  - Ensures summaries retain the original meaning while using different sentence structures.
- Processing Time:
  - For a 10-minute video, summary generation takes around 4 minutes on a CPU-based system.
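The TF-IDF score compares the generated summary against the source transcript. A pure-Python sketch of such a comparison; the project's actual vectorizer and evaluation corpus are not shown in this README, so the toy inputs below will not reproduce the reported 0.46:

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs: list[str]) -> list[dict]:
    """TF-IDF weights per document, with smoothed IDF so shared terms still count."""
    token_lists = [tokenize(d) for d in docs]
    df = Counter(t for doc in token_lists for t in set(doc))
    n = len(docs)
    vecs = []
    for doc in token_lists:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return vecs

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

transcript = "the cell produces energy using mitochondria inside the cell"
summary = "mitochondria produce energy in the cell"
t_vec, s_vec = tfidf_vectors([transcript, summary])
score = cosine(t_vec, s_vec)
```

A moderate score like 0.46 is desirable here: high enough that the summary reuses the transcript's key terms, low enough that it is not copying the transcript verbatim.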
✅ Works with videos that lack transcripts.
✅ Combines audio & image-based text extraction.
✅ Supports multilingual summaries.
✅ Enables question-answering based on video content.
- Python 3.x
- Required libraries (install via `requirements.txt`)
- Clone the repository:

```shell
git clone https://github.com/praths71018/Video_Text_Summarisation_And_Prompting.git
cd Video_Text_Summarisation_And_Prompting
```

- Go to the backend directory and create a virtual environment:

```shell
cd backend
python -m venv venv
```

- Activate the virtual environment:

```shell
source venv/bin/activate
```

- Install the requirements:

```shell
pip install -r requirements.txt
```

- Start the backend server:

```shell
python backend/app.py
```

- Start the frontend:

```shell
cd frontend
npm install
npm start
```

- Upload a video through the web interface.
- Wait for processing (transcription, summarization, translation).
- Ask questions based on the generated summary.
- Pratham R Shetty
- Prateek M
- R Ranjive
- Anirudh Krishna
