README.md: 139 changes (66 additions, 73 deletions), file mode 100644 → 100755
@@ -1,90 +1,83 @@
Before (original lines 1-90):

# Sentiment Analysis Project

## Project Overview
This project provides a comprehensive sentiment analysis solution with a modern web interface for analyzing PDF documents. The system can determine sentiment towards specific topics in uploaded documents and extract relevant keywords and sentences.

<img src="app.png">

## Read our wiki
- [Design](https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis/-/wikis/Design)
- [Planning](https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis/-/wikis/Planning)
- [Implementation](https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis/-/wikis/Implementation)

## System Architecture
The project is built as a monorepo using Turborepo to manage multiple services:

1. **Frontend**: Next.js application that provides the user interface
2. **Backend**: Flask-based API service that performs the sentiment analysis

### Architecture Diagram
```
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│                 │         │                 │         │                 │
│    Frontend     │ ──────► │     Backend     │ ──────► │ GROBID Service  │
│    (Next.js)    │         │     (Flask)     │         │  (PDF Parser)   │
│                 │         │                 │         │                 │
└─────────────────┘         └─────────────────┘         └─────────────────┘
```

## Key Features
- Upload and analyze PDF documents
- Perform topic-focused sentiment analysis
- Extract relevant sentences and keywords
- Visualize sentiment results

## Getting Started

### Prerequisites
- Node.js 18.0+
- Python 3.8+
- Docker (for the GROBID service)

### Setup Steps

1. **Clone the repository**
   ```bash
   git clone https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis.git
   cd sentiment-analysis
   ```

2. **Start the GROBID service** (required for PDF processing)
   ```bash
   cd apps/backend/grobid_deployment && chmod +x deploy-grobid.sh
   ./deploy-grobid.sh
   ```

3. **Set up the backend**
   ```bash
   cd apps/backend && chmod +x start_conda.sh
   ./start_conda.sh
   ```

4. **Set up the frontend**
   ```bash
   cd apps/frontend
   npm install
   npm run dev
   ```

5. **Access the application**
   - Frontend: http://localhost:3000
   - Backend API: http://localhost:5000
   - GROBID Service: http://localhost:8070

## Directory Structure
```
sentiment-analysis/
├── apps/
│   ├── frontend/    # Next.js frontend application
│   └── backend/     # Flask backend service
├── packages/        # Shared packages and utilities
├── README.md        # This file
└── package.json     # Root package.json for Turborepo
```

## Detailed Documentation
For more detailed information about each component:

- [Frontend Documentation](apps/frontend/README.md)
- [Backend Documentation](apps/backend/README.md)

After (new lines 1-83):

# Scholar Sense

A full-stack Retrieval-Augmented Generation (RAG) application for sentiment analysis of academic research papers.

## Features

- **PDF Upload**: Upload academic research papers in PDF format
- **Sentiment Analysis**: Analyze a paper's sentiment towards specific keywords
- **Citation Support**: Extract and display citations with page references
- **Interactive Chat**: Query-based interface for analysis
- **Modern UI**: Next.js with Tailwind CSS

## Tech Stack

### Backend
- **FastAPI**: High-performance Python web framework
- **LangChain**: RAG pipeline orchestration
- **ChromaDB**: Vector database for embeddings
- **Google Gemini**: LLM for analysis
- **PyMuPDF & pdfplumber**: PDF processing

### Frontend
- **Next.js 14**: React framework with App Router
- **Tailwind CSS**: Utility-first styling
- **TypeScript**: Type-safe development
- **Axios**: HTTP client

## Project Structure

```
windows-build/
├── backend/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration settings
│   ├── requirements.txt     # Python dependencies
│   ├── .env                 # Environment variables
│   ├── routers/
│   │   ├── upload.py        # PDF upload endpoints
│   │   └── chat.py          # Chat/analysis endpoints
│   ├── services/
│   │   ├── pdf_processor.py # PDF extraction and chunking
│   │   └── rag_service.py   # RAG pipeline and LLM
│   ├── uploads/             # Uploaded PDFs (auto-created)
│   └── chroma_db/           # Vector database (auto-created)
└── frontend/
    ├── app/
    │   ├── layout.tsx       # Root layout
    │   ├── page.tsx         # Main page with tabs
    │   └── globals.css      # Global styles
    ├── components/
    │   ├── UploadTab.tsx    # Upload interface
    │   └── ChatTab.tsx      # Chat interface
    ├── package.json
    ├── tsconfig.json
    ├── tailwind.config.js
    └── next.config.js
```

## Setup Instructions

### Prerequisites
- Python 3.9+
- Node.js 18+
- Google API Key (for Gemini)

### Backend Setup

1. Navigate to the backend directory:
   ```powershell
   cd backend
   ```

2. Create and activate a virtual environment:
   ```powershell
   python -m venv venv
   .\venv\Scripts\Activate
   ```

3. Install dependencies:
   ```powershell
   pip install -r requirements.txt
   ```
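The new README describes `services/pdf_processor.py` only as "PDF extraction and chunking" and `services/rag_service.py` as "RAG pipeline and LLM". As a rough illustration of the chunk-and-retrieve step, here is a stdlib-only sketch that splits per-page text into overlapping windows which remember their page number (the kind of metadata that page-referenced citations rely on) and then selects chunks mentioning a keyword. `Chunk`, `chunk_pages`, `retrieve`, and the default sizes are hypothetical names and values, not the project's code. The real pipeline uses LangChain and ChromaDB, which rank chunks by embedding similarity rather than by keyword counts.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A slice of one PDF page's text, tagged with its source page (hypothetical type)."""
    text: str
    page: int  # 1-based page number, kept so answers can cite pages


def chunk_pages(pages: list[str], chunk_size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split each page into overlapping character windows.

    The overlap lets a sentence that straddles a window boundary
    appear intact in at least one of the neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[Chunk] = []
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + chunk_size]
            if piece.strip():  # skip empty or whitespace-only windows
                chunks.append(Chunk(piece, page_no))
            if start + chunk_size >= len(text):
                break  # this window already reached the end of the page
    return chunks


def retrieve(chunks: list[Chunk], keyword: str, k: int = 3) -> list[Chunk]:
    """Return up to k chunks that mention the keyword, most mentions first.

    Stand-in for vector similarity search: it simply counts
    case-insensitive occurrences of the keyword in each chunk.
    """
    kw = keyword.lower()
    hits = [(c.text.lower().count(kw), c) for c in chunks]
    hits = [(score, c) for score, c in hits if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep page order
    return [c for _, c in hits[:k]]
```

Fixed-size character windows keep the sketch dependency-free; a production splitter, such as LangChain's `RecursiveCharacterTextSplitter`, generally prefers to break on paragraph and sentence boundaries.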
apps/backend/.dockerignore: 18 changes (18 additions, 0 deletions), new file
@@ -0,0 +1,18 @@
venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
.env
.git
.gitignore
*.md
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/
uploads/*
chroma_db/*
!uploads/.gitkeep
!chroma_db/.gitkeep
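The last four lines rely on ordering: `uploads/*` and `chroma_db/*` exclude everything in those folders, and the later `!` lines re-include the `.gitkeep` placeholders. The Python sketch below models that "last matching pattern wins" rule. `is_ignored` is a hypothetical helper, not part of Docker or this project, and it uses `fnmatch`, whose `*` also matches `/`. According to Docker's documentation, `*` in `.dockerignore` does not cross directory separators and `**` is required to match at any depth. Under those rules, `*.pyc` as written would only exclude `.pyc` files at the root of the build context. Treat the sketch as a model of ordering and negation only.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Approximate .dockerignore evaluation for a context-relative path.

    Every pattern is tested in order and the last one that matches decides;
    a leading '!' turns a match into a re-include. Unlike Docker's matcher,
    fnmatch's '*' also matches '/', so deep paths may over-match.
    """
    ignored = False
    for raw in patterns:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        negate = line.startswith("!")
        pattern = (line[1:] if negate else line).rstrip("/")  # 'venv/' -> 'venv'
        # Match the path itself, or anything beneath a matching directory.
        if fnmatchcase(path, pattern) or fnmatchcase(path, pattern + "/*"):
            ignored = not negate
    return ignored


PATTERNS = ["venv/", "__pycache__/", "*.pyc", ".env",
            "uploads/*", "chroma_db/*", "!uploads/.gitkeep", "!chroma_db/.gitkeep"]
```

With these patterns, `uploads/paper.pdf` is excluded while `uploads/.gitkeep` is kept, because the negated rule is the last one to match it.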
apps/backend/.gitignore: 11 changes (6 additions, 5 deletions), file mode 100644 → 100755
@@ -1,5 +1,6 @@
Before (original lines 1-5):
.env.example
/uploads
/testing
/__pycache__
/logs

After (new lines 1-6):
uploads/
chroma_db/
__pycache__/
*.pyc
venv/
.env
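The rewrite also changes how entries are anchored. Old entries such as `/__pycache__` and `/uploads` begin with a slash, so Git matches them only relative to the directory that holds this `.gitignore` (`apps/backend`). The new `__pycache__/` and `uploads/` have no leading or inner slash, so they match a directory of that name at any depth, and the trailing slash limits them to directories. The Python sketch below models just these rules. `gitignore_matches` is a hypothetical helper, not Git's implementation, and it deliberately skips negation and `**`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def gitignore_matches(path: str, pattern: str, is_dir: bool = False) -> bool:
    """Simplified test of one .gitignore pattern against a path relative
    to the .gitignore's directory.

    - trailing '/': the pattern only matches directories
    - leading or inner '/': the pattern is anchored to that directory
    - otherwise: the pattern may match a single path component at any depth
    Caveat: fnmatch's '*' also matches '/', which Git's anchored matching does not.
    """
    dir_only = pattern.endswith("/")
    pattern = pattern.rstrip("/")
    anchored = "/" in pattern  # checked before stripping a leading '/'
    pattern = pattern.lstrip("/")
    parts = PurePosixPath(path).parts
    # Test the path and each parent directory: ignoring a directory
    # also ignores everything inside it.
    for depth in range(len(parts), 0, -1):
        prefix_is_dir = is_dir or depth < len(parts)
        if dir_only and not prefix_is_dir:
            continue
        target = "/".join(parts[:depth]) if anchored else parts[depth - 1]
        if fnmatchcase(target, pattern):
            return True
    return False
```

For example, `services/__pycache__/x.pyc` is matched by the new `__pycache__/` entry but not by the old anchored `/__pycache__`.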