2 changes: 2 additions & 0 deletions .env.example
@@ -0,0 +1,2 @@
# Google Gemini API Key (required)
GOOGLE_API_KEY=your_google_api_key_here
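A minimal sketch of how a backend could read this variable at startup, assuming it has already been loaded into the process environment (for example by Docker or a dotenv loader). The helper name `load_google_api_key` is hypothetical and not taken from this PR; the sketch also treats the placeholder value from `.env.example` as unset.

```python
import os

# Placeholder value copied from .env.example; a real key must replace it.
PLACEHOLDER = "your_google_api_key_here"


def load_google_api_key(environ=None):
    """Return GOOGLE_API_KEY, or raise if it is missing or still the placeholder.

    Hypothetical helper for illustration; the PR's own config.py may differ.
    """
    env = os.environ if environ is None else environ
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; copy .env.example to .env and fill it in"
        )
    return key
```

Failing fast here surfaces a missing key when the service starts rather than on the first Gemini request.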
101 changes: 101 additions & 0 deletions .github/workflows/docker-build.yml
@@ -0,0 +1,101 @@
name: Build Docker Images

on:
  push:
    branches: [main, langchain]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  BACKEND_IMAGE: ${{ github.repository }}/backend
  FRONTEND_IMAGE: ${{ github.repository }}/frontend

jobs:
  build-backend:
    name: Build Backend
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata for Backend
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.BACKEND_IMAGE }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=
            type=raw,value=latest

      - name: Build and push Backend image
        uses: docker/build-push-action@v5
        with:
          context: ./apps/backend
          file: ./apps/backend/Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  build-frontend:
    name: Build Frontend
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata for Frontend
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.FRONTEND_IMAGE }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=
            type=raw,value=latest

      - name: Build and push Frontend image
        uses: docker/build-push-action@v5
        with:
          context: ./apps/frontend
          file: ./apps/frontend/Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
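To make the four tag rules in the metadata steps concrete, here is a rough Python approximation of the tag list they would yield for a given event. It is a sketch, not `docker/metadata-action` itself: the function name and parameters are hypothetical, it assumes the default 7-character short SHA, and it ignores the sanitizing the action applies to branch names containing characters such as `/`.

```python
def image_tags(event_name, ref_name, sha, pr_number=None):
    """Approximate the tags produced by the workflow's four tag rules.

    Hypothetical helper for illustration only.
    """
    tags = []
    if event_name == "push":
        # type=ref,event=branch -> the branch name
        tags.append(ref_name)
    if event_name == "pull_request" and pr_number is not None:
        # type=ref,event=pr -> pr-<number>
        tags.append(f"pr-{pr_number}")
    # type=sha,prefix= -> short commit SHA with no prefix
    tags.append(sha[:7])
    # type=raw,value=latest -> added on every build, whatever the branch
    tags.append("latest")
    return tags


print(image_tags("push", "langchain", "0123456789abcdef"))
# → ['langchain', '0123456', 'latest']
```

Under these assumptions a push to `langchain` is also tagged `latest`, so `latest` would track whichever of the two branches built most recently. If that is not intended, the raw tag could be restricted to the default branch.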
139 changes: 66 additions & 73 deletions README.md
100644 → 100755
@@ -1,90 +1,83 @@
# Sentiment Analysis Project
# Scholar Sense

## Project Overview
This project provides a comprehensive sentiment analysis solution with a modern web interface for analyzing PDF documents. The system can determine sentiment towards specific topics in uploaded documents and extract relevant keywords and sentences.
A full-stack Retrieval-Augmented Generation (RAG) application for sentiment analysis of academic research papers.

<img src="app.png">
## Features

## Read our wiki
- [Design](https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis/-/wikis/Design)
- [Planning](https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis/-/wikis/Planning)
- [Implementation](https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis/-/wikis/Implementation)
- **PDF Upload**: Upload academic research papers in PDF format
- **Sentiment Analysis**: Analyze paper sentiment towards specific keywords
- **Citation Support**: Extract and display citations with page references
- **Interactive Chat**: Query-based interface for analysis
- **Modern UI**: Next.js with Tailwind CSS

## System Architecture
The project is built as a monorepo using Turborepo to manage multiple services:
## Tech Stack

1. **Frontend**: Next.js application that provides the user interface
2. **Backend**: Flask-based API service that performs the sentiment analysis
### Backend
- **FastAPI**: High-performance Python web framework
- **LangChain**: RAG pipeline orchestration
- **ChromaDB**: Vector database for embeddings
- **Google Gemini**: LLM for analysis
- **PyMuPDF & pdfplumber**: PDF processing
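As an illustration of how PDF processing can feed citation support with page references, the sketch below splits per-page text into overlapping character windows and keeps the page number on each chunk. It is written in plain Python under stated assumptions: `Chunk`, `chunk_pages`, and the size values are hypothetical, and the PR's actual pipeline (LangChain, ChromaDB, PyMuPDF/pdfplumber) may chunk quite differently.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # 1-based page number, kept so an answer can cite its source page


def chunk_pages(pages, chunk_size=500, overlap=50):
    """Split a list of per-page strings into overlapping character windows.

    Hypothetical helper: `pages` is assumed to be text already extracted
    from the PDF, one string per page.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        start = 0
        while start < len(text):
            piece = text[start:start + chunk_size].strip()
            if piece:
                chunks.append(Chunk(text=piece, page=page_number))
            if start + chunk_size >= len(text):
                break  # this window already reached the end of the page
            start += step
    return chunks
```

Chunking page by page keeps each chunk attributable to a single page, at the cost of not spanning sentences that cross a page break.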

### Architecture Diagram
```
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│                 │         │                 │         │                 │
│    Frontend     │ ──────► │     Backend     │ ──────► │ GROBID Service  │
│    (Next.js)    │         │     (Flask)     │         │  (PDF Parser)   │
│                 │         │                 │         │                 │
└─────────────────┘         └─────────────────┘         └─────────────────┘

```

## Key Features
- Upload and analyze PDF documents
- Perform topic-focused sentiment analysis
- Extract relevant sentences and keywords
- Visual representation of sentiment results

## Getting Started

### Prerequisites
- Node.js 18.0+
- Python 3.8+
- Docker (for GROBID service)
### Frontend
- **Next.js 14**: React framework with App Router
- **Tailwind CSS**: Utility-first styling
- **TypeScript**: Type-safe development
- **Axios**: HTTP client

### Setup Steps
## Project Structure

1. **Clone the repository**
```bash
git clone https://git.ecdf.ed.ac.uk/psd2425/Rose-Campbell/sentiment-analysis.git
cd sentiment-analysis
```

2. **Start the GROBID service** (required for PDF processing)
```bash
cd apps/backend/grobid_deployment && chmod +x deploy-grobid.sh
./deploy-grobid.sh
windows-build/
├── backend/
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Configuration settings
│   ├── requirements.txt         # Python dependencies
│   ├── .env                     # Environment variables
│   ├── routers/
│   │   ├── upload.py            # PDF upload endpoints
│   │   └── chat.py              # Chat/analysis endpoints
│   ├── services/
│   │   ├── pdf_processor.py     # PDF extraction and chunking
│   │   └── rag_service.py       # RAG pipeline and LLM
│   ├── uploads/                 # Uploaded PDFs (auto-created)
│   └── chroma_db/               # Vector database (auto-created)
└── frontend/
    ├── app/
    │   ├── layout.tsx           # Root layout
    │   ├── page.tsx             # Main page with tabs
    │   └── globals.css          # Global styles
    ├── components/
    │   ├── UploadTab.tsx        # Upload interface
    │   └── ChatTab.tsx          # Chat interface
    ├── package.json
    ├── tsconfig.json
    ├── tailwind.config.js
    └── next.config.js
```

3. **Set up the backend**
```bash
cd apps/backend && chmod +x start_conda.sh
./start_conda.sh
```
## Setup Instructions

4. **Set up the frontend**
```bash
cd apps/frontend
npm install
npm run dev
```
### Prerequisites
- Python 3.9+
- Node.js 18+
- Google API Key (for Gemini)

5. **Access the application**
- Frontend: http://localhost:3000
- Backend API: http://localhost:5000
- GROBID Service: http://localhost:8070
### Backend Setup

## Directory Structure
```
sentiment-analysis/
├── apps/
│ ├── frontend/ # Next.js frontend application
│ └── backend/ # Flask backend service
├── packages/ # Shared packages and utilities
├── README.md # This file
└── package.json # Root package.json for Turborepo
1. Navigate to the backend directory:
```powershell
cd backend
```

## Detailed Documentation
For more detailed information about each component:
2. Create a virtual environment:
```powershell
python -m venv venv
.\venv\Scripts\Activate
```

- [Frontend Documentation](apps/frontend/README.md)
- [Backend Documentation](apps/backend/README.md)
3. Install dependencies:
```powershell
pip install -r requirements.txt
```
18 changes: 18 additions & 0 deletions apps/backend/.dockerignore
@@ -0,0 +1,18 @@
venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
.env
.git
.gitignore
*.md
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/
uploads/*
chroma_db/*
!uploads/.gitkeep
!chroma_db/.gitkeep
11 changes: 6 additions & 5 deletions apps/backend/.gitignore
100644 → 100755
@@ -1,5 +1,6 @@
.env.example
/uploads
/testing
/__pycache__
/logs
uploads/
chroma_db/
__pycache__/
*.pyc
venv/
.env