Skip to content
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ and this project adheres to

### Changed

- ✨(backend) improve indexing command
- checkpoint recovery
- asynchronicity
- admin command trigger
- 💄(frontend) improve comments highlights #1961

## [v4.8.3] - 2026-03-23
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ demo: ## flush db then create a demo for load testing purpose
.PHONY: demo

index: ## index all documents to remote search
@$(MANAGE) index
@$(MANAGE) index $(args)
.PHONY: index

# Nota bene: Black should come after isort just in case they don't agree...
Expand Down
63 changes: 63 additions & 0 deletions docs/commands/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Index Command

The `index` management command is used to index documents to the remote search indexer.

## Usage

### Make Command

```bash
# Basic usage with defaults
make index

# With custom parameters
make index args="--batch-size 100 --lower-time-bound 2024-01-01T00:00:00 --upper-time-bound 2026-01-01T00:00:00"

```

### Command line

```bash
python manage.py index \
--lower-time-bound "2024-01-01T00:00:00" \
--upper-time-bound "2024-01-31T23:59:59" \
--batch-size 200 \
--async
```
Comment thread
mascarpon3 marked this conversation as resolved.

### Django Admin

The command is available in the Django admin interface:

1. Go to `/admin/run-indexing/`, you arrive at the "Run Indexing Command" page
2. Fill in the form with the desired parameters
3. Click **"Run Indexing Command"**

## Parameters

### `--batch-size`
- **type:** Integer
- **default:** `settings.SEARCH_INDEXER_BATCH_SIZE`
- **description:** Number of documents to process per batch. Higher values may improve performance but use more memory.

### `--lower-time-bound`
- **optional**: true
- **type:** ISO 8601 datetime string
- **default:** `None`
- **description:** Only documents updated after this date will be indexed.

### `--upper-time-bound`
- **optional**: true
- **type:** ISO 8601 datetime string
- **default:** `None`
- **description:** Only documents updated before this date will be indexed.

### `--async`
- **type:** Boolean flag
- **default:** `False`
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stray + diff marker leftover in rendered documentation.

The line literally starts with +- **description:** — a leftover from a previously applied suggested diff that was pasted verbatim. Markdown will render this with a leading +.

📝 Proposed fix
-+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
+- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
- **description:** When set, dispatches the indexing job to a Celery worker instead of running it synchronously.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/commands/index.md` at line 58, The documentation line contains a
leftover diff marker ("+- **description:** When set, dispatches the indexing job
to a Celery worker..."); locate that exact string in docs/commands/index.md and
remove the leading '+' so the line becomes a normal Markdown bullet (e.g., "-
**description:** When set, ...") or simply "**description:** ..." depending on
surrounding list style, ensuring no stray diff markers remain.


## Crash Safe Mode

The command saves the `updated_at` of the last document of each successful batch into the `bulk-indexer-checkpoint` cache variable.
If the process crashes, this value can be used as `lower-time-bound` to resume from the last successfully indexed document.
77 changes: 76 additions & 1 deletion src/backend/core/admin.py
Original file line number Diff line number Diff line change
@@ -1,15 +1,54 @@
"""Admin classes and registrations for core app."""

from django.contrib import admin, messages
from django.contrib.admin.views.decorators import staff_member_required
from django.contrib.auth import admin as auth_admin
from django.shortcuts import redirect
from django.core.management import call_command
from django.http import HttpRequest
from django.shortcuts import redirect, render
from django.utils.translation import gettext_lazy as _

from treebeard.admin import TreeAdmin

from core import models
from core.forms import RunIndexingForm
from core.tasks.user_reconciliation import user_reconciliation_csv_import_job

# Customize the default admin site's get_app_list method
_original_get_app_list = admin.site.get_app_list


def custom_get_app_list(_self, request, app_label=None):
"""Add custom commands to the app list."""
app_list = _original_get_app_list(request, app_label)

# Add Commands app with Run Indexing command
commands_app = {
"name": _("Commands"),
"app_label": "commands",
"app_url": "#",
"has_module_perms": True,
"models": [
{
"name": _("Run indexing"),
"object_name": "RunIndexing",
"admin_url": "/admin/run-indexing/",
"view_only": False,
"add_url": None,
"change_url": None,
}
],
}
Comment on lines +26 to +41
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

"Commands" entry is always shown, regardless of staff status.

custom_get_app_list unconditionally appends the commands_app with "has_module_perms": True. Any user reaching the admin index page will see the "Run indexing" link, even if they are not staff (the view itself is now correctly staff-gated, so clicking it will redirect to login — but the entry is still visible). Consider gating visibility on request.user.is_staff (or a more specific permission) so non-staff users don't see a link they can't use.

🔒 Suggested fix
 def custom_get_app_list(_self, request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)

+    if not request.user.is_staff:
+        return app_list
+
     # Add Commands app with Run Indexing command
     commands_app = { ... }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 26 - 41, The Commands entry is always
visible because custom_get_app_list unconditionally appends the commands_app
with has_module_perms=True; update custom_get_app_list to only include or set
has_module_perms for the commands_app when the current request.user has staff
privileges (e.g., request.user.is_staff) or the specific permission you want to
gate; locate the commands_app dict and the custom_get_app_list function and
either skip appending commands_app for non-staff users or set its
"has_module_perms" to False when request.user.is_staff is False so the
"RunIndexing" link is hidden from non-staff.


app_list.append(commands_app)
return app_list


# Monkey-patch the admin site
admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
admin.site, request, app_label
)
Comment on lines +17 to +50
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Monkey-patching admin.site.get_app_list is fragile; prefer a AdminSite subclass or clean override.

Directly reassigning admin.site.get_app_list to a lambda:

  • Is not composable: if another module (now or in the future) does the same thing, only the last patch wins, and the order becomes import-order-dependent.
  • Makes the override hard to discover and to test in isolation.
  • Captures _original_get_app_list at import time; any later rebinding by another app is invisible to this wrapper.

Cleaner alternatives:

  1. Subclass AdminSite, override get_app_list, and set default_site via AppConfig.default_site.
  2. Or add a proper ModelAdmin/entry for RunIndexing (e.g. a tiny proxy model or a dedicated admin view registered via admin.site.get_urls) so the "Commands" entry comes through the normal app-list machinery.

Also, custom_get_app_list(_self, request, app_label=None) never uses _self; you can drop it and the lambda’s admin.site argument:

♻️ Suggested simplification
-def custom_get_app_list(_self, request, app_label=None):
+def custom_get_app_list(request, app_label=None):
     """Add custom commands to the app list."""
     app_list = _original_get_app_list(request, app_label)
     ...
     app_list.append(commands_app)
     return app_list


-# Monkey-patch the admin site
-admin.site.get_app_list = lambda request, app_label=None: custom_get_app_list(
-    admin.site, request, app_label
-)
+admin.site.get_app_list = custom_get_app_list
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/admin.py` around lines 17 - 50, Replace the fragile
monkey-patch of admin.site.get_app_list by creating a proper AdminSite subclass
that overrides get_app_list (move custom_get_app_list logic into that override)
and register it as the default site via AppConfig.default_site (or instantiate
and use your subclass when configuring the app), or alternatively implement a
normal admin entry for the RunIndexing command by registering a ModelAdmin/proxy
model or adding a custom admin view via admin.site.get_urls so the "Commands"
app appears through Django's standard app-list machinery; remove the lambda
rebind of admin.site.get_app_list and the unused _self parameter in
custom_get_app_list once migrated.



@admin.register(models.User)
class UserAdmin(auth_admin.UserAdmin):
Expand Down Expand Up @@ -227,3 +266,39 @@ class InvitationAdmin(admin.ModelAdmin):
def save_model(self, request, obj, form, change):
obj.issuer = request.user
obj.save()


@staff_member_required
def run_indexing_view(request: HttpRequest):
"""Custom admin view for running indexing commands."""
if request.method == "POST":
form = RunIndexingForm(request.POST)
if form.is_valid():
lower_time_bound = form.cleaned_data.get("lower_time_bound")
upper_time_bound = form.cleaned_data.get("upper_time_bound")
call_command(
"index",
batch_size=form.cleaned_data["batch_size"],
lower_time_bound=lower_time_bound.isoformat()
if lower_time_bound
else None,
upper_time_bound=upper_time_bound.isoformat()
if upper_time_bound
else None,
async_mode=True,
)
Comment thread
mascarpon3 marked this conversation as resolved.
Comment thread
mascarpon3 marked this conversation as resolved.
messages.success(request, _("Indexing triggered!"))
return redirect("run_indexing")
messages.error(request, _("Please correct the errors below."))
else:
form = RunIndexingForm()
Comment thread
mascarpon3 marked this conversation as resolved.
Comment thread
mascarpon3 marked this conversation as resolved.

return render(
request=request,
template_name="runindexing.html",
context={
**admin.site.each_context(request),
"title": "Run Indexing Command",
"form": form,
},
)
15 changes: 15 additions & 0 deletions src/backend/core/factories.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from django.contrib.auth.hashers import make_password

import factory.fuzzy
from factory import post_generation
from faker import Faker

from core import models
Expand Down Expand Up @@ -159,6 +160,20 @@ def masked_by(self, create, extracted, **kwargs):
document=self, user=item, defaults={"is_masked": True}
)

@post_generation
def updated_at(self, create, extracted, **kwargs):
"""
the BaseModel.updated_at has auto_now=True.
This prevents setting a specific updated_at value with the factory.

This post_generation method bypasses this behavior.
"""
if not create or not extracted:
return

self.__class__.objects.filter(pk=self.pk).update(updated_at=extracted)
self.refresh_from_db()


class UserDocumentAccessFactory(factory.django.DjangoModelFactory):
"""Create fake document user accesses for testing."""
Expand Down
42 changes: 42 additions & 0 deletions src/backend/core/forms.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
"""Forms for the core app."""

from django import forms
from django.conf import settings
from django.utils.translation import gettext_lazy as _


class RunIndexingForm(forms.Form):
"""
Form for running the indexing process.
"""

batch_size = forms.IntegerField(
min_value=1,
initial=settings.SEARCH_INDEXER_BATCH_SIZE,
)
lower_time_bound = forms.DateTimeField(
required=False, widget=forms.TextInput(attrs={"type": "datetime-local"})
)
upper_time_bound = forms.DateTimeField(
required=False, widget=forms.TextInput(attrs={"type": "datetime-local"})
)
Comment on lines +17 to +22
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at Django's default DATETIME_INPUT_FORMATS and project overrides
rg -nP 'DATETIME_INPUT_FORMATS|USE_TZ\s*=' --type=py -C1
# Check how the index command's datetime.fromisoformat() handles the produced string
rg -nP 'fromisoformat|isoformat\(' src/backend/core --type=py -C2

Repository: suitenumerique/docs

Length of output: 50376


🏁 Script executed:

# Search for fromisoformat usage more specifically
rg -n 'fromisoformat' src/backend --type=py -B2 -A2

Repository: suitenumerique/docs

Length of output: 1019


🏁 Script executed:

# Look for admin.py and how it uses the form
fd -type f -name 'admin.py' src/backend

Repository: suitenumerique/docs

Length of output: 234


🏁 Script executed:

# Search for DATETIME_INPUT_FORMATS customization
rg -n 'DATETIME_INPUT_FORMATS' src/backend --type=py -A3

Repository: suitenumerique/docs

Length of output: 45


🏁 Script executed:

# Find the index command that uses these bounds
rg -n 'lower_time_bound|upper_time_bound' src/backend --type=py -B2 -A2

Repository: suitenumerique/docs

Length of output: 22766


🏁 Script executed:

# Find Django version requirement
find . -name 'requirements*.txt' -o -name 'pyproject.toml' -o -name 'setup.py' | xargs grep -l 'django' 2>/dev/null | head -5

Repository: suitenumerique/docs

Length of output: 92


🏁 Script executed:

# Check for tests of the admin form itself with datetime submission
rg -n 'RunIndexingForm|test.*admin.*indexing' src/backend/core/tests --type=py -l

Repository: suitenumerique/docs

Length of output: 45


🏁 Script executed:

# Look at actual admin.py to see the context around form usage
cat -n src/backend/core/admin.py | sed -n '270,295p'

Repository: suitenumerique/docs

Length of output: 1256


🏁 Script executed:

# Check Django version in pyproject.toml
cat src/backend/pyproject.toml | grep -A 20 -B 5 'django'

Repository: suitenumerique/docs

Length of output: 3191


🏁 Script executed:

# Look for any tests of the RunIndexingForm with actual datetime-local input
rg -n 'RunIndexingForm|run_indexing_view' src/backend/core/tests --type=py -B 3 -A 5

Repository: suitenumerique/docs

Length of output: 1649


🏁 Script executed:

# Check if there's a settings file that customizes DATETIME_INPUT_FORMATS
find src/backend -name 'settings.py' -o -name 'settings_*' | xargs cat 2>/dev/null | rg -A 5 'DATETIME_INPUT_FORMATS' || echo "No DATETIME_INPUT_FORMATS found"

Repository: suitenumerique/docs

Length of output: 95


🏁 Script executed:

# Get the full test file to see how the form is tested
cat -n src/backend/core/tests/test_admin_run_indexing.py

Repository: suitenumerique/docs

Length of output: 1928


🏁 Script executed:

# Check Django's default DATETIME_INPUT_FORMATS to understand what it accepts
python3 << 'EOF'
from django.conf import settings
import django
# Try to get Django's default input formats
print("Python code to understand Django's DATETIME_INPUT_FORMATS")
# In Django 3.2+, the defaults include ISO 8601 formats
# We need to know which version of Django is being used
# Version constraint is: django<6.0.0 (from pyproject.toml)
# So it could be Django 3.2, 4.0, 4.1, 4.2, or 5.x

# Let's check what formats modern Django supports
try:
    import django
    print(f"Django version in environment: {django.get_version() if hasattr(django, 'get_version') else 'unknown'}")
except:
    pass

# The key issue: does DateTimeField accept "2024-02-01T12:00" format?
# Django's default DATETIME_INPUT_FORMATS (since 3.2) are:
# '%Y-%m-%d %H:%M:%S',
# '%Y-%m-%d %H:%M:%S.%f',
# '%Y-%m-%d %H:%M',
# Note: NO T-separated format by default

print("Django defaults do NOT include T-separated ISO format by default")
print("HTML5 datetime-local submits: 2024-02-01T12:00")
print("Django defaults expect: 2024-02-01 12:00")
print("\nThis is a parsing mismatch!")
EOF

Repository: suitenumerique/docs

Length of output: 182


🏁 Script executed:

# Look at the complete forms.py file to see all form definitions
cat -n src/backend/core/forms.py

Repository: suitenumerique/docs

Length of output: 1684


🏁 Script executed:

# Search for any test that actually submits datetime values to the form
rg -n 'lower_time_bound.*=' src/backend/core/tests --type=py -B 2 -A 2 | head -50

Repository: suitenumerique/docs

Length of output: 4398


🏁 Script executed:

# Check if there's a custom DateTimeField or widget in the codebase
rg -n 'class.*DateTimeField|class.*DateTimeInput' src/backend --type=py

Repository: suitenumerique/docs

Length of output: 45


Add explicit input_formats to DateTimeField to accept HTML5 datetime-local format.

The HTML5 datetime-local widget submits values as YYYY-MM-DDTHH:MM[SS] (T-separated, no timezone), but Django's default DATETIME_INPUT_FORMATS only include space-separated formats (%Y-%m-%d %H:%M[:%S]). Form validation will fail with "Enter a valid date/time" when users submit via the HTML5 widget.

Also ensure consistent timezone handling: form fields produce naive datetimes (no tzinfo); these are then passed to filter_updated_at() which compares against tz-aware updated_at values. Either use USE_TZ=False or explicitly handle timezone awareness in the form and downstream queries.

Suggested fix
lower_time_bound = forms.DateTimeField(
    required=False,
    widget=forms.TextInput(attrs={"type": "datetime-local"}),
    input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
upper_time_bound = forms.DateTimeField(
    required=False,
    widget=forms.TextInput(attrs={"type": "datetime-local"}),
    input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"]
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/core/forms.py` around lines 17 - 22, The DateTimeField widgets
lower_time_bound and upper_time_bound need explicit input_formats to accept
HTML5 "datetime-local" values (e.g. "%Y-%m-%dT%H:%M" and "%Y-%m-%dT%H:%M:%S")
and you must normalize timezone awareness before passing results to
filter_updated_at; update those fields to include
input_formats=["%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S"] and then, depending on
your project USE_TZ setting, either convert the parsed naive datetimes to aware
datetimes using
django.utils.timezone.make_aware(timezone.get_current_timezone()) before calling
filter_updated_at or adjust filter_updated_at to accept naive values by
localizing them consistently.


def clean(self):
"""Override clean to validate time bounds."""
cleaned_data = super().clean()
self.check_time_bounds()
return cleaned_data

def check_time_bounds(self):
"""Validate that lower_time_bound is before upper_time_bound."""
lower_time_bound = self.cleaned_data.get("lower_time_bound")
upper_time_bound = self.cleaned_data.get("upper_time_bound")
if (
lower_time_bound
and upper_time_bound
and lower_time_bound > upper_time_bound
):
self.add_error(
"upper_time_bound",
_("Upper time bound must be after lower time bound."),
)
Comment thread
mascarpon3 marked this conversation as resolved.
77 changes: 62 additions & 15 deletions src/backend/core/management/commands/index.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,16 @@

import logging
import time
from datetime import datetime

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError

from core import models
from core.services.search_indexers import get_document_indexer
from core.tasks.search import batch_document_indexer_task

logger = logging.getLogger("docs.search.bootstrap_search")
logger = logging.getLogger(__name__)


class Command(BaseCommand):
Expand All @@ -24,9 +28,32 @@ def add_arguments(self, parser):
action="store",
dest="batch_size",
type=int,
default=50,
default=settings.SEARCH_INDEXER_BATCH_SIZE,
help="Indexation query batch size",
)
Comment thread
mascarpon3 marked this conversation as resolved.
parser.add_argument(
"--lower-time-bound",
action="store",
dest="lower_time_bound",
type=datetime.fromisoformat,
default=None,
help="DateTime in ISO format. Only documents updated after this date will be indexed",
)
parser.add_argument(
"--upper-time-bound",
action="store",
dest="upper_time_bound",
type=datetime.fromisoformat,
default=None,
help="DateTime in ISO format. Only documents updated before this date will be indexed",
)
parser.add_argument(
"--async",
action="store_true",
dest="async_mode",
default=False,
help="Whether to execute indexing asynchronously in a Celery task (default: False)",
)
Comment thread
mascarpon3 marked this conversation as resolved.

def handle(self, *args, **options):
"""Launch and log search index generation."""
Expand All @@ -35,18 +62,38 @@ def handle(self, *args, **options):
if not indexer:
raise CommandError("The indexer is not enabled or properly configured.")

logger.info("Starting to regenerate Find index...")
start = time.perf_counter()
batch_size = options["batch_size"]
if options["async_mode"]:
try:
batch_document_indexer_task.apply_async(
kwargs={
"lower_time_bound": options["lower_time_bound"],
"upper_time_bound": options["upper_time_bound"],
"batch_size": options["batch_size"],
"crash_safe_mode": True,
},
)
except Exception as err:
raise CommandError("Unable to dispatch indexing task") from err
logger.info("Document indexing task sent to worker")
else:
logger.info("Starting to regenerate Find index...")
start = time.perf_counter()

try:
count = indexer.index(batch_size=batch_size)
except Exception as err:
raise CommandError("Unable to regenerate index") from err
try:
count = indexer.index(
queryset=models.Document.objects.filter_updated_at(
lower_time_bound=options["lower_time_bound"],
upper_time_bound=options["upper_time_bound"],
),
batch_size=options["batch_size"],
crash_safe_mode=True,
)
except Exception as err:
raise CommandError("Unable to regenerate index") from err

duration = time.perf_counter() - start
logger.info(
"Search index regenerated from %d document(s) in %.2f seconds.",
count,
duration,
)
duration = time.perf_counter() - start
logger.info(
"Search index regenerated from %d document(s) in %.2f seconds.",
count,
duration,
)
32 changes: 32 additions & 0 deletions src/backend/core/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -859,6 +859,38 @@ def annotate_user_roles(self, user):
user_roles=models.Value([], output_field=output_field),
)

def filter_updated_at(self, lower_time_bound=None, upper_time_bound=None):
Comment thread
mascarpon3 marked this conversation as resolved.
"""
Filter documents by update_at.

Args:
lower_time_bound (datetime, optional):
Keep documents updated after this timestamp.
upper_time_bound (datetime, optional):
Keep documents updated before this timestamp.

Returns:
QuerySet: Filtered queryset ready for indexation.
"""
conditions = models.Q()
if lower_time_bound and upper_time_bound:
conditions = models.Q(
updated_at__gte=lower_time_bound, updated_at__lte=upper_time_bound
) | models.Q(
ancestors_deleted_at__gte=lower_time_bound,
ancestors_deleted_at__lte=upper_time_bound,
)
elif lower_time_bound:
conditions = models.Q(updated_at__gte=lower_time_bound) | models.Q(
ancestors_deleted_at__gte=lower_time_bound
)
elif upper_time_bound:
conditions = models.Q(updated_at__lte=upper_time_bound) | models.Q(
ancestors_deleted_at__lte=upper_time_bound
)

return self.filter(conditions)
Comment thread
coderabbitai[bot] marked this conversation as resolved.


class DocumentManager(MP_NodeManager.from_queryset(DocumentQuerySet)):
"""
Expand Down
Loading
Loading