More vector data for programs, programs-as-courses #3115

Open
mbertrand wants to merge 15 commits into main from mb/program_content

@mbertrand mbertrand commented Mar 26, 2026

What are the relevant tickets?

Part of https://github.com/mitodl/hq/issues/10677

Description (What does it do?)

  • Adds non-published test-mode courses/programs as children (formerly only published resources were added as children)
  • Appends summary info on all child courses/programs in the LearningResourceMetadataDisplaySerializer (which ends up in a contentfile for the resource used by the syllabusbot to answer questions)
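
The first change above can be pictured as a relaxed eligibility filter. This is only a minimal sketch in plain Python, not the PR's actual ORM query: the `Resource` dataclass and `eligible_children` helper are hypothetical stand-ins, and the only assumption carried over from the description is that a resource has `published` and `test_mode` flags.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a learning resource row."""

    readable_id: str
    published: bool = False
    test_mode: bool = False


def eligible_children(candidates):
    """Keep resources that are published OR flagged as test_mode.

    Before this PR (per the description) only published resources qualified.
    """
    return [r for r in candidates if r.published or r.test_mode]


children = eligible_children(
    [
        Resource("course-a", published=True),
        Resource("course-b", test_mode=True),  # unpublished, but test-mode
        Resource("course-c"),  # neither flag set, so it is skipped
    ]
)
print([r.readable_id for r in children])
```

In the real code this would presumably be a queryset filter rather than a list comprehension; the sketch only illustrates the changed condition.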

How can this be tested?

You'll need both learn-ai (use the mb/related_resources branch for this related PR) and mit-learn running.

#### mit-learn

backend.local.env:

MITX_ONLINE_BASE_URL=https://rc.mitxonline.mit.edu/
MITX_ONLINE_COURSES_API_URL=https://rc.mitxonline.mit.edu/api/v2/courses/
MITX_ONLINE_PROGRAMS_API_URL=https://rc.mitxonline.mit.edu/api/v2/programs/

frontend.local.env:

NEXT_PUBLIC_LEARN_AI_RECOMMENDATION_ENDPOINT=http://ai.open.odl.local:8005/http/recommendation_agent/
NEXT_PUBLIC_LEARN_AI_SYLLABUS_ENDPOINT=http://ai.open.odl.local:8005/http/syllabus_agent/

#### learn-ai

backend.local.env:

LEARN_ACCESS_TOKEN=supertopsecret

frontend.local.env:

NEXT_PUBLIC_MIT_LEARN_API_BASE_URL="http://open.odl.local:8065"

shared.local.env:

AI_MIT_CONTENTFILE_URL=http://host.docker.internal:8065/api/v1/contentfiles
AI_MIT_SEARCH_VECTOR_URL=http://host.docker.internal:8065/api/v0/vector_learning_resources_search/
AI_MIT_SEARCH_ELASTIC_URL=http://host.docker.internal:8065/api/v1/learning_resources_search/
AI_MIT_SEARCH_URL=http://host.docker.internal:8065/api/v1/learning_resources_search/
AI_MIT_SYLLABUS_URL=http://host.docker.internal:8065/api/v0/vector_content_files_search/
AI_MIT_VIDEO_TRANSCRIPT_URL=http://host.docker.internal:8065/api/v0/vector_content_files_search/
AI_MIT_SEARCH_DETAIL_URL=http://host.docker.internal:8062/?resource=
  • Start containers for both learn-ai and mit-learn
  • In mit-learn:
    • docker compose up
    • Log in as an admin
    • Go to http://api.open.odl.local:8065/admin/users/user/
    • Create a new user called "learn_ai_user" and add "content_file_viewers" to the user's groups
    • Go to http://api.open.odl.local:8065/admin/oauth2_provider/accesstoken/ and add a new access token for user "learn_ai_user" equal to value of LEARN_ACCESS_TOKEN above
    • docker compose run --rm web python manage.py backpopulate_mitxonline_data
    • Wait until all indexing tasks are complete.
    • In a shell, run this to make sure embeddings are generated for marketing pages:
         from learning_resources.tasks import scrape_marketing_pages
         scrape_marketing_pages.delay()
    • Go to http://open.odl.local:6333/dashboard#/collections/resource_embeddings.content_files and search for "resource_readable_id: program-v1:UAI+B2C.4" - the chunk_content should include information about the child resource course-v1:UAI_SOURCE+UAI.12
    • Now search for "resource_readable_id: program-v1:UAI+B2C" - the chunk_content should include information about multiple child resources (programs and courses)
    • If you don't find them, you might need to use the vector_search management commands to regenerate the collections and embeddings.
    • Go to either of the following, whichever you prefer:
      • http://ai.open.odl.local:8003/?rec_prompt=&tab=SyllabusGPT&syllabus_prompt=&syllabus_resource=<id of "Universal AI" program>
      • http://open.odl.local:8062/search?resource=<id of "Universal AI" program>&syllabus= and click "AskTIM"
    • Ask questions and compare the answers with those you get on rc.learn.mit.edu or https://learn-ai-qa.ol.mit.edu/ for the same program
    • Repeat for "Fundamentals of Large Language Models" program-as-a-course
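
The Qdrant dashboard check in the steps above could also be scripted. The following is a rough sketch using only the standard library and Qdrant's REST scroll endpoint (`POST /collections/{name}/points/scroll`). It assumes that `resource_readable_id` and `chunk_content` are payload fields on the points (inferred from the dashboard search in the steps, not confirmed), and that Qdrant is reachable at the host used in the dashboard link. The helper names are hypothetical.

```python
import json
import urllib.request

# Host and collection taken from the dashboard URL in the testing steps.
QDRANT_URL = "http://open.odl.local:6333"
COLLECTION = "resource_embeddings.content_files"


def build_scroll_request(readable_id, limit=10):
    """Build a scroll request filtered on resource_readable_id (assumed payload key)."""
    body = {
        "filter": {
            "must": [{"key": "resource_readable_id", "match": {"value": readable_id}}]
        },
        "limit": limit,
        "with_payload": True,
    }
    return urllib.request.Request(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/scroll",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chunks_mentioning(readable_id, child_id):
    """Return points for readable_id whose chunk_content mentions child_id."""
    with urllib.request.urlopen(build_scroll_request(readable_id)) as resp:
        points = json.load(resp)["result"]["points"]
    return [
        p
        for p in points
        if child_id in str((p.get("payload") or {}).get("chunk_content", ""))
    ]
```

With the containers running, `chunks_mentioning("program-v1:UAI+B2C.4", "course-v1:UAI_SOURCE+UAI.12")` should return at least one point if the child summary was embedded. An empty list suggests the collections need regenerating, or that the payload field names assumed here differ.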

Copilot AI review requested due to automatic review settings March 26, 2026 21:20
@github-actions

OpenAPI Changes

2 changes: 0 error, 0 warning, 2 info
info	[response-required-property-added] at head/openapi/specs/v1.yaml	
	in API GET /api/v1/learning_resource_display_info/
		added the required property 'results/items/program_courses' to the response with the '200' status

info	[response-required-property-added] at head/openapi/specs/v1.yaml	
	in API GET /api/v1/learning_resource_display_info/{id}/
		added the required property 'program_courses' to the response with the '200' status


Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).
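
For API consumers, the practical effect of the two changes reported above is that every display-info item should now carry a `program_courses` key. Below is a minimal client-side check, assuming a paginated list response with a `results` array and an `id` on each item. The item shape inside `program_courses` is not shown in this report, so it is left unchecked. `missing_program_courses` and `sample_page` are hypothetical.

```python
def missing_program_courses(page):
    """Return the ids of list-response items that lack 'program_courses'."""
    return [
        item.get("id")
        for item in page.get("results", [])
        if "program_courses" not in item
    ]


# Hand-written sample, not real API output.
sample_page = {"results": [{"id": 1, "program_courses": []}, {"id": 2}]}
print(missing_program_courses(sample_page))
```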

Copilot AI left a comment

Pull request overview

Expands program/course vector and display metadata so downstream AI/embedding workflows can include more complete program hierarchies, including test_mode resources.

Changes:

  • Allow fetch_only course/program lookups to return test_mode resources (not just published).
  • Append a markdown “Program Contents” section (including child summaries) to program marketing-page content files during scraping.
  • Add program_courses to the display-info serializer/OpenAPI spec and regenerate the frontend API types.
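
To make the second and third changes concrete, here is a small sketch of depth-limited collection of a program's children rendered as a markdown "Program Contents" section. It is an illustration under stated assumptions, not the code in `learning_resources/utils.py`: the `Node` dataclass, the `program_contents_markdown` function, the bullet layout, and the default `max_depth` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a program or course."""

    title: str
    resource_type: str  # "program" or "course"
    summary: str = ""
    children: list = field(default_factory=list)


def program_contents_markdown(root, max_depth=3):
    """Render root's descendants as nested markdown bullets, up to max_depth levels."""
    lines = ["## Program Contents"]

    def walk(node, depth):
        if depth > max_depth:
            return  # the depth limit also guards against runaway nesting
        entry = f"{'  ' * (depth - 1)}- {node.title} ({node.resource_type})"
        if node.summary:
            entry += f": {node.summary}"
        lines.append(entry)
        for child in node.children:
            walk(child, depth + 1)

    for child in root.children:
        walk(child, 1)
    return "\n".join(lines)


root = Node(
    "Example Program",
    "program",
    children=[
        Node("Module A", "program", "Foundations", [Node("Intro Course", "course", "Basics")]),
        Node("Capstone", "course"),
    ],
)
print(program_contents_markdown(root))
```

Calling `program_contents_markdown(root, max_depth=1)` lists only the direct children, which is the kind of behavior the depth-limiting tests in `serializers_test.py` appear to target.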

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| openapi/specs/v1.yaml | Adds program_courses to the v1 schema for display info responses. |
| learning_resources/utils_test.py | Adds unit tests for program-children markdown generation. |
| learning_resources/utils.py | Implements program child hierarchy collection and markdown formatting (with content summaries). |
| learning_resources/tasks.py | Appends generated program-children markdown to scraped marketing page content. |
| learning_resources/serializers_test.py | Adds tests for recursive program_courses collection behavior and depth limiting. |
| learning_resources/serializers.py | Adds program_courses field to metadata display serializer; tweaks chunk header to be resource-type aware. |
| learning_resources/etl/loaders_test.py | Updates expectation for the new warning message when fetch-only lookup fails. |
| learning_resources/etl/loaders.py | Expands fetch-only lookup to include test_mode resources and updates warning text. |
| frontends/api/src/generated/v1/api.ts | Regenerates TS types to include program_courses. |

@mbertrand mbertrand changed the title More vector data for courses/programs More vector data for programs, programs-as-courses Mar 26, 2026
@mbertrand mbertrand requested a review from Copilot March 27, 2026 11:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

@mbertrand mbertrand force-pushed the mb/program_content branch from acd49b9 to b3fbd35 March 27, 2026 16:20
@mbertrand mbertrand added the Needs Review label (an open pull request that is ready for review) and removed the Work in Progress label Mar 27, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.

@abeglova abeglova self-assigned this Mar 27, 2026
Labels

Needs Review An open Pull Request that is ready for review

3 participants