Initial Data Producer Guide for VEDA (seeking feedback) #188

Open
siddharth0248 wants to merge 2 commits into main from data-guidance

@siddharth0248
Collaborator

Summary

This PR adds an initial version of a Data Producer Guide for VEDA instances.

It includes guidance on:

  • Data formats, file sizes, chunking, and compression
  • Storage, open data requirements, and citation
  • End-to-end data inclusion workflow

Notes

  • Some values are included as starting points based on best practices and references
  • Happy to iterate on specific sections (e.g., chunking, file size ranges) based on feedback

Goal

Provide a clear, consistent starting point for onboarding datasets into VEDA.

@netlify

netlify bot commented Apr 3, 2026

Deploy Preview for harmonious-cajeta-5542ab ready!

Name Link
🔨 Latest commit 8f78c08
🔍 Latest deploy log https://app.netlify.com/projects/harmonious-cajeta-5542ab/deploys/69d01a349f95a100086f120b
😎 Deploy Preview https://deploy-preview-188--harmonious-cajeta-5542ab.netlify.app

@abarciauskas-bgse
Contributor

@siddharth0248 Thank you for contributing! A lot of this guidance is specific to adding data to VEDA instances, especially regarding the workflow. Any VEDA-specific content should go in VEDA docs.

I think much of the data preparation content could be helpful as part of #183. We should make a branch for the decision framework, add that content to it, and then iterate on the decision framework from there. How does that sound to you?

@wildintellect
Contributor

Thanks for the PR @siddharth0248

As currently written, this seems more like documentation of how VEDA makes decisions and of VEDA-specific requirements related to data cataloging. I would lean towards this material, without changes, belonging in https://docs.openveda.cloud/ with a link to it from our Cookbooks section as an example of how one organization makes decisions.

The other approach I might suggest is to turn this PR into a cookbook and rework some of the text to explain that this is how VEDA decided to do things, and why.

I will also note that some of this material is clearly a precursor to #183, but we would need to make the choices more generic and describe more scenarios in depth to explain why one might pick one format over another. For example, several non-cloud-optimized formats are mentioned here, and there might be significant disagreement over their inclusion.

@siddharth0248
Collaborator Author

siddharth0248 commented Apr 3, 2026

Thanks @abarciauskas-bgse @wildintellect, that sounds like a good plan.

I agree that the VEDA-specific workflow content should live in VEDA docs, and we can keep the more general data preparation guidance here.

For #183, I’ll create a branch (if you want) focused on the decision framework and move over the relevant sections (formats, chunking, compression, etc.) so we can iterate there.

I can also start structuring it as a decision guide to align with the goal of that issue.

Let me know if that direction works or if you have something different in mind for the framework.

3 participants