Skip to content

fix: reverse SOC/CAC detection order in POST /chunks handler#5445

Open
significance wants to merge 6 commits intoethersphere:masterfrom
significance:fix/soc-chunk-detection
Open

fix: reverse SOC/CAC detection order in POST /chunks handler#5445
significance wants to merge 6 commits intoethersphere:masterfrom
significance:fix/soc-chunk-detection

Conversation

@significance
Copy link
Copy Markdown
Member

@significance significance commented Apr 25, 2026

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

The POST /chunks handler tries CAC (Content Addressed Chunk) parsing before SOC (Single Owner Chunk) parsing. Since CAC parsing accepts any data between 8-4104 bytes, valid SOC data is always misclassified as CAC and stored at the wrong content-addressed hash instead of the correct SOC address (Hash(Identifier || Owner)).

This breaks client-side SOC construction workflows where pre-stamped SOCs are uploaded via POST /chunks — the chunk is stored but can never be retrieved at its expected SOC address.

The fix reverses the detection order: try SOC parsing first, fall back to CAC.

Open API Spec Version Changes (if applicable)

N/A — no API contract change, only fixes incorrect internal routing of chunk types.

Motivation and Context

Client-side SOC construction (e.g. using bee-js makeContentAddressedChunk().toSingleOwnerChunk() then uploadChunk()) relies on POST /chunks correctly identifying and storing SOCs at their SOC address. Without this fix, all SOCs uploaded via this endpoint are silently stored as CACs at the wrong address.

Related Issue

N/A

Screenshots (if appropriate):

N/A

AI Disclosure

  • This PR contains code that has been generated by an LLM.
  • I have reviewed the AI generated code thoroughly.
  • I possess the technical expertise to responsibly review the code generated in this PR.

AI Canary 🦜

Updated diff not yet reviewed by the author

SOC chunks uploaded via the generic /chunks endpoint are misclassified
as CAC chunks because the handler tries CAC parsing first, which always
succeeds for valid SOC data (8-4104 bytes). These tests demonstrate that
SOC uploads return incorrect addresses and are unretrievable.
Try SOC parsing before CAC in the chunk upload handler. CAC parsing
accepts any data between 8-4104 bytes, so valid SOC data was always
misclassified as CAC and stored at the wrong content-addressed hash
instead of the correct SOC address (Hash(Identifier || Owner)).
@significance significance changed the title Fix/soc chunk detection fix: reverse SOC/CAC detection order in POST /chunks handler Apr 27, 2026
@acud
Copy link
Copy Markdown
Contributor

acud commented Apr 27, 2026

@significance thanks for this. a few comments:

  • the /chunks api endpoint was never made to upload SOCs
  • i am in favor of this api simplification, but in which case we should also get rid of the soc api path - it makes no sense to keep both and just increases confusion in the api
  • having a unified api would be better. but it would also require just having a unified type of upload requests. right now those are asymmetric - the soc endpoint requires both the owner and the id and verifies them. i am not entirely sure this is needed to be done in this way
  • i would probably just make a simple validation on both without having to leak too much about the internals of the chunk (like is done in the retrieval protocol for example) and leave the task of crafting a valid chunk to external tooling altogher

@agazso
Copy link
Copy Markdown
Contributor

agazso commented Apr 28, 2026

@significance thanks for this. a few comments:

* the `/chunks` api endpoint was never made to upload SOCs

* i am in favor of this api simplification, but in which case we should also get rid of the `soc` api path - it makes no sense to keep both and just increases confusion in the api

* having a unified api would be better. but it would also require just having a unified type of upload requests. right now those are asymmetric - the soc endpoint requires both the owner and the id and verifies them. i am not entirely sure this is needed to be done in this way

* i would probably just make a simple validation on both without having to leak too much about the internals of the chunk (like is done in the retrieval protocol for example) and leave the task of crafting a valid chunk to external tooling altogher

I think this is highly needed and it would be good to have a generic chunk upload endpoint. In my opinion what would make the most sense is to have a /chunks endpoint to upload any type of chunk applying the detection logic from the PR.

This PR introduces a potential performance penalty when using only content-addressed chunks, because the signature would need to be validated even if the chunk is content-addressed. This could be mitigated by having a /cac endpoint (semantically similarly to the /soc endpoint) which could only accept content-addressed chunks. Alternatively the unified /chunks endpoint could have options, either with query parameters or headers to indicate which chunk type is being uploaded explicitly.

See related issues/mentions:

@acud
Copy link
Copy Markdown
Contributor

acud commented Apr 28, 2026

This PR introduces a potential performance penalty when using only content-addressed chunks, because the signature would need to be validated even if the chunk is content-addressed. This could be mitigated by having a /cac endpoint (semantically similarly to the /soc endpoint) which could only accept content-addressed chunks. Alternatively the unified /chunks endpoint could have options, either with query parameters or headers to indicate which chunk type is being uploaded explicitly.

I think that having a header would make sense, then we can have a unified interface and get rid of the extra endpoints. The question is whether we want to have opinionated soc validation like today in the current soc upload endpoint (for me this is a bit too much. not sure it needs to go that deep).

@significance
Copy link
Copy Markdown
Member Author

yes, must admit i did not consider the performance implication there, as i was in app dev mode and just scratching an itch/removing barriers : )

from bee pov header sgtm, thank you both 🙇

@significance significance force-pushed the fix/soc-chunk-detection branch from acfa90b to 489b34f Compare April 29, 2026 11:31
@significance
Copy link
Copy Markdown
Member Author

significance commented Apr 29, 2026

ethersphere/SWIPs#67

btw @agazso @acud this is the longer term solution to this problem

this is great for a one chunk solution but for efficiencies sake i would like to pursue the chunk typing system proposed above and streaming presigned chunks en masse

Allow callers to explicitly specify chunk type ('soc'/'1' or 'cac'/'0')
via the Swarm-Chunk-Type header, skipping auto-detection and avoiding
unnecessary SOC signature validation for CAC uploads.
@significance
Copy link
Copy Markdown
Member Author

and immediately running into the problem again because CAC ⊄ SOC, SOC ⊂ CAC

When Swarm-Chunk-Type: cac is set but the payload is a valid SOC, the handler silently stores it at the wrong address.

there is no way to mitigate this without the chunk types

@acud
Copy link
Copy Markdown
Contributor

acud commented May 1, 2026

ethersphere/SWIPs#67

btw @agazso @acud this is the longer term solution to this problem

this is great for a one chunk solution but for efficiencies sake i would like to pursue the chunk typing system proposed above and streaming presigned chunks en masse

that SWIP is more about how we pass chunks around in the protocols. while it doesn't necessarily touch upon the point of API contracts (but rather wire protocol contracts), at least in theory - there's nothing preventing us from doing it in this way. the only thing to keep in mind is that once it leaves the wire protocol domain and enters the API, both upload and download must use the new protobuf type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants