
SWIP 191 & SWIP 192 #90

Open
lat-murmeldjur wants to merge 5 commits into ethersphere:master from lat-murmeldjur:swip_191_192

Conversation

@lat-murmeldjur

In these two SWIPs I propose a more resilient decentralised commenting engine for Swarm.

Combined, the conventions proposed here can provide a substantially more censorship-resistant decentralised commenting system that can operate entirely on Swarm. Besides commenting, the anythread engine can serve a number of social purposes, such as creating searchable user registries where anyone can sign up, sending friend requests to discovered users without previously established communication channels, and publishing and discovering new content.

Member

@nugaon left a comment


I wanted to comment on the missing references in the proposal, but I noticed that my PR about GSOC has disappeared from this repo... it is not even among the closed PRs. How is that possible?

Anyway, see my comments below.

edit: I found it by URL: #41, but it is not shown among the pull requests.

Comment thread SWIPs/swip-191-192.md


# SWIP 191 - Efficient multiple-version SOC exhaustive lookup
# SWIP 192 - Censorship-resistant decentralised commenting on Swarm
Member


this SWIP title comes after this section.

Author


first of all, thank you for reading through and for the valuable feedback. I will try to clarify each point as much as I can.

Comment thread SWIPs/swip-191-192.md

SWIP 191 provides a deterministic way to discover all versions of a given SOC with reasonable efficiency. For anythread this is directly useful. If an attacker attempts to censor a specific feed index by publishing additional versions of the same SOC, retrieval no longer has to rely on chance. The application can enumerate all versions of that index and recover the intended one.

This does not solve spam or flooding. Anythread is meant to be censorship-resistant, and content moderation is outside the scope of this proposal. The purpose here is narrower: to make it harder for a participant to hide legitimate thread updates by exploiting the fact that a shared feed owner allows many writers to target the same SOC address.
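A minimal sketch of the enumeration loop this implies; `retrieve_soc_version` and its `other_than` filter are hypothetical stand-ins for whatever request format SWIP 191 ends up specifying:

```python
def enumerate_soc_versions(address, retrieve_soc_version):
    """Collect every retrievable version of one SOC address.

    retrieve_soc_version(address, other_than) is a hypothetical
    network call: it returns a stored version of the SOC whose
    payload hash is not in `other_than`, or None once the storer
    nodes know of no further versions.
    """
    versions = []
    seen = set()
    while True:
        version = retrieve_soc_version(address, other_than=seen)
        if version is None:
            return versions  # lookup is exhaustive: no versions remain
        versions.append(version)
        seen.add(version.payload_hash)
```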
Member


flooding is exponentially more expensive with GSOC, since all updates fall into the same neighbourhood area, and for writing you need to buy a postage batch with which you must purchase the same store count in every neighbourhood, not only the one you target.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this limitation exists, but not in a practical sense. To get 1000 chunks into one neighbourhood, you only need a batch whose depth is 10 above the neighbourhood depth (2^10 = 1024 chunks per neighbourhood); that is not prohibitively expensive even for a single actor, and 1000 versions already means that the original ends up behind a hard-to-digest number of other entries.
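Back-of-the-envelope check of that claim (a sketch, assuming stamped chunks spread uniformly over neighbourhoods):

```python
def chunks_per_neighbourhood(batch_depth: int, neighbourhood_depth: int) -> int:
    # A batch of depth d can stamp 2**d chunks; spread uniformly,
    # each of the 2**neighbourhood_depth neighbourhoods receives
    # roughly 2**(d - neighbourhood_depth) of them.
    return 2 ** (batch_depth - neighbourhood_depth)

# batch depth 10 above the neighbourhood depth -> ~1024 chunks
# available in the single targeted neighbourhood
assert chunks_per_neighbourhood(batch_depth=20, neighbourhood_depth=10) == 1024
```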

Comment thread SWIPs/swip-191-192.md
The purpose of SWIP 191 is to provide the retrieval-side mechanism required for efficient exhaustive lookup of multiple versions of a single SOC.
The purpose of SWIP 192 is to make anythread significantly more resistant to censorship and operational failure modes.

## SWIP 191 - Efficient multiple-version SOC exhaustive lookup
Member


other_than is a nice extension of the constrained SOC request format. The serialization could happen in the same manner as detailed in the GSOC proposal, so optionType 0 could be the prefixed way to query and 1 could be the other_than variant.
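A rough sketch of what that shared serialization could look like; the TLV-style layout below is illustrative, not the actual GSOC wire format:

```python
import struct

OPTION_PREFIX = 0      # optionType 0: prefix-constrained query
OPTION_OTHER_THAN = 1  # optionType 1: exclude already-seen versions

def serialize_option(option_type: int, payload: bytes) -> bytes:
    # illustrative layout: 1-byte type, 2-byte big-endian length, payload
    return struct.pack(">BH", option_type, len(payload)) + payload

prefix_query = serialize_option(OPTION_PREFIX, bytes.fromhex("a1b2"))
exclusion = serialize_option(OPTION_OTHER_THAN, b"\x00" * 32)  # one 32-byte hash
```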

Comment thread SWIPs/swip-191-192.md
Comment on lines +194 to +197
This has two useful effects:

- it raises the amount of work required to generate a version that the application will treat as relevant;
- it gives the reader a narrower starting range for exhaustive lookup.
Member


it has a strong impact on the UX, increasing load time and resource usage for honest clients, who need to do the same work as flooders.

Author


it does have an impact on the UX, but only for writers. For a reader loading versions, the time it takes is unaffected: the reader starts within the narrower range of relevant versions by calling get_soc_versions with a set prefix instead of no prefix at the beginning.
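In other words, the security parameter changes the writer's loop, not the reader's. A sketch under that assumption, with `mine_version` and `get_soc_versions` as hypothetical helpers:

```python
# Writer side: re-derive candidate versions until the relevant field
# starts with the agreed prefix (this is the "security parameter" work).
def mine_version(make_candidate, prefix: bytes):
    while True:
        candidate = make_candidate()          # hypothetical constructor
        if candidate.hash.startswith(prefix):
            return candidate                  # ~256**len(prefix) tries on average

# Reader side: one constrained enumeration, no extra mining work.
def read_relevant_versions(address, prefix: bytes, get_soc_versions):
    # get_soc_versions is the hypothetical SWIP 191 call; passing the
    # prefix narrows the range the exhaustive lookup has to cover.
    return get_soc_versions(address, prefix=prefix)
```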

Comment thread SWIPs/swip-191-192.md
- when a writer publishes base-feed index `N`, it also publishes the same content at pager-feed index `floor(N / pager_size)`;
- each pager-feed index therefore corresponds to a `pager_size` long section of the base-feed indexes;
- because many base indexes map to the same pager index, a single pager index may legitimately accumulate multiple versions;
- the writer is expected to publish the base entry and the pager entry with the same postage stamp, probably making them expire together.
Member


with the same postage batch, they should use different stamps

Author


it's not necessary to use a different batch for one writer writing the base feed and the pager feed.

the pager feed only has one purpose. Let's say there was a popular thread 5 years ago; it got entries in the first 1000 base feed indexes, which correspond to the first 32 pager indexes. 5 years later, only one entry has survived expiry, at index 728 of the base feed. Now, to get that one comment without the pager feed, one would have to look up the first 728 indexes, and even then it would not be certain that further look-ahead would not give sporadic results.

This entry sits at index 22 of the pager feed (floor(728 / 32)). Since all but this entry are GC'd on both feeds (as the users used the same batches to write both entries), we can more easily check for sporadic remnants by looking up 32 SOCs (the first 32 pager indexes) to check 1024 base feed indexes. This is more efficient, as it requires 992 fewer retrieve requests to check the larger range.
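The arithmetic of that example, using the floor(N / pager_size) mapping from the quoted diff:

```python
PAGER_SIZE = 32

def pager_index(base_index: int) -> int:
    return base_index // PAGER_SIZE    # floor(N / pager_size)

assert pager_index(728) == 22          # the surviving entry's pager index

base_range = 32 * PAGER_SIZE           # 1024 base indexes to cover
pager_requests = 32                    # one lookup per pager index
print(base_range - pager_requests)     # 992 retrieve requests saved
```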

Comment thread SWIPs/swip-191-192.md
- because many base indexes map to the same pager index, a single pager index may legitimately accumulate multiple versions;
- the writer is expected to publish the base entry and the pager entry with the same postage stamp, probably making them expire together.

This gives the reader an efficient look-ahead mechanism. If the pager size is `32`, then checking the next `16` pager indexes tells the reader whether any updates exist in the next `512` base-feed indexes. Since SWIP 191 already provides a method to enumerate all versions of a pager index, the application can recover the pager information efficiently even though each pager index may hold multiple versions.
Member


this does not give any guarantee; all records at a pager feed address can be garbage collected over time.
thereby, the same corrections will be required to fetch records as if it were a simple base feed

Author


This construct is not meant to give writers more protection against garbage collection. Its sole purpose is to make it very efficient for readers to scan large ranges with sporadic entries. The base and pager feed entries of the same user are expected to be GC'd at the same time, and that is fine. When the base feed entries become sporadic and non-continuous, the most efficient way to get the sporadic entries from the higher indexes is to finish the base feed lookup one by one until the next pager delimiter, and then look ahead based on pager indexes.

For example, if the pager size is 32 and the first 170 base feed indexes were continuous, it is still good to look up each entry until index 192 on the base feed (this is why a medium pager size makes sense: we only have to look up at most another 31 entries on the base feed after we find the first gap). From index 192, we can continue by looking up pager feed indexes 6 through 37, and those 32 requests already tell us whether there are any sporadic updates in the next 1024 base feed indexes.
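A sketch of that reader strategy, with `fetch_base` and `fetch_pager` as hypothetical lookups that return None for a missing index:

```python
PAGER_SIZE = 32

def scan(fetch_base, fetch_pager, lookahead_pages=32):
    entries, i = [], 0
    # 1. walk the base feed until the first gap...
    while (entry := fetch_base(i)) is not None:
        entries.append(entry)
        i += 1
    # 2. ...then finish the current pager-sized section one by one
    #    (at most PAGER_SIZE - 1 extra base lookups)...
    boundary = (i // PAGER_SIZE + 1) * PAGER_SIZE
    for j in range(i + 1, boundary):
        if (entry := fetch_base(j)) is not None:
            entries.append(entry)
    # 3. ...and look ahead page by page: one pager lookup now
    #    covers PAGER_SIZE base indexes.
    first_page = boundary // PAGER_SIZE
    for page in range(first_page, first_page + lookahead_pages):
        for entry in fetch_pager(page) or []:
            entries.append(entry)
    return entries
```

With the numbers above: the first gap is at index 170, the section is finished up to index 191, and pages 6 through 37 cover base indexes 192 to 1215.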

Comment thread SWIPs/swip-191-192.md
The base feed and the pager feed serve different purposes and are both useful:

- the base feed spreads writes across more neighbourhoods, which makes censorship by local manipulation more difficult;
- the strong correspondence between base indexes and pager indexes helps writers locate where to write in the pager feed without concentrating too many writes onto a single overloaded SOC by accident;
Member


too many writes on a SOC by accident? this point is not really clear.

Author


Because a normal user looks up the first free index on the base feed and decides which pager feed entry to write based on base feed index / pager size, the number of versions on a pager feed entry will fluctuate between 0 and the pager size (unless an attacker is overloading one entry intentionally). This does not mean that a pager feed entry cannot have more versions than the pager size; it only means that normal users will not create more than pager size versions for a pager feed SOC, so well-meaning users will never produce a distributed effort in which some SOC gets an unbounded number of versions. This limits the "normal usage" version count to the pager size.
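A toy simulation of that bound, assuming honest writers who always take the first free base index and mirror it to floor(N / pager_size):

```python
from collections import Counter

PAGER_SIZE = 32

pager_versions = Counter()
for base_index in range(1000):             # 1000 honest writes, in order
    pager_versions[base_index // PAGER_SIZE] += 1

# no pager SOC accumulates more than pager size versions
assert max(pager_versions.values()) <= PAGER_SIZE
```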

Comment thread SWIPs/swip-191-192.md

The base feed and the pager feed serve different purposes and are both useful:

- the base feed spreads writes across more neighbourhoods, which makes censorship by local manipulation more difficult;
Member


what if the hostile actor floods the base feed and the pager feed at once? let's say they write one chunk to every pager index and additionally overload the base indices with 3 extra chunks at the places they want to censor?

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will still efficiently get all versions of all base feed and pager feed indexes based on SWIP 191. Beyond that, applying the security parameter makes it more expensive for an attacker to generate 100 versions of a new base feed index. If a user notices that the SOC containing their version is targeted by an attacker, it will be cheaper for that user to write to a new feed index (with zero previous versions), and the attacker will once again need orders of magnitude more effort to efficiently flood that new index with another 100 versions.

Comment thread SWIPs/swip-191-192.md
- the feed identity is derived from the anythread owner, the topic, and the current epoch boundary;
- when the epoch changes, a new new-updates feed begins.

Because each such feed exists only for a short interval, it is much less likely to develop large garbage-collected gaps. Even if a small gap appears, looking ahead a few indexes remains cheap. A reader that wants to poll for fresh activity can therefore monitor the current epoch's new-updates feed instead of relying on the first missing index of the long-lived base feed.
Member


in general, this time-anchored feed can be used in any situation where the topmost updates are significant, and then I still don't see the point of using the other feed types.

Author

@lat-murmeldjur Apr 29, 2026


If a feed has a history of 5 months, you would have to check 5 * 31 * 24 * 60 / 5 = 44,640 indexes to get each epoch-anchored feed from that period, even if there were only 8 comments in it. That is an inefficient amount of lookups. It is also not trivial to know how far back in time we have to go to find the first such feed with an entry. Therefore this construct only helps with live updates; to get historical results it is much more efficient to look up the base and pager feeds.
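The lookup count from that formula, assuming 5-minute epochs as the division by 5 implies:

```python
months, days, hours, minutes = 5, 31, 24, 60
epoch_minutes = 5                        # assumed epoch length
lookups = months * days * hours * minutes // epoch_minutes
print(lookups)                           # 44640 epoch feeds probed for 8 comments
```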

