Conversation
6c9a6e6 to
b85a439
Compare
e6d05b1 to
50a59f6
Compare
…red dev server running
alxndrsn
left a comment
There was a problem hiding this comment.
This looks great! 👍
Not sure that the changes to README.md are relevant.
| "emit": "readonly", | ||
| "PouchDB": "readonly" | ||
| "PouchDB": "readonly", | ||
| "__pouch__": "readonly" |
There was a problem hiding this comment.
Could window.__pouch__ could be used instead of adding an extra constant here?
There was a problem hiding this comment.
Omitting window. from the test code was a readability win to me but I don't have a strong preference either way. Happy to change this if it's a big issue.
TESTING.md
Outdated
| > **Always set the `COUCH_HOST` env var to a proper CouchDB for the integration tests!** | ||
| > [!WARNING] | ||
| > VERY IMPORTANT **Always set the `COUCH_HOST` env var to a proper CouchDB for | ||
| > the integration tests!** |
There was a problem hiding this comment.
Are these changes related to the new tests?
There was a problem hiding this comment.
This commit only changes whitespace in TESTING.md; my editor hard-wraps text/markdown files so they're readable in terminal/editors, and these formatting changes are in their own commit, separate from any content changes.
There was a problem hiding this comment.
to clarify, I am generally fine with cleanup commits that make things easier as long as they are separate.
@alxndrsn do you prefer these to be separate PRs?
| return page.evaluate(fn); | ||
| } | ||
|
|
||
| _getPage(id) { |
There was a problem hiding this comment.
"page" means "tab"? Maybe helpful to use consistent language?
There was a problem hiding this comment.
page is the playwright API language and it is consistent with that. Of course we can change it, but not sure if worth it?
There was a problem hiding this comment.
I was unsure about this. Everyone discussing these issues uses "tab" but Playwright uses "page". Strictly speaking, a "tab" is a particular UI for operating multiple pages, but it's not the only one. Colloquially I think "multi-tab" communicates "several pages operating concurrently" better than "multi-page" which to me sounds like a multi-step sequential workflow.
Overview
This PR adds a testing setup for multi-tab situations, and includes a test that
reproduces a reported bug (see linked issues and PRs) when the
indexeddbadapater operates in multi-tab settings.
I have a theory of why multi-tab interactions are broken on
indexeddbwhich Iinclude a couple of possible fixes for. The theory goes like this:
When you open a
PouchDBinstance on this adapter in tab A, it callsindexedDB.open(dbname, v1)wherev1is the database version. In PouchDBthe version is basically the current unix time.
When another tab B opens the same database it calls
indexedDB.open(dbname, v2)wherev2 > v1. This triggersonversionchangein tab A since anotherclient has opened the database at a higher version.
PouchDB handles
onversionchangeby doingindexedDB.open(dbname)with noversion, i.e. it opens the database at whatever version is already open. This
is to prevent multiple tabs interacting with the database from thrashing
open()calls as they keep invalidating each other's connections with higherversion numbers. (A comment in the code refers to this as an "infinite loop"
which I think is inaccurate; calling
open()only happens if you interactwith the database, so doesn't passively cause an infinite loop, but it does
cause thrashing if multiple tabs interact with the database.)
Aside: within a single-tab context, IndexedDB connections are cached by name
so only one connection per database is needed, so this problem does not happen
in single-tab contexts even if multiple
PouchDBinstances are active.The other ingredient of this problem is that the adapter caches a "metadata"
object containing
doc_countandseqin memory when the database is firstopened. This data is stored in the
metaobject store and only updated by thebulkDocs()function. All other operations read this value from the in-memorycache. The
seqvalue must be up to date as each doc update in the databasemust have a unique
seqvalue, and this is enforced by a unique index inIndexedDB. The error that occurs in the multi-tab case is IndexedDB reporting
a violation of this constraint.
Therefore, the following set of events are possible:
indexedDB.open(name, v1)indexedDB.open(name, v2)wherev2 > v1indexedDB.open(name)The last
open()does not triggeronversionchangein tab B, and so there arenow two active connections to the same IndexedDB database. And both of these
PouchDB instances have an in-memory cache of the
{ doc_count, seq }metadata.When one of these clients updates a doc, the cached metadata in the other tab
becomes stale, and this causes a
seqconflict when it next tries to update adoc. It also causes the result of
info()and other methods to be incorrect.Why does PouchDB use versions in this way?
when the database is first opened, and only when it is opened with a higher
version than it was last opened at.
to make sure these are up to date when the database is opened.
uses the current time as the version to give itself the opportunity to define
any necessary indexes.
createIndex(), this also bumps the database's version,triggering
onversionchangein other tabs and forcing them to re-open thedatabase with the right indexes defined.
Possible fixes I am considering:
Always give the version when re-opening the database. i.e. remove the
mechanism that means
open()is called without a version number when it'striggered via
onversionchangeand always pass in the current time. Thischanges nothing for single-tab contexts, they can continue to cache metadata.
However, it means any interaction with the database will cause
onversionchangein other tabs and force them to callopen()and reloadthe metadata again. So in multi-tab contexts, this causes
open()thrashing.Stop caching metadata. If multiple connections can coexist without
triggering
onversionchangeon each other, then caching metadata in memoryis simply wrong as there's no way to detect that it's stale. (You could
handle conflicts on
seqwhen updating docs but this still allowsinfo()to return stale data.) This means metadata would always be re-read from the
metastore for all interactions, even for single tabs, which could haveperformance implications.
I have not been able to see any measurable difference in the performance tests
as a result of removing metadata caching. To me, option 1 seems strictly worse
than 2 in the multi-tab case, since re-opening the connection also entails
re-reading the metadata, so in effect it removes metadata caching when
open()thrashing occurs. Option 2's possible performance downside in single-tab
situations does not seem measurable so I'm inclined to prefer this solution.
Implementations of both solutions are available in the last 2 commits, so
whichever one we decide to go with, the other can be removed from the history.
I have learned one thing from implementing solution 2:
bulkDocs()needs themetadata in memory, because it uses
idb_attachment_formatto decide thingsinside
preProcessAttachments(), which is async and so cannot be done inside anIndexedDB transaction. We pre-process attachments and then start a transaction
to write the docs and updated metadata.
I have solved this by keeping the metadata returned by
setup()in a var calledcachedMetaand passing this in tobulkDocs(), while everything else looksthe metadata up dynamically.
bulkDocs()also loads it dynamically inside thetransaction to get up to date
doc_countandseq. This makes the distinctionbetween
metadataandcachedMetaa bit confusing and not very revealing ofwhy any of this is being done.
To fix this, we could consider a migration to separate "static" metadata like
the UUID and
idb_attachment_formatfrom "dynamic" things likedoc_countandseqwhich change on doc writes. The former could be cached in memory onopen()without causing race conditions, while the latter needs to be read fromIDB all the time to make sure it's not stale.
Testing recommendations
Related Issues or Pull Requests
Checklist
docsfolder