diff --git a/bip.mediawiki b/bip.mediawiki
new file mode 100644
index 0000000000..f22a20263e
--- /dev/null
+++ b/bip.mediawiki
@@ -0,0 +1,224 @@
+<pre>
+  BIP: ?
+  Layer: Applications
+  Title: Encoding seed as themed mnemonic sentences
+  Authors: Yuri S Villas Boas
+           André Fidencio Gonçalves
+  Status: Draft
+  Type: Specification
+  Assigned: ?
+  License: BSD-2-Clause
+  Requires: 32, 39
+  Discussion: https://gnusha.org/pi/bitcoindev/jQqInjh7VTC5byefTzENidJjigvRqf5Y7UvbrWjKPJykvhdlLETeglGE3zoAiVAxUyAXU8uWHsHEjJ0MHqqPTy4prgaIhgMyIrD9c6ZUuE0=@pm.me/#t
+              https://gnusha.org/pi/bitcoindev/F4cs-RJRQYBXhjoS9fc_cUc93yLrkQS5DNQAeFRHrLEQ5bScCjKSnaqN-IcXb16fxqO053muqFCx8_GzzKN5XCGCIHD9Ir1_baI5voKYfOo=@pm.me/
+              https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management
+</pre>
+
+==Abstract==
+
+This BIP describes an expansion of BIP-0039 for the generation of deterministic
+wallets. Where BIP-0039 uses a flat list of unrelated words, Formosa organizes
+mnemonic words into themed sentences with syntactic structure and semantic
+coherence, substantially improving memorability while retaining all properties
+of the original scheme.
+
+It consists of two parts: generating the mnemonic and converting it into a
+binary seed. This seed can be later used to generate deterministic wallets using
+BIP-0032 or similar methods.
+
+Full forward and backward compatibility with BIP-0039 is maintained: seed
+derivation internally converts any Formosa mnemonic back to its equivalent
+BIP-0039 representation, so existing keys and addresses are preserved.
+
+==Copyright==
+
+This BIP is licensed under the BSD 2-clause license.
+
+==Motivation==
+
+A mnemonic code or sentence is superior for human interaction compared to the
+handling of raw binary or hexadecimal representations of a wallet seed. The
+sentence could be written on paper or spoken over the telephone.
+
+However, human memory is an associative process: information is more readily
+retained when it can be linked to existing knowledge through semantic
+associations, visual imagery, and narrative context. A BIP-0039 mnemonic is a
+sequence of unrelated words with no syntactic or semantic relationship, making
+it difficult to form the mental associations that aid long-term retention.
+
+Formosa builds upon BIP-0039 by organizing mnemonic words into themed sentences
+with syntactic roles (e.g., subject, adjective, object, location). Each sentence
+draws vocabulary from a coherent semantic domain --- medieval fantasy, science
+fiction, nature, finance, or any custom theme --- enabling the user to form vivid
+mental images that reduce memorization effort per bit of entropy.
+
+This guide is meant to be a way to transport computer-generated randomness with
+a human-readable transcription. It's not a way to process user-created
+sentences (also known as brainwallets) into a wallet seed.
+
+==Generating the mnemonic==
+
+The mnemonic must encode entropy in a multiple of 32 bits. With more entropy
+security is improved but the sentence length increases. We refer to the
+initial entropy length as ENT. The allowed size of ENT is 128-256 bits.
+
+First, an initial entropy of ENT bits is generated. A checksum is generated by
+taking the first <tt>ENT / 32</tt> bits of its SHA256 hash. This checksum is
+appended to the end of the initial entropy. Next, these concatenated bits
+are split into groups of 33 bits, which we call '''sentences'''. Each sentence is
+further subdivided into variable-length bit fields, one per syntactic category,
+whose lengths are defined by the active theme. Each bit field encodes an index
+into the corresponding category's word list. Finally, we convert these indices
+into words and use the joined words as a mnemonic sentence.
+
+BIP-0039 is a special case where each sentence contains three 11-bit fields
+indexing a single 2048-word list (3 x 11 = 33).
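
The checksum and splitting steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the reference implementation; the function names <tt>split_into_sentences</tt> and <tt>bip39_indices</tt> are hypothetical and chosen for this example.

```python
import hashlib


def split_into_sentences(entropy: bytes) -> list:
    """Append the checksum to the entropy and split the result into 33-bit groups."""
    ent = len(entropy) * 8
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32  # checksum length in bits
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs)  # first CS bits of the hash
    bits = (int.from_bytes(entropy, "big") << cs) | checksum
    count = (ent + cs) // 33  # number of 33-bit sentences
    mask = (1 << 33) - 1
    # Most significant group first, so the checksum ends up in the last sentence.
    return [(bits >> (33 * (count - 1 - i))) & mask for i in range(count)]


def bip39_indices(sentence: int) -> list:
    """BIP-0039 special case: three 11-bit indices per 33-bit sentence."""
    return [(sentence >> shift) & 0x7FF for shift in (22, 11, 0)]


if __name__ == "__main__":
    sentences = split_into_sentences(bytes(16))  # 128 bits of zero entropy
    print([bip39_indices(s) for s in sentences])
```

For a themed mnemonic, the same 33-bit integers would instead be cut into the field widths that the active theme defines.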
+
+The following table describes the relation between the initial entropy
+length (ENT), the checksum length (CS), the number of 33-bit sentences (S),
+and the length of the generated mnemonic sentence (MS) in words. The word
+count assumes a 6-word theme; for BIP-0039 (3 words per sentence), divide by 2.
+
+<pre>
+CS = ENT / 32
+S = (ENT + CS) / 33
+
+|  ENT  | CS | ENT+CS |  S  | MS (6-word) | MS (BIP-0039) |
++-------+----+--------+-----+-------------+---------------+
+|  128  |  4 |   132  |  4  |     24      |      12       |
+|  160  |  5 |   165  |  5  |     30      |      15       |
+|  192  |  6 |   198  |  6  |     36      |      18       |
+|  224  |  7 |   231  |  7  |     42      |      21       |
+|  256  |  8 |   264  |  8  |     48      |      24       |
+</pre>
+
+For each 33-bit sentence, the word selection algorithm proceeds as follows:
+
+# Initialize an empty sentence array with one slot per category.
+# For each category in the theme's ''filling order'':
+## Extract <tt>BIT_LENGTH</tt> bits from the current position in the bit stream.
+## Interpret them as an unsigned integer index.
+## If the category is ''led by'' another category, look up the appropriate sub-list from the leading category's mapping using the already-selected leading word. Otherwise, use the category's total word list.
+## Select the word at the computed index from the resolved word list.
+## Place the word into the sentence array at the position given by the theme's ''natural order''.
+# Output the words in natural order.
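
The selection steps above can be sketched as follows. The dictionary layout (<tt>filling_order</tt>, <tt>natural_order</tt>, <tt>bit_length</tt>, <tt>led_by</tt>, <tt>mapping</tt>, <tt>total_list</tt>) is an assumption made for this example and is not taken from the theme files; the toy theme uses only 4 bits instead of 33 to keep it readable.

```python
# Hypothetical toy theme: key names and words are illustrative assumptions.
TOY_THEME = {
    "filling_order": ["subject", "adjective", "place"],
    "natural_order": ["adjective", "subject", "place"],
    "categories": {
        "subject": {
            "bit_length": 1,
            "total_list": ["dragon", "knight"],
            # Sub-lists for the category this one leads, keyed by the leading word.
            "mapping": {"adjective": {"dragon": ["ancient", "fiery"],
                                      "knight": ["brave", "loyal"]}},
        },
        "adjective": {"bit_length": 1, "led_by": "subject"},
        "place": {"bit_length": 2,
                  "total_list": ["bridge", "castle", "forest", "tower"]},
    },
}


def select_words(sentence_bits: str, theme: dict) -> list:
    """Turn one sentence's bit string into words, following the steps above."""
    chosen = {}  # category name -> selected word
    pos = 0
    for name in theme["filling_order"]:
        category = theme["categories"][name]
        width = category["bit_length"]
        index = int(sentence_bits[pos:pos + width], 2)  # unsigned integer index
        pos += width
        leader = category.get("led_by")
        if leader is not None:
            # Restricted category: the sub-list depends on the leading word.
            words = theme["categories"][leader]["mapping"][name][chosen[leader]]
        else:
            words = category["total_list"]
        chosen[name] = words[index]
    # Emit the words in natural (reading) order rather than filling order.
    return [chosen[name] for name in theme["natural_order"]]


if __name__ == "__main__":
    print(" ".join(select_words("1011", TOY_THEME)))
```

In this sketch a leading category must appear before the categories it leads in the filling order, otherwise the lookup of the already-selected leading word fails.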
+
+==Themes==
+
+The Formosa equivalent to a BIP-0039 wordlist is a '''theme'''. A theme is a JSON
+document that defines syntactic categories, their word lists, bit-widths, and
+optional semantic restrictions between categories. The sum of all category
+bit-widths in a theme MUST equal 33.
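
The bit-width rule is easy to check mechanically. The sketch below assumes a hypothetical JSON layout with a <tt>categories</tt> object whose entries carry a <tt>bit_length</tt> field; the actual theme schema may differ.

```python
import json

SENTENCE_BITS = 33  # every theme must fill exactly one 33-bit sentence


def check_theme(theme_json: str) -> dict:
    """Parse a theme, verify the bit-width rule, and return {category: list size}."""
    theme = json.loads(theme_json)
    widths = {name: cat["bit_length"] for name, cat in theme["categories"].items()}
    total = sum(widths.values())
    if total != SENTENCE_BITS:
        raise ValueError(f"bit-widths sum to {total}, expected {SENTENCE_BITS}")
    # An n-bit field can express 2**n indices, so a list addressed by it
    # needs 2**n entries for every index to be valid.
    return {name: 1 << bits for name, bits in widths.items()}


# BIP-0039 expressed as a theme: three 11-bit categories (names are illustrative).
BIP39_AS_THEME = json.dumps(
    {"categories": {f"word_{i}": {"bit_length": 11} for i in (1, 2, 3)}}
)

if __name__ == "__main__":
    print(check_theme(BIP39_AS_THEME))
```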
+
+An ideal theme has the following characteristics:
+
+a) specific semantic scope (memory block)
+ - the entire vocabulary should adhere to a single coherent topic, enabling
+ the user to form a unified mental scene
+
+b) concrete imagery
+ - categories should consist of elements easily associated with mental images.
+ Prefer concrete nouns and tangible adjectives over abstract terms
+
+c) sorted wordlists
+ - the wordlist is sorted, which allows for more efficient lookup of the code words
+ (i.e. implementations can use binary search instead of linear search)
+
+d) first-letters uniqueness
+ - the wordlist is created in such a way that it's enough to type the first two
+ letters to unambiguously identify the word
+
+The first-letters uniqueness property can yield higher information density than
+BIP-0039. In BIP-0039, four characters are needed to identify each word,
+encoding 11 bits per 4 characters = 2.75 bits/character. In Formosa, two
+characters suffice per word. The achievable density depends on the theme's
+category bit-widths:
+
+<pre>
+| List size | Bits | Chars to identify | Density (bits/char) |
++-----------+------+-------------------+---------------------+
+|   2048    |  11  |         4         |   2.75 (BIP-0039)   |
+|     32    |   5  |         2         |   2.50              |
+|     64    |   6  |         2         |   3.00              |
+|    128    |   7  |         2         |   3.50              |
+</pre>
+
+As an example, the ''nationalities'' theme uses four 7-bit nationality
+categories (128 entries each) and one 5-bit profession category (32 entries),
+yielding 33 bits per 5-word sentence. A user typing only the first two
+characters of each word types 10 characters to encode 33 bits, achieving an
+information density of 33 / 10 = 3.30 bits/character --- a 20% improvement
+over BIP-0039's 2.75 bits/character
+
+e) semantic restrictions (optional)
+ - themes may define restrictions between categories so that the available word list
+ for one category changes depending on the word selected in a leading category,
+ producing more semantically coherent sentences. Restriction relationships MUST
+ be acyclic
+
+The wordlist can contain native characters, but they must be encoded in UTF-8
+using Normalization Form Compatibility Decomposition (NFKD).
+
+==From mnemonic to seed==
+
+A user may decide to protect their mnemonic with a passphrase. If a passphrase is not
+present, an empty string "" is used instead.
+
+To ensure forward and backward compatibility with BIP-0039, seed derivation first
+converts any Formosa mnemonic back to its equivalent BIP-0039 mnemonic by extracting
+the underlying entropy and re-encoding it using the BIP-0039 English word list. This
+guarantees that the same entropy always produces the same seed, keys, and addresses
+regardless of which theme was used.
+
+To create a binary seed from the resulting BIP-0039 mnemonic, we use the PBKDF2 function
+with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" +
+passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and
+HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512
+bits (= 64 bytes).
+
+This seed can be later used to generate deterministic wallets using BIP-0032 or
+similar methods.
+
+The conversion of the mnemonic sentence to a binary seed is completely independent
+from generating the sentence. This results in a rather simple code; there are no
+constraints on sentence structure and clients are free to implement their own
+themes or even whole sentence generators, allowing for flexibility in wordlists
+for typo detection or other purposes.
+
+Although using a mnemonic not generated by the algorithm described in "Generating the
+mnemonic" section is possible, this is not advised and software must compute a
+checksum for the mnemonic sentence using a wordlist and issue a warning if it is
+invalid.
+
+The described method also provides plausible deniability, because every passphrase
+generates a valid seed (and thus a deterministic wallet) but only the correct one
+will make the desired wallet available.
+
+==Standard themes==
+
+The reference implementation ships with standard themes listed at the link below.
+Since BIP-0039 is a valid Formosa theme, all existing BIP-0039 mnemonics work
+without modification.
+
+It is '''strongly discouraged''' to use non-standard custom themes for generating
+mnemonic sentences, as the user assumes responsibility for ensuring the theme file
+remains available and structurally valid. Users with proper training in security
+protocols who understand these risks may benefit from custom themes through higher
+memorization efficiency or an additional layer of obscurity.
+
+* [[https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes|Standard Formosa Themes]]
+
+==Test vectors==
+
+The test vectors include input entropy, mnemonic and seed. The
+passphrase "TREZOR" is used for all vectors. Since Formosa converts back to
+BIP-0039 before seed derivation, the same test vectors apply to all themes
+given the same underlying entropy.
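
A vector's seed can be recomputed with the Python standard library alone. The sketch below assumes the Formosa mnemonic has already been converted to its BIP-0039 English form; the function name <tt>mnemonic_to_seed</tt> is illustrative and not necessarily the name used by the reference implementation.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(bip39_mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed with PBKDF2-HMAC-SHA512 and 2048 iterations."""
    # Password: the mnemonic sentence, NFKD-normalized and UTF-8 encoded.
    password = unicodedata.normalize("NFKD", bip39_mnemonic).encode("utf-8")
    # Salt: the string "mnemonic" followed by the passphrase, same normalization.
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


if __name__ == "__main__":
    # BIP-0039 mnemonic for 128 bits of zero entropy.
    words = " ".join(["abandon"] * 11 + ["about"])
    print(mnemonic_to_seed(words, "TREZOR").hex())
```

Because only the BIP-0039 form enters the key derivation, the result does not depend on which theme displayed the mnemonic to the user.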
+
+https://github.com/Yuri-SVB/formosa/blob/master/vectors.json
+
+==Reference Implementation==
+
+Reference implementation including themes is available from
+
+https://github.com/Yuri-SVB/formosa