diff --git a/bip.mediawiki b/bip.mediawiki new file mode 100644 index 0000000000..f22a20263e --- /dev/null +++ b/bip.mediawiki @@ -0,0 +1,224 @@ +
+  BIP: ?
+  Layer: Applications
+  Title: Encoding seed as themed mnemonic sentences
+  Authors: Yuri S Villas Boas 
+           André Fidencio Gonçalves 
+  Status: Draft
+  Type: Specification
+  Assigned: ?
+  License: BSD-2-Clause
+  Requires: 32, 39
+  Discussion: https://gnusha.org/pi/bitcoindev/jQqInjh7VTC5byefTzENidJjigvRqf5Y7UvbrWjKPJykvhdlLETeglGE3zoAiVAxUyAXU8uWHsHEjJ0MHqqPTy4prgaIhgMyIrD9c6ZUuE0=@pm.me/#t
+              https://gnusha.org/pi/bitcoindev/F4cs-RJRQYBXhjoS9fc_cUc93yLrkQS5DNQAeFRHrLEQ5bScCjKSnaqN-IcXb16fxqO053muqFCx8_GzzKN5XCGCIHD9Ir1_baI5voKYfOo=@pm.me/
+              https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management
+
+
+==Abstract==
+
+This BIP describes an extension of BIP-0039 for the generation of deterministic
+wallets. Where BIP-0039 uses a flat list of unrelated words, Formosa organizes
+mnemonic words into themed sentences with syntactic structure and semantic
+coherence, with the aim of improving memorability while retaining all properties
+of the original scheme.
+
+It consists of two parts: generating the mnemonic and converting it into a
+binary seed. This seed can later be used to generate deterministic wallets using
+BIP-0032 or similar methods.
+
+Full forward and backward compatibility with BIP-0039 is maintained: seed
+derivation internally converts any Formosa mnemonic back to its equivalent
+BIP-0039 representation, so existing keys and addresses are preserved.
+
+==Copyright==
+
+This BIP is licensed under the BSD 2-clause license.
+
+==Motivation==
+
+A mnemonic code or sentence is superior for human interaction compared to the
+handling of raw binary or hexadecimal representations of a wallet seed. The
+sentence could be written on paper or spoken over the telephone.
+
+However, human memory is an associative process: information is more readily
+retained when it can be linked to existing knowledge through semantic
+associations, visual imagery, and narrative context. A BIP-0039 mnemonic is a
+sequence of unrelated words with no syntactic or semantic relationship, making
+it difficult to form the mental associations that aid long-term retention.
+
+Formosa builds upon BIP-0039 by organizing mnemonic words into themed sentences
+with syntactic roles (e.g., subject, adjective, object, location). Each sentence
+draws vocabulary from a coherent semantic domain (medieval fantasy, science
+fiction, nature, finance, or any custom theme), enabling the user to form vivid
+mental images that reduce memorization effort per bit of entropy.
+
+This scheme is meant to transport computer-generated randomness in a
+human-readable transcription. It is not a way to process user-created
+sentences (also known as brainwallets) into a wallet seed.
+
+==Generating the mnemonic==
+
+The mnemonic must encode entropy in a multiple of 32 bits. With more entropy,
+security improves but the sentence length increases. We refer to the
+initial entropy length as ENT. The allowed size of ENT is 128-256 bits.
+
+First, an initial entropy of ENT bits is generated. A checksum is generated by
+taking the first ENT / 32 bits of its SHA256 hash. This checksum is
+appended to the end of the initial entropy. Next, these concatenated bits
+are split into groups of 33 bits, which we call '''sentences'''. Each sentence is
+further subdivided into variable-length bit fields, one per syntactic category,
+whose lengths are defined by the active theme. Each bit field encodes an index
+into the corresponding category's word list. Finally, we convert these indices
+into words and use the joined words as a mnemonic sentence.
+
+BIP-0039 is a special case where each sentence contains three 11-bit fields
+indexing a single 2048-word list (3 x 11 = 33).
+
+The following table describes the relation between the initial entropy
+length (ENT), the checksum length (CS), the number of 33-bit sentences (S),
+and the length of the generated mnemonic sentence (MS) in words. The word
+count assumes a 6-word theme; for BIP-0039 (3 words per sentence), divide by 2.
+
+
+CS = ENT / 32
+S  = (ENT + CS) / 33
+
+|  ENT  | CS | ENT+CS |  S  | MS (6-word) | MS (BIP-0039) |
++-------+----+--------+-----+-------------+---------------+
+|  128  |  4 |   132  |  4  |     24      |      12       |
+|  160  |  5 |   165  |  5  |     30      |      15       |
+|  192  |  6 |   198  |  6  |     36      |      18       |
+|  224  |  7 |   231  |  7  |     42      |      21       |
+|  256  |  8 |   264  |  8  |     48      |      24       |
+
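The generation steps and the table above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the function names are hypothetical, bits are assumed to be consumed most-significant first, and the field widths are assumed to be passed in the order in which the theme consumes them.

```python
import hashlib


def entropy_to_sentences(entropy: bytes) -> list[int]:
    """Append the checksum to the entropy and split the result into 33-bit sentences."""
    ent = len(entropy) * 8
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32                                  # CS = ENT / 32
    digest = hashlib.sha256(entropy).digest()
    checksum = digest[0] >> (8 - cs)                # CS <= 8, so one byte suffices
    bits = (int.from_bytes(entropy, "big") << cs) | checksum
    count = (ent + cs) // 33                        # S = (ENT + CS) / 33
    mask = (1 << 33) - 1
    return [(bits >> (33 * (count - 1 - i))) & mask for i in range(count)]


def split_fields(sentence: int, widths: list[int]) -> list[int]:
    """Split one 33-bit sentence into per-category indices (assumed MSB first)."""
    if sum(widths) != 33:
        raise ValueError("category bit-widths must sum to 33")
    fields = []
    remaining = 33
    for width in widths:
        remaining -= width
        fields.append((sentence >> remaining) & ((1 << width) - 1))
    return fields


# Reproduce the ENT / CS / S columns of the table for each allowed entropy size.
for ent in (128, 160, 192, 224, 256):
    sentences = entropy_to_sentences(bytes(ent // 8))
    print(ent, ent // 32, len(sentences))
```

Calling `split_fields(sentence, [11, 11, 11])` corresponds to the BIP-0039 special case, where each field indexes the same 2048-word list.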
+
+For each 33-bit sentence, the word selection algorithm proceeds as follows:
+
+# Initialize an empty sentence array with one slot per category.
+# For each category in the theme's ''filling order'':
+## Extract BIT_LENGTH bits (the category's bit-width) from the current position in the bit stream.
+## Interpret them as an unsigned integer index.
+## If the category is ''led by'' another category, look up the appropriate sub-list from the leading category's mapping using the already-selected leading word. Otherwise, use the category's total word list.
+## Select the word at the computed index from the resolved word list.
+## Place the word into the sentence array at the position given by the theme's ''natural order''.
+# Output the words in natural order.
+
+==Themes==
+
+The Formosa equivalent to a BIP-0039 wordlist is a '''theme'''. A theme is a JSON
+document that defines syntactic categories, their word lists, bit-widths, and
+optional semantic restrictions between categories. The sum of all category
+bit-widths in a theme MUST equal 33.
+
+An ideal theme has the following characteristics:
+
+a) specific semantic scope (memory block)
+   - the entire vocabulary should adhere to a single coherent topic, enabling
+     the user to form a unified mental scene
+
+b) concrete imagery
+   - categories should consist of elements easily associated with mental images.
+     Prefer concrete nouns and tangible adjectives over abstract terms
+
+c) sorted wordlists
+   - the wordlist is sorted, which allows for more efficient lookup of the code words
+     (i.e. implementations can use binary search instead of linear search)
+
+d) first-letters uniqueness
+   - the wordlist is created in such a way that it's enough to type the first two
+     letters to unambiguously identify the word
+
+The first-letters uniqueness property yields higher information density than
+BIP-0039. In BIP-0039, four characters are needed to identify each word,
+encoding 11 bits per 4 characters = 2.75 bits/character. In Formosa, two
+characters suffice per word. The achievable density depends on the theme's
+category bit-widths:
+
+
+| List size | Bits | Chars to identify | Density (bits/char) |
++-----------+------+-------------------+---------------------+
+|   2048    |  11  |        4          |   2.75 (BIP-0039)   |
+|    32     |   5  |        2          |   2.50              |
+|    64     |   6  |        2          |   3.00              |
+|   128     |   7  |        2          |   3.50              |
+
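The density figures in the table follow from dividing the bits encoded by a word by the characters needed to identify it. A small Python sketch of that arithmetic is given here; the helper names are hypothetical and it assumes every list size is a power of two.

```python
from math import log2


def density(list_size: int, chars_to_identify: int) -> float:
    """Bits per typed character for a single word list."""
    return log2(list_size) / chars_to_identify


def sentence_density(bit_widths: list[int], chars_per_word: int = 2) -> float:
    """Bits per typed character for a whole sentence with one word per category."""
    return sum(bit_widths) / (chars_per_word * len(bit_widths))


# Recompute the rows of the table above.
for size, chars in [(2048, 4), (32, 2), (64, 2), (128, 2)]:
    print(f"{size:5d} words, {chars} chars: {density(size, chars):.2f} bits/char")
```

`sentence_density` takes the list of category bit-widths of a theme, so a theme mixing several list sizes lands between the per-list values in the table.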
+
+As an example, the ''nationalities'' theme uses four 7-bit nationality
+categories (128 entries each) and one 5-bit profession category (32 entries),
+yielding 33 bits per 5-word sentence. A user typing only the first two
+characters of each word types 10 characters to encode 33 bits, achieving an
+information density of 33 / 10 = 3.30 bits/character, a 20% improvement
+over BIP-0039's 2.75 bits/character.
+
+e) semantic restrictions (optional)
+   - themes may define restrictions between categories so that the available word list
+     for one category changes depending on the word selected in a leading category,
+     producing more semantically coherent sentences. Restriction relationships MUST
+     be acyclic
+
+The wordlist can contain native characters, but they must be encoded in UTF-8
+using Normalization Form Compatibility Decomposition (NFKD).
+
+==From mnemonic to seed==
+
+A user may decide to protect their mnemonic with a passphrase. If a passphrase is not
+present, an empty string "" is used instead.
+
+To ensure forward and backward compatibility with BIP-0039, seed derivation first
+converts any Formosa mnemonic back to its equivalent BIP-0039 mnemonic by extracting
+the underlying entropy and re-encoding it using the BIP-0039 English word list. This
+guarantees that the same entropy always produces the same seed, keys, and addresses
+regardless of which theme was used.
+
+To create a binary seed from the resulting BIP-0039 mnemonic, we use the PBKDF2 function
+with the mnemonic sentence (in UTF-8 NFKD) used as the password and the string
+"mnemonic" + passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is
+set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the
+derived key is 512 bits (= 64 bytes).
+
+This seed can later be used to generate deterministic wallets using BIP-0032 or
+similar methods.
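
The PBKDF2 step described above can be sketched with the Python standard library. This is an illustration only: the function name `mnemonic_to_seed` is hypothetical, and the input is assumed to already be the BIP-0039 English mnemonic obtained after converting from a Formosa theme.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(bip39_mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed from a BIP-0039 mnemonic and an optional passphrase."""
    # Both password and salt are normalized to NFKD and encoded as UTF-8.
    password = unicodedata.normalize("NFKD", bip39_mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # 2048 iterations of PBKDF2 with HMAC-SHA512, 512-bit (64-byte) output.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


seed = mnemonic_to_seed(
    "abandon abandon abandon abandon abandon abandon "
    "abandon abandon abandon abandon abandon about",
    "TREZOR",
)
print(seed.hex())
```

Because an absent passphrase is treated as the empty string, calling `mnemonic_to_seed(mnemonic)` with one argument uses the salt "mnemonic" alone.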
+
+The PBKDF2 step that converts the BIP-0039 mnemonic sentence to a binary seed is
+completely independent from generating the sentence. This results in rather simple
+code; there are no constraints on sentence structure and clients are free to
+implement their own themes or even whole sentence generators, allowing for
+flexibility in wordlists for typo detection or other purposes.
+
+Although using a mnemonic not generated by the algorithm described in the "Generating
+the mnemonic" section is possible, this is not advised, and software must compute a
+checksum for the mnemonic sentence using a wordlist and issue a warning if it is
+invalid.
+
+The described method also provides plausible deniability, because every passphrase
+generates a valid seed (and thus a deterministic wallet) but only the correct one
+will make the desired wallet available.
+
+==Standard themes==
+
+The reference implementation ships with standard themes listed at the link below.
+Since BIP-0039 is a valid Formosa theme, all existing BIP-0039 mnemonics work
+without modification.
+
+It is '''strongly discouraged''' to use non-standard custom themes for generating
+mnemonic sentences, as the user assumes responsibility for ensuring the theme file
+remains available and structurally valid. Users with proper training in security
+protocols who understand these risks may benefit from custom themes through higher
+memorization efficiency or an additional layer of obscurity.
+
+* [https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes Standard Formosa Themes]
+
+==Test vectors==
+
+The test vectors include input entropy, mnemonic and seed. The
+passphrase "TREZOR" is used for all vectors. Since Formosa converts back to
+BIP-0039 before seed derivation, the same test vectors apply to all themes
+given the same underlying entropy.
+
+https://github.com/Yuri-SVB/formosa/blob/master/vectors.json
+
+==Reference Implementation==
+
+Reference implementation including themes is available from
+
+https://github.com/Yuri-SVB/formosa