Arno’s Engram keyboard layout
Engram is a key layout optimized for comfortable and efficient touch typing in English, created by Arno Klein, with open source code for creating other optimized key layouts. Soon you will be able to install the Engram layout on Windows, macOS, and Linux or try it out online; a pull request is currently under review by the Keyman community. An article is under review (see the preprint for an earlier description and preliminary layout).
Letters are optimally arranged according to ergonomics factors that promote reduction of lateral finger movements and more efficient typing of high-frequency letter pairs. The most common punctuation marks are logically grouped together in the middle columns and numbers are paired with mathematical and logic symbols (shown as pairs of default and Shift-key-accessed characters). See below for a full description and comparisons with other key layouts.
Standard diagonal keyboard (default and Shift-key layers)
“Ergonomic” orthonormal keyboard (default and Shift-key layers)
(c) 2021 Arno Klein, MIT license
Contents
- Why a new key layout?
- How does Engram compare with other key layouts?
- Guiding criteria
- Summary of steps and results
Why a new key layout?
Personal history
In the future, I hope to include an engaging rationale for why I took on this challenge.
Suffice to say I love solving problems, and I have battled repetitive strain injury
ever since I worked on an old DEC workstation at the MIT Media Lab while composing
my thesis back in the 1990s.
I have experimented with a wide variety of human interface technologies over the years –
voice dictation, one-handed keyboard, keyless keyboard, foot mouse, and ergonomic keyboards
like the Kinesis Advantage and Ergodox keyboards with different key switches.
While these technologies can significantly improve comfort and reduce strain,
an optimized key layout can only help, whether you type on an ergonomic or a standard keyboard.
I have used different key layouts (Qwerty, Dvorak, Colemak, etc.) for communications and for writing and programming projects, and have primarily relied on Colemak for the last 10 years. I find that most, if not all, of these key layouts:
- Demand too much strain on tendons
  - strenuous lateral extension of the index and little fingers
- Ignore the ergonomics of the human hand
  - different finger strengths
  - different finger lengths
  - natural roundedness of the hand
  - home row easier than upper row for shorter fingers
  - home row easier than lower row for longer fingers
  - ease of little-to-index finger rolls vs. reverse
- Over-emphasize alternation between hands and under-emphasize same-hand, different-finger transitions
  - same-row, adjacent finger transitions are easy and comfortable
  - little-to-index finger rolls are easy and comfortable
While I used the ergonomic principles outlined below and the accompanying code to help generate the Engram layout, I also relied on massive bigram frequency data for the English language. If one were to follow the procedure below with a different set of bigram frequencies for another language or text corpus, one could create a variant of the Engram layout, say “Engram-French”, better suited to the French language.
Why “Engram”?
The name is a pun, referring both to “n-gram”, letter permutations and their frequencies that are used to compute the Engram layout, and “engram”, or memory trace, the postulated change in neural tissue to account for the persistence of memory, as a nod to my attempt to make this layout easy to remember.
How does Engram compare with other key layouts?
Although the Engram layout was designed to reduce strain and discomfort, not specifically to increase speed or reduce finger travel from the home row, it scores higher than every other key layout compared here (Colemak, Dvorak, QWERTY, etc.) under the Engram Scoring Model on large, representative, publicly available text data (all text sources are listed below and available on GitHub). Below are tables of prominent key layouts scored using the Engram Scoring Model (detailed below) and scores generated by the online Keyboard Layout Analyzer (KLA):
Engram Scoring Model scores (x100) for layouts, based on publicly available text data
Layout | Google bigrams | Alice | Romeo | Gita | Memento | 100K tweets | 20K tweets | MASC tweets | MASC spoken | COCA blogs | iweb | Monkey | Coder | Code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Engram | 75.03 | 74.47 | 74.63 | 74.46 | 75.13 | 75.90 | 75.54 | 75.34 | 74.36 | 75.03 | 75.23 | 75.06 | 75.37 | 75.33 |
Halmak | 74.98 | 74.42 | 74.58 | 74.40 | 75.08 | 75.82 | 75.49 | 75.29 | 74.32 | 74.97 | 75.17 | 75.00 | 75.31 | 75.26 |
Hieamtsrn | 75.01 | 74.45 | 74.61 | 74.43 | 75.11 | 75.86 | 75.50 | 75.32 | 74.34 | 75.00 | 75.20 | 75.04 | 75.34 | 75.23 |
Colemak Mod-DH | 74.95 | 74.40 | 74.55 | 74.39 | 75.05 | 75.77 | 75.43 | 75.25 | 74.29 | 74.94 | 75.14 | 74.98 | 75.28 | 75.19 |
Norman | 74.91 | 74.38 | 74.52 | 74.37 | 75.02 | 75.72 | 75.39 | 75.21 | 74.27 | 74.90 | 75.09 | 74.94 | 75.22 | 75.17 |
Workman | 74.94 | 74.39 | 74.55 | 74.38 | 75.05 | 75.77 | 75.44 | 75.25 | 74.29 | 74.93 | 75.12 | 74.97 | 75.27 | 75.22 |
MTGAP 2.0 | 74.94 | 74.39 | 74.54 | 74.38 | 75.05 | 75.75 | 75.42 | 75.23 | 74.28 | 74.92 | 75.12 | 74.96 | 75.27 | 75.18 |
QGMLWB | 74.91 | 74.37 | 74.52 | 74.35 | 75.02 | 75.73 | 75.40 | 75.21 | 74.27 | 74.90 | 75.09 | 74.94 | 75.23 | 75.13 |
Colemak | 74.93 | 74.37 | 74.53 | 74.37 | 75.02 | 75.75 | 75.40 | 75.24 | 74.27 | 74.92 | 75.12 | 74.96 | 75.26 | 75.20 |
Asset | 74.90 | 74.36 | 74.51 | 74.35 | 75.01 | 75.71 | 75.39 | 75.21 | 74.25 | 74.89 | 75.08 | 74.92 | 75.23 | 75.18 |
Capewell-Dvorak | 74.90 | 74.37 | 74.52 | 74.35 | 75.01 | 75.73 | 75.39 | 75.21 | 74.27 | 74.90 | 75.08 | 74.93 | 75.22 | 75.15 |
Klausler | 74.92 | 74.38 | 74.54 | 74.37 | 75.03 | 75.75 | 75.42 | 75.22 | 74.28 | 74.91 | 75.10 | 74.95 | 75.23 | 75.17 |
Dvorak | 74.90 | 74.37 | 74.53 | 74.35 | 75.01 | 75.73 | 75.40 | 75.20 | 74.27 | 74.90 | 75.09 | 74.93 | 75.20 | 75.17 |
QWERTY | 74.76 | 74.27 | 74.41 | 74.25 | 74.88 | 75.55 | 75.25 | 75.06 | 74.17 | 74.76 | 74.94 | 74.79 | 75.06 | 75.01 |
Keyboard Layout Analyzer (KLA) scores for the same text sources:
The optimal layout score is based on a weighted calculation that factors in the distance your fingers moved (33%), how often you use particular fingers (33%), and how often you switch fingers and hands while typing (34%).
Layout | Alice in Wonderland | Romeo Juliet | Bhagavad Gita | Memento screenplay | 100K tweets | 20K tweets | MASC tweets | MASC spoken | COCA blogs | iweb | Monkey | Coder | Software languages |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Engram | 70.13 | 61.48 | 61.93 | 57.16 | 64.64 | 58.58 | 60.24 | 64.39 | 69.66 | 68.25 | 67.66 | 46.81 | 47.69 |
Halmak | 66.25 | 57.02 | 57.45 | 55.03 | 60.86 | 55.53 | 57.13 | 62.32 | 67.29 | 65.50 | 64.75 | 45.68 | 47.60 |
Hieamtsrn | 69.43 | 60.94 | 60.87 | 56.75 | 64.40 | 58.95 | 60.47 | 64.33 | 69.93 | 69.15 | 68.30 | 46.01 | 46.48 |
Colemak Mod-DH | 65.74 | 56.05 | 57.52 | 54.91 | 60.75 | 54.94 | 57.15 | 61.29 | 67.12 | 65.98 | 64.85 | 47.35 | 48.50 |
Norman | 62.76 | 53.21 | 53.44 | 52.33 | 57.43 | 53.24 | 53.90 | 59.97 | 62.80 | 60.90 | 59.82 | 43.76 | 46.01 |
Workman | 64.78 | 56.67 | 56.97 | 54.29 | 59.98 | 55.81 | 56.25 | 61.34 | 65.27 | 63.76 | 62.90 | 45.33 | 47.76 |
MTGAP 2.0 | 66.13 | 53.98 | 56.57 | 53.78 | 59.87 | 55.30 | 55.81 | 60.32 | 65.68 | 63.81 | 62.74 | 45.38 | 44.34 |
QGMLWB | 65.45 | 55.67 | 55.57 | 54.07 | 60.51 | 56.05 | 56.90 | 62.23 | 66.26 | 64.76 | 63.91 | 46.38 | 45.72 |
Colemak | 65.83 | 56.12 | 57.63 | 54.94 | 60.67 | 54.97 | 57.04 | 61.36 | 67.14 | 66.01 | 64.91 | 47.30 | 48.65 |
Asset | 64.60 | 54.63 | 56.09 | 53.84 | 58.66 | 54.72 | 55.35 | 60.81 | 64.71 | 63.17 | 62.44 | 45.54 | 47.52 |
Capewell-Dvorak | 66.94 | 58.31 | 57.39 | 55.66 | 62.14 | 56.85 | 57.99 | 62.83 | 66.95 | 65.23 | 64.70 | 45.30 | 45.62 |
Klausler | 68.24 | 59.91 | 59.71 | 55.75 | 62.57 | 56.45 | 58.34 | 64.04 | 68.34 | 66.89 | 66.31 | 46.83 | 45.66 |
Dvorak | 65.86 | 58.18 | 57.29 | 55.09 | 60.93 | 55.56 | 56.59 | 62.75 | 66.64 | 64.87 | 64.26 | 45.46 | 45.55 |
QWERTY | 53.06 | 43.74 | 44.92 | 44.25 | 48.28 | 44.99 | 44.59 | 51.79 | 52.31 | 50.19 | 49.18 | 38.46 | 39.89 |
Layout | Year | Website |
---|---|---|
Engram | 2021 | https://engram.dev |
Halmak 2.2 | 2016 | https://github.com/MadRabbit/halmak |
Hieamtsrn | 2014 | https://mathematicalmulticore.wordpress.com/the-keyboard-layout-project/#comment-4976 |
Colemak Mod-DH | 2014 | https://colemakmods.github.io/mod-dh/ |
Norman | 2013 | https://normanlayout.info/ |
Workman | 2010 | https://workmanlayout.org/ |
MTGAP 2.0 | 2010 | https://mathematicalmulticore.wordpress.com/2010/06/21/mtgaps-keyboard-layout-2-0/ |
QGMLWB | 2009 | http://mkweb.bcgsc.ca/carpalx/?full_optimization |
Colemak | 2006 | https://colemak.com/ |
Asset | 2006 | http://millikeys.sourceforge.net/asset/ |
Capewell-Dvorak | 2004 | http://michaelcapewell.com/projects/keyboard/layout_capewell-dvorak.htm |
Klausler | 2002 | https://web.archive.org/web/20031001163722/http://klausler.com/evolved.html |
Dvorak | 1936 | https://en.wikipedia.org/wiki/Dvorak_keyboard_layout |
QWERTY | 1873 | https://en.wikipedia.org/wiki/QWERTY |
Text source | Information |
---|---|
“Alice in Wonderland” | Alice in Wonderland (Ch.1) |
“Romeo and Juliet” | Romeo and Juliet |
“Bhagavad Gita” | Bhagavad Gita |
“Memento screenplay” | Memento screenplay |
“100K tweets” | 100,000 tweets from: Sentiment140 dataset training data |
“20K tweets” | 20,000 tweets from Gender Classifier Data |
“MASC tweets” | MASC tweets (cleaned of html markup) |
“MASC spoken” | MASC spoken transcripts (phone and face-to-face: 25,783 words) |
“COCA blogs” | Corpus of Contemporary American English blog samples |
“Google website” | Google homepage (accessed 10/20/2020) |
“Software languages” | “Tower of Hanoi” (programming languages A-Z from Rosetta Code) |
“Monkey text” | Ian Douglas’s English-generated monkey0-7.txt corpus |
“Coder text” | Ian Douglas’s software-generated coder0-7.txt corpus |
“iweb cleaned corpus” | First 150,000 lines of Shai Coleman’s iweb-corpus-samples-cleaned.txt |
Reference for Monkey and Coder texts: Douglas, Ian. (2021, March 28). Keyboard Layout Analysis: Creating the Corpus, Bigram Chains, and Shakespeare’s Monkeys (Version 1.0.0). Zenodo. http://doi.org/10.5281/zenodo.4642460
Guiding criteria
1. Assign letters to keys that don't require lateral finger movements.
2. Promote alternating between hands over uncomfortable same-hand transitions.
3. Assign the most common letters to the most comfortable keys.
4. Arrange letters so that more frequent bigrams are easier to type.
5. Promote little-to-index-finger roll-ins over index-to-little-finger roll-outs.
6. Balance finger loads according to their relative strength.
7. Avoid stretching shorter fingers up and longer fingers down.
8. Avoid using the same finger.
9. Avoid skipping over the home row.
10. Assign the most common punctuation to keys in the middle of the keyboard.
11. Assign easy-to-remember symbols to the Shift-number keys.
Factors used to compute the Engram layout
- N-gram letter frequencies
  Peter Norvig’s analysis of data from Google’s book scanning project
- Flow factors (transitions between ordered key pairs)
  These factors are influenced by Dvorak’s 11 criteria (1936).
Summary of steps and results
Step 1: Define the shape of the key layout to minimize lateral finger movements
We will assign 24 letters to 8 columns of keys separated by two middle columns reserved for punctuation. These 8 columns require no lateral finger movements when touch typing, since there is one column per finger. The most comfortable keys include the left and right home rows (keys 5-8 and 17-20), the top-center keys (2,3 and 14,15) that allow the longer middle and ring fingers to uncurl upwards, as well as the bottom corner keys (9,12 and 21,24) that allow the shorter fingers to curl downwards. We will assign the two least frequent letters, Z and Q, to the two hardest-to-reach keys lying outside the 24-key columns in the upper right:
Left: Right:
1 2 3 4 13 14 15 16 Z/Q
5 6 7 8 17 18 19 20 Q/Z
9 10 11 12 21 22 23 24
Step 2: Arrange the most frequent letters based on comfort, their frequencies, and bigram frequencies
We will assign letters to keys by choosing the arrangement with the highest score according to our scoring model. However, there are over four hundred septillion, or four hundred trillion trillion, possible arrangements of 26 letters (26! = 403,291,461,126,605,635,584,000,000, or 4.032914611 E+26), and still 24! = 6.204484017 E+23 possible arrangements of the 24 letters in the eight finger columns, so we will arrange the letters in stages, based on ergonomics principles.
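As a quick check of these magnitudes, the factorials can be computed directly. This is only a minimal sketch using Python's standard library; the variable names are illustrative.

```python
import math

# Number of ways to arrange all 26 letters, and the 24 letters that
# occupy the eight finger columns (Z and Q are placed separately).
arrangements_26 = math.factorial(26)
arrangements_24 = math.factorial(24)

print(f"26! = {arrangements_26:,}")
print(f"24! = {arrangements_24:,}")
```

Either number is far too large to score exhaustively, which is why the letters are placed in stages.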
In prior experiments using the methods below, all vowels consistently clustered together. Below, we will arrange the vowels on one side and the most frequent consonants on the other side to encourage balance and alternation across hands. Since, aside from the letters Z and Q, the left and right sides are symmetric, we will decide later which side gets the vowels and which side gets the most frequent consonants.
Vowels
**E**, T, **A**, **O**, **I**, N, S, R, H, L, D, C, **U**, M, F, P, G, W, Y, B, V, K, X, J, Q, Z
The high-frequency bigrams that contain these vowels are listed below; those with more than 10 billion instances are in bold:
**OU**, **IO**, **EA**, **IE**, AI, IA, EI, UE, UA, AU, UI, OI, EO, OA, OE
OU 24531132241
IO 23542263265
EA 19403941063
IE 10845731320
AI 8922759715
IA 8072199471
EI 5169898489
UE 4158448570
UA 3844138094
AU 3356322923
UI 2852182384
OI 2474275212
EO 2044268477
OA 1620913259
OE 1089254517
We will assign the vowels (E,A,O,I,U) to the most comfortable keys (keys 5-8, 2-3, and also 4 or 12 for the letter U) on one side, with the letter E, the most frequent in the English language, assigned to either of the strongest keys (7 and 8, the middle and index fingers on the left home row). We will arrange the vowels such that any top-frequency bigram (more than 1 billion instances in Peter Norvig’s analysis of Google data) reads from left to right (ex: EA, not AE) for ease of typing (roll-in from little to index finger vs. roll-out from index to little finger). These constraints lead to 8 comfortable and efficient layouts:
- - U -    - - - -    - O U -    - O - -    - - O -    - - O -
I O E A    I O E A    I - E A    I - E A    I - E A    - I E A
- - - -    - - - U    - - - -    - - - U    - - - U    - - - U

- O - U    - - O U
I - E A    - I E A
- - - -    - - - -
Consonants
To maximize the number of bigrams we can comfortably type on the home row of the other hand, we consider all consonants (T, N, S, R, H, D, C) that appear in the highest-frequency bigrams (at least 10 billion instances in Norvig’s analysis: TH, ND, ST, NT, CH, NS, CT, TR, RS, NC, RT). Near and below 10 billion instances, bigrams start to occur in reverse order as well, such as RT and TS:
TH 100,272,945,963 3.56%
ND 38,129,777,631 1.35%
ST 29,704,461,829 1.05%
NT 29,359,771,944 1.04%
CH 16,854,985,236 0.60%
NS 14,350,320,288
CT 12,997,849,406
TR 12,006,693,396
RS 11,180,732,354
NC 11,722,631,112
RT 10,198,055,461
Of all possible 4-consonant sequences containing four of the seven consonants, we select those sequences that contain the highest possible number (5 or 6) of high-frequency bigrams (at least one billion instances in Norvig’s analysis: TH, ND, ST, NT, CH, NS, CT, TR, RS, NC, RT, SH, LD, RD, LS, DS, LT, TL, RL, HR, NL, and SL). The resulting sequences and their bigrams are:
RNST: RN, RS, RT, NS, NT, ST
NRST: NS, NT, RS, RT, ST
RSNT: RS, RN, RT, ST, NT
RSTH: RS, RT, ST, SH, TH
NSTH: NS, NT, ST, SH, TH
NCTH: NC, NT, CT, CH, TH
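The bigram counts for the six sequences above can be checked with a short script. This is a sketch, not the author's selection code: it only counts the left-to-right letter pairs of a given sequence that fall in the high-frequency set. The set is copied from the list quoted above, with one labeled assumption: RN is added because the listed results count it even though it does not appear in the quoted list. The function names are hypothetical.

```python
from itertools import combinations

# High-frequency bigrams quoted in the text (at least one billion instances).
# ASSUMPTION: "RN" is not in that quoted list; it is added here because the
# sequences listed above count it.
HIGH_FREQUENCY = set(
    "TH ND ST NT CH NS CT TR RS NC RT SH LD RD LS DS LT TL RL HR NL SL".split()
) | {"RN"}


def ordered_bigrams(sequence):
    """Return every left-to-right letter pair in the sequence, adjacent or not."""
    return [a + b for a, b in combinations(sequence, 2)]


def high_frequency_bigrams(sequence):
    """Keep only the pairs that belong to the high-frequency set."""
    return [pair for pair in ordered_bigrams(sequence) if pair in HIGH_FREQUENCY]


for seq in ["RNST", "NRST", "RSNT", "RSTH", "NSTH", "NCTH"]:
    found = high_frequency_bigrams(seq)
    print(seq, len(found), ", ".join(found))
```

Under these assumptions, RNST contains six high-frequency bigrams and the other five sequences contain five each.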
The resulting eight arrangements of five vowels on the left and six sequences of four consonants on the right give us 48 initial layouts, each with 15 unassigned keys. Below, the three rows on the left and right side of the keyboard are represented as a linear string of letters, with unassigned keys denoted by “-”. All sequences on the right side will be reversed so that they read from right to left for ease of typing (right-hand roll-in from little to index finger vs. roll-out from index to little finger):
Hand 1 Hand 2
--U- IOEA ---- ---- TSNR ----
--U- IOEA ---- ---- TSRN ----
--U- IOEA ---- ---- TNSR ----
--U- IOEA ---- ---- HTSR ----
--U- IOEA ---- ---- HTSN ----
--U- IOEA ---- ---- HTCN ----
---- IOEA ---U ---- TSNR ----
---- IOEA ---U ---- TSRN ----
---- IOEA ---U ---- TNSR ----
---- IOEA ---U ---- HTSR ----
---- IOEA ---U ---- HTSN ----
---- IOEA ---U ---- HTCN ----
-OU- I-EA ---- ---- TSNR ----
-OU- I-EA ---- ---- TSRN ----
-OU- I-EA ---- ---- TNSR ----
-OU- I-EA ---- ---- HTSR ----
-OU- I-EA ---- ---- HTSN ----
-OU- I-EA ---- ---- HTCN ----
-O-- I-EA ---U ---- TSNR ----
-O-- I-EA ---U ---- TSRN ----
-O-- I-EA ---U ---- TNSR ----
-O-- I-EA ---U ---- HTSR ----
-O-- I-EA ---U ---- HTSN ----
-O-- I-EA ---U ---- HTCN ----
--O- I-EA ---U ---- TSNR ----
--O- I-EA ---U ---- TSRN ----
--O- I-EA ---U ---- TNSR ----
--O- I-EA ---U ---- HTSR ----
--O- I-EA ---U ---- HTSN ----
--O- I-EA ---U ---- HTCN ----
--O- -IEA ---U ---- TSNR ----
--O- -IEA ---U ---- TSRN ----
--O- -IEA ---U ---- TNSR ----
--O- -IEA ---U ---- HTSR ----
--O- -IEA ---U ---- HTSN ----
--O- -IEA ---U ---- HTCN ----
-O-U I-EA ---- ---- TSNR ----
-O-U I-EA ---- ---- TSRN ----
-O-U I-EA ---- ---- TNSR ----
-O-U I-EA ---- ---- HTSR ----
-O-U I-EA ---- ---- HTSN ----
-O-U I-EA ---- ---- HTCN ----
--OU -IEA ---- ---- TSNR ----
--OU -IEA ---- ---- TSRN ----
--OU -IEA ---- ---- TNSR ----
--OU -IEA ---- ---- HTSR ----
--OU -IEA ---- ---- HTSN ----
--OU -IEA ---- ---- HTCN ----
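The 48 strings above are the product of the eight vowel arrangements and the six consonant sequences, with each consonant sequence reversed for the right hand. A minimal sketch of that construction follows; the names `VOWEL_ARRANGEMENTS`, `CONSONANT_SEQUENCES`, and `initial_layouts` are illustrative and not taken from the Engram code.

```python
from itertools import product

# The eight vowel arrangements (top, home, and bottom rows of one hand).
VOWEL_ARRANGEMENTS = [
    "--U- IOEA ----",
    "---- IOEA ---U",
    "-OU- I-EA ----",
    "-O-- I-EA ---U",
    "--O- I-EA ---U",
    "--O- -IEA ---U",
    "-O-U I-EA ----",
    "--OU -IEA ----",
]

# The six consonant sequences, written left to right as selected above.
CONSONANT_SEQUENCES = ["RNST", "NRST", "RSNT", "RSTH", "NSTH", "NCTH"]


def initial_layouts():
    """Pair every vowel arrangement with every reversed consonant sequence."""
    layouts = []
    for vowels, consonants in product(VOWEL_ARRANGEMENTS, CONSONANT_SEQUENCES):
        home_row = consonants[::-1]  # reversed so it reads right to left
        layouts.append(f"{vowels} ---- {home_row} ----")
    return layouts


layouts = initial_layouts()
print(len(layouts))
print(layouts[0])
```

Each generated string has 15 dashes, one per unassigned key.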
Step 3: Optimize assignment of the remaining letters
We want to assign letters to the 15 unassigned keys in each of the above 48 layouts based on our scoring model. That would mean scoring all possible arrangements for each layout and choosing the arrangement with the highest score, but since there are over 1.3 trillion (15!) possible ways of arranging 15 letters, we will break up the assignment into two stages for the most frequent and least frequent remaining letters.
Engram Scoring Model
The optimization algorithm finds every permutation of a given set of letters, maps these letter permutations to a set of keys, and ranks these letter-key mappings according to a score reflecting ease of typing key pairs and frequency of letter pairs (bigrams). The score is the average of the scores for all possible bigrams in this arrangement. The score for each bigram is a product of the frequency of occurrence of that bigram, the frequency of each of the bigram’s characters, and flow, strength (and optional speed) factors for the key pair.
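The description above can be sketched in a few lines. This is an illustration under stated assumptions, not the Engram implementation: the function names are hypothetical, "all possible bigrams" is taken to mean all ordered pairs of distinct letters, the optional speed factor is omitted, and every number in the toy tables is invented for the example.

```python
from itertools import permutations


def score_arrangement(letters, keys, bigram_freq, letter_freq, flow, strength):
    """Average score over all ordered pairs of distinct letters.

    Each pair's score is the product of its bigram frequency, the two
    letter frequencies, and the flow and strength factors of its key pair.
    """
    key_of = dict(zip(letters, keys))
    pair_scores = []
    for a, b in permutations(letters, 2):
        key_pair = (key_of[a], key_of[b])
        pair_scores.append(
            bigram_freq.get(a + b, 0.0)
            * letter_freq[a] * letter_freq[b]
            * flow[key_pair] * strength[key_pair]
        )
    return sum(pair_scores) / len(pair_scores)


def rank_arrangements(letters, keys, **tables):
    """Score every permutation of the letters on the keys, best first."""
    scored = [
        (score_arrangement(perm, keys, **tables), "".join(perm))
        for perm in permutations(letters)
    ]
    return sorted(scored, reverse=True)


# Toy inputs, invented for illustration only (not real frequencies).
keys = [6, 7, 8]  # three left-hand home-row keys (ring, middle, index)
key_pairs = list(permutations(keys, 2))
tables = {
    "bigram_freq": {"EA": 4.0, "IE": 2.0},
    "letter_freq": {"E": 1.0, "A": 1.0, "I": 1.0},
    # Penalize outward rolls (toward the little finger) by 0.9.
    "flow": {(i, j): 0.9 if j < i else 1.0 for i, j in key_pairs},
    "strength": {pair: 1.0 for pair in key_pairs},
}

ranking = rank_arrangements("EAI", keys, **tables)
best_score, best_letters = ranking[0]
print(best_letters, best_score)
```

With these toy tables, the arrangement that lets both frequent bigrams roll inward ranks first.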
Flow factors to penalize strenuous key transitions
Direction:
- outward = 0.9
  - outward roll of fingers from the index to little finger
Dexterity:
- side_above_3away = 0.9
  - index and little finger type two keys, one or more rows apart
- side_above_2away = 0.9^2 = 0.81
  - index finger types key a row or two above ring finger key, or
  - little finger types key a row or two above middle finger key
- side_above_1away = 0.9^3 = 0.729
  - index finger types key a row or two above middle finger key, or
  - little finger types key a row or two above ring finger key
- middle_above_ring = 0.9
  - middle finger types key a row or two above ring finger key
- ring_above_middle = 0.9^3 = 0.729
  - ring finger types key a row or two above middle finger key
- lateral = 0.9
  - lateral movement of (index or little) finger outside of 8 vertical columns
  - always accompanied by same_finger parameter
Distance:
- skip_row_3away = 0.9
  - index and little fingers type two keys that skip over home row (e.g., one on bottom row, the other on top row)
- skip_row_2away = 0.9^3 = 0.729
  - little and middle or index and ring fingers type two keys that skip over home row
- skip_row_1away = 0.9^5 = 0.59049
  - little and ring or middle and index fingers type two keys that skip over home row
Repetition:
- skip_row_0away = 0.9^4 = 0.6561
  - same finger types two keys that skip over home row
- same_finger = 0.9^5 = 0.59049
  - use same finger again for a different key
  - cannot accompany outward, side_above, or adjacent_shorter_above
Strength: Accounted for by the strength matrix (minimum value for the little finger = 0.95)
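Every flow factor above is a power of 0.9, so the whole set can be written as a table of exponents. The sketch below does that. It also shows one possible way to combine penalties for a single key transition: multiplying them together is an assumption of this sketch, and the names `FLOW_EXPONENTS`, `FLOW_FACTORS`, and `combined_flow` are illustrative.

```python
BASE = 0.9  # every penalty in the list above is a power of 0.9

# Exponent for each flow parameter, copied from the list above.
FLOW_EXPONENTS = {
    "outward": 1,
    "side_above_3away": 1,
    "side_above_2away": 2,
    "side_above_1away": 3,
    "middle_above_ring": 1,
    "ring_above_middle": 3,
    "lateral": 1,
    "skip_row_3away": 1,
    "skip_row_2away": 3,
    "skip_row_1away": 5,
    "skip_row_0away": 4,
    "same_finger": 5,
}

FLOW_FACTORS = {name: BASE ** power for name, power in FLOW_EXPONENTS.items()}


def combined_flow(active):
    """Multiply the penalties that apply to one key transition.

    ASSUMPTION: penalties combine by multiplication.
    """
    factor = 1.0
    for name in active:
        factor *= FLOW_FACTORS[name]
    return factor


# A lateral movement is always accompanied by same_finger.
print(combined_flow(["lateral", "same_finger"]))
```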
Most frequent letters
For six of the eight vowel arrangements, we will compute scores for every possible arrangement of all but the least frequent 6 remaining letters (aside from Z and Q, in bold below), assigned to all but the least comfortable 6 keys. The least comfortable keys are assumed to be the corner keys accessed by the little fingers and the top corner keys accessed by the index fingers.
E, T, A, O, I, N, S, R, H, L, D, C, U, M, F, P, G, W, Y, B, V, K, X, J, Q, Z
Hand 1: Hand 2:
- 2 3 - - 14 15 -
x x x x x x x x
- 10 11 12 21 22 23 -
Since there are 9! = 362,880 combinations for 36 layouts, we need to score and evaluate 13,063,680 combinations.
For the remaining two vowel arrangements with U in the upper right corner, we will compute scores for every possible arrangement of letters on 8 unassigned keys (all unassigned keys except the 7 remaining corner keys).
Hand 1: Hand 2:
- x 3 x - 14 15 -
x x 7 x x x x x
- 10 11 - - 22 23 -
- O - U
I - E A
- - - -
Hand 1: Hand 2:
- 2 x x - 14 15 -
5 x x x x x x x
- 10 11 - - 22 23 -
- - O U
- I E A
- - - -
Since there are 8! = 40,320 possible combinations for each of 12 layouts, we need to score and evaluate 483,840 more combinations.
To score each arrangement of letters, we construct a frequency matrix where we multiply a matrix containing the frequency of each ordered pair of letters (bigram) by our flow and strength matrices to compute a score.
Least frequent letters
Next we will compute scores for every possible arrangement of the least frequent 7 or 8 letters besides Z and Q, reassigning 2 previous letter assignments for 12 layouts without the letter U in any corner and incorporating 6 unused letters (among the eight in bold below), after substituting in the results above:
E, T, A, O, I, N, S, R, H, L, D, C, U, M, F, P, G, W, Y, B, V, K, X, J, Q, Z
Hand 1: Hand 2:
1 x x 4 13 x x 16
x x x x x x x x
9 x x 12 21 x x 24
Since there are 8! = 40,320 possible combinations for 12 layouts and 7! = 5,040 possible combinations for 36 layouts, we need to score and evaluate an additional 665,280 layouts.
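The total for this stage can be verified with a few lines of arithmetic. This is a minimal sketch with illustrative variable names.

```python
import math

# Search sizes for the least frequent letters, as stated in the text.
eight_letter_layouts = math.factorial(8) * 12  # 8! arrangements for 12 layouts
seven_letter_layouts = math.factorial(7) * 36  # 7! arrangements for 36 layouts
total = eight_letter_layouts + seven_letter_layouts

print(f"{eight_letter_layouts:,} + {seven_letter_layouts:,} = {total:,}")
```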
Further optimize layouts by exchanging more letters
If we relax the above fixed initializations and permit further exchange of letters, then we can search for even higher-scoring layouts. As a final optimization step we exchange letters, 8 keys at a time (8! = 40,320) selected in 24 different ways, in each of the above 48 layouts, to score a total of 46,448,640 more combinations. We allow the following keys to exchange letters:
1. Center of the top and bottom rows on both sides
2. Top and bottom rows on the left side
3. Top and bottom rows on the right side
4. Bottom rows
5. Top rows
6. Top left and bottom right rows
7. Top right and bottom left rows
8. Left half of the top and bottom rows on both sides
9. Right half of the top and bottom rows on both sides
10. Left half of non-home rows on the left and right half of the same rows on the right
11. Right half of non-home rows on the left and left half of the same rows on the right
12. The eight corners
13. Repeat 1-12
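One exchange step can be sketched as an exhaustive search over the letters on a chosen set of keys. The code below is an illustration, not the Engram code: `best_exchange` and `toy_score` are hypothetical names, the toy score simply rewards alphabetical order so the result is easy to check, and the example uses three keys instead of eight to keep it small.

```python
import math
from itertools import permutations


def best_exchange(layout, keys, score):
    """Try every permutation of the letters on `keys`; keep the best layout."""
    letters = [layout[k] for k in keys]
    best_layout, best_score = dict(layout), score(layout)
    for perm in permutations(letters):
        candidate = dict(layout)
        candidate.update(zip(keys, perm))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_layout, best_score = candidate, candidate_score
    return best_layout, best_score


def toy_score(layout):
    """Toy stand-in for the scoring model: count neighbors in alphabetical order."""
    ordered = [layout[k] for k in sorted(layout)]
    return sum(a < b for a, b in zip(ordered, ordered[1:]))


layout = {1: "C", 2: "A", 3: "B"}
improved, improved_score = best_exchange(layout, [1, 2, 3], toy_score)
print(improved, improved_score)

# Size of the full exchange search described above:
# 8! permutations x 24 key selections x 48 layouts.
exchange_total = math.factorial(8) * 24 * 48
print(f"{exchange_total:,}")
```

Swapping in the real scoring function and eight keys per selection would give the 40,320 candidates per step mentioned above.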
After assigning letters Z and Q to upper right keys outside of the home blocks and testing left/right side swap of all letters, the top-scoring letter layout is:
B Y O U L D W V Z
C I E A H T S N Q
G X J K R M F P
Step 4: Stability Tests
We will run three stability tests on the winning layouts:
1. Compare score of the winning layout after rearranging random letters
2. Compare ranking of all final layouts based on interkey speed
3. Compare ranking of all final layouts after removing each scoring parameter
The first test is to see if rearranging random sets of eight of the 16 letters in non-home rows in every possible combination improves the score of the winning layout. We repeat this test 1,000 times. In the second test, we rescore all of the final layouts, replacing the factor matrix with either the flow matrix or the inter-key speed matrix to see if this affects their ranking. In the third test we remove each Engram scoring parameter one at a time and rescore all of the final layouts to see if this affects their ranking.
For test 1, the top-scoring layout remains unchanged, attesting to its stability. For test 2, no other layout consistently beats the top layout after these replacements. For test 3, the top-scored layout remains at the top, attesting to its robustness to parameter perturbations. These tests support the choice of the top-scoring layout as the winner.
Step 5: Arrange non-letter characters in easy-to-remember places
Now that we have all 26 letters accounted for, we turn our attention to non-letter characters, taking into account frequency of punctuation and ease of recall.
Frequency of punctuation marks
- Statistical values of punctuation frequency in 20 English-speaking countries (Table 1):
  Sun, Kun & Wang, Rong. (2018). Frequency Distributions of Punctuation Marks in English: Evidence from Large-scale Corpora. English Today. 10.1017/S0266078418000512.
  https://www.researchgate.net/publication/328512136_Frequency_Distributions_of_Punctuation_Marks_in_English_Evidence_from_Large-scale_Corpora
  “frequency of punctuation marks attested for twenty English-speaking countries and regions… The data were acquired through GloWbE.” “The corpus of GloWbE (2013) is a large English corpus collecting international English from the internet, containing about 1.9 billion words of text from twenty different countries. For further information on the corpora used, see https://corpus.byu.edu/.”
- Google N-grams and Twitter analysis:
  “Punctuation Input on Touchscreen Keyboards: Analyzing Frequency of Use and Costs”
  S Malik, L Findlater - College Park: The Human-Computer Interaction Lab. 2013
  https://www.cs.umd.edu/sites/default/files/scholarly_papers/Malik.pdf
  “the Twitter corpora included substantially higher punctuation use than the Google corpus, comprising 7.5% of characters in the mobile tweets and 7.6% in desktop versus only 4.4%… With the Google corpus, only 6 punctuation symbols (. -’ ( ) “) appeared more frequently than [q]”
- “Frequencies for English Punctuation Marks” by Vivian Cook
  http://www.viviancook.uk/Punctuation/PunctFigs.htm
  “Based on a writing system corpus some 459 thousand words long. This includes three novels of different types (276 thousand words), selections of articles from two newspapers (55 thousand), one bureaucratic report (94 thousand), and assorted academic papers on language topics (34 thousand). More information is in Cook, V.J. (2013) ‘Standard punctuation and the punctuation of the street’ in M. Pawlak and L. Aronin (eds.), Essential Topics in Applied Linguistics and Multilingualism, Springer International Publishing Switzerland (2013), 267-290”
- “A Statistical Study of Current Usage in Punctuation”:
  Ruhlen, H., & Pressey, S. (1924). A Statistical Study of Current Usage in Punctuation. The English Journal, 13(5), 325-331. doi:10.2307/802253
- “Computer Languages Character Frequency” by Xah Lee.
  Date: 2013-05-23. Last updated: 2020-06-29.
  http://xahlee.info/comp/computer_language_char_distribution.html
  NOTE: biased toward C (19.8%) and Py (18.5%), which have high use of “_”.
Frequency:
Sun: Malik: Ruhlen: Cook: Xah:
/1M N-gram % /10,000 /1,000 All% JS% Py%
. 42840.02 1.151 535 65.3 6.6 9.4 10.3
, 44189.96 556 61.6 5.8 8.9 7.5
" 2.284 44 26.7 3.9 1.6 6.2
' 2980.35 0.200 40 24.3 4.4 4.0 8.6
- 9529.78 0.217 21 15.3 4.1 1.9 3.0
() 4500.81 0.140 7 7.4 9.8 8.1
; 1355.22 0.096 22 3.2 3.8 8.6
z 0.09 - -
: 3221.82 0.087 11 3.4 3.5 2.8 4.7
? 4154.78 0.032 14 5.6 0.3
/ 0.019 4.0 4.9 1.1
! 2057.22 0.013 3 3.3 0.4
_ 0.001 11.0 2.9 10.5
Add punctuation keys and number keys
We will assign the most frequent punctuation according to Sun, et al (2018) to the six keys in the middle two columns: . , " ' - ? ; : () ! _
B Y O U ' " L D W V Z
C I E A , . H T S N Q
G X J K - ? R M F P
We will use the Shift key to group similar punctuation marks (separating and joining marks in the left middle column and closing marks in the right middle column):
B Y O U '( ") L D W V Z #$ @`
C I E A ,; .: H T S N Q
G X J K -_ ?! R M F P
Separating marks (left): The comma separates text in lists; the semicolon can be used in place of the comma to separate items in a list (especially if these items contain commas); open parenthesis sets off an explanatory word, phrase, or sentence.
Joining marks (left): The apostrophe joins words as contractions; the hyphen joins words as compounds; the underscore joins words in cases where whitespace characters are not permitted (such as in variables or file names).
Closing marks (right): A sentence usually ends with a period, question mark, or exclamation mark. The colon ends one statement but precedes the following: an explanation, quotation, list, etc. Double quotes and the close parenthesis close a word, clause, or sentence set off by opening quotes or an open parenthesis.
Number keys: The numbers are flanked to the left and right by [square brackets], and {curly brackets} accessed by the Shift key. Each of the numbers is paired with a mathematical or logic symbol accessed by the Shift key:
{ | = ~ + < > ^ & % * } \
[ 1 2 3 4 5 6 7 8 9 0 ] /
1: | (vertical bar or "pipe" represents the logical OR operator: 1 stroke, looks like the number one)
2: = (equal: 2 strokes, like the Chinese character for "2")
3: ~ (tilde: "almost equal", often written with 3 strokes, like the Chinese character for "3")
4: + (plus: has four quadrants; resembles "4")
5 & 6: < > ("less/greater than"; these angle brackets are directly above the other bracket keys)
7: ^ (caret for logical XOR operator as well as exponentiation; resembles "7")
8: & (ampersand: logical AND operator; resembles "8")
9: % (percent: related to division; resembles "9")
0: * (asterisk: for multiplication; resembles "0")
The three remaining keys in many common keyboards (flanking the upper right hand corner Backspace key) are displaced in special keyboards, such as the Kinesis Advantage and Ergodox. For the top right key, we will assign the forward slash and backslash: / \. For the remaining two keys, we will assign two symbols that in modern usage have significance in social media: the hash/pound sign and the “at sign”. The hash or hashtag identifies digital content on a specific topic (the Shift key accesses the dollar sign). The “at sign” identifies a location or affiliation (such as in email addresses) and acts as a “handle” to identify users in popular social media platforms and online forums.
The resulting Engram layout:
[{ 1| 2= 3~ 4+ 5< 6> 7^ 8& 9% 0* ]} /\
bB yY oO uU '( ") lL dD wW vV zZ #$ @`
cC iI eE aA ,; .: hH tT sS nN qQ
gG xX jJ kK -_ ?! rR mM fF pP
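The final layout can be written down as data and checked mechanically. The sketch below is illustrative (the names `ENGRAM_ROWS` and `parse_keys` are not from the Engram code). It reads each two-character token as one key, default character first and Shift character second, then checks that every letter key pairs a lowercase letter with its uppercase form, that no character is assigned twice, and that the layout covers the printable, non-space ASCII characters.

```python
import string

# The four rows of the layout above; each two-character token is one key,
# written as its default character followed by its Shift character.
ENGRAM_ROWS = [
    "[{ 1| 2= 3~ 4+ 5< 6> 7^ 8& 9% 0* ]} /\\",
    "bB yY oO uU '( \") lL dD wW vV zZ #$ @`",
    "cC iI eE aA ,; .: hH tT sS nN qQ",
    "gG xX jJ kK -_ ?! rR mM fF pP",
]


def parse_keys(rows):
    """Split each row into (default, shifted) character pairs."""
    return [(token[0], token[1]) for row in rows for token in row.split()]


keys = parse_keys(ENGRAM_ROWS)
characters = [ch for pair in keys for ch in pair]

# Every letter key should pair a lowercase letter with its uppercase form.
letter_keys = [(d, s) for d, s in keys if d.isalpha()]
letters_ok = all(s == d.upper() for d, s in letter_keys)

# No character should be assigned to more than one key or layer.
no_duplicates = len(characters) == len(set(characters))

# Compare against the printable, non-space ASCII characters.
printable = set(string.ascii_letters + string.digits + string.punctuation)
covers_ascii = set(characters) == printable

print(len(keys), letters_ok, no_duplicates, covers_ascii)
```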