Counts of the “interesting” characters in corpus.xml (which is basically all of distribution/) as of 2020-07-25 at 09:12. The “interesting” characters are the ones left over after deleting the normal whitespace characters (U+0009, U+000A, U+000D, U+0020) and the characters that are on your keyboard.
There were 54,703,339 characters in total. Thus almost 2.2% of all characters are LONG S, about ⅓ of 1% are SOFT HYPHEN, about 0.11% are EM DASH, and each of the others accounts for less than 1‰. (Note that the last symbol is PER MILLE SIGN (U+2030), not PERCENT SIGN.)
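The proportions quoted above follow directly from the counts in the table and the corpus total. As a rough check, the arithmetic can be sketched in Python; the dictionary and the `share` helper are illustrative names, not part of the original tooling.

```python
# Corpus total and the four largest counts, copied from this page.
TOTAL_CHARACTERS = 54_703_339

top_counts = {
    "LATIN SMALL LETTER LONG S": 1_189_205,
    "SOFT HYPHEN": 182_959,
    "EM DASH": 59_894,
    "HORIZONTAL BAR": 18_563,
}


def share(count: int, total: int = TOTAL_CHARACTERS) -> float:
    """Return count as a fraction of total (hypothetical helper)."""
    return count / total


for name, count in top_counts.items():
    fraction = share(count)
    # Percent is parts per hundred; per mille is parts per thousand.
    print(f"{name}: {fraction * 100:.3f}% = {fraction * 1000:.2f}\u2030")
```

HORIZONTAL BAR, the fourth-ranked character, already falls below one per mille, which is why only the first three are called out individually.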
| count | codepoint | character | character name |
|---|---|---|---|
| 1189205 | U+017F | ſ | LATIN SMALL LETTER LONG S |
| 182959 | U+00AD | | SOFT HYPHEN |
| 59894 | U+2014 | — | EM DASH |
| 18563 | U+2015 | ― | HORIZONTAL BAR or QUOTATION DASH |
| 2130 | U+00E6 | æ | LATIN SMALL LETTER AE or LATIN SMALL LETTER A E |
| 1567 | U+00E9 | é | LATIN SMALL LETTER E WITH ACUTE or LATIN SMALL LETTER E ACUTE |
| 1469 | U+0113 | ē | LATIN SMALL LETTER E WITH MACRON or LATIN SMALL LETTER E MACRON |
| 1356 | U+014D | ō | LATIN SMALL LETTER O WITH MACRON or LATIN SMALL LETTER O MACRON |
| 1119 | U+201C | “ | LEFT DOUBLE QUOTATION MARK or DOUBLE TURNED COMMA QUOTATION MARK |
| 872 | U+2423 | ␣ | OPEN BOX |
| 715 | U+2019 | ’ | RIGHT SINGLE QUOTATION MARK or SINGLE COMMA QUOTATION MARK |
| 695 | U+0101 | ā | LATIN SMALL LETTER A WITH MACRON or LATIN SMALL LETTER A MACRON |
| 666 | U+0153 | œ | LATIN SMALL LIGATURE OE or LATIN SMALL LETTER O E |
| 600 | U+2018 | ‘ | LEFT SINGLE QUOTATION MARK or SINGLE TURNED COMMA QUOTATION MARK |
| 430 | U+00C6 | Æ | LATIN CAPITAL LETTER AE or LATIN CAPITAL LETTER A E |
| 360 | U+016B | ū | LATIN SMALL LETTER U WITH MACRON or LATIN SMALL LETTER U MACRON |
| 295 | U+201D | ” | RIGHT DOUBLE QUOTATION MARK or DOUBLE COMMA QUOTATION MARK |
| 236 | U+00FB | û | LATIN SMALL LETTER U WITH CIRCUMFLEX or LATIN SMALL LETTER U CIRCUMFLEX |
| 215 | U+00F4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX or LATIN SMALL LETTER O CIRCUMFLEX |
| 205 | U+00EB | ë | LATIN SMALL LETTER E WITH DIAERESIS or LATIN SMALL LETTER E DIAERESIS |
| 201 | U+2026 | … | HORIZONTAL ELLIPSIS |
| 195 | U+00E2 | â | LATIN SMALL LETTER A WITH CIRCUMFLEX or LATIN SMALL LETTER A CIRCUMFLEX |
| 185 | U+2013 | – | EN DASH |
| 154 | U+03B1 | α | GREEK SMALL LETTER ALPHA |
| 145 | U+00E0 | à | LATIN SMALL LETTER A WITH GRAVE or LATIN SMALL LETTER A GRAVE |
| 144 | U+00A7 | § | SECTION SIGN |
| 130 | U+03BD | ν | GREEK SMALL LETTER NU |
| 128 | U+00E8 | è | LATIN SMALL LETTER E WITH GRAVE or LATIN SMALL LETTER E GRAVE |
| 126 | U+03B9 | ι | GREEK SMALL LETTER IOTA |
| 119 | U+03B5 | ε | GREEK SMALL LETTER EPSILON |
| 111 | U+03BF | ο | GREEK SMALL LETTER OMICRON |
| 104 | U+00EA | ê | LATIN SMALL LETTER E WITH CIRCUMFLEX or LATIN SMALL LETTER E CIRCUMFLEX |
| 94 | U+00DF | ß | LATIN SMALL LETTER SHARP S |
| 93 | U+00F3 | ó | LATIN SMALL LETTER O WITH ACUTE or LATIN SMALL LETTER O ACUTE |
| 86 | U+00E1 | á | LATIN SMALL LETTER A WITH ACUTE or LATIN SMALL LETTER A ACUTE |
| 80 | U+00B7 | · | MIDDLE DOT |
| 75 | U+03C4 | τ | GREEK SMALL LETTER TAU |
| 74 | U+00B6 | ¶ | PILCROW SIGN or PARAGRAPH SIGN |
| 71 | U+0304 | ̄ | COMBINING MACRON or NON-SPACING MACRON |
| 63 | U+012B | ī | LATIN SMALL LETTER I WITH MACRON or LATIN SMALL LETTER I MACRON |
| 58 | U+03C3 | σ | GREEK SMALL LETTER SIGMA |
| 57 | U+03C1 | ρ | GREEK SMALL LETTER RHO |
| 51 | U+00EF | ï | LATIN SMALL LETTER I WITH DIAERESIS or LATIN SMALL LETTER I DIAERESIS |
| 51 | U+03BA | κ | GREEK SMALL LETTER KAPPA |
| 50 | U+03B7 | η | GREEK SMALL LETTER ETA |
| 45 | U+03C0 | π | GREEK SMALL LETTER PI |
| 45 | U+03C2 | ς | GREEK SMALL LETTER FINAL SIGMA |
| 44 | U+05D9 | י | HEBREW LETTER YOD |
| 44 | U+03BC | μ | GREEK SMALL LETTER MU |
| 42 | U+0313 | ̓ | COMBINING COMMA ABOVE or NON-SPACING COMMA ABOVE |
| 42 | U+03BB | λ | GREEK SMALL LETTER LAMDA or GREEK SMALL LETTER LAMBDA |
| 41 | U+0301 | ́ | COMBINING ACUTE ACCENT or NON-SPACING ACUTE |
| 41 | U+03C5 | υ | GREEK SMALL LETTER UPSILON |
| 39 | U+00A3 | £ | POUND SIGN |
| 39 | U+03B4 | δ | GREEK SMALL LETTER DELTA |
| 39 | U+00FC | ü | LATIN SMALL LETTER U WITH DIAERESIS or LATIN SMALL LETTER U DIAERESIS |
| 38 | U+00E7 | ç | LATIN SMALL LETTER C WITH CEDILLA or LATIN SMALL LETTER C CEDILLA |
| 37 | U+03C9 | ω | GREEK SMALL LETTER OMEGA |
| 31 | U+00F9 | ù | LATIN SMALL LETTER U WITH GRAVE or LATIN SMALL LETTER U GRAVE |
| 29 | U+05E8 | ר | HEBREW LETTER RESH |
| 28 | U+0300 | ̀ | COMBINING GRAVE ACCENT or NON-SPACING GRAVE |
| 28 | U+05DE | מ | HEBREW LETTER MEM |
| 27 | U+05DC | ל | HEBREW LETTER LAMED |
| 26 | U+05D5 | ו | HEBREW LETTER VAV |
| 25 | U+03B8 | θ | GREEK SMALL LETTER THETA |
| 24 | U+03B3 | γ | GREEK SMALL LETTER GAMMA |
| 23 | U+05D4 | ה | HEBREW LETTER HE |
| 23 | U+005E | ^ | CIRCUMFLEX ACCENT or SPACING CIRCUMFLEX |
| 22 | U+03C7 | χ | GREEK SMALL LETTER CHI |
| 22 | U+05D1 | ב | HEBREW LETTER BET |
| 21 | U+05E0 | נ | HEBREW LETTER NUN |
| 21 | U+05D3 | ד | HEBREW LETTER DALET |
| 20 | U+05E2 | ע | HEBREW LETTER AYIN |
| 20 | U+05E9 | ש | HEBREW LETTER SHIN |
| 20 | U+00B0 | ° | DEGREE SIGN |
| 20 | U+261E | ☞ | WHITE RIGHT POINTING INDEX |
| 19 | U+2020 | † | DAGGER |
| 19 | U+05EA | ת | HEBREW LETTER TAV |
| 19 | U+00EE | î | LATIN SMALL LETTER I WITH CIRCUMFLEX or LATIN SMALL LETTER I CIRCUMFLEX |
| 19 | U+2011 | ‑ | NON-BREAKING HYPHEN |
| 18 | U+05D2 | ג | HEBREW LETTER GIMEL |
| 17 | U+05E4 | פ | HEBREW LETTER PE |
| 16 | U+0152 | Œ | LATIN CAPITAL LIGATURE OE or LATIN CAPITAL LETTER O E |
| 16 | U+0314 | ̔ | COMBINING REVERSED COMMA ABOVE or NON-SPACING REVERSED COMMA ABOVE |
| 15 | U+00F2 | ò | LATIN SMALL LETTER O WITH GRAVE or LATIN SMALL LETTER O GRAVE |
| 15 | U+05D7 | ח | HEBREW LETTER HET |
| 15 | U+05DD | ם | HEBREW LETTER FINAL MEM |
| 14 | U+03C6 | φ | GREEK SMALL LETTER PHI |
| 14 | U+05D0 | א | HEBREW LETTER ALEF |
| 13 | U+00ED | í | LATIN SMALL LETTER I WITH ACUTE or LATIN SMALL LETTER I ACUTE |
| 12 | U+0391 | Α | GREEK CAPITAL LETTER ALPHA |
| 10 | U+00F6 | ö | LATIN SMALL LETTER O WITH DIAERESIS or LATIN SMALL LETTER O DIAERESIS |
| 10 | U+261C | ☜ | WHITE LEFT POINTING INDEX |
| 9 | U+05E1 | ס | HEBREW LETTER SAMEKH |
| 9 | U+2022 | • | BULLET |
| 9 | U+0302 | ̂ | COMBINING CIRCUMFLEX ACCENT or NON-SPACING CIRCUMFLEX |
| 9 | U+05E7 | ק | HEBREW LETTER QOF |
| 9 | U+2010 | ‐ | HYPHEN |
| 9 | U+03B2 | β | GREEK SMALL LETTER BETA |
| 9 | U+00FA | ú | LATIN SMALL LETTER U WITH ACUTE or LATIN SMALL LETTER U ACUTE |
| 9 | U+00BD | ½ | VULGAR FRACTION ONE HALF or FRACTION ONE HALF |
| 9 | U+03BE | ξ | GREEK SMALL LETTER XI |
| 9 | U+05DF | ן | HEBREW LETTER FINAL NUN |
| 8 | U+05D8 | ט | HEBREW LETTER TET |
| 8 | U+05DB | כ | HEBREW LETTER KAF |
| 7 | U+2021 | ‡ | DOUBLE DAGGER |
| 7 | U+2041 | ⁁ | CARET INSERTION POINT |
| 7 | U+00E3 | ã | LATIN SMALL LETTER A WITH TILDE or LATIN SMALL LETTER A TILDE |
| 7 | U+00E4 | ä | LATIN SMALL LETTER A WITH DIAERESIS or LATIN SMALL LETTER A DIAERESIS |
| 7 | U+05E6 | צ | HEBREW LETTER TSADI |
| 6 | U+03AF | ί | GREEK SMALL LETTER IOTA WITH TONOS or GREEK SMALL LETTER IOTA TONOS |
| 6 | U+00F1 | ñ | LATIN SMALL LETTER N WITH TILDE or LATIN SMALL LETTER N TILDE |
| 6 | U+0399 | Ι | GREEK CAPITAL LETTER IOTA |
| 5 | U+0060 | ` | GRAVE ACCENT or SPACING GRAVE |
| 5 | U+03A3 | Σ | GREEK CAPITAL LETTER SIGMA |
| 5 | U+03AD | έ | GREEK SMALL LETTER EPSILON WITH TONOS or GREEK SMALL LETTER EPSILON TONOS |
| 5 | U+03AE | ή | GREEK SMALL LETTER ETA WITH TONOS or GREEK SMALL LETTER ETA TONOS |
| 5 | U+0395 | Ε | GREEK CAPITAL LETTER EPSILON |
| 5 | U+039D | Ν | GREEK CAPITAL LETTER NU |
| 4 | U+2663 | ♣ | BLACK CLUB SUIT |
| 4 | U+03CC | ό | GREEK SMALL LETTER OMICRON WITH TONOS or GREEK SMALL LETTER OMICRON TONOS |
| 4 | U+00F5 | õ | LATIN SMALL LETTER O WITH TILDE or LATIN SMALL LETTER O TILDE |
| 3 | U+2042 | ⁂ | ASTERISM |
| 3 | U+2003 |   | EM SPACE |
| 3 | U+0263 | ɣ | LATIN SMALL LETTER GAMMA |
| 3 | U+03A4 | Τ | GREEK CAPITAL LETTER TAU |
| 3 | U+03A9 | Ω | GREEK CAPITAL LETTER OMEGA |
| 3 | U+03B6 | ζ | GREEK SMALL LETTER ZETA |
| 3 | U+0398 | Θ | GREEK CAPITAL LETTER THETA |
| 3 | U+05DA | ך | HEBREW LETTER FINAL KAF |
| 3 | U+00FD | ý | LATIN SMALL LETTER Y WITH ACUTE or LATIN SMALL LETTER Y ACUTE |
| 3 | U+1EBD | ẽ | LATIN SMALL LETTER E WITH TILDE |
| 3 | U+039F | Ο | GREEK CAPITAL LETTER OMICRON |
| 2 | U+03A0 | Π | GREEK CAPITAL LETTER PI |
| 2 | U+2605 | ★ | BLACK STAR |
| 2 | U+03A5 | Υ | GREEK CAPITAL LETTER UPSILON |
| 2 | U+00A6 | ¦ | BROKEN BAR or BROKEN VERTICAL BAR |
| 2 | U+03A6 | Φ | GREEK CAPITAL LETTER PHI |
| 2 | U+0107 | ć | LATIN SMALL LETTER C WITH ACUTE or LATIN SMALL LETTER C ACUTE |
| 2 | U+0169 | ũ | LATIN SMALL LETTER U WITH TILDE or LATIN SMALL LETTER U TILDE |
| 2 | U+00AF | ¯ | MACRON or SPACING MACRON |
| 2 | U+03F1 | ϱ | GREEK RHO SYMBOL or GREEK SMALL LETTER TAILED RHO |
| 2 | U+2032 | ′ | PRIME |
| 2 | U+00B4 | ´ | ACUTE ACCENT or SPACING ACUTE |
| 2 | U+2154 | ⅔ | VULGAR FRACTION TWO THIRDS or FRACTION TWO THIRDS |
| 2 | U+05D6 | ז | HEBREW LETTER ZAYIN |
| 2 | U+263E | ☾ | LAST QUARTER MOON |
| 2 | U+00FF | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS or LATIN SMALL LETTER Y DIAERESIS |
| 1 | U+1F00 | ἀ | GREEK SMALL LETTER ALPHA WITH PSILI |
| 1 | U+1F40 | ὀ | GREEK SMALL LETTER OMICRON WITH PSILI |
| 1 | U+0342 | ͂ | COMBINING GREEK PERISPOMENI |
| 1 | U+05E3 | ף | HEBREW LETTER FINAL PE |
| 1 | U+0223 | ȣ | LATIN SMALL LETTER OU |
| 1 | U+2044 | ⁄ | FRACTION SLASH |
| 1 | U+2104 | ℄ | CENTRE LINE SYMBOL or C L SYMBOL |
| 1 | U+00C7 | Ç | LATIN CAPITAL LETTER C WITH CEDILLA or LATIN CAPITAL LETTER C CEDILLA |
| 1 | U+2008 |   | PUNCTUATION SPACE |
| 1 | U+2089 | ₉ | SUBSCRIPT NINE or SUBSCRIPT DIGIT NINE |
| 1 | U+00C9 | É | LATIN CAPITAL LETTER E WITH ACUTE or LATIN CAPITAL LETTER E ACUTE |
| 1 | U+2609 | ☉ | SUN |
| 1 | U+220A | ∊ | SMALL ELEMENT OF |
| 1 | U+03CA | ϊ | GREEK SMALL LETTER IOTA WITH DIALYTIKA or GREEK SMALL LETTER IOTA DIAERESIS |
| 1 | U+00EC | ì | LATIN SMALL LETTER I WITH GRAVE or LATIN SMALL LETTER I GRAVE |
| 1 | U+03AC | ά | GREEK SMALL LETTER ALPHA WITH TONOS or GREEK SMALL LETTER ALPHA TONOS |
| 1 | U+03CD | ύ | GREEK SMALL LETTER UPSILON WITH TONOS or GREEK SMALL LETTER UPSILON TONOS |
| 1 | U+03CE | ώ | GREEK SMALL LETTER OMEGA WITH TONOS or GREEK SMALL LETTER OMEGA TONOS |
| 1 | U+00CF | Ï | LATIN CAPITAL LETTER I WITH DIAERESIS or LATIN CAPITAL LETTER I DIAERESIS |
| 1 | U+1F10 | ἐ | GREEK SMALL LETTER EPSILON WITH PSILI |
| 1 | U+1F50 | ὐ | GREEK SMALL LETTER UPSILON WITH PSILI |
| 1 | U+0392 | Β | GREEK CAPITAL LETTER BETA |
| 1 | U+2033 | ″ | DOUBLE PRIME |
| 1 | U+1F74 | ὴ | GREEK SMALL LETTER ETA WITH VARIA |
| 1 | U+0394 | Δ | GREEK CAPITAL LETTER DELTA |
| 1 | U+1FF6 | ῶ | GREEK SMALL LETTER OMEGA WITH PERISPOMENI |
| 1 | U+0397 | Η | GREEK CAPITAL LETTER ETA |
| 1 | U+2078 | ⁸ | SUPERSCRIPT EIGHT or SUPERSCRIPT DIGIT EIGHT |
| 1 | U+1F78 | ὸ | GREEK SMALL LETTER OMICRON WITH VARIA |
| 1 | U+2299 | ⊙ | CIRCLED DOT OPERATOR |
| 1 | U+02D9 | ˙ | DOT ABOVE or SPACING DOT ABOVE |
| 1 | U+055A | ՚ | ARMENIAN APOSTROPHE or ARMENIAN MODIFIER LETTER RIGHT HALF RING |
| 1 | U+00BC | ¼ | VULGAR FRACTION ONE QUARTER or FRACTION ONE QUARTER |
| 1 | U+201E | „ | DOUBLE LOW-9 QUOTATION MARK or LOW DOUBLE COMMA QUOTATION MARK |
| 1 | U+00BE | ¾ | VULGAR FRACTION THREE QUARTERS or FRACTION THREE QUARTERS |
| 1 | U+00BF | ¿ | INVERTED QUESTION MARK |
| 1 | U+1FBF | ᾿ | GREEK PSILI |
Total characters counted: 54,703,339.
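A tally like the one above could be approximated with a short Python script. This is only a sketch, not the tool that produced these numbers. It assumes that "on your keyboard" means printable ASCII (U+0021 through U+007E); the original filter was evidently narrower, since `^` and `` ` `` appear in the table. The function names are hypothetical, and `unicodedata.name` returns only the current Unicode name, not the alternative names shown after "or" in the table.

```python
import unicodedata
from collections import Counter

# The four "normal" whitespace characters excluded by the page.
NORMAL_WHITESPACE = {"\u0009", "\u000A", "\u000D", "\u0020"}


def is_interesting(ch: str) -> bool:
    """True if ch is neither normal whitespace nor printable ASCII.

    Assumption: printable ASCII stands in for "on your keyboard".
    """
    if ch in NORMAL_WHITESPACE:
        return False
    return not ("!" <= ch <= "~")


def count_interesting(text: str) -> Counter:
    """Count every interesting character in text."""
    return Counter(ch for ch in text if is_interesting(ch))


def format_rows(counts: Counter) -> list[str]:
    """Render counts as table rows, most frequent first."""
    rows = []
    for ch, n in counts.most_common():
        name = unicodedata.name(ch, "(unnamed)")
        rows.append(f"| {n} | U+{ord(ch):04X} | {ch} | {name} |")
    return rows


# Tiny stand-in for corpus.xml: three long s, one em dash, one soft hyphen.
sample = "Ble\u017f\u017fed \u2014 be\tthe \u017foft\u00adhyphen"
for row in format_rows(count_interesting(sample)):
    print(row)
```

For a file the size of corpus.xml, the same counter could be fed line by line rather than reading the whole text into memory at once.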