Character counts of “interesting” characters in corpus.xml (which is basically all of distribution/) as of 2020-07-25 at 09:12. The “interesting” characters are the ones left over after deleting the ordinary whitespace characters (U+0009 TAB, U+000A LINE FEED, U+000D CARRIAGE RETURN, U+0020 SPACE) and the characters that are on your keyboard.
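The filtering described above can be sketched in Python as follows. This is a minimal sketch, not the script that produced this table. It makes four assumptions. First, corpus.xml is UTF-8. Second, the raw file text is counted, so XML markup is included, but markup is ASCII and is filtered out anyway. Third, character references such as `&#x17F;` are not expanded. Fourth, “on your keyboard” is approximated by a hypothetical `KEYBOARD` set of printable ASCII (U+0021 to U+007E). The table lists ^ (U+005E) and ` (U+0060), so the original filter evidently kept at least those two. Pass a different `keyboard` set to match it. The helper names `interesting_counts`, `count_file`, and `table_rows` are illustrative only.

```python
from collections import Counter
import unicodedata

# Ordinary whitespace removed before counting: TAB, LF, CR, SPACE.
WHITESPACE = frozenset("\t\n\r ")

# Hypothetical stand-in for "characters on your keyboard": printable ASCII
# U+0021..U+007E. The original filter evidently kept ^ and `, so adjust as needed.
KEYBOARD = frozenset(chr(cp) for cp in range(0x21, 0x7F))


def interesting_counts(text, keyboard=KEYBOARD):
    """Count every character that is neither ordinary whitespace nor in `keyboard`."""
    skip = WHITESPACE | frozenset(keyboard)
    return Counter(ch for ch in text if ch not in skip)


def count_file(path, keyboard=KEYBOARD):
    """Return (total_characters, interesting_counts) for a file assumed to be UTF-8."""
    # newline="" keeps CR LF pairs intact, so the total counts every character.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    return len(text), interesting_counts(text, keyboard)


def table_rows(counts):
    """Yield (count, codepoint, character, name) rows, highest count first."""
    for ch, n in counts.most_common():
        yield n, f"U+{ord(ch):04X}", ch, unicodedata.name(ch, "<no name>")


if __name__ == "__main__":
    # Tiny inline sample instead of corpus.xml, so the sketch runs anywhere.
    sample = "Theſe long ſ's \u2014 and a soft\u00adhyphen."
    for row in table_rows(interesting_counts(sample)):
        print(" | ".join(map(str, row)) + " |")
```

Note that `unicodedata.name` returns only the current Unicode name. The alternative names after “or” in the table (for example, QUOTATION DASH) are older Unicode 1.0 names, and this sketch does not reproduce them.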
There were 54,703,339 characters in total. Thus almost 2.2% of all characters are LONG S, ~⅓ of 1% are SOFT HYPHEN, ~0.11% are EM DASH, and each of the others accounts for less than 1‰. (Note that ‰ is the PER MILLE SIGN, U+2030, not the percent sign.)
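The ratios in the previous paragraph follow directly from the table’s counts. Below is a minimal Python check that uses only figures quoted in this document. HORIZONTAL BAR is the fourth most frequent entry, and every remaining count is smaller, so it stands in for “the others.” The name `share` is illustrative.

```python
# Counts and total copied from this document's table and text.
TOTAL = 54_703_339
COUNTS = {
    "LATIN SMALL LETTER LONG S": 1_189_205,
    "SOFT HYPHEN": 182_959,
    "EM DASH": 59_894,
    "HORIZONTAL BAR": 18_563,  # the largest of "the others"
}


def share(count, total=TOTAL):
    """Fraction of all characters accounted for by `count`."""
    return count / total


if __name__ == "__main__":
    for name, n in COUNTS.items():
        s = share(n)
        # Percent and per mille of the whole corpus.
        print(f"{name:28} {s * 100:7.3f}%  {s * 1000:7.3f}\u2030")
```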
count | codepoint | character | character name |
---|---|---|---|
1189205 | U+017F | ſ | LATIN SMALL LETTER LONG S |
182959 | U+00AD | | SOFT HYPHEN |
59894 | U+2014 | — | EM DASH |
18563 | U+2015 | ― | HORIZONTAL BAR or QUOTATION DASH |
2130 | U+00E6 | æ | LATIN SMALL LETTER AE or LATIN SMALL LETTER A E |
1567 | U+00E9 | é | LATIN SMALL LETTER E WITH ACUTE or LATIN SMALL LETTER E ACUTE |
1469 | U+0113 | ē | LATIN SMALL LETTER E WITH MACRON or LATIN SMALL LETTER E MACRON |
1356 | U+014D | ō | LATIN SMALL LETTER O WITH MACRON or LATIN SMALL LETTER O MACRON |
1119 | U+201C | “ | LEFT DOUBLE QUOTATION MARK or DOUBLE TURNED COMMA QUOTATION MARK |
872 | U+2423 | ␣ | OPEN BOX |
715 | U+2019 | ’ | RIGHT SINGLE QUOTATION MARK or SINGLE COMMA QUOTATION MARK |
695 | U+0101 | ā | LATIN SMALL LETTER A WITH MACRON or LATIN SMALL LETTER A MACRON |
666 | U+0153 | œ | LATIN SMALL LIGATURE OE or LATIN SMALL LETTER O E |
600 | U+2018 | ‘ | LEFT SINGLE QUOTATION MARK or SINGLE TURNED COMMA QUOTATION MARK |
430 | U+00C6 | Æ | LATIN CAPITAL LETTER AE or LATIN CAPITAL LETTER A E |
360 | U+016B | ū | LATIN SMALL LETTER U WITH MACRON or LATIN SMALL LETTER U MACRON |
295 | U+201D | ” | RIGHT DOUBLE QUOTATION MARK or DOUBLE COMMA QUOTATION MARK |
236 | U+00FB | û | LATIN SMALL LETTER U WITH CIRCUMFLEX or LATIN SMALL LETTER U CIRCUMFLEX |
215 | U+00F4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX or LATIN SMALL LETTER O CIRCUMFLEX |
205 | U+00EB | ë | LATIN SMALL LETTER E WITH DIAERESIS or LATIN SMALL LETTER E DIAERESIS |
201 | U+2026 | … | HORIZONTAL ELLIPSIS |
195 | U+00E2 | â | LATIN SMALL LETTER A WITH CIRCUMFLEX or LATIN SMALL LETTER A CIRCUMFLEX |
185 | U+2013 | – | EN DASH |
154 | U+03B1 | α | GREEK SMALL LETTER ALPHA |
145 | U+00E0 | à | LATIN SMALL LETTER A WITH GRAVE or LATIN SMALL LETTER A GRAVE |
144 | U+00A7 | § | SECTION SIGN |
130 | U+03BD | ν | GREEK SMALL LETTER NU |
128 | U+00E8 | è | LATIN SMALL LETTER E WITH GRAVE or LATIN SMALL LETTER E GRAVE |
126 | U+03B9 | ι | GREEK SMALL LETTER IOTA |
119 | U+03B5 | ε | GREEK SMALL LETTER EPSILON |
111 | U+03BF | ο | GREEK SMALL LETTER OMICRON |
104 | U+00EA | ê | LATIN SMALL LETTER E WITH CIRCUMFLEX or LATIN SMALL LETTER E CIRCUMFLEX |
94 | U+00DF | ß | LATIN SMALL LETTER SHARP S |
93 | U+00F3 | ó | LATIN SMALL LETTER O WITH ACUTE or LATIN SMALL LETTER O ACUTE |
86 | U+00E1 | á | LATIN SMALL LETTER A WITH ACUTE or LATIN SMALL LETTER A ACUTE |
80 | U+00B7 | · | MIDDLE DOT |
75 | U+03C4 | τ | GREEK SMALL LETTER TAU |
74 | U+00B6 | ¶ | PILCROW SIGN or PARAGRAPH SIGN |
71 | U+0304 | ̄ | COMBINING MACRON or NON-SPACING MACRON |
63 | U+012B | ī | LATIN SMALL LETTER I WITH MACRON or LATIN SMALL LETTER I MACRON |
58 | U+03C3 | σ | GREEK SMALL LETTER SIGMA |
57 | U+03C1 | ρ | GREEK SMALL LETTER RHO |
51 | U+00EF | ï | LATIN SMALL LETTER I WITH DIAERESIS or LATIN SMALL LETTER I DIAERESIS |
51 | U+03BA | κ | GREEK SMALL LETTER KAPPA |
50 | U+03B7 | η | GREEK SMALL LETTER ETA |
45 | U+03C0 | π | GREEK SMALL LETTER PI |
45 | U+03C2 | ς | GREEK SMALL LETTER FINAL SIGMA |
44 | U+05D9 | י | HEBREW LETTER YOD |
44 | U+03BC | μ | GREEK SMALL LETTER MU |
42 | U+0313 | ̓ | COMBINING COMMA ABOVE or NON-SPACING COMMA ABOVE |
42 | U+03BB | λ | GREEK SMALL LETTER LAMDA or GREEK SMALL LETTER LAMBDA |
41 | U+0301 | ́ | COMBINING ACUTE ACCENT or NON-SPACING ACUTE |
41 | U+03C5 | υ | GREEK SMALL LETTER UPSILON |
39 | U+00A3 | £ | POUND SIGN |
39 | U+03B4 | δ | GREEK SMALL LETTER DELTA |
39 | U+00FC | ü | LATIN SMALL LETTER U WITH DIAERESIS or LATIN SMALL LETTER U DIAERESIS |
38 | U+00E7 | ç | LATIN SMALL LETTER C WITH CEDILLA or LATIN SMALL LETTER C CEDILLA |
37 | U+03C9 | ω | GREEK SMALL LETTER OMEGA |
31 | U+00F9 | ù | LATIN SMALL LETTER U WITH GRAVE or LATIN SMALL LETTER U GRAVE |
29 | U+05E8 | ר | HEBREW LETTER RESH |
28 | U+0300 | ̀ | COMBINING GRAVE ACCENT or NON-SPACING GRAVE |
28 | U+05DE | מ | HEBREW LETTER MEM |
27 | U+05DC | ל | HEBREW LETTER LAMED |
26 | U+05D5 | ו | HEBREW LETTER VAV |
25 | U+03B8 | θ | GREEK SMALL LETTER THETA |
24 | U+03B3 | γ | GREEK SMALL LETTER GAMMA |
23 | U+05D4 | ה | HEBREW LETTER HE |
23 | U+005E | ^ | CIRCUMFLEX ACCENT or SPACING CIRCUMFLEX |
22 | U+03C7 | χ | GREEK SMALL LETTER CHI |
22 | U+05D1 | ב | HEBREW LETTER BET |
21 | U+05E0 | נ | HEBREW LETTER NUN |
21 | U+05D3 | ד | HEBREW LETTER DALET |
20 | U+05E2 | ע | HEBREW LETTER AYIN |
20 | U+05E9 | ש | HEBREW LETTER SHIN |
20 | U+00B0 | ° | DEGREE SIGN |
20 | U+261E | ☞ | WHITE RIGHT POINTING INDEX |
19 | U+2020 | † | DAGGER |
19 | U+05EA | ת | HEBREW LETTER TAV |
19 | U+00EE | î | LATIN SMALL LETTER I WITH CIRCUMFLEX or LATIN SMALL LETTER I CIRCUMFLEX |
19 | U+2011 | ‑ | NON-BREAKING HYPHEN |
18 | U+05D2 | ג | HEBREW LETTER GIMEL |
17 | U+05E4 | פ | HEBREW LETTER PE |
16 | U+0152 | Œ | LATIN CAPITAL LIGATURE OE or LATIN CAPITAL LETTER O E |
16 | U+0314 | ̔ | COMBINING REVERSED COMMA ABOVE or NON-SPACING REVERSED COMMA ABOVE |
15 | U+00F2 | ò | LATIN SMALL LETTER O WITH GRAVE or LATIN SMALL LETTER O GRAVE |
15 | U+05D7 | ח | HEBREW LETTER HET |
15 | U+05DD | ם | HEBREW LETTER FINAL MEM |
14 | U+03C6 | φ | GREEK SMALL LETTER PHI |
14 | U+05D0 | א | HEBREW LETTER ALEF |
13 | U+00ED | í | LATIN SMALL LETTER I WITH ACUTE or LATIN SMALL LETTER I ACUTE |
12 | U+0391 | Α | GREEK CAPITAL LETTER ALPHA |
10 | U+00F6 | ö | LATIN SMALL LETTER O WITH DIAERESIS or LATIN SMALL LETTER O DIAERESIS |
10 | U+261C | ☜ | WHITE LEFT POINTING INDEX |
9 | U+05E1 | ס | HEBREW LETTER SAMEKH |
9 | U+2022 | • | BULLET |
9 | U+0302 | ̂ | COMBINING CIRCUMFLEX ACCENT or NON-SPACING CIRCUMFLEX |
9 | U+05E7 | ק | HEBREW LETTER QOF |
9 | U+2010 | ‐ | HYPHEN |
9 | U+03B2 | β | GREEK SMALL LETTER BETA |
9 | U+00FA | ú | LATIN SMALL LETTER U WITH ACUTE or LATIN SMALL LETTER U ACUTE |
9 | U+00BD | ½ | VULGAR FRACTION ONE HALF or FRACTION ONE HALF |
9 | U+03BE | ξ | GREEK SMALL LETTER XI |
9 | U+05DF | ן | HEBREW LETTER FINAL NUN |
8 | U+05D8 | ט | HEBREW LETTER TET |
8 | U+05DB | כ | HEBREW LETTER KAF |
7 | U+2021 | ‡ | DOUBLE DAGGER |
7 | U+2041 | ⁁ | CARET INSERTION POINT |
7 | U+00E3 | ã | LATIN SMALL LETTER A WITH TILDE or LATIN SMALL LETTER A TILDE |
7 | U+00E4 | ä | LATIN SMALL LETTER A WITH DIAERESIS or LATIN SMALL LETTER A DIAERESIS |
7 | U+05E6 | צ | HEBREW LETTER TSADI |
6 | U+03AF | ί | GREEK SMALL LETTER IOTA WITH TONOS or GREEK SMALL LETTER IOTA TONOS |
6 | U+00F1 | ñ | LATIN SMALL LETTER N WITH TILDE or LATIN SMALL LETTER N TILDE |
6 | U+0399 | Ι | GREEK CAPITAL LETTER IOTA |
5 | U+0060 | ` | GRAVE ACCENT or SPACING GRAVE |
5 | U+03A3 | Σ | GREEK CAPITAL LETTER SIGMA |
5 | U+03AD | έ | GREEK SMALL LETTER EPSILON WITH TONOS or GREEK SMALL LETTER EPSILON TONOS |
5 | U+03AE | ή | GREEK SMALL LETTER ETA WITH TONOS or GREEK SMALL LETTER ETA TONOS |
5 | U+0395 | Ε | GREEK CAPITAL LETTER EPSILON |
5 | U+039D | Ν | GREEK CAPITAL LETTER NU |
4 | U+2663 | ♣ | BLACK CLUB SUIT |
4 | U+03CC | ό | GREEK SMALL LETTER OMICRON WITH TONOS or GREEK SMALL LETTER OMICRON TONOS |
4 | U+00F5 | õ | LATIN SMALL LETTER O WITH TILDE or LATIN SMALL LETTER O TILDE |
3 | U+2042 | ⁂ | ASTERISM |
3 | U+2003 | | EM SPACE |
3 | U+0263 | ɣ | LATIN SMALL LETTER GAMMA |
3 | U+03A4 | Τ | GREEK CAPITAL LETTER TAU |
3 | U+03A9 | Ω | GREEK CAPITAL LETTER OMEGA |
3 | U+03B6 | ζ | GREEK SMALL LETTER ZETA |
3 | U+0398 | Θ | GREEK CAPITAL LETTER THETA |
3 | U+05DA | ך | HEBREW LETTER FINAL KAF |
3 | U+00FD | ý | LATIN SMALL LETTER Y WITH ACUTE or LATIN SMALL LETTER Y ACUTE |
3 | U+1EBD | ẽ | LATIN SMALL LETTER E WITH TILDE |
3 | U+039F | Ο | GREEK CAPITAL LETTER OMICRON |
2 | U+03A0 | Π | GREEK CAPITAL LETTER PI |
2 | U+2605 | ★ | BLACK STAR |
2 | U+03A5 | Υ | GREEK CAPITAL LETTER UPSILON |
2 | U+00A6 | ¦ | BROKEN BAR or BROKEN VERTICAL BAR |
2 | U+03A6 | Φ | GREEK CAPITAL LETTER PHI |
2 | U+0107 | ć | LATIN SMALL LETTER C WITH ACUTE or LATIN SMALL LETTER C ACUTE |
2 | U+0169 | ũ | LATIN SMALL LETTER U WITH TILDE or LATIN SMALL LETTER U TILDE |
2 | U+00AF | ¯ | MACRON or SPACING MACRON |
2 | U+03F1 | ϱ | GREEK RHO SYMBOL or GREEK SMALL LETTER TAILED RHO |
2 | U+2032 | ′ | PRIME |
2 | U+00B4 | ´ | ACUTE ACCENT or SPACING ACUTE |
2 | U+2154 | ⅔ | VULGAR FRACTION TWO THIRDS or FRACTION TWO THIRDS |
2 | U+05D6 | ז | HEBREW LETTER ZAYIN |
2 | U+263E | ☾ | LAST QUARTER MOON |
2 | U+00FF | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS or LATIN SMALL LETTER Y DIAERESIS |
1 | U+1F00 | ἀ | GREEK SMALL LETTER ALPHA WITH PSILI |
1 | U+1F40 | ὀ | GREEK SMALL LETTER OMICRON WITH PSILI |
1 | U+0342 | ͂ | COMBINING GREEK PERISPOMENI |
1 | U+05E3 | ף | HEBREW LETTER FINAL PE |
1 | U+0223 | ȣ | LATIN SMALL LETTER OU |
1 | U+2044 | ⁄ | FRACTION SLASH |
1 | U+2104 | ℄ | CENTRE LINE SYMBOL or C L SYMBOL |
1 | U+00C7 | Ç | LATIN CAPITAL LETTER C WITH CEDILLA or LATIN CAPITAL LETTER C CEDILLA |
1 | U+2008 | | PUNCTUATION SPACE |
1 | U+2089 | ₉ | SUBSCRIPT NINE or SUBSCRIPT DIGIT NINE |
1 | U+00C9 | É | LATIN CAPITAL LETTER E WITH ACUTE or LATIN CAPITAL LETTER E ACUTE |
1 | U+2609 | ☉ | SUN |
1 | U+220A | ∊ | SMALL ELEMENT OF |
1 | U+03CA | ϊ | GREEK SMALL LETTER IOTA WITH DIALYTIKA or GREEK SMALL LETTER IOTA DIAERESIS |
1 | U+00EC | ì | LATIN SMALL LETTER I WITH GRAVE or LATIN SMALL LETTER I GRAVE |
1 | U+03AC | ά | GREEK SMALL LETTER ALPHA WITH TONOS or GREEK SMALL LETTER ALPHA TONOS |
1 | U+03CD | ύ | GREEK SMALL LETTER UPSILON WITH TONOS or GREEK SMALL LETTER UPSILON TONOS |
1 | U+03CE | ώ | GREEK SMALL LETTER OMEGA WITH TONOS or GREEK SMALL LETTER OMEGA TONOS |
1 | U+00CF | Ï | LATIN CAPITAL LETTER I WITH DIAERESIS or LATIN CAPITAL LETTER I DIAERESIS |
1 | U+1F10 | ἐ | GREEK SMALL LETTER EPSILON WITH PSILI |
1 | U+1F50 | ὐ | GREEK SMALL LETTER UPSILON WITH PSILI |
1 | U+0392 | Β | GREEK CAPITAL LETTER BETA |
1 | U+2033 | ″ | DOUBLE PRIME |
1 | U+1F74 | ὴ | GREEK SMALL LETTER ETA WITH VARIA |
1 | U+0394 | Δ | GREEK CAPITAL LETTER DELTA |
1 | U+1FF6 | ῶ | GREEK SMALL LETTER OMEGA WITH PERISPOMENI |
1 | U+0397 | Η | GREEK CAPITAL LETTER ETA |
1 | U+2078 | ⁸ | SUPERSCRIPT EIGHT or SUPERSCRIPT DIGIT EIGHT |
1 | U+1F78 | ὸ | GREEK SMALL LETTER OMICRON WITH VARIA |
1 | U+2299 | ⊙ | CIRCLED DOT OPERATOR |
1 | U+02D9 | ˙ | DOT ABOVE or SPACING DOT ABOVE |
1 | U+055A | ՚ | ARMENIAN APOSTROPHE or ARMENIAN MODIFIER LETTER RIGHT HALF RING |
1 | U+00BC | ¼ | VULGAR FRACTION ONE QUARTER or FRACTION ONE QUARTER |
1 | U+201E | „ | DOUBLE LOW-9 QUOTATION MARK or LOW DOUBLE COMMA QUOTATION MARK |
1 | U+00BE | ¾ | VULGAR FRACTION THREE QUARTERS or FRACTION THREE QUARTERS |
1 | U+00BF | ¿ | INVERTED QUESTION MARK |
1 | U+1FBF | ᾿ | GREEK PSILI |
Total characters counted: 54,703,339.