🔧 Toolify

Character Frequency Analyzer (letters, all chars, or words)

Paste any text to get a sorted frequency table. Three modes: all characters, letters/digits only, or whole words. Useful for cryptanalysis, writing analysis, and dataset cleanup.

How it works

What's it for

Cryptanalysis: classical ciphers (Caesar, simple substitution) preserve the shape of the letter-frequency distribution. In a reasonably long English text, E is typically the most common letter, followed by T, A, O, I, N. If ciphertext shows roughly that shape with the peaks on different letters, you are probably looking at a monoalphabetic substitution. CJK languages have very different distributions, but they are still recognizable.

Writing analysis: spotting overused words is one of the fastest ways to improve drafts. If 'just' or 'really' appears 50 times in a 1000-word essay, you've found a tic to fix.

Dataset cleanup: scanning a CSV column with this tool reveals stray characters, encoding errors, and unexpected casing. Useful before importing data into a stricter system.

Three modes

All characters: includes spaces, punctuation, line breaks, emoji. Best for raw text analysis. Useful when you suspect hidden characters (zero-width space, BOM) corrupting a file.
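A minimal sketch of how an all-characters count can expose invisible characters. The helper names countCodePoints and label are hypothetical, not the tool's own code; the idea is to label each character by its code point so a BOM or zero-width space shows up as a readable row.

```javascript
// Hypothetical helper: count every code point in a string.
function countCodePoints(text) {
  const counts = new Map();
  for (const ch of text) {
    // for...of walks a string by code point, not by UTF-16 code unit
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: render a character as U+XXXX so invisible ones can be read.
function label(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Sample CSV header with a leading BOM and a stray zero-width space.
const sample = "\uFEFFid,name\u200B\n";
const rows = [...countCodePoints(sample)].map(([ch, n]) => [label(ch), n]);
// rows contains ["U+FEFF", 1] and ["U+200B", 1] alongside the visible characters
```

In a real table you would probably show the label only for characters that do not render, and the character itself otherwise.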

Letters and digits: filters to only Unicode letters and numbers. Best for traditional letter-frequency analysis (cryptanalysis, language identification).
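One plausible way to implement this filter is a Unicode property escape. The sketch below assumes "letters and numbers" means the general categories L and N; the tool's exact pattern is not documented here, and the names LETTER_OR_DIGIT and countLettersAndDigits are made up for illustration.

```javascript
// Assumption: "letter or digit" = Unicode general category L (Letter) or N (Number).
const LETTER_OR_DIGIT = /^[\p{L}\p{N}]$/u;

function countLettersAndDigits(text) {
  const counts = new Map();
  for (const ch of text) {
    if (!LETTER_OR_DIGIT.test(ch)) continue; // drop spaces, punctuation, emoji
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return counts;
}

// "A", "ñ", "b", "4", "2", "東", "京" survive; spaces, ",", "!" and the emoji do not.
const letterCounts = countLettersAndDigits("A\u00F1b 42, 東京! \u{1F642}");
```

With this pattern, combining marks (category M) are dropped, so text in decomposed form would lose its accents. Normalizing with normalize("NFC") first is one way around that.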

Words: splits on whitespace and counts whole words. Best for writing analysis and stylistic checking.
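A rough sketch of whitespace-based word counting, assuming nothing happens beyond the split (countWords is a hypothetical name). It also shows a side effect worth knowing: punctuation stays attached to the word next to it.

```javascript
function countWords(text) {
  const counts = new Map();
  for (const word of text.split(/\s+/)) {
    if (word === "") continue; // leading or trailing whitespace yields empty strings
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const wordCounts = countWords("  it was just fine, just fine.  ");
// "just" is counted twice; "fine," and "fine." are separate entries
```

If you want 'fine,' and 'fine.' to merge, strip punctuation from the text before pasting it in.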

What 'case sensitive' does

Off (default): 'A' and 'a' count together. Best for letter-frequency on natural text where case is incidental.

On: 'A' and 'a' count separately. Useful when case is meaningful: programming identifiers, branded terms, or analyzing capitalization patterns. Note: case-insensitive counting lowercases the text with JavaScript's toLowerCase(), which is locale-independent; for most languages this matches the conventional lowercase form.
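The toggle can be pictured as a single lowercase step before counting. This is a sketch under that assumption, with a hypothetical countChars helper, not the tool's actual source.

```javascript
function countChars(text, caseSensitive = false) {
  // Case-insensitive mode lowercases the whole input before counting.
  const source = caseSensitive ? text : text.toLowerCase();
  const counts = new Map();
  for (const ch of source) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return counts;
}

const folded = countChars("AaBa");      // a: 3, b: 1
const exact = countChars("AaBa", true); // A: 1, a: 2, B: 1
```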

Frequently asked questions

› Does it work for Japanese, Chinese, Korean text?

Yes. Letter mode treats each ideograph as one 'letter', so you get hanzi/kanji frequency (kana and Hangul syllables are counted the same way). Word mode splits on whitespace, which means CJK text without spaces shows up as one giant word, so use letter mode for those.

› What's the most common English letter?

'E' (about 12.7%), then T (9.1%), A (8.2%), O (7.5%), I (7.0%), N (6.7%). Knowing this is the foundation of breaking simple substitution ciphers.
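To make that concrete, here is a small sketch of the classic first step against a Caesar cipher: assume the most frequent ciphertext letter stands for E and derive the shift from it. The function names are hypothetical, and the heuristic only works when the text is long enough for E to actually come out on top.

```javascript
// Guess the shift by aligning the most frequent ciphertext letter with E.
function guessCaesarShift(ciphertext) {
  const counts = new Array(26).fill(0);
  for (const ch of ciphertext.toUpperCase()) {
    const i = ch.charCodeAt(0) - 65; // 65 is "A"
    if (i >= 0 && i < 26) counts[i]++;
  }
  const top = counts.indexOf(Math.max(...counts));
  return (top - 4 + 26) % 26; // 4 is the index of E
}

// Undo a Caesar shift, leaving non-letters untouched.
function caesarDecrypt(ciphertext, shift) {
  return ciphertext.replace(/[A-Za-z]/g, (ch) => {
    const base = ch <= "Z" ? 65 : 97;
    return String.fromCharCode(((ch.charCodeAt(0) - base - shift + 26) % 26) + base);
  });
}

// "meet me here, near the green tree" shifted forward by 3.
const cipher = "phhw ph khuh, qhdu wkh juhhq wuhh";
const shift = guessCaesarShift(cipher);     // 3: H is the top letter, and H - E = 3
const plain = caesarDecrypt(cipher, shift); // "meet me here, near the green tree"
```

For a general substitution cipher there is no single shift to recover, so the frequency table is used to propose letter-by-letter mappings instead.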

› Are emoji counted?

Yes, in 'all characters' mode. Letter mode filters them out (they're not letters per Unicode classification).

› Why are emoji sometimes split into multiple characters?

Some emoji are made of multiple Unicode code points (e.g., a flag is two regional indicator letters). The counter follows JavaScript string iteration, which respects code points but does not group them into grapheme clusters. For most analysis this is fine.
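A short illustration of the difference, using escapes so the code points are explicit. Spreading a string uses the same code-point iteration as for...of.

```javascript
// Flag of Japan: regional indicators J + P, rendered as one glyph.
const flag = "\u{1F1EF}\u{1F1F5}";
// Family emoji: man + zero-width joiner + woman + zero-width joiner + girl.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

const flagParts = [...flag];     // 2 code points
const familyParts = [...family]; // 5 code points, including two joiners

// .length counts UTF-16 code units, which is a third, different number.
const flagUnits = flag.length;   // 4
```

So one visible flag adds two rows to an all-characters table, one per regional indicator. In engines that support it, Intl.Segmenter with granularity "grapheme" is one way to count by visible character instead.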

› Can I export the table?

Not yet. Copy and paste the rendered table for now. CSV export is on the roadmap.

› How many entries does it show?

Top 50 in the table. The tail count is summarized at the bottom.
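A sketch of how a sorted table with a summarized tail could be produced. The topWithTail helper is hypothetical, and since it isn't specified whether the summary reports the number of hidden entries or their combined occurrences, this version returns both.

```javascript
function topWithTail(counts, limit = 50) {
  // Sort entries by count, highest first.
  const sorted = [...counts.entries()].sort((a, b) => b[1] - a[1]);
  const top = sorted.slice(0, limit);
  const tail = sorted.slice(limit);
  return {
    top,
    tailEntries: tail.length,                           // distinct items not shown
    tailTotal: tail.reduce((sum, [, n]) => sum + n, 0), // their combined occurrences
  };
}

const demo = new Map([["a", 5], ["b", 3], ["c", 2], ["d", 1]]);
const summary = topWithTail(demo, 2);
// summary.top is [["a", 5], ["b", 3]]; tailEntries is 2; tailTotal is 3
```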

› Why don't case-insensitive Greek/Turkish results match my expectation?

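Some languages have unusual case rules (Turkish dotted/dotless I; German ß ↔ SS). We use JavaScript's toLowerCase(), which applies Unicode's default, locale-independent lowercase mapping rather than language-specific rules. That is usually fine but can surprise in edge cases: Turkish 'I' lowercases to 'i' rather than dotless 'ı', 'ß' is not merged with 'SS', and Greek 'σ' and final 'ς' are counted as different letters.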
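The edge cases are easy to check in a browser console. The snippet below uses only the standard toLowerCase() and toUpperCase() methods; the variable names are just for illustration.

```javascript
const sharpS = "\u00DF";         // ß
const dottedCapitalI = "\u0130"; // İ (Turkish capital I with dot)

const lowerSharpS = sharpS.toLowerCase();          // still ß, not "ss"
const upperSharpS = sharpS.toUpperCase();          // "SS"
const lowerDottedI = dottedCapitalI.toLowerCase(); // "i" followed by U+0307 combining dot
const lowerPlainI = "I".toLowerCase();             // "i", never the Turkish dotless ı
```

In other words, lowercased 'İ' becomes two code points and can show up as two rows, while 'ß' and 'ss' never land in the same row. A locale-aware variant could call toLocaleLowerCase("tr") for Turkish text, but the tool does not do that.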

› Does the data leave my browser?

No. All counting runs locally.
