unihist − Generate a histogram of the characters in a Unicode file
unihist ([option flags])
unihist generates a histogram of the characters in its input, which must be encoded in UTF-8 Unicode. By default, for each character it prints the frequency of the character as a percentage of the total, the absolute number of tokens in the input, the UTF-32 code in hexadecimal, and, if the character is displayable, the glyph itself as UTF-8 Unicode. Command line flags allow unwanted information to be suppressed. In particular, note that by suppressing the percentages and counts it is possible to generate a list of the unique characters in the input.
Output is produced ordered by character code. To sort it in descending order of frequency, pipe the output into the command:
sort -k1 -n -r
By default, unihist handles all of Unicode. To reduce memory usage and increase speed, it may be compiled to handle only the Basic Multilingual Plane (plane 0) by defining BMPONLY.
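For readers who want to see the idea in code, the following is a minimal Python sketch of the kind of histogram described above. It is not unihist's implementation, and it does not reproduce unihist's exact column layout. The names char_histogram and format_row, the field widths, and the use of str.isprintable() as the test for "displayable" are all assumptions made for illustration.

```python
from collections import Counter


def char_histogram(text):
    """Return (percent, count, code point, glyph) rows ordered by code point.

    Hypothetical helper: a sketch of the technique, not unihist's own code.
    """
    counts = Counter(text)
    total = sum(counts.values())
    rows = []
    for ch in sorted(counts):  # sorting characters orders them by code point
        n = counts[ch]
        # Assumption: treat a character as displayable if Python calls it printable.
        glyph = ch if ch.isprintable() else ""
        rows.append((100.0 * n / total, n, ord(ch), glyph))
    return rows


def format_row(row):
    """Format one row; the column widths here are illustrative only."""
    pct, n, code, glyph = row
    return f"{pct:7.3f} {n:8d} 0x{code:06X} {glyph}".rstrip()


sample = "banana"
rows = char_histogram(sample)

# Rows ordered by character code.
for row in rows:
    print(format_row(row))

# The same rows in descending order of frequency.
for row in sorted(rows, key=lambda r: r[1], reverse=True):
    print(format_row(row))
```

Dropping the first two fields of each row leaves a list of the unique characters in the input, which corresponds to suppressing the counts and percentages.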
-c    Suppress printing of counts and percentages.
-g    Suppress printing of glyphs.
-h    Print usage information.
-u    Suppress printing of the Unicode code as text.
-v    Print version information.
uniname (1)
Unicode Standard, version 5.0
Bill Poser
billposer AT alum DOT mit DOT edu
GNU General Public License