NGRAMS

NAME

ngrams − compute n−gram frequencies and print tables to standard output

SYNOPSIS

  ngrams [−−version] [−−help] [−−n=3] [−−normalize] [−−type=TYPE]
        [−−orderby=ORD] [−−onlyfirst=N] [input files]

DESCRIPTION

This script reads the input files and prints n−gram tables to standard output.

Options:

−−normalize

Prints normalized n−gram frequencies; i.e., the n−gram counts divided by the total number of n−grams of the same size.
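
The normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script's own code: the helper names char_ngrams and normalized_frequencies are hypothetical, and the sketch uses raw overlapping character n−grams without any of the text preprocessing that Text::Ngrams may apply.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def normalized_frequencies(text, n):
    """Divide each n-gram count by the total number of n-grams of size n."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


# "abab" contains three 2-grams: "ab", "ba", "ab".
freqs = normalized_frequencies("abab", 2)
print(freqs)
```

Because every count is divided by the same total, the normalized frequencies for a given n always sum to 1.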

−−onlyfirst=NUMBER

Prints only the first NUMBER n−grams for each n. See the Text::Ngrams module.

−−limit=NUMBER

Limits the total number of distinct n−grams that are kept. This is done for efficiency reasons, so the final counts may not be exact.
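
To see why a cap on distinct n−grams can make the counts inexact, consider the following Python sketch. The pruning rule used here (keep the most frequent half once the cap is exceeded) is an assumption chosen for illustration; it is not taken from Text::Ngrams, and count_with_limit is a hypothetical name.

```python
from collections import Counter


def count_with_limit(grams, limit):
    """Count n-grams while keeping at most `limit` distinct entries.

    Assumed pruning rule (illustrative only): when the table grows past
    the limit, keep only the most frequent limit // 2 entries.
    """
    counts = Counter()
    for gram in grams:
        counts[gram] += 1
        if len(counts) > limit:
            keep = counts.most_common(max(1, limit // 2))
            counts = Counter(dict(keep))
    return counts


stream = ["a", "b", "c", "a", "b", "c"]
limited = count_with_limit(stream, limit=2)
exact = Counter(stream)
print(limited, exact)
```

An n−gram that is pruned and later seen again restarts from zero, so the limited table can undercount or drop n−grams that the exact table would report.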

−−version

Prints version.

−−help

Prints help.

−−n=NUMBER

N−gram size. The default is 3, which produces 3−grams.

−−orderby=frequency|ngram

The order in which n−grams are listed. See the Text::Ngrams module.
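
As a rough illustration of how ordering and −−onlyfirst interact, the Python sketch below sorts a table of counts and then truncates it. The function ngram_table is hypothetical, and the details are assumptions: frequency order is taken to mean descending count with ties broken alphabetically, and ngram order is taken to mean plain alphabetical order. The exact rules are defined by Text::Ngrams.

```python
from collections import Counter


def ngram_table(counts, orderby="ngram", onlyfirst=None):
    """Return (ngram, count) rows, sorted and optionally truncated."""
    if orderby == "frequency":
        # Assumed: highest count first, ties broken alphabetically.
        rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    elif orderby == "ngram":
        # Assumed: plain alphabetical order of the n-grams.
        rows = sorted(counts.items())
    else:
        raise ValueError(f"unknown order: {orderby}")
    return rows if onlyfirst is None else rows[:onlyfirst]


counts = Counter({"th": 5, "he": 7, "an": 5, "in": 2})
by_freq = ngram_table(counts, orderby="frequency", onlyfirst=2)
by_ngram = ngram_table(counts, orderby="ngram")
print(by_freq)
print(by_ngram)
```

Sorting before truncating matters: with frequency order the first N rows are the most frequent n−grams, while with ngram order they are simply the first N in alphabetical order.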

−−type=character|byte|word|utf8

Type of n−grams produced. See the Text::Ngrams module.
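
The practical difference between the types is the unit from which n−grams are built. The Python sketch below shows this for byte, utf8, and word units; the character type is left out because its handling of letters and other characters is defined by Text::Ngrams and is not reproduced here. The names units and ngrams are hypothetical, and splitting words on whitespace is a simplifying assumption.

```python
def units(data, kind):
    """Split raw bytes into the units that n-grams are built from."""
    if kind == "byte":
        return [bytes([b]) for b in data]   # one unit per raw byte
    text = data.decode("utf-8")
    if kind == "utf8":
        return list(text)                   # one unit per decoded character
    if kind == "word":
        return text.split()                 # assumed: whitespace-separated words
    raise ValueError(f"unsupported type: {kind}")


def ngrams(seq, n):
    """Return all overlapping n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


data = "né né".encode("utf-8")   # "é" occupies two bytes in UTF-8
byte_bigrams = ngrams(units(data, "byte"), 2)
utf8_bigrams = ngrams(units(data, "utf8"), 2)
word_bigrams = ngrams(units(data, "word"), 2)
print(len(byte_bigrams), len(utf8_bigrams), len(word_bigrams))
```

The same input therefore yields different tables depending on the type: here the byte sequence is longer than the character sequence because the accented letter is encoded as two bytes.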

PREREQUISITES

Text::Ngrams, Getopt::Long

SCRIPT CATEGORIES

Text::Statistics

README

N−gram analysis for various kinds of n−grams (character, word, byte, utf8, and user-defined). Based on the Text::Ngrams module.

SEE ALSO

Text::Ngrams module.

COPYRIGHT

Copyright 2003−2013 Vlado Keselj http://web.cs.dal.ca/~vlado

This module is provided "as is" without expressed or implied warranty. This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The latest version can be found at http://web.cs.dal.ca/~vlado/srcperl/.
