POCKETSPHINX_CONTINUOUS

NAME

pocketsphinx_continuous − Run speech recognition in continuous listening mode

SYNOPSIS

pocketsphinx_continuous [ −infile filename.wav ] [ −inmic yes ] [ options ]...

DESCRIPTION

This program opens the audio device or a file and waits for speech. When it detects an utterance, it performs speech recognition on it.

To record from the microphone and decode, use:
−inmic yes

To decode a 16 kHz, 16-bit mono WAV file, use:
−infile filename.wav
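
Before passing a file to −infile, it can help to confirm that it is a 16 kHz, 16-bit mono WAV file. The following is a minimal Python sketch that uses only the standard wave module; the function name is_decodable_wav is hypothetical and is not part of PocketSphinx.

```python
import wave

def is_decodable_wav(path):
    """Return True if the file is a 16 kHz, 16-bit, mono PCM WAV file."""
    with wave.open(path, "rb") as w:
        return (
            w.getframerate() == 16000   # 16 kHz sampling rate
            and w.getsampwidth() == 2   # 2 bytes per sample = 16-bit
            and w.getnchannels() == 1   # mono
        )
```

A file that fails this check would typically need to be resampled or converted before decoding.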

You can also specify −lm, −fsg, or −kws, depending on whether you are using a statistical language model, using a finite-state grammar, or looking for a keyphrase.
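
As an illustration, the sketch below assembles an argument list from the options described above without running the program. The helper name build_command and all file names are hypothetical, and treating −lm, −fsg, and −kws as mutually exclusive is an assumption of this sketch rather than something this page states. Flags are written with ASCII hyphens, as they would be typed in a shell.

```python
def build_command(infile=None, lm=None, fsg=None, kws=None):
    """Build an argv list for pocketsphinx_continuous (does not execute it)."""
    argv = ["pocketsphinx_continuous"]

    # Input source: a WAV file if given, otherwise the microphone.
    if infile is not None:
        argv += ["-infile", infile]
    else:
        argv += ["-inmic", "yes"]

    # Search mode: this sketch allows at most one of -lm, -fsg, -kws.
    chosen = [(name, path)
              for name, path in (("lm", lm), ("fsg", fsg), ("kws", kws))
              if path is not None]
    if len(chosen) > 1:
        raise ValueError("choose at most one of lm, fsg, kws")
    for name, path in chosen:
        argv += ["-" + name, path]
    return argv


# Hypothetical file names, for illustration only.
print(build_command(infile="speech.wav", lm="model.lm"))
```

The resulting list could be handed to subprocess.run on a system where pocketsphinx_continuous is installed.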

OPTIONS

−adcdev

Name of audio device to use for input.

−agc

Automatic gain control for c0 (’max’, ’emax’, ’noise’, or ’none’)

−agcthresh

Initial threshold for automatic gain control

−allphone

Perform phoneme decoding with phonetic lm

−allphone_ci

Perform phoneme decoding with phonetic lm and context-independent units only

−alpha

Preemphasis parameter

−argfile

Argument file giving extra arguments.

−ascale

Inverse of acoustic model scale for confidence score calculation

−aw

Inverse weight applied to acoustic scores.

−backtrace

Print results and backtraces to log file.

−beam

Beam width applied to every frame in Viterbi search (smaller values mean wider beam)

−bestpath

Run bestpath (Dijkstra) search over word lattice (3rd pass)

−bestpathlw

Language model probability weight for bestpath search

−ceplen

Number of components in the input feature vector

−cmn

Cepstral mean normalization scheme (’current’, ’prior’, or ’none’)

−cmninit

Initial values (comma-separated) for cepstral mean when ’prior’ is used

−compallsen

Compute all senone scores in every frame (can be faster when there are many senones)

−debug

Verbosity level for debugging messages

−dict

Main pronunciation dictionary (lexicon) input file

−dictcase

Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)

−dither

Add 1/2-bit noise

−doublebw

Use double bandwidth filters (same center freq)

−ds

Frame GMM computation downsampling ratio

−fdict

Noise word pronunciation dictionary input file

−feat

Feature stream type, depends on the acoustic model

−featparams

File containing feature extraction parameters.

−fillprob

Filler word transition probability

−frate

Frame rate

−fsg

Sphinx format finite state grammar file

−fsgusealtpron

Add alternate pronunciations to FSG

−fsgusefiller

Insert filler words at each state.

−fwdflat

Run forward flat-lexicon search over word lattice (2nd pass)

−fwdflatbeam

Beam width applied to every frame in second-pass flat search

−fwdflatefwid

Minimum number of end frames for a word to be searched in fwdflat search

−fwdflatlw

Language model probability weight for flat lexicon (2nd pass) decoding

−fwdflatsfwin

Window of frames in lattice to search for successor words in fwdflat search

−fwdflatwbeam

Beam width applied to word exits in second-pass flat search

−fwdtree

Run forward lexicon-tree search (1st pass)

−hmm

Directory containing acoustic model files.

−infile

Audio file to transcribe.

−inmic

Transcribe audio from microphone.

−input_endian

Endianness of input data, big or little, ignored if NIST or MS Wav

−jsgf

JSGF grammar file

−keyphrase

Keyphrase to spot

−kws

File with keyphrases to spot, one per line

−kws_delay

Delay to wait for best detection score

−kws_plp

Phone loop probability for keyword spotting

−kws_threshold

Threshold for p(hyp)/p(alternatives) ratio

−latsize

Initial backpointer table size

−lda

File containing transformation matrix to be applied to features (single-stream features only)

−ldadim

Dimensionality of output of feature transformation (0 to use entire matrix)

−lifter

Length of sin-curve for liftering, or 0 for no liftering.

−lm

Word trigram language model input file

−lmctl

Specify a set of language models

−lmname

Which language model in −lmctl to use by default

−logbase

Base in which all log-likelihoods are calculated

−logfn

File to write log messages in

−logspec

Write out logspectral files instead of cepstra

−lowerf

Lower edge of filters

−lpbeam

Beam width applied to last phone in words

−lponlybeam

Beam width applied to last phone in single-phone words

−lw

Language model probability weight

−maxhmmpf

Maximum number of active HMMs to maintain at each frame (or −1 for no pruning)

−maxwpf

Maximum number of distinct word exits at each frame (or −1 for no pruning)

−mdef

Model definition input file

−mean

Mixture Gaussian means input file

−mfclogdir

Directory to log feature files to

−min_endfr

Nodes ignored in lattice construction if they persist for fewer than N frames

−mixw

Senone mixture weights input file (uncompressed)

−mixwfloor

Senone mixture weights floor (applied to data from −mixw file)

−mllr

MLLR transformation to apply to means and variances

−mmap

Use memory-mapped I/O (if possible) for model files

−ncep

Number of cep coefficients

−nfft

Size of FFT

−nfilt

Number of filter banks

−nwpen

New word transition penalty

−pbeam

Beam width applied to phone transitions

−pip

Phone insertion penalty

−pl_beam

Beam width applied to phone loop search for lookahead

−pl_pbeam

Beam width applied to phone loop transitions for lookahead

−pl_pip

Phone insertion penalty for phone loop

−pl_weight

Weight for phoneme lookahead penalties

−pl_window

Phoneme lookahead window size, in frames

−rawlogdir

Directory to log raw audio files to

−remove_dc

Remove DC offset from each frame

−remove_noise

Remove noise with spectral subtraction in mel-energies

−remove_silence

Enables VAD, removes silence frames from processing

−round_filters

Round mel filter frequencies to DFT points

−samprate

Sampling rate

−seed

Seed for random number generator; if less than zero, pick our own

−sendump

Senone dump (compressed mixture weights) input file

−senlogdir

Directory to log senone score files to

−senmgau

Senone to codebook mapping input file (usually not needed)

−silprob

Silence word transition probability

−smoothspec

Write out cepstral-smoothed logspectral files

−svspec

Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)

−time

Print word times in file transcription.

−tmat

HMM state transition matrix input file

−tmatfloor

HMM state transition probability floor (applied to −tmat file)

−topn

Maximum number of top Gaussians to use in scoring.

−topn_beam

Beam width used to determine top-N Gaussians (or a list, per-feature)

−toprule

Start rule for JSGF (first public rule is default)

−transform

Which type of transform to use to calculate cepstra (legacy, dct, or htk)

−unit_area

Normalize mel filters to unit area

−upperf

Upper edge of filters

−uw

Unigram weight

−vad_postspeech

Number of silence frames to keep after the transition from speech to silence.

−vad_prespeech

Number of speech frames to keep before the transition from silence to speech.

−vad_startspeech

Number of speech frames required to trigger the VAD transition from silence to speech.

−vad_threshold

Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.

−var

Mixture Gaussian variances input file

−varfloor

Mixture Gaussian variance floor (applied to data from −var file)

−varnorm

Variance normalize each utterance (only if CMN == current)

−verbose

Show input filenames

−warp_params

Parameters defining the warping function

−warp_type

Warping function type (or shape)

−wbeam

Beam width applied to word exits

−wip

Word insertion penalty

−wlen

Hamming window length
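
Several of the front-end options above interact: −samprate, −frate, and −wlen together determine how many samples each analysis frame spans, and −nfft would normally need to be at least that large. The Python sketch below works through the arithmetic. It assumes that −samprate is in Hz, −frate is in frames per second, and −wlen is in seconds; this page does not state the units or any default values, so the numbers used are illustrative only and the function names are hypothetical.

```python
def frame_layout(samprate, frate, wlen):
    """Return (frame_shift, frame_size) in samples.

    Assumes samprate in Hz, frate in frames per second, wlen in seconds.
    """
    frame_shift = int(samprate / frate + 0.5)  # samples between frame starts
    frame_size = int(wlen * samprate + 0.5)    # samples covered by one window
    return frame_shift, frame_size


def fft_is_large_enough(nfft, frame_size):
    """An FFT shorter than the window could not hold a whole frame."""
    return nfft >= frame_size


# Illustrative values only (not taken from this page).
shift, size = frame_layout(samprate=16000, frate=100, wlen=0.025625)
print(shift, size, fft_is_large_enough(512, size))
```

With these illustrative values the frame shift is 160 samples and the window is 410 samples, so an FFT size of 512 would be large enough.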

AUTHOR

Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines <dhuggins AT cs DOT cmu DOT edu>

COPYRIGHT

Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE included with this package for more information.

SEE ALSO

pocketsphinx_batch(1), sphinx_fe(1).
