pocketsphinx_continuous − Run speech recognition in continuous listening mode
pocketsphinx_continuous [ −infile filename.wav ] [ −inmic yes ] [ options ]...
This program opens the audio device or a file and waits for speech. When it detects an utterance, it performs speech recognition on it.
To record from the microphone and decode, use
−inmic yes
To decode a 16 kHz, 16-bit mono WAV file, use
−infile filename.wav
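Before decoding a file, it can help to confirm that it matches the format described above. The following is a minimal sketch using Python's standard wave module; the helper name is_16k_mono_wav is hypothetical and is not part of PocketSphinx.

```python
import wave


def is_16k_mono_wav(path):
    """Return True if the WAV file is 16 kHz, 16-bit, mono PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # 16 kHz sampling rate
            and wav.getsampwidth() == 2   # 2 bytes per sample = 16 bits
            and wav.getnchannels() == 1   # single channel (mono)
        )
```

A file that fails this check would need to be resampled or downmixed before it is passed with −infile.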
You can also specify −lm, −fsg, or −kws, depending on whether you are using a statistical language model, using a finite-state grammar, or looking for a keyphrase.
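The choice of input source and search mode can be sketched as a small Python helper that assembles the argument list without running anything. The function name build_command and the file names are hypothetical, and treating the three search options as mutually exclusive is an assumption of this sketch, not something this page states. The code uses the ASCII hyphen that a real command line would contain.

```python
def build_command(infile=None, lm=None, fsg=None, kws=None):
    """Build an argv list for pocketsphinx_continuous (not executed here)."""
    chosen = [(flag, path) for flag, path in
              (("-lm", lm), ("-fsg", fsg), ("-kws", kws)) if path]
    if len(chosen) > 1:
        # Assumption of this sketch: pick a single search mode per run.
        raise ValueError("choose only one of lm, fsg, kws")
    argv = ["pocketsphinx_continuous"]
    # Read from a file when one is given, otherwise from the microphone.
    argv += ["-infile", infile] if infile else ["-inmic", "yes"]
    for flag, path in chosen:
        argv += [flag, path]
    return argv


print(build_command(infile="filename.wav", lm="model.lm"))
```

On a machine where the program is installed, the resulting list could be handed to subprocess.run.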
−adcdev
Name of audio device to use for input.
−agc
Automatic gain control for c0 (’max’, ’emax’, ’noise’, or ’none’)
−agcthresh
Initial threshold for automatic gain control
−allphone
Perform phoneme decoding with phonetic lm
−allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
−alpha
Preemphasis parameter
−argfile
Argument file giving extra arguments.
−ascale
Inverse of acoustic model scale for confidence score calculation
−aw
Inverse weight applied to acoustic scores.
−backtrace
Print results and backtraces to log file.
−beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
−bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
−bestpathlw
Language model probability weight for bestpath search
−ceplen
Number of components in the input feature vector
−cmn
Cepstral mean normalization scheme (’current’, ’prior’, or ’none’)
−cmninit
Initial values (comma-separated) for cepstral mean when ’prior’ is used
−compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
−debug
Verbosity level for debugging messages
−dict
Main pronunciation dictionary (lexicon) input file
−dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
−dither
Add 1/2-bit noise
−doublebw
Use double bandwidth filters (same center freq)
−ds
Frame GMM computation downsampling ratio
−fdict
Noise word pronunciation dictionary input file
−feat
Feature stream type, depends on the acoustic model
−featparams
File containing feature extraction parameters.
−fillprob
Filler word transition probability
−frate
Frame rate
−fsg
Sphinx format finite state grammar file
−fsgusealtpron
Add alternate pronunciations to FSG
−fsgusefiller
Insert filler words at each state.
−fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
−fwdflatbeam
Beam width applied to every frame in second-pass flat search
−fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
−fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
−fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
−fwdflatwbeam
Beam width applied to word exits in second-pass flat search
−fwdtree
Run forward lexicon-tree search (1st pass)
−hmm
Directory containing acoustic model files.
−infile
Audio file to transcribe.
−inmic
Transcribe audio from microphone.
−input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
−jsgf
JSGF grammar file
−keyphrase
Keyphrase to spot
−kws
A file with keyphrases to spot, one per line
−kws_delay
Delay to wait for best detection score
−kws_plp
Phone loop probability for keyword spotting
−kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
−latsize
Initial backpointer table size
−lda
File containing transformation matrix to be applied to features (single-stream features only)
−ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
−lifter
Length of sin-curve for liftering, or 0 for no liftering.
−lm
Word trigram language model input file
−lmctl
Specify a set of language models
−lmname
Which language model in −lmctl to use by default
−logbase
Base in which all log-likelihoods are calculated
−logfn
File to write log messages in
−logspec
Write out logspectral files instead of cepstra
−lowerf
Lower edge of filters
−lpbeam
Beam width applied to last phone in words
−lponlybeam
Beam width applied to last phone in single-phone words
−lw
Language model probability weight
−maxhmmpf
Maximum number of active HMMs to maintain at each frame (or −1 for no pruning)
−maxwpf
Maximum number of distinct word exits at each frame (or −1 for no pruning)
−mdef
Model definition input file
−mean
Mixture gaussian means input file
−mfclogdir
Directory to log feature files to
−min_endfr
Nodes ignored in lattice construction if they persist for fewer than N frames
−mixw
Senone mixture weights input file (uncompressed)
−mixwfloor
Senone mixture weights floor (applied to data from −mixw file)
−mllr
MLLR transformation to apply to means and variances
−mmap
Use memory-mapped I/O (if possible) for model files
−ncep
Number of cep coefficients
−nfft
Size of FFT
−nfilt
Number of filter banks
−nwpen
New word transition penalty
−pbeam
Beam width applied to phone transitions
−pip
Phone insertion penalty
−pl_beam
Beam width applied to phone loop search for lookahead
−pl_pbeam
Beam width applied to phone loop transitions for lookahead
−pl_pip
Phone insertion penalty for phone loop
−pl_weight
Weight for phoneme lookahead penalties
−pl_window
Phoneme lookahead window size, in frames
−rawlogdir
Directory to log raw audio files to
−remove_dc
Remove DC offset from each frame
−remove_noise
Remove noise with spectral subtraction in mel-energies
−remove_silence
Enables VAD, removes silence frames from processing
−round_filters
Round mel filter frequencies to DFT points
−samprate
Sampling rate
−seed
Seed for random number generator; if less than zero, pick our own
−sendump
Senone dump (compressed mixture weights) input file
−senlogdir
Directory to log senone score files to
−senmgau
Senone to codebook mapping input file (usually not needed)
−silprob
Silence word transition probability
−smoothspec
Write out cepstral-smoothed logspectral files
−svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
−time
Print word times in file transcription.
−tmat
HMM state transition matrix input file
−tmatfloor
HMM state transition probability floor (applied to −tmat file)
−topn
Maximum number of top Gaussians to use in scoring.
−topn_beam
Beam width used to determine top-N Gaussians (or a list, per-feature)
−toprule
Start rule for JSGF (first public rule is default)
−transform
Which type of transform to use to calculate cepstra (legacy, dct, or htk)
−unit_area
Normalize mel filters to unit area
−upperf
Upper edge of filters
−uw
Unigram weight
−vad_postspeech
Number of silence frames to keep after the transition from speech to silence.
−vad_prespeech
Number of speech frames to keep before the transition from silence to speech.
−vad_startspeech
Number of speech frames needed to trigger VAD from silence to speech.
−vad_threshold
Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
−var
Mixture gaussian variances input file
−varfloor
Mixture gaussian variance floor (applied to data from −var file)
−varnorm
Variance normalize each utterance (only if CMN == current)
−verbose
Show input filenames
−warp_params
Parameters string defining the warping function
−warp_type
Warping function type (or shape)
−wbeam
Beam width applied to word exits
−wip
Word insertion penalty
−wlen
Hamming window length
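Long option lists like the one above are often easier to keep in a file and pass with −argfile. The sketch below formats such a file in Python, assuming the file holds one whitespace-separated option/value pair per line with an ASCII hyphen before each name (the layout seen in the feat.params files shipped with acoustic models). The helper name format_argfile and the paths are hypothetical.

```python
def format_argfile(options):
    """Return argument-file text with one '-name value' pair per line.

    Assumption: values contain no whitespace, so no quoting is attempted.
    """
    return "".join(f"-{name} {value}\n" for name, value in options.items())


text = format_argfile({"hmm": "model_dir", "dict": "words.dict", "samprate": 16000})
print(text, end="")
```

The returned text can be written to a file whose name is then given to −argfile.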
Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines <dhuggins AT cs DOT cmu DOT edu>
Copyright © 1994-2016 Carnegie Mellon University. See the file LICENSE included with this package for more information.