.de Sp
.if t .sp .5v
.if n .sp
..
.de Ip
.br
.ie \\n.$>=3 .ne \\$3
.el .ne 3
.IP "\\$1" \\$2
..
.TH JULIAN 1 LOCAL
.UC 6
.SH NAME
Julian \- open source grammar-based continuous speech recognition parser
.SH SYNOPSIS
.B julian [-C jconffile] [options ...]
.SH DESCRIPTION
.I Julian
is a multi-purpose speech recognition parser based on finite state
grammar.  It is a variation of
.IR Julius ,
and is included in the distribution of Julius.  It can perform almost
real-time recognition of continuous speech with vocabularies of over
ten thousand words on most current PCs.
.PP
A hand-written finite state grammar and triphone HMM acoustic models
of any unit and size can be used.  The grammar format is an original
one, and tools to create a recognition grammar are included in the
distribution.  Standard formats are adopted for acoustic models.
Users can build their own recognition systems with Julian, using
their own grammars and acoustic models.
.PP
Julian can perform recognition on audio files, live microphone input,
network input and feature parameter files.  The maximum size of
vocabulary is 65,535 words.
.SH "RECOGNITION MODELS"
.I Julian
supports the following models.
.Ip "Acoustic Models" 10
Same as Julius: sub-word HMMs (Hidden Markov Models) in HTK format are
supported.  Phoneme models (monophone), context dependent phoneme
models (triphone), tied-mixture and phonetic tied-mixture models of
any unit can be used.  When using context dependent models, interword
context is also handled.
.Ip "Language Model" 10
For the task grammar, sentence structures are written in a BNF style
in a grammar file, using word categories as terminal symbols.  A voca
file contains the pronunciation (phoneme sequence) of every word
within each category.  These files are converted by
mkdfa.pl(1) to a deterministic finite automaton file (.dfa) and a
dictionary file (.dict).
.SH SPEECH INPUT
Same as Julius: speech waveform files (16-bit uncompressed WAV, RAW
format, and many others when used with the
.I libsndfile
library) and feature parameter files (HTK format) can be used as
speech input.  Live input from a microphone, a DatLink
(NetAudio) system, or a TCP/IP network is also supported.
.PP
Note: Julian can only extract MFCC_E_D_N_Z features internally.  If
you want to use HMMs based on another type of feature extraction,
microphone input and speech waveform files cannot be used.  Use an
external tool such as
.I HCopy
or
.I wav2mfcc
to create the appropriate feature parameter files.
.SH "SEARCH ALGORITHM"
The recognition algorithm of
.I Julian
is based on a two-pass strategy.  In the first pass, a high-speed
approximate search is performed using weaker constraints than the
given grammar: an LR beam search that uses only the inter-category
constraints extracted from the grammar.  The second pass
re-searches the input, using the original grammar rules and the
intermediate results from the first pass, to obtain a high precision
result quickly.  In the second pass the optimal solution is
theoretically guaranteed by the A* search.
.PP
When using context dependent phones (triphones), interword contexts
are taken into consideration.  For tied-mixture and phonetic
tied-mixture models, high-speed acoustic likelihood calculation is
possible using Gaussian pruning.
.PP
For more details, see the related document or web site below.
.SH "OPTIONS"
The options below allow you to specify the models and set system
parameters.  You can set these options on the command line, but it
is recommended that you combine them in a "jconf settings
file" and use the "-C" option to read it at run time.
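.PP
As a rough illustration of this recommendation, the Python sketch below
writes the model options documented on this page (-dfa, -v, -h, -hlist)
to a jconf file and builds the matching command line.  The file names
and the one-option-per-line layout are assumptions made for the
example; they are not taken from the Julian distribution.
.PP
.nf
```python
import os
import tempfile


def write_jconf(path, options):
    """Write (option, value) pairs to a jconf file, one option per line."""
    with open(path, "w") as f:
        for name, value in options:
            # Assumed layout: option name, a space, then its argument.
            print(name, value, file=f)


def julian_command(jconf_path):
    """Return the argument list that loads the jconf file with -C."""
    return ["julian", "-C", jconf_path]


# Hypothetical model file names, used only for illustration.
options = [
    ("-dfa", "sample.dfa"),
    ("-v", "sample.dict"),
    ("-h", "hmmdefs"),
    ("-hlist", "hmmlist"),
]
jconf_path = os.path.join(tempfile.mkdtemp(), "sample.jconf")
write_jconf(jconf_path, options)
command = julian_command(jconf_path)
print(" ".join(command))
```
.fi
.PP
The resulting argument list could be passed to a process launcher once
the model files named in the jconf file actually exist.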
.PP
Most are the same as Julius.
.br
Options only in Julian: -dfa, -penalty1, -penalty2, -looktrellis
.br
Options only in Julius: -nlr, -nrl, -d, -lmp, -lmp2, -transp,
-silhead, -siltail, -spdur, -sepnum, -separatescore
.PP
Below is an explanation of all the available options.
.SS Speech Input
.Ip "\-input {rawfile|mfcfile|mic|adinnet|netaudio|stdin}"
Select the speech data input source.  'rawfile' reads from a waveform
file (the file name is specified after startup).  'mfcfile' is a feature
vector file extracted by the HTK HCopy tool.  'mic' means live microphone
input, 'adinnet' means receiving waveform data over a TCP/IP network
from an adinnet client, and 'netaudio' means a DatLink (NetAudio)
unit.  'stdin' means standard input.
.sp
The supported waveform file formats depend on the compile-time
configuration.  To see which formats are actually supported, see the
help message printed by the "-help" option.  (For stdin input, only
uncompressed WAV and RAW (16bit, BE) are supported.)
.br
(default: mfcfile)
.Ip "\-filelist file"
(with -input rawfile|mfcfile) perform recognition on all files listed
in the target file.
.Ip "\-adport portnum"
(with -input adinnet) adinnet port number (default: 5530)
.Ip "\-NA server:unit"
(with -input netaudio) set the server name and unit ID of the DatLink
unit.
.SS Speech Detection
.Ip "\-cutsilence"
.Ip "\-nocutsilence"
Force silence cutting (=speech segment detection) ON/OFF. (default: ON
for mic/adinnet, OFF for files)
.Ip "\-lv threslevel"
Amplitude threshold (0 - 32767).  When the amplitude exceeds this
threshold, it is considered to be the beginning of a speech segment;
when it drops below this level, it is considered to be the end of the
speech segment. (default: 3000)
.Ip "\-zc zerocrossnum"
Zero crossing threshold per second (default: 60)
.Ip "\-headmargin msec"
Margin at the start of the speech segment in milliseconds. (default: 300)
.Ip "\-tailmargin msec"
Margin at the end of the speech segment in milliseconds. (default: 400)
.Ip "\-nostrip"
On some sound devices, invalid "0" samples may be recorded at the start and
end of recording.  Julian removes them automatically by default.  This
option inhibits the automatic removal.
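.PP
The interplay of the level threshold and the margins can be pictured
with the Python sketch below.  It is deliberately simplified and is not
Julian's detection code: it ignores the zero crossing count (-zc) and
treats everything between the first and last loud sample as a single
segment.  The function name and the synthetic input are invented for
the example; the default values are the ones listed above.
.PP
.nf
```python
def detect_segment(samples, rate=16000, lv=3000, head_ms=300, tail_ms=400):
    """Return (start, end) sample indices of the detected segment, or None."""
    loud = [i for i, s in enumerate(samples) if abs(s) > lv]
    if not loud:
        return None
    head = rate * head_ms // 1000   # margin kept before the first loud sample
    tail = rate * tail_ms // 1000   # margin kept after the last loud sample
    start = max(0, loud[0] - head)
    end = min(len(samples), loud[-1] + 1 + tail)
    return start, end


# Synthetic input: 1 s of silence, 0.5 s of a loud square wave, 1 s of silence.
silence = [0] * 16000
speech = [5000 if (i // 40) % 2 == 0 else -5000 for i in range(8000)]
segment = detect_segment(silence + speech + silence)
print(segment)
```
.fi
.PP
At 16 kHz the 300 ms head margin corresponds to 4800 samples and the
400 ms tail margin to 6400 samples, which is why the returned segment
is wider than the loud region itself.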
.SS Acoustic Analysis
.Ip "\-smpFreq frequency"
Sampling frequency (Hz).
.br
(default: 16000 Hz, i.e. a sampling period of 625 in 100 ns units)
.Ip "\-smpPeriod period"
Sampling period, in units of 100 ns.
.br
(default: 625, i.e. 62.5 microseconds = 16 kHz)
.Ip "\-fsize sample"
Analysis window size (No. samples) (default: 400).
.Ip "\-fshift sample"
Frame shift (No. samples) (default: 160).
.Ip "\-delwin frame"
Delta window size (No. frames) (default: 2).
.Ip "\-hipass frequency"
High-pass filter cutoff frequency (Hz).
.br
(default: -1 = disabled)
.Ip "\-lopass frequency"
Low-pass filter cutoff frequency (Hz).
.br
(default: -1 = disabled)
.Ip "\-sscalc"
Perform spectral subtraction using the head silence of files.  Valid
only for rawfile input.
.Ip "\-sscalclen"
Specify the length of head silence in milliseconds (default: 300)
.Ip "\-ssload filename"
Perform spectral subtraction for speech input using a pre-estimated
noise spectrum loaded from a file.  The noise spectrum data should be
computed beforehand by
.IR mkss .
.Ip "\-ssalpha value"
Alpha coefficient of spectral subtraction.  Noise is subtracted more
strongly as this value gets larger, but distortion of the resulting
signal also becomes more noticeable.  (default: 2.0)
.Ip "\-ssfloor value"
Flooring coefficient of spectral subtraction.  Spectral parameters
that fall below zero after subtraction are replaced by the source
signal multiplied by this coefficient. (default: 0.5)
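.PP
The default values of -smpFreq, -smpPeriod, -fsize and -fshift listed
in this subsection are related by simple arithmetic, which the Python
sketch below spells out.  It assumes that the sampling period is
counted in 100 ns units, as in HTK; the function names are invented
for the example.
.PP
.nf
```python
def period_from_frequency(freq_hz):
    """Sampling period in 100 ns units for a sampling frequency in Hz."""
    return 10_000_000 / freq_hz


def samples_to_ms(num_samples, freq_hz):
    """Duration in milliseconds of num_samples samples."""
    return 1000.0 * num_samples / freq_hz


freq = 16000
period = period_from_frequency(freq)     # value expected by -smpPeriod
window_ms = samples_to_ms(400, freq)     # default -fsize
shift_ms = samples_to_ms(160, freq)      # default -fshift
print(period, window_ms, shift_ms)
```
.fi
.PP
With the defaults this gives a period of 625, a 25 ms analysis window
and a 10 ms frame shift.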
.SS Language Model (Finite State Grammar)
.Ip "\-dfa dfa_filename"
Finite state automaton grammar file. (required)
.Ip "\-penalty1 float"
Word insertion penalty for the first pass. (default: 0.0)
.Ip "\-penalty2 float"
Word insertion penalty for the second pass. (default: 0.0)
.SS Word Dictionary
.Ip "\-v dictionary_file"
Word dictionary file (required)
.Ip "\-spmodel {WORD|WORD[OUTSYM]|#num}"
Name of short pause model as defined in the hmmdefs.
(default: "sp")
.sp
Words that have this model as their pronunciation and are intended to
match the short pauses between words are handled specially by Julian
to deal with short pause insertion.  They can be specified as
shown below.
.sp
.RS 4
.nf
                            Example
Word_name                   <s>
Word_name[output_symbol]    <s>[silB]
#Word_ID                    #14
.fi
.RE
.sp
     (Word_ID is the word position in the dictionary
      file starting from 0)
.Ip "\-forcedict"
Disregard dictionary errors.  Word definitions with errors will be
skipped on startup.
.SS Acoustic Model (HMM)
.Ip "\-h hmmfilename"
HMM definition file to use. (required)
.Ip "\-hlist HMMlistfilename"
HMMList file to use.  Required when using triphone based HMMs.
This file provides a mapping between the logical triphone names
generated from the phonetic representation in the dictionary and the
HMM definition names.
.Ip "\-iwcd1 {max|avg}"
When using a triphone model, select method to handle inter-word triphone
context on the first and last phone of a word in the first pass.
.sp
max: use maximum likelihood of the same
     context triphones (default)
.br
avg: use average likelihood of the same
     context triphones
.Ip "\-force_ccd / \-no_ccd "
Normally Julian determines whether the specified hmmdefs is a
context-dependent model from the model definition names, i.e., whether
the model names contain the characters '+' and '-'.  If the automatic
detection fails, you can state the model type explicitly with these
options, which override the automatic detection result.
.Ip "\-notypecheck"
Disable check of the input parameter type. (default: enabled)
.SS Acoustic Computation
Gaussian Pruning is enabled automatically when using a
tied-mixture based acoustic model.  Gaussian Selection needs a
monophone model converted by
.I mkgshmm
to activate.
.Ip "\-tmix K"
With Gaussian Pruning, specify the number of Gaussians to compute per
codebook. (default: 2)
.Ip "\-gprune {safe|heuristic|beam|none}"
Set the Gaussian pruning technique to use.
.br
(default: safe (setup=standard) beam (setup=fast))
.Ip "\-gshmm hmmdefs"
Specify the monophone hmmdefs to use for Gaussian Mixture Selection
(GMS).  The monophone model for GMS is generated from an ordinary
monophone HMM model using
.IR mkgshmm .
This option is disabled by default. (no GMS applied)
.Ip "\-gsnum N"
When using GMS, specify the number of monophone states to select from
all monophone states. (default: 24)
.SS Inter-word Short Pause Handling
.Ip "\-iwsp"
(Multi-path version only) Enable inter-word context-free short pause
handling.  This option appends a skippable short pause model for every
word end.  The added model will also be ignored in context modeling.
The model specified by "-spmodel" will be appended.
.SS Search Parameters (First Pass)
.Ip "\-b beamwidth"
Beam width (Number of HMM nodes).
As this value increases the precision also increases, however,
processing time and memory usage also increase.
.sp
default value: acoustic model dependent
  400 (monophone)
  800 (triphone,PTM)
 1000 (triphone,PTM, setup=v2.1)
.Ip "\-1pass "
Only perform the first pass search.  This mode is automatically set
when no 3-gram language model has been specified (-nlr).
.Ip "\-realtime"
.Ip "\-norealtime"
Explicitly specify whether real-time (pipeline) processing will be
done in the first pass or not.  For file input, the default is OFF
(-norealtime); for microphone, adinnet and NetAudio input, the default
is ON (-realtime).  This option affects the way CMN is performed:
when OFF, CMN is calculated for each input independently; when ON,
the previous 5 seconds of input are always
used.  Also refer to -progout.
.Ip "\-cmnsave filename"
Save the last CMN parameters computed during recognition to the
specified file.  The parameters are saved each time an input
is recognized, so the output file always holds the latest CMN
parameters.  If the output file already exists, it will be overwritten.
.Ip "\-cmnload filename"
Load initial CMN parameters previously saved in a file by "-cmnsave".
This option enables Julian to recognize the first utterance of a live
microphone input or adinnet input with CMN.
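.PP
The difference between the two CMN modes described under -realtime,
and the reason -cmnload helps with the first utterance, can be pictured
with the Python sketch below.  It is a minimal illustration of cepstral
mean normalization in general, not Julian's implementation: the
5 second window is not modelled, and the function names and toy feature
values are invented for the example.
.PP
.nf
```python
def cepstral_mean(frames):
    """Per-dimension mean of a list of equal-length feature vectors."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def apply_cmn(frames, mean):
    """Subtract the given mean vector from every frame."""
    return [[x - m for x, m in zip(frame, mean)] for frame in frames]


# Toy 2-dimensional feature frames, invented for the example.
previous_input = [[1.0, 2.0], [3.0, 4.0]]
current_input = [[2.0, 2.0], [4.0, 6.0]]

# Offline case: the mean comes from the current input itself.
offline = apply_cmn(current_input, cepstral_mean(current_input))
# Real-time case: the mean must be known before decoding starts, so it
# comes from earlier input (or from parameters saved beforehand).
online = apply_cmn(current_input, cepstral_mean(previous_input))
print(offline)
print(online)
```
.fi
.PP
In the real-time case there is no earlier input for the very first
utterance, which is the gap that previously saved parameters fill.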
.SS Search Parameters (Second Pass)
.Ip "\-b2 hyponum"
Beam width (number of hypotheses) in the second pass.  If the number of
word expansions at a certain hypothesis length reaches this limit
during the search, shorter hypotheses are not expanded further.  This
prevents the search from degenerating into a breadth-first-like state
in which it stalls at the same position, and reduces search
failures.  (default: 30)
.Ip "\-n candidatenum"
The search continues until 'candidatenum' sentence hypotheses have
been found.  The obtained sentence hypotheses are sorted by score, and
the final result is displayed in that order (see also the "-output"
option).
.sp
The possibility that the optimum hypothesis is found increases as this
value is increased, but the processing time also becomes longer.
.sp
The default value depends on the engine setup chosen at compile time:
.br
  10  (standard)
   1  (fast, v2.1)
.Ip "\-output N "
The top N sentence hypotheses will be output at the end of the search.
Use with the "-n" option. (default: 1)
.Ip "\-sb score"
Score envelope width for enveloped scoring.  When the score of each
generated hypothesis is calculated, its trellis expansion
and Viterbi operation are pruned in the middle of the speech if the
score on a frame falls below (current maximum score of the frame
minus this width).  A small value reduces the computation cost of the
second pass, but computation errors may occur.  (default: 80.0)
.Ip "\-s stack_size"
The maximum number of hypotheses that can be stored on the stack
during the search.  A larger value may give more stable results, but
increases the amount of memory required. (default: 500) 
.Ip "\-m overflow_pop_times"
Number of expanded hypotheses at which the search is abandoned.  If
the number of expanded hypotheses exceeds this threshold,
the search is discontinued at that point.  The larger this value is,
the longer the search will continue, but processing time for search
failures will also increase. (default: 2000)
.Ip "\-lookuprange nframe"
When performing word expansion, this option sets the number of frames
before and after the current position in which to look for next word
hypotheses.  This prevents the omission of short words, but with a
large value the number of expanded hypotheses increases and the system
becomes slow. (default: 5)
.Ip "\-looktrellis"
Expand only the trellis words instead of grammar-permitted words.
This option makes second pass decoding faster, but may increase
deletion error of short words. (default: disabled)
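.PP
The envelope pruning controlled by -sb can be pictured with the Python
sketch below.  It follows only the description given for that option:
a hypothesis is cut at the first frame where its score falls more than
the width below the best score seen so far on that frame.  It is an
assumption-laden toy, not Julian's decoder; the function name, the
scan direction and the score values are invented for the example.
.PP
.nf
```python
def score_with_envelope(frame_scores, envelope, width=80.0):
    """Scan per-frame scores; return the frame index where pruning occurs.

    The envelope (best score per frame so far) is updated in place.
    Returns None when the hypothesis is never pruned.
    """
    for t, score in enumerate(frame_scores):
        if score < envelope[t] - width:
            return t
        envelope[t] = max(envelope[t], score)
    return None


# Best scores per frame from hypotheses evaluated earlier (toy values).
envelope = [-100.0, -200.0, -300.0]

pruned_at = score_with_envelope([-120.0, -210.0, -390.0], envelope)
kept = score_with_envelope([-90.0, -250.0, -350.0], envelope)
print(pruned_at, kept, envelope)
```
.fi
.PP
A narrower width cuts hypotheses earlier, which saves computation but
can discard a hypothesis that would have recovered on later frames.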
.SS "Forced Alignment"
.Ip "\-walign"
Do Viterbi alignment per word unit from the recognition result.  The
word boundary frames and the average acoustic scores per frame are
calculated.
.Ip "\-palign"
Do Viterbi alignment per phoneme (model) unit from the recognition
result.  The phoneme boundary frames and the average acoustic scores per
frame are calculated.
.Ip "\-salign"
Do Viterbi alignment per HMM state from the recognition result.  The
state boundary frames and the average acoustic scores per frame are
calculated.
.SS Server Module Mode
.Ip "\-module [port]"
Run Julian in "Server Module Mode".  After startup, Julian waits for
a TCP/IP connection from a client.  Once the connection is established,
Julian starts communicating with the client, processing incoming
commands from the client and sending recognition results, input trigger
information and other system status to the client.  The multi-grammar
mode is supported only in Server Module Mode.  The default port
number is 10500.
.Ip "\-outcode [W][L][P][S][w][l][p][s]"
(Only for Server Module Mode) Select which attributes of the recognized
words are sent to the client.  Specify 'W' for output symbol, 'L' for
grammar entry, 'P' for phoneme sequence, and 'S' for score.  Capital
letters are for the second pass (final result), and small letters are
for results of the first pass.  For example, if you want to send only
the output symbols and phone sequences as a recognition result to a
client, specify "-outcode WP".
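.PP
A client for Server Module Mode only needs an ordinary TCP connection
to the module port.  The Python sketch below connects and returns the
first bytes it receives.  The message format is not described on this
page, so the sketch does not try to interpret the data; the helper
names are invented, and a local stand-in server is used so that the
example runs without Julian.
.PP
.nf
```python
import socket
import threading


def read_from_module(host="localhost", port=10500, max_bytes=4096):
    """Connect to a module-mode server and return the first data received."""
    with socket.create_connection((host, port), timeout=5.0) as conn:
        return conn.recv(max_bytes)


def _fake_server(listener, payload):
    """Stand-in for Julian so the sketch runs without a real server."""
    client, _ = listener.accept()
    with client:
        client.sendall(payload)


# Demonstration against a local stand-in listening on an ephemeral port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
payload = b"placeholder message"
worker = threading.Thread(target=_fake_server, args=(listener, payload))
worker.start()
received = read_from_module("127.0.0.1", listener.getsockname()[1])
worker.join()
listener.close()
print(received)
```
.fi
.PP
Against a real server started with "-module", the call would be
read_from_module() with the default port 10500, or with the port given
on the command line.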
.SS Message Output
.Ip "\-quiet"
Omit phoneme sequence and score, only output the best word sequence
hypothesis.
.Ip "\-progout"
Enable progressive output of the partial results on the first pass at
regular intervals.
.Ip "\-proginterval msec"
Set the output time interval of "-progout" in milliseconds.
.Ip "\-demo"
Equivalent to "-progout -quiet"
.SS Others
.Ip "\-debug"
(For debug) display internal status and debug information.
.Ip "\-C jconffile"
Load the jconf file.  The options written in the file are expanded at
that point.  This option can also be used within another
jconf file.
.Ip "\-check wchmm"
(For debug) turn on interactive check mode of tree lexicon structure
at startup.
.Ip "\-check triphone"
(For debug) turn on interactive check mode of the model mapping between
the acoustic model, HMMList and dictionary at startup.
.Ip "\-version"
Display version information and exit.
.Ip "\-help "
Display a brief description of all options.
.SH "EXAMPLES"
For examples of system usage, refer to the tutorial section in the
Julian documents.
.SH "NOTICE"
Note about path names in jconf files: relative paths in a jconf file
are interpreted as relative to the jconf file itself, not to the
current directory.
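.PP
The rule can be restated as the small Python sketch below, which
resolves a path written in a jconf file against the directory of that
jconf file.  The function name and the example paths are invented for
the illustration.
.PP
.nf
```python
import os


def resolve_jconf_path(jconf_file, path_in_jconf):
    """Resolve a path written in a jconf file as described in the note."""
    if os.path.isabs(path_in_jconf):
        return path_in_jconf
    # Relative paths are taken from the jconf file's own directory,
    # not from the current working directory.
    base = os.path.dirname(os.path.abspath(jconf_file))
    return os.path.normpath(os.path.join(base, path_in_jconf))


resolved = resolve_jconf_path("/home/user/task/sample.jconf", "model/hmmdefs")
print(resolved)
```
.fi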
.SH "SEE ALSO"
julius(1), mkbingram(1), mkss(1), jcontrol(1), adinrec(1), adintool(1), mkdfa(1),
mkgshmm(1), wav2mfcc(1)
.PP
http://julius.sourceforge.jp/  (main)
.br
http://sourceforge.jp/projects/julius/ (development site)
.SH DIAGNOSTICS
Julian normally will return the exit status 0.  If an error occurs,
Julian exits abnormally with exit status 1.  If an input file cannot be
found or cannot be loaded for some reason then Julian will skip
processing for that file.
.SH BUGS
There are some restrictions to the type and size of the models Julian
can use.  For a detailed explanation refer to the Julius documentation.
For bug reports, inquiries and comments please contact
julius@kuis.kyoto-u.ac.jp or julius@is.aist-nara.ac.jp.
.SH AUTHORS
.Ip "Rev.1.0 (1998/07/20)"
Designed by Tatsuya KAWAHARA and Akinobu LEE (Kyoto University)
.Ip "Rev.2.0 (1999/02/20)"
.Ip "Rev.2.1 (1999/04/20)"
.Ip "Rev.2.2 (1999/10/04)"
.Ip "Rev.3.1 (2000/05/11)"
Development of above versions by Akinobu LEE (Kyoto University)
.Ip "Rev.3.2 (2001/08/15)"
.Ip "Rev.3.3 (2002/09/11)"
Development of above versions by Akinobu LEE (Nara Institute of
Science and Technology)
.SH "THANKS TO"
From Rev.3.2 Julian is released in the "Information Processing
Society, Continuous Speech Consortium".
.PP
The Windows Microsoft Speech API compatible version was developed by
Takashi SUMIYOSHI (Kyoto University).
