Usage: arrow [OPTION...] [ARG...]
Arrow -- a document retrieval front-end to libbow

 For building data structures from text files:
  -i, --index                tokenize training documents found under ARG...,
                             build weight vectors, and save them to disk

 For doing document retrieval using the data structures built with -i:
  -c, --compare=FILE         Print the TFIDF cosine similarity metric of the
                             query with this FILE.
  -n, --num-hits-to-show=N   Show the N documents that are most similar to the
                             query text (default N=1)
      --query-forking-server=PORTNUM
                             Run arrow in socket server mode, forking a new
                             process with every connection.  Allows multiple
                             simultaneous connections.
      --query-server=PORTNUM Run arrow in socket server mode.
  -q, --query[=FILE]         tokenize input from stdin [or FILE], then print
                             document most like it

 Diagnostics
      --print-coo            Print word co-occurrence statistics.
      --print-idf            Print, in unsorted order, the IDF of all words
                             in the model's vocabulary

 General options
      --annotations=FILE     The sarray file containing annotations for the
                             files in the index
  -b, --no-backspaces        Don't use backspace when verbosifying progress
                             (good for use in emacs)
  -d, --data-dir=DIR         Set the directory in which to read/write
                             word-vector data (default=~/.<program_name>).
      --random-seed=NUM      The non-negative integer to use for seeding the
                             random number generator
      --score-precision=NUM  The number of decimal digits to print when
                             displaying document scores
  -v, --verbosity=LEVEL      Set amount of info printed while running;
                             (0=silent, 1=quiet, 2=show-progress,...5=max)

 Lexing options
      --append-stoplist-file=FILE
                             Add words in FILE to the stoplist.
      --exclude-filename=FILENAME
                             When scanning directories for text files, skip
                             files with name matching FILENAME.
  -g, --gram-size=N          Create tokens for all 1-grams,... N-grams.
  -h, --skip-header          Avoid lexing news/mail headers by scanning forward
                             until two newlines.
      --istext-avoid-uuencode   Check for uuencoded blocks before saying that
                             the file is text, and say no if there are many
                             lines of the same length.
      --lex-pipe-command=SHELLCMD
                             Pipe files through this shell command before
                             lexing them.
      --max-num-words-per-document=N
                             Only tokenize the first N words in each document.
      --no-stemming          Do not modify lexed words with a stemming
                             function. (usually the default, depending on
                             lexer)
      --replace-stoplist-file=FILE
                             Empty the default stoplist, and add
                             space-delimited words from FILE.
      --shortest-word=LENGTH Toss lexed words that are shorter than LENGTH.
                             Default is usually 2.
  -s, --no-stoplist          Do not toss lexed words that appear in the
                             stoplist.
  -S, --use-stemming         Modify lexed words with the `Porter' stemming
                             function.
      --use-stoplist         Toss lexed words that appear in the stoplist.
                             (usually the default SMART stoplist, depending on
                             lexer)
      --use-unknown-word     When used in conjunction with -O or -D, captures
                             all words with occurrence counts below threshold
                             as the `<unknown>' token
      --xxx-words-only       Only tokenize words with `xxx' in them

 Mutually exclusive choice of lexers
      --flex-mail            Use a mail-specific flex lexer
      --flex-tagged          Use a tagged flex lexer
  -H, --skip-html            Skip HTML tokens when lexing.
      --lex-alphanum         Use a special lexer that includes digits in
                             tokens, delimiting tokens only by non-alphanumeric
                             characters.
      --lex-infix-string=ARG Use only the characters after ARG in each word for
                             stoplisting and stemming.  If a word does not
                             contain ARG, the entire word is used.
      --lex-suffixing        Use a special lexer that adds suffixes depending
                             on Email-style headers.
      --lex-white            Use a special lexer that delimits tokens by
                             whitespace only, and does not change the contents
                             of the token at all---no downcasing, no stemming,
                             no stoplist, nothing.  Ideal for use with an
                             externally-written lexer interfaced to rainbow
                             with --lex-pipe-command.

 Feature-selection options
  -D, --prune-vocab-by-doc-count=N
                             Remove words that occur in N or fewer documents.
  -O, --prune-vocab-by-occur-count=N
                             Remove words that occur fewer than N times.
  -T, --prune-vocab-by-infogain=N
                             Remove all but the top N words by selecting words
                             with highest information gain.

 Weight-vector setting/scoring method options
      --binary-word-counts   Instead of using integer occurrence counts of
                             words to set weights, use binary absence/presence.
      --event-document-then-word-document-length=NUM
                             Set the normalized length of documents when
                             --event-model=document-then-word
      --event-model=EVENTNAME   Set what objects will be considered the
                             `events' of the probabilistic model.  EVENTNAME
                             can be one of: word, document, document-then-word.
                             Default is `word'.
      --infogain-event-model=EVENTNAME
                             Set what objects will be considered the `events'
                             when information gain is calculated.  EVENTNAME
                             can be one of: word, document, document-then-word.
                             Default is `document'.
  -m, --method=METHOD        Set the word weight-setting method; METHOD may be
                             one of: tfidf_words, tfidf_log_words,
                             tfidf_log_occur, tfidf, default=naivebayes.
      --print-word-scores    During scoring, print the contribution of each
                             word to each class.
      --smoothing-goodturing-k=NUM
                             Smooth word probabilities for words that occur NUM
                             or fewer times. The default is 7.
      --smoothing-method=METHOD   Set the method for smoothing word
                             probabilities to avoid zeros; METHOD may be one
                             of: goodturing, laplace, mestimate, wittenbell
      --uniform-class-priors When setting weights, calculating infogain and
                             scoring, use equal prior probabilities on
                             classes.

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Report bugs to <mccallum@cs.cmu.edu>.
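
The TFIDF cosine similarity that -c reports, and the ranking behind -q and -n,
can be illustrated with a minimal Python sketch. It uses a textbook weighting
(raw term count times log(N/df)); the help text does not say which TFIDF
variant arrow applies, so the formula and the function names tfidf_vectors,
cosine, and rank are assumptions made for illustration, not arrow's code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document.

    docs is a list of token lists. Weighting is an assumption:
    raw term count * log(N / document frequency).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, docs, num_hits=1):
    """Return the num_hits best (score, doc_index) pairs, highest score first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms missing from the indexed vocabulary are ignored.
    query = {
        term: tf * idf[term]
        for term, tf in Counter(query_tokens).items()
        if term in idf
    }
    scored = sorted(
        ((cosine(query, vec), i) for i, vec in enumerate(vectors)),
        reverse=True,
    )
    return scored[:num_hits]
```

In this sketch, num_hits plays the role of N in --num-hits-to-show=N, and
comparing the query against a single document with cosine corresponds loosely
to --compare=FILE.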