kylm.model.ngram
Class NgramLM

java.lang.Object
  extended by kylm.model.LanguageModel
      extended by kylm.model.ngram.NgramLM
All Implemented Interfaces:
java.io.Serializable

public class NgramLM
extends LanguageModel
implements java.io.Serializable

A class that implements a standard n-gram language model.

Author:
neubig
See Also:
Serialized Form

Constructor Summary
NgramLM()
           
NgramLM(int n)
          A constructor that creates a model of size n
NgramLM(int n, NgramSmoother smoother)
           
 
Method Summary
 void countNgrams(java.lang.Iterable<java.lang.String[]> sl)
          Count the n-grams in the corpus
 boolean equals(java.lang.Object obj)
           
 void expandUnknowns()
          Expand unknown words in the vocabulary explicitly (useful for WFSTs). TODO: This only works for unigrams. TODO: This assigns a uniform probability and does not take unknown word models into account.
 int getN()
          Get the length of the n-gram context
 int[] getNgramCounts()
          Get the number of n-grams at each level
 java.lang.String getNodeName(NgramNode child)
           
 BranchNode getRoot()
          Get the root node of the n-gram tree
 NgramSmoother getSmoother()
           
 float[] getWordEntropies(int[] iids)
          Get the entropies of every word in a sentence by ID.
 float getWordEntropy(int[] ids, int pos)
          Get the entropy of the word at position pos in the sequence, by ID
 java.lang.String printReport()
           
 void setN(int n)
          Set the length of the n-gram context
 void setNgramCounts(int[] cs)
           
 void setSmoother(NgramSmoother smoother)
           
 void trainModel(java.lang.Iterable<java.lang.String[]> sl)
           
 
Methods inherited from class kylm.model.LanguageModel
findUnknownId, getClassEntropies, getClassMap, getCountTerminals, getDebug, getId, getMaxLength, getName, getRegex, getSentenceClassEntropy, getSentenceEntropy, getSentenceIds, getSentenceSimpleEntropy, getSentenceUnknownEntropy, getSimpleEntropies, getStartSymbol, getSymbol, getTerminalSymbol, getUnknownEntropies, getUnknownModelCount, getUnknownModels, getUnknownSymbol, getVocab, getVocabFrequency, getVocabLimit, getVocabulary, getWordEntropies, importVocabulary, isClosed, isInVocab, isInVocab, setClassMap, setClosed, setCountTerminals, setDebug, setMaxLength, setName, setRegex, setStartSymbol, setSymbol, setTerminalSymbol, setUnknownModels, setUnknownSymbol, setVocab, setVocabFrequency, setVocabLimit, setVocabulary
 
Methods inherited from class java.lang.Object
getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NgramLM

public NgramLM(int n)
A constructor that creates a model of size n

Parameters:
n - the length of the context of the n-gram model

NgramLM

public NgramLM(int n,
               NgramSmoother smoother)

NgramLM

public NgramLM()

Method Detail

getWordEntropies

public float[] getWordEntropies(int[] iids)
Description copied from class: LanguageModel
Get the entropies of every word in a sentence by ID. The version implemented in LanguageModel calls getWordEntropy individually for each value, but might be overridden for higher efficiency.

Overrides:
getWordEntropies in class LanguageModel
Parameters:
iids - The IDs of the words in the sentence. Will always start and end with the sentence terminal symbol.
Returns:
An array of entropies of length iids.length-1. The first symbol (the opening sentence terminal symbol) is not assigned an entropy.
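The default behaviour described above (one getWordEntropy call per word) can be pictured with the following minimal sketch. EntropyModelSketch and WordEntropiesDemo are hypothetical stand-ins, not kylm classes. The sketch assumes one call for each position from 1 to iids.length-1, with position 0 (the opening terminal symbol) skipped, which matches the documented return length.

```java
import java.util.Arrays;

class WordEntropiesDemo {
    public static void main(String[] args) {
        // Toy model: every word costs 1 bit except the terminal symbol (ID 0), which costs 0.5
        EntropyModelSketch m = new EntropyModelSketch() {
            @Override
            float getWordEntropy(int[] ids, int pos) {
                return ids[pos] == 0 ? 0.5f : 1.0f;
            }
        };
        int[] sentence = {0, 7, 3, 0}; // terminal, two words, terminal
        System.out.println(Arrays.toString(m.getWordEntropies(sentence))); // prints [1.0, 1.0, 0.5]
    }
}

// Hypothetical stand-in for LanguageModel; only the per-word entropy hook is modelled
abstract class EntropyModelSketch {
    // Entropy of ids[pos] given the words before it (mirrors getWordEntropy's signature)
    abstract float getWordEntropy(int[] ids, int pos);

    // Fallback: one getWordEntropy call per position after the opening terminal symbol
    float[] getWordEntropies(int[] iids) {
        float[] ents = new float[iids.length - 1];
        for (int pos = 1; pos < iids.length; pos++) {
            ents[pos - 1] = getWordEntropy(iids, pos);
        }
        return ents;
    }
}
```

NgramLM overrides this method; the sketch only illustrates the per-word fallback, not how the override computes its values.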

getWordEntropy

public float getWordEntropy(int[] ids,
                            int pos)
Description copied from class: LanguageModel
Get the entropy of the word at position pos in the sequence, by ID

Specified by:
getWordEntropy in class LanguageModel
Parameters:
ids - The IDs of the words in the sentence. Will always start and end with the sentence terminal symbol.
pos - The position of the word to be judged in ids
Returns:
The entropy of the word at position pos, given the preceding words as context
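As an illustration of what the entropy of the word at position pos can mean for an n-gram model, the sketch below assumes entropy is the negative base-2 logarithm of the conditional probability and uses unsmoothed maximum-likelihood counts. It treats n as the n-gram order (n - 1 words of context), which is an assumption; the documentation here describes n as the length of the context. MleNgramSketch is hypothetical, and NgramLM itself obtains probabilities through its NgramSmoother, which is not modelled here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class WordEntropyDemo {
    public static void main(String[] args) {
        MleNgramSketch lm = new MleNgramSketch(2);
        lm.count(new int[]{0, 1, 2, 0});
        lm.count(new int[]{0, 1, 3, 0});
        // Word 2 follows word 1 in one of word 1's two occurrences, so P = 1/2 (about 1 bit)
        System.out.println(lm.getWordEntropy(new int[]{0, 1, 2, 0}, 2));
    }
}

// Hypothetical unsmoothed n-gram model over word IDs
class MleNgramSketch {
    private final int n; // treated here as the n-gram order: n - 1 words of context
    private final Map<String, Integer> ngramCounts = new HashMap<>();
    private final Map<String, Integer> contextCounts = new HashMap<>();

    MleNgramSketch(int n) {
        this.n = n;
    }

    // Map key for the ID slice ids[from, to)
    private static String key(int[] ids, int from, int to) {
        return Arrays.toString(Arrays.copyOfRange(ids, from, to));
    }

    // Record the n-gram ending at each position after the opening terminal, plus its context
    void count(int[] ids) {
        for (int pos = 1; pos < ids.length; pos++) {
            int start = Math.max(0, pos - (n - 1));
            ngramCounts.merge(key(ids, start, pos + 1), 1, Integer::sum);
            contextCounts.merge(key(ids, start, pos), 1, Integer::sum);
        }
    }

    // -log2 P(ids[pos] | up to n - 1 preceding IDs), maximum-likelihood estimate
    float getWordEntropy(int[] ids, int pos) {
        int start = Math.max(0, pos - (n - 1));
        int joint = ngramCounts.getOrDefault(key(ids, start, pos + 1), 0);
        int context = contextCounts.getOrDefault(key(ids, start, pos), 0);
        if (joint == 0) {
            return Float.POSITIVE_INFINITY; // unseen event; a smoother would assign it some mass
        }
        return (float) (-Math.log((double) joint / context) / Math.log(2));
    }
}
```

Without smoothing, any unseen n-gram gets infinite entropy; avoiding that is the role of a smoother such as the one passed to NgramLM(int n, NgramSmoother smoother).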

trainModel

public void trainModel(java.lang.Iterable<java.lang.String[]> sl)
                throws java.io.IOException
Specified by:
trainModel in class LanguageModel
Throws:
java.io.IOException

countNgrams

public void countNgrams(java.lang.Iterable<java.lang.String[]> sl)
                 throws java.io.IOException
Count the n-grams in the corpus

Parameters:
sl - An iterable over the sentences in the corpus, each given as an array of words
Throws:
java.io.IOException
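A minimal sketch of counting n-grams from an Iterable<String[]> corpus, assuming each sentence is padded with a terminal symbol at both ends and that every order from 1 to n is counted. The class NgramCounterSketch and the symbol "<s>" are hypothetical; NgramLM appears to keep its counts in the n-gram tree returned by getRoot() rather than in flat maps like these.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountNgramsDemo {
    public static void main(String[] args) {
        List<String[]> corpus = Arrays.asList(
                new String[]{"the", "cat", "sat"},
                new String[]{"the", "dog", "sat"});
        NgramCounterSketch counter = new NgramCounterSketch(2);
        counter.countNgrams(corpus);
        System.out.println(counter.count("the", "cat")); // prints 1
    }
}

// Hypothetical flat n-gram counter; not kylm's storage format
class NgramCounterSketch {
    static final String TERMINAL = "<s>"; // hypothetical sentence boundary symbol
    private final int n;
    // counts.get(k) maps each (k + 1)-gram, joined with spaces, to its frequency
    private final List<Map<String, Integer>> counts = new ArrayList<>();

    NgramCounterSketch(int n) {
        this.n = n;
        for (int k = 0; k < n; k++) {
            counts.add(new HashMap<>());
        }
    }

    void countNgrams(Iterable<String[]> sl) {
        for (String[] sentence : sl) {
            // Pad so the sentence starts and ends with the terminal symbol
            String[] words = new String[sentence.length + 2];
            words[0] = TERMINAL;
            System.arraycopy(sentence, 0, words, 1, sentence.length);
            words[words.length - 1] = TERMINAL;
            // Count every n-gram of order 1..n ending at each position after the opening terminal
            for (int end = 1; end < words.length; end++) {
                for (int order = 1; order <= n && end - order + 1 >= 0; order++) {
                    String key = String.join(" ", Arrays.copyOfRange(words, end - order + 1, end + 1));
                    counts.get(order - 1).merge(key, 1, Integer::sum);
                }
            }
        }
    }

    // Frequency of one n-gram, or 0 if it was never seen or its order is out of range
    int count(String... ngram) {
        if (ngram.length < 1 || ngram.length > n) {
            return 0;
        }
        return counts.get(ngram.length - 1).getOrDefault(String.join(" ", ngram), 0);
    }
}
```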

getRoot

public BranchNode getRoot()
Get the root node of the n-gram tree

Returns:
The root node of the n-gram tree

getN

public int getN()
Get the length of the n-gram context

Returns:
The length

setN

public void setN(int n)
Set the length of the n-gram context

Parameters:
n - The length

expandUnknowns

public void expandUnknowns()
Expand unknown words in the vocabulary explicitly (useful for WFSTs). TODO: This only works for unigrams. TODO: This assigns a uniform probability and does not take unknown word models into account.

equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class LanguageModel

getNgramCounts

public int[] getNgramCounts()
Get the number of n-grams at each level

Returns:
An array containing the number of n-grams at each level
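The n-gram tree and its per-level counts can be sketched as a trie of word IDs under a root node that stands for the empty context. This assumes getNgramCounts() reports the number of distinct n-grams observed at each order; NgramTreeSketch and its Node class are hypothetical and do not reproduce kylm's BranchNode or NgramNode classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class NgramTreeDemo {
    public static void main(String[] args) {
        NgramTreeSketch tree = new NgramTreeSketch(2);
        tree.add(new int[]{0, 1, 2, 0});
        tree.add(new int[]{0, 1, 3, 0});
        System.out.println(Arrays.toString(tree.getNgramCounts())); // prints [4, 5]
    }
}

// Hypothetical trie of n-grams over word IDs
class NgramTreeSketch {
    // One node per word sequence; children are keyed by the next word ID
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
        int count; // times this exact sequence was counted as an n-gram
    }

    private final int n;
    private final Node root = new Node(); // empty context, analogous in spirit to getRoot()

    NgramTreeSketch(int n) {
        this.n = n;
    }

    // Insert every n-gram of order 1..n ending at positions 1..ids.length-1
    void add(int[] ids) {
        for (int end = 1; end < ids.length; end++) {
            for (int order = 1; order <= n && end - order + 1 >= 0; order++) {
                Node node = root;
                for (int i = end - order + 1; i <= end; i++) {
                    node = node.children.computeIfAbsent(ids[i], k -> new Node());
                }
                node.count++;
            }
        }
    }

    // Distinct observed n-grams per level (index 0 = unigrams); path-only nodes are skipped
    int[] getNgramCounts() {
        int[] levels = new int[n];
        countLevels(root, 0, levels);
        return levels;
    }

    private static void countLevels(Node node, int depth, int[] levels) {
        for (Node child : node.children.values()) {
            if (child.count > 0) {
                levels[depth]++;
            }
            countLevels(child, depth + 1, levels);
        }
    }
}
```

Nodes created only as intermediate path steps (count 0) are not reported, so each level counts n-grams that were actually observed.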

getSmoother

public NgramSmoother getSmoother()

setSmoother

public void setSmoother(NgramSmoother smoother)

setNgramCounts

public void setNgramCounts(int[] cs)

getNodeName

public java.lang.String getNodeName(NgramNode child)

printReport

public java.lang.String printReport()
Specified by:
printReport in class LanguageModel