NgramLM

kylm.model.ngram
Class NgramLM

java.lang.Object
  kylm.model.LanguageModel
      kylm.model.ngram.NgramLM

All Implemented Interfaces:: java.io.Serializable

public class NgramLM
extends LanguageModel
implements java.io.Serializable

A class that implements a standard n-gram language model.

Author:: neubig
See Also:: Serialized Form

Constructor Summary
`NgramLM()`
`NgramLM(int n)` A constructor that creates a model of size n
`NgramLM(int n, NgramSmoother smoother)`

Method Summary
`void`	`countNgrams(java.lang.Iterable<java.lang.String[]> sl)` Count the ngrams in the corpus
`boolean`	`equals(java.lang.Object obj)`
`void`	`expandUnknowns()` Expand unknown words in the vocabulary explicitly (useful for WFSTs). TODO: This only works for unigrams. TODO: This assigns a uniform probability and doesn't take unknown word models into account.
`int`	`getN()` Get the length of the n-gram context
`int[]`	`getNgramCounts()` Get the number of n-grams at each level
`java.lang.String`	`getNodeName(NgramNode child)`
`BranchNode`	`getRoot()` Get the root node of the n-gram Tree
`NgramSmoother`	`getSmoother()`
`float[]`	`getWordEntropies(int[] iids)` Get the entropies of every word in a sentence by ID.
`float`	`getWordEntropy(int[] ids, int pos)` Get the entropy of the word at position pos in the sequence, by ID
`java.lang.String`	`printReport()`
`void`	`setN(int n)` Set the length of the n-gram context
`void`	`setNgramCounts(int[] cs)`
`void`	`setSmoother(NgramSmoother smoother)`
`void`	`trainModel(java.lang.Iterable<java.lang.String[]> sl)`

Methods inherited from class kylm.model.LanguageModel
findUnknownId, getClassEntropies, getClassMap, getCountTerminals, getDebug, getId, getMaxLength, getName, getRegex, getSentenceClassEntropy, getSentenceEntropy, getSentenceIds, getSentenceSimpleEntropy, getSentenceUnknownEntropy, getSimpleEntropies, getStartSymbol, getSymbol, getTerminalSymbol, getUnknownEntropies, getUnknownModelCount, getUnknownModels, getUnknownSymbol, getVocab, getVocabFrequency, getVocabLimit, getVocabulary, getWordEntropies, importVocabulary, isClosed, isInVocab, isInVocab, setClassMap, setClosed, setCountTerminals, setDebug, setMaxLength, setName, setRegex, setStartSymbol, setSymbol, setTerminalSymbol, setUnknownModels, setUnknownSymbol, setVocab, setVocabFrequency, setVocabLimit, setVocabulary

Methods inherited from class java.lang.Object
`getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

NgramLM

public NgramLM(int n)

A constructor that creates a model of size n

Parameters:: n - the length of the context of the n-gram model

NgramLM

public NgramLM(int n,
               NgramSmoother smoother)

NgramLM

public NgramLM()

Method Detail

getWordEntropies

public float[] getWordEntropies(int[] iids)

Description copied from class: LanguageModel

Get the entropies of every word in a sentence by ID. The version implemented in LanguageModel calls getWordEntropy individually for each value, but might be overridden for higher efficiency.

Overrides:: getWordEntropies in class LanguageModel

Parameters:: iids - The IDs of the words in the sentence. Will always start and end with the sentence terminal symbol.
Returns:: An array of entropies of length iids.length-1. The first non-terminal symbol need not be assigned an entropy.
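
The default strategy described above (one `getWordEntropy` call per position) can be sketched as follows. This is a minimal illustration, not the kylm source: `EntropyModelSketch` is a hypothetical stand-in class, and mapping position `pos` to output index `pos - 1` (skipping the leading terminal symbol) is an assumption made to match the documented array length.

```java
/** Hypothetical stand-in for a language model base class; not kylm's LanguageModel. */
abstract class EntropyModelSketch {
    /** Entropy of the word at position pos, given the preceding words as context. */
    abstract float getWordEntropy(int[] ids, int pos);

    /**
     * Default strategy: score each position after the leading terminal symbol
     * one at a time. Assumed index mapping: result[pos - 1] holds position pos.
     */
    float[] getWordEntropies(int[] ids) {
        if (ids.length < 2) {
            return new float[0]; // nothing to score
        }
        float[] entropies = new float[ids.length - 1];
        for (int pos = 1; pos < ids.length; pos++) {
            entropies[pos - 1] = getWordEntropy(ids, pos);
        }
        return entropies;
    }
}
```

A subclass that can score a whole sentence in one pass (for example by walking an n-gram tree once) would override `getWordEntropies` and avoid the repeated context lookups.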

getWordEntropy

public float getWordEntropy(int[] ids,
                            int pos)

Description copied from class: LanguageModel

Get the entropy of the word at position pos in the sequence, by ID

Specified by:: getWordEntropy in class LanguageModel

Parameters:: ids - The IDs of the words in the sentence. Will always start and end with the sentence terminal symbol.; pos - The position of the word to be judged in ids
Returns:: The entropy of the word at position pos given the rest as context
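
As a rough illustration of a per-word entropy, the sketch below treats it as the negative log probability of a word given its predecessor in a toy bigram table. Several things here are assumptions rather than kylm behavior: the class `BigramEntropySketch`, the use of base-2 logarithms, the bigram-only context, and returning infinity for unseen pairs (a real model would apply its smoother instead).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy bigram table; hypothetical, and much simpler than a smoothed n-gram model. */
final class BigramEntropySketch {
    private final Map<Long, Double> prob = new HashMap<>();

    /** Packs two word IDs into a single map key. */
    private static long key(int prev, int cur) {
        return ((long) prev << 32) | (cur & 0xffffffffL);
    }

    /** Records P(cur | prev). */
    void setProb(int prev, int cur, double p) {
        prob.put(key(prev, cur), p);
    }

    /** Returns -log2 P(ids[pos] | ids[pos - 1]); the log base is an assumption. */
    float getWordEntropy(int[] ids, int pos) {
        if (pos < 1 || pos >= ids.length) {
            throw new IllegalArgumentException("pos must be in [1, ids.length)");
        }
        Double p = prob.get(key(ids[pos - 1], ids[pos]));
        if (p == null) {
            return Float.POSITIVE_INFINITY; // unseen pair; no smoothing in this sketch
        }
        return (float) (-Math.log(p) / Math.log(2.0));
    }
}
```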

trainModel

public void trainModel(java.lang.Iterable<java.lang.String[]> sl)
                throws java.io.IOException

Specified by:: trainModel in class LanguageModel

Throws:: java.io.IOException

countNgrams

public void countNgrams(java.lang.Iterable<java.lang.String[]> sl)
                 throws java.io.IOException

Count the ngrams in the corpus

Parameters:: sl - An iterable over the sentences in the corpus
Throws:: java.io.IOException
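
To make the counting step concrete, here is a minimal, self-contained sketch of n-gram counting over an `Iterable<String[]>`. It does not reproduce kylm's implementation: the class `NgramCountSketch`, the `"<s>"` terminal string, and the flat string-keyed map are all assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NgramCountSketch {
    /** Hypothetical terminal symbol placed at both ends of every sentence. */
    static final String TERMINAL = "<s>";

    /** Counts every n-gram of order 1..n, keyed by its space-joined words. */
    static Map<String, Integer> countNgrams(Iterable<String[]> sentences, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] sentence : sentences) {
            List<String> padded = new ArrayList<>();
            padded.add(TERMINAL);
            padded.addAll(Arrays.asList(sentence));
            padded.add(TERMINAL);
            // For each end position, count the n-grams of every order ending there.
            for (int end = 1; end <= padded.size(); end++) {
                for (int len = 1; len <= n && len <= end; len++) {
                    String ngram = String.join(" ", padded.subList(end - len, end));
                    counts.merge(ngram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Taking an `Iterable` rather than a `List` lets the caller stream sentences from disk, which is presumably why the real method declares `IOException`.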

getRoot

public BranchNode getRoot()

Get the root node of the n-gram Tree

Returns:: The root node of the n-gram tree
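
For readers unfamiliar with n-gram trees, the following sketch shows the general idea: each node corresponds to a word ID, and the path from the root to a node spells out an n-gram. The `Node` type and its fields are hypothetical; kylm's `BranchNode` and `NgramNode` may be organized quite differently.

```java
import java.util.HashMap;
import java.util.Map;

final class NgramTrieSketch {
    /** Hypothetical node type: a count plus children keyed by word ID. */
    static final class Node {
        int count;
        final Map<Integer, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    Node getRoot() {
        return root;
    }

    /** Records one observation of the n-gram by bumping the count at its final node. */
    void add(int[] ngram) {
        Node node = root;
        for (int id : ngram) {
            node = node.children.computeIfAbsent(id, k -> new Node());
        }
        node.count++;
    }

    /** Returns how often the n-gram was added, or 0 if its path does not exist. */
    int count(int[] ngram) {
        Node node = root;
        for (int id : ngram) {
            node = node.children.get(id);
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }
}
```

Because n-grams that share a prefix share a path, the tree stores each context once however many continuations it has.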

getN

public int getN()

Get the length of the n-gram context

Returns:: The length

setN

public void setN(int n)

Set the length of the n-gram context

Parameters:: n - The length

expandUnknowns

public void expandUnknowns()

Expand unknown words in the vocabulary explicitly (useful for WFSTs). TODO: This only works for unigrams. TODO: This assigns a uniform probability and doesn't take unknown word models into account.
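
One plausible reading of "uniform probability" for the unigram case is sketched below: the probability mass held by the unknown-word symbol is split evenly among the words being added explicitly. This is an interpretation, not the kylm code; the class `ExpandUnknownsSketch`, the `"<unk>"` symbol name, and the map-based unigram table are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExpandUnknownsSketch {
    /** Hypothetical name for the unknown-word symbol. */
    static final String UNK = "<unk>";

    /**
     * Returns a copy of the unigram table in which the unknown-word mass
     * is divided uniformly among newWords. The input map is not modified.
     */
    static Map<String, Double> expand(Map<String, Double> unigrams, List<String> newWords) {
        Map<String, Double> expanded = new LinkedHashMap<>(unigrams);
        Double unkMass = expanded.get(UNK);
        if (unkMass == null || newWords.isEmpty()) {
            return expanded; // nothing to redistribute
        }
        expanded.remove(UNK);
        double share = unkMass / newWords.size();
        for (String word : newWords) {
            expanded.merge(word, share, Double::sum);
        }
        return expanded;
    }
}
```

The total probability mass is unchanged by the expansion, so a table that summed to one beforehand still does afterwards.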

equals

public boolean equals(java.lang.Object obj)

Overrides:: equals in class LanguageModel

getNgramCounts

public int[] getNgramCounts()

Get the number of n-grams at each level

Returns:: An array containing the number of n-grams at each level

getSmoother

public NgramSmoother getSmoother()

setSmoother

public void setSmoother(NgramSmoother smoother)

setNgramCounts

public void setNgramCounts(int[] cs)

getNodeName

public java.lang.String getNodeName(NgramNode child)

printReport

public java.lang.String printReport()

Specified by:: printReport in class LanguageModel
