Parsing NLP with external list

Question

I have an input text that must be run through a grammar, and the output must be every entry the grammar finds in the text. The problem is that my non-terminals (DET, N, V) have to be expanded from external list files, and I cannot find a way to do that.

Pseudo-code example:

Open a text file

Apply the grammar (just one example):

grammar ("""
S -> NP VP
NP -> DET N
VP -> V N
DET -> lista_det.txt
N -> lista_n.txt
V -> lista.txt """)

Print the parts of the text that obey the grammar

Example:

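import nltk

with open("corpus_risque.txt", "r", encoding="utf-8") as f:
    texte = f.read()

# The DET, N and V rules should take their words from these files,
# but NLTK does not accept file names inside a grammar string
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DET N
VP -> V N
DET -> lista_det.txt
N -> lista_n.txt
V -> lista.txt""")

parser = nltk.ChartParser(grammar)
# parse() expects a list of tokens and returns an iterator of trees
for tree in parser.parse(texte.split()):
    print(tree)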
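For the last step the question describes (printing every stretch of the text that the grammar accepts), one possible approach, not part of the original post, is to split the text into tokens and try the parser on every contiguous run of tokens. This is a sketch assuming NLTK 3 and plain whitespace tokenization; `find_matches` is a hypothetical helper and the small inline grammar is sample data standing in for the one built from the list files.

```python
import nltk

def find_matches(parser, tokens):
    """Return (start, end, tree) for every token span the grammar parses as S."""
    matches = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            span = tokens[start:end]
            try:
                trees = list(parser.parse(span))
            except ValueError:
                # A word in the span is not covered by the grammar; it stays
                # in every longer span too, so stop extending from this start
                break
            for tree in trees:
                matches.append((start, end, tree))
    return matches

# Hypothetical stand-in for the grammar built from the list files
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DET N
VP -> V N
DET -> 'el' | 'la'
N -> 'gato' | 'leche'
V -> 'bebe'
""")
parser = nltk.ChartParser(grammar)

texte = "hoy el gato bebe leche"
tokens = texte.split()
matches = find_matches(parser, tokens)
for start, end, tree in matches:
    print(" ".join(tokens[start:end]))
    print(tree)
```

Trying every span is quadratic in the number of tokens, so on a large corpus it is probably better to run it per sentence rather than over the whole file at once.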

Normally, grammars are written like this, with the words already listed inside the grammar:

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> VBZ NP PP
PP -> IN NP
NP -> NNP | DT JJ NN NN | NN
NNP -> 'Python'
VBZ -> 'is'
DT -> 'a'
JJ -> 'good'
NN -> 'programming' | 'language' | 'research'
IN -> 'for'
""")
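For comparison, this is roughly how a grammar written inline like the one above is loaded and used in NLTK 3, where `nltk.parse_cfg` was replaced by `nltk.CFG.fromstring`. Note that the terminals are quoted and that the parser works on a list of tokens; the sample sentence is an assumption chosen to fit this grammar.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> VBZ NP PP
PP -> IN NP
NP -> NNP | DT JJ NN NN | NN
NNP -> 'Python'
VBZ -> 'is'
DT -> 'a'
JJ -> 'good'
NN -> 'programming' | 'language' | 'research'
IN -> 'for'
""")

parser = nltk.ChartParser(grammar)
# The parser expects tokens, not a raw string
tokens = "Python is a good programming language for research".split()
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```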

Is it possible to do what I want?

python python-3.x parsear

asked by pitanga 31.08.2017 at 15:22

1 answer

score 1 · Accepted Answer

If I understand the problem correctly, you can do the following. Suppose you have a file lista_det.txt like this:

el
la
un | una

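We can read it line by line and append one rule per line to the grammar string, for example:

gramatica = """
S -> NP VP
NP -> DET N
VP -> V N
"""

with open("lista_det.txt", "r", encoding="utf-8") as f:
    for line in f:
        # "un | una" becomes ["un", "una"]; blank lines are skipped
        words = [w.strip() for w in line.split("|") if w.strip()]
        if words:
            # Terminals must be quoted, otherwise NLTK reads them as non-terminals
            gramatica += "DET -> " + " | ".join("'{0}'".format(w) for w in words) + "\n"

print(gramatica)

Basically we go through every line of lista_det.txt and append it to the variable gramatica as a DET rule. Each word is quoted so that NLTK treats it as a terminal: an unquoted el would be read as a non-terminal with no productions, and the parser would never match it. The result looks something like this:

S -> NP VP
NP -> DET N
VP -> V N
DET -> 'el'
DET -> 'la'
DET -> 'un' | 'una'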
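Another option, not from the original answer, is to skip string building and create the productions directly with NLTK's `Production` and `Nonterminal` classes, which avoids quoting problems altogether. This is a sketch: `lexical_productions` is a hypothetical helper, and the lists of lines are sample data standing in for the contents of lista_det.txt, lista_n.txt and lista.txt.

```python
from nltk import CFG, ChartParser
from nltk.grammar import Nonterminal, Production

def lexical_productions(lhs, lines):
    """Build one Production per word (e.g. DET -> 'el') from lines like 'un | una'."""
    symbol = Nonterminal(lhs)
    productions = []
    for line in lines:
        for word in line.split("|"):
            word = word.strip()
            if word:
                # A plain str on the right-hand side is a terminal
                productions.append(Production(symbol, [word]))
    return productions

S, NP, VP, DET, N, V = (Nonterminal(s) for s in ["S", "NP", "VP", "DET", "N", "V"])
rules = [
    Production(S, [NP, VP]),
    Production(NP, [DET, N]),
    Production(VP, [V, N]),
]
# Sample lines, as they would come from iterating over an open list file
det_rules = lexical_productions("DET", ["el\n", "la\n", "un | una\n"])
rules += det_rules
rules += lexical_productions("N", ["gato\n", "leche\n"])
rules += lexical_productions("V", ["bebe\n"])

grammar = CFG(S, rules)
parser = ChartParser(grammar)
trees = list(parser.parse("el gato bebe leche".split()))
for tree in trees:
    print(tree)
```

With real files, the sample lists would be replaced by the open file objects, e.g. `with open("lista_det.txt", encoding="utf-8") as f: rules += lexical_productions("DET", f)`.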

We repeat the procedure for the noun and verb lists (N and V) and then load the grammar directly from the filled-in gramatica string:

grammar = nltk.CFG.fromstring(gramatica)
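The "repeat the procedure" step can be wrapped in a small function that takes a mapping from non-terminal to list file, so DET, N and V are handled in one pass. This is a sketch: `build_grammar` is a hypothetical name, the files are assumed to be UTF-8 with one entry (or several separated by |) per line, and the demo writes throwaway sample files so it runs on its own.

```python
import os
import tempfile

def build_grammar(base, lexicon_files):
    """Append one quoted lexical rule per line of each list file to the base grammar."""
    rules = [base.rstrip("\n")]
    for lhs, path in lexicon_files.items():
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                words = [w.strip() for w in line.split("|") if w.strip()]
                if words:
                    rules.append(lhs + " -> " + " | ".join("'{0}'".format(w) for w in words))
    return "\n".join(rules) + "\n"

base = """S -> NP VP
NP -> DET N
VP -> V N"""

# Sample list files written to a temporary folder for the demo
folder = tempfile.mkdtemp()
samples = {
    "lista_det.txt": "el\nla\nun | una\n",
    "lista_n.txt": "gato\nleche\n",
    "lista.txt": "bebe\n",
}
for name, content in samples.items():
    with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
        f.write(content)

gramatica = build_grammar(base, {
    "DET": os.path.join(folder, "lista_det.txt"),
    "N": os.path.join(folder, "lista_n.txt"),
    "V": os.path.join(folder, "lista.txt"),
})
print(gramatica)
```

The returned string can then be passed to nltk.CFG.fromstring in the same way as gramatica above.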