Parsing text with an NLTK grammar that uses external word lists

I have an input text that should be run through a grammar, and the output should be every entry that the grammar finds in the text. The problem is that the words for my non-terminals are stored in external list files, and I cannot find a way to use them in the grammar.

Pseudo-code example:

  • Open a text
  • Apply the grammar (just one example):

    grammar ("""    
    S -> NP VP    
    NP -> DET N    
    VP -> V N    
    DET -> **lista_det.txt**    
    N -> **lista_n.txt**
    V -> **lista.txt** """)
    
  • Print the parts of the text that match the grammar

  • Example:

    with open ("corpus_risque.txt", "r") as f:
        texte = f.read()
    
        grammar = nltk.parse_cfg("""
        S-> NP VP
        NP -> DET N
        VP -> V N 
        DET -> lista_det.txt
        N -> lista_n.txt
        V -> lista.txt""")
    
        parser = nltk.ChartParser(grammar)
        parsed = parser.parse(texte)
        print(texte)
    

    Normally, grammars are written like this, with the word lists already inline:

    grammar = nltk.parse_cfg("""
    
    S -> NP VP
    VP -> VBZ NP PP
    PP -> IN NP
    NP -> NNP | DT JJ NN NN | NN
    NNP -> 'Python'
    VBZ -> 'is'
    DT -> 'a'
    JJ -> 'good'
    NN -> 'programming' | 'language' | 'research'
    IN -> 'for'
    """)
    

    Would it be possible to do what I want?

        
    asked by pitanga 31.08.2017 at 17:22

    1 answer

    If I understood the problem correctly, you can do the following. Suppose you have a lista_det.txt file that looks like this:

    el
    la
    un | una
    

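    We can read it line by line and append the rules to a variable. Each word has to be wrapped in quotes, because NLTK treats an unquoted symbol on the right-hand side of a rule as a non-terminal:

    gramatica = """
    S -> NP VP
    NP -> DET N
    VP -> V N
    """
    
    with open("lista_det.txt", "r") as f:
        for line in f:
            # A line may hold several alternatives separated by "|"
            words = [w.strip() for w in line.split("|") if w.strip()]
            if words:
                # Quote every word so NLTK reads it as a terminal
                gramatica += "DET -> " + " | ".join("'{0}'".format(w) for w in words) + "\n"
    
    print(gramatica)
    

    Basically, we go through every line of lista_det.txt and append it to the variable gramatica, prefixed with DET -> and with each word quoted. The result looks something like this:

    S -> NP VP
    NP -> DET N
    VP -> V N
    DET -> 'el'
    DET -> 'la'
    DET -> 'un' | 'una'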
    

    We repeat the procedure for the lists of nouns and verbs, then load the grammar directly from the gramatica variable we have filled. Note that nltk.parse_cfg was removed in NLTK 3; the current equivalent is nltk.CFG.fromstring:

    grammar = nltk.CFG.fromstring(gramatica)
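
Repeating the step for each list can be wrapped in a small helper. The following is only a sketch: the function names rules_from_file and build_grammar are made up for illustration, and it assumes each file holds one entry per line, optionally with several alternatives separated by "|", as in the lista_det.txt example. It also assumes the words contain no single quotes, since each word is wrapped in single quotes to mark it as a terminal.

```python
def rules_from_file(symbol, path):
    """Return one production per non-empty line, e.g. DET -> 'un' | 'una'."""
    rules = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split alternatives on "|" and drop surrounding whitespace/newlines
            words = [w.strip() for w in line.split("|") if w.strip()]
            if words:
                # Quote each word so it is read as a terminal, not a non-terminal
                rhs = " | ".join("'{0}'".format(w) for w in words)
                rules.append("{0} -> {1}".format(symbol, rhs))
    return rules


def build_grammar(base_rules, word_lists):
    """Append the rules for every (symbol, file) pair to the base rules."""
    lines = [base_rules.strip()]
    for symbol, path in word_lists.items():
        lines.extend(rules_from_file(symbol, path))
    return "\n".join(lines) + "\n"
```

With the file names from the question, the call would be build_grammar("S -> NP VP\nNP -> DET N\nVP -> V N", {"DET": "lista_det.txt", "N": "lista_n.txt", "V": "lista.txt"}), and the returned string can be passed to the grammar loader in place of gramatica.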
    
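
To get back the parts of the text that obey the grammar, the parser needs a list of tokens rather than the raw string. The sketch below makes several simplifying assumptions that are not in the question: the text has one sentence per line, tokens are separated by whitespace, and the function name find_matches is hypothetical. ChartParser raises ValueError when a token is not covered by the grammar, so such lines are skipped.

```python
import nltk


def find_matches(text, grammar_text):
    """Return the lines of `text` that have at least one parse under the grammar."""
    grammar = nltk.CFG.fromstring(grammar_text)
    parser = nltk.ChartParser(grammar)
    matches = []
    for line in text.splitlines():
        tokens = line.split()  # naive whitespace tokenization (assumption)
        if not tokens:
            continue
        try:
            trees = list(parser.parse(tokens))
        except ValueError:
            # At least one word is missing from the grammar's word lists
            continue
        if trees:
            matches.append(line)
    return matches
```

A real corpus would probably need a proper sentence splitter and tokenizer, plus consistent casing between the text and the word lists.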
        
    answered by 31.08.2017 / 19:01