Python 2.7 Alkane Nomenclature


I need to write a program for the following problem in Python 2.7:

  • Input: a text file (.txt) containing the structural formula of an alkane.
  • Output: for each branch, the number of the carbon in the longest chain to which it is attached, followed by the branch itself, separated by "-". Example: 1-CH3, 2-CH2-CH3.

                   CH3       CH3            # Example contents of the
                     |         |            # text file
        CH3-CH2-CH2-CH-CH-CH2-CH-CH2-CH3    # Name of the molecule:
                        |                   # 5-butyl-3,6-dimethylnonane
                       CH2-CH2-CH2-CH3
    

What I'm looking for is something simpler than the full name. I need the program to tell me the number of the carbon in the longest chain to which each branch is attached. For example, the name of the molecule contains 5-butyl; I want a similar result, which for my program would be 5-CH2-CH2-CH2-CH3. I have tried, but I haven't gotten close. Below is what I have; it isn't much and may not be very useful, but I don't know how to approach the problem.

archivo = open("rama.txt","r") # rama.txt has the same content as the example above
lineas = archivo.readlines()   # Store the file's lines in a list
cad = lineas[len(lineas)/2]    # Store the longest chain (the middle line) in cad
cad2 = cad                     # Auxiliary variable so the original is not modified
cad2 = cad2.split("-")         # Split the longest chain into a list on the "-" character
i = 1
while i < len(cad2):           # Iterate over the long chain, replacing each CH with its number (as a string)
    if cad2[i] == "CH":
        cad2[i] = str(i + 1)
    i += 1
print cad2

Output:

['CH3', 'CH2', 'CH2', '4', '5', 'CH2', '7', 'CH2', 'CH3\n']

This is not the numbering I need, since 7 should be 3 and 4 should be 6, but I can't think of how to fix it.

PS: I'm not familiar with graphs.

    
asked by Tamos 26.11.2016 at 16:48

1 answer


The problem is that, according to the nomenclature rules, the chain is numbered starting from the end that is closer to a substituent (radical).

As your code stands, it always numbers from left to right, whereas in this example the numbering should go from right to left.

To solve it, check which end of the list is closer to a CH. If it is the right end, reverse the list, number it as you already do, and then reverse it back.

The code could look like this:

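def identificar_radicales(cadena):
    # Replace every "CH" (a branch point) with its 1-based position as a string
    for i, atomo in enumerate(cadena):
        if atomo == "CH":
            cadena[i] = str(i + 1)
    return cadena


with open("molecula.txt", "r") as archivo:
    lineas = archivo.readlines()

# The longest chain is the middle line; drop surrounding spaces and the newline
cad = lineas[len(lineas) // 2].strip()
cad2 = cad.split("-")
cad2_invertida = cad2[::-1]

# Number from the end closer to a branch; an unbranched chain has no "CH"
if 'CH' in cad2 and cad2.index('CH') > cad2_invertida.index('CH'):
    cad2 = identificar_radicales(cad2_invertida)[::-1]
else:
    cad2 = identificar_radicales(cad2)

print(cad2)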

I replaced your while loop with a for loop over enumerate(), which makes the manual counter i unnecessary because it yields each element together with its index as a tuple.

Saving your example to a txt file, we get the following output:

['CH3', 'CH2', 'CH2', '6', '5', 'CH2', '3', 'CH2', 'CH3']
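The code above only numbers the main chain. Going one step further toward the output the question asks for (e.g. 5-CH2-CH2-CH2-CH3), here is a minimal sketch that also reads the branches from the drawing, without using graphs. It relies on strong assumptions about the file layout: the main chain is the middle line, each branch is a single unbranched line written exactly one row above or below the chain, and each "|" lies in a column covered both by the chain group it is bonded to and by the branch text. The names chain_spans, run_at and branches are hypothetical. Instead of looking only at the nearest CH, it compares the sorted lists of locants obtained from each end and keeps the smaller one; that makes the same choice as the answer's code whenever the nearest branches sit at different distances from the two ends, and a complete tie keeps left-to-right numbering.

```python
# Hypothetical sketch: assumes the layout of the example file, i.e. the main
# chain is the middle line, every branch is one unbranched line exactly one
# row above or below it, and each "|" sits in a column of both the chain
# group and the branch text it connects.


def chain_spans(chain_line):
    """(first_col, last_col) of each '-'-separated group on the chain line."""
    spans = []
    col = len(chain_line) - len(chain_line.lstrip())  # skip leading spaces
    for group in chain_line.strip().split("-"):
        spans.append((col, col + len(group) - 1))
        col += len(group) + 1  # +1 for the "-" separator
    return spans


def run_at(line, col):
    """The run of non-space characters in `line` that covers column `col`."""
    if col >= len(line) or line[col] == " ":
        return None
    left, right = col, col
    while left > 0 and line[left - 1] != " ":
        left -= 1
    while right + 1 < len(line) and line[right + 1] != " ":
        right += 1
    return line[left:right + 1]


def branches(lines):
    """Return strings like '5-CH2-CH2-CH2-CH3', sorted by locant."""
    mid = len(lines) // 2
    spans = chain_spans(lines[mid])
    found = []  # (0-based position on the chain from the left, branch text)
    for bar_row, text_row in ((mid - 1, mid - 2), (mid + 1, mid + 2)):
        if not 0 <= text_row < len(lines):
            continue
        for col, char in enumerate(lines[bar_row]):
            if char != "|":
                continue
            for pos, (first, last) in enumerate(spans):
                if first <= col <= last:
                    found.append((pos, run_at(lines[text_row], col)))
    n = len(spans)
    # Lowest locants: compare the sorted locant lists obtained from both ends
    from_left = sorted(pos + 1 for pos, _ in found)
    from_right = sorted(n - pos for pos, _ in found)
    use_right = from_right < from_left  # a tie keeps left-to-right numbering
    labels = [(n - pos if use_right else pos + 1, text) for pos, text in found]
    return ["%d-%s" % pair for pair in sorted(labels)]


# The example drawing, shifted left so the chain starts at column 0
EXAMPLE = [
    " " * 11 + "CH3" + " " * 7 + "CH3",
    " " * 13 + "|" + " " * 9 + "|",
    "CH3-CH2-CH2-CH-CH-CH2-CH-CH2-CH3",
    " " * 16 + "|",
    " " * 15 + "CH2-CH2-CH2-CH3",
]

print(branches(EXAMPLE))
```

For the example drawing this prints ['3-CH3', '5-CH2-CH2-CH2-CH3', '6-CH3']. With a real file, passing open("rama.txt").read().splitlines() should work as long as the file has no trailing blank lines, since those would shift which line counts as the middle one.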

Update:

Since other similar questions asking how to identify the longest chain have been marked as duplicates of this one, I will outline a possible way to do it by representing the molecule as a graph:

  • We read the data from the txt and build a graph (a tree) in which each carbon is a node (or vertex) and each bond is an edge. This is usually done with dictionaries, representing the graph as adjacency lists. Since the nodes need distinct names, we rename each carbon, for example by numbering them in order of appearance in the txt. A separate dictionary can store the group (CH3, CH2, CH) associated with each number. Regular expressions are very useful for parsing the txt (see Python's re module).

    For the given example, the graph would be represented as a dictionary as follows:

    {1: {6}, 2: {9}, 3: {4}, 4: {3, 5}, 5: {4, 6}, 6: {1, 5, 7}, 7: {8, 12, 6}, 8: {9, 7}, 9: {8, 2, 10}, 10: {9, 11}, 11: {10}, 12: {13, 7}, 13: {12, 14}, 14: {13, 15}, 15: {14}}
    

    Each key is the name of a node, and its value is a set with the adjacent nodes (the ones it is bonded to).

    The dictionary mapping each number back to its group (used later to reconstruct the molecule or its chains) could be:

    {1:'CH3', 2:'CH3', 3:'CH3', 4:'CH2', 5:'CH2', 6:'CH', 7:'CH', 8:'CH2', 9:'CH', 10:'CH2', 11:'CH3', 12:'CH2', 13:'CH2', 14:'CH2', 15:'CH3'}
    
  • Once we have the graph, finding the longest chain or chains reduces to finding the longest paths between the leaves of the tree, that is, between the nodes that have only one edge.

    In this case the longest paths have 9 carbons; they are:

    [3, 4, 5, 6, 7, 8, 9, 10, 11]
    [3, 4, 5, 6, 7, 12, 13, 14, 15]
    [11, 10, 9, 8, 7, 12, 13, 14, 15]
    

    To get the path between two nodes we can use breadth-first search (BFS), which is easy to implement.

  • From there, we keep using what the graph gives us to apply the rest of the nomenclature rules, in order to output the name of the molecule or its correct graphical representation.
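A minimal sketch of the first step (parsing the txt into the two dictionaries with the re module), assuming bonds are drawn only as "-" between groups on the same row and as "|" in a column covered by the groups directly above and below it, and that each carbon is written as C, CH, CH2 or CH3. The names build_graph and GROUP are hypothetical. Because carbons are numbered in reading order (row by row, left to right), for the example it should yield the adjacency dictionary shown above.

```python
import re

# Hypothetical sketch: assumes bonds are drawn only as "-" (between groups on
# the same row) and "|" (in a column covered by the groups directly above and
# below it), as in the example, and every carbon is C, CH, CH2 or CH3.
GROUP = re.compile(r"C(?:H[23]?)?")


def build_graph(lines):
    """Return (adjacency, groups): node -> set of neighbours, node -> text."""
    adjacency, groups, spans = {}, {}, []
    for line in lines:
        row_spans = []  # (start, end_exclusive, node) for each group on the row
        for match in GROUP.finditer(line):
            node = len(groups) + 1  # carbons numbered in reading order
            groups[node] = match.group()
            adjacency[node] = set()
            row_spans.append((match.start(), match.end(), node))
        spans.append(row_spans)

    def link(a, b):
        adjacency[a].add(b)
        adjacency[b].add(a)

    def node_at(row, col):
        """The node on `row` whose text covers column `col`, or None."""
        if 0 <= row < len(spans):
            for start, end, node in spans[row]:
                if start <= col < end:
                    return node
        return None

    for row, line in enumerate(lines):
        # Horizontal bond: two neighbouring groups separated by exactly "-"
        for (_, end, a), (start, _, b) in zip(spans[row], spans[row][1:]):
            if line[end:start] == "-":
                link(a, b)
        # Vertical bond: a "|" joins the groups directly above and below it
        for col, char in enumerate(line):
            if char == "|":
                above, below = node_at(row - 1, col), node_at(row + 1, col)
                if above is not None and below is not None:
                    link(above, below)
    return adjacency, groups


# The example drawing, shifted left so the chain starts at column 0
EXAMPLE = [
    " " * 11 + "CH3" + " " * 7 + "CH3",
    " " * 13 + "|" + " " * 9 + "|",
    "CH3-CH2-CH2-CH-CH-CH2-CH-CH2-CH3",
    " " * 16 + "|",
    " " * 15 + "CH2-CH2-CH2-CH3",
]

adjacency, groups = build_graph(EXAMPLE)
print(adjacency)
print(groups)
```

Matching groups by column span rather than by exact character keeps the parser tolerant of "|" being placed under any letter of a group, which is how the example drawing happens to be laid out.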

Since this is usually the subject of school assignments, my goal was only to suggest an idea without providing the code, because giving complete answers to homework does not fit the philosophy of Stack Overflow. It can be implemented comfortably with OOP in about 80 lines, and once the graph is built, handling the molecule is really simple. The same idea can be extended to other kinds of molecules, with double bonds, rings, etc.
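Once the graph is built, the longest-chain step could look like the following sketch: a plain breadth-first search that returns the path between two nodes, applied to every pair of leaves, keeping the longest paths. It takes the adjacency dictionary from the answer as input; bfs_path and longest_chains are hypothetical names, and the code assumes the molecule is a tree (an alkane without rings).

```python
from collections import deque
from itertools import combinations


def bfs_path(graph, start, goal):
    """Path from start to goal found by breadth-first search (None if absent)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    if goal not in parent:
        return None
    path = []
    while goal is not None:  # walk the parent links back to start
        path.append(goal)
        goal = parent[goal]
    return path[::-1]


def longest_chains(graph):
    """All longest leaf-to-leaf paths (in a tree each such path is unique)."""
    leaves = sorted(node for node, adj in graph.items() if len(adj) == 1)
    paths = [bfs_path(graph, a, b) for a, b in combinations(leaves, 2)]
    if not paths:  # a single carbon has no pair of leaves
        return [sorted(graph)]
    longest = max(len(path) for path in paths)
    return [path for path in paths if len(path) == longest]


# Adjacency dictionary of the example molecule, copied from the answer
GRAPH = {1: {6}, 2: {9}, 3: {4}, 4: {3, 5}, 5: {4, 6}, 6: {1, 5, 7},
         7: {8, 12, 6}, 8: {9, 7}, 9: {8, 2, 10}, 10: {9, 11}, 11: {10},
         12: {13, 7}, 13: {12, 14}, 14: {13, 15}, 15: {14}}

for chain in longest_chains(GRAPH):
    print(chain)
```

For the example it prints the three 9-carbon paths listed in the answer, in the same order. Running BFS for every pair of leaves is quadratic in the number of leaves, which is negligible at this size; the well-known trick of two BFS passes is faster on trees but finds only one longest path, not all of them.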

    
answered by 27.11.2016 / 01:40