The problem is that according to the rules of alphanumeric nomenclature begins to be numbered by the end that has a closer radical.
Just as you have your code, it always begins to number from left to right when in the example you should start to number from right to left.
To solve it you simply have to see which side of the list you have before with a CH
, if it is for the end you simply have to invert the list, do the same thing you do and then invert it again.
The code could look like this:
def identidicar_radicales(cadena):
for i, atomo in enumerate(cadena):
if cadena[i] == "CH":
cadena[i] = str(i + 1)
return cadena
archivo = open("molecula.txt","r")
lineas = archivo.readlines()
archivo.close()
cad = lineas[len(lineas)//2].rstrip('\n')
cad2 = cad.split("-")
cad2_invertida = cad2[::-1]
if cad2.index('CH') > cad2_invertida.index('CH'):
cad2 = identidicar_radicales(cad2_invertida)[::-1]
else:
cad2 = identidicar_radicales(cad2)
print cad2
Your cycle while
passed to a for
with enumerate()
that makes the variable i
unnecessary since it returns in a tuple each elemnto with its index.
Using your example to create a txt, we get the following output:
['CH3', 'CH2', 'CH2', '6', '5', 'CH2', '3', 'CH2', 'CH3']
Update:
Since there are other similar questions that have been marked as a duplicate of this one and that ask about how to identify the longest chain I will expose a possible way to do it by implementing a grafo
to represent the molecule:
-
We read the data of txt
and create a grafo
(a tree) in which each carbon is a nodo
or vértice
and each link a arista
. To do this, we usually use diccionarios
and represent the graph as adjacency lists . Since the nodes have to have different names, we rename each carbon, for example, by numbering them in order of appearance in txt
. With an independent dictionary we can save the carbon associated with each number. To parse the txt
the use of regular expressions is very useful (see module re
of Python).
For the given example here would be a graph like the following:
The graph would be represented as a dictionary in the following way:
{1: {6}, 2: {9}, 3: {4}, 4: {3, 5}, 5: {4, 6}, 6: {1, 5, 7}, 7: {8, 12, 6}, 8: {9, 7}, 9: {8, 2, 10}, 10: {9, 11}, 11: {10}, 12: {13, 7}, 13: {12, 14}, 14: {13, 15}, 15: {14}}
Where each key
is the name of a node and has as valor
a set
with the adjacent nodes (with which it has links).
The dictionary with the translations (to later reconstruct the molecule or its chains) could be:
{1:'CH3', 2:'CH3', 3:'CH3', 4:'CH2', 5:'CH2', 6:'CH', 7:'CH', 8:'CH', 9:'CH', 10:'CH2', 11'CH3':, 12:'CH2', 13:'CH2', 14:'CH2', 15:'CH3'}
-
Once we have the graph, finding the longest chain or chains is reduced to finding the longest paths between the leaves of the tree, that is, we look for the paths between the nodes that they only have one edge .
In this case the longest roads have 9 carbons and are the following:
[3, 4, 5, 6, 7, 8, 9, 10, 11]
[3, 4, 5, 6, 7, 12, 13, 14, 15]
[11, 10, 9, 8, 7, 12, 13, 14, 15]
To get the path between two nodes we can use the search in width (BFS algorithm) that is easy to implement.
-
Now we have to continue working with the facilities that the graph gives us with the rest of the rules of the alphanumeric nomenclature, in order to obtain outputs with the name of the molecule or its correct graphic representation.
Since this is usually the subject of school assignments my goal was only to give a possible idea without providing the code since it does not fall within the philosophy of Stack Overflow to provide complete answers to school tasks. It can be implemented using OOP in about 80 lines perfectly and once the graph is obtained, handling the molecule is really simple. The same idea can be extended to other types of molecules with double bonds, rings, etc.