Iterate over a list and a dictionary to find mutations in a DNA sequence

I want to find a series of mutations by comparing a pair of sequences. The mutations are defined in two dictionaries, and the two sequences are paired position by position into tuples with zip().

I have written the following code, but it returns a value much larger than the length of the sequences:

transicion = {'A':'G', 'G':'A', 'T':'C', 'C':'T'}

transversion = {'A':('T','C'), 'G':('T','C'), 'T':('A','G'), 'C':('A','G')}

def rel_trans(s1,s2):

    trans_result = 0

    for x in len(s1):

        for i,j in zip(s1, s2):

            for (i,j),(i1,j1) in zip(transicion.items(), transversion.items()):
            trans_result += 1

    return trans_result

s1 = 'CAACGCA'

s2 = 'TGTCTGA'

print rel_trans(s1,s2) # should be at most 7, but a much larger value comes out
    
asked by user45731 01.06.2017 at 15:34

1 answer

The problem is that you never compare one base with another to check whether there is a mutation. You should do something like:

transicion = {'A':'G', 'G':'A', 'T':'C', 'C':'T'}
transversion = {'A':('T','C'), 'G':('T','C'), 'T':('A','G'), 'C':('A','G')}

def rel_trans(s1,s2):
    res = 0
    for i,j in zip(s1, s2):
        if transicion[i] == j or j in transversion[i]:
            res += 1
    return res

Or, using a generator expression:

def rel_trans(s1,s2):
    return sum(transicion[i] == j or j in transversion[i] for i,j in zip(s1, s2))

s1 = 'CAACGCA'
s2 = 'TGTCTGA'
print(rel_trans(s1,s2))

Output:

5
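
For reference, the inflated count in the question comes from the loop nesting rather than from the dictionaries: no base is ever compared, so the counter just goes up once per iteration of three nested loops. A minimal sketch of that arithmetic, assuming the outer loop was meant to be `range(len(s1))` (as posted, `len(s1)` alone is not iterable) and that the increment sits inside the innermost loop. The name `iteraciones_originales` is hypothetical and only used for illustration:

```python
transicion = {'A': 'G', 'G': 'A', 'T': 'C', 'C': 'T'}
transversion = {'A': ('T', 'C'), 'G': ('T', 'C'), 'T': ('A', 'G'), 'C': ('A', 'G')}

def iteraciones_originales(s1, s2):
    """Count iterations the way the question's nesting would (assumed reconstruction)."""
    total = 0
    for _ in range(len(s1)):          # one pass per position: 7
        for _ in zip(s1, s2):         # every pair again on each pass: 7
            # one step per dictionary entry: 4
            for _ in zip(transicion.items(), transversion.items()):
                total += 1            # incremented without comparing anything
    return total

print(iteraciones_originales('CAACGCA', 'TGTCTGA'))  # 7 * 7 * 4 = 196
```

A single loop over `zip(s1, s2)` with an explicit comparison is enough, which keeps the result bounded by the sequence length.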

This makes little sense, though, unless you count transitions and transversions separately. The rel_trans function above counts every possible substitution, and for that you do not need the dictionaries (unless you are looking for only some specific substitutions rather than all 12 possible ones). Provided the sequences are valid and contain only the characters A, G, T and C, it is enough to count the positions that differ:

def rel_trans(s1,s2):
    return sum(i != j for i, j in zip(s1, s2))
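
The caveat about valid sequences can be made explicit in code. The following is only a sketch under stated assumptions: `contar_sustituciones` and `BASES` are hypothetical names, and the equal-length check is an added assumption (`zip` silently stops at the shorter sequence, which would hide a length mismatch):

```python
BASES = set('ACGT')

def contar_sustituciones(s1, s2):
    """Count the positions at which two aligned DNA strings differ."""
    if len(s1) != len(s2):
        # Assumption: the sequences are expected to be aligned and equally long.
        raise ValueError('sequences must have the same length')
    invalid = (set(s1) | set(s2)) - BASES
    if invalid:
        raise ValueError('unexpected characters: {}'.format(sorted(invalid)))
    # Each True counts as 1 when summed.
    return sum(i != j for i, j in zip(s1, s2))

print(contar_sustituciones('CAACGCA', 'TGTCTGA'))  # 5
```

Raising early avoids silently counting gaps or ambiguity codes such as N as substitutions.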

Something more informative would be:

transicion = {'A':'G', 'G':'A', 'T':'C', 'C':'T'}
transversion = {'A':('T','C'), 'G':('T','C'), 'T':('A','G'), 'C':('A','G')}

def mutaciones_por_sustitucion(s1, s2):
    ts = [(ind+1, i, j) for ind, (i, j) in enumerate(zip(s1, s2)) if transicion[i] == j]
    tv = [(ind+1, i, j) for ind, (i, j) in enumerate(zip(s1, s2)) if j in transversion[i]]

    print('Encontradas {} mutaciones por sustitucion:'.format(len(ts)+len(tv)))
    print('    Transiciones ({}): '.format(len(ts)))
    for m in ts:
        print('        Posicion {}: {} cambiada por {}'.format(m[0], m[1], m[2]))

    print('    Transversiones ({}): '.format(len(tv)))
    for m in tv:
        print('        Posicion {}: {} cambiada por {}'.format(m[0], m[1], m[2]))


s1 = 'CAACGCA'
s2 = 'TGTCTGA'
mutaciones_por_sustitucion(s1, s2)

Output:

Encontradas 5 mutaciones por sustitucion:
    Transiciones (2): 
        Posicion 1: C cambiada por T
        Posicion 2: A cambiada por G
    Transversiones (3): 
        Posicion 3: A cambiada por T
        Posicion 5: G cambiada por T
        Posicion 6: C cambiada por G
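
If the counts are needed for further computation instead of being printed, a variant that returns them may be more convenient. This is a sketch, not part of the code above: `clasificar_sustituciones` is a hypothetical name, and it assumes both sequences contain only A, C, G and T, so any substitution that is not a transition must be a transversion:

```python
transicion = {'A': 'G', 'G': 'A', 'T': 'C', 'C': 'T'}

def clasificar_sustituciones(s1, s2):
    """Return (transitions, transversions) for two aligned sequences."""
    ts = tv = 0
    for i, j in zip(s1, s2):
        if i == j:
            continue              # same base: no mutation at this position
        if transicion[i] == j:    # raises KeyError on characters outside ACGT
            ts += 1
        else:
            tv += 1               # assumed valid bases, so this is a transversion
    return ts, tv

ts, tv = clasificar_sustituciones('CAACGCA', 'TGTCTGA')
print(ts, tv)  # 2 3
```

A transition/transversion ratio can then be derived as `ts / tv`, taking care of the case where `tv` is 0.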
    
answered 01.06.2017 at 16:22