Iterate over a list and a dictionary to find mutations in a DNA sequence

Question

I want to find a series of mutations by comparing a pair of sequences. The mutations are defined in two dictionaries, and the two sequences are paired position by position into tuples with zip().

I have written the following code, but it returns a value much higher than the length of the sequences:

transicion = {'A':'G', 'G':'A', 'T':'C', 'C':'T'}
transversion = {'A':('T','C'), 'G':('T','C'), 'T':('A','G'), 'C':('A','G')}

def rel_trans(s1,s2):
    trans_result = 0
    for x in len(s1):
        for i,j in zip(s1, s2):
            for (i,j),(i1,j1) in zip(transicion.items(), transversion.items()):
                trans_result += 1
    return trans_result

s1 = 'CAACGCA'
s2 = 'TGTCTGA'
print rel_trans(s1,s2) # should be at most 7, but a much larger value comes out

python python-2.7

asked by user45731 01.06.2017 в 13:34

1 answer

score 4 · Answer 1

The problem is that at no point do you compare one base with the other to check whether there is a mutation. You should do something like:

transicion = {'A':'G', 'G':'A', 'T':'C', 'C':'T'}
transversion = {'A':('T','C'), 'G':('T','C'), 'T':('A','G'), 'C':('A','G')}

def rel_trans(s1,s2):
    res = 0
    for i,j in zip(s1, s2):
        if transicion[i] == j or j in transversion[i]:
            res += 1
    return res

Or using a generator expression:

def rel_trans(s1,s2):
    return sum(transicion[i] == j or j in transversion[i] for i,j in zip(s1, s2))

s1 = 'CAACGCA'
s2 = 'TGTCTGA'
print(rel_trans(s1,s2))

Output:

5
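
As an aside, the inflated value in the question comes from the nested loops themselves rather than from any comparison: the counter is incremented once per iteration of the innermost loop, whatever the bases are. A minimal sketch of that effect, assuming the outer loop was meant to be `range(len(s1))` (the name `count_iterations` is hypothetical and only mirrors the loop structure):

```python
transicion = {'A': 'G', 'G': 'A', 'T': 'C', 'C': 'T'}
transversion = {'A': ('T', 'C'), 'G': ('T', 'C'), 'T': ('A', 'G'), 'C': ('A', 'G')}

def count_iterations(s1, s2):
    # Hypothetical helper: same loop nesting as the question's code
    # (assuming range(len(s1)) in the outer loop), counting iterations only.
    total = 0
    for _ in range(len(s1)):            # 7 iterations
        for _pair in zip(s1, s2):       # 7 position pairs
            for _items in zip(transicion.items(), transversion.items()):  # 4 dictionary entries
                total += 1              # no base is ever compared
    return total

print(count_iterations('CAACGCA', 'TGTCTGA'))  # 7 * 7 * 4 = 196
```

Under that assumption the result depends only on the sequence length and the size of the dictionaries, which is why it exceeds 7.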

Counting both kinds together makes little sense, though, unless you report transitions and transversions separately. The code above simply counts every substitution, and for that you do not need the dictionaries (unless you are looking for specific substitutions rather than all 12 possible ones). Provided both strings are valid sequences containing only the characters A, G, T and C, it is enough to count the positions that differ:

def rel_trans(s1,s2):
    return sum(i != j for i, j in zip(s1, s2))
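
That shortcut relies on both strings having the same length and containing only A, G, T and C: `zip()` silently stops at the shorter string, and the dictionary-based versions raise `KeyError` on an unexpected character in `s1`. A minimal sketch of a check that could be run first (`validar_secuencias` and `BASES` are hypothetical names, not part of the original code):

```python
BASES = set('AGTC')

def validar_secuencias(s1, s2):
    # Hypothetical helper: raise ValueError if the two sequences cannot be
    # compared position by position.
    if len(s1) != len(s2):
        raise ValueError('sequences differ in length: {} vs {}'.format(len(s1), len(s2)))
    for name, seq in (('s1', s1), ('s2', s2)):
        invalid = set(seq) - BASES
        if invalid:
            raise ValueError('{} contains invalid characters: {}'.format(name, sorted(invalid)))

validar_secuencias('CAACGCA', 'TGTCTGA')  # valid input: returns without raising
```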

Something more informative would be:

transicion = {'A':'G', 'G':'A', 'T':'C', 'C':'T'}
transversion = {'A':('T','C'), 'G':('T','C'), 'T':('A','G'), 'C':('A','G')}

def mutaciones_por_sustitucion(s1, s2):
    # Each entry is (1-based position, base in s1, base in s2)
    ts = [(ind+1, i, j) for ind, (i, j) in enumerate(zip(s1, s2)) if transicion[i] == j]
    tv = [(ind+1, i, j) for ind, (i, j) in enumerate(zip(s1, s2)) if j in transversion[i]]

    print('Found {} substitution mutations:'.format(len(ts)+len(tv)))
    print('    Transitions ({}):'.format(len(ts)))
    for m in ts:
        print('        Position {}: {} replaced by {}'.format(m[0], m[1], m[2]))

    print('    Transversions ({}):'.format(len(tv)))
    for m in tv:
        print('        Position {}: {} replaced by {}'.format(m[0], m[1], m[2]))


s1 = 'CAACGCA'
s2 = 'TGTCTGA'
mutaciones_por_sustitucion(s1, s2)

Output:

Found 5 substitution mutations:
    Transitions (2):
        Position 1: C replaced by T
        Position 2: A replaced by G
    Transversions (3):
        Position 3: A replaced by T
        Position 5: G replaced by T
        Position 6: C replaced by G
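
If the results are needed for further processing rather than for display, the same classification can return data instead of printing it. A sketch under that assumption; `clasificar_sustituciones` and the shape of its return value are illustrative choices, not part of the original answer:

```python
transicion = {'A': 'G', 'G': 'A', 'T': 'C', 'C': 'T'}
transversion = {'A': ('T', 'C'), 'G': ('T', 'C'), 'T': ('A', 'G'), 'C': ('A', 'G')}

def clasificar_sustituciones(s1, s2):
    # Hypothetical helper: returns (transitions, transversions), each a list
    # of (1-based position, base in s1, base in s2), built in a single pass.
    ts, tv = [], []
    for pos, (a, b) in enumerate(zip(s1, s2), start=1):
        if transicion[a] == b:
            ts.append((pos, a, b))
        elif b in transversion[a]:
            tv.append((pos, a, b))
    return ts, tv

ts, tv = clasificar_sustituciones('CAACGCA', 'TGTCTGA')
print('{} {}'.format(len(ts), len(tv)))  # 2 3
```

Returning the two lists keeps the counting logic separate from the formatting, so the caller can print them, compute a transition/transversion ratio, or store them.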