Error in a conditional with the in operator: "requires string as left operand, not set"

1

I am writing a loop that iterates over a list of given strings. For this I have created sets of data on which I base the conditions inside the loop.

But I think I'm doing something wrong, because in the Python shell I get the following error:

TypeError: 'in <string>' requires string as left operand, not set

I do not know whether the problem lies in how I created the sets (perhaps they should be built with set()), but I think that in theory it should work with the sets I created.

Result=[]
Dna_bases = {'a', 't', 'c', 'g'}
Rna_bases = {'a', 'u', 'c', 'g'}

chain_list= ['ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4']
for i in chain_list:
    if Dna_bases in i and 'u' not in i:
        print(Result.append('DNA'))
    elif Rna_bases in i and 't' not in i:
        print(Result.append('RNA'))
    else:
        print(Result.append('UKN'))

print ('Result =', Result)

The result should be:

  

Result = ['DNA', 'RNA', 'UKN']

    
asked by Steve Jade 03.10.2017 at 22:29

2 answers

1

With if Dna_bases in i you are asking whether a set is contained in a string (str). When the right-hand operand is a string, in performs a substring test, so the left-hand operand must be a string too.
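A short sketch of how in behaves when the right-hand operand is a string. The variable names (chain, dna_bases, has_gaat, error_message) are illustrative and not taken from the question:

```python
chain = 'ttgaatgccttacaact'
dna_bases = {'a', 't', 'c', 'g'}

# str in str: a substring test, which works as expected.
has_gaat = 'gaat' in chain

# set in str: the left operand is not a string, so Python raises TypeError.
try:
    dna_bases in chain
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(has_gaat)
print(error_message)
```

The second check never produces True or False; it fails before any comparison takes place.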

In any case, that logic would not prove anything: even if the bases of the set were found in the chain, the string could still contain other characters that are neither in the set nor equal to u or t respectively.

What you want is to check whether all the bases of a chain belong to one of the sets, or to neither. To do this, build a set from the characters of each chain and take its set difference with Dna_bases and Rna_bases.

When you pass a string to the set constructor, it creates a set with all the characters contained in the string, without repetitions. For example:

>>> c = 'aucgcgauacgacgu'
>>> set_c = set(c)
>>> set_c
{'c', 'u', 'a', 'g'}

If you take the set difference with another set, you get the elements that are in the first set but not in the second:

>>> {'c', 'u', 'a', 'g'} - {'a', 'u', 'c', 'g'}
set()
>>> {'c', 'u', 'a', 'g'} - {'a', 't', 'c', 'g'}
{'u'}

This applies to our problem because, if all the bases of the chain are in the set, the difference is an empty set, and an empty set evaluates as false in a condition.
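As a small illustration of the idea, the difference can also tell you which characters caused a chain to be rejected. The helper name unexpected_chars is hypothetical, introduced only for this sketch:

```python
DNA_BASES = {'a', 't', 'c', 'g'}

def unexpected_chars(chain, alphabet):
    """Return the characters of chain that are not in alphabet."""
    return set(chain) - alphabet

leftover_ok = unexpected_chars('ttgaatgccttacaact', DNA_BASES)
leftover_bad = unexpected_chars('aaacggacgacgxxn4', DNA_BASES)

print(not leftover_ok)       # True: nothing is left over, so every character is a DNA base
print(sorted(leftover_bad))  # ['4', 'n', 'x']
```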

Your code would be something like this:

Result=[]
Dna_bases = {'a', 't', 'c', 'g'}
Rna_bases = {'a', 'u', 'c', 'g'}

chain_list= ['ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4']
for i in chain_list:
    chain_set = set(i) 
    if not chain_set - Dna_bases:
        Result.append('DNA')
    elif not chain_set - Rna_bases:
        Result.append('RNA')
    else:
        Result.append('UKN')

print ('Result =', Result)

Output:

  

Result = ['DNA', 'RNA', 'UKN']

Bear in mind that a string like "acagcc" will be classified as DNA although it could also be RNA. If that can happen, you could do something like:

Result=[]
Dna_bases = {'a', 't', 'c', 'g'}
Rna_bases = {'a', 'u', 'c', 'g'}

chain_list= ["acagcc", 'ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4']
for i in chain_list:
    chain_set = set(i) 
    if (not chain_set - Dna_bases) and (not chain_set - Rna_bases):
        Result.append('DNA/RNA')
    elif not chain_set - Dna_bases:
        Result.append('DNA')
    elif not chain_set - Rna_bases:
        Result.append('RNA')
    else:
        Result.append('UKN')

print ('Result =', Result)

Output:

  

Result = ['DNA/RNA', 'DNA', 'RNA', 'UKN']

    
answered by 03.10.2017 at 22:36
1

@FJSevilla's answer explains why it fails and one way to fix it.

With Dna_bases in i you are checking whether the set Dna_bases is contained in the string i. Because in with a string on the right only accepts a string on the left, it raises an error.

Before continuing, I will change your initial code to make it more readable and pythonic:

result=[]
DNA_BASES = {'a', 't', 'c', 'g'}
RNA_BASES = {'a', 'u', 'c', 'g'}

chain_list= ['ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4']

for dna in chain_list:
    if DNA_BASES in dna and 'u' not in dna:
        print(result.append('DNA'))
    elif RNA_BASES in dna and 't' not in dna:
        print(result.append('RNA'))
    else:
        print(result.append('UKN'))

print ('Result =', result)

Actually, what you need is to check if each character of a string belongs to a set:

"a" in DNA_BASES

To iterate the entire string, we can use all in the following idiomatic expression:

all(x in DNA_BASES for x in dna)

For all to return True , all letters of dna have to be in DNA_BASES . It is not necessary to check that the letter u is not present.
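Applied to the three chains from the question, the expression behaves as follows. This is only a sketch, and the list name is_dna is introduced here for illustration:

```python
DNA_BASES = {'a', 't', 'c', 'g'}
chain_list = ['ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4']

# One boolean per chain: True only if every character is a DNA base.
is_dna = [all(x in DNA_BASES for x in chain) for chain in chain_list]

print(is_dna)  # [True, False, False]
```

Note that all stops at the first character that is not in the set, so it does not always scan the whole string.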

You can do even better using set operations:

set(dna) <= DNA_BASES

We want the bases of dna to be a subset of DNA_BASES, which is what the <= operator tests.
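A brief sketch of the subset test on one of the chains from the question. The names chain and bases are illustrative. The method forms issubset and issuperset are equivalent to the operators, and issuperset accepts any iterable, so it can take the string directly:

```python
DNA_BASES = {'a', 't', 'c', 'g'}
RNA_BASES = {'a', 'u', 'c', 'g'}

chain = 'aucgcgauacgacgu'
bases = set(chain)

print(bases <= DNA_BASES)           # False: 'u' is not a DNA base
print(bases <= RNA_BASES)           # True
print(bases.issubset(RNA_BASES))    # True, same test as <=
print(RNA_BASES.issuperset(chain))  # True, no explicit set(chain) needed
```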

The same applies to RNA_BASES. If we create a check function, the resulting code would look like this:

DNA_BASES = {'a', 't', 'c', 'g'}
RNA_BASES = {'a', 'u', 'c', 'g'}

chain_list= ['ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4']

def check(seq):
    bases = set(seq)  # unique characters of the sequence
    return 'DNA' if bases <= DNA_BASES \
      else 'RNA' if bases <= RNA_BASES \
      else 'UKN'

result = [check(dna) for dna in chain_list]
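One edge case to be aware of with the subset test: an empty string produces an empty set, which is a subset of every set, so it would be classified as 'DNA'. A minimal sketch of a guard follows. The name check_nonempty is hypothetical, and returning 'UKN' for empty input is an assumption, not something the question specifies:

```python
DNA_BASES = {'a', 't', 'c', 'g'}
RNA_BASES = {'a', 'u', 'c', 'g'}

def check_nonempty(seq):
    # Assumption: an empty sequence should be reported as unknown.
    if not seq:
        return 'UKN'
    bases = set(seq)
    if bases <= DNA_BASES:
        return 'DNA'
    if bases <= RNA_BASES:
        return 'RNA'
    return 'UKN'

chain_list = ['ttgaatgccttacaact', 'aucgcgauacgacgu', 'aaacggacgacgxxn4', '']
result = [check_nonempty(chain) for chain in chain_list]

print('Result =', result)  # Result = ['DNA', 'RNA', 'UKN', 'UKN']
```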
    
answered by 04.10.2017 at 15:01