Delete duplicate items in a list


I need to find a more efficient way to remove duplicates from a list in Python.

I'm doing it this way:

mj2 = []
for i in mj:
    if i not in mj2:
        mj2.append(i)

Where mj is a list such as [2, 4, 4, 4, 4, 4, 9, 9] and the output mj2 is of the form:

   [2, 4, 9]

Is there a more efficient way that does not involve loops? I have to analyze large lists.

asked by Jorge Ponti 18.07.2017 в 20:27

3 answers


The easiest way is to use set() :

>>> mj = [2, 4, 4, 4, 4, 4, 9, 9]
>>> mj2 = set(mj)
>>> mj2
set([9, 2, 4])
>>> list(mj2)
[9, 2, 4]

If you want to maintain order (sets are unordered collections of elements), you can sort at the end:

>>> sorted(list(mj2))
[2, 4, 9]

Another option, if your list is already sorted and you want to preserve that order, is to use the class OrderedDict and take advantage of the fact that it remembers insertion order:

>>> from collections import OrderedDict
>>> OrderedDict.fromkeys(mj)
OrderedDict([(2, None), (4, None), (9, None)])
>>> list(OrderedDict.fromkeys(mj))
[2, 4, 9]

OrderedDict is a dict subclass that "remembers" the order in which its elements were inserted. The fromkeys method uses the elements of mj as dictionary keys, which discards duplicates; since mj is already sorted, that order is preserved in the result.
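As a side note not in the original answer: since Python 3.7, plain dicts also preserve insertion order, so the same trick works without importing OrderedDict, and it keeps first-seen order even for unsorted input:

```python
mj = [2, 4, 4, 4, 4, 4, 9, 9]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
mj2 = list(dict.fromkeys(mj))
print(mj2)  # [2, 4, 9]
```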

answered by 18.07.2017 / 20:31

You can test the performance with the following line of code:

mj2 = sorted(set(mj))

although using sorted will consume some extra resources. If order does not matter, you can simply use:

mj2 = set(mj)
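To see why the set-based approach wins on large lists, here is a sketch of a comparison using the standard timeit module (the helper names dedup_loop and dedup_set are illustrative, not from the answer; exact timings will vary by machine):

```python
import timeit

mj = list(range(1000)) * 5  # a larger list with many duplicates

def dedup_loop(lst):
    # The original O(n^2) approach: membership test scans a list
    out = []
    for i in lst:
        if i not in out:
            out.append(i)
    return out

def dedup_set(lst):
    # O(n log n) overall: hash-based dedup plus a final sort
    return sorted(set(lst))

print(timeit.timeit(lambda: dedup_loop(mj), number=10))
print(timeit.timeit(lambda: dedup_set(mj), number=10))
```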
answered by 18.07.2017 в 20:31

If the original list is very large and sorted, it is much more efficient to use itertools.groupby, which creates an iterator without building new lists (groupby only collapses consecutive duplicates, which is why the list must be sorted):

from itertools import groupby

mj2 = (k for (k, _) in groupby(mj))

This makes it possible to obtain the first elements without processing the entire list:

first = next(mj2)
second = next(mj2)
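Putting the pieces of this answer together into a complete runnable sketch:

```python
from itertools import groupby

mj = [2, 4, 4, 4, 4, 4, 9, 9]  # must be sorted for groupby to deduplicate

# groupby yields (key, group) pairs for runs of equal elements;
# keeping only the keys removes consecutive duplicates lazily
unique = (k for k, _ in groupby(mj))

first = next(unique)   # 2
second = next(unique)  # 4
rest = list(unique)    # [9]
print(first, second, rest)
```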
answered by 19.07.2017 в 00:18