Filter words in Python

7

I have written a program that picks up specific messages from a database, and I would like to add a filter to it.

I've been reading other posts and googling a bit, and I apologize if I am duplicating an existing question, but I cannot find exactly what I'm looking for.

What I want is a filter that blocks certain texts coming from the database. The filter I have right now does not behave as I want, because it does not match literally what I have written.

I am not sure what criteria it follows when filtering but, while searching, I found this:

#Keywords to ignore messages
exclude : "^(?!.*(paga|pago|expul)).*$"

For example, with one text I saw, if the message contains the word "win" it is automatically not picked up, when it should be.

So my question is: what method can I use in Python to filter "literally" on the keywords that I want?

EDIT: Program code:

blacklist = ["paga", "pago", "expul"]

@client.on(events.NewMessage(
    pattern=lambda msg: not is_blacklisted(msg.message, blacklist)))
async def my_event_handler(event): 
  from_channel_id = event.original_update.message.to_id.channel_id
  entity = redirections.get(from_channel_id)
  if entity:
    await event.client.send_message(entity, event.original_update.message)

def is_blacklisted(frase, palabras):
  for palabra in palabras:
    if palabra in frase:
      return True
  return False

Edit2:

Although the filter runs without any error, any message that contains a line break gets filtered out, even if it contains none of the keywords.

The same thing happens when using a regular expression.

    
asked by Peisou 12.12.2018 at 16:50

1 answer

9

The following is a generic function that returns True if the frase you pass contains, as a substring, any of the strings you pass in the palabras parameter:

def is_blacklisted(frase, palabras):
  for palabra in palabras:
    if palabra in frase:
      return True
  return False

You can use it to filter a list, for example the following one:

frases =[
    "En un lugar de la Mancha ",
    "de cuyo nombre no quiero acordarme ",
    "no ha mucho tiempo que vivía ",
    "un hidalgo ",
    "de los de lanza en astillero, ",
    "adarga antigua, ",
    "rocín flaco, ",
    "y galgo corredor."
]

blacklist = ["de", "nombre"]

filtradas = [frase for frase in frases if not is_blacklisted(frase, blacklist)]

with the result:

['no ha mucho tiempo que vivía ',
 'un hidalgo ',
 'adarga antigua, ',
 'rocín flaco, ',
 'y galgo corredor.']

Notice that the match is on substrings. If you put ["a"] in blacklist, no phrase would pass, because every phrase contains an "a" somewhere. Also, this version is case-sensitive (upper/lower case); it can easily be modified so that it is not:

def is_blacklisted(frase, palabras):
  for palabra in palabras:
    if palabra.lower() in frase.lower():
      return True
  return False
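
If substring matching turns out to be too broad (for example, "de" also blocking a phrase that only contains "adentro"), one option is to match whole words instead. The following is only a sketch of that idea: the name is_blacklisted_word is hypothetical and not part of the code above, and it assumes that "word" means what the re module's \b boundary considers a word.

```python
import re

def is_blacklisted_word(frase, palabras):
    """Return True if any keyword appears in frase as a whole word."""
    # Hypothetical helper, shown for illustration only.
    for palabra in palabras:
        # re.escape treats the keyword literally; \b requires word boundaries.
        patron = r"\b" + re.escape(palabra) + r"\b"
        if re.search(patron, frase, flags=re.IGNORECASE):
            return True
    return False

print(is_blacklisted_word("un hidalgo de la Mancha", ["de"]))  # True
print(is_blacklisted_word("adentro del patio", ["de"]))        # False
```

Whether this or the plain substring check is the right one depends on whether a keyword such as "expul" is meant to catch "expulsado" as well; the whole-word version would not.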

Update

The events.NewMessage() function that your framework uses accepts a parameter called pattern which, besides a regular expression, can be a function that receives the message as its parameter and returns True if that message should be handled.

The function I provided earlier (is_blacklisted()) cannot be used directly for this purpose: it takes two parameters instead of one, and it returns the opposite of what is expected (True means the message should not be handled). But it is trivial to write a lambda expression that uses it and adapts it to what is needed.
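
The adaptation itself can be tried out without the framework. The sketch below assumes the callable receives the message text as a plain string; make_pattern is a hypothetical helper name used only for this illustration.

```python
def is_blacklisted(frase, palabras):
    for palabra in palabras:
        if palabra in frase:
            return True
    return False

def make_pattern(palabras):
    # Hypothetical helper: wraps the two-argument function into a
    # one-argument predicate and inverts the result, so that True
    # means "handle this message".
    return lambda texto: not is_blacklisted(texto, palabras)

acepta = make_pattern(["paga", "pago", "expul"])

print(acepta("hola, ¿qué tal?"))                # True
print(acepta("ya está pagado"))                 # False, "pagado" contains "paga"
print(acepta("primera línea\nsegunda línea"))   # True
```

The last call suggests that the substring check by itself is not affected by line breaks in the text.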

The following should work (although I could not test it for lack of the necessary infrastructure):

blacklist = ["paga", "pago", "expul"]
@client.on(events.NewMessage(
              pattern=lambda msg: not is_blacklisted(msg, blacklist)))
async def my_event_handler(event):
  from_channel_id = event.original_update.message.to_id.channel_id
  entity = redirections.get(from_channel_id)
  if entity:
    await event.client.send_message(entity, event.original_update.message)
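
Regarding the line-break problem mentioned in the second edit of the question: for the regular-expression version at least, the behavior is consistent with how Python's re module works by default, where "." does not match a newline and "$" only matches at the very end of the string. The sketch below assumes the exclude pattern is applied with re.match and no flags, which I cannot confirm for your setup.

```python
import re

# Pattern copied from the question's "exclude" setting.
EXCLUDE = r"^(?!.*(paga|pago|expul)).*$"

una_linea = "hola mundo"
dos_lineas = "hola\nmundo"

# Single line without keywords: the pattern matches, the message passes.
print(bool(re.match(EXCLUDE, una_linea)))    # True

# Same idea with a line break: ".*" stops at the newline and "$" fails,
# so the message is rejected even though it has no keyword.
print(bool(re.match(EXCLUDE, dos_lineas)))   # False

# With re.DOTALL, "." also matches newlines.
print(bool(re.match(EXCLUDE, dos_lineas, flags=re.DOTALL)))         # True
print(bool(re.match(EXCLUDE, "te pago\nmañana", flags=re.DOTALL)))  # False
```

If the pattern is given as a string, the inline form "(?s)" at its start has the same effect as re.DOTALL.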
    
answered by 12.12.2018 / 18:02