Search for url and replace with another in txt with python

1

I have a .txt file that contains several urls, among all I want to find those that end with zip and rar extension, then change the domain / ip for another and finally save in the same file.

Example of file.txt:

Lorem ipsum dolor sit amet, consectetur http://url.com/file.zip
Curabitur sit amet semper erat http://www.gogle.com/
Cras viverra neque et libero eleifend http://158.989.87.7/file.rar
http://url.com/image.jpg

Output of file.txt

Lorem ipsum dolor sit amet, consectetur http://miserver.com/file.zip
Curabitur sit amet semper erat http://www.gogle.com/
Cras viverra neque et libero eleifend http://miserver.com/file.rar
http://url.com/image.jpg

I know that with regex I can find it, but I come here because I do not know how to start and I have not found something so specific.

    
asked by tomillo 18.05.2018 в 00:20
source

1 answer

1

I'm definitely not an expert in regular expressions, but this should work.

lista = [
'Lorem ipsum dolor sit amet, consectetur http://url.com/file.zip',
'Curabitur sit amet semper erat http://www.gogle.com/',
'Cras viverra neque et libero eleifend http://158.989.87.7/file.rar',
'http://url.com/image.jpg'
]

import re

patron = r'\/\/([^\/]+)\/.+\.(rar|zip)'

for linea in lista:
  m = re.search(patron, linea)
  if m:
    linea = linea.replace(m.group(1), "miserver.com")

  print(linea)

The operation is simple, we go through the list and to each line we apply the pattern regex that has a group to capture that would coincide exactly with the domain. If there is a coincidence, we simply replace this text with the new domain.

Regarding the pattern:

  • Must match \/\/ The two bars // and then ...
  • ([^\/]+) we capture any character minus the / until ...
  • \/ the next bar and also match ...
  • .+\.(rar|zip) any character, a point and the extensions mentioned
answered by 18.05.2018 / 02:10
source