Get a part of the text

1

I have the following text

  

/departamento-venta-palermo-pagina-3.html

And I need to get everything that comes after the third hyphen (-).

The first thing I thought of was:

url = '/departamento-venta-palermo-pagina-3.html'

url[-13:]

which correctly returned

  

pagina-3.html

Now, how can I handle the case where the number has two digits instead of one, for example /departamento-venta-palermo-pagina-13.html? With the slice above I would get

  

agina-13.html

which is wrong.

    
asked by Sebastian 12.12.2018 at 18:14

3 answers

1

You can use rfind(), which returns the index of the last occurrence found, or -1 if there is none, in combination with Python's string slicing:

text = '/departamento-venta-palermo-pagina-3.html'
print( text[text.rfind( '-' ) + 1: ] )

Detail:

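  • text.rfind( '-' ) : returns the index of the last occurrence of - in the string.
  • + 1 : we do not want the position of the last - itself, but the position right after it.
  • text[ inicio : final ] : slices the string; returns a new string running from position inicio (start) up to, but not including, position final (end). Leaving final out, as above, slices to the end of the string.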
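
The three steps above can be traced one at a time. This is only an illustrative sketch: the variable names last_hyphen, tail and no_hyphen are not from the answer. Note that it yields the text after the last hyphen, and that a string with no hyphen comes back unchanged, because rfind returns -1 and the slice then starts at 0.

```python
text = '/departamento-venta-palermo-pagina-3.html'

last_hyphen = text.rfind('-')    # index of the last '-' in the string
tail = text[last_hyphen + 1:]    # slice from the next position to the end

print(tail)  # 3.html

# With no '-' at all, rfind returns -1, so the slice starts at index 0
no_hyphen = 'index.html'
print(no_hyphen[no_hyphen.rfind('-') + 1:])  # index.html
```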

Advantages:

  • It's fast : few operations.
  • It's flexible : you only need a - in the string, and it's easy to adapt.

Disadvantages:

  • It's flexible : it checks absolutely nothing; the only thing it relies on is the presence of that - .
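
If that lack of checking is a concern, an explicit test on the result of rfind is one possible guard. A minimal sketch, assuming that raising an error is the desired behaviour when no hyphen is present; the function name after_last_hyphen is hypothetical and not part of the answer.

```python
def after_last_hyphen(text):
    """Return the part of text after its last '-'; raise if there is none."""
    index = text.rfind('-')
    if index == -1:
        # rfind signals "not found" with -1, so reject the input explicitly
        raise ValueError(f"no '-' found in {text!r}")
    return text[index + 1:]


print(after_last_hyphen('/departamento-venta-palermo-pagina-13.html'))  # 13.html
```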
answered 12.12.2018 at 18:34
2

If the URLs always follow the same pattern, with the page part appearing after the third hyphen, you could use this:

url = '/departamento-venta-palermo-pagina-13.html'
'-'.join(url.split('-')[3:])  # 'pagina-13.html'
  • Split the string with split, using '-' as the separator; this gives you a list.
  • Since what you need comes after the third item in the list, take it with [3:].
  • Finally, join the remaining items back together with join, using '-' again so the hyphens inside that part are kept, and you get a single string.
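
As a variation on the same idea, str.split accepts a maximum number of splits, which leaves everything after the third hyphen in one piece and makes the join step unnecessary. This is a sketch rather than the answer's own code; the names parts and tail are illustrative, and returning None for URLs with fewer than three hyphens is an assumption.

```python
url = '/departamento-venta-palermo-pagina-13.html'

# Split on at most the first 3 hyphens, giving at most 4 pieces
parts = url.split('-', 3)

# The fourth piece, if present, is everything after the third hyphen
tail = parts[3] if len(parts) == 4 else None

print(tail)  # pagina-13.html
```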
answered 12.12.2018 at 18:33
1

You can do it with a regular expression:

import re
url = "/departamento-venta-palermo-pagina-3333.html"

# first number found anywhere in the URL
num = re.findall(r'[1-9]\d*|0\d+', url)[0]
print(num)

# get only from "pagina" onward
print('-'.join(url.split('-')[-2:]))

Reference: Python re module documentation
        
answered 12.12.2018 at 18:25