Get part of a URL with Selenium in Python

I have this URL and I just want to get a part of it:

https://onevideo.aol.com/#/inventorysource/1024374?makeDuplicate=true

The part I want is the ID, 1024374, which I then need to insert into this other URL:

https://onevideo.aol.com/inventory_sources/get_adtag_urls_export?secure=0&ft=EXCEL&piggyback_type=ANY&id=1024374&at=MOBILE_WEB&_sid=60c7302b-9ede-4308-93f6-014975706aff

To clarify: Selenium redirects me to the first URL, and that is where I want to read the ID from. I tried driver.current_url, but that returns the whole URL. I then need to put the extracted ID into the id parameter of the second URL.

I am working with Python 2.7.

    
asked by Martin Bouhier 24.10.2017 at 02:25

2 answers

You have several options, from str.split to regular expressions. You can also use the urlparse module together with str.rpartition:

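import urlparse

url = 'https://onevideo.aol.com/sd/inventorysource/1024374?makeDuplicate=true'
# allow_fragments=False keeps any '#' in the path instead of splitting it off as a fragment
url_parts = urlparse.urlparse(url, allow_fragments=False)
# The ID is the last segment of the path (url_parts.path is the same as url_parts[2])
id = url_parts.path.rpartition('/')[2]
print(id)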

In Python 3 the module is urllib.parse .
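As a rough sketch of the same idea that runs on both Python 2.7 and Python 3, the import can be wrapped in a try/except. The helper name extract_id is made up for illustration. The example uses the URL from the question, which contains a '#', to show why allow_fragments=False matters: with the default setting, everything after the '#' would be treated as a fragment and the path would be just '/'.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def extract_id(url):
    """Return the last path segment of url (hypothetical helper).

    allow_fragments=False keeps '#...' inside the path, so the ID that
    follows the '#' is still reachable through the path component.
    """
    parts = urlparse(url, allow_fragments=False)
    return parts.path.rpartition('/')[2]


url = 'https://onevideo.aol.com/#/inventorysource/1024374?makeDuplicate=true'
print(extract_id(url))  # 1024374
```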

If your URLs always have the same structure, you can simply use str.rpartition twice:

url = 'https://onevideo.aol.com/sd/inventorysource/1024374?makeDuplicate=true'
id = url.rpartition('/')[2].rpartition('?')[0] 
print(id)

str.rpartition splits a string at the last occurrence of the separator passed as an argument. It returns a tuple of three elements: the part to the left of the separator, the separator itself, and the part to its right.
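To make the three-element tuple concrete, here is a small sketch using the same URL. It also shows one caveat that may be worth keeping in mind: when the separator is missing, rpartition puts the whole string in the third element, so a URL without a query string would yield an empty ID with the two-step rpartition approach. Using str.partition for the '?' step is one possible way around that; this is a suggestion, not part of the original answer.

```python
url = 'https://onevideo.aol.com/sd/inventorysource/1024374?makeDuplicate=true'

# Split at the LAST '/': (text before, separator, text after)
head, sep, tail = url.rpartition('/')
print(head)  # https://onevideo.aol.com/sd/inventorysource
print(sep)   # /
print(tail)  # 1024374?makeDuplicate=true

# If the separator is absent, the whole string lands in the third element,
# so taking element [0] would give an empty string
missing = '1024374'.rpartition('?')
print(missing)  # ('', '', '1024374')

# str.partition splits at the FIRST occurrence and, when the separator is
# missing, leaves the whole string in the first element instead
id_with_query = tail.partition('?')[0]
id_without_query = '1024374'.partition('?')[0]
print(id_with_query)     # 1024374
print(id_without_query)  # 1024374
```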

Edit:

To get the URL as a string you only need to use the current_url attribute:

url = driver.current_url

To build the new URL, format the string:

base = "https://onevideo.aol.com/inventory_sources/get_adtag_urls_export?secure=0&ft=EXCEL&piggyback_type=ANY&id={}&at=MOBILE_WEB&_sid=60c7302b-9ede-4308-93f6-014975706aff"
new_url = base.format(id)
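Putting the steps of this answer together, a minimal sketch could look like the following. The function name build_export_url, the constant name EXPORT_TEMPLATE, and the digit check are illustrative assumptions, not part of the original answer. A plain string stands in for driver.current_url so the snippet runs without a browser.

```python
# Template taken from the question; only the id parameter is filled in
EXPORT_TEMPLATE = (
    "https://onevideo.aol.com/inventory_sources/get_adtag_urls_export"
    "?secure=0&ft=EXCEL&piggyback_type=ANY&id={}&at=MOBILE_WEB"
    "&_sid=60c7302b-9ede-4308-93f6-014975706aff"
)


def build_export_url(current_url):
    """Extract the numeric ID from current_url and build the export URL."""
    # Last path segment, with any query string cut off
    source_id = current_url.rpartition('/')[2].partition('?')[0]
    if not source_id.isdigit():
        raise ValueError("no numeric ID found in %r" % current_url)
    return EXPORT_TEMPLATE.format(source_id)


# Stand-in for driver.current_url (assumed value, copied from the question)
current_url = 'https://onevideo.aol.com/#/inventorysource/1024374?makeDuplicate=true'
new_url = build_export_url(current_url)
print(new_url)
```

With a live Selenium session, the last lines would presumably become driver.get(build_export_url(driver.current_url)).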
    
answered by 24.10.2017 / 02:52

I am adding the regular-expression approach, although in your case, since you are dealing with URLs, the best solution is FJSevilla's.

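import re

url = "https://onevideo.aol.com/#/inventorysource/1024374?makeDuplicate=true"
# Raw string so the backslashes reach the regex engine unchanged
m = re.search(r'(^.*\/)([0-9]{7})(\?.*$)', url)

# Parenthesized print works on Python 2.7 and Python 3
print(m.group(1))
print(m.group(2))
print(m.group(3))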

https://onevideo.aol.com/#/inventorysource/
1024374
?makeDuplicate=true

As you can see, the second group is the ID you are looking for. The pattern breaks down as follows:

(^         # First group, from the start of the string
.*         # Any characters
\/)        # Up to the slash just before the second group
([0-9]{7}) # Second group: exactly 7 digits; you can also use {1,7} for 1 to 7 digits
(\?        # Third group, starting at the ?
.*         # Any characters
$)         # Up to the end of the string
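If the ID is not always exactly 7 digits, or the query string is sometimes absent, a looser pattern may be easier to maintain. The sketch below is one possible variant, not the pattern from this answer: it captures a run of digits that follows a slash and is followed by either a '?' or the end of the string. The function name find_id is hypothetical.

```python
import re

# A slash, then one or more digits (captured), then '?' or end of string
ID_PATTERN = re.compile(r'/(\d+)(?:\?|$)')


def find_id(url):
    """Return the numeric ID as a string, or None if the URL has none."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


print(find_id("https://onevideo.aol.com/#/inventorysource/1024374?makeDuplicate=true"))  # 1024374
print(find_id("https://onevideo.aol.com/#/inventorysource/1024374"))  # 1024374
print(find_id("https://onevideo.aol.com/#/inventorysource/"))  # None
```

Checking for None before calling group avoids the AttributeError that re.search(...).group(...) raises when the URL does not match.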
    
answered by 24.10.2017 at 03:13