Extract the value of two indicators

1

I am new to Python and am taking my first steps. I have a file with thousands of rows. Each row is a string of one of two kinds (I include a capture of both below), determined by a label called "mime", which can be video or audio.

My problem is that I do not know how to extract the values of the variables "lmt", "mime", "itag" and "dur" present in each line and then store them in a .csv file.

I have tried splitting the data, creating a new file, and using vectors, but I have not reached a solution (I feel a bit frustrated).

String1:

  

Line 219 (6420), Dh, "" "url" ":   "" link ", "

String2:

  

Line 479 (106021), Dh, "" "url" ":   "" link " ","

I would really appreciate any help. In the end what I want is a file that contains the following output data:

Output .csv file:

    
asked by Laura 13.11.2018 at 14:36

3 answers

0

I think this could help you:

ENTRADA = "test.csv"
SALIDA = "salida.csv"
CAMPOS = ['lmt', 'mime', 'itag', 'dur']

# open both files once instead of reopening the output for every line
with open(ENTRADA, 'r') as e, open(SALIDA, 'a') as f:
    for line in e:  # read each line of the file

        b = line.strip().split('&')  # split the line on '&'
        c = {}  # the idea is to build a dictionary with the elements of interest

        for x in b:
            kv = x.split('=', 1)
            if len(kv) != 2:
                continue  # skip fragments that are not key=value

            if kv[0] in ['lmt', 'itag', 'dur']:
                c[kv[0]] = kv[1]
            elif kv[0] == 'mime':  # for mime, the value contains a '%' character
                c[kv[0]] = kv[1].split('%')[0]

        # one CSV row per line; a field that was not found is left empty
        f.write("%s,%s,%s,%s\n" % tuple(c.get(k, '') for k in CAMPOS))
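A minimal sketch of the same splitting idea, written with the standard csv module so that quoting is handled if a value ever contains a comma. The helper name extraer_campos and the sample line are hypothetical: the real lines are not shown in the question, so this assumes the parameters are separated by '&' and that the mime value is followed by '%'. An in-memory buffer stands in for the output file.

```python
import csv
import io

CAMPOS = ["lmt", "mime", "itag", "dur"]


def extraer_campos(line):
    """Return a dict with the fields of interest found in one line (hypothetical helper)."""
    c = {}
    for x in line.strip().split("&"):
        clave, sep, valor = x.partition("=")
        if not sep:
            continue  # fragment without '=', nothing to extract
        if clave in ("lmt", "itag", "dur"):
            c[clave] = valor
        elif clave == "mime":
            c[clave] = valor.split("%")[0]  # keep only the part before '%'
    return c


# Assumed sample line; the real format may differ
linea = "lmt=1541000000000000&mime=video%2Fmp4&itag=135&dur=12.5&x=1"

salida = io.StringIO()  # stands in for open("salida.csv", "w", newline="")
writer = csv.writer(salida)
writer.writerow(CAMPOS)  # header row
campos = extraer_campos(linea)
writer.writerow([campos.get(k, "") for k in CAMPOS])  # empty cell if a field is missing
print(salida.getvalue())
```

With a real file, replacing the buffer by open("salida.csv", "w", newline="") and looping over the input lines should give one row per line.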
    
answered by 13.11.2018 / 15:53
0

Hi, look, I came up with a function that can help you:

def buscarCadenas(cadena, nombre, largo, caracter):
    # Extract from the string the fragment that follows the name passed as a parameter
    inicio = cadena.find(nombre)
    if inicio == -1:
        return ""  # the name does not appear in the string
    fragmento = cadena[inicio + largo:]
    resultado = ""
    # Copy characters until the delimiter is found
    for x in fragmento:
        if x == caracter:
            break
        resultado = resultado + x
    return resultado

The function receives as its first parameter the string in which to look for the values. The parameters "nombre", "largo" and "caracter" are the name to search for (including the "="), the length of that name, and the character that marks the end of the value, so they vary in each case.

This would be the way to use the function:

lmt = buscarCadenas(cadena,"lmt=", 4,"&")
mime = buscarCadenas(cadena,"mime=", 5,"%")
itag = buscarCadenas(cadena,"itag=", 5, "&")
dur = buscarCadenas(cadena,"dur=", 4, "&")
print("lmt: ", lmt)
print("mime: ", mime)
print("itag: ", itag)
print("dur: ", dur)
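As a worked example, here is a self-contained sketch of what those calls return. The sample string is an assumption (the real lines are not shown in the question), and the copy of the function below includes a guard for names that are not found, which may not match the version you are running.

```python
def buscarCadenas(cadena, nombre, largo, caracter):
    # Find where the name starts; -1 means it is not in the string
    inicio = cadena.find(nombre)
    if inicio == -1:
        return ""
    resultado = ""
    # Copy characters after the name until the delimiter appears
    for x in cadena[inicio + largo:]:
        if x == caracter:
            break
        resultado += x
    return resultado


# Assumed sample string; the real format may differ
cadena = "lmt=1541000000000000&mime=audio%2Fmp4&itag=140&dur=30.0&"

lmt = buscarCadenas(cadena, "lmt=", 4, "&")
mime = buscarCadenas(cadena, "mime=", 5, "%")
itag = buscarCadenas(cadena, "itag=", 5, "&")
dur = buscarCadenas(cadena, "dur=", 4, "&")
ausente = buscarCadenas(cadena, "clen=", 5, "&")  # name that is not present

print(lmt, mime, itag, dur, repr(ausente))
```

Since "largo" is always the length of "nombre", passing len(nombre) would avoid having to count the characters by hand.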

I hope it helps :D

    
answered by 13.11.2018 at 15:41
0

You can do it like this:

import re

# one entry per line of the file: [lmt, mime, itag, dur], each a list of matches
with open("file.txt") as f:
    data = [[re.findall('lmt=(.*?)&', x), re.findall('mime=(.*?)%', x),
             re.findall('itag=(.*?)&', x), re.findall('dur=(.*?)&', x)]
            for x in f]

print("{:<18}|{:^10}|{:^10}|{:^10}".format("lmt","mime","itag","duracion"))
for x in data:
    # x[i][0] is the first match; this assumes every line contains all four fields
    print("{:<18}|{:^10}|{:^10}|{:^10}".format(x[0][0],x[1][0],x[2][0],x[3][0] ))
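Note that indexing with [0] raises an IndexError on any line where a field is missing, and 'dur=(.*?)&' only matches when "dur" is followed by another '&'. A possible variant, as a sketch: use re.search with patterns that stop at the delimiter or at the end of the line, and fall back to an empty string. The names PATRONES and extraer and the sample lines are hypothetical, chosen only for illustration.

```python
import re

# Assumed patterns: values end at '&' (or end of line); mime also ends at '%'
PATRONES = {
    "lmt": re.compile(r"lmt=([^&]*)"),
    "mime": re.compile(r"mime=([^%&]*)"),
    "itag": re.compile(r"itag=([^&]*)"),
    "dur": re.compile(r"dur=([^&]*)"),
}


def extraer(linea):
    """Return {field: value}, with "" for fields that do not appear in the line."""
    fila = {}
    for nombre, patron in PATRONES.items():
        m = patron.search(linea)
        fila[nombre] = m.group(1) if m else ""
    return fila


# Assumed sample lines; the real format may differ
lineas = [
    "lmt=1541000000000000&mime=video%2Fmp4&itag=135&dur=12.5",
    "mime=audio%2Fmp4&itag=140&dur=30.0&lmt=1541000000000001",
    "itag=18&dur=5.0",  # lmt and mime are missing here
]

# strip() removes the trailing newline so it does not end up in the last value
filas = [extraer(linea.strip()) for linea in lineas]
for fila in filas:
    print("{:<18}|{:^10}|{:^10}|{:^10}".format(
        fila["lmt"], fila["mime"], fila["itag"], fila["dur"]))
```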
    
answered by 13.11.2018 at 16:49