Delete text after ==

2

A requirements file lists all the packages installed in a Python environment, so that the file can be used elsewhere to rebuild the original environment.

A requirements file looks like this:

alabaster==0.7.9
arrow==0.8.0
awesome-slugify==1.6.5
Babel==2.3.4
binaryornot==0.4.0
blessings==1.6

What I want is to remove the part that indicates the version: in the first line, alabaster==0.7.9, delete ==0.7.9 and leave only alabaster.

I understand that finding a match creates two groups, but I cannot make it work. I'm trying it in awk as follows.

  • When I ask for the first group:

    $ awk -F"==" '{print $1}' base.txt 
    

    I get this:

    alabaster==0.7.9
    arrow==0.8.0
    awesome-slugify==1.6.5
    

    that is, the file is printed unchanged.

  • When I ask for the second group with

    $ awk -F"==" '{print $2}' base.txt
    

    I only get 50 blank lines.

  • ADDITION:

    Now I'm trying the pattern (\w+)(==.) which gives me two match groups; I'm interested in the first one. But if the package is called python-mimeparse there is no longer a match. The pattern should allow hyphens and underscores, in case a package is called paquete_python or paquete-python.

    Addendum 2

    This expression (.+)(==)(.+) finds three groups: the first is the package (which is what I'm looking for) and the third is the version. Now I just need to know how to use it in awk.

    Addendum 3

    I published an answer that solves the problem in Python, but the idea is to solve it with some other tool such as awk, gawk, sed or even perl.

    There are several options in this SOEN post, but I have not been able to use my search pattern with any of them. I do not get errors, but I do not get any results either.

    Some considerations:

    • I'm looking for just the name of the package , not the version
    • There is no package installed, so there is nothing to update
    • The solution can use another tool, such as sed or grep
    asked by toledano 10.03.2017 at 20:26

    4 answers

    2

    The awk -F'==' '{print $1}' archivo solution uses a multi-character field separator ( FS ). This is valid as long as you are using a POSIX-compatible version of awk. On Solaris, for example, it will not work.

    So the question is: how to make it work?

    Let's simplify: the file consists of lines of the form module==version. Therefore, what we can do is remove the = and everything that follows:

    $ cut -d'=' -f1 fichero
    alabaster
    arrow
    awesome-slugify
    Babel
    binaryornot
    blessings
    

    This says: split the line using = as the delimiter ( -d'=' ) and print the first resulting field ( -f1 ).

    This may be a bit fragile, so you can also choose to use sed:

    sed 's/=.*//' fichero
    

    This does the same thing: it deletes everything from the first = onwards. However, it lets you extend the command to something more complex, such as:

    $ sed '/==/s/=.*//' fichero
    alabaster
    arrow
    awesome-slugify
    Babel
    binaryornot
    blessings
    

    This performs the substitution only on lines that contain ==. And if you push me, you could even write:

    sed -n '/==/s/=.*//p' fichero
    

    To print only these lines ( -n inhibits printing by default and p prints the current line).
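
For comparison, the behaviour of that last sed command can be sketched in Python. This is only an illustration and not part of the original answer: the function name strip_versions and the sample list are hypothetical. It keeps only the lines that contain == and drops everything from the first = onwards.

```python
def strip_versions(lines):
    """Mimic sed -n '/==/s/=.*//p': keep lines with '==', cut at the first '='."""
    names = []
    for line in lines:
        line = line.rstrip("\n")
        if "==" in line:  # same filter as the /==/ address in sed
            # partition() splits at the first '=', like s/=.*//
            names.append(line.partition("=")[0])
    return names


# Hypothetical sample: the comment line has no '==' and is skipped.
sample = ["alabaster==0.7.9", "arrow==0.8.0", "# a comment", "awesome-slugify==1.6.5"]
print(strip_versions(sample))
```

Lines without == are silently dropped here, which matches the -n ... p combination rather than the plain sed 's/=.*//' variant.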

    If you really want to use match() in awk, use:

    $ awk 'match($0, /^(.+)==(.+)/, res) {print res[1]}' fichero
    alabaster
    arrow
    awesome-slugify
    Babel
    binaryornot
    blessings
    

    As you can see, the syntax is match(line, pattern, result array); the third argument is a gawk extension. It is then a matter of capturing the groups that interest us: in this case only the first, so in fact we could limit ourselves to match($0, /^(.+)==/, res), without needing to capture the rest.

    In a nutshell: awk does not seem like the best solution here, because in some environments the multi-character field separator may give you problems. Make your life easy in this case: you do not need such complex regular expressions when a simple sed or cut already gives you everything you need.

        
    answered by 11.03.2017 / 00:18
    3

    A. VALUES TO THE LEFT OF ==

    Option 1.

    Capture everything that comes before ==

    ^.*?(?=\=\=)/gm
    

    Option 2.

    Match up to == with a non-capturing group (note that == is then still part of the overall match)

    Thanks @fedorqui

    ^.*?(?:==)/gm
    

    Result

    alabaster
    arrow
    awesome-slugify
    Babel
    binaryornot
    blessings
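
The difference between the two options can be checked with a short Python sketch. Using Python's re module in place of the regex tester is an assumption of this example, and the variable names are illustrative. The lookahead in Option 1 leaves == out of the match, while the non-capturing group in Option 2 still consumes it.

```python
import re

text = "alabaster==0.7.9\narrow==0.8.0\nawesome-slugify==1.6.5"

# Option 1: lookahead, '==' is checked but not included in the match.
lookahead = re.findall(r"^.*?(?===)", text, flags=re.MULTILINE)

# Option 2: non-capturing group, '==' is still part of the matched text.
non_capturing = re.findall(r"^.*?(?:==)", text, flags=re.MULTILINE)

print(lookahead)      # ['alabaster', 'arrow', 'awesome-slugify']
print(non_capturing)  # ['alabaster==', 'arrow==', 'awesome-slugify==']
```

With Option 2, the trailing == would have to be stripped afterwards, or the name wrapped in a capturing group, to obtain the result shown above.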
    

    B. VALUES TO THE RIGHT OF ==

    =.*

    Result

    ==0.7.9
    ==0.8.0
    ==1.6.5
    ==2.3.4
    ==0.4.0
    ==1.6
    
        
    answered by 10.03.2017 at 20:56
    1

    Try this command in Bash:

    cat requirements.txt | grep -oP "\w+[-_]{0,1}\w+"
    

    requirements.txt would be the pip requirements file.

    The important part is the regular expression: the one I include allows for an optional hyphen or underscore separator. I updated @A. Cedano's example so you can see it live.

    If you need to save the result to a file (you most likely do), you can use output redirection:
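
To see what this pattern extracts, here is a small Python sketch that applies the same expression with re.findall, which behaves much like grep -o for this purpose. The sample lines and the anchored alternative pattern are assumptions of this example, not part of the answer.

```python
import re

# Pattern taken from the grep command above.
GREP_PATTERN = re.compile(r"\w+[-_]{0,1}\w+")
# Hypothetical stricter variant: start of line, stop right before '=='.
ANCHORED = re.compile(r"^[\w.-]+(?===)")

samples = ["awesome-slugify==1.6.5", "python-mimeparse==1.6.0", "arrow==0.10.0"]
for line in samples:
    # Print every match of each pattern on the line, like grep -o would.
    print(line, GREP_PATTERN.findall(line), ANCHORED.findall(line))
```

On the last sample the original pattern also returns 10, because a version component with two or more digits satisfies it as well; anchoring at the start of the line and stopping before == avoids that.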

    cat requirements.txt | grep -oP "\w+[-_]{0,1}\w+" > salida.txt
    

    I hope this helps. Regards.

        
    answered by 10.03.2017 at 22:03
    1

    The alternative in Python is as follows:

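    import re

    # Named groups: package name, the '==' separator, and the version.
    r = re.compile(r'(?P<paquete>.+)(==)(?P<version>.+)')
    with open('base.txt') as f:
        for l in f:
            m = r.search(l)
            if m:  # skip lines that do not contain '=='
                print(m.group('paquete'))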
    
    • The re module, which handles regular expressions, is imported on the first line.
    • Since we are going to apply the same search to every line, we create a compiled pattern object with the expression we are looking for:
      • The first group is named paquete and matches any characters, in any quantity.
      • The second group is only a separator, formed by the two equals signs.
      • The third group is named version and matches the rest of the characters after the second group.
    • We go through the requirements file, line by line,
    • and pass each line as an argument to the search (which uses the previously compiled pattern); only the paquete group of the match is printed.
    answered by 10.03.2017 at 22:45