Problem when retrieving data on Google Sheet using RegexExtract

4

I have a database in Google Spreadsheet where there are cells that have more than one e-mail address. I am occupying the function RegexExtract , but it only extracts the first address of the e-mail that it finds.

How can I make a loop or rescue all the e-mails that are in the same cell?

Cell E2 contains 3 e-mail addresses. Cell F2 has function =Regexextract(E2;"[A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4}") that extracts only the first e-mail.

How do I get to loop until the last e-mail (could it be more than 3)?

    
asked by Fernando Paredes 01.02.2016 в 20:15
source

2 answers

2

For this solution it would be much better and, above all, more legible, to build a script. However, just for fun, I give you two solutions:


Option 1: easy, with REGEXREPLACE

Deleting all what is not an email. For that:

  • Match the text that is before the email
  • Match the email (and capture it)
  • Replace all of the above (1 and 2) with the value captured in 1.
  • Also add a ; to separate the emails, otherwise it would not be different from one another.
  • Solution

    =REGEXREPLACE(E2;"[^A-z0-9._%+-]*(?:[A-z0-9._%+-][^A-z0-9._%+-]*)*?([A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4})";"$1;")
    

    Description

    #TEXTO ANTES DEL EMAIL
    [^A-z0-9._%+-]*              # Consumir los caracteres que no pueden estar al principio del mail
    (?:                          # REPETIR 0+, la menor cant. de veces
        [A-z0-9._%+-]            #   Caracteres que pueden estar, pero no son el mail
        [^A-z0-9._%+-]*          #   Caracteres que no pueden estar al principio del mail
    )*?                          # FIN REPETIR
    
    #EL EMAIL
    ([A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4})  # Entre paréntesis lo captura en $1
    

    And we replace it with "$1;" . That is, the email and a semicolon to separate them.


    Option 2: with REGEXEXTRACT and magic in the regular expression

    What if we wanted to get 1 single email from those in the text? If we are interested in choosing the first, the second, one in particular?

    Well, it can be done perfectly, but from now on I anticipate, it will not be easy to understand, and above all, a nightmare when it comes to maintaining it. So consider it a fun thing to see, but do not use it if at some point you may want to modify it.

    We use a column to put what number of emails we want, and in the next one we get that email:

    Solution

    =RegexExtract(B2;"(?:[A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4}(?:[^A-z0-9._%+-][A-z0-9._%+-]*)*?[^A-z0-9._%+-]+){" & C2 - 1 & "}([A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4})")
    

    Description

    Using the same logic as before, the text is consumed between emails, but this time it is required to repeat N times (the value of the cell) before coinciding with the mail. And we capture the last mail (which is the value returned by the function).

    (?:                                          # REPETIR
        [A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4}   #   Coincidir con email
        (?:[^A-z0-9._%+-][A-z0-9._%+-]*)*?       #   Texto que no es
        [^A-z0-9._%+-]+                          #   Tiene que separar 2 mails
    ){⋘n⋙}                                     # FIN REPETIR n veces 
                                                 # (⋘n⋙ se cambia por la celda)
    ([A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4})     # Capturar email
    
        
    answered by 02.08.2016 в 09:23
    1

    What if you use the function REGEXREPLACE , to iteratively search all matches according to your validation, and then replace each line by the coincidence of the first parenthesis.

    =REGEXREPLACE(E2;"^([A-z0-9._%+-][email protected][A-z0-9.-]+\.[A-z]{2,4})";"$1")
    

    Explanation :

    ^ : Start line

    () : First capture

    $1 : Replace for first capture

        
    answered by 02.02.2016 в 14:05