Problem when retrieving data on Google Sheet using RegexExtract

Question

I have a database in Google Sheets where some cells contain more than one e-mail address. I am using the REGEXEXTRACT function, but it only extracts the first e-mail address it finds.

How can I loop over the cell, or otherwise retrieve all the e-mail addresses it contains?

Cell E2 contains 3 e-mail addresses. Cell F2 has the formula =Regexextract(E2;"[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}"), which extracts only the first e-mail.

How can I keep going until the last e-mail (there could be more than 3)?

regex

asked by Fernando Paredes 01.02.2016 at 19:15

2 answers

score 2 · Answer 1

For this problem it would be much better, and above all more readable, to write a script. However, just for fun, here are two formula-only solutions.
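For reference, before the formula-based options, the script route mentioned above could look roughly like the following Apps Script custom function. This is only a minimal sketch: the function name EXTRACT_EMAILS and the optional separator parameter are assumptions made for illustration, and the pattern is the one from the question, run here with JavaScript's regex engine rather than the one used by the sheet formulas.

```javascript
// Hypothetical custom function; the name EXTRACT_EMAILS is an assumption.
// Same e-mail pattern as in the question, with the "g" flag so that
// String.prototype.match returns every match instead of only the first.
var EMAIL_PATTERN = /[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}/g;

/**
 * Returns every e-mail address found in the text, joined by a separator.
 * @param {string} text The cell content to scan.
 * @param {string} separator Optional separator, ";" by default.
 * @return {string} The addresses found, or an empty string if there are none.
 * @customfunction
 */
function EXTRACT_EMAILS(text, separator) {
  var sep = separator === undefined ? ";" : separator;
  var matches = String(text).match(EMAIL_PATTERN);
  return matches ? matches.join(sep) : "";
}
```

Once saved in the spreadsheet's script editor, it would be called from a cell as =EXTRACT_EMAILS(E2).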

Option 1: easy, with REGEXREPLACE

The idea is to delete everything that is not an email. To do that:

1. Match the text that comes before the email.

2. Match the email (and capture it).

3. Replace everything matched in steps 1 and 2 with the value captured in step 2.

Also add a ; after each email to separate them; otherwise they would run together.

Solution

=REGEXREPLACE(E2;"[^A-z0-9._%+-]*(?:[A-z0-9._%+-][^A-z0-9._%+-]*)*?([A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4})";"$1;")

Description

# TEXT BEFORE THE EMAIL
[^A-z0-9._%+-]*              # Consume characters that cannot start an email
(?:                          # REPEAT 0+ times, as few as possible
    [A-z0-9._%+-]            #   A character that could be in an email, but is not part of one
    [^A-z0-9._%+-]*          #   Characters that cannot start an email
)*?                          # END REPEAT

# THE EMAIL
([A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4})  # The parentheses capture it in $1

Each match is then replaced with "$1;", that is, the email followed by a semicolon to separate it from the next one.
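To see what this replacement does to a concrete string, the same pattern can be tried outside the spreadsheet. The sketch below uses JavaScript's String.prototype.replace with the "g" flag to stand in for REGEXREPLACE; the helper name keepOnlyEmails and the sample text are assumptions made for illustration.

```javascript
// Same pattern as in the REGEXREPLACE formula. The "g" flag makes replace()
// act on every match, which is how REGEXREPLACE behaves.
var BEFORE_AND_EMAIL = /[^A-z0-9._%+-]*(?:[A-z0-9._%+-][^A-z0-9._%+-]*)*?([A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4})/g;

// Hypothetical helper: replaces "text before the email + email" with "email;".
function keepOnlyEmails(text) {
  return String(text).replace(BEFORE_AND_EMAIL, "$1;");
}

var result = keepOnlyEmails("Contact: ana@example.com, bob@test.org and eve@mail.net");
// result is "ana@example.com;bob@test.org;eve@mail.net;"
```

Note that the pattern only consumes text that is followed by an email, so anything after the last address is left in place: "a@b.co and more" becomes "a@b.co; and more".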

Option 2: with REGEXEXTRACT and magic in the regular expression

What if we wanted to get a single email out of those in the text: the first, the second, or any particular one?

It can be done, but I warn you up front that it will not be easy to understand and, above all, will be a nightmare to maintain. So treat it as a curiosity, and do not use it if you might ever need to modify it.

We use one column (C) to hold the position of the email we want (1 for the first, 2 for the second, and so on), and in the next column we extract that email:

Solution

=RegexExtract(B2;"(?:[A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}(?:[^A-z0-9._%+-][A-z0-9._%+-]*)*?[^A-z0-9._%+-]+){" & C2 - 1 & "}([A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4})")

Description

Using the same logic as before, the text between emails is consumed, but this time an "email plus following text" block is required to repeat n - 1 times (n being the value in the cell) before matching the email we want. That last email is captured, and it is the value the function returns.

(?:                                          # REPEAT
    [A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4}   #   Match an email
    (?:[^A-z0-9._%+-][A-z0-9._%+-]*)*?       #   Text that is not an email
    [^A-z0-9._%+-]+                          #   Something must separate 2 emails
){⋘n⋙}                                     # END REPEAT, ⋘n⋙ times
                                             # (⋘n⋙ is replaced by the cell value minus 1)
([A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4})     # Capture the email
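The way the formula splices the cell value into the pattern can be reproduced in a few lines of JavaScript. This is only a sketch under assumptions: the helper name nthEmail and the sample text are made up for illustration, and JavaScript's regex engine stands in for the one used by REGEXEXTRACT.

```javascript
// Building blocks taken from the formula above.
var EMAIL = "[A-z0-9._%+-]+@[A-z0-9.-]+\\.[A-z]{2,4}";
var BETWEEN = "(?:[^A-z0-9._%+-][A-z0-9._%+-]*)*?[^A-z0-9._%+-]+";

// Hypothetical helper: returns the n-th email in the text (1-based),
// or null when n is not a positive integer or there are fewer than n emails.
function nthEmail(text, n) {
  if (typeof n !== "number" || n < 1 || Math.floor(n) !== n) {
    return null;
  }
  // Skip (n - 1) "email + text in between" blocks, then capture the next email.
  var pattern = new RegExp(
    "(?:" + EMAIL + BETWEEN + "){" + (n - 1) + "}(" + EMAIL + ")"
  );
  var match = String(text).match(pattern);
  return match ? match[1] : null;
}

var text = "ana@example.com, bob@test.org and eve@mail.net";
var second = nthEmail(text, 2); // "bob@test.org"
var third = nthEmail(text, 3);  // "eve@mail.net"
```

Asking for a position beyond the last email finds no match, so this helper returns null in that case.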

score 1 · Answer 2

What if you use the REGEXREPLACE function to search iteratively for all matches of your validation pattern, and then replace each line with the match of the first pair of parentheses?

=REGEXREPLACE(E2;"^([A-z0-9._%+-]+@[A-z0-9.-]+\.[A-z]{2,4})";"$1")

Explanation:

^ : start of line

() : first capture group

$1 : replaced with the first capture