Validate an email in JavaScript that accepts all Latin characters

75

Question

How to validate an e-mail that accepts all Latin characters?

  • By Latin characters I mean accented letters, ñ, ç, and all the others used by Latin-script languages such as Spanish, Portuguese, and Italian.

Context

  • The goal is to display an icon next to the text as the user types in their email address.
  • I'm not interested in accepting every valid case. It was a design decision to cover only the most common addresses, that is, letters (including accented ones and the like) and the symbols ._%+- .
  • I can use code from other sources, as long as they are popular (e.g. jQuery).

Code

document.getElementById('email').addEventListener('input', function(event) {
    // Declare the event parameter and the locals instead of relying on implicit globals
    const campo = event.target;
    const valido = document.getElementById('emailOK');

    const emailRegex = /^[-\w.%+]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i;
    // Text is shown as a placeholder for now; later it will be an icon
    if (emailRegex.test(campo.value)) {
      valido.innerText = "válido";
    } else {
      valido.innerText = "incorrecto";
    }
});
<p>
    Email:
    <input id="email">
    <span id="emailOK"></span>
</p>

Cases

I'm using the regex

/^[-\w.%+]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i

That works perfectly in cases like

[email protected]
[email protected]

But it fails with accents and other Latin letters

germá[email protected]
yo@mi-compañía.com
estaçã[email protected]
    
asked by Mariano 01.12.2015 at 21:55

6 answers

79

With this regular expression you can validate any email address that contains Unicode characters:

/^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i

If you test it in a JavaScript console:

> emailRegex.test("[email protected]");
< true
> emailRegex.test("germá[email protected]");
< true

Source

Building on that, and as you rightly pointed out, an expression better suited to your needs would be the following:

/^(?:[^<>()[\].,;:\s@"]+(\.[^<>()[\].,;:\s@"]+)*|"[^\n"]+")@(?:[^<>()[\].,;:\s@"]+\.)+[^<>()[\]\.,;:\s@"]{2,63}$/i
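For the narrower set described in the question (letters including accented ones, digits, and the symbols ._%+-), a different technique can also be sketched: ES2018 Unicode property escapes with the u flag. This is not the expression from this answer; the name latinEmailRegex and the exact character set are assumptions made for illustration, and the length limits are copied from the regex in the question.

```javascript
// Sketch only: latinEmailRegex is a hypothetical name, not part of the answer.
// \p{Script=Latin} matches any letter of the Latin script (á, ñ, ç, ...),
// \p{M} matches combining marks (for decomposed accents), \d matches 0-9.
// Property escapes require the u flag.
const latinEmailRegex =
  /^[-\p{Script=Latin}\p{M}\d_.%+]{1,64}@(?:[-\p{Script=Latin}\p{M}\d]{1,63}\.){1,125}\p{Script=Latin}{2,63}$/u;

function isLatinEmail(value) {
  return latinEmailRegex.test(value);
}

console.log(isLatinEmail('yo@mi-compañía.com'));
console.log(isLatinEmail('用户@例子.广告'));
```

Because the match is restricted to the Latin script, addresses written in other scripts are rejected, which fits the design decision stated in the question but not general internationalized email.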
    
answered by 22.06.2017 / 15:57
28

Email addresses are subject to certain restrictions; traditionally they are expected to follow these rules:

  • Uppercase and lowercase letters of the English alphabet.
  • Digits from 0 to 9.
  • May contain periods, but not at the start and not consecutively.
  • May use the characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~

Certain kinds of addresses run into restrictions, for example those containing:

  • Greek alphabet.
  • Cyrillic characters.
  • Japanese characters.
  • Latin alphabet with diacritics.

Examples not accepted as valid email addresses:

червь.ca®[email protected]

josé.patroñ[email protected]


Imagine an email address with Cyrillic characters. It gets even worse if you want to store it in a database: which SQL collation should you use?

Still, the question is about how to validate this kind of address, and this script helps with that task:

function validarEmail(valor) {
  if (/^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$/i.test(valor)){
   alert("La dirección de email " + valor + " es correcta!.");
  } else {
   alert("La dirección de email es incorrecta!.");
  }
}

For example:

validarEmail("jorgé[email protected]");

The script would show you that the email address is correct.

  • Update:

Currently it is possible to use international characters in domain names and email addresses.

Traditional email addresses are limited to English alphabet characters and some other special characters. The following are valid traditional email addresses:

  [email protected]                                (English, ASCII)
  [email protected]                            (English, ASCII)
  user+mailbox/[email protected]   (English, ASCII)
  !#$%&'*+-/=?^_'.{|}[email protected]               (English, ASCII)
  "Abc@def"@example.com                          (English, ASCII)
  "Fred Bloggs"@example.com                      (English, ASCII)
  "Joe.\Blow"@example.com                       (English, ASCII)

International email, on the other hand, uses Unicode characters encoded as UTF-8, which allows you to encode the address text in most of the world's writing systems.

The following are all valid international email addresses:

  用户@例子.广告                   (Chinese, Unicode)
  अजय@डाटा.भारत                    (Hindi, Unicode)
  квіточка@пошта.укр             (Ukrainian, Unicode)
  θσερ@εχαμπλε.ψομ               (Greek, Unicode)
  Dörte@Sörensen.example.com     (German, Unicode)
  аджай@экзампл.рус              (Russian, Unicode)
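Internationalized domain names such as the ones above have an ASCII form (Punycode labels starting with xn--). A minimal sketch, assuming an environment with the standard URL class (browsers and Node.js), converts only the domain part so that it can then be checked or stored as ASCII. The function name toAsciiAddress is hypothetical, and the local part is left untouched.

```javascript
// Sketch only: toAsciiAddress is a hypothetical helper, not from the answer.
// It relies on the standard URL parser to convert an internationalized
// domain to its ASCII (Punycode) form; the local part is not modified.
function toAsciiAddress(address) {
  const at = address.lastIndexOf('@');
  if (at < 1 || at === address.length - 1) return null;
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  // Reject characters that would change how the URL parser reads the host.
  if (/[\s\/\\?#@:\[\]]/.test(domain)) return null;
  try {
    return local + '@' + new URL('http://' + domain).hostname;
  } catch (err) {
    return null; // the parser rejected the host
  }
}

console.log(toAsciiAddress('yo@mi-compañía.com'));
console.log(toAsciiAddress('квіточка@пошта.укр'));
```

The converted domain contains only ASCII characters, so it can be matched against an ASCII-only domain pattern even when the user typed accented letters.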
    
answered by 01.12.2015 at 22:14
17

I found an article (in English) that discusses several regular expressions for checking email addresses against the RFC standard. Many different regular expressions are recommended and there is no single all-in-one solution, but this is probably the one I would go with, adding accented characters to the list of valid characters as well.

\A[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z
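The expression above uses the anchors \A and \z, which JavaScript regular expressions do not have. A sketch of a JavaScript adaptation follows: ^ and $ take their place and the i flag covers uppercase letters. The second pattern adds accented letters to the local part only; the Latin-1 ranges chosen (U+00C0 to U+00FF without × and ÷) are an assumption about which accented characters to accept, and both variable names are hypothetical.

```javascript
// Sketch only: a JavaScript adaptation of the expression above.
// ^ and $ replace \A and \z; the i flag makes a-z match uppercase too.
const rfcLike =
  /^[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

// Same pattern with accented Latin-1 letters allowed in the local part.
// The ranges are an assumed choice; the domain part stays ASCII-only.
const rfcLikeAccents =
  /^[a-z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/i;

console.log(rfcLike.test('germán@bla.com'));
console.log(rfcLikeAccents.test('germán@bla.com'));
```

An address with accents in the domain, such as yo@mi-compañía.com, is still rejected by both patterns because only the local part was extended.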
    
answered by 01.12.2015 at 22:19
14
  

How to validate an email that accepts all Latin characters?

The only 100% sure way to verify that an email address is valid is to send a message to it. If the user typed the wrong address, they will simply try again.

According to RFC 5322, [email protected] is a "valid" address, but is anyone going to receive the message? Is there a server behind the domain that accepts email? Those are the concerns you should have. Whatever you are building (a distribution list, a registration form, etc.), you must send a confirmation email to validate the address. The implementation depends on the stack you use (C#, PHP, Java?), and you will end up with valid addresses that someone actually reads.

You can implement something on the client side that at least says "this looks like an email address", but it should not be your "validation" tool; it only helps the user notice that what they typed is #($^%#$@^(#$^.com
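As a sketch of such a client-side hint, the deliberately loose check below only asks whether the text has the rough shape of an address and makes no claim about RFC validity. The function name looksLikeEmail and the exact rules are assumptions chosen for illustration.

```javascript
// Sketch only: looksLikeEmail is a hypothetical helper. It gives a hint to
// the user while typing; the confirmation email remains the real validation.
function looksLikeEmail(value) {
  if (/\s/.test(value)) return false;          // no whitespace anywhere
  const at = value.lastIndexOf('@');
  if (at < 1) return false;                    // needs text before the @
  const domain = value.slice(at + 1);
  return domain.includes('.') &&               // domain needs a dot
         !domain.startsWith('.') &&            // that is not the first
         !domain.endsWith('.');                // or the last character
}

console.log(looksLikeEmail('germán@bla.com'));
console.log(looksLikeEmail('germán@bla'));
```

Since it does not restrict the characters at all, accented letters pass without any special handling.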

    
answered by 02.12.2015 at 23:34
12

I just want to point out that, according to the official specification, the regex that represents an orthographically (that is, syntactically) valid email address is the following:

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

I deliberately wrote orthographically valid, because what really makes an email address valid is that it works, that is, that it exists and can receive email.

It follows that checking with JavaScript alone is not enough. It can help with [orthographic] validation, provided that JavaScript is enabled on the client side.

If you want to verify that the address really exists, there is no way other than sending an email and having the recipient respond. That is what can be called [real] validation of an email address.

In fact, that is what every serious subscription service does: it sends an email that we must confirm before we are definitively registered on its site or distribution list.

Let me lay out the steps needed to validate an e-mail address. What is discussed here is only stage 2 of a validation process with 5 stages:

  • Stage 1: The user types an e-mail address
  • Stage 2: Orthographic validation of the address typed by the user
  • Stage 3: Check whether the domain of the orthographically valid address has an e-mail server
  • Stage 4: Send a request (ping) or an e-mail to verify that the server accepts e-mail
  • Stage 5: The e-mail is received correctly at that address

Until we reach stage 5, we cannot say that the e-mail address has been validated.
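Stage 3 cannot be done from the browser, but it can be sketched on the server. The example below assumes a Node.js backend, which the answer does not mention, and uses the dns module's resolveMx to ask whether the domain publishes any MX records. The function name hasMailServer and the injectable resolver parameter are hypothetical and exist only to make the sketch testable.

```javascript
// Sketch only, assuming a Node.js server: hasMailServer is a hypothetical
// helper for stage 3. It checks whether the domain publishes MX records.
const dns = require('dns');

async function hasMailServer(domain, resolveMx = (d) => dns.promises.resolveMx(d)) {
  try {
    const records = await resolveMx(domain); // [{ exchange, priority }, ...]
    return records.length > 0;
  } catch (err) {
    return false; // lookup failed or the domain has no MX records
  }
}

// Usage: the result depends on the network and on the domain's DNS.
hasMailServer('example.com').then((ok) => console.log(ok));
```

A positive answer only means that some server is announced for the domain; stages 4 and 5 are still needed before the address can be considered validated.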

If the OP nevertheless needs a validation method that accepts addresses with ñ and other characters not yet allowed by the official w3.org specification (link above), the regex mentioned in a previous answer works.

The code that follows is the same as in the question, but it applies both the official regex and the regex that allows Latin characters such as ñ.

document.getElementById('email').addEventListener('input', function(event) {
    // Declare the event parameter and the locals instead of relying on implicit globals
    const campo = event.target;
    const valido = document.getElementById('emailOK');

    // Permissive regex that accepts Latin characters such as ñ in the local part
    const reg = /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

    // Regex from the official specification
    const regOficial = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

    // Text is shown as a placeholder for now; later it will be an icon
    if (reg.test(campo.value) && regOficial.test(campo.value)) {
      valido.innerText = "válido oficial y extraoficialmente";
    } else if (reg.test(campo.value)) {
      valido.innerText = "válido extraoficialmente";
    } else {
      valido.innerText = "incorrecto";
    }
});
<p>
    Email:
    <input id="email">
    <span id="emailOK"></span>
</p>

[Orthographic] validation in HTML5

HTML5 lets us declare our input with the type email and takes care (in part) of the validation for us, as MDN says:

  

email: The field represents an email address. Line breaks are automatically stripped from the entered value. An invalid email address can be typed, but the field will only be valid if the address satisfies the ABNF production 1*( atext / "." ) "@" ldh-str 1*( "." ldh-str ), where atext is defined in RFC 5322, section 3.2.3, and ldh-str is defined in RFC 1034, section 3.5.

You can combine the email type with the pattern attribute:

  

pattern: A regular expression that the value is checked against. The pattern must match the entire value, not just a part of it. The title attribute can be used to describe the pattern to help the user. This attribute applies when the type attribute is text, search, tel, url, email, or password; otherwise it is ignored. The regular expression language is the same as the JavaScript RegExp algorithm, with the 'u' parameter, which treats the pattern as a sequence of Unicode code points. The pattern is not surrounded by forward slashes.

The disadvantage is that not all clients support HTML5.

<form>
<input type="email" pattern='^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$' title="Entre un email válido"  placeholder="Entre su email">
<input type="submit" value="Submit">
</form>
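To show the result while the user types, as the question asks, the browser's own verdict on such a field can be read from its validity property (the standard ValidityState object). The sketch below is an assumption about how that could be wired up: the helper name emailStatus is hypothetical, and the ids in the commented usage are the ones from the question's snippet.

```javascript
// Sketch only: emailStatus is a hypothetical helper. It maps the
// ValidityState of an <input type="email"> to the labels used in the question.
function emailStatus(input) {
  if (input.value === '') return '';           // nothing typed yet
  const v = input.validity;
  // typeMismatch: the built-in email check failed
  // patternMismatch: the pattern attribute did not match
  if (v.typeMismatch || v.patternMismatch) return 'incorrecto';
  return 'válido';
}

// In the page (assumes the ids used in the question's snippet):
// const campo = document.getElementById('email');
// campo.addEventListener('input', () => {
//   document.getElementById('emailOK').innerText = emailStatus(campo);
// });
```

Since the production quoted above is built from ASCII characters, an accented local part may well be reported as a typeMismatch by the built-in check, so this approach probably suits the official rules better than the permissive regex.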
    
answered by 18.03.2017 at 02:50
10

According to RFC 6531, far more characters than we are used to would have to be supported, but servers restrict addresses according to the earlier RFCs. I do not see a solution with a single range that covers "all Latin characters": although they appear to be contiguous (as in this table from 0080 to 00FF), there are other characters in between.

A possible regex for the Latin characters that might interest you (source), including the (suggestion):

/[A-Za-z\u0021-\u007F\u00C0-\u00D6\u00D8-\u00f6\u00f8-\u00ff]+/g

You could combine it with your regex, with the ones already given above, or with one based on RFC 2822 like the following, so that you do not exclude the ranges you care about (there are many kinds of accent marks) (source):

^([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x22([^\x0d\x22\x5c\x80-\xff]|\x5c[\x00-\x7f])*\x22))*\x40([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d)(\x2e([^\x00-\x20\x22\x28\x29\x2c\x2e\x3a-\x3c\x3e\x40\x5b-\x5d\x7f-\xff]+|\x5b([^\x0d\x5b-\x5d\x80-\xff]|\x5c[\x00-\x7f])*\x5d))*$
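As a sketch of the combination suggested above, the accented Latin-1 ranges from the first regex in this answer can be substituted into the regex from the question in place of \w and A-Z. The variable names are hypothetical, and the result only covers U+00C0 to U+00FF (without × and ÷), so letters from other Latin blocks, such as ő or ł, are still rejected.

```javascript
// Sketch only: the question's regex with the Latin-1 letter ranges from this
// answer substituted for \w and A-Z. Names here are hypothetical.
const latin = 'A-Za-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF';

const emailLatin1 = new RegExp(
  `^[-${latin}0-9_.%+]{1,64}` +        // local part: letters, digits, _ . % + -
  `@(?:[${latin}0-9-]{1,63}\\.){1,125}` + // one or more domain labels
  `[${latin}]{2,63}$`                  // top-level domain, letters only
);

console.log(emailLatin1.test('yo@mi-compañía.com'));
console.log(emailLatin1.test('a×b@bla.com'));
```

Building the pattern from a shared string keeps the three character classes in sync if more ranges are added later.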
    
answered by 02.12.2015 at 01:46