Regular expression to extract the Amazon ASIN from a URL

I'm trying to extract the ASIN code from a URL in PHP, but I can't figure out what I'm doing wrong: my code does not return the string I want to extract.

In this case, I want to extract only the ASIN code (B01ETRGE7M) from:

$url = "/Fire-TV-Stick-Basic-Edition/dp/B01ETRGE7M/ref=zg_bs_electronics_1/257-1105669-2334965?_encoding=UTF8&psc=1&refRID=GC2WM7K4BAH0E68ZWRKS";

function asaber($valor){
    return preg_match("/^\/[dp]{2}\/\w{10}$/", $valor);
}

echo asaber($url);

The ASIN code:

  • It is not always the third segment of the URL: its position can vary, and so can its first character, which may also be a digit.

  • It will always contain numbers and letters, in uppercase and lowercase.

  • The URL is from Amazon.

asked by Fumatamax on 29.01.2018 at 13:06

    1 answer

    According to what I read in:

  • Determine if 10 digit string is valid Amazon ASIN
  • Get ASIN from pasted Amazon url
  • How do you write to that match only when there's a slash OR nothing after the match?

    there is no specific syntax that distinguishes an ASIN, except that:

    • It starts with B followed by exactly 9 more uppercase alphanumeric characters, or it is an ISBN (9 digits followed by a digit or X).
    • It must be immediately followed by / , ? , # , or the end of the text.
    • It is usually preceded by
      • /o/ , /ASIN/ or /e/ ; or
      • /dp/ or /gp/product/ , optionally with one more folder in between, before the ASIN.

    These rules were inferred by trial and error, so in the future they may fail to match some codes. They are also not very strict, so you can expect false positives.


    To match a B followed by 9 uppercase alphanumeric characters, or an ISBN:

    ~/\K(?:B[\dA-Z]{9}|\d{9}[\dX])(?=$|[/?#])~
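
As a quick check, here is a minimal sketch that applies this pattern to the URL from the question. The variable names are only illustrative. `\K` drops the leading slash from the reported match, and the lookahead requires `/`, `?`, `#` or the end of the string right after the code.

```php
<?php
// URL taken from the question.
$url = "/Fire-TV-Stick-Basic-Edition/dp/B01ETRGE7M/ref=zg_bs_electronics_1/257-1105669-2334965?_encoding=UTF8&psc=1&refRID=GC2WM7K4BAH0E68ZWRKS";

// B + 9 uppercase alphanumerics, or 9 digits + digit/X (ISBN),
// only when a slash comes right before it.
$pattern = '~/\K(?:B[\dA-Z]{9}|\d{9}[\dX])(?=$|[/?#])~';

// preg_match() fills $match; $match[0] holds the text matched after \K.
$asin = preg_match($pattern, $url, $match) ? $match[0] : null;

var_dump($asin); // string(10) "B01ETRGE7M"
```

The order/session-like segment `257-1105669-2334965` is not picked up, because it contains hyphens and so never forms 10 consecutive characters of the required shape.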
    

    And if we also require it to be preceded by one of the folders that usually appear:

    ~/(?:[eo]|ASIN|(?:dp|gp/product)(?:/[^/]+)??)/\K(?:B[\dA-Z]{9}|\d{9}[\dX])(?=$|[/?#])~
    


    Code:

    function extraerASIN($url){
        $regex = '~/(?:[eo]|ASIN|(?:dp|gp/product)(?:/[^/]+)??)/\K(?:B[\dA-Z]{9}|\d{9}[\dX])(?=$|[/?#])~';
        if (preg_match( $regex, $url, $match)) {
            return $match[0];
        } else {
            return false; // Not found
        }
    }
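
As for why the attempt in the question returns nothing useful, three things seem to be at play. `preg_match()` returns 1 or 0 (or false on error), not the matched text. The `^` and `$` anchors require the whole string to be just `/dp/` plus 10 characters. And `[dp]{2}` is a character class, so it would also accept `dd`, `pp` or `pd`. A minimal sketch of that same function with only those points changed, keeping the looser `\w{10}` from the question:

```php
<?php
// Sketch: the function from the question with its three issues addressed.
function asaber($valor) {
    // No ^ or $ anchors: the code sits in the middle of the path.
    // Literal "dp" instead of the character class [dp]{2}.
    // The parentheses capture the 10 characters so they can be returned.
    if (preg_match('~/dp/(\w{10})(?=$|[/?#])~', $valor, $m)) {
        return $m[1];
    }
    return false; // no /dp/ folder followed by exactly 10 word characters
}

$url = "/Fire-TV-Stick-Basic-Edition/dp/B01ETRGE7M/ref=zg_bs_electronics_1/257-1105669-2334965?_encoding=UTF8&psc=1&refRID=GC2WM7K4BAH0E68ZWRKS";
echo asaber($url); // B01ETRGE7M
```

This version only handles the `/dp/` folder and accepts lowercase letters and underscores, so it is looser than the rules listed above.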
    


    In any case, extraerASIN() can still return false positives. The recommended approach is to use the Amazon API.
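
To illustrate the kind of false positive meant here: the patterns only check the shape of a path segment, so any 10-character segment that looks right is accepted. The sketch below uses the looser pattern and two paths invented for this example. The stricter pattern would accept the same segments if they came right after one of the listed folders, such as `/e/`.

```php
<?php
// Shape-only pattern: B + 9 uppercase alphanumerics, or 9 digits + digit/X.
$pattern = '~/\K(?:B[\dA-Z]{9}|\d{9}[\dX])(?=$|[/?#])~';

// Hypothetical paths, made up for illustration; neither holds a product code.
$paths = [
    '/help/BESTSELLER/index.html',
    '/orders/1234567890?page=2',
];

$results = [];
foreach ($paths as $path) {
    // Store the matched segment, or null when nothing matches.
    $results[$path] = preg_match($pattern, $path, $match) ? $match[0] : null;
}

print_r($results); // both paths yield a "code": BESTSELLER and 1234567890
```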

    answered on 29.01.2018 at 14:51