Python: convert a CamelCase string to one separated by hyphens


Hi, I need to convert a CamelCase string into one separated by hyphens. I've been trying a few regular expressions, but I can't get any of them to work. The idea is to take a string in CamelCase as input:






asked by Ricardo D. Quiroga 08.04.2017 at 16:29

2 answers


You can use re.sub to substitute each match (in this case, a capital letter inside the string) with another given string (in this case, '-'). To get rid of the capitals you can use the lower method of the str class:

import re

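# Backreferences \1 and \2 keep the characters on both sides of the new hyphen
pattP = re.compile(r'(.)([A-Z][a-z]+)')    # any character + capitalized word
pattF = re.compile(r'([a-z0-9])([A-Z])')   # lowercase letter or digit + capital

def camel_a_guiones(cadena):
    return pattF.sub(r'\1-\2', pattP.sub(r'\1-\2', cadena)).lower()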


Another alternative is to use re.finditer to split the string into words (this is also useful if we want a list of the words contained in the CamelCase string). Once we have the words, it is enough to join them again with the join() method of str:

import re

patt = re.compile(r'.+?(?:(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)')
def camel_a_guiones(cadena):
    # Each match is one word; join them with hyphens and lowercase the result
    return '-'.join(m.group(0) for m in re.finditer(patt, cadena)).lower()
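
As a side note on the "list of words" use mentioned above, here is a minimal sketch that reuses the same finditer pattern but returns the words instead of joining them. The helper name camel_a_palabras is hypothetical (it does not appear in the answer), and the sample inputs are only illustrative.

```python
import re

# Same boundary pattern as in the answer: a word ends where a lowercase letter
# or digit is followed by a capital, where an acronym is followed by a
# capitalized word, or at the end of the string.
patt = re.compile(r'.+?(?:(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)')

def camel_a_palabras(cadena):  # hypothetical helper name, for illustration only
    """Return the words of a CamelCase string as a list."""
    return [m.group(0) for m in patt.finditer(cadena)]

print(camel_a_palabras('getHTTPResponseCode'))  # ['get', 'HTTP', 'Response', 'Code']
print(camel_a_palabras('CamelCase'))            # ['Camel', 'Case']
```

Joining that list with '-' and calling lower() on the result gives the hyphenated form.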


Output of both:



answered by 08.04.2017 / 17:09

Let's try to match the beginning of each word. In PascalCase there are 2 types of words:

  • Words that start with an uppercase letter, followed by at least one lowercase letter.

        [A-Z][a-z]

    Here we only need to verify that the capital is followed by a lowercase letter (that is all we need to know in order to put a hyphen before it and convert it to lowercase).

    There could also be digits between the two letters, so we allow for them:

        [A-Z]\d*[a-z]

  • Acronyms (consecutive capital letters).


    Match 1 uppercase letter, followed by more uppercase letters or digits: [A-Z][A-Z\d]* .

    But it must also be followed by another capital letter or by the end of the text: (?=[A-Z]|$) .
    That way, we avoid consuming the first letter of the next word. For example:

    • It matches HTML in HTMLFormateado .
    • And it also matches HTML in FormatoHTML .
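
    To see what each piece does on its own, here is a small sketch that tries the two sub-patterns separately with findall. The constant names are hypothetical, and the variant without the lookahead is included only to show why the lookahead is needed.

```python
import re

# The two sub-patterns described above, tried on their own.
# All three names are hypothetical, chosen just for this illustration.
INICIO_PALABRA = re.compile(r"[A-Z]\d*[a-z]")        # capital, optional digits, lowercase
SIGLA = re.compile(r"[A-Z][A-Z\d]*(?=[A-Z]|$)")      # acronym, stops before the next word
SIGLA_SIN_LOOKAHEAD = re.compile(r"[A-Z][A-Z\d]*")   # same, but without the lookahead

print(SIGLA.findall("HTMLFormateado"))                # ['HTML']
print(SIGLA.findall("FormatoHTML"))                   # ['HTML']
print(SIGLA_SIN_LOOKAHEAD.findall("HTMLFormateado"))  # ['HTMLF']
print(INICIO_PALABRA.findall("FormatoHTMLConCSS"))    # ['Fo', 'Co']
```

    Without the lookahead, the acronym pattern also swallows the F of Formateado, which is exactly the case the lookahead prevents.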

    Putting the two previous expressions together into one, we are left with:

        [A-Z](?:[A-Z\d]*(?=[A-Z]|$)|\d*[a-z])

    This expression already matches all cases. If we replace each match with r"-\g<0>" (a hyphen followed by the text that matched), we have:

    >>> import re
    >>> re.sub(r"[A-Z](?:[A-Z\d]*(?=[A-Z]|$)|\d*[a-z])", r"-\g<0>", "FormatoHTMLConCSS")
    '-Formato-HTML-Con-CSS'

    Do not insert a hyphen at the beginning of the text

    To avoid inserting a hyphen at the beginning, we pass a function as the replacement argument. On each replacement it checks whether match.start() is 0. If it is the first word (it starts at position 0), we do not add a hyphen; otherwise we put a hyphen in front.

    Inside the function, we use str.lower() to convert the match to lowercase.

    import re
    patron = r"[A-Z]\d*(?:[A-Z\d]*(?=[A-Z]|$)|[a-z])"
    pascal = re.compile(patron)
    def pascal_kebab(cadena):
        def insertar_separador(match):
            # No hyphen for the first word (start() == 0); lowercase the match
            return ("-" if match.start() else "") + match.group().lower()
        return pascal.sub(insertar_separador, cadena)

    Final code

    Convert from PascalCase to kebab-case.
    We use exactly the same logic as in the previous code, this time with a lambda.

    • Because it uses a single regex and does not rely on lookbehinds, this function performs better (30% to 100% faster) than the commonly used functions.

    import re
    pascal = re.compile(r"[A-Z]\d*(?:[A-Z\d]*(?=[A-Z]|$)|[a-z])")
    def pascal_kebab(cadena):
        return pascal.sub(lambda m: ("-" if m.start() else "") + m.group().lower(), cadena)


    pruebas = ['VerHTMLDePag', 'Ver2HTMLDePag', 'Ver2HTMLPag2Info', 'HTMLFomatoPag',
               'HTMLConXML',   'HTML5FomatoPag','HTML5ConXML',      'HTML5ConCSS3',
               'HTML',         'VerQ',          'A2BFormato',       'Formato',
               'SFormato']
    for prueba in pruebas:
        print("%-16s => %s" % (prueba, pascal_kebab(prueba)))


    VerHTMLDePag     => ver-html-de-pag
    Ver2HTMLDePag    => ver2-html-de-pag
    Ver2HTMLPag2Info => ver2-html-pag2-info
    HTMLFomatoPag    => html-fomato-pag
    HTMLConXML       => html-con-xml
    HTML5FomatoPag   => html5-fomato-pag
    HTML5ConXML      => html5-con-xml
    HTML5ConCSS3     => html5-con-css3
    HTML             => html
    VerQ             => ver-q
    A2BFormato       => a2b-formato
    Formato          => formato
    SFormato         => s-formato
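
    The speed claim above is easy to check on your own machine. The following is only a rough sketch using timeit: the helper comparar and its repeticiones parameter are hypothetical, the two-regex function is written with the backreferences \1-\2 it needs, and the actual numbers will vary with the Python version and the input string.

```python
import re
import timeit

# Two-regex approach, with the backreferences written out.
pattP = re.compile(r'(.)([A-Z][a-z]+)')
pattF = re.compile(r'([a-z0-9])([A-Z])')

def camel_a_guiones(cadena):
    return pattF.sub(r'\1-\2', pattP.sub(r'\1-\2', cadena)).lower()

# Single-regex approach.
pascal = re.compile(r"[A-Z]\d*(?:[A-Z\d]*(?=[A-Z]|$)|[a-z])")

def pascal_kebab(cadena):
    return pascal.sub(lambda m: ("-" if m.start() else "") + m.group().lower(), cadena)

def comparar(cadena, repeticiones=20000):  # hypothetical helper, for illustration
    """Return the seconds each function takes for `repeticiones` calls."""
    t_dos = timeit.timeit(lambda: camel_a_guiones(cadena), number=repeticiones)
    t_uno = timeit.timeit(lambda: pascal_kebab(cadena), number=repeticiones)
    return t_dos, t_uno

t_dos, t_uno = comparar('Ver2HTMLPag2Info')
print("two regexes: %.4fs, single regex: %.4fs" % (t_dos, t_uno))
```

    Note that the two functions do not agree on every input (for example, they split A2BFormato differently), so a comparison like this is only meaningful on strings where both give the same result.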



    answered by 08.04.2017 at 17:31