Delete only the final "O" of each word, but keep it in "HELLO"

1

I want to remove the final O of each word, unless the word is HELLO.

I'm trying this but it does not work for me:

a <- c("HELLO DO","DO HELLO XO","HO")

gsub("[^HELLO]O\\>","",a)

[1] "HELLO "  " HELLO " "HO"
    
asked by dogged 05.12.2017 at 05:44

1 answer

3

1. Also matching HELLO

The solution for this case is to:

  • either match the whole word HELLO and use it as the replacement text (so that, in effect, it is left unchanged).

    To match the word, we use \<HELLO\> . \< and \> match the positions at the beginning and at the end of a word, respectively.

    We also wrap it in a group, (\<HELLO\>) , to capture the text it matched. Then, in the replacement, we use \1 to reinsert it.

  • or match the letter O at the end of a word. That is, O\> .

    This part of the regex has no parentheses, so nothing is captured and \1 is empty, which makes the replacement "" (deleting that letter).


    Regular expression:

    (\<HELLO\>)|O\>
    

    Replacement:

    \1
    

    Code:

    a <- c("HELLO DO","DO HELLO XO","XO","DOO CHELLO HELLOS HELLO BOOOO GO")
    
    gsub("(\\<HELLO\\>)|O\\>","\\1",a)
    

    Result:

    [1] "HELLO D"                      "D HELLO X"                   
    [3] "X"                            "DO CHELL HELLOS HELLO BOOO G"
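
To make it visible which alternative matched each time, here is a minimal sketch that wraps the backreference in square brackets before doing the real replacement. The variable names (pattern, marked, cleaned) are only illustrative and are not part of the original answer. Note that inside an R string every regex backslash has to be doubled.

```r
a <- c("HELLO DO", "DO HELLO XO")

# Regex backslashes are doubled because this is an R string literal.
pattern <- "(\\<HELLO\\>)|O\\>"

# Wrap the backreference in brackets: "[HELLO]" means the group matched,
# "[]" means the O\> alternative matched and \1 was empty.
marked <- gsub(pattern, "[\\1]", a)
print(marked)
# [1] "[HELLO] D[]"     "D[] [HELLO] X[]"

# The actual replacement: HELLO is reinserted, any other final O is dropped.
cleaned <- gsub(pattern, "\\1", a)
print(cleaned)
# [1] "HELLO D"   "D HELLO X"
```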
    



    2. Or requiring that the O is not preceded by HELL

    Another alternative is to use a negative lookbehind to make sure that the O is not preceded by \<HELL .

    (?<!\bHELL)O\b
    

    For this, however, we have to use Perl-compatible syntax (by passing the argument perl=TRUE), which offers much more powerful regular expressions.

    In Perl syntax there is no \< or \> , but \b matches at either the start or the end of a word.


    Code:

    a <- c("HELLO DO","DO HELLO XO","XO","DOO CHELLO HELLOS HELLO BOOOO GO")  
    gsub("(?<!\\bHELL)O\\b","",a, perl=TRUE)
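
As a quick check, the following sketch runs both approaches on the same input and compares them. The names by_group and by_lookbehind are hypothetical and only used for this illustration.

```r
a <- c("HELLO DO", "DO HELLO XO", "XO", "DOO CHELLO HELLOS HELLO BOOOO GO")

# Approach 1: capture HELLO and reinsert it (default regex engine).
by_group <- gsub("(\\<HELLO\\>)|O\\>", "\\1", a)

# Approach 2: negative lookbehind (requires perl = TRUE).
by_lookbehind <- gsub("(?<!\\bHELL)O\\b", "", a, perl = TRUE)

print(by_lookbehind)
# [1] "HELLO D"                      "D HELLO X"
# [3] "X"                            "DO CHELL HELLOS HELLO BOOO G"

# Both approaches give the same result on this input.
print(identical(by_group, by_lookbehind))
# [1] TRUE
```

CHELLO still loses its O because the C in front of HELL means there is no word boundary there, so the lookbehind does not protect it.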
    


    What was wrong with what you tried?

    [^HELLO] is a negated character class. That means it matches exactly 1 character: any character except H , E , L or O . It does not negate the word HELLO, which is what you were trying to do. There is no direct way to do that, since regular expressions are rules for matching text, not for excluding it. The solutions in these cases are always workarounds like the alternation shown above, or negative lookbehinds.
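
A small sketch of that behaviour, using illustrative variable names (masked, attempt) that are not from the question. It first replaces every character matched by [^HELLO] with an underscore, then repeats the original attempt.

```r
# [^HELLO] matches any single character that is not H, E, L or O.
masked <- gsub("[^HELLO]", "_", "HELLO WORLD")
print(masked)
# [1] "HELLO__O_L_"

# The original attempt: one "allowed" character followed by a final O.
a <- c("HELLO DO", "DO HELLO XO", "HO")
attempt <- gsub("[^HELLO]O\\>", "", a)
print(attempt)
# [1] "HELLO "  " HELLO " "HO"
```

The class consumes the character before the O, so "DO" and "XO" disappear entirely. "HO" is left untouched only because H happens to be in the class, and "HELLO" survives only because the letter before its O is an L.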

        
    answered by 05.12.2017 / 09:35