Replace rows according to criteria


I would like to find a simpler way to write this script. I have a dataframe:


 species level:
 Site      Date    Habitat    Season    Year     Taxa
 Q1F    08_09_2015  Oak     Autumn   2015-2016   Adonis_flammea
 Q2F    08_09_2015  Oak     Autumn   2015-2016   Agrimonia_eupatoria
 Q4F    08_09_2015  Oak     Autumn   2015-2016   Ajuga_chamaepitys
 Q1P    08_09_2015  Oak     Autumn   2015-2016   Ajuga
 Q2P    08_09_2015  Oak     Autumn   2015-2016   Allium_sativum

I want to replace the names in the Taxa column, which have the form genus_species, with the genus alone. Until now I have been changing them one by one.


This is the result I want:

 species level:
 Site      Date    Habitat    Season    Year     Taxa
 Q1F    08_09_2015  Oak     Autumn   2015-2016   Adonis
 Q2F    08_09_2015  Oak     Autumn   2015-2016   Agrimonia
 Q4F    08_09_2015  Oak     Autumn   2015-2016   Ajuga
 Q1P    08_09_2015  Oak     Autumn   2015-2016   Ajuga
 Q2P    08_09_2015  Oak     Autumn   2015-2016   Allium

Is there a faster way?

asked by Adrián P.L. 15.11.2017 at 20:38

1 answer


What you are actually looking for is to substitute one value for another, so you do not need gsub. You can build a data.frame of replacement values instead, which keeps things a little simpler. I will use the data from your previous question as an example:

Specieslevel <- read.table(text="Site Date Habitat Season Year Taxa
Q1F 08_09_2015 Oak Autumn 2015-2016 Artemisia_herba_alta
Q2F 08_09_2015 Oak Autumn 2015-2016 Artemisia_herba_alta
Q4F 08_09_2015 Oak Autumn 2015-2016 Allium
Q1P 08_09_2015 Oak Autumn 2015-2016 Artemisia_herba_alta
Q2P 08_09_2015 Oak Autumn 2015-2016 Amaranthus
Q4P 08_09_2015 Oak Autumn 2015-2016 Anacyclus
Q4P 08_09_2015 Oak Autumn 2015-2016 Asparagus
Q4P 08_09_2015 Oak Autumn 2015-2016 Amaranthus_retroflex", sep=" ", header=TRUE, stringsAsFactors=FALSE)

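# Lookup table: the value to search for and the value that replaces it
species = data.frame(value     = c('Artemisia_herba_alta', 'Amaranthus_retroflex'),
                     replaceby = c('Artemisia', 'Amaranthus'),
                     stringsAsFactors = FALSE)

# Row of species that matches each Taxa value, or 0 when there is no match
matches <- match(Specieslevel$Taxa, species$value, nomatch=0)
# Replace only the rows that matched
Specieslevel$Taxa[matches>0] <- species$replaceby[matches]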

The final result:

  Site       Date Habitat Season      Year       Taxa
1  Q1F 08_09_2015     Oak Autumn 2015-2016  Artemisia
2  Q2F 08_09_2015     Oak Autumn 2015-2016  Artemisia
3  Q4F 08_09_2015     Oak Autumn 2015-2016     Allium
4  Q1P 08_09_2015     Oak Autumn 2015-2016  Artemisia
5  Q2P 08_09_2015     Oak Autumn 2015-2016 Amaranthus
6  Q4P 08_09_2015     Oak Autumn 2015-2016  Anacyclus
7  Q4P 08_09_2015     Oak Autumn 2015-2016  Asparagus
8  Q4P 08_09_2015     Oak Autumn 2015-2016 Amaranthus


  • We create a data.frame, species, that holds the search and replacement values. It should also work if the variables are factors. I define it as a data.frame to make the code more readable, but it could just as well be a matrix, which would mean a little less typing.
  • match(Specieslevel$Taxa, species$value, nomatch=0) returns a vector with one element per row of Specieslevel, holding the matching row of species, or 0 when there is no match.
  • We apply these matches, replacing only the rows that matched: Specieslevel$Taxa[matches>0] <- species$replaceby[matches]
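To make the intermediate step concrete, here is a minimal sketch on a shortened vector. The names taxa and lookup are illustrative only and do not appear in the code above. It shows what match() returns and why the zeros cause no trouble when used as an index:

```r
# Illustrative data: a character vector of taxa and a small lookup table.
taxa <- c("Artemisia_herba_alta", "Allium",
          "Amaranthus_retroflex", "Artemisia_herba_alta")
lookup <- data.frame(value     = c("Artemisia_herba_alta", "Amaranthus_retroflex"),
                     replaceby = c("Artemisia", "Amaranthus"),
                     stringsAsFactors = FALSE)

# Position of each taxon in lookup$value, or 0 when there is no match.
matches <- match(taxa, lookup$value, nomatch = 0)
print(matches)   # 1 0 2 1

# Indexing with 0 selects nothing, so lookup$replaceby[matches] has exactly
# one element per matched entry, in the same order as taxa[matches > 0].
taxa[matches > 0] <- lookup$replaceby[matches]
print(taxa)      # "Artemisia" "Allium" "Amaranthus" "Artemisia"
```

Because the zeros are dropped on the right-hand side and filtered out by matches > 0 on the left, both sides always have the same length.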

What if the column we want to modify is a factor? The previous code does not work in that case, because it operates on the whole column, whereas with a factor we should only work on its levels. The solution is even simpler and faster:

sp <- levels(Specieslevel$Taxa)
matches <- match(sp, species$value, nomatch=0) 
sp[matches>0] <- species$replaceby[matches] 
levels(Specieslevel$Taxa) <- sp

In this case we first extract the levels of the column into a vector ( sp <- levels(Specieslevel$Taxa) ) and do the match and replace on that vector. Then all that remains is to assign the modified levels back to the column with levels(Specieslevel$Taxa) <- sp
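The sample data above is read with stringsAsFactors=FALSE, so Taxa is a character column there and levels() would return NULL. As a rough, self-contained sketch of the factor case (taxa and lookup are again illustrative names, and the data is shortened), the same steps look like this:

```r
# Illustrative factor column; "Amaranthus" is already present as its own level.
taxa <- factor(c("Artemisia_herba_alta", "Amaranthus", "Amaranthus_retroflex"))
lookup <- data.frame(value     = c("Artemisia_herba_alta", "Amaranthus_retroflex"),
                     replaceby = c("Artemisia", "Amaranthus"),
                     stringsAsFactors = FALSE)

# Work on the levels only, not on every element of the column.
sp <- levels(taxa)   # "Amaranthus" "Amaranthus_retroflex" "Artemisia_herba_alta"
matches <- match(sp, lookup$value, nomatch = 0)
sp[matches > 0] <- lookup$replaceby[matches]

# Assigning duplicated names merges the corresponding levels.
levels(taxa) <- sp

print(as.character(taxa))   # "Artemisia" "Amaranthus" "Amaranthus"
print(levels(taxa))         # "Amaranthus" "Artemisia"
```

Note that when a replacement name already exists as a level, as with Amaranthus here, assigning the levels merges the two into one, which is usually the desired outcome when collapsing species to genus.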

answered by 15.11.2017 / 21:24