Saturday, March 26, 2022

[SOLVED] How to filter all rows between two values containing a certain pattern for a list of data frames in R?

March 26, 2022 between, filter, grep, pattern-matching, r

Issue

I have a large list of about a thousand data frames. In each one, I want to keep only the rows between (and including) the two rows whose column Z contains the string patterns VALUE1 and VALUE2. Basically, from a data frame like this:

weight | height | Z
---------------------------
62      100      NA
65      89       NA
59      88       randomnumbersVALUE1randomtext
66      92       NA
64      90       NA
64      87       randomnumbersVALUE2randomtext
57      84       NA

I would like to get a data frame like this:

59      88       randomnumbersVALUE1randomtext
66      92       NA
64      90       NA
64      87       randomnumbersVALUE2randomtext

So the NA rows before the row whose Z value contains the pattern "VALUE1", and the NA rows after the row whose Z value contains the pattern "VALUE2", would be filtered out. I tried code like this, but couldn't get it to work:

for(i in 1:length(df)){
  df[[i]] <- filter(df[[i]], between(Z, grep("VALUE1", Z), grep("VALUE2", Z)))
}

As a bonus question: if a data frame has multiple such sequences, in which column Z contains VALUE1 and then VALUE2 (only these two patterns matter), how can I keep all the rows between each pair as well? For example:

weight | height | Z
---------------------------
62      100      NA
65      89       NA
59      88       randomnumbersVALUE1randomtext
66      92       NA
64      90       NA
64      87       randomnumbersVALUE2randomtext
57      84       NA
68      99       NA
59      82       NA
60      87       srebmunmodnarVALUE1txetmodnar
61      86       NA
63      84       srebmunmodnarVALUE2txetmodnar

And after filtering I would get:

59      88       randomnumbersVALUE1randomtext
66      92       NA
64      90       NA
64      87       randomnumbersVALUE2randomtext
60      87       srebmunmodnarVALUE1txetmodnar
61      86       NA
63      84       srebmunmodnarVALUE2txetmodnar

Solution

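This assumes each data.frame has a start row (VALUE1) followed by a matching stop row (VALUE2). If some do not, you will need to add special handling for them. Each start row is paired with the stop row in the same position, so the same code is meant to cover the bonus case of several start/stop pairs in one data.frame.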

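lapply(df, function(x){
       # Row indices where column Z contains each marker (NA never matches)
       start <- which(grepl("VALUE1", x$Z))
       end   <- which(grepl("VALUE2", x$Z))
       # Expand every start/end pair into the full run of row indices
       rows  <- unlist(lapply(seq_along(start), function(y){start[y]:end[y]}))
       # Subset the current data frame x, not the whole list df
       return(x[rows,])})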
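
For the "special handling" mentioned above, here is a minimal sketch of one way it could look. The helper name filter_between, the list name df_list, and the choice to return zero rows when markers are missing or unpaired are assumptions for illustration, not part of the original answer. The sample data is the second table from the question.

```r
# Hypothetical helper: keep rows from each VALUE1 marker through the
# VALUE2 marker in the same position, inclusive.
filter_between <- function(x, start_pat = "VALUE1", end_pat = "VALUE2") {
  start <- grep(start_pat, x$Z)  # row indices; NA entries never match
  end   <- grep(end_pat, x$Z)
  # Assumed policy: no markers, or an unequal number of start and stop
  # markers, yields an empty data frame with the same columns.
  if (length(start) == 0 || length(start) != length(end)) {
    return(x[0, , drop = FALSE])
  }
  # Build start[i]:end[i] for every pair and combine the indices
  rows <- unlist(Map(seq, start, end))
  x[rows, , drop = FALSE]
}

# Sample data copied from the question's second table
example <- data.frame(
  weight = c(62, 65, 59, 66, 64, 64, 57, 68, 59, 60, 61, 63),
  height = c(100, 89, 88, 92, 90, 87, 84, 99, 82, 87, 86, 84),
  Z = c(NA, NA, "randomnumbersVALUE1randomtext", NA, NA,
        "randomnumbersVALUE2randomtext", NA, NA, NA,
        "srebmunmodnarVALUE1txetmodnar", NA,
        "srebmunmodnarVALUE2txetmodnar"),
  stringsAsFactors = FALSE
)

# A two-element list: one frame with two marker pairs, one with none
df_list <- list(example, example[1:2, ])
result  <- lapply(df_list, filter_between)
```

Here result[[1]] keeps original rows 3 to 6 and 10 to 12, and result[[2]] has zero rows. If you would rather drop such data frames from the list or stop with an error, change the early return accordingly.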

Answered By - Daniel V

Answer Checked By - Marilyn (WPSolving Volunteer)

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
