Issue
I have a large list of about a thousand data frames. In each one, I want to keep only the rows from the row whose column Z contains the string pattern "VALUE1" through the row whose column Z contains the pattern "VALUE2". Basically, from a data frame like this:
weight | height | Z
---------------------------
62 | 100 | NA
65 | 89 | NA
59 | 88 | randomnumbersVALUE1randomtext
66 | 92 | NA
64 | 90 | NA
64 | 87 | randomnumbersVALUE2randomtext
57 | 84 | NA
I would like to get a data frame like this:
weight | height | Z
---------------------------
59 | 88 | randomnumbersVALUE1randomtext
66 | 92 | NA
64 | 90 | NA
64 | 87 | randomnumbersVALUE2randomtext
That is, the NA rows before the row whose Z value contains "VALUE1" and the NA rows after the row whose Z value contains "VALUE2" should be filtered out. I tried code like this, but couldn't get it to work:
for(i in 1:length(df)){
df[[i]] <- filter(df[[i]], between(Z, grep("VALUE1", Z), grep("VALUE2", Z)))
}
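The attempt fails because between(Z, left, right) checks whether each value of Z lies between left and right; passing it the row positions returned by grep() compares the text in Z against numbers instead of selecting a range of rows. A minimal base-R sketch of the same loop that subsets by row position instead, assuming every data frame contains exactly one "VALUE1" row followed by one "VALUE2" row (the sample list is hypothetical, rebuilt from the first example):

```r
# Hypothetical sample list rebuilt from the first example in the question
df <- list(data.frame(
  weight = c(62, 65, 59, 66, 64, 64, 57),
  height = c(100, 89, 88, 92, 90, 87, 84),
  Z = c(NA, NA, "randomnumbersVALUE1randomtext", NA, NA,
        "randomnumbersVALUE2randomtext", NA),
  stringsAsFactors = FALSE
))

for (i in seq_along(df)) {
  z <- df[[i]]$Z
  # grep() returns the row positions of the matches; NA entries never match
  first <- grep("VALUE1", z)[1]
  last <- grep("VALUE2", z)[1]
  # Keep every row from the VALUE1 row through the VALUE2 row
  # (assumes both markers exist; a missing one gives NA and an error here)
  df[[i]] <- df[[i]][first:last, ]
}

print(df[[1]])
```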
Also, as a bonus question: if a data frame has several such sequences, in which column Z contains the patterns VALUE1 and VALUE2 (only these two patterns matter), how can I keep the rows between each pair as well? For example:
weight | height | Z
---------------------------
62 | 100 | NA
65 | 89 | NA
59 | 88 | randomnumbersVALUE1randomtext
66 | 92 | NA
64 | 90 | NA
64 | 87 | randomnumbersVALUE2randomtext
57 | 84 | NA
68 | 99 | NA
59 | 82 | NA
60 | 87 | srebmunmodnarVALUE1txetmodnar
61 | 86 | NA
63 | 84 | srebmunmodnarVALUE2txetmodnar
After filtering, I would like to get:
weight | height | Z
---------------------------
59 | 88 | randomnumbersVALUE1randomtext
66 | 92 | NA
64 | 90 | NA
64 | 87 | randomnumbersVALUE2randomtext
60 | 87 | srebmunmodnarVALUE1txetmodnar
61 | 86 | NA
63 | 84 | srebmunmodnarVALUE2txetmodnar
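For the bonus case, one base-R option (a different technique from the answer's, not taken from it) is to flag the marker rows and keep running counts: a row is kept while more "VALUE1" rows than already-completed "VALUE2" rows have been seen. A minimal sketch, assuming the markers strictly alternate (VALUE1, VALUE2, VALUE1, ...); keep_between is a hypothetical helper name and the data frame is rebuilt from the example above:

```r
# Hypothetical helper: keep the rows from each VALUE1 row through the next VALUE2 row.
# Assumes the markers strictly alternate: VALUE1, VALUE2, VALUE1, VALUE2, ...
keep_between <- function(x, start_pat = "VALUE1", end_pat = "VALUE2") {
  is_start <- grepl(start_pat, x$Z)  # FALSE for NA values
  is_end <- grepl(end_pat, x$Z)
  # Blocks opened so far minus blocks closed on earlier rows;
  # lagging the end count keeps the VALUE2 row itself inside its block.
  open <- cumsum(is_start) - c(0, head(cumsum(is_end), -1))
  x[open > 0, , drop = FALSE]
}

# The second example from the question
x <- data.frame(
  weight = c(62, 65, 59, 66, 64, 64, 57, 68, 59, 60, 61, 63),
  height = c(100, 89, 88, 92, 90, 87, 84, 99, 82, 87, 86, 84),
  Z = c(NA, NA, "randomnumbersVALUE1randomtext", NA, NA,
        "randomnumbersVALUE2randomtext", NA, NA, NA,
        "srebmunmodnarVALUE1txetmodnar", NA,
        "srebmunmodnarVALUE2txetmodnar"),
  stringsAsFactors = FALSE
)
print(keep_between(x))

# For the whole list of data frames: lapply(df, keep_between)
```

Because it only uses cumulative sums, this avoids pairing start and stop positions explicitly, but it gives wrong blocks if a stray "VALUE2" appears before its "VALUE1".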
Solution
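This assumes every data frame has matching start and stop values, i.e. each "VALUE1" row has a corresponding "VALUE2" row after it. If not, you can modify it with special handling. Because it builds one start:end range per VALUE1/VALUE2 pair, it also covers the bonus case with multiple sequences: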
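df <- lapply(df, function(x){
  # Row positions whose Z contains each pattern (grepl() returns FALSE for NA)
  start <- which(grepl("VALUE1", x$Z))
  end <- which(grepl("VALUE2", x$Z))
  # One start:end range per VALUE1/VALUE2 pair, combined into a single index vector
  rows <- unlist(lapply(seq_along(start), function(y){start[y]:end[y]}))
  # Subset the current data frame x, not the whole list df
  return(x[rows, ])
})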
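One possible form of the "special handling" for data frames whose markers do not pair up cleanly, assuming each "VALUE1" row should be paired with the first "VALUE2" row after it and unmatched markers should simply be ignored. filter_pairs is a hypothetical name, and this is a sketch rather than part of the original answer:

```r
# Hypothetical variant with special handling for unmatched markers:
# each VALUE1 row is paired with the first VALUE2 row after it.
filter_pairs <- function(x, start_pat = "VALUE1", end_pat = "VALUE2") {
  start <- which(grepl(start_pat, x$Z))
  end <- which(grepl(end_pat, x$Z))
  rows <- integer(0)
  last_end <- 0L
  for (s in start) {
    if (s <= last_end) next   # this start lies inside a block already kept
    e <- end[end > s][1]      # first stop row after this start (NA if none)
    if (is.na(e)) break       # no stop left, so drop the trailing start
    rows <- c(rows, s:e)
    last_end <- e
  }
  x[rows, , drop = FALSE]     # zero rows if no complete pair was found
}

# Stray VALUE2 at the top, one complete pair, and an unmatched VALUE1 at the end
y <- data.frame(
  weight = 1:8,
  Z = c("aVALUE2", NA, "bVALUE1", NA, "cVALUE2", NA, "dVALUE1", NA),
  stringsAsFactors = FALSE
)
print(filter_pairs(y))

# For the whole list: df <- lapply(df, filter_pairs)
```

Data frames with no complete pair come back with zero rows instead of raising an error.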
Answered By - Daniel V
Answer Checked By - Marilyn (WPSolving Volunteer)