Issue
I have a number of files named in this way:
1223_1_myCount.txt
1223_1_myCount2.txt
1223_2_myStatistic.txt
1223_2_myDiscarded.txt
1223_3_myExample.txt
1223_3_myStatistic.txt
................
There are about 1000 such pairs of files. Is there a way to combine the contents of the files two by two, pairing the two files with index 1, then the two with index 2, then the two with index 3, and so on? That index is the only part of the file name that identifies a pair. I would also like to write each combined content to a file.
Solution
There may be more elegant ways to do this, but the following approach should work:
library(readtext)
library(dplyr)
library(purrr)
library(stringr)
Get the file names from your working directory. The following code assumes that only the files of interest exist there. Otherwise you can filter the names first, for example with grep("\\.txt$", file_names, value = TRUE).
file_names <- list.files(full.names = FALSE)
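If the directory may contain other files, the listing can be restricted up front. The following is a minimal sketch, assuming the names always follow the shape prefix_index_name.txt; the temporary directory, the extra notes.csv file, and the regular expression are made up for illustration and are not part of the original answer.

```r
# Sketch: list only files shaped like <prefix>_<index>_<name>.txt.
# The demo directory and its files are hypothetical.
demo_dir <- file.path(tempdir(), "list_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir,
                      c("1223_1_myCount.txt", "1223_1_myCount2.txt", "notes.csv")))

# pattern is a regular expression:
# digits, "_", digits, "_", anything, then ".txt" at the end of the name
txt_files <- list.files(demo_dir, pattern = "^[0-9]+_[0-9]+_.*\\.txt$")
print(txt_files)
```

Filtering at this stage would also keep previously written combined files out of the input if the script is run again in the same directory.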
Read the files listed in file_names into a data frame df, one row per file, using map_dfr from purrr:
df <- map_dfr(file_names,readtext)
Create a new variable file_index to be used with group_by for concatenating the text from files with an identical file_index value (1, 2, or 3). Use str_c to collapse the strings. You can change the pattern for combo_file_name within paste if you want a different way to name the files containing the combined text.
combo_data <- df %>%
  # file_index is the second "_"-separated piece of the file name
  mutate(file_index = sapply(strsplit(doc_id, "_"), "[", 2)) %>%
  group_by(file_index) %>%
  summarize(combo_file_name = paste("combo_file", unique(file_index), sep = "_"),
            combo_text = str_c(text, collapse = ", ")) %>%
  ungroup() %>%
  select(combo_file_name, combo_text)
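To see what the index extraction and the collapsing do on their own, here is a small base R sketch. The data frame df_demo is a made-up stand-in for the result of readtext (it only assumes the doc_id and text columns), and tapply with paste is used here in place of the dplyr and stringr calls, so it illustrates the idea rather than reproducing the pipeline exactly.

```r
# Hypothetical stand-in for the data frame returned by readtext.
df_demo <- data.frame(
  doc_id = c("1223_1_myCount.txt", "1223_1_myCount2.txt",
             "1223_2_myStatistic.txt", "1223_2_myDiscarded.txt"),
  text = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Split each name on "_" and keep the second piece,
# e.g. "1223_1_myCount.txt" gives "1".
file_index <- sapply(strsplit(df_demo$doc_id, "_"), "[", 2)

# Collapse the texts that share an index, similar to str_c(text, collapse = ", ").
combo_text <- tapply(df_demo$text, file_index, paste, collapse = ", ")
print(combo_text)
```

Note that the index is returned as a character string, so groups are ordered as text ("10" sorts before "2"); this does not affect which files are combined.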
Create a function that writes the files, using combo_data as input. With the naming used above, the files are saved as combo_file_1.txt, combo_file_2.txt, etc.
write_file <- function (df_in){
fileConn <- file(paste(df_in[1],".txt",sep=""))
writeLines(df_in[2], fileConn)
close(fileConn)
}
apply(combo_data,1,write_file)
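The same write step can also be expressed as a plain loop over the rows, which some readers may find easier to follow than apply. This is only a sketch: combo_demo is a made-up stand-in for combo_data, and the output goes to a temporary directory (an assumption for the demonstration) rather than the working directory.

```r
# Hypothetical stand-in for combo_data.
combo_demo <- data.frame(
  combo_file_name = c("combo_file_1", "combo_file_2"),
  combo_text = c("a, b", "c, d"),
  stringsAsFactors = FALSE
)

# Temporary output directory, used here only for the demonstration.
out_dir <- file.path(tempdir(), "combo_out")
dir.create(out_dir, showWarnings = FALSE)

# Write one .txt file per row. Given a file path, writeLines opens
# and closes the file itself.
for (i in seq_len(nrow(combo_demo))) {
  out_path <- file.path(out_dir, paste0(combo_demo$combo_file_name[i], ".txt"))
  writeLines(combo_demo$combo_text[i], out_path)
}

written <- list.files(out_dir)
print(written)
```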
Use getwd() to find the working directory where the combined files are saved.
Answered By - sachin2014 Answer Checked By - Marilyn (WPSolving Volunteer)