Issue
I have two directories with files that end in two different extensions:
Folder A called profile (1204 FILES)
file.fasta.profile
file1.fasta.profile
file2.fasta.profile
Folder B called dssp (1348 FILES)
file.dssp
file1.dssp
file2.dssp
file3.dssp #<-- odd one out
Some files in folder B have no counterpart in folder A and should be removed. For example, file3.dssp would be deleted because there is no file3.fasta.profile in folder A. I just want to keep the files whose names match once the extension is ignored, so that both folders end up with 1204 files.
I saw some bash one-liners using diff, but they do not cover this case, where the files to remove are the ones with no matching file in the other folder.
Solution
Python version:
EDIT: now supports multiple extensions
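#!/usr/bin/python3
import glob
import os

def removeext(filename):
    # Keep everything before the first dot, so "file1.fasta.profile" -> "file1".
    # split() also copes with names that contain no dot at all.
    return filename.split(".", 1)[0]

# Note: this prunes directory 'A'. To prune the question's dssp folder,
# point 'A' at dssp and 'B' at profile.

# Base names (no extensions) found in each directory
setA = set(map(removeext, os.listdir('A')))
print("Files in directory A: " + str(setA))
setB = set(map(removeext, os.listdir('B')))
print("Files in directory B: " + str(setB))

# Base names present in A but missing from B
setDiff = setA.difference(setB)
print("Files only in directory A: " + str(setDiff))

# Delete every file in A whose base name is in the difference
for filename in setDiff:
    file_path = "A/" + filename + ".*"
    for file in glob.glob(file_path):
        print("file=" + file)
        os.remove(file)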
Does pretty much the same as my bash version above.
- list files in A
- list files in B
- get the list of differences
- delete the differences from A
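The four steps above can also be wrapped in a function, so that the reference directory and the directory to prune are explicit arguments. This is only a minimal sketch, not part of the original answer: the names `stem`, `prune_unmatched` and the `dry_run` flag are hypothetical, and the demo builds throwaway `profile` and `dssp` directories that mirror the question's layout.

```python
import os
import tempfile


def stem(filename):
    """Return the part of a file name before its first dot."""
    return filename.split(".", 1)[0]


def prune_unmatched(reference_dir, target_dir, dry_run=True):
    """Remove files in target_dir whose stem has no match in reference_dir.

    Returns the sorted list of paths that were (or, with dry_run, would be)
    removed. Nothing is deleted while dry_run is True.
    """
    reference_stems = {stem(name) for name in os.listdir(reference_dir)}
    removed = []
    for name in sorted(os.listdir(target_dir)):
        path = os.path.join(target_dir, name)
        if os.path.isfile(path) and stem(name) not in reference_stems:
            removed.append(path)
            if not dry_run:
                os.remove(path)
    return removed


if __name__ == "__main__":
    # Throwaway demo directories (assumed names, taken from the question).
    with tempfile.TemporaryDirectory() as root:
        profile = os.path.join(root, "profile")
        dssp = os.path.join(root, "dssp")
        os.mkdir(profile)
        os.mkdir(dssp)
        for name in ("file.fasta.profile", "file1.fasta.profile"):
            open(os.path.join(profile, name), "w").close()
        for name in ("file.dssp", "file1.dssp", "file3.dssp"):
            open(os.path.join(dssp, name), "w").close()

        print(prune_unmatched(profile, dssp, dry_run=True))  # lists candidates only
        prune_unmatched(profile, dssp, dry_run=False)        # actually deletes
        print(sorted(os.listdir(dssp)))
```

Running with `dry_run=True` first shows which files would go before anything is deleted, which is a cheap safeguard when more than a thousand files are involved.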
Test output, done on Linux Mint, bash 4.4.20
mint:~/SO$ l
drwxr-xr-x 2 Nic3500 Nic3500 4096 May 10 10:36 A/
drwxr-xr-x 2 Nic3500 Nic3500 4096 May 10 10:36 B/
mint:~/SO$ l A
total 0
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file1.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file2.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:14 file3.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:36 file4.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file.fasta.profile
mint:~/SO$ l B
total 0
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:05 file1.dssp
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file2.dssp
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file3.dssp
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:05 file.dssp
mint:~/SO$ ./so.py
Files in directory A: {'file1', 'file', 'file3', 'file2', 'file4'}
Files in directory B: {'file1', 'file', 'file3', 'file2'}
Files only in directory A: {'file4'}
file=A/file4.fasta.profile
mint:~/SO$ l A
total 0
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file1.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file2.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:14 file3.fasta.profile
-rw-r--r-- 1 Nic3500 Nic3500 0 May 10 10:06 file.fasta.profile
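To confirm afterwards that both directories hold the same base names, a small check along these lines could be used. It is a sketch under the same assumption as the script (the base name is the text before the first dot); the helper names `stems` and `unmatched` are hypothetical.

```python
import os


def stems(directory):
    """Collect the base names (text before the first dot) in a directory."""
    return {name.split(".", 1)[0] for name in os.listdir(directory)}


def unmatched(dir_a, dir_b):
    """Return (only_in_a, only_in_b) as sorted lists of base names."""
    a, b = stems(dir_a), stems(dir_b)
    return sorted(a - b), sorted(b - a)
```

For the A and B directories in the listing above, `unmatched("A", "B")` should return two empty lists once the script has run.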
Answered By - Nic3500
Answer Checked By - Katrina (WPSolving Volunteer)