Issue
I have implemented an rsync-based system to move files between different environments.
The problem I'm facing now is that sometimes there are files with the same name but different paths and contents.
I want to make rsync (if possible) rename duplicated files, because I need and use the --no-relative option.
Duplicated files can occur in two ways:
- There is already a file with the same name in the destination directory.
- The same rsync execution transfers files with the same name from different locations, e.g. dir1/file.txt and dir2/file.txt.
Adding the -b --suffix options lets me handle at least one repetition of the first type of duplicate mentioned above.
A minimal example (for Linux-based systems):
mkdir sourceDir1 sourceDir2 sourceDir3 destDir;
echo "1" >> sourceDir1/file.txt;
echo "2" >> sourceDir2/file.txt;
echo "3" >> sourceDir3/file.txt;
rsync --no-relative sourceDir1/file.txt destDir
rsync --no-relative -b --suffix="_old" sourceDir2/file.txt sourceDir3/file.txt destDir
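Note that with a single fixed suffix, only two names (file.txt and file.txt_old) can exist in destDir after those commands, so at least one of the three contents is lost; this is easy to verify with:
ls destDir
head destDir/*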
Is there any way to achieve my requirements?
Solution
I don't think you can do it directly with rsync.
Here's a work-around in bash that does some preparation work with find and GNU awk, and then calls rsync afterwards.
The idea is to categorize the input files by "duplicate number" (for example sourceDir1/file.txt would be dup #1 of file.txt, sourceDir2/file.txt dup #2, and sourceDir3/file.txt dup #3) and to generate one list file per "duplicate number" containing all the files in that category.
Then you just have to launch one rsync per category file, with --files-from pointing to that list and a customized --suffix.
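For the three example directories from the question, the generated list files would end up looking roughly like this (the exact numbering depends on the order in which find enumerates the files, each entry is NUL-terminated, and $tmpdir is the temporary directory created in step #1 below):
$tmpdir/1  ->  sourceDir1/file.txt
$tmpdir/2  ->  sourceDir2/file.txt
$tmpdir/3  ->  sourceDir3/file.txt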
Pros
- fast: incomparably faster than firing one rsync per file.
- safe: it won't ever overwrite a file (see step #3 below).
- robust: handles any filename, even one containing newlines.
Cons
- the destination directory has to be empty (or else it might overwrite a few files).
- the code is a little long (and I made it longer by using a few process substitutions and by splitting the awk call in two).
Here are the steps:
0) Use the correct shebang for bash on your system.
#!/usr/bin/env bash
1) Create a directory for storing the generated files.
tmpdir=$( mktemp -d ) || exit 1
2) Categorize the input files by "duplicate number", generate the list files for rsync --files-from (one per dup category), and get the total number of categories.
read filesCount < <(
    find sourceDir* -type f -print0 |
    LANG=C gawk -F '/' '
        BEGIN {
            RS = ORS = "\0"
            tmpdir = ARGV[2]
            delete ARGV[2]
        }
        {
            # the basename ($NF) defines the "duplicate number" of the file
            id = ++seen[$NF]
            if ( ! (id in outFiles) ) {
                outFilesCount++
                outFiles[id] = tmpdir "/" id
            }
            # append the full path to the list file of its category
            print $0 > outFiles[id]
        }
        END {
            printf "%d\n", outFilesCount
        }
    ' - "$tmpdir"
)
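If you want to double-check what was generated, the list entries are NUL-delimited, so something like this (with GNU tr) displays them one per line:
tr '\0' '\n' < "$tmpdir/1"        # paths assigned to duplicate category #1
echo "categories: $filesCount"    # total number of categories detected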
3) Find a unique suffix (generated from a given set of characters) that will be appended to the rsync --suffix value.
Note: you can skip this step if you know for sure that no existing filename ends with _old followed by a number.
(( filesCount > 0 )) && IFS='' read -r -d '' suffix < <(
    LANG=C gawk -F '/' '
        BEGIN {
            RS = ORS = "\0"
            charsCount = split( ARGV[2], chars )
            delete ARGV[2]
            for ( i = 1; i <= 255; i++ )
                ord[ sprintf( "%c", i ) ] = i
        }
        {
            # grow the candidate suffix whenever it could still collide
            # with the end of an existing basename
            l0 = length($NF)
            l1 = length(suffix)
            if ( substr( $NF, l0 - l1, l1 ) == suffix ) {
                n = ord[ substr( $NF, l0 - l1 - 1, 1 ) ]
                suffix = chars[ (n + 1) % charsCount ] suffix
            }
        }
        END {
            print suffix
        }
    ' "$tmpdir/1" '0/1/2/3/4/5/6/7/8/9/a/b/c/d/e/f'
)
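This is optional, but if you want to inspect the generated suffix (bash's printf %q makes any unusual characters visible):
(( filesCount > 0 )) && printf 'generated suffix: %q\n' "$suffix"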
4) Run the rsync(s).
for (( i = filesCount; i > 0; i-- ))
do
    fromFile=$tmpdir/$i
    rsync --no-R -b --suffix="_old${i}_$suffix" -0 --files-from="$fromFile" ./ destDir/
done
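With the question's example, all three contents should now survive in destDir under distinct names: file.txt plus backups of the form file.txt_old<N>_<suffix>. A quick way to check names and contents:
ls -l destDir/
head destDir/*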
5) Clean up the temporary directory.
rm -rf "$tmpdir"
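Side note (not part of the original answer): instead of a final rm, you could register the clean-up with a trap right after creating the directory in step #1, so it also runs when the script exits early:
trap 'rm -rf "$tmpdir"' EXIT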
Answered By - Fravadona Answer Checked By - Timothy Miller (WPSolving Admin)