Wednesday, January 10, 2024

[SOLVED] Replace substrings (fields) of lines in file2 by substrings (fields) of corresponding lines of file1

January 10, 2024 awk, sed

Issue

I need to modify large files as described below:

Content of file1 is like this:

> /make output fileID 1234 name "key string 1" first 1 begin
> /make output fileID 567 name "other key string" middle 1 continue
> /make output fileID 890 name "last key string" final 1 end

Content of file2 is like this:

dummyline0
somestring1 fileID AAA name "key string 1" first 1 begin
dummyline1
somestring2 fileID BBBBB name "other key string" middle 1 continue
dummyline2
dummyline3
somestring2 fileID CCCCCC name "last key string" final 1 end

For each line of file1, I want to find the corresponding line in file2 that has the identical part after the 'name' keyword, and then make the following changes in file2:

duplicate the line found (there will now be two instances of the line)
comment out the first instance with a # character at the beginning, followed by 'commentTExt: '
modify the second instance: replace the fileID in file2 with the fileID from file1

The result in file2 should look like this:

dummyline0
#commentText: somestring1 fileID AAA name "key string 1" first 1 begin
somestring1 fileID 1234 name "key string 1" first 1 begin
dummyline1
#commentText: somestring2 fileID BBBBB name "other key string" middle 1 continue
somestring2 fileID 567 name "other key string" middle 1 continue
dummyline2
dummyline3
#commentText: somestring2 fileID CCCCCC name "last key string" final 1 end
somestring2 fileID 890 name "last key string" final 1 end

Note: Lines in file1 and the lines to be modified in file2 have the same formatting (field positions are the same for every line in a given file). Each string after the 'name' keyword occurs only once in file1 and once in file2. If it is too complicated, duplicating the line and adding the comment in file2 may be omitted.

Preferably with AWK or sed. Could anybody help, please? Thanks.

Solution

A Perl idea.

#!/usr/bin/perl
use strict;
use warnings;

die "Usage: $0 file1 file2\n" unless @ARGV==2;

my ($re, $id, %h);

# regex builder to avoid repetition
sub mkre { qr/^(.* fileID )(\S+)(.* name ($_[0]).*)$/ }

# process file1
$re = mkre(qr/"[^"]+"/);
while (<>) {
    # look for id/name pairs
    # convert name to RE, quoting metacharacters
    # store RE=>id pairs in hash for later use
    $h{ mkre(qr/\Q$4\E/) } = $2 if m/$re/;

    # terminate loop after processing file1
    last if eof;
}

# process file2
while (<>) {
    while ( ($re,$id) = each %h ) {
        # if substitution succeeded, we're done with this line
        if ( s/$re/#commentTExt: $&\n$1$id$3/ ) {
            # there can be only one match,
            # so this regex won't be needed again
            delete $h{$re};
            last;
        }
    }
    print;

    # reset "each" iterator
    keys %h;
}

AWK is slightly more long-winded.

awk '
    # process file1
    NR==FNR {
        # extract id
        for (i=1;i<=NF;i++)  
            if ($i=="fileID") { id=$(++i); break; }

        # extract name
        split($0,a,/"/)
        name = "\"" a[2] "\""

        # send the warning to stderr so it cannot end up in the new file2
        if (name in h) printf "warning: duplicate name: %s\n", name > "/dev/stderr"

        # store for later lookup
        h[name] = id
        next
    }

    # process file2
    {
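        # attempt substitution
        for (name in h) {
            # literal substring test: name must not be treated as a regex
            if (index($0, name)) {
                # matched - output comment and prep new version
                print "#commentTExt: " $0
                t = " fileID " h[name]
                # replace whatever ID is present, not only uppercase letters
                sub(/ fileID [^ ]+/,t)
                break
            }
        }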

        # output possibly modified line
        print
    }
' file1 file2
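
Since the files are large, looping over every stored name for each line of file2 may become slow. One possible variant, sketched here and not part of the original answer, keys the array on the name itself and does a single lookup per line. It assumes the name is the first double-quoted string on each line, that file1 is not empty, and that IDs contain no spaces. The function name replace_ids is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# replace_ids FILE1 FILE2: print FILE2 to stdout with fileIDs taken from FILE1.
# (replace_ids is a hypothetical helper name used for this sketch.)
replace_ids() {
    awk '
        # file1: remember the id under the first double-quoted string
        NR==FNR {
            id = ""
            for (i=1; i<NF; i++)
                if ($i=="fileID") { id=$(i+1); break }
            if (id != "" && split($0, a, "\"") >= 3) h[a[2]] = id
            next
        }

        # file2: one array lookup per line instead of a loop over all names
        {
            if (split($0, a, "\"") >= 3 && (a[2] in h)) {
                if (match($0, / fileID [^ ]+/)) {
                    # keep the original line as a comment
                    print "#commentTExt: " $0
                    # splice in the new id; the rest of the line is untouched
                    $0 = substr($0, 1, RSTART-1) " fileID " h[a[2]] \
                         substr($0, RSTART+RLENGTH)
                }
            }
            print
        }
    ' "$1" "$2"
}
```

Called as replace_ids file1 file2 > file2.new, it leaves file2 untouched until the result has been checked, after which file2.new can be moved over file2. If the commented copy is not wanted, the print "#commentTExt: " $0 line can simply be dropped.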

Answered By - jhnc

Answer Checked By - David Goodson (WPSolving Volunteer)

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
