Issue
I need to modify large files as described below:
The content of file1 looks like this:
> /make output fileID 1234 name "key string 1" first 1 begin
> /make output fileID 567 name "other key string" middle 1 continue
> /make output fileID 890 name "last key string" final 1 end
The content of file2 looks like this:
dummyline0
somestring1 fileID AAA name "key string 1" first 1 begin
dummyline1
somestring2 fileID BBBBB name "other key string" middle 1 continue
dummyline2
dummyline3
somestring2 fileID CCCCCC name "last key string" final 1 end
For each line of file1, I want to find the corresponding line in file2 that has an identical part after the 'name' keyword, and then make the following changes in file2:
- duplicate the matching line (so there are now two instances of it)
- comment out the first instance with a # character at the beginning, followed by 'commentText: '
- in the second instance, replace the fileID from file2 with the fileID from file1
The resulting file2 should look like this:
dummyline0
#commentText: somestring1 fileID AAA name "key string 1" first 1 begin
somestring1 fileID 1234 name "key string 1" first 1 begin
dummyline1
#commentText: somestring2 fileID BBBBB name "other key string" middle 1 continue
somestring2 fileID 567 name "other key string" middle 1 continue
dummyline2
dummyline3
#commentText: somestring2 fileID CCCCCC name "last key string" final 1 end
somestring2 fileID 890 name "last key string" final 1 end
Note: Lines in file1 and the lines to be modified in file2 have the same formatting (field positions are the same for every line in a given file). Each string after the 'name' keyword occurs only once in file1 and once in file2. If it would be too complicated, the line duplication and the added comment in file2 may be omitted.
Preferably with AWK or sed ... Could anybody help, please? Thanks
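Since the note above allows dropping the duplication and the comment, here is a minimal AWK sketch of that reduced task. It recreates file1 and file2 from the sample data; file2.new is a hypothetical output name. It assumes each name is the first double-quoted string on its line and that the fileID values contain no & character (sub() would treat & specially). Rather than trying every stored name against each file2 line, it extracts the quoted name and looks it up in the array directly, which should scale better when both files are large.

```shell
#!/bin/sh
# Sample input from the question
cat > file1 <<'EOF'
> /make output fileID 1234 name "key string 1" first 1 begin
> /make output fileID 567 name "other key string" middle 1 continue
> /make output fileID 890 name "last key string" final 1 end
EOF
cat > file2 <<'EOF'
dummyline0
somestring1 fileID AAA name "key string 1" first 1 begin
dummyline1
somestring2 fileID BBBBB name "other key string" middle 1 continue
dummyline2
dummyline3
somestring2 fileID CCCCCC name "last key string" final 1 end
EOF

awk '
NR == FNR {
    # file1: the id is the field right after "fileID"
    for (i = 1; i < NF; i++) if ($i == "fileID") { id = $(i + 1); break }
    # the name is the first double-quoted part, stored with its quotes
    split($0, q, "\"")
    ids["\"" q[2] "\""] = id
    next
}
{
    # file2: a line with a quoted name splits into at least 3 parts on quotes
    if (split($0, q, "\"") >= 3) {
        name = "\"" q[2] "\""
        # replace only the value following "fileID" (assumes ids contain no "&")
        if (name in ids) sub(/ fileID [^ ]+/, " fileID " ids[name])
    }
    print
}' file1 file2 > file2.new

cat file2.new
```

The result goes to a separate file because redirecting the output straight into file2 would truncate file2 before awk reads it; after checking file2.new, it can be moved over file2.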
Solution
A Perl idea.
#!/usr/bin/perl
use strict;
use warnings;
die "Usage: $0 file1 file2\n" unless @ARGV == 2;
my ($re, $id, %h);
# regex builder to avoid repetition
sub mkre { qr/^(.* fileID )(\S+)(.* name ($_[0]).*)$/ }
# process file1
$re = mkre(qr/"[^"]+"/);
while (<>) {
    # look for id/name pairs,
    # convert the name to a regex, quoting metacharacters,
    # and store regex => id pairs in a hash for later use
    $h{ mkre(qr/\Q$4\E/) } = $2 if m/$re/;
    # terminate the loop after processing file1
    last if eof;
}
# process file2
while (<>) {
    while ( ($re, $id) = each %h ) {
        # if the substitution succeeded, we are done with this line
        if ( s/$re/#commentText: $&\n$1$id$3/ ) {
            # there can be only one match,
            # so this regex won't be needed again
            delete $h{$re};
            last;
        }
    }
    print;
    # reset the "each" iterator
    keys %h;
}
AWK is slightly more long-winded.
awk '
# process file1
NR==FNR {
    # extract id
    for (i=1; i<=NF; i++)
        if ($i=="fileID") { id=$(++i); break }
    # extract name (including its quotes)
    split($0, a, /"/)
    name = "\"" a[2] "\""
    if (name in h) printf "warning: duplicate name: %s\n", name > "/dev/stderr"
    # store for later lookup
    h[name] = id
    next
}
# process file2
{
    # attempt substitution
    for (name in h) {
        # index() does a literal match, so names containing
        # regex metacharacters are handled correctly
        if (index($0, name)) {
            # matched - output comment and prepare new version
            print "#commentText: " $0
            sub(/ fileID [^ ]+/, " fileID " h[name])
            break
        }
    }
    # output possibly modified line
    print
}
' file1 file2
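Since sed was also mentioned in the question, here is a hedged two-step sketch: awk turns each file1 line into a small sed block, and sed -f then applies all of them to file2. Inside each block, h saves the original line, s/^/#commentText: /p prints the commented copy, g restores the original, and the last s command swaps in the new fileID before sed prints the line. It assumes POSIX sed, that each name is the first double-quoted string on its line, and that fileID values contain no /, \ or &; edits.sed and file2.new are hypothetical file names. The gsub() call backslash-escapes sed metacharacters in the name, playing roughly the role of \Q...\E in the Perl version.

```shell
#!/bin/sh
# Sample input from the question
cat > file1 <<'EOF'
> /make output fileID 1234 name "key string 1" first 1 begin
> /make output fileID 567 name "other key string" middle 1 continue
> /make output fileID 890 name "last key string" final 1 end
EOF
cat > file2 <<'EOF'
dummyline0
somestring1 fileID AAA name "key string 1" first 1 begin
dummyline1
somestring2 fileID BBBBB name "other key string" middle 1 continue
dummyline2
dummyline3
somestring2 fileID CCCCCC name "last key string" final 1 end
EOF

# Step 1: build one sed block per file1 line
awk '
{
    for (i = 1; i < NF; i++) if ($i == "fileID") { id = $(i + 1); break }
    split($0, q, "\"")
    name = q[2]
    # backslash-escape characters that are special in a sed BRE or the / delimiter
    gsub(/[\\\/.*^$[]/, "\\\\&", name)
    print "/ name \"" name "\" /{"
    print "h"
    print "s/^/#commentText: /p"
    print "g"
    print "s/ fileID [^ ]* / fileID " id " /"
    print "}"
}' file1 > edits.sed

# Step 2: apply the generated script to file2
sed -f edits.sed file2 > file2.new
cat file2.new
```

As with the other solutions, every file2 line is checked against every name, so the running time grows with the number of lines in file1.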
Answered By - jhnc Answer Checked By - David Goodson (WPSolving Volunteer)