Issue
I have text:
Fishing tour.png
Fishing tour2.png
One Day Tour.png
Sydney Adventure 3 Days Package.png
Sydney Northern Beaches Scenic Flight by Seaplane.png
Week trip.png
Weekend trip to Baku.png
Fishing tour a0HR000000B4wMUMAZ
One Day Tour a0HR000000B4wMVMAZ
Sydney Adventure 3 Days Package a0HR000000B4wMWMAZ
Sydney Northern Beaches Scenic Flight by Seaplane a0HR000000B4wSuMAJ
Week trip a0HR000000B4wMTMA
Weekend trip to Baku a0HR000000B4wMSMAZ
I want to apply two rules:
- Replace each name that has no extension with the matching name that has one (the extension can be anything), keeping the ID as it is. For example, 'Fishing tour' becomes 'Fishing tour.png'.
- Put each name that has a digit (0-9) appended on its own line, with the ID of the matching name without the digit. For example, 'Fishing tour2.png' gets the ID of 'Fishing tour', so that ID appears twice in the output.
So the desired output is:
Fishing tour.png a0HR000000B4wMUMAZ
Fishing tour2.png a0HR000000B4wMUMAZ (same ID as on the line above)
One Day Tour.png a0HR000000B4wMVMAZ
Sydney Adventure 3 Days Package.png a0HR000000B4wMWMAZ
Sydney Northern Beaches Scenic Flight by Seaplane.png a0HR000000B4wSuMAJ
Week trip.png a0HR000000B4wMTMAZ
Weekend trip to Baku.png a0HR000000B4wMSMAZ
Please help if you know how to do this using awk, sed, or any other tool that's available in Bash.
My code attempt:
sed -E 's/(.+([0-9])?\..+)|.+[[:space:]]([a-zA-Z0-9]+)/\1\3/g'
Solution
$ cat tst.awk
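{
    if ( match($0,/[0-9]*\.[^.]+ *$/) ) {
        # File-name line: the key is everything before the
        # optional digits and the ".ext" suffix.
        key = substr($0,1,RSTART-1)
        vals[key,++numVals[key]] = $0
    }
    else {
        # "<name> <id>" line: strip the last field to get the key.
        key = $0
        sub(/ +[^ ]+ *$/,"",key)
        ids[key] = $NF
    }
}
END {
    # "for (key in ids)" visits keys in an unspecified order.
    for ( key in ids ) {
        for ( i=1; i<=numVals[key]; i++ ) {
            print vals[key,i], ids[key]
        }
    }
}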
$ awk -f tst.awk file
Weekend trip to Baku.png a0HR000000B4wMSMAZ
Fishing tour.png a0HR000000B4wMUMAZ
Fishing tour2.png a0HR000000B4wMUMAZ
Week trip.png a0HR000000B4wMTMA
One Day Tour.png a0HR000000B4wMVMAZ
Sydney Northern Beaches Scenic Flight by Seaplane.png a0HR000000B4wSuMAJ
Sydney Adventure 3 Days Package.png a0HR000000B4wMWMAZ
If the output order matters it's an easy tweak.
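One possible version of that tweak, as a sketch rather than the answerer's own code: remember the file-name lines in the order they are read and look up each ID at the end. The array names `names` and `keys`, the shell variables `input` and `result`, and the temporary file are introduced here for illustration only.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Fishing tour.png
Fishing tour2.png
One Day Tour.png
Sydney Adventure 3 Days Package.png
Sydney Northern Beaches Scenic Flight by Seaplane.png
Week trip.png
Weekend trip to Baku.png
Fishing tour a0HR000000B4wMUMAZ
One Day Tour a0HR000000B4wMVMAZ
Sydney Adventure 3 Days Package a0HR000000B4wMWMAZ
Sydney Northern Beaches Scenic Flight by Seaplane a0HR000000B4wSuMAJ
Week trip a0HR000000B4wMTMA
Weekend trip to Baku a0HR000000B4wMSMAZ
EOF

result=$(awk '
    # File-name line: store it under a running index so order is kept.
    match($0, /[0-9]*\.[^.]+ *$/) {
        names[++n] = $0
        keys[n] = substr($0, 1, RSTART - 1)
        next
    }
    # "<name> <id>" line: strip the last field to get the key.
    {
        key = $0
        sub(/ +[^ ]+ *$/, "", key)
        ids[key] = $NF
    }
    # Print the file names in input order, each with its looked-up id.
    END {
        for (i = 1; i <= n; i++)
            print names[i], ids[keys[i]]
    }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

With this layout the output follows the order of the file-name lines, which matches the order requested in the question. A file name with no matching ID line is printed with an empty ID.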
The above assumes that all of the key values can fit in memory.
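If memory is a concern, one option is to read the input twice so that only the IDs are held in memory. This is a sketch that assumes the input is a regular file (not a pipe), since awk has to open it two times. The variable names `sample` and `two_pass` and the shortened sample data are made up for illustration.

```shell
# Shortened sample input, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Fishing tour.png
Fishing tour2.png
Week trip.png
Fishing tour a0HR000000B4wMUMAZ
Week trip a0HR000000B4wMTMA
EOF

two_pass=$(awk '
    # First pass (NR == FNR only while reading the first file argument):
    # collect ids from the "<name> <id>" lines and skip everything else.
    NR == FNR {
        if ( !match($0, /[0-9]*\.[^.]+ *$/) ) {
            key = $0
            sub(/ +[^ ]+ *$/, "", key)
            ids[key] = $NF
        }
        next
    }
    # Second pass: print each file-name line with its id as it is read.
    match($0, /[0-9]*\.[^.]+ *$/) {
        print $0, ids[substr($0, 1, RSTART - 1)]
    }
' "$sample" "$sample")

printf '%s\n' "$two_pass"
rm -f "$sample"
```

Because the second pass prints lines as it reads them, the output also follows the input order of the file names.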
Answered By - Ed Morton
Answer Checked By - Willingham (WPSolving Volunteer)