Thursday, February 3, 2022

[SOLVED] How can I grep/sed using find/replace pairs taken from a file?

Issue

I want to run a find and replace using a series of value pairs taken from a file (or two files, if that makes the task any easier). The find and replace strings are literal ones, not regexes in the practical sense. At the moment the file is tab-delimited, findstring \t replacestring, one pair per line, but I can change that as required.

I know a little about regex, but with Unix commands I really need clear "copy and paste" instructions. Earlier in this project I was pleased to discover grep -f for reading find strings from a file, but grep doesn't seem to have an equivalent for the replace strings.

Can I do this with a combination of grep, sed and so on? I found a thread explaining how to pipe grep to sed, but I would still need to tell sed how to read the replace strings from the file.

I'm on macOS (with homebrew) if that makes a difference.


Solution

You can put a list of sed commands like these in a file called commands.sed:

s|cat|cats|g
s|dog|dogs|g
s|person|people|g

and run it on some input with:

echo "House mouse cat dog person dog person" | sed -f commands.sed

and it will replace cat with cats, dog with dogs and person with people, producing:

House mouse cats dogs people dogs people
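Because the question asks for copy-and-paste instructions, here is a version of this first example that needs no existing files. It is a sketch that works in a fresh scratch directory (from mktemp -d) so nothing you already have is overwritten, writes the same three commands to commands.sed with a here-document (quoting 'EOF' stops the shell from expanding anything inside it), and keeps the result in a variable named out, a name chosen just for this example.

```shell
# Work in a fresh scratch directory so no existing file is overwritten
cd "$(mktemp -d)"

# Write the three substitution commands, one per line, to commands.sed
cat > commands.sed <<'EOF'
s|cat|cats|g
s|dog|dogs|g
s|person|people|g
EOF

# Run the sample line through every command in the file, in order
out=$(echo "House mouse cat dog person dog person" | sed -f commands.sed)
echo "$out"   # House mouse cats dogs people dogs people
```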

So we want to turn your file of substitutions into a command file like that, using sed itself! If your replacements file subs.txt contains lines like these, with the two words on each line separated by a TAB:

cat cats
dog dogs
person  people

then the command to turn it into commands.sed would be:

sed -e 's/^/s|/' -e $'s/\t/|/' -e 's/$/|g/' subs.txt > commands.sed

and then you can apply it with:

sed -f commands.sed SomeFile
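Putting those two steps together, here is a copy-and-paste sketch that runs in a fresh scratch directory. It builds a sample subs.txt with printf, because the separator must be a real TAB and TABs are easily turned into spaces when lines are copied from a web page; printf turns each \t in its format string into one. The $'...' quoting in the generator is a bash/zsh feature that makes the shell put a real TAB into the sed expression, so sed itself never has to understand \t (not every sed does). The contents of SomeFile here are made up for the example, and SomeFile.tmp is a hypothetical temporary name.

```shell
# Work in a fresh scratch directory so no existing file is overwritten
cd "$(mktemp -d)"

# Sample pairs file: each \t becomes a real TAB, each \n a newline
printf 'cat\tcats\ndog\tdogs\nperson\tpeople\n' > subs.txt

# Sample text to edit (hypothetical contents)
printf 'one cat\ntwo dog\nthree person\n' > SomeFile

# Build commands.sed from subs.txt:
#   -e 's/^/s|/'   puts "s|" in front of each line
#   -e $'s/\t/|/'  replaces the first TAB with "|" ($'...' turns \t into a TAB)
#   -e 's/$/|g/'   appends "|g" to each line
sed -e 's/^/s|/' -e $'s/\t/|/' -e 's/$/|g/' subs.txt > commands.sed
cat commands.sed
# s|cat|cats|g
# s|dog|dogs|g
# s|person|people|g

# Apply the commands; replace SomeFile only if sed succeeded
sed -f commands.sed SomeFile > SomeFile.tmp && mv SomeFile.tmp SomeFile
cat SomeFile
# one cats
# two dogs
# three people
```

sed -f commands.sed SomeFile on its own only prints the edited text and leaves SomeFile unchanged, which is why the sketch writes to a temporary file and moves it into place. That behaves the same with macOS and GNU sed, whereas sed -i differs between them: GNU sed accepts -i on its own, but macOS sed expects a backup suffix argument (sed -i '' for none).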

Rather than creating a file with the commands in it, we can use a process substitution (a bash/zsh feature, so it works in the default zsh on current macOS) to generate the commands on the fly and do it all in one go:

echo "House mouse cat dog person dog person" | sed -f <(sed -e 's/^/s|/' -e $'s/\t/|/' -e 's/$/|g/' subs.txt)
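One caveat: every sed version above, including this one-liner, hands each find string to sed as a regular expression, so characters such as . * [ ^ $ \ keep their special meanings, and a | (the delimiter) or & (which means "the matched text" in a replacement) inside a pair can break or change the command. Since the question asks for literal strings, here is a sketch of a different technique swapped in for sed: an awk script (the name literal_subs.awk is hypothetical) that finds matches with index(), which is a plain substring search. It assumes the same TAB-separated subs.txt and, like sed -f, applies the pairs in file order, each to the output of the previous ones.

```shell
# Work in a fresh scratch directory so no existing file is overwritten
cd "$(mktemp -d)"

# Hypothetical awk script: literal (non-regex) find/replace driven by a pairs file
cat > literal_subs.awk <<'EOF'
BEGIN { FS = "\t" }
# While reading the first file (subs.txt), store each find/replace pair in order
NR == FNR { find[++n] = $1; repl[n] = $2; next }
# For every line of the remaining input, apply each pair in turn
{
  line = $0
  for (i = 1; i <= n; i++) {
    if (find[i] == "") continue   # skip empty find strings
    out = ""
    rest = line
    # index() is a plain substring search, so no character is special
    while ((p = index(rest, find[i])) > 0) {
      out = out substr(rest, 1, p - 1) repl[i]
      rest = substr(rest, p + length(find[i]))
    }
    line = out rest
  }
  print line
}
EOF

# A pair whose find string contains ".", plus an ordinary one
printf 'a.b\tX\ncat\tcats\n' > subs.txt

# "-" tells awk to read the text to edit from standard input
literal=$(echo "a.b acb cat" | awk -f literal_subs.awk subs.txt -)
echo "$literal"   # X acb cats

# The sed pipeline treats "." as "any character", so "acb" changes too
regex=$(echo "a.b acb cat" | sed -f <(sed -e 's/^/s|/' -e $'s/\t/|/' -e 's/$/|g/' subs.txt))
echo "$regex"     # X X cats
```

To run it on a real file, pass the file name in place of -, as in awk -f literal_subs.awk subs.txt SomeFile. The NR == FNR test that tells the two files apart relies on subs.txt not being empty.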


Answered By - Mark Setchell
Answer Checked By - Mildred Charles (WPSolving Admin)