Thursday, February 3, 2022

[SOLVED] sed find and replace, but only in text blocks between matching patterns

Issue

I have a SQL dump file. There are many tables, one of which contains a column I would like to set to null before final archiving.

The dump file has a structure like:

COPY public.my_target_table (id, name, content_type_id, codename) FROM stdin;
1   Can add association 1   add_association
2   Can change association  1   change_association
<..................>


COPY public.table_2 (id, name, content_type_id, codename) FROM stdin;
1   Can add association 1   add_association
2   Can change association  1   change_association

The values are tab-separated. My initial idea was to use sed to find/replace at the appropriate column index, but I realized I need to do so for only one specific table (otherwise I would wipe the data at that index from every table).

Is there a way to tell sed to start operating only once it has passed the string my_target_table, and to stop when it reaches the next blank line?

edit for clarity

Using the example above, the goal is to output:

COPY public.my_target_table (id, name, content_type_id, codename) FROM stdin;
1   \N  1   add_association
2   \N  1   change_association
<..................>


COPY public.table_2 (id, name, content_type_id, codename) FROM stdin;
1   Can add association 1   add_association
2   Can change association  1   change_association

...assuming my memory is correct and \N means null. Ideally the answer can be extended to any target table and column index (or even better, target table and column name :) ). Also, I'm not limited to sed - I have heard awk might be a useful tool here as well?


Solution

With sed, how about:

tab=$'\t'
sed "
/my_target_table/ { n
:l
s/^\\([^$tab]*$tab\\)[^$tab]*/\\1\\\\N/
n
/[^[:blank:]]/b l
}" file

Output for the provided sample:

COPY public.my_target_table (id, name, content_type_id, codename) FROM stdin;
1       \N      1       add_association
2       \N      1       change_association
<..................>


COPY public.table_2 (id, name, content_type_id, codename) FROM stdin;
1       Can add association     1       add_association
2       Can change association  1       change_association
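
  • POSIX sed does not support the \t notation, so the shell variable tab is assigned a literal tab character at the beginning ($'\t' is ANSI-C quoting, available in bash, ksh, and zsh).
  • If the pattern /my_target_table/ is found, the sed commands between { and } are executed.
  • :l creates a label for the b command.
  • s/^\\([^$tab]*$tab\\)[^$tab]*/\\1\\\\N/ replaces the second tab-separated field of the line with the string \N. The backslashes are doubled because the script sits inside shell double quotes.
  • n prints the current pattern space and reads the next line. The first n skips past the COPY header; the one after the substitution advances to the next row.
  • /[^[:blank:]]/b l branches back to the label l as long as the line contains a non-blank character, so the loop ends at the first blank line.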
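
Since the question also asks about awk and about targeting a column by name, here is a minimal sketch of that variant. It is not part of the sed answer above. The function name null_column and its argument order are made up for illustration. It assumes the COPY header lists the columns separated by ", " as in the sample, that the table name is passed exactly as it appears in the header (schema included), and that a data block ends at a blank line or at a \. line.

```shell
# Hypothetical helper: null_column TABLE COLUMN FILE
# Prints FILE with COLUMN replaced by \N in the rows of the COPY block of TABLE.
null_column() {
    awk -F '\t' -v OFS='\t' -v table="$1" -v column="$2" '
        # COPY header: find the 1-based position of the column, but only
        # for the target table; idx stays 0 for every other table.
        /^COPY / {
            idx = 0
            split($0, word, " ")
            if (word[2] == table) {
                lp = index($0, "(")
                rp = index($0, ")")
                n = split(substr($0, lp + 1, rp - lp - 1), cols, /, */)
                for (i = 1; i <= n; i++)
                    if (cols[i] == column) idx = i
            }
            print
            next
        }
        # A blank line or the \. terminator ends the data block.
        idx && (/^[ \t]*$/ || $0 == "\\.") { idx = 0 }
        # Inside the target block: overwrite the field and rebuild the row.
        idx && NF >= idx { $idx = "\\N" }
        { print }
    ' "$3"
}
```

It could be called as null_column public.my_target_table name dump.sql > dump_nulled.sql, where both file names are placeholders. Rows with fewer fields than the column position (such as a placeholder line) pass through unchanged. Column names that the dump writes in double quotes would have to be passed with the quotes included.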

Hope this helps.



Answered By - tshiono
Answer Checked By - Candace Johnson (WPSolving Volunteer)