Issue
Consider two text files, 'main_list' and 'ignore_list'. For each line in ignore_list, I want to remove every line in main_list that starts with that string.
Basically, something doable with sed and a while loop.
E.g.
while read line; do echo ^$line; sed -i "/^$line/d" ./main_list; done < ./ignore_list
As an improvement, I wanted to build the sed pattern first and then run sed only once:
while read line; do
    if [ $SED_PATTERN="" ]; then
        SED_PATTERN="^$line"
    else
        SED_PATTERN=$SED_PATTERN"\|^$line"
    fi
done < ./ignore_list
echo $SED_PATTERN
sed -i "/$SED_PATTERN/d" ./main_list
Unfortunately, because of the subshell used by the while loop, it does not work.
"A variable modified inside a while loop is not remembered" and https://mywiki.wooledge.org/BashFAQ/024 give useful explanations and workarounds, but I haven't yet managed to get one working in a simple way.
Ideally, I want to use the sh shell (the script will run in a GitLab pipeline with a simple Alpine image).
Any ideas for keeping it simple before I move to a Python script (and use a fat image instead of Alpine; in the meantime, I could also use an image with bash)?
Maybe an approach other than sed and a while loop?
Thanks.
Edit: some more context about the content of both files. I am dealing with a list of Debian packages installed by a build step. main_list is the output of a dpkg-query command (see below), so it should not contain overly fancy characters. ignore_list contains the packages I want to ignore in a later post-processing step: internal components that are not relevant for my output.
Here is a small extract of both files.
main_list
e2fsprogs|1.46.2-2|e2fsprogs|1.46.2-2
ebtables|2.0.11-4|ebtables|2.0.11-4
edgeonboarding-config|0.1|edgeonboarding-config|0.1
efibootguard|0.13+cip|efibootguard|0.13+cip
ethtool|1:5.9-1|ethtool|1:5.9-1
for the ignore_list
edgeonboarding-config
You can generate main_list on any Debian-based Linux system by running
dpkg-query -f '${source:Package}|${source:Version}|${binary:Package}|${Version}\n' -W > main_list
and for ignore_list, just pick a few strings from main_list (the beginning of some lines).
EDIT2: anyway, my initial idea with a while loop is not necessary. I just need to:
- run one sed command over ignore_list that turns each line $myline (and its trailing newline) into ^$myline|
- store the output as SED_PATTERN
- run another sed command: sed -i "/$SED_PATTERN/d" ./main_list
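The three steps above can be sketched in plain sh as follows. This is only a sketch under assumptions: the scratch directory, the two-entry ignore_list and the output name main_list.filtered are made up for the demo, `paste` and `sed -E` are assumed to be available (they are in GNU and BusyBox), and the escaping step is an addition so that characters such as `+` or `.` in a package name are matched literally.

```shell
#!/bin/sh
# Demo data in a scratch directory (hypothetical; the real main_list comes from dpkg-query).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > main_list <<'EOF'
e2fsprogs|1.46.2-2|e2fsprogs|1.46.2-2
ebtables|2.0.11-4|ebtables|2.0.11-4
edgeonboarding-config|0.1|edgeonboarding-config|0.1
efibootguard|0.13+cip|efibootguard|0.13+cip
ethtool|1:5.9-1|ethtool|1:5.9-1
EOF
printf '%s\n' 'edgeonboarding-config' 'ethtool' > ignore_list

# Step 1: drop blank lines, escape ERE metacharacters (and "/", the address
# delimiter used below), prefix every entry with "^", then join all entries
# into a single line with "|" (ERE alternation).
SED_PATTERN=$(sed -e '/^[[:space:]]*$/d' \
                  -e 's,[][\.|$(){}?+*^/],\\&,g' \
                  -e 's/^/^/' ignore_list | paste -sd '|' -)

# Steps 2 and 3: run sed once; skip it when ignore_list produced no pattern,
# because an empty regex is an error.
if [ -n "$SED_PATTERN" ]; then
    sed -E "/$SED_PATTERN/d" main_list > main_list.filtered
else
    cp main_list main_list.filtered
fi

cat main_list.filtered
```

No while loop is involved, so there is no variable to lose. The result is written to a new file rather than edited with `sed -i`, which is not part of POSIX; moving main_list.filtered over main_list afterwards gives the same effect. Note that this still removes any line that merely starts with an ignored string.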
Solution
Using any POSIX awk given the input/output you've recently added to your question:
awk -F'|' '
    NR==FNR {
        sub(/[[:space:]]+$/,"")
        ign[$0]
        next
    }
    !($1 in ign)
' ignore_list main_list
That does a literal full-string comparison against just the first |-separated field of each line.
If you were to use sed and/or grep for this, you'd first need to escape all possible regexp metacharacters in ignore_list; see "Is it possible to escape regex metacharacters reliably with sed".
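The question edits main_list in place with `sed -i`, while the awk command above only prints to standard output. A minimal sketch of using it to update the file, assuming a temporary file name main_list.tmp and demo data in a scratch directory (both made up here):

```shell
#!/bin/sh
# Demo data in a scratch directory (hypothetical; the real main_list comes from dpkg-query).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > main_list <<'EOF'
e2fsprogs|1.46.2-2|e2fsprogs|1.46.2-2
ebtables|2.0.11-4|ebtables|2.0.11-4
edgeonboarding-config|0.1|edgeonboarding-config|0.1
efibootguard|0.13+cip|efibootguard|0.13+cip
ethtool|1:5.9-1|ethtool|1:5.9-1
EOF
printf '%s\n' 'edgeonboarding-config' > ignore_list

# POSIX awk has no in-place option, so write to a temporary file and
# replace main_list only if awk succeeded.
# The -s test skips the filter when ignore_list is empty: with an empty
# first file, NR==FNR would also be true for main_list and nothing would print.
if [ -s ignore_list ]; then
    awk -F'|' '
        NR==FNR { sub(/[[:space:]]+$/,""); ign[$0]; next }
        !($1 in ign)
    ' ignore_list main_list > main_list.tmp && mv main_list.tmp main_list
fi

cat main_list
```

With the sample data, only the edgeonboarding-config line is dropped, and the other four lines keep their original order.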
Original answer before you showed us sample input/output:
Using any POSIX awk (untested due to no sample input/output provided):
awk '
    NR==FNR {
        sub(/[[:space:]]+$/,"")
        ign[$0]
        next
    }
    {
        for ( str in ign ) {
            if ( index($0,str) == 1 ) {
                next
            }
        }
        print
    }
' ignore_list main_list
That does a literal substring comparison against just the start of each line.
If you were to use sed and/or grep for this, you'd first need to escape all possible regexp metacharacters in ignore_list; see "Is it possible to escape regex metacharacters reliably with sed".
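The difference between the two comparisons shows up when one ignored name is a prefix of another package name. The following sketch runs both variants on made-up data; the `ethtool-extra` package is hypothetical and exists only to illustrate the point.

```shell
#!/bin/sh
# Made-up demo data; "ethtool-extra" is a hypothetical package name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > main_list <<'EOF'
ebtables|2.0.11-4|ebtables|2.0.11-4
ethtool|1:5.9-1|ethtool|1:5.9-1
ethtool-extra|1.0|ethtool-extra|1.0
EOF
printf '%s\n' 'ethtool' > ignore_list

# Exact comparison against the first "|"-separated field.
exact=$(awk -F'|' '
    NR==FNR { sub(/[[:space:]]+$/,""); ign[$0]; next }
    !($1 in ign)
' ignore_list main_list)

# Literal prefix comparison against the start of each line.
prefix=$(awk '
    NR==FNR { sub(/[[:space:]]+$/,""); ign[$0]; next }
    {
        for ( str in ign ) {
            if ( index($0,str) == 1 ) {
                next
            }
        }
        print
    }
' ignore_list main_list)

printf 'exact:\n%s\n\nprefix:\n%s\n' "$exact" "$prefix"
```

The exact variant keeps the `ethtool-extra` line, while the prefix variant removes it together with `ethtool`. Which behaviour is wanted depends on whether ignore_list is meant to hold full package names or name prefixes.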
Answered By - Ed Morton Answer Checked By - Candace Johnson (WPSolving Volunteer)