Issue
In my script, I need to store the contents of TEXT('...') from file $CURFILEPATH into a bash variable named $SRCTEXT.
The TEXT('...') variable is included in various files that contain IBM i CLLE commands.
In CLLE, + is a continuation character, so ignore it at the end of a line.
The TEXT('...') target might also contain doubled single quotes, like this: TEXT('Bob O''Malley''s favorite DTAARA'). It might also contain other characters like ( and ).
Here is a straightforward example of a file where the $SRCTEXT to extract is on a single line:
/* Create and set data area for PHP binary location - 1.0.24 */
CRTDTAARA DTAARA(PHPPATH) TYPE(*CHAR) LEN(255) +
VALUE(' ') TEXT('Path to PHP Binaries')
For that file $SRCTEXT should be "Path to PHP Binaries".
And here is a more difficult example where the TEXT('...') value stretches across multiple lines via the continuation character +.
/* Create and set data area for Python binary location - 1.05 */
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path to +
Python Binaries')
For that file $SRCTEXT should be "Path to Python Binaries".
Here is an additional edge-case file that uses '' and () inside the TEXT('...') target:
/* Create and set data area for Python binary location - 1.05 */
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path to +
Python Language''s Binaries (this is an edge case)')
For that file $SRCTEXT should be "Path to Python Language''s Binaries (this is an edge case)"
Note that the quotes should remain doubled.
Though unlikely, the TEXT('...') variable could stretch across 3 or more lines with continuation characters. It would be nice to handle that, but a 2 line solution is acceptable.
Any Bash solution using awk, sed, grep, etc... is acceptable.
ChatGPT gave me something like grep -oP "(?<=TEXT ')[^']+" $CURFILEPATH, but that wasn't working.
Solution
Using GNU awk for multi-char RS and RT (and the \< word boundary):
$ awk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' 'RT{$0=RT; gsub(/^[^\047]+\047|\047[^\047]+$/,""); gsub(/\+\n/,""); gsub(/\047\047/,"\047"); print}' file1
Path to PHP Binaries
$ awk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' 'RT{$0=RT; gsub(/^[^\047]+\047|\047[^\047]+$/,""); gsub(/\+\n/,""); gsub(/\047\047/,"\047"); print}' file2
Path to Python Binaries
$ awk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' 'RT{$0=RT; gsub(/^[^\047]+\047|\047[^\047]+$/,""); gsub(/\+\n/,""); gsub(/\047\047/,"\047"); print}' file3
This is JDubbTX's Text
$ awk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' 'RT{$0=RT; gsub(/^[^\047]+\047|\047[^\047]+$/,""); gsub(/\+\n/,""); gsub(/\047\047/,"\047"); print}' file4
Path to Python Language's Binaries (this is an edge case)
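The question also mentions TEXT('...') values that span three or more lines. Since gsub(/\+\n/,"") is applied globally, the same command should handle that case too; the sketch below is one way to check it. The function name text_of and the three-line sample are made up for illustration, the interpreter is called as gawk (an assumption about how it is installed) to make the GNU requirement explicit, and continuation lines are assumed to start in column 1, because only the + and the newline are removed.

```shell
# Hypothetical wrapper around the command above; reads the files given as
# arguments, or standard input when there are none.
text_of() {
    gawk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' \
        'RT{$0=RT; gsub(/^[^\047]+\047|\047[^\047]+$/,""); gsub(/\+\n/,""); gsub(/\047\047/,"\047"); print}' "$@"
}

# Made-up sample: the TEXT value is split over three lines.
text_of <<'EOF'
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path +
to +
Python Binaries')
EOF
# prints: Path to Python Binaries
```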
I just noticed you said:
"Note that the quotes should remain doubled."
That's not how scripts like this are usually required to work, but it's trivial to do. If you want the doubled single quotes from the input (Language''s) to remain doubled in the output instead of being reduced to single ones (Language's), just remove gsub(/\047\047/,"\047"); from the code.
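As a minimal sketch of that variant, the function below is the same command with the final gsub() dropped, so doubled quotes pass through unchanged. The name keep_doubled is hypothetical, and the interpreter is called as gawk rather than awk (an assumption about the install) to make the GNU requirement explicit.

```shell
# Same RS as above, but without the quote-collapsing gsub, so '' stays doubled.
keep_doubled() {
    gawk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' '
        RT {
            $0 = RT                                    # keep only the matched TEXT(...) text
            gsub(/^[^\047]+\047|\047[^\047]+$/, "")    # strip the opening TEXT(\047 and closing \047)
            gsub(/\+\n/, "")                           # join continuation lines
            print
        }
    ' "$@"
}

# The edge-case file from the question, fed in through a here-document.
keep_doubled <<'EOF'
/* Create and set data area for Python binary location - 1.05 */
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path to +
Python Language''s Binaries (this is an edge case)')
EOF
# prints: Path to Python Language''s Binaries (this is an edge case)
```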
See https://www.gnu.org/software/gawk/manual/gawk.html#gawk-split-records for information on RS and RT, and http://awk.freeshell.org/PrintASingleQuote for what \047 means.
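To make the RS/RT mechanics concrete, here is a tiny sketch that has nothing to do with CLLE: with a regexp RS, each record ends where the regexp matches, and RT holds the text that matched (empty for the last record when the input ends without a match). The function name rt_demo and the sample string are invented for illustration, and gawk is assumed to be on the PATH under that name.

```shell
# Split the input wherever a run of digits occurs and show what RT captured.
rt_demo() {
    printf 'one1two22three' |
        gawk -v RS='[0-9]+' '{ print NR ": $0=<" $0 "> RT=<" RT ">" }'
}

rt_demo
# prints:
# 1: $0=<one> RT=<1>
# 2: $0=<two> RT=<22>
# 3: $0=<three> RT=<>
```

The TEXT('...') command relies on the same behavior: it sets RS to a regexp that matches the whole TEXT('...') construct, then works on RT and ignores the record itself.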
To save the output of any of the above in a shell variable you can do:
$ srctext=$(awk -v RS='\\<TEXT[(]\047(([^\047]|\047\047)+)\047[)]' 'RT{$0=RT; gsub(/^[^\047]+\047|\047[^\047]+$/,""); gsub(/\+\n/,""); gsub(/\047\047/,"\047"); print}' file3)
$ echo "$srctext"
This is JDubbTX's Text
just like you'd save the output of any other Unix command. By the way, don't use all-uppercase names for shell variables that aren't exported environment variables; see "Correct Bash and shell script variable capitalization".
The above was run on these input files:
$ head file1 file2 file3 file4
==> file1 <==
/* Create and set data area for PHP binary location - 1.0.24 */
CRTDTAARA DTAARA(PHPPATH) TYPE(*CHAR) LEN(255) +
VALUE(' ') TEXT('Path to PHP Binaries')
For that file $SRCTEXT should be "Path to PHP Binaries".
==> file2 <==
/* Create and set data area for Python binary location - 1.05 */
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path to +
Python Binaries')
==> file3 <==
CRTDTAARA DTAARA(JWEIRICH1/MYDTA) TYPE(*CHAR) LEN(30) TEXT('This is JDubbTX''s Text')
==> file4 <==
/* Create and set data area for Python binary location - 1.05 */
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path to +
Python Language''s Binaries (this is an edge case)')
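For systems where awk is not GNU awk, the following is a rough alternative sketch, not the method used above: it reads the whole input into memory using POSIX awk features only, joins continuation lines, and pulls out the first TEXT('...') with match(). The function name text_of_posix and the variable src_text are hypothetical. Unlike the gawk command it keeps doubled quotes as they are, returns only the first match, and has no word-boundary check, so a keyword that merely ends in TEXT( would also match.

```shell
# Hypothetical POSIX-awk fallback; prints the first TEXT value found.
text_of_posix() {
    awk '
        { buf = buf $0 "\n" }              # collect the whole input in one string
        END {
            gsub(/\+\n/, "", buf)          # join lines that end in a continuation +
            # TEXT( quote, then non-quotes or doubled quotes, then quote )
            if (match(buf, /TEXT\(\047([^\047]|\047\047)+\047\)/)) {
                # skip the 6 leading characters and drop the 2 trailing ones
                print substr(buf, RSTART + 6, RLENGTH - 8)
            }
        }
    ' "$@"
}

src_text=$(text_of_posix <<'EOF'
/* Create and set data area for Python binary location - 1.05 */
CRTDTAARA DTAARA(PYPATH) TYPE(*CHAR) LEN(255) +
VALUE('/QOpenSys/pkgs/bin') TEXT('Path to +
Python Language''s Binaries (this is an edge case)')
EOF
)
printf '%s\n' "$src_text"
# prints: Path to Python Language''s Binaries (this is an edge case)
```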
Answered By - Ed Morton Answer Checked By - Mary Flores (WPSolving Volunteer)