Issue
I need to parse an nginx access.log whose fields are delimited by 1) whitespace, 2) [ ] and 3) double quotes:
::1 - - [12/Oct/2021:15:26:25 +0530] "GET / HTTP/1.1" 200 1717 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
::1 - - [12/Oct/2021:15:26:25 +0530] "GET /css/custom.css HTTP/1.1" 200 202664 "https://localhost/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
After parsing, it is supposed to look like this:
$1 = ::1
$4 = [12/Oct/2021:15:26:25 +0530] or 12/Oct/2021:15:26:25 +0530
$5 = "GET / HTTP/1.1"
$6 = 200
$7 = 1717
$8 = "-"
$9 = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
I tried some options like awk -F'[],] *' and awk -f [][{}], but they don't work on the full line.
The nginx access.log shared here is just an example. I am trying to understand how to parse a mix of such delimiters so that I can apply the same approach to other, more complex logs.
Solution
If you can use GNU awk (gawk), you can use FPAT to describe what the column data looks like, instead of describing the separators between columns:
awk -v FPAT='\\[[^][]*]|"[^"]*"|\\S+' '{
  # With FPAT set, gawk fills $1..$NF with the pieces of the line that match the pattern
  for (i = 1; i <= NF; i++) {
    print "$" i " = " $i
  }
}' file
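The pattern matches:
\\[[^][]*]  Match from an opening [ till the closing ], using the negated character class [^][] for everything in between
|  Or
"[^"]*"  Match from an opening double quote till the closing double quote
|  Or
\\S+  Match 1 or more non-whitespace chars
Because gawk picks the leftmost-longest match, a bracketed timestamp or a quoted string is kept as one field even though it contains spaces; \\S+ only ends up covering the plain fields such as ::1, - and 200.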
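FPAT is a gawk extension: mawk, BusyBox awk or BSD awk simply treat it as an ordinary variable and keep splitting on whitespace. A minimal sketch of the same idea for those awks, assuming only the POSIX match(), RSTART and RLENGTH: repeatedly take the leftmost match of the three alternatives, store it, and continue with the rest of the line. Since \S is also GNU-specific, it is written as [^ \t]+ here, and parse_fields is a hypothetical helper name.

```shell
# parse_fields: hypothetical helper name. Splits each input line into fields that are
# either [ ... ], "..." or a run of non-blank characters, using only POSIX awk features.
parse_fields() {
    awk '{
        n = 0
        rest = $0
        # match() returns the leftmost, longest match, so "[12/Oct/... +0530]" wins
        # over the shorter non-blank run "[12/Oct/...", and a quoted string is
        # taken whole even when it contains spaces.
        while (match(rest, /\[[^]]*]|"[^"]*"|[^ \t]+/)) {
            f[++n] = substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)   # continue right after this field
        }
        for (i = 1; i <= n; i++)
            print "$" i " = " f[i]
    }' "$@"
}

# Usage: pass log files as arguments, or pipe lines in on stdin.
printf '%s\n' '::1 - - [12/Oct/2021:15:26:25 +0530] "GET / HTTP/1.1" 200 1717 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"' | parse_fields
```

The fields are collected into an array rather than assigned to $1..$n, because assigning to fields would rebuild $0 with OFS and lose the original spacing inside the quoted parts.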
Output
$1 = ::1
$2 = -
$3 = -
$4 = [12/Oct/2021:15:26:25 +0530]
$5 = "GET / HTTP/1.1"
$6 = 200
$7 = 1717
$8 = "-"
$9 = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
Answered By - The fourth bird
Answer Checked By - Candace Johnson (WPSolving Volunteer)