Issue
Here is a file, file.txt, that has the following data:
|ip: 2607:5300:0061:0eda:0000:0000:0000:0000 |abuse_score: 80 |isp: OVH Hosting Inc. |usage_type: Data Center/Web Hosting/Transit |domain: ovh.com |country_name: Canada |country_code: CA |total_report: 27 |distinct_report: 14 |last_report: 2020-10-31T16:01:49+00:00 |time: Sun Nov 1 01:49:53 +08 2020
|ip: 2001:0000:1234:0000:0000:C1C0:ABCD:0876 |abuse_score: 19 |isp: Teredo RFC4380 |usage_type: Reserved |domain: can.com |country_name: United States of America |country_code: USA |total_report: 2 |distinct_report: 0 |last_report: Unknown |time: Sun Nov 1 01:54:28 +08 2020
|ip: 1.1.1.1 |abuse_score: 6 |isp: Teredo RFC4380 |usage_type: Search Engine Spider |domain: gooday.com |country_name: China |country_code: CN |total_report: 0 |distinct_report: 6 |last_report: Unknown |time: Sun Nov 1 01:54:28 +08 2020
|ip: 1.1.1.11 |abuse_score: 7 |isp: Teredo RFC4380 |usage_type: Hacking |domain: wwww.com |country_name: Rusia |country_code: RU |total_report: 3 |distinct_report: 4 |last_report: Sun Nov 1 01:54:28 +08 2020 |time: Sun Nov 1 01:54:28 +08 2020
|ip: 1.1.1.111 |abuse_score: 8 |isp: Teredo RFC4380 |usage_type: Gaming |domain: whyme.com |country_name: Rusia |country_code: RU |total_report: 3 |distinct_report: 8 |last_report: Unknown |time: Sun Nov 1 01:54:28 +08 2020
|ip: 1.1.1.15 |abuse_score: 90 |isp: Teredo RFC4380 |usage_type: IPS |domain: youknowthat.com |country_name: Rusia |country_code: RU |total_report: 100 |distinct_report: 99 |last_report: Sun Nov 1 01:54:28 +08 2020 |time: Sun Nov 1 01:54:28 +08 2020
|ip: 1.1.1.153 |abuse_score: 19 |isp: Teredo RFC4380 |usage_type: Commercial |domain: mynic.com |country_name: Malaysia |country_code: MY |total_report: 8 |distinct_report: 12 |last_report: Unknown |time: Sun Nov 1 01:54:28 +08 2020
So, I need to assign each value after |name: to a variable. For example, to get the ip and abuse_score values, this is what I did:
ip=$(awk '{ print $2 }') # ip is perfectly assigned
abuse_score=$(awk '{ print $4}') # abuse_score is perfectly assigned
But to get a field like isp, whose value may contain one or more spaces, I cannot use the code above because it returns only one word. The ip and abuse_score values never contain spaces. For example, to assign isp for the first record in the text file, if I use this:
isp=$(awk '{ print $6 }')
This assigns only isp=OVH, but the value of isp should be OVH Hosting Inc.
How can I handle data that contains spaces and easily assign each value to a separate variable?
Solution
You may use this generic awk solution:
cat srch.awk
BEGIN { FS = "[[:blank:]]*[|]" }
{
    for (i=1; i<=NF; ++i) {
        if (match($i, /^[_[:alnum:]]+: /) && substr($i, 1, RLENGTH-2) == fld) {
            print ( substr($i, RLENGTH+1) )
            next
        }
    }
}
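To see why the script can return values that contain spaces, it may help to print the fields that this FS produces. The sketch below is only an illustration: the sample record is shortened from file.txt, and the variable names line and fields are assumptions, not part of the original answer.

```shell
# One sample record, shortened from file.txt for readability
line='|ip: 1.1.1.1 |abuse_score: 6 |isp: Teredo RFC4380 |usage_type: Search Engine Spider'

# Split the record with the same FS as srch.awk and print each field with its index
fields=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = "[[:blank:]]*[|]" } { for (i = 1; i <= NF; ++i) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Because the separator is a pipe together with any blanks in front of it, each field holds a whole "name: value" pair, for example 4=[isp: Teredo RFC4380]. The first field is empty because every record starts with a pipe.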
Then run the script, passing the name of the field you want in the fld variable:
awk -v fld='isp' -f srch.awk file.txt
OVH Hosting Inc.
Teredo RFC4380
Teredo RFC4380
Teredo RFC4380
Teredo RFC4380
Teredo RFC4380
Teredo RFC4380
Or, for another field:
awk -v fld='usage_type' -f srch.awk file.txt
Data Center/Web Hosting/Transit
Reserved
Search Engine Spider
Hacking
Gaming
IPS
Commercial
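Since the question asks for each value in its own shell variable, one possible way to apply the same matching logic per record is sketched below. The helper name get_field is hypothetical, the awk program is inlined so the example runs without srch.awk, and the two records in the here-document are shortened from file.txt.

```shell
# get_field NAME RECORD: print the value of field NAME from one record
# (hypothetical helper; same matching logic as srch.awk)
get_field() {
    printf '%s\n' "$2" | awk -v fld="$1" '
        BEGIN { FS = "[[:blank:]]*[|]" }
        {
            for (i = 1; i <= NF; ++i)
                if (match($i, /^[_[:alnum:]]+: /) && substr($i, 1, RLENGTH-2) == fld) {
                    print substr($i, RLENGTH+1)
                    next
                }
        }'
}

# Read the records one at a time and assign the wanted values to variables
while IFS= read -r line; do
    ip=$(get_field ip "$line")
    isp=$(get_field isp "$line")
    country_name=$(get_field country_name "$line")
    printf '%s | %s | %s\n' "$ip" "$isp" "$country_name"
done <<'EOF'
|ip: 2607:5300:0061:0eda:0000:0000:0000:0000 |abuse_score: 80 |isp: OVH Hosting Inc. |country_name: Canada
|ip: 1.1.1.1 |abuse_score: 6 |isp: Teredo RFC4380 |country_name: China
EOF
```

With the real data, the here-document would be replaced by a redirect such as done < file.txt. This starts one awk process per field per record, which should be fine for a small file but may be slow for a large one.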
Answered By - anubhava