Issue
i have file contains urls plus params like following
https://example.com/endpoint/?param1=123¶m2=1212
https://example.com/endpoint/?param3=123¶m1=98989
https://example.com/endpoint/endpoint3/?param2=123
https://example.com/endpoint/endpoint2/?param1=123
https://example.com/endpoint/endpoint2/
https://example.com/endpoint/endpoint5/"//i.example.com/00/s/Nzk5WDEwMjQ=/z/47IAAOSwBu5hXIKF
and i need to filter only urls with unique params the desired output
http://example.com/endpoint/?param1=123¶m2=1212
https://example.com/endpoint/?param3=123¶m1=98989
https://example.com/endpoint/endpoint3/?param2=123
i managed to filter only urls with params with grep
grep -E '(\?[a-zA-Z0-9]{1,9}\=)'
but i need to filter params in the same time so i tried with awk with the same regex but it gives error
awk '{sub(\?[a-zA-Z0-9]{1,9}\=)} !seen[$0]++'
update
i am sorry for editing the desired output but when i tried the scripts i figured out that their a lot of carbege in my file need to filter too. i tried @James Brown with some editing and it looks good till the end line it dose not filter it unfortunately
awk -F '?|&' '$2&&!a[$2]++'
and to be more clear why the that output is good for me
it chosed the 1 st line because it has at least param1
2nd line because it has at least param3
3 line because it has at least param2
the comparison method here is choose just unique parameter whatever it concatenate with others with &
char or not
Solution
Edited version after the reqs changes some:
$ awk -F? '{ # ? as field delimiter
split($2,b,/&/) # split at & to get whats between ? and &
if(b[1]!=""&&!a[b[1]]++) # no ? means no $2
print
}' file
Output as expected. Original answer was:
A short one:
$ awk -F? '$2&&!a[$2]++' file
Explained: Split records at ?
(-F?
) and if there is a second field ($2
) and (&&
) it is unique this far by counting the instances of the parameters in the array a
(!a[$2]++
), output it.
Answered By - James Brown