Sunday, February 20, 2022

[SOLVED] Remove duplicate lines based on starting pattern using bash

Issue

I'm trying to remove duplicates from a list of Jira tickets that use the following syntax:

XXXX-12345: a description

where 12345 stands for any number matching [0-9]+ and XXXX is a constant prefix. For example, the following list:

XXXX-1111: a description
XXXX-2222: another description
XXXX-1111: yet another description

should get cleaned up like this:

XXXX-1111: a description
XXXX-2222: another description

I've been trying with sed, but what I had worked on Mac and not on Linux. I think it would be easier with awk, but I'm not an expert in either.

I tried:

sed -r '$!N; /^XXXX-[0-9]+\n\1/!P; D' file

Solution

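This simple awk command prints a line only the first time its first field (the ticket ID with its trailing colon, such as XXXX-1111:) appears, which gives the expected output: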

awk '!seen[$1]++' file

XXXX-1111: a description
XXXX-2222: another description
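
How it works: seen[$1]++ returns the count stored for the key before incrementing it, so the negated expression is true only on the first occurrence, and awk's default action prints that line. With the default whitespace splitting, the key includes the colon and anything attached to it. As a minimal sketch, assuming every ticket ID is immediately followed by a colon, the variant below splits on the colon so that only the bare ID is used as the key, even when there is no space before the description. The function name dedupe_tickets is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: keep only the first line seen for each ticket ID.
# -F':' splits each line at the colon, so $1 is the bare ID (e.g. XXXX-1111).
# Assumption: every ticket ID is immediately followed by a colon.
dedupe_tickets() {
  awk -F':' '!seen[$1]++' "$@"
}

# Sample input taken from the question, piped through the helper.
printf '%s\n' \
  'XXXX-1111: a description' \
  'XXXX-2222: another description' \
  'XXXX-1111: yet another description' |
  dedupe_tickets
```

The helper reads standard input when called without arguments, or the named files when called as dedupe_tickets file.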


Answered By - anubhava
Answer Checked By - Mildred Charles (WPSolving Admin)