Issue
I have a CSV file named test2.csv that looks like this:
lastname,firstname,83494989,1997-05-20,2015-05-07 15:30:43,Sentence Skills 104,Sentence Skills 104,Elementary Algebra 38,Elementary Algebra 38,Sentence Skills 104,Sentence Skills 104,Elementary Algebra 38,Elementary Algebra 38,
I want to remove the duplicate entries.
The closest I have gotten is the following awk command:
awk '{a[$0]++} END {for (i in a) print RS i}' RS="," test2.csv
It works, but it causes new problems: it puts the values out of order and prints each one on its own row, like this:
,Elementary Algebra 38
,2015-05-07 15:30:43
,Sentence Skills 104
,FirstName
,LastName
,1997-05-20
,83494989
I need to keep the values in their original order and on one line. (I can fix the row issue, but I don't know how to fix the order issue.)
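The order is lost because "for (i in a)" visits array keys in an unspecified, implementation-dependent order. One way to keep the counting approach of the attempt above is to also record the position at which each value first appears, then loop over those positions in END. The following is a minimal sketch under a few assumptions: a POSIX awk, one line of input, and a hypothetical function name (dedupe_in_order). It also drops empty fields, which the sample does not contain.

```shell
# Hypothetical helper (the name is illustrative): keeps the shape of the
# original attempt (count values in array a, print in END) but also stores
# the order in which each value was first seen, because "for (i in a)"
# visits keys in an unspecified order.
dedupe_in_order() {
    awk '
        BEGIN { RS = "," }                  # one comma-terminated value per record
        { sub(/\n$/, "") }                  # drop the newline that ends the line
        $0 == "" { next }                   # skip empty records (also empty fields)
        !($0 in a) { order[++n] = $0 }      # remember the first appearance
        { a[$0]++ }                         # count, as in the original attempt
        END {
            for (k = 1; k <= n; k++)        # numeric loop follows input order
                printf "%s,", order[k]
            print ""                        # finish the single output line
        }
    ' "$@"
}

# Usage: dedupe_in_order test2.csv
```

With the sample line above, this should print the unique values on one line, each followed by a comma, in the order they first appear.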
Update with Solution:
The answer from anubhava worked great. I also asked about removing the time from the date, and Ed Morton helped with that. Here is the full command:
awk 'BEGIN{RS=ORS=","} {sub(/ ..:..:..$/,"")} !seen[$0]++' test2.csv
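For readers unfamiliar with the sub() call, here is the full command above spelled out with comments. It is wrapped in a shell function purely for illustration; the name dedupe_fields_no_time is hypothetical. Note that "." in the regex matches any character, so any value ending in a space plus "xx:xx:xx" would be trimmed, not only timestamps.

```shell
# Hypothetical wrapper around the full command, with comments added.
dedupe_fields_no_time() {
    awk '
        # Read and write comma-terminated records instead of lines.
        BEGIN { RS = ORS = "," }
        # Remove a trailing " ..:..:.." (a space, then three pairs of any
        # characters separated by colons), turning "2015-05-07 15:30:43"
        # into "2015-05-07". Records without that suffix are unchanged.
        { sub(/ ..:..:..$/, "") }
        # Deduplicate after trimming, so a timestamp whose date matches an
        # earlier value counts as a repeat.
        !seen[$0]++
    ' "$@"
}

# Usage: dedupe_fields_no_time test2.csv
```

Because the trim happens before the duplicate check, "2015-05-07 15:30:43" is treated as a repeat if "2015-05-07" has already been seen.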
Solution
You can just use this awk command:
awk 'BEGIN{RS=ORS=","} !seen[$0]++' test2.csv
Output:
lastname,firstname,83494989,1997-05-20,2015-05-07 15:30:43,Sentence Skills 104,Elementary Algebra 38,
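The one-liner above works because of the common awk idiom !seen[$0]++. Below is the same program with comments, wrapped in a shell function for illustration; the name dedupe_fields is hypothetical.

```shell
# Hypothetical wrapper around the one-liner, with comments added.
dedupe_fields() {
    awk '
        # Read and write comma-terminated records instead of lines.
        BEGIN { RS = ORS = "," }
        # seen[$0]++ yields the previous count: 0 (false) the first time a
        # value appears, 1 or more afterwards. Negating it makes the pattern
        # true only on the first occurrence. A pattern with no action prints
        # the record, followed by ORS.
        !seen[$0]++
    ' "$@"
}

# Usage: dedupe_fields test2.csv
```

Unlike the END-based attempt, each record is printed as soon as it is read, so the input order is kept automatically. One caveat, assuming a typical awk such as gawk: if test2.csv ends with a newline, that newline is read as a final record of its own and is printed followed by a comma, leaving an extra line that holds a single comma.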
Answered By - anubhava
Answer Checked By - Senaida (WPSolving Volunteer)