Issue
I'm trying to find a match for a ~500-character DNA sequence in a CSV file, a few megabytes in size, that contains many different sequences. Each sequence in the CSV file is preceded by some metadata that I would also like to retrieve. Each sequence and each sequence's metadata take up exactly one line. I've tried:
grep -B 1 "extremelylongstringofDNATACGGCATAGAGGCCGAGACCTAGGATTAACGTTACTGACGAT" csvfile.csv
However, that returns "filename too long".
An interesting and frustrating thing I ran into was when I tried to get the line count of the CSV file using
wc -l csvfile.csv
it returned
0 csvfile.csv
And without the -l flag, it returned
0 161410 41507206 csvfile.csv
This is the result even after I added a line between the end of each sequence and the metadata of the next sequence.
Solution
The issue was that the file had CR line terminators, so the GNU tools did not detect any line endings and read the whole file as one huge line. I solved it by running mac2unix on the file, which converts the CR terminators to the LF line endings that the GNU tools recognize.
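For anyone who wants to reproduce the symptom and the fix without mac2unix installed, here is a minimal sketch. It assumes a small made-up sample file (sample_cr.csv, with invented metadata and sequences, not the original data) and uses tr as a stand-in for mac2unix, which should have the same effect on a file whose only terminators are CRs:

```shell
# Build a tiny sample file with CR-only line endings.
# File names, metadata and sequences are hypothetical, for illustration only.
printf 'meta1\rACGTACGT\rmeta2\rTTGACCAT\r' > sample_cr.csv

# There are no LF bytes in the file, so wc -l counts zero lines.
cr_lines=$(wc -l < sample_cr.csv | tr -d ' ')

# Translate every CR into LF and write the result to a new file.
tr '\r' '\n' < sample_cr.csv > sample_lf.csv
lf_lines=$(wc -l < sample_lf.csv | tr -d ' ')

# grep -B 1 can now print the metadata line that precedes the matching sequence.
match=$(grep -B 1 'TTGACCAT' sample_lf.csv)

echo "lines before: $cr_lines, after: $lf_lines"
echo "$match"
```

Running `file csvfile.csv` on the real file is a quick way to check for this up front, since it typically reports "with CR line terminators" for such files.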
Thanks to Etan Reisner for providing the hint.
Answered By - riv