Issue
I'm running Mint Xfce and attempting to grep from the terminal using the following:
grep -E -o '^[A-Za-z]{1,}\s[A-Za-z]{1,}\s[0-9]{1,}' sourcefile.txt | sort -f > newfile.txt
The source file is a text file where each line looks like
<string><space><string><tab><number><tab><number><tab>...
where the strings have letters, numbers, punctuation, and special characters and the numbers are integers.
My goal is to extract the two strings and first number for just the lines where the strings contain only English letters (a-z, upper or lower case).
The above command leaves out strings with punctuation and numbers, but lines where the strings have special letters like u umlauts (Ü) are somehow getting through and being sent to newfile.txt. I feel like I'm missing something obvious, but a ton of Googling only gives me back discussions on how to grep for special letters. I've tested the regex at https://regex101.com/ and umlauts don't get matched, which makes me think the problem isn't with my regex.
Thanks for any help you can provide!
Solution
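You have to temporarily change the locale. In many non-C locales, a range such as [A-Za-z] is interpreted according to the locale's collation order, so it can also match accented letters such as Ü. Try: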
LC_ALL="C" grep -E -o '^[A-Za-z]{1,}\s[A-Za-z]{1,}\s[0-9]{1,}' sourcefile.txt | sort -f > newfile.txt
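It worked for me on Ubuntu. Because LC_ALL="C" is written as a prefix to the command, it applies only to that single grep invocation (sort still runs in your normal locale), so there is nothing to switch back afterwards.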
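To check the behaviour on your own machine, here is a minimal sketch. The sample lines and the variable names `sample` and `result` are made up for illustration; only the grep and sort invocations come from the command above. It assumes a grep that supports `\s` (GNU grep does) and a UTF-8 terminal.

```shell
#!/bin/sh
# Hypothetical sample data in the question's layout:
# <string><space><string><tab><number><tab><number>
sample=$(mktemp)
printf 'zeta Alpha\t1\t2\n'   >  "$sample"
printf 'Über Alles\t13\t2\n'  >> "$sample"   # starts with a non-ASCII letter
printf 'Hello World\t42\t7\n' >> "$sample"
printf 'foo2 bar\t5\t1\n'     >> "$sample"   # digit inside the first string

# The LC_ALL=C prefix affects this grep process only;
# sort -f still runs in the session's normal locale.
result=$(LC_ALL=C grep -E -o '^[A-Za-z]{1,}\s[A-Za-z]{1,}\s[0-9]{1,}' "$sample" | sort -f)

printf '%s\n' "$result"
rm -f "$sample"
```

With the C locale forced for grep, only the "Hello World" and "zeta Alpha" lines should come through. Running the same pipeline without the LC_ALL=C prefix is a quick way to see whether your session locale lets the Ü line through.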
Answered By - Grzegorz Górkiewicz Answer Checked By - Katrina (WPSolving Volunteer)