Issue
My text file looks like this:
#एक
1के
अंकगणित8IU
अधोरेखाunderscore
$thatऔर
%redएकyellow
$चिह्न
अंडरस्कोर@_
The desired output is:
#
1
8IU
underscore
$that
%redyellow
$
@_
This is what I have tried so far, using awk:
awk -F"[अ-ह]*" '{print $1}' filename.txt
The output I get is:
#
1
$that
%red
$
Using this command instead:
awk -F"[अ-ह]*" '{print $1,$2}' filename.txt
I get this output:
#
1 े
ं
ो
$that
%red yellow
$ ि
ं
Is there any way to solve this in a bash script?
Solution
Using perl:
$ perl -CSD -lpe 's/\p{Devanagari}+//g' input.txt
#
1
8IU
underscore
$that
%redyellow
$
@_
-CSD tells perl that the standard streams and any files it opens are encoded in UTF-8. -p loops over the input files, printing each line to standard output after executing the script given by -e, and -l strips the trailing newline from each input line and adds it back when printing. If you want to modify the file in place, add the -i option.
The regular expression matches any code points assigned to the Devanagari script in the Unicode standard and removes them. Use \P{Devanagari} to do the opposite and remove the non-Devanagari characters.
Answered By - Shawn Answer Checked By - Candace Johnson (WPSolving Volunteer)