Thursday, April 7, 2022

[SOLVED] Bash: changing a series of sentences and making them into single-line ones

Issue

What I would like to do is load a text file containing a series of sentences and create an array containing each sentence as a separate element, with some possible grep conditions, such as containing a given string. Here's what I've got. The reason it's in an array is that I would like to count the number of lines later, which I can do with a simple for loop if it's in an array, so I would like to keep it that way.

#!/bin/bash
location=$(pwd)
file="${location}/text"
cat $file

string=$(cat $file |sed 's/./.*/g' | tr '*' '\n' |sed 's/?/?*/g' | tr '*' '\n' |sed 's/!/!*/g' | tr '*' '\n')

In this part I opened a file and, based on what I understand, replaced . with .* and then replaced * with \n, and did the same with ? and !. So now I should have a string containing each sentence separated by a newline.

echo $string

array=( $($string | grep "hello" | grep "!") )

echo $array

Now it should put the string into an array, on the condition that the sentence contains the word hello and is a command sentence (contains !). But the problem is:

echo $string
output : . . . . . . . . . . . . etc...

Also, the line where I create the array reports that .: .: is a directory. The whole script, without the commentary, is below:

#!/bin/bash
location=$(pwd)
file="${location}/text"
cat $file

string=$(cat $file |sed 's/./.*/g' | tr '*' '\n' |sed 's/?/?*/g' | tr '*' '\n' |sed 's/!/!*/g' | tr '*' '\n')

echo $string

array=( $($string | grep "hello" | grep "!") )

echo $array

example text

Text text hello! hello? text text.
text text. text hello! text?
hello! text text.

expected outputs:

echo $string : 
Text text hello!
 hello?
 text text.
text text.
 text hello!
 text?
hello!
 text text.

Basically one sentence per line (yes, sometimes with whitespace at the beginning, but that doesn't matter), since string=$(cat $file |sed 's/./.*/g' | tr '*' '\n' |sed 's/?/?*/g' | tr '*' '\n' |sed 's/!/!*/g' | tr '*' '\n') is supposed to do that. But the current output of echo $string is:

Text text hello! hello? text text. text text. text hello! text? hello! text text.

As for echo $array:

It should be basically the same as $string, but with each line as a separate element of the array.

But the current output looks the same as $string: it prints everything as one string rather than each sentence on a new line.

Please keep it at a simple level. I just started with bash and created this script to learn and have fun. I know there are some incredible people using it, but from what I've seen it can get really crazy fast :)


Solution

Regarding echo $string: please read https://mywiki.wooledge.org/Quotes and "Why is printf better than echo?".
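As a small illustration of the quoting point, here is a sketch that compares the two forms. The sample value and the variable names unquoted and quoted are made up for this demo; the sample deliberately avoids ? because an unquoted expansion is also subject to filename globbing.

```shell
#!/bin/bash
# Sample value with an embedded newline (made up for this demo).
string=$'Text text hello!\ntext text.'

# Unquoted: the shell splits $string on whitespace and echo joins the
# resulting words with single spaces, so the newline disappears.
unquoted=$(echo $string)

# Quoted: the value is passed as one argument, newline intact.
quoted=$(printf '%s\n' "$string")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

This is why a string that really does contain one sentence per line can still print as a single line.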

Is this what you're trying to do?

$ string=$(sed 's/\([[:punct:]]\) /\1\n/g' file)
$ printf '%s\n' "$string"
Text text hello!
hello?
text text.
text text.
text hello!
text?
hello!
text text.

$ readarray -t -d$'\n' array < <(sed 's/\([[:punct:]]\) /\1\n/g' file)
$ printf '%s\n' "${array[@]}"
Text text hello!
hello?
text text.
text text.
text hello!
text?
hello!
text text.
$ declare -p array
declare -a array=([0]="Text text hello!" [1]="hello?" [2]="text text." [3]="text text." [4]="text hello!" [5]="text?" [6]="hello!" [7]="text text.")

$ string=$(sed 's/\([[:punct:]]\) /\1\n/g' file | grep 'hello!')
$ printf '%s\n' "$string"
Text text hello!
text hello!
hello!
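Since the question also asks for an array and a line count, the same filter can feed readarray directly. The following is a minimal sketch, assuming bash 4 or later and a sed that understands \n in the replacement; tmpfile and count are names made up for this example. Note that array=( $($string | grep ...) ) in the question tries to run the contents of string as a command, which is where the ".: .: is a directory" message comes from (the first word is ., the source builtin).

```shell
#!/bin/bash
# Sample input from the question, written to a temporary file so the
# sketch runs on its own; "tmpfile" is a made-up name.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
Text text hello! hello? text text.
text text. text hello! text?
hello! text text.
EOF

# Split into one sentence per line, then keep only the sentences
# containing "hello!". Each surviving line becomes one array element.
readarray -t array < <(sed 's/\([[:punct:]]\) /\1\n/g' "$tmpfile" | grep 'hello!')

# ${#array[@]} is the number of elements, so no counting loop is needed.
count=${#array[@]}
printf 'matches: %d\n' "$count"
printf '%s\n' "${array[@]}"

rm -f "$tmpfile"
```

To filter a string that is already stored in a variable, pipe it with printf '%s\n' "$string" | grep 'hello!' rather than using $string as the first word of a pipeline.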

If your sed version doesn't support \n in the replacement then change it to either of these:

sed 's/\([[:punct:]]\) /\1\'$'\n''/g' file

sed 's/\([[:punct:]]\) /\1\
/g' file

If it doesn't support character classes, then get a new sed; failing that, either change [[:punct:]] to [!?.] and list all the punctuation chars inside the bracket expression, or change it to [^][ \ta-zA-Z0-9_-] and list all the chars you do NOT want treated as punctuation inside the bracket expression.
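A short sketch of the [!?.] variant, again assuming a sed that understands \n in the replacement. It also shows why the script in the question printed only dots: outside a bracket expression, an unescaped . matches any character. The variable names line, wildcard and sentences are made up for this example.

```shell
#!/bin/bash
# Sample line from the question.
line='Text text hello! hello? text text.'

# Unescaped "." is a regex wildcard, so EVERY character is replaced
# by ".*" and the original text is lost.
wildcard=$(printf '%s\n' "$line" | sed 's/./.*/g')

# Inside [...] the dot is literal, so only !, ? and . followed by a
# space are treated as sentence ends.
sentences=$(printf '%s\n' "$line" | sed 's/\([!?.]\) /\1\n/g')

printf '%s\n' "$sentences"
```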



Answered By - Ed Morton
Answer Checked By - David Goodson (WPSolving Volunteer)