Saturday, April 9, 2022

[SOLVED] Regex capture group works in regex101, not sed

Issue

I found some other similarly titled questions but didn't find the answer.

My text is:

##bcftools_mergeCommand=merge --force-samples -m none -O v -o analysis/STUDY1/hg19/exome/merged.vcf --threads 4 analysis/STUDY1/hg19/exome/varscan_norm.vcf.gz analysis/STUDY1/hg19/exome/gatk_norm.vcf.gz analysis/STUDY1/hg19/exome/samtools_norm.vcf.gz analysis/STUDY1/hg19/exome/freebayes_norm.vcf.gz

I want the names of the .vcf.gz files.

Sed gives me:

echo "##bcftools_mergeCommand=merge --force-samples -m none -O v -o analysis/STUDY1/hg19/exome/merged.vcf --threads 4 analysis/STUDY1/hg19/exome/varscan_norm.vcf.gz analysis/STUDY1/hg19/exome/gatk_norm.vcf.gz analysis/STUDY1/hg19/exome/samtools_norm.vcf.gz analysis/STUDY1/hg19/exome/freebayes_norm.vcf.gz" | sed -En 's/\/([^\/]+\.vcf\.gz)/\1/g'

with blank results.

Regex101 gives:

enter image description here

https://regex101.com/r/h3OGvN/1


Solution

Why not using grep ?

$ data='##bcftools_mergeCommand=merge --force-samples -m none -O v -o analysis/STUDY1/hg19/exome/merged.vcf --threads 4 analysis/STUDY1/hg19/exome/varscan_norm.vcf.gz analysis/STUDY1/hg19/exome/gatk_norm.vcf.gz analysis/STUDY1/hg19/exome/samtools_norm.vcf.gz analysis/STUDY1/hg19/exome/freebayes_norm.vcf.gz'
$ echo $data | grep -Eo [^\/]+\.vcf\.gz
varscan_norm.vcf.gz
gatk_norm.vcf.gz
samtools_norm.vcf.gz
freebayes_norm.vcf.gz

  • -E: Interpret patterns as extended regular expressions.
  • -o: Print only the matched (non-empty) parts.


Answered By - Cubix48
Answer Checked By - Marie Seifert (WPSolving Admin)