Issue
This is a follow-up question about cleaning a SPARQL dataset.
The dataset is obtained with this cell:
#+name: raw-dataset
#+BEGIN_SRC sparql :url https://query.wikidata.org/sparql :format text/csv :cache yes :exports both
SELECT ?wLabel ?pLabel
WHERE
{
?p wdt:P31 wd:Q98270496 .
?p wdt:P1416 ?w .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ASC(?wLabel) ASC(?pLabel)
LIMIT 10
#+END_SRC
#+RESULTS[7981b64721a5ffc448aa7da773ce07ea8dbaf8ac]: raw-dataset
| wLabel | pLabel |
|-----------------------------------------------+--------------|
| Q105775472 | NFDI4Health |
| Q1117007 | NFDI4Health |
| Q115254989 | NFDI4Objects |
| Q1205424 | NFDI4Objects |
| Q17575706 | NFDI4Objects |
| Academy of Sciences and Humanities in Hamburg | Text+ |
| Academy of Sciences and Literature Mainz | NFDI4Culture |
| Academy of Sciences and Literature Mainz | NFDI4Memory |
| Academy of Sciences and Literature Mainz | NFDI4Objects |
| Academy of Sciences and Literature Mainz | Text+ |
As a second step, I take a look at the data using zsh:
#+begin_src sh :var data=raw-dataset :shebang "#!/opt/homebrew/bin/zsh" :exports both
echo ${data}
#+end_src
#+RESULTS:
| Q105775472 | NFDI4Health |
| Q1117007 | NFDI4Health |
| Q115254989 | NFDI4Objects |
| Q1205424 | NFDI4Objects |
| Q17575706 | NFDI4Objects |
| Academy of Sciences and Humanities in Hamburg | Text+ |
| Academy of Sciences and Literature Mainz | NFDI4Culture |
| Academy of Sciences and Literature Mainz | NFDI4Memory |
| Academy of Sciences and Literature Mainz | NFDI4Objects |
| Academy of Sciences and Literature Mainz | Text+ |
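The echo ${data} cell prints one table row per line because zsh, unlike sh and bash, does not split unquoted parameter expansions into words by default. The sketch below shows the difference. It assumes the sketch runs under sh or bash and uses a hypothetical two-row $data. Quoting the expansion as "$data" keeps the rows on separate lines in any of these shells, which matters because grep filters line by line.

```shell
# Hypothetical two-row stand-in for the table text Org Babel puts in $data.
data='Q1117007 NFDI4Health
Academy of Sciences and Literature Mainz Text+'

# Quoted expansion keeps the embedded newline, so grep sees 2 lines.
quoted_lines=$(printf '%s\n' "$data" | grep -c '')

# Unquoted expansion under sh/bash is word-split, so echo joins all
# words with single spaces and grep sees only 1 line.
unquoted_lines=$(echo $data | grep -c '')

echo "quoted: $quoted_lines, unquoted: $unquoted_lines"
```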
Everything looks fine, so I can start cleaning by getting rid of all lines containing Q.... (bare Wikidata item IDs):
#+begin_src sh :var data=raw-dataset :exports both :shebang "#!/opt/homebrew/bin/zsh"
echo ${data} | grep -L -E "Q[1-9]"
#+end_src
#+RESULTS:
Somehow, using -L does not work: the result is empty. But without -L, grep behaves as expected and returns the matching lines:
#+begin_src sh :var data=raw-dataset :shebang "#!/opt/homebrew/bin/zsh" :exports both
echo ${data} | grep -E "Q[1-9]"
#+end_src
#+RESULTS:
| Q105775472 | NFDI4Health |
| Q1117007 | NFDI4Health |
| Q115254989 | NFDI4Objects |
| Q1205424 | NFDI4Objects |
| Q17575706 | NFDI4Objects |
Question: Why is -L not working, and how can I get rid of the lines starting with Q....?
Solution
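To exclude lines beginning with Q followed by a digit:
#+begin_src sh
... | grep -v '^Q[0-9]'
... | grep -Pv '^Q\d'
#+end_src
The second form uses -P (Perl-compatible regular expressions, where \d matches a digit), which is not supported by every grep, for example builds without PCRE support.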
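A minimal sketch of the grep -v approach on a few rows copied from the results table. It assumes a single space separates the columns. That separator is a hypothetical stand-in for whatever Org Babel actually uses when it passes the table to the script. The anchored pattern does not depend on it.

```shell
# Hypothetical sample rows copied from the results table; the single
# space between columns stands in for the real separator.
sample_rows() {
  printf '%s\n' \
    'Q105775472 NFDI4Health' \
    'Q1117007 NFDI4Health' \
    'Q115254989 NFDI4Objects' \
    'Academy of Sciences and Humanities in Hamburg Text+' \
    'Academy of Sciences and Literature Mainz NFDI4Culture'
}

# -v (--invert-match) keeps only the lines that do NOT match;
# ^Q[0-9] matches a Q followed by a digit at the start of a line.
drop_qid_rows() {
  grep -v '^Q[0-9]'
}

sample_rows | drop_qid_rows
```

Only the two "Academy of Sciences ..." rows survive. The ^ anchor means a Q later in a line, for example inside a label, does not cause the row to be dropped.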
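As mentioned in the comment, -L (--files-without-match) is a rarely used command-line option that works on entire files. Instead of printing lines, it prints the names of the input files that contain no match. In the question, standard input does contain matching lines, so -L prints nothing. -v, by contrast, is quite common and excludes the lines that match the pattern or patterns. As man grep notes, the v stands for --invert-match.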
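The difference between the two options can be seen side by side in a small sketch. It uses two hypothetical files created in a temporary directory. It shows that -L reports file names, that -L on matching standard input prints nothing (as in the question), and that -v reports lines.

```shell
# Hypothetical demo files in a throwaway temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'Q1117007 NFDI4Health' \
  'Academy of Sciences and Literature Mainz Text+' > "$dir/mixed.txt"
printf '%s\n' 'Academy of Sciences and Literature Mainz NFDI4Memory' > "$dir/clean.txt"

# -L prints the NAMES of files that contain no matching line, never lines.
# The exit status is ignored; only the printed output matters here.
files_without_qids=$(grep -L '^Q[0-9]' "$dir/mixed.txt" "$dir/clean.txt" || true)

# Standard input that does match leaves -L nothing to list,
# which is why the -L cell in the question produced no output.
stdin_with_L=$(printf '%s\n' 'Q1117007 NFDI4Health' | grep -L '^Q[0-9]' || true)

# -v prints the LINES that do not match, taken from all input.
lines_without_qids=$(cat "$dir/mixed.txt" "$dir/clean.txt" | grep -v '^Q[0-9]')

printf 'files without Q ids: %s\n' "$files_without_qids"
printf 'lines without Q ids:\n%s\n' "$lines_without_qids"
rm -r "$dir"
```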
Answered By - stevesliva Answer Checked By - Cary Denson (WPSolving Admin)