Issue
I'm trying to download a db file from a GitHub repo using the following code:
library(RSQLite)
library(curl)
url <- "https://github.com/kotartemiy/newscatcher/tree/master/newscatcher/data/package_rss.db"
curl::curl_download(url = url,
                    destfile = "inst/external-data/package_rss.db",
                    quiet = TRUE, mode = "wb")
This runs without error, but the downloaded file is only between 79 KB and 82 KB (depending on which mode I use). When I try to open the database file, I get a warning:
sqlite.driver <- dbDriver("SQLite")
db <- dbConnect(sqlite.driver,
                dbname = "inst/external-data/package_rss.db")
Warning message:
Couldn't set synchronous mode: file is not a database
Use synchronous = NULL to turn off this warning.
Followed by the error:
dbListTables(db)
Error: file is not a database
This can be reproduced using download.file() with different mode arguments. However, if I download the file manually, it is 376 KB and the RSQLite code works without any problems. What might be causing the issue? Thanks
Solution
As @27ϕ9 said, you're downloading a webpage, not the file it references.
url <- "https://github.com/kotartemiy/newscatcher/tree/master/newscatcher/data/package_rss.db"
download.file(url, "~/Downloads/package_rss.db")
# trying URL 'https://github.com/kotartemiy/newscatcher/tree/master/newscatcher/data/package_rss.db'
# Content type 'text/html; charset=utf-8' length unknown
# downloaded 82 KB
readLines("~/Downloads/package_rss.db", n=10)
# [1] ""
# [2] ""
# [3] ""
# [4] ""
# [5] ""
# [6] "<!DOCTYPE html>"
# [7] "<html lang=\"en\">"
# [8] " <head>"
# [9] " <meta charset=\"utf-8\">"
# [10] " <link rel=\"dns-prefetch\" href=\"https://github.githubassets.com\">"
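A quick way to confirm this kind of problem without reading the file as text is to look at its first bytes: a SQLite 3 database starts with the 16-byte header "SQLite format 3" followed by a NUL byte. The sketch below is only an illustration; is_sqlite_file is a hypothetical helper name, not something RSQLite provides.

```r
# Hypothetical helper (not part of RSQLite): TRUE if the file begins with
# the 16-byte SQLite 3 header, "SQLite format 3" followed by a NUL byte.
is_sqlite_file <- function(path) {
  magic <- c(charToRaw("SQLite format 3"), as.raw(0))
  # readBin() accepts a file name and returns fewer bytes if the file is short.
  header <- readBin(path, what = "raw", n = length(magic))
  identical(header, magic)
}

# An HTML page saved under a .db name fails the check (returns FALSE).
html_file <- tempfile(fileext = ".db")
writeLines(c("", "<!DOCTYPE html>", "<html lang=\"en\">"), html_file)
is_sqlite_file(html_file)
```

Running the same check on the manually downloaded 376 KB file should return TRUE, assuming that file really is the database.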
If you go to that URL in a browser, you'll see two links on the page:
The "Download" button sends you to a link under raw.githubusercontent.com, so you can hunt for that URL.
There's also a "view raw" link, which takes the same URL you started with, replaces the /tree/ with /blob/, and appends ?raw=true.
(While it's possible to harvest the HTML and get the link programmatically, I think just starting with the correct URL is the preferred route.)
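As a rough sketch of the "view raw" rewrite described above (swap /tree/ for /blob/ and append ?raw=true), the corrected URL can be built from the original one. The names page_url, raw_url, and download_db are made up for this example, and the download itself is wrapped in a function so nothing is fetched until you call it.

```r
page_url <- "https://github.com/kotartemiy/newscatcher/tree/master/newscatcher/data/package_rss.db"

# Rewrite the page URL: /tree/ becomes /blob/, then ?raw=true is appended.
raw_url <- paste0(sub("/tree/", "/blob/", page_url, fixed = TRUE), "?raw=true")

# Hypothetical wrapper around download.file(); mode = "wb" keeps the
# binary file intact on Windows.
download_db <- function(url, destfile) {
  download.file(url, destfile, mode = "wb", quiet = TRUE)
  invisible(destfile)
}

# Not run here, since it needs network access and an existing target directory:
# download_db(raw_url, "inst/external-data/package_rss.db")
# db <- RSQLite::dbConnect(RSQLite::SQLite(), "inst/external-data/package_rss.db")
# RSQLite::dbListTables(db)
```

If the rewrite is right, the downloaded file should match the size of the manual download (376 KB in the question) rather than the 79 to 82 KB HTML page.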
Answered By - r2evans