Issue
I am trying to interact with an SFTP server from inside R. The curl package came highly recommended (curl, not RCurl).
One of the things I am trying to do is get a list of directories/files at an address. I have this code working so far:
library(curl)
# create a new curl handle
han <- new_handle()
# set options for SFTP
handle_setopt(han, verbose = TRUE)
# execute the request
result <- curl_fetch_memory(url = "{SFTP URL here}", handle = han)
# get the response data
response <- rawToChar(result$content)
The SFTP server at this URL does not require a password. The remote uses SFTP protocol version 3.
The above code almost does what I am looking for: curl_fetch_memory(url = "{SFTP URL here}", handle = han) returns a list that includes, among other things, result$content. That element holds the list of directories/files, but as raw bytes, with file names, dates and permission data all mixed together in the resulting character string.
How can I customize the request/handle to get the list of files in a cleaner form, just a plain list of file names akin to ls on SFTP servers, if this is possible at all? (Copies of result and response are attached below.) If customizing the request is not possible, is there a way to make curl objects a bit more human readable?
Output for result
$url
[1] "sftp://data.cyverse.org/shared/"
$status_code
[1] 0
$type
[1] NA
$headers
raw(0)
$modified
[1] "2020-02-20 16:05:33 CST"
$times
redirect namelookup connect pretransfer starttransfer
0.000000 0.000029 0.000000 0.230600 0.000000
total
0.230608
$content
[1] 64 72 77 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20
[26] 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30 20 44 65 63 20 33 31 20
[51] 20 31 39 36 39 20 2e 0a 64 72 77 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30
[76] 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 30
[101] 20 44 65 63 20 33 31 20 20 31 39 36 39 20 2e 2e 0a 64 72 77 78 72 2d 78 72
[126] 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20
[151] 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20 61 6c
[176] 69 67 6e 6d 65 6e 74 73 5f 61 6e 64 5f 74 72 65 65 73 0a 64 72 77 78 72 2d
[201] 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20
[226] 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20
[251] 67 65 6e 65 5f 66 61 6d 69 6c 79 5f 65 76 6f 6c 75 74 69 6f 6e 0a 64 72 77
[276] 78 72 2d 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20
[301] 20 20 20 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30
[326] 32 30 20 6d 61 70 73 5f 73 63 72 69 70 74 73 0a 64 72 77 78 72 2d 78 72 2d
[351] 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20
[376] 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20 74 72 61
[401] 6e 73 63 72 69 70 74 5f 61 73 73 65 6d 62 6c 69 65 73 0a 64 72 77 78 72 2d
[426] 78 72 2d 78 20 20 20 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20
[451] 20 20 20 20 20 20 20 20 20 20 30 20 46 65 62 20 32 30 20 20 32 30 32 30 20
[476] 77 68 6f 6c 65 5f 67 65 6e 6f 6d 65 5f 64 75 70 6c 69 63 61 74 69 6f 6e 73
[501] 0a 2d 72 77 2d 72 2d 2d 72 2d 2d 20 20 20 20 31 20 30 20 20 20 20 20 20 20
[526] 20 30 20 20 20 20 20 20 20 20 20 20 20 20 20 36 36 39 20 4f 63 74 20 31 32
[551] 20 20 32 30 31 39 20 67 65 6e 65 5f 66 61 6d 69 6c 69 65 73 5f 6f 72 74 68
[576] 6f 66 69 6e 64 65 72 2e 74 78 74 0a 2d 72 77 2d 72 2d 2d 72 2d 2d 20 20 20
[601] 20 31 20 30 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 20 20 20 20 20
[626] 31 32 37 33 20 4f 63 74 20 31 32 20 20 32 30 31 39 20 72 65 61 64 6d 65 2e
[651] 74 78 74 0a
Output for response, i.e. rawToChar(result$content)
'drwxr-xr-x 1 0 0 0 Dec 31 1969 .\ndrwxr-xr-x 1 0 0 0 Dec 31 1969 ..\ndrwxr-xr-x 1 0 0 0 Nov 7 2020 curated\n'
Solution
You can set CURLOPT_DIRLISTONLY to list names only. You can also parse the default response as regular tabular text, e.g. with read.table() or readr::read_table(). Options for the curl package are the general libcurl options from upstream, so the libcurl documentation can be used as a reference: https://curl.se/libcurl/c/easy_setopt_options.html
Using Rebex demo server as an example:
library(curl)
#> Using libcurl 7.84.0 with Schannel
# https://test.rebex.net/
SFTP_DEMO <- "sftp://demo:password@test.rebex.net:22"
han <- new_handle()
# list all libcurl options that include "list"
curl_options("list")
#>            cookielist           dirlistonly proxy_ssl_cipher_list
#>                 10135                    48                 10259
#>       ssl_cipher_list
#>                 10083
# set dirlistonly
handle_setopt(han, dirlistonly = TRUE)
# dirlistonly request:
file_list <- curl_fetch_memory(url = SFTP_DEMO, handle = han)[["content"]] |> rawToChar()
cat(file_list)
#> .
#> ..
#> pub
#> readme.txt
read.table(text = file_list)
#>           V1
#> 1          .
#> 2         ..
#> 3        pub
#> 4 readme.txt
strsplit(file_list, "\n") |> unlist()
#> [1] "." ".." "pub" "readme.txt"
# you can do the same with detailed file list:
handle_setopt(han, dirlistonly = FALSE)
curl_fetch_memory(url = SFTP_DEMO,
                  handle = han)[["content"]] |>
  rawToChar() |>
  read.table(text = _)
#>           V1 V2   V3    V4  V5  V6 V7    V8         V9
#> 1 drwx------  2 demo users   0 Mar 31 17:52          .
#> 2 drwx------  2 demo users   0 Mar 31 17:52         ..
#> 3 drwx------  2 demo users   0 Mar 31 17:52        pub
#> 4 -rw-------  1 demo users 405 Dec 17  2021 readme.txt
Created on 2023-05-12 with reprex v2.0.2
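If you keep the detailed listing, the V1..V9 columns from read.table() can be turned into something more readable. Below is a minimal sketch in base R, not part of the original answer: parse_listing() is a hypothetical helper (it does not exist in the curl package), and it assumes the server returns the ls -l style layout shown above, with the entry name starting at the 9th whitespace-separated field. The sample text is the response string from the question, so the sketch runs without a network connection.

```r
# Sample long-format listing, copied from the response string in the question.
listing <- paste0(
  "drwxr-xr-x 1 0 0 0 Dec 31 1969 .\n",
  "drwxr-xr-x 1 0 0 0 Dec 31 1969 ..\n",
  "drwxr-xr-x 1 0 0 0 Nov 7 2020 curated\n"
)

# parse_listing() is a hypothetical helper, not a curl function.
# Assumption: 8 metadata fields (permissions, links, owner, group, size,
# month, day, time-or-year) precede the entry name on every line.
parse_listing <- function(txt) {
  # One element per non-empty line
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(trimws(lines))]
  # Split each line on runs of whitespace
  fields <- strsplit(trimws(lines), "\\s+")
  field <- function(i) vapply(fields, `[`, character(1), i)
  out <- data.frame(
    permissions = field(1),
    size = as.numeric(field(5)),
    # Re-join everything after the 8th field so names with single spaces survive
    name = vapply(fields, function(f) paste(f[-(1:8)], collapse = " "),
                  character(1)),
    stringsAsFactors = FALSE
  )
  # A leading "d" in the permission string marks a directory
  out$is_dir <- startsWith(out$permissions, "d")
  # Drop the "." and ".." entries
  out[!out$name %in% c(".", ".."), , drop = FALSE]
}

entries <- parse_listing(listing)
print(entries$name)
#> [1] "curated"
```

With a live server, the same helper could be applied to rawToChar() of the fetched content. Note that runs of several spaces inside a file name are collapsed to one by this approach, so dirlistonly = TRUE remains the safer choice when only the names are needed.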
Answered By - margusl