Issue
I'm trying to build an equivalent of the following GitHub-specific code that finds the latest artifact available for download from https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master. The download links look something like https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz.
# Working code for github.com, needs to be converted to fivem.net
LOCATION=$(curl -s https://api.github.com/repos/someuser/somerepo/releases/latest \
| grep "tag_name" \
| awk '{print "https://github.com/someuser/somerepo/archive/" substr($2, 2, length($2)-3) ".zip"}') \
; curl -L -o file.zip "$LOCATION"
Each build directory is named with a version number that increases over time but is not strictly sequential, followed by a seemingly random hash.
How can I find the latest download link from the HTML page at https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master?
Solution
We can build on the use of lynx -dump, as suggested in "Easiest way to extract the urls from an html page using sed or awk only":
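#!/usr/bin/env bash
# Capture group 1 is the numeric build number, group 2 is the hash.
url_re='https://runtime\.fivem\.net/artifacts/fivem/build_proot_linux/master/([[:digit:]]+)-([[:xdigit:]]+)/fx\.tar\.xz'
newest_link_num=0
newest_link_content=
# lynx prints each link as "<index>. <url>"; discard the index, keep the URL.
while read -r _ link; do
  [[ $link =~ $url_re ]] || continue   # skip anything that is not a build artifact
  # Remember the link with the highest build number seen so far.
  if (( ${BASH_REMATCH[1]} > newest_link_num )); then
    newest_link_num=${BASH_REMATCH[1]}
    newest_link_content=$link
  fi
done < <(lynx -dump -listonly -hiddenlinks=listonly https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master)
echo "Newest link is: $newest_link_content"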
As of this writing, it finishes with the following output:
Newest link is: https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz
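If lynx is not available, roughly the same selection can be done with curl, grep and sort. The following is only a sketch, not part of the original answer: the function name newest_build and the variable base_url are hypothetical, and it assumes every build shows up in the page source as "<number>-<hash>/fx.tar.xz", whether the link is relative or absolute. The function reads HTML on stdin and prints the directory name of the build with the highest number.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: builds appear in the HTML as "<number>-<hash>/fx.tar.xz".

# Assumed base URL, taken from the question.
base_url='https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master'

# Read HTML on stdin; print the "<number>-<hash>" directory with the highest number.
newest_build() {
    grep -oE '[0-9]+-[0-9a-f]+/fx\.tar\.xz' \
        | sort -t- -k1,1n \
        | tail -n 1 \
        | cut -d/ -f1
}

# Hypothetical usage (needs network access, so it is left commented out):
# build=$(curl -fsSL "$base_url/" | newest_build)
# [[ -n $build ]] && curl -fL -o fx.tar.xz "$base_url/$build/fx.tar.xz"
```

The numeric sort on the first dash-separated field matters: a plain lexical sort would rank a three-digit build such as 999 above 5901. If the page contains no matching links, the function prints nothing, which is why the usage lines check for an empty result before downloading.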
Answered By - Charles Duffy Answer Checked By - Marilyn (WPSolving Volunteer)