wget files matching regex
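    # Getting http://some/site/*my_pattern* (needs bash for [[ ]] and &>)
    SITE="http://some/site/"
    PATTERN="my_pattern"
    # Fetch the index page, keep the lines that mention the pattern, then split
    # each line on double quotes and print every field that matches.
    wget -nv -O- "$SITE" | grep "$PATTERN" |
      awk -v pat="$PATTERN" -F '"' '{ for (i=1; i<=NF; ++i) { if ($i ~ pat) print $i } }' |
      while read -r x; do
        # Skip a field that is just ">"; download everything else quietly.
        if [[ ! $x = ">" ]]; then
          wget "$SITE$x" &>/dev/null
          echo "$x"
        fi
      done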
Here is what the above code is doing:
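1. Fetch the site's HTML and write it to standard output
2. Keep only the lines that contain the pattern
3. Split each of those lines on double quotes and print every field that matches the pattern (typically the value of an href attribute)
4. For each printed field, skip it if it is exactly ">" (leftover HTML rather than a file name); otherwise download $SITE followed by the field and print the file name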
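The extraction in steps 2 and 3 can be tried without touching the network. The sketch below is a dry run under stated assumptions: `extract_matches` is a hypothetical helper name, the sample HTML is made up for illustration, and the loop only prints the URLs it would fetch instead of calling wget.

```shell
#!/usr/bin/env bash
SITE="http://some/site/"
PATTERN="my_pattern"

# Hypothetical helper: read HTML on stdin, split each line on double quotes,
# and print every field that matches the regex given as the first argument.
extract_matches() {
  awk -v pat="$1" -F '"' '{ for (i = 1; i <= NF; ++i) if ($i ~ pat) print $i }'
}

# Made-up index page standing in for the output of: wget -nv -O- "$SITE"
sample_html='<a href="my_pattern_1.tar.gz">first</a>
<a href="other.txt">other</a>
<a href="my_pattern_2.tar.gz">second</a>'

# Dry run: show what would be downloaded.
printf '%s\n' "$sample_html" | extract_matches "$PATTERN" | while read -r x; do
  echo "would fetch: $SITE$x"
done
# prints:
# would fetch: http://some/site/my_pattern_1.tar.gz
# would fetch: http://some/site/my_pattern_2.tar.gz
```

Note that if the link text itself contains the pattern, the field after the closing quote (which starts with ">") also matches and is printed, so real pages may need a stricter filter than the exact ">" comparison.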