wget files matching regex

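# Getting http://some/site/*my_pattern*
SITE="http://some/site/"
PATTERN="my_pattern"
# Fetch the index page, keep the lines that mention the pattern, then split
# each line on double quotes and print the fields (e.g. href values) that match.
wget -nv -O- "$SITE" |
  grep -- "$PATTERN" |
    awk -v pat="$PATTERN" -F '"' '{ for (i=1; i<=NF; ++i) { if ($i ~ pat) print $i } }' |
      while read -r x; do
        # Skip a bare ">" left over from the HTML markup
        if [[ "$x" != ">" ]]; then
          wget "$SITE$x" &>/dev/null
          echo "$x"
        fi
      done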

Here is what the above code is doing:
1. Download the site's HTML and write it to standard output.
2. Keep only the lines that contain the pattern.
3. Split each of those lines on double quotes and print every field that matches the pattern (typically the href value).
4. For each printed field, unless it is a bare ">" (HTML markup for the next-page link, not a file), download $SITE followed by that field and print the filename.
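
The extraction in steps 2 and 3 can be tried without touching the network. The sketch below is only an illustration: the function name extract_matches, the sample HTML, and the pattern report_ are all made up for this example and stand in for the real output of wget -nv -O- "$SITE".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): read HTML on stdin
# and print every double-quote-delimited field that matches the regex in $1.
extract_matches() {
  local pat="$1"
  grep -- "$pat" |
    awk -v pat="$pat" -F '"' '{ for (i = 1; i <= NF; ++i) if ($i ~ pat) print $i }'
}

# Made-up sample page standing in for the downloaded HTML.
sample_html='<a href="report_2020.csv">2020</a>
<a href="notes.txt">notes</a>
<a href="report_2021.csv">2021</a>'

# Prints report_2020.csv and report_2021.csv, one per line.
printf '%s\n' "$sample_html" | extract_matches 'report_'
```

Note that if the pattern also appears in the visible link text, awk prints that fragment as well (for example >report_a</a>), so a stricter filter than the bare ">" check may be needed on real pages.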
