wget Regex and File Type Filters to Mirror Websites

Use wget to download everything, quickly!

wget is a powerful command-line tool for downloading files, usually over HTTP. It lets you back up an entire website or just its images, for example. It offers far more control, and is typically much faster, than graphical tools or browser plugins such as DownThemAll!.

wget comes installed by default with every server operating system and every desktop operating system I use, but if you are using a commercial consumer desktop OS, follow the instructions for Apple macOS or Microsoft Windows.

wget Example

wget http://www.misantrof.net/albums/ -m -r -np -A '*.zip' -R '*.html,*.gif' -c
-m (Mirror): Turns on the options suited to mirroring a site: recursion, timestamping and unlimited recursion depth. It already implies -r.
-r (Recursive): Follows internal links and downloads the pages and files they point to.
-np (No Parents): Does not move further up into the target’s parent directories.
-A (Accept): A comma-separated list of suffixes or patterns that file names must match (e.g. *.jpg,*.html).
-R (Reject): A comma-separated list of suffixes or patterns for files to skip (e.g. *_thumb.jpg,*.zip).
-c (Continue): Resumes partially downloaded files (i.e. in the event of connection loss).
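
The -A and -R patterns are best passed in quotes, because an unquoted *.zip is expanded by the shell whenever a matching file exists in the current directory. The sketch below shows the difference without touching the network; the temporary directory and the file name old.zip are made up for illustration, and echo stands in for actually running wget.

```shell
#!/bin/sh
# Illustration only: the directory and the file old.zip are made up.
demo_dir=$(mktemp -d)
touch "$demo_dir/old.zip"

# Unquoted: the shell replaces *.zip with matching file names before wget runs.
unquoted=$(cd "$demo_dir" && echo wget -A *.zip)

# Quoted: the pattern is passed through unchanged for wget to interpret.
quoted=$(cd "$demo_dir" && echo wget -A '*.zip')

echo "$unquoted"   # wget -A old.zip
echo "$quoted"     # wget -A *.zip

rm -rf "$demo_dir"
```

In the unquoted case wget would be told to accept only a file literally named old.zip, which is rarely what was intended.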

These features only scratch the surface of what wget can do, but they can already save the hours a user would otherwise spend right-clicking and choosing “Save As” for each file. The wget manual can be daunting, but if you know what you’re trying to do, searching its pages with Ctrl+F will probably tell you exactly how to do it.

wget Advanced Usage

--limit-rate=20k (Limit Speed to 20KiB/s): Limits the data rate to avoid impacting other users accessing the server.
--spider (Check if File Exists): For when you don’t want to save a file but just want to know whether it still exists.
-w (Wait Seconds): After this flag, add the number of seconds to wait between requests, again so as not to overload a server.
--user= (Set Username): wget will attempt to log in using the username provided.
--password= (Use Password): wget will use this password with your username to authenticate.
--ftp-user=, --ftp-password= (FTP Credentials): Just like the previous settings, these let wget log in to an FTP server to retrieve files.
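
As a rough sketch of how these flags might be combined, the helpers below wrap a --spider check and a rate-limited mirror. The function names url_exists and polite_mirror, the WGET override variable, and the 2-second wait are all assumptions made for this example, not part of wget itself.

```shell
#!/bin/sh
# Hypothetical helpers combining the flags above. WGET can be overridden
# (for example WGET=echo) to preview the command instead of running it.
WGET=${WGET:-wget}

# Returns exit status 0 if the URL is still reachable; nothing is saved to disk.
url_exists() {
    "$WGET" -q --spider "$1"
}

# Mirrors a URL politely: caps the rate at 20k and waits 2 seconds per request.
polite_mirror() {
    "$WGET" -m -np -c --limit-rate=20k -w 2 "$1"
}
```

A call such as url_exists http://www.misantrof.net/albums/ && polite_mirror http://www.misantrof.net/albums/ would then mirror the directory only if the server still answers for it.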
