I used the Linux 'wget' program to crawl the existing servers. Yes, servers plural, because EAA ran one server for www.iac.org and another for members.iac.org. The Linux commands went something like this:
cd /var/www; mkdir old-site; cd old-site
mkdir public; mkdir members
cd public
wget -r -D www.iac.org http://www.iac.org
cd ../members
wget -r -D members.iac.org http://members.iac.org/home.html   # Needed to specify home.html because index.html is the login/password page, with no links to the content below
wget http://members.iac.org   # Now pick up the login page too
This grabbed the great bulk of both sites, but wget did miss a few pages. That happened because the Stellent CMS used by EAA's IT department buried the links to those pages in JavaScript statements (which wget does not parse) rather than in plain HTML. So those pages were fetched with individual wget commands.
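As a rough sketch of that cleanup step (the URL pattern, file locations, and scratch file here are illustrative, not the exact commands used), one way to find links that live only in JavaScript is to grep the mirrored HTML for site URLs and then re-fetch anything the recursive crawl skipped:

cd /var/www/old-site/public
# Pull every www.iac.org URL out of the downloaded pages, including ones embedded in JavaScript
grep -rhoE "http://www\.iac\.org/[A-Za-z0-9_/.-]+" --include='*.html' . | sort -u > /tmp/candidate-urls.txt
# Fetch each candidate; -nc (no-clobber) skips files the recursive crawl already saved
while read -r url; do
    wget -nc "$url"
done < /tmp/candidate-urls.txt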