I was thinking about fetching all the questions with wget, without logging in, and removing unwanted elements (edit/login links, form elements, most or all of the JS, etc.) with regexes. I see that Askbot has a read-only mode, which would simplify the clean-up. So the archive would contain only publicly available content, licensed under CC BY-SA 3.0 and, in the case of Askbot itself, GPLv3. I’ll ask on legal@lists.fedoraproject.org.
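A rough sketch of those two steps in Python, for illustration only. `wget_command` just builds an anonymous wget invocation (no cookies, so only the public view is saved) that could be handed to `subprocess.run`. `strip_unwanted` applies a few regexes to each saved page. The patterns and URL shapes are assumptions and were not checked against Askbot's real markup. Regexes are a pragmatic fit for a fixed set of pages with known markup, but an HTML parser would be more robust if the markup turns out to vary.

```python
import re

# Hypothetical patterns: Askbot's real markup was not checked here, so they
# would need testing against a handful of saved pages before a full run.
UNWANTED_PATTERNS = [
    # <script>...</script> blocks, both external and inline JS
    re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL),
    # whole <form>...</form> elements (search box, answer editor, voting)
    re.compile(r"<form\b[^>]*>.*?</form>", re.IGNORECASE | re.DOTALL),
    # links whose href contains an edit/login/signin segment (assumed URL shapes)
    re.compile(
        r'<a\b[^>]*href="[^"]*/(?:edit|login|signin)[^"]*"[^>]*>.*?</a>',
        re.IGNORECASE | re.DOTALL,
    ),
]


def wget_command(start_url: str, wait_seconds: int = 1) -> list[str]:
    """Build (but do not run) an anonymous wget mirroring command."""
    return [
        "wget",
        "--mirror",            # recursion + timestamping, infinite depth
        "--convert-links",     # make links suitable for static/local viewing
        "--adjust-extension",  # save HTML pages with a .html suffix
        "--page-requisites",   # also fetch the CSS and images pages need
        "--no-parent",         # never ascend above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to spare the server
        start_url,
    ]


def strip_unwanted(html: str) -> str:
    """Remove scripts, forms and edit/login links from one saved page."""
    for pattern in UNWANTED_PATTERNS:
        html = pattern.sub("", html)
    return html


if __name__ == "__main__":
    # Placeholder URL; the real start page would be the Askbot question list.
    print(" ".join(wget_command("https://example.org/en/questions/")))
    sample = (
        "<p>Answer</p><script>track()</script>"
        '<a href="/en/question/1/edit/">edit</a>'
        '<form action="/search/"><input name="q"></form>'
    )
    print(strip_unwanted(sample))  # <p>Answer</p>
```

If Askbot's read-only mode already hides the edit links and forms, most of these patterns could be dropped, leaving just the script removal.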
On second thought, these might also be useful:
- user profiles, which might be required to satisfy attribution requirements (again, only the public part);
- a disclaimer that it is a read-only archive;
- an email address for complaints (hidden behind a CAPTCHA);
- information about, and links to, Discourse and Quick Docs.
Would it be possible to host it under e.g. https://olddiscussion.fedoraproject.org? I’ve got hosting where I can create sub-accounts, but if a subdomain of fedoraproject.org needs something more than some random dude’s hosting, I could send the finished files elsewhere.
Redirecting all addresses that start with a language code (the Askbot link structure: /en/question/…) to this new domain, with just a few rewrite rules, would minimise the number of 404s on Discourse (important for the site’s reputation in search engines) and would land people coming from web searches on the questions they were looking for.
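The redirect decision itself is small. Here is a sketch in Python of the logic such rewrite rules would encode. It assumes the archive keeps Askbot's paths unchanged, and it uses a hypothetical allowlist of language codes (the real list would come from the Askbot site's configuration). An explicit list, rather than a pattern like "any two or three letters", avoids redirecting Discourse's own short paths such as `/t/...` or `/u/...` by accident. The host is the subdomain proposed above, which does not exist yet.

```python
from typing import Optional

# The subdomain proposed above; a placeholder until it actually exists.
ARCHIVE_HOST = "https://olddiscussion.fedoraproject.org"

# Hypothetical allowlist of Askbot language prefixes; the real set would
# come from the old site's configured languages.
ASKBOT_LANGUAGES = {"en", "cs", "de", "es", "fr", "it", "pl", "pt_BR", "ru"}


def archive_redirect(path: str, query: str = "") -> Optional[str]:
    """Map an old Askbot path to its archived copy, or return None."""
    # First path segment, e.g. "en" in "/en/question/123/some-slug/".
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment not in ASKBOT_LANGUAGES:
        return None  # not an Askbot URL: Discourse handles it as usual
    # Keep the path unchanged, assuming the archive mirrors Askbot's layout.
    target = ARCHIVE_HOST + path
    return f"{target}?{query}" if query else target


if __name__ == "__main__":
    print(archive_redirect("/en/question/123/some-slug/"))
    print(archive_redirect("/t/some-topic/456"))  # None
```

In practice this would live as rewrite rules in the web server in front of Discourse, answered with a 301 (permanent) redirect, which is the status that tells search engines a page has moved for good.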