Time to retire the old Docs website

When searching the internet for instructions, fixes, or guides for Fedora Linux, the search results sometimes point to the old docs platform (e.g. https://docs.fedoraproject.org/en-US/Fedora/26/html/Release_Notes/index.html).

With all the changes in Fedora over the past 5 years, much of the pre-F26 documentation is outdated: it doesn’t apply to current releases, and many instructions simply don’t work anymore.

Maybe it’s time to retire and finally unpublish the old docs websites?

None of the releases listed there is still supported and, as far as I can see, the vast majority of the docs have been ported over to the “new” Docs system.

So why not kick it? Or at least move it to something like archive.docs.fp.o and disallow Google and other search engines from indexing the site.
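
A minimal sketch of what such a robots.txt rule could look like, and how to check it with Python's standard library. The rule itself and the "current" page path are assumptions for illustration; only the old /en-US/Fedora/ URL comes from this thread. Note that robots.txt asks crawlers not to fetch the pages, which is related to, but not quite the same as, removing them from an index.

```python
from urllib.robotparser import RobotFileParser

# Assumption: all old docs live under /en-US/Fedora/ (capital "F"),
# as in the Release Notes URL quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en-US/Fedora/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


old_page = "https://docs.fedoraproject.org/en-US/Fedora/26/html/Release_Notes/index.html"
# Hypothetical path standing in for any page of the current docs.
new_page = "https://docs.fedoraproject.org/en-US/some-current-page/"

print(is_crawlable(ROBOTS_TXT, old_page))
print(is_crawlable(ROBOTS_TXT, new_page))
```

The prefix match is case-sensitive, so a rule for /en-US/Fedora/ would not affect other paths that differ only in case.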

Removing obsolete pages from search engine indexation using the robots.txt was already discussed in

The related part of the discussion began with this post.

@darknao I expect you currently have to focus on the GitLab migration, but am I right to assume that removing unsupported/obsolete docs pages from search engine indexing is already planned anyway?

That is already on our team’s to-do list (see e.g. Hey docs team — I need a volunteer to drive the removal of old docs - #8 by mattdm). The current plan is to convert the old documentation (F26 and earlier) to PDF and store the PDFs in the corresponding repository along with the DocBook XML source text. That way, someone can easily retrieve the text if it is needed for some reason, and we can check whether the text or parts of it could be useful for the current documentation. Once that is done, we will ask infrastructure to remove the complete web page subtree.

The problem is that no volunteer has picked up the task so far. Therefore, the task is still waiting (wink, wink).

Is there a reason why PDFs are needed? Generally, it is always good to have them, but given the many open issues and tasks, it might make sense to focus on a solution that is less work-intensive. Otherwise, I see the risk that it takes another three years to get rid of these pages.

@augenauf’s idea of just making it archive.docs.fp or something like that (which might be done in conjunction with adding a robots.txt rule to remove it from search engine indexing) might be an easy and quick-to-implement (interim?) solution.

What do you think?

Well, according to the latest discussion, we don’t want to leave our “history” to the Internet Archive alone, but want to keep the old version(s) retrievable on our own. Therefore, we want to keep the repositories. We also respect that a lot of work went into the documentation, which we don’t want to just delete and destroy. Many parts are still up to date and can be the basis of a restructured documentation.

But for both purposes it must be quickly accessible. PDF is one solution. There could also be others: perhaps simply generating readable text with a preconfigured DocBook setup, or converting it with pandoc, or using wget/curl to create a static local web page. This could even be script-based.

Infra would prefer to get rid of it. There are various technical circumstances that cause a comparatively large amount of work; I don’t remember the details off the top of my head. Therefore, it should be removed from the online content.

@pboy I was just playing a bit with wget to find easier ways to save the old Docs.

However, I found out that the PDFs already exist on the server: wget identified a /pdf/ dir in Fedora 24, 25, and 26 (I have not yet downloaded more). All contained well-formatted PDF files of the respective docs (I have not yet made a deep comparison, but at first glance everything seems to be included).

The PDF files seem to be in a dedicated folder for each Fedora version, e.g.: https://docs.fedoraproject.org/en-US/Fedora/24/pdf/*

Some examples are https://docs.fedoraproject.org/en-US/Fedora/24/pdf/Installation_Guide/Fedora-24-Installation_Guide-en-US.pdf
and https://docs.fedoraproject.org/en-US/Fedora/24/pdf/System_Administrators_Guide/Fedora-24-System_Administrators_Guide-en-US.pdf.
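
Judging only from the example URLs in this thread, the PDF locations seem to follow a regular pattern. A small sketch that rebuilds such URLs from a release number and a guide name; the function name is made up for illustration, and the pattern is an inference, so not every release/guide combination necessarily exists on the server.

```python
HOST = "https://docs.fedoraproject.org"


def old_pdf_url(release: int, guide: str, lang: str = "en-US") -> str:
    """Build the presumed PDF URL for one guide of one old Fedora release.

    Pattern inferred from the example links in this thread:
    /<lang>/Fedora/<release>/pdf/<guide>/Fedora-<release>-<guide>-<lang>.pdf
    """
    return (
        f"{HOST}/{lang}/Fedora/{release}/pdf/{guide}/"
        f"Fedora-{release}-{guide}-{lang}.pdf"
    )


# Candidate URLs for the releases checked so far.
for release in (24, 25, 26):
    print(old_pdf_url(release, "Installation_Guide"))
```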

However, general access to the */pdf folders is restricted to admins. So a full download of the whole directories should be done by someone with the necessary privileges, to ensure that no PDF is forgotten and to avoid the massive traffic of a deeply recursive wget run. But it seems that sometime in the past someone already completed the PDF task :slight_smile:

This seems to go back to Fedora 7, e.g. https://docs.fedoraproject.org/en-US/Fedora/7/pdf/Installation_Guide/Fedora-7-Installation_Guide-en-US.pdf, which is the first/oldest Fedora version contained in the old Docs. So with a little bit of luck, no PDF work has to be done.

We agreed today that we like the approach of dropping the old docs from the server and pointing people to the old fedora-docs-web repo when they need historical content. Before we do this, we’ll wait a week for comment.

I think this is a good idea to help get the documentation under control.

Should or can we talk with Infra about putting in 301 redirects to /latest/? I think part of the ask from Matt was that some old docs are doing better in search than current docs; the 301s should help move the new docs up the search results.

That would only work for the installation guide and the administrator’s guide. For all the other old docs, we don’t have a successor or direct replacement, so we would have to redirect to the docs start page, I guess.
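
To make the rule concrete, here is a rough sketch of the mapping: the two guides with a successor go to it, everything else goes to the start page. All target paths are placeholders I made up; the real targets would have to be confirmed with Infra. The old path layout is taken from the URLs in this thread.

```python
import re

# Placeholder targets (assumptions, not confirmed paths).
SUCCESSORS = {
    "Installation_Guide": "/latest/install-guide/",
    "System_Administrators_Guide": "/latest/system-administrators-guide/",
}
DOCS_START = "/"  # placeholder for the docs start page

# Old layout seen in this thread: /en-US/Fedora/<release>/<format>/<guide>/...
OLD_PATH = re.compile(r"^/en-US/Fedora/\d+/[^/]+/([^/]+)/")


def redirect_target(old_path: str) -> str:
    """Return the path an old docs URL should be 301-redirected to."""
    match = OLD_PATH.match(old_path)
    if match:
        # Guides without a successor fall back to the start page.
        return SUCCESSORS.get(match.group(1), DOCS_START)
    return DOCS_START


print(redirect_target("/en-US/Fedora/24/pdf/Installation_Guide/Fedora-24-Installation_Guide-en-US.pdf"))
print(redirect_target("/en-US/Fedora/26/html/Release_Notes/index.html"))
```

In practice this would more likely be a handful of rewrite rules in the web server configuration; the function only illustrates the decision.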

I believe the PDFs are all safe and publicly accessible in git on pagure, e.g. your https://docs.fedoraproject.org/en-US/Fedora/24/pdf/Installation_Guide/Fedora-24-Installation_Guide-en-US.pdf is at Tree - fedora-docs-web - Pagure.io (for some reason access is slow but it did complete after 2 minutes).

Kill the pages! Death before dishonor! :wink:

I’m aware of that :wink: The PDFs are available, but there were other considerations, too. However, I agree that we should get rid of the old Docs by implementing the “PDF plan” soon, although it is no longer as critical as it used to be: as you said, thanks to robots.txt :wink: I am currently not involved in this topic, but I’ll check in the next meeting what the state of this issue is. Or maybe someone else can add some information here.

I suppose it’s me who is responsible for the delay, or more precisely, my time budget.

Never kill something you may want to use later. It’s on my task list to check the documents for parts that are worth updating and could complete our current documentation. After a short check, I found several. But currently I’m busy with a contributors guide and, once that is finished, with an updated docs homepage and the installation and administrator’s guides. After that, I hope I can tackle it.

For those with some Latin background:
ceterum censeo carthagi… no wait, help with docs work is always welcome.
