Why are Hadoop and Spark not in the official Fedora repositories? | Orphaning Procedure

Considering that a major use case of Fedora (and Linux in general) is cloud computing and big data systems, why is the de facto way of setting up Hadoop and Spark (key frameworks for big data processing) in 2021 still to download and unpack archives from upstream, rather than to install binary packages from the official Fedora repositories with a dnf install command? Unless I’m missing something, having such commonly used frameworks packaged for Fedora would bring tangible benefits to a large user base, such as (but not limited to):

  • Improved integration with the host system
  • Less manual setup and configuration required

P.S. I dug a little bit deeper and found:

  1. Changes/Hadoop - Fedora Project Wiki
  2. Installing Hadoop on Fedora

Especially with (2), I’m not sure how reliable it is, since it doesn’t appear to be officially endorsed by Fedora. Anyway, I did a fresh install of Fedora 35 Server to see for myself: dnf search hadoop did not turn up any results, while dnf search spark gave mostly unrelated results, the closest being some Spark client for Azure(?). If (2) is indeed true, why was the package removed in Fedora 32?
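For anyone who wants to repeat the check, here is a minimal sketch. It uses dnf list instead of dnf search because, as far as I know, dnf list exits with a non-zero status when nothing matches the name, which makes it easier to script. pkg_available is a made-up helper name for illustration, not a Fedora tool.

```shell
# Sketch: report whether a package name is known to the enabled repositories.
# pkg_available is a hypothetical helper, not part of dnf or Fedora.
pkg_available() {
    # Assumption: `dnf list NAME` returns non-zero when no package matches NAME.
    if dnf -q list "$1" >/dev/null 2>&1; then
        echo "$1: available"
    else
        echo "$1: not in the enabled repositories"
    fi
}
```

After sourcing this, pkg_available hadoop and pkg_available spark give a quick yes/no per name. Note that it checks the exact package name only, so it will not surface loosely related packages the way dnf search does.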

Packages mostly get removed when they are no longer actively maintained (see the Orphaning Procedure).

Check here:

https://src.fedoraproject.org/

spark / Created 4 years ago / Package is currently unmaintained
hadoop / Created 4 years ago / Package is currently unmaintained


Tree - rpms/hadoop - src.fedoraproject.org states:

hadoop fails to build from source: https://bugzilla.redhat.com/show_bug.cgi?id=1675096

You can get involved and contribute through the bigdata Special Interest Group (SIG) (SIGs/bigdata/packaging - Fedora Project Wiki); maybe you can get it building again. The package was included in Fedora up until F29, see hadoop | Package Info | koji

Another link relevant to your question; it seems Hadoop has a huge list of dependencies:
https://fedoraproject.org/wiki/Changes/Hadoop


Thank you! So it seems that it was simply a matter of no one stepping up to maintain it due to packaging issues.

Thanks for the detailed links, it does seem that they were abandoned due to packaging difficulties. Maybe I should consider contributing sometime :smile:


Yeah… Java is notoriously difficult to package in Fedora and really Linux distros in general. The language ecosystem has a bunch of norms which don’t really fit with the rpm (or deb) way of doing things. So that added a lot of challenge beyond just “work on the big data stuff we’re actually interested in”.
