Why are Hadoop and Spark not in the official Fedora repositories? | Orphaning Procedure

donaldsebleung · November 13, 2021, 1:57pm

Considering that a major use case of Fedora (and Linux in general) is cloud computing and big data systems, why is the de facto way of setting up Hadoop and Spark (key frameworks for big data processing) in 2021 still to download and unpack archives from upstream, instead of simply installing the appropriate binary packages from the official Fedora repositories with a dnf install command? Unless I’m missing something, I imagine that having such commonly used frameworks prepackaged for Fedora would bring a number of tangible benefits to a vast user base, such as (but not limited to):

Improved integration with the host system
Less manual setup and configuration required

P.S. I dug a little bit deeper and found:

Especially with (2), I’m not sure how reliable it is, since it doesn’t appear to be officially endorsed by Fedora. Anyway, I did a fresh install of Fedora 35 Server to see for myself: dnf search hadoop did not turn up any results, while dnf search spark gave mostly unrelated results, the closest being some Spark client for Azure(?). If (2) is indeed true, why was the package removed in Fedora 32?

ilikelinux · November 13, 2021, 3:03pm

Packages mostly get dropped when they are no longer actively maintained (see the Orphaning Procedure).

Check here:

https://src.fedoraproject.org/

spark / Created 4 years ago / Package is currently unmaintained
hadoop / Created 4 years ago / Package is currently unmaintained

augenauf · November 13, 2021, 3:20pm

Tree - rpms/hadoop - src.fedoraproject.org states:

hadoop fails to build from source: https://bugzilla.redhat.com/show_bug.cgi?id=1675096

You can engage and contribute in the bigdata Special Interest Group (SIG) (SIGs/bigdata/packaging - Fedora Project Wiki); maybe you can get it to build again. The package was included in Fedora up until F29, see hadoop | Package Info | koji.

Other relevant links regarding your question (it seems that hadoop has a huge list of dependencies):
https://fedoraproject.org/wiki/Changes/Hadoop

donaldsebleung · November 14, 2021, 2:35am

Thank you! So it seems that it was simply a matter of no one stepping up to maintain it due to packaging issues.

donaldsebleung · November 14, 2021, 2:44am

Thanks for the detailed links. It does seem that they were abandoned due to packaging difficulties. Maybe I should consider contributing sometime.

mattdm · November 14, 2021, 3:33am

Yeah… Java is notoriously difficult to package in Fedora and really Linux distros in general. The language ecosystem has a bunch of norms which don’t really fit with the rpm (or deb) way of doing things. So that added a lot of challenge beyond just “work on the big data stuff we’re actually interested in”.
