What should we use in the URL slugs in Badges 2.0

Summarizing and continuing the discussion brought up today on IRC/Matrix.

Problem

Currently the badge name is used as a slug for the badge URL. The “Trust me, I know what I am doing” badge has the following URL: https://badges.fedoraproject.org/badge/trust-me%2C-i-know-what-i-am-doing (the slug is the final part after /badge/). The slug is derived from the badge name, which contains a character (,) that needs to be URL-encoded[1]. The result is a URL that looks odd and is less appealing to humans.

On the backend, when referencing the same badge, as I had to today, the name of the badge is retained with the special characters. So I had to run a command like: edit-badge --badge 'trust-me,-i-know-what-i-am-doing' --tags community,infrastructure.
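To make the mismatch concrete, here is how Python's standard `urllib.parse` turns the current slug into its URL form and back. This only illustrates percent-encoding in general. It is not necessarily the code the current Badges app uses.

```python
from urllib.parse import quote, unquote

# Slug as currently derived from the badge name; the comma is kept.
slug = "trust-me,-i-know-what-i-am-doing"

# quote() percent-encodes everything outside letters, digits, "_.-~" and "/",
# so the comma turns into %2C in the public URL.
encoded = quote(slug)
print(encoded)  # trust-me%2C-i-know-what-i-am-doing

# Decoding gives back the raw form that the admin command has to be fed.
print(unquote(encoded) == slug)  # True
```

So the same badge has two spellings, one for the browser and one for the backend.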

All that is making things needlessly complicated.

Proposed Solutions

Human readable slugs

One option is to make sure special characters, i.e. the set that needs to be URL-encoded, are dropped when composing the name and slug for the badge.

Staying with the example above and the current modus operandi of replacing spaces with dashes and converting to lower case, the resulting slug would be trust-me-i-know-what-i-am-doing. That slug should then also become the internal name of the badge.
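A minimal sketch of such a sanitizing step, assuming the rules described above (lower case, spaces to dashes, anything that would need URL encoding dropped). The `slugify` name is hypothetical, and folding accented letters to plain ASCII is an extra choice made for this sketch, not something the current app is known to do.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a badge name into a slug that needs no URL encoding (hypothetical helper)."""
    # Fold accented letters to plain ASCII where possible (é -> e); drop the rest.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lower-case and drop everything except letters, digits, whitespace and dashes.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", ascii_name.lower())
    # Collapse runs of whitespace/dashes into a single dash and trim the ends.
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


print(slugify("Trust me, I know what I am doing"))  # trust-me-i-know-what-i-am-doing
print(slugify("Rock The Web!"))                     # rock-the-web
```

Note that two different names can map to the same slug, so a uniqueness check would still be needed when a badge is created or renamed.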

What if

What if a badge needs to be renamed after it has been released and possibly already awarded?

While this does not happen very often, it should be possible without breaking anything or needing a sysadmin to intervene. One possible solution would be to redirect from the old URL to the new URL. Since the name is changed in the database, no action is required on the backend; the badge is simply referenced by its new name.
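A minimal in-memory sketch of how the app itself could emit such redirects, assuming it keeps a record of retired slugs next to the current one. `BadgeStore` and its methods are hypothetical; a real implementation would keep this history in a database table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BadgeStore:
    """Hypothetical stand-in for the badges table plus a slug-history table."""

    current: dict[str, int] = field(default_factory=dict)  # current slug -> badge id
    retired: dict[str, int] = field(default_factory=dict)  # old slug -> badge id
    slug_of: dict[int, str] = field(default_factory=dict)  # badge id -> current slug

    def add(self, badge_id: int, slug: str) -> None:
        self.current[slug] = badge_id
        self.slug_of[badge_id] = slug

    def rename(self, badge_id: int, new_slug: str) -> None:
        # Remember the old slug instead of discarding it, so old links keep working.
        old_slug = self.slug_of[badge_id]
        del self.current[old_slug]
        self.retired[old_slug] = badge_id
        self.add(badge_id, new_slug)

    def resolve(self, slug: str) -> tuple[int, str | None]:
        """Return (badge id, slug to redirect to, or None). Raises KeyError if unknown."""
        if slug in self.current:
            return self.current[slug], None
        badge_id = self.retired[slug]
        return badge_id, self.slug_of[badge_id]


store = BadgeStore()
store.add(1, "trust-me-i-know-what-i-am-doing")
store.rename(1, "trust-me-i-really-know")  # hypothetical new name
print(store.resolve("trust-me-i-know-what-i-am-doing"))  # (1, 'trust-me-i-really-know')
print(store.resolve("trust-me-i-really-know"))           # (1, None)
```

A request for a retired slug would then be answered with an HTTP 301 pointing at the current slug. Because every retired slug points at the badge id rather than at the next slug, repeated renames do not build up redirect chains.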

Use UUIDs

Another option is to use UUIDs[2] for the badges. As the name suggests, these are unique identifiers. Therefore they will not cause conflicts, and since renaming a badge retains its UUID, the URL will not change. The URL may look something like: https://badges.fedoraproject.org/badge?id=a2b3fdde7.
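For reference, Python's standard `uuid` module generates such identifiers. The sketch below reuses the query-parameter URL shape from the example above; whether the app would store the full value or a shortened one is left open here.

```python
import uuid

# Version 4 UUIDs are random: 122 of the 128 bits come from a random source.
badge_uuid = uuid.uuid4()
print(badge_uuid)      # canonical form: 36 characters, hex digits plus dashes
print(badge_uuid.hex)  # the same value as 32 hex digits

# The identifier is stored once and never derived from the name,
# so renaming the badge leaves this URL untouched.
url = f"https://badges.fedoraproject.org/badge?id={badge_uuid.hex}"
print(url)
```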

Use numeric IDs

This option is similar to UUIDs. It would use the badge ID as recorded in the database. Since the ID is an auto-increment field in the database, the IDs are unique and IDs from deleted badges are not re-used. Similarly, renaming a badge will not change its ID, nor will its URL change. It may look something like: https://badges.fedoraproject.org/badge?id=424242.
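The “IDs are not re-used” property depends on how the column is set up. As a self-contained illustration, SQLite only guarantees it with the `AUTOINCREMENT` keyword; without it, the id of a deleted last row can be handed out again. The Badges 2.0 server talks to PostgreSQL, whose sequences do not hand out the same value twice either; SQLite is used here only because it ships with Python. Table and badge names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With AUTOINCREMENT, SQLite never reuses an id, even after the row
# holding the largest id has been deleted.
conn.execute(
    "CREATE TABLE badges (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)
for name in ("Badge A", "Badge B", "Badge C"):
    conn.execute("INSERT INTO badges (name) VALUES (?)", (name,))

conn.execute("DELETE FROM badges WHERE id = 3")
cur = conn.execute("INSERT INTO badges (name) VALUES (?)", ("Badge D",))
print(cur.lastrowid)  # 4, not 3

# Renaming only touches the name column; the id, and thus the URL, stays the same.
conn.execute("UPDATE badges SET name = ? WHERE id = ?", ("Badge A, renamed", 1))
print(conn.execute("SELECT id, name FROM badges WHERE id = 1").fetchone())  # (1, 'Badge A, renamed')
```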

Bottom Line

We need to find a solution that is user friendly as well as manageable for developers and admins, taking into account the desire to be able to modify badge details as we see fit, without breaking things or requiring manual intervention.


  1. HTML URL Encoding Reference (for all the gory details: https://www.ietf.org/rfc/rfc3986.txt)

  2. Universally unique identifier

Thanks @gui1ty for this thread. I think this discussion needs more visibility than we could have given it in our Matrix channel or on our regular meeting calls, and this thread provides just that. It also needs to be decided before the next community roundtable so that we can make good progress on the API service side of things.

Here’s my 1.6539665 INR[1] about the topic.

  • I do not like the use of slugs.

    • They fit well with project or repository names on an SCM service, but in our case, many badge names contain punctuation, which would force us into extra complexity just to sanitize them.
    • For what it’s worth, several badges already include punctuation in their slugs, like the “Rock The Web!” badge[2], which makes the URL difficult for folks to type in manually if they do not know about URL encoding[3].
    • Some badge names consist of many words, several of them very short, like the one you mentioned, the “Trust me, I know what I am doing” badge[4], which makes it rather unwieldy to type the name with a hyphen after every word.
    • As you mentioned, we must also consider situations where a badge needs to be renamed: the older slug would then be invalidated[5], and a sysadmin would have to set up redirects to the new one.
    • Redirects are possible, but I would much rather have a monolithic development-bound solution[6]; otherwise people would “prefer” not to rename badges[7] just because they would probably have to open a Fedora Infrastructure[8] ticket to request redirects.
    • Requiring sysadmin intervention to set up a redirect from the older slug to the newer one whenever a badge is renamed adds complexity that hurts the reproducibility of the project[9].
  • I like the use of UUIDs.

    • They are completely unaffected by punctuation in the actual name of the badge, so standard sanitizing methods should be more than enough, and their length can stay constant across the whole range of badges[10].
    • Granted, these UUIDs would be computer-generated oooga-booga that is harder to remember than the more human-readable slugs[11], but I do plan on making copious use of QR codes.
    • A standard 32-character UUID[12] might be a bit much, and we would not even have enough badges[13] to warrant that much “uniqueness”, so we could use, say, an 8-character (truncated) UUID[14], which should keep the URL short enough.
    • As the UUIDs are independent of the badge name, badges can be renamed whenever needed without a sysadmin having to set up redirects, as the URL stays the same.
    • Using UUIDs lets me maintain a monolithic development-bound solution[15], so people would be more open to (and encouraged to) evolve existing one-time-only badges into series, since doing so becomes easy.
    • The reproducibility of the project stays intact when we avoid unnecessary workflow complexity for a task as simple as renaming a badge, and people would be more open to using this system elsewhere[16], as they see fit.
  • I have mixed feelings about the use of numeric IDs.

    • The URL length consistency goes down the drain. Imagine having a badge with the URL, https://badges.fedoraproject.org/badges/1 and another having a URL https://badges.fedoraproject.org/badges/428 without the presence of a badge with the URL https://badges.fedoraproject.org/badges/2.
    • With the newer implementation, the database is accessed asynchronously by default[17][18], so it is not “locked” whenever an entry is made; hence it is quite possible for autoincrement IDs to be skipped because of errors.
    • These skipped numeric IDs break the continuity of the URLs, but ensuring continuity would require accessing the database synchronously[19][20]. So there is really no “perfect” way of implementing numeric IDs.
    • We could, of course, forgo contiguous numeric IDs in favour of generating[21] a random 6- or 8-digit integer to use in the URL for each badge. But does that approach not sound very similar to the one with shortened UUIDs[22]?
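To make the comparison in the last point concrete: both approaches boil down to drawing a random token and checking that it is not taken yet. A minimal sketch using the standard `uuid` and `secrets` modules; the helper names are hypothetical. The first 8 hex digits of a version 4 UUID are fully random (the version and variant bits sit further along), so the truncated value has 16**8 possible values, against 9 × 10**7 for an 8-digit number.

```python
from __future__ import annotations

import secrets
import uuid
from typing import Callable


def short_uuid(length: int = 8) -> str:
    """First `length` hex digits of a random UUID (16**length possible values)."""
    return uuid.uuid4().hex[:length]


def random_digits(length: int = 8) -> str:
    """Random number with exactly `length` digits (9 * 10**(length - 1) possible values)."""
    lowest = 10 ** (length - 1)
    return str(lowest + secrets.randbelow(9 * lowest))


def new_badge_id(taken: set[str], generate: Callable[[], str] = short_uuid) -> str:
    """Draw tokens until one is unused; truncation means uniqueness must be checked."""
    while True:
        candidate = generate()
        if candidate not in taken:
            taken.add(candidate)
            return candidate


taken: set[str] = set()
print(new_badge_id(taken))                 # e.g. bb342781
print(new_badge_id(taken, random_digits))  # e.g. 48213907
```

Since both need the same uniqueness check, the practical difference is mostly the size of the value space, and that the UUID route leans on an established library rather than hand-rolled random numbers.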

  1. 0.02 USD as of 7 July 2023

  2. https://badges.fedoraproject.org/badge/rock-the-web!

  3. and it is safe to assume that a majority of people do not

  4. the slug of which would translate to trust-me%2C-i-know-what-i-am-doing

  5. when a newer slug would be generated from the newer name

  6. where redirects do not need to be set up at all

  7. even when it is needed, e.g. because a one-time badge has now become part of a badge series

  8. https://pagure.io/fedora-infrastructure/issues

  9. in, say, some other free and open source community, or for folks who are self-hosting this in their homelab infrastructure for either fun or development

  10. so URL encoding is not required, and with a constant-size URL pattern, these URLs are a lot more wieldy than the slugs

  11. provided people refrain from adding punctuation to them, since punctuation can make the URLs rather complicated to use

  12. for example, 01441163-ee6b-4726-bce8-5918990e4028

  13. would we, @riecatnor? :stuck_out_tongue_winking_eye:

  14. for example, bb342781

  15. where redirects do not need to be set up at all

  16. in, say, some other free and open source community, or for folks who are self-hosting this in their homelab infrastructure for either fun or development

  17. using the asyncpg dialect for postgres

  18. more information can be found here: fedora / Fedora Websites and Apps / Fedora Badges / Server · GitLab

  19. which would slow down the service by quite a lot

  20. would we do that, @thunderbirdtr? :stuck_out_tongue_winking_eye:

  21. in the server

  22. I would much prefer an established UUID library over “reinventing the wheel” by generating random numbers myself

I agree that using readable slugs, derived from the badge name, needs some clever solution:

  1. To translate the name to a slug that does not require URL encoding (sanitization)
  2. To make redirects work without admin intervention

Sanitization

If you look at the URL of this thread (https://discussion.fedoraproject.org/t/what-should-we-use-in-the-url-slugs-in-badges-2-0/85289), it does essentially the same thing. Yes, there is a numeric identifier attached to the URL. But the URL itself conveys information about the topic being referenced. I find that incredibly useful when looking at the link without context. It tells me what the subject of the discussion is without having to visit the URL.

I think that would be very helpful for badge URLs as well. The many dashes in the URL of this thread don’t bother me. It’s easy to translate them back to the subject.

Redirects

Redirects must not require admin intervention. Let’s take that as a prerequisite.

Circling back to Discourse: when a topic gets renamed or relocated, I’m sure there’s something clever going on so that all the old URLs remain valid and are redirected to the new URL. It does not require any admin intervention. I’m not sure how it’s done exactly and what software stack is involved, but somehow the app emits redirects.

Is that not possible to implement in the framework we are using for badges?
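For what it’s worth, a common way to get this without any admin involvement is hinted at by this thread’s own URL: the trailing number identifies the item and the slug is decorative. The app looks the item up by the number and, if the slug in the request is not the current one, answers with a permanent redirect to the canonical URL. That Discourse works exactly this way is an assumption, not verified against its code. The sketch is framework-agnostic; the `/badge/<slug>/<id>` shape, `handle_badge_request` and `Response` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    location: str | None = None


def slugify(name: str) -> str:
    # Lower-case and turn every run of non-alphanumerics into a single dash.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


# Hypothetical data: badge id -> current badge name.
BADGES = {42: "Trust me, I know what I am doing"}


def handle_badge_request(path: str) -> Response:
    """Serve /badge/<slug>/<id>; only the numeric id is used for the lookup."""
    match = re.fullmatch(r"/badge/([^/]*)/(\d+)", path)
    if match is None or int(match.group(2)) not in BADGES:
        return Response(404)
    badge_id = int(match.group(2))
    canonical = f"/badge/{slugify(BADGES[badge_id])}/{badge_id}"
    if path != canonical:
        # Outdated or mistyped slug: permanently redirect to the current URL.
        return Response(301, canonical)
    return Response(200)


print(handle_badge_request("/badge/trust-me-i-know-what-i-am-doing/42"))  # Response(status=200, location=None)
BADGES[42] = "Trust me, I really know what I am doing"  # rename in the database
print(handle_badge_request("/badge/trust-me-i-know-what-i-am-doing/42"))
# Response(status=301, location='/badge/trust-me-i-really-know-what-i-am-doing/42')
```

The number never changes on rename, so every old URL keeps resolving. In effect this combines the numeric ID option with a readable slug for humans.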

Regarding the other options:

UUIDs

If we do decide to use UUIDs, I agree they should be short. I think 8 hex characters (4,294,967,296 possible values) or even 6 (16,777,216 possible values) should serve us well until Badges 3.0. :wink:
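Short identifiers give up the practically guaranteed uniqueness of full UUIDs. The birthday approximation p ≈ 1 - exp(-n(n-1) / (2M)) gives a feel for the chance that any two of n random identifiers collide in a space of M values. The badge count of 1,000 below is a hypothetical round number, not the actual number of Fedora badges. With it, 6 hex characters give roughly a 3% chance of some collision and 8 give about 0.01%. Either way, checking for an existing value when a badge is created, and drawing again if needed, makes a collision harmless.

```python
import math


def collision_probability(n: int, hex_chars: int) -> float:
    """Birthday-bound estimate that at least two of n random ids of `hex_chars` hex digits collide."""
    space = 16 ** hex_chars
    return 1 - math.exp(-n * (n - 1) / (2 * space))


N_BADGES = 1_000  # hypothetical badge count, for illustration only
for chars in (6, 8):
    print(f"{chars} hex chars: {16 ** chars:,} values, "
          f"p(collision) ~ {collision_probability(N_BADGES, chars):.4%}")
```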

I can also understand most of the points you make from a developer’s point of view. My motivation, and the purpose of this discussion, is to balance it against the user perspective.

I don’t think the current particularities are keeping people from suggesting badge series or wanting to extend single badges into a series. After all, it’s the friendly badge admins who have to implement it.
Not being able to implement a badge series due to a lack of rule definitions is more common, I believe. That could be either the source not emitting usable messages on the bus or not emitting messages at all (e.g. https://pagure.io/fedora-badges/issue/21), or the messages or rule definition not being fine-grained enough to extract the information required to award the badge to the person who should actually receive it (e.g. PR#806: Retire Long Life to Pagure badges - fedora-badges - Pagure.io).

I’m not sure I follow you on this one. I think it starts from the assumption that redirects require admin intervention. When that is handled internally, Badges 2.0 will be perfectly reproducible.

Numeric IDs

I don’t think there is any requirement for the IDs to be consistently incremental. All that is required is uniqueness. So, as long as the IDs match the ID in the database, we are fine, however many gaps there may be in between.

There may even be an advantage in having a direct 1:1 relation to IDs in the database. UUIDs do not need to be generated. UUIDs do not have to be looked up to derive the IDs as used in the database. All that should reduce load on the server and the database backend. It might not matter much given modern day hardware, but a step not taken is a step that cannot fail or cause problems.
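To illustrate the extra step: with UUIDs the identifier has to live in its own generated and indexed column next to the primary key, so a request resolves through that index first, while a numeric ID in the URL is the primary key itself. A self-contained SQLite sketch with a placeholder schema; the Badges 2.0 server uses PostgreSQL, where the shape is the same (a unique constraint also creates an index there).

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Numeric-ID scheme: the primary key itself appears in the URL.
# UUID scheme: an extra column that has to be generated on insert;
# UNIQUE makes SQLite build an index on it so lookups stay fast.
conn.execute(
    "CREATE TABLE badges ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " uuid TEXT NOT NULL UNIQUE,"
    " name TEXT NOT NULL)"
)
short = uuid.uuid4().hex[:8]  # the generation step UUIDs need and numeric IDs don't
cur = conn.execute(
    "INSERT INTO badges (uuid, name) VALUES (?, ?)",
    (short, "Trust me, I know what I am doing"),
)
badge_id = cur.lastrowid

# /badge?id=<number>: one primary-key lookup.
by_id = conn.execute("SELECT name FROM badges WHERE id = ?", (badge_id,)).fetchone()
# /badge?id=<uuid>: a lookup through the uuid index first, which yields the row id.
by_uuid = conn.execute("SELECT id, name FROM badges WHERE uuid = ?", (short,)).fetchone()
print(by_id, by_uuid)
```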


And the prize for the most footnotes relative to words written goes to (drumroll) @t0xic0der! :tada: :grin:

Following up on the discussion that @gui1ty, @thunderbirdtr, @lenkaseg, @sumantrom and I had today, we decided to go ahead with UUIDs to identify the badges uniquely and to use them in the badges’ URLs. As much as we wanted to implement redirects for the badges’ URLs to account for badges being renamed, we decided that it is a “nice-to-have” feature that, given our shortage of contributors, we are better off not dedicating resources to at the moment.

Folks who are tagged: please feel free to correct me or add anything I may have missed from this morning/afternoon’s discussion.

My success criterion here is that it should be possible to edit a badge’s title without either the old URL breaking or the new URL being wrong.
