Code block highlighting is often wrong/unnecessary

The common uses of code blocks here don’t work well with Discourse’s language detection and syntax highlighting.

  • The language detection performs poorly on short samples like shell one-liners.
  • For output-only blocks, logs, or other plaintext, highlighting is always wrong. It might look nice by coincidence, but it’s essentially random highlighting of numbers or symbols.
  • For commands + output, even if it is detected or specified as bash, the highlighting is meant for bash scripts, so the output is wrongly highlighted.

I suggest disabling this feature. Users can specify the language to get correct highlighting when needed (rarely, since this is not a programming forum[1]).

Disabling it might also save a little page-load time, since the detection and highlighting are done in client-side JS (?).


Examples

None of the examples have a language specified after the triple backticks.

  • This is detected as CSS:

    dnf list --installed kernel
    
  • Lua:

    $ mount --all
    
  • Ruby:

    $ tar xvf foo.tar.gz
    
  • Some common shell built-ins are detected as bash:

    echo foo
    
  • But other shell built-ins aren’t (CSS again):

    time foo
    
  • Journal output, detected as Apache:

    Apr 24 16:07:40 asuja kernel: last_pfn = 0x26f000 max_arch_pfn = 0x400000000
    Apr 24 16:07:40 asuja kernel: x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
    Apr 24 16:07:40 asuja kernel: last_pfn = 0x8f800 max_arch_pfn = 0x400000000
    
  • dnf history, detected as SQL:

    $ dnf history info last
    Transaction ID : 757
    Begin time     : Mon 24 Apr 2023 14:31:26 +08
    Begin rpmdb    : da3a6d93b2dc2d1044bd471a85ed5d2554dc4f59bc01b64a79d8b040d12413a3
    End time       : Mon 24 Apr 2023 14:31:29 +08 (3 seconds)
    

  1. Even on language-specific forums, it doesn’t work well. For example, in this thread on Python Discourse, the first post is detected as CSS, and the second post quotes a section from the first post, but that quote is detected as C++.


Hmm. You make a convincing argument. The poor auto-detection is probably something to bring upstream (to the library used), but in the meantime I’ll disable it.


I don’t fault it for not working well on one-liners; it’s a guessing game at best. Highlighting mixed code + output is also probably out of scope for highlight.js.

Makes sense. An option for “don’t guess on one-liners, only guess if confidence is high for multi-line” might be nice.
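To make that idea concrete, here is a rough sketch of what such a policy could look like. It is not how Discourse or highlight.js actually decides anything: the function name `choose_language`, the `guess` and `relevance` inputs (stand-ins for whatever a detector reports), and both thresholds are assumptions picked for illustration.

```python
from typing import Optional


def choose_language(code: str, guess: str, relevance: int,
                    min_lines: int = 2, min_relevance: int = 10) -> Optional[str]:
    """Return the guessed language only when the guess looks trustworthy.

    `guess` and `relevance` stand in for a detector's output; the
    thresholds are illustrative assumptions, not tuned values.
    """
    # Count non-blank lines so trailing newlines don't inflate the size.
    line_count = sum(1 for line in code.splitlines() if line.strip())
    if line_count < min_lines:
        return None  # one-liner: don't guess at all
    if relevance < min_relevance:
        return None  # multi-line, but the detector isn't confident
    return guess


# A one-liner is left unhighlighted no matter what the detector says.
print(choose_language("dnf list --installed kernel", "css", 3))
# A multi-line block with a confident guess keeps its language.
print(choose_language("def f():\n    return 1\n", "python", 12))
```

Returning `None` here just means "render as plain text", which matches what most of the examples in the first post would want anyway.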

Just wondering if there was a change. New posts still have auto-detection on code blocks.

Test:

if auto-detect is on, this will be highlighted

Stack Exchange sites set the default syntax for untagged blocks based on the discussion tags, which is clever. (But, they have a lot of tags that imply likely source languages, whereas our tagging is often wildly incorrect due to, no surprise, poor automated guessing.)

But you can always explicitly disable highlighting, by fencing a block as ```text explicitly.

$ echo "See? No highlighting."

(You can also, apparently, redundantly overuse the word “explicitly” in a redundant manner.)
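If you draft posts outside the editor and want every untagged block fenced as text, something like the following could do it. This is only a sketch: `tag_untagged_fences` is a made-up helper, it handles only plain triple-backtick fences, and it is not how Discourse parses Markdown.

```python
def tag_untagged_fences(markdown: str, default_lang: str = "text") -> str:
    """Add `default_lang` to opening triple-backtick fences that have no language."""
    out = []
    in_block = False
    for line in markdown.split("\n"):
        stripped = line.strip()
        if stripped.startswith("```"):
            if not in_block:
                # Opening fence: tag it only if nothing follows the backticks.
                if stripped == "```":
                    line = line.replace("```", "```" + default_lang, 1)
                in_block = True
            elif stripped == "```":
                # A bare fence closes the current block.
                in_block = False
        out.append(line)
    return "\n".join(out)


post = "```\n$ mount --all\n```"
print(tag_untagged_fences(post))
```

Blocks that already name a language are left alone, so deliberate highlighting still works.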

The setting is off.

image

But…

if auto-detect is on, this will be highlighted

Ah! I found it!

In addition to the setting above, there is also default code lang, which was set to auto. I’ve changed it to text. ¯\_(ツ)_/¯