Boot into Windows first or else Fedora does not work!

This is a very strange problem. Fedora 32 shares my laptop with Windows (dual boot). If Fedora is booted first after powering on the machine, the login screen loads, but after I enter the password the mouse cursor disappears and all that remains is a blank screen.

Here is where it gets interesting. If Windows is booted first after powering on, and I then restart from Windows and boot into Fedora, the desktop loads just fine! After I enter the password, the mouse cursor disappears and then reappears. Always! And I carry on with my actual work.

What could be the reason? Perhaps Windows starts some service?

By the way, this is GNOME on Xorg. Wayland has never really worked: the screen freezes every time Wayland is chosen, so Xorg is the only working option for now. The same behaviour was seen with other distros running GNOME on Xorg.

The laptop is a HP Pavilion with:
Memory: 8 Gigs
Processor: Intel® Core™ i5-6200U CPU @ 2.30GHz
Graphics: Mesa Intel® HD Graphics 520 (SKL GT2)

When you power on your laptop and reach the boot menu, press “e” to edit the selected entry and remove “rhgb” and “quiet” from the boot parameters. Please report what error messages you see when you boot straight into Linux.

Is there a way to log these messages to a text file? The text was scrolling quite fast and only these lines caught my eye.

[35.983251] pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
[35.983251] pcieport 0000:00:1c.5: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[35.983251] pcieport 0000:00:1c.5: AER: device [8086:9d15] error status/mask=00000001/00000000
[35.983251] pcieport 0000:00:1c.5: AER: [0] RxErr

Similar errors were seen in both boot sequences (Fedora first; Windows first, then Fedora).

During the restart (between the first and second boot attempts) the above errors filled the screen, repeating and scrolling uncontrollably as though it were stuck in some kind of loop.

The error below was also seen, but it might be unrelated to the actual problem:

[Failed] Failed to start Network manager wait online.

In the past I have seen some devices (e.g. sound cards) with buggy firmware that relied on the Windows driver for their functionality. When Windows shut down, the device entered a state from which it was impossible to recover using another OS. Some of these devices would remain operable when the computer was rebooted (from Windows), so Linux could use them at the next boot. I’m not sure yours is the same kind of problem, though.

Just to be on the safe side, have you installed the latest BIOS/firmware updates for your laptop?

In any case, this appears to be a known problem for some systems from that generation of hardware:
https://bugzilla.redhat.com/show_bug.cgi?id=1293424
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1521173
https://bbs.archlinux.org/viewtopic.php?id=242182
https://h30434.www3.hp.com/t5/Notebook-Software-and-How-To-Questions/Error-Spam-AER-id-00e5-PCIe-Bus-Error-severity-Corrected/td-p/5933687

In many cases, the boot parameters pci=noaer and pci=nomsi (which I think you can combine into pci=noaer,nomsi) seem to solve the problem. You could try booting with one of them, then the other, then both and see what works for you.
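If editing the entry by hand at every boot gets tedious, something along these lines could make a parameter stick and let you confirm it is active after the reboot. This is only a rough sketch: the helper names `add_kernel_arg`, `remove_kernel_arg` and `has_kernel_arg` are made up for illustration, and it assumes `grubby` is available (it normally is on Fedora; check `man grubby` on your release).

```shell
# Hypothetical helpers; the names are illustrative only.

# Append one parameter (e.g. pci=noaer) to every installed kernel entry.
add_kernel_arg() {
    sudo grubby --update-kernel=ALL --args="$1"
}

# Undo the change if the parameter does not help.
remove_kernel_arg() {
    sudo grubby --update-kernel=ALL --remove-args="$1"
}

# Succeeds if the parameter is on the kernel command line.
# The optional second argument is the file to read (default: /proc/cmdline).
has_kernel_arg() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, run `add_kernel_arg pci=noaer`, reboot, then `has_kernel_arg pci=noaer && echo active`. Note that the check looks for the exact token, so the combined form pci=noaer,nomsi would have to be passed as written.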

I would start by making sure there is no hardware problem though, e.g. a displaced card, dust or short on some contacts, etc… Is this a fresh installation or did the problem suddenly appear? That could also be an indicator.

You can use journalctl with the -b flag to see the logs from a specific boot, e.g. journalctl -b -1 displays the logs from the previous boot, journalctl -b -2 the logs from the one before that etc… Do that as root or with sudo, so that you get the full system logs.
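To get those messages into a text file, the journalctl output can simply be redirected. Below is a minimal sketch; the function names `save_boot_log` and `count_aer_errors` and the default file name are my own inventions, and the match string assumes the AER lines look like the ones quoted earlier in this thread.

```shell
# Hypothetical helpers; names and default file name are illustrative only.

# Save the kernel messages of one boot to a text file.
# $1 = boot offset (default -1, the previous boot), $2 = output file.
save_boot_log() {
    sudo journalctl -k -b "${1:--1}" --no-pager > "${2:-boot-kernel.log}"
}

# Count "Corrected error received" reports per PCI address.
# Reads a saved log on stdin and prints "<count> <pci-address>" lines.
count_aer_errors() {
    grep 'AER: Corrected error received:' \
        | awk '{ print $NF }' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}
```

Usage would be something like `save_boot_log -1 previous.log` followed by `count_aer_errors < previous.log`, which should show at a glance how badly the log is being flooded.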

By the way, if your issue is the same as that mentioned in the threads above, it’s very likely that your logs are taking up a lot of space. The following commands might come in handy:
journalctl --disk-usage
journalctl --vacuum-size=1.5G (or the amount you want)
and if you’ve been powering off the machine manually
journalctl --verify

@alexpl Thank you for exploring this and setting the direction! This is a fresh installation and the issue was seen from day 1. Will go through your suggestions, try updating the BIOS firmware after the work week and circle back to you.

I am basing my Google search on the device ID [8086:9d15] from the log, because I think it tells us which device causes the problem.

https://pci-ids.ucw.cz/read/PC/8086/9d15 suggests it is (part of) the PCI Express port itself.

In https://askubuntu.com/questions/863150/pcie-bus-error-severity-corrected-type-physical-layer-id-00e5receiver-id they suggest:

Try using the pcie_aspm=off boot parameter to see if this stops the messages. Note that this will increase the power consumption of your machine as it disables the power savings.

But https://h30434.www3.hp.com/t5/Notebook-Software-and-How-To-Questions/Error-Spam-AER-id-00e5-PCIe-Bus-Error-severity-Corrected/td-p/5933687 suggests that pci=nomsi is more of a solution, whereas pcie_aspm=off was more of a suggestion.

@alexpl @pauld
These were the findings after using the suggested commands:

####lspci -n####
00:1c.5 0604: 8086:9d15 (rev f1)

####lspci####
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)

####lspci -vt####
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers
           +-02.0  Intel Corporation Skylake GT2 [HD Graphics 520]
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
           +-17.0  Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode]
           +-1c.0-[01]----00.0  Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader
           +-1c.5-[02]----00.0  Realtek Semiconductor Co., Ltd. RTL8723BE PCIe Wireless Network Adapter
           +-1d.0-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller
           +-1f.0  Intel Corporation Sunrise Point-LP LPC Controller
           +-1f.2  Intel Corporation Sunrise Point-LP PMC
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
           \-1f.4  Intel Corporation Sunrise Point-LP SMBus

####dmesg (error-snippet)####
[ 23.335022] pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
[ 23.335027] pcieport 0000:00:1c.5: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 23.335029] rtl8723be: Using firmware rtlwifi/rtl8723befw_36.bin
[ 23.335030] pcieport 0000:00:1c.5: AER: device [8086:9d15] error status/mask=00000001/00000000
[ 23.335031] pcieport 0000:00:1c.5: AER: [ 0] RxErr
[ 23.335207] pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
[ 23.335212] pcieport 0000:00:1c.5: AER: can’t find device of ID00e5

In short, the errors are associated with the wireless network adapter (the RTL8723BE sits behind root port 00:1c.5). Not surprising, given that it could be made to work only once in a blue moon. A few others in the threads report this very issue. As one poster noted, “suppressing the warning on start up using grub defaults parameters (pci=nomsi and pci=noaer) may endanger the system if a serious error/problem arises in the future”. Hence, I blacklisted its driver by adding ‘blacklist rtl8723be’ to ‘blacklist.conf’. The device is now disabled and the error is no longer seen.

journalctl --disk-usage reported 4 GB. Does ‘journalctl --vacuum-size=1.5G’ enforce the size limit permanently, or is a regular clean-up needed?

The BIOS was already on the latest version. Having to boot into Windows first is something I can continue to live with. Maybe it is the same situation you were speculating about…
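For anyone wanting to repeat the blacklisting step, it could look roughly like this. The sketch assumes the usual /etc/modprobe.d location and a per-module file name of my own choosing (the post above only says ‘blacklist.conf’); `blacklist_module` and `module_loaded` are made-up helper names.

```shell
# Hypothetical helpers; the file name pattern is an assumption.

# Write a blacklist entry for one kernel module, e.g. rtl8723be.
blacklist_module() {
    echo "blacklist $1" | sudo tee "/etc/modprobe.d/blacklist-$1.conf" > /dev/null
}

# Succeeds if the module is currently loaded.
# The optional second argument is the file to read (default: /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}
```

After `blacklist_module rtl8723be`, either reboot or unload the driver with `sudo modprobe -r rtl8723be`; `module_loaded rtl8723be || echo "not loaded"` then confirms the result.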

Your thoughts? Comments?

Thanks again!

Well, if it’s not a silly thing like a screw not sitting right due to a manufacturing defect, I suppose you could try swapping the adapter for a different one and see if it fares better.

journalctl --vacuum-size is a one-off cleanup; it does not set a lasting limit. I think the default cap is what you had, 4 GB. See man journalctl for other possible operations on the journal.

If you need to change the limits in a permanent way, take a look at this.
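One way to do that, assuming the `SystemMaxUse=` setting described in `man journald.conf` is what is wanted, is a small drop-in file. The sketch below is illustrative only: `write_journal_limit` and the file name size-limit.conf are names I picked, and the 1G value is just an example.

```shell
# Hypothetical helper; function and file names are illustrative only.

# Write a journald drop-in that caps the persistent journal size.
# $1 = size (e.g. 1G), $2 = drop-in directory (default: the system one).
# Needs root when writing to the default directory.
write_journal_limit() {
    dir="${2:-/etc/systemd/journald.conf.d}"
    mkdir -p "$dir" &&
    printf '[Journal]\nSystemMaxUse=%s\n' "$1" > "$dir/size-limit.conf"
}
```

Run `write_journal_limit 1G` as root and then `systemctl restart systemd-journald` so the new limit is picked up. A drop-in is used here rather than editing /etc/systemd/journald.conf directly, so the change is easy to find and remove later.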

@alexpl

Will try swapping the adapter at a convenient time.

Just trying to understand: at the login screen, after entering the password and pressing the Enter key, what really happens, as far as the display server is concerned, at the moment when the cursor disappears and reappears? What process stops and restarts?

As you mentioned earlier, the dmesg logs may hold the clue. Maybe it is not printing the error. Will have to compare the logs between the boots when the cursor reappears and when the cursor fails to reappear.
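A rough way to do that comparison, sketched under the assumption that the journal still holds both boots (`journalctl --list-boots` shows the offsets). The helper names `boot_errors` and `only_in_first` are made up; `-p err` limits the output to error-priority messages and `-o cat` drops the timestamps so that identical messages from different boots compare equal.

```shell
# Hypothetical helpers; names are illustrative only.

# Print error-priority messages from one boot, without timestamps.
# $1 = boot offset as shown by `journalctl --list-boots` (e.g. -1).
boot_errors() {
    sudo journalctl -b "$1" -p err -o cat --no-pager
}

# Print the lines of file $1 that do not appear anywhere in file $2.
only_in_first() {
    grep -Fxv -f "$2" "$1"
}
```

For instance, `boot_errors -1 > bad.txt` and `boot_errors -2 > good.txt` (with whichever offsets match the failing and working boots), then `only_in_first bad.txt good.txt` lists errors seen only on the boot where the cursor never came back. Messages that embed changing values such as PIDs or addresses will show up as differences even when they are harmless.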

Will have a look at them again.
