The ostree compose uses the same packages that are pushed to Fedora stable, so any breakage in Silverblue (SB) will likely show up in traditional Fedora, too.
The toolbox package is relatively new, so it is not unusual for problems to be uncovered.
As for CI/testing, it really depends on each package. For example, some standard tests are run against toolbox (https://bodhi.fedoraproject.org/updates/FEDORA-2019-52e62c5725), but none of them really exercise the package's functionality. I'm sure the maintainer would welcome contributed tests for the package.
Silverblue as a whole gets tested by folks simply using it every day. Some people are brave enough to run Rawhide and find problems early. Additionally, a Silverblue Test Day is scheduled before each major release, which gives users a chance to help test the new major version.