Ignition in Azure stopped working on Fedora CoreOS

Hello, I had a pretty simple setup: produce an Ignition file and launch the Azure VM with Terraform. In the middle of the day, with no code changes (I know we all say that, but I had literally created several VMs five minutes before with no problem), it just stopped working. The result is that the VM never completes ‘creating’ and times out.

I’ve narrowed the culprit down to Ignition, because if I remove it the VM is created without a problem and I can log in.

I have reduced my Ignition file down to just this:

{
  "ignition": {
    "version": "3.4.0"
  },
  "storage": {
    "files": [
      {
        "path": "/etc/motd",
        "mode": 420,
        "contents": {
          "source": "data:,Welcome%20to%20Fedora%20CoreOS%21"
        }
      }
    ]
  },
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": [
          "ssh-ed25519 XXXXXXXX..... joe@ccsultramarine"
        ]
      }
    ]
  }
}

I also tried to create the VM without Terraform, using the az command, with the same result.

az vm create --name sa-uat-rproxy1z-vm --resource-group srvapp-uat-rg --image /subscriptions/x-xxxxx-xxxx-xxxx-xxxxx/resourceGroups/rg-shared/..../fedora-coreos-ccs-latest-azure.x86_64 --size Standard_D4s_v3 --admin-username core --ssh-key-values "XXX joe@ccsultramarine" --vnet-name XXX --subnet XXX --custom-data @config.ign --no-wait

Azure responds with zero useful info, and I can understand why, given that provisioning is driven by the Ignition file without an agent.

I don’t know where to go from here.

Thanks, Joe

We’d probably need to see some logs from the boot to determine what is going on.
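If boot diagnostics are enabled on the VM, you should be able to pull the serial console output with something like this (reusing the VM name and resource group from your az command; adjust as needed):

az vm boot-diagnostics enable --name sa-uat-rproxy1z-vm --resource-group srvapp-uat-rg
az vm boot-diagnostics get-boot-log --name sa-uat-rproxy1z-vm --resource-group srvapp-uat-rg

If Ignition is stuck retrying, the serial log should show it.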

We test every release pretty extensively on Azure, so I doubt it’s something inside CoreOS that’s causing the issue; it’s more likely Azure’s environment misbehaving (you even said it was working 5 minutes before).

Without seeing any logs, my guess would be that the metadata service is responding saying it has user data, but then somehow we’re not able to retrieve it, so Ignition keeps retrying forever (this is the expected behavior).

I would 100% agree it’s Azure. This wouldn’t be the first time, but I realize I don’t know what to do now. I’m trying to replicate it locally using virtualization to prove it, I guess.
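For the local repro, my rough plan is to boot the QEMU build of Fedora CoreOS with the same config.ign (the image filename below is a placeholder for whatever build I download):

qemu-system-x86_64 -machine accel=kvm -m 2048 -cpu host -nographic -snapshot \
  -drive if=virtio,file=fedora-coreos-qemu.x86_64.qcow2 \
  -fw_cfg name=opt/com.coreos/config,file=config.ign

If the same config boots fine there, that should confirm the config itself is OK and the problem is on the Azure side.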

Looks like Azure has a character limit on the Base64-encoded file it gets in ‘custom_data’. Looking to implement a fix for this; will report back results.
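For reference, a quick way to see how big the encoded payload actually is (if I’m reading the docs right, Azure’s limit for custom data is roughly 64 KB after Base64 encoding):

base64 -w0 config.ign | wc -c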

Looks like that worked for me initially. Rewriting my workflow.
