April 28, 2016

Farewell 172.17.42.1, you will be missed!

A week ago, continuous beeps from our alerts Slack channel woke me up early in the morning. Digging through the logs, I saw “System is rebooting.” Why? FYI, the machine runs CoreOS. Before long, we learned that CoreOS and the tools it ships with are updated automatically as new versions come out. They say, “We believe that automatically updating the operating system is one of the best tools to achieve this goal.” Reading more on CoreOS update strategies, I found that the reboot strategy can be defined in cloud-config.

Apr 05 21:57:32 update_engine[608]: [0405/215732:INFO:update_attempter.cc(283)] Processing Done.
Apr 05 21:57:32 update_engine[608]: [0405/215732:INFO:update_attempter.cc(309)] Update successfully applied, waiting to reboot.
Apr 05 21:57:32 update_engine[608]: [0405/215732:INFO:update_check_scheduler.cc(82)] Next update check in 44m35s
Apr 05 21:57:32 locksmithd[640]: LastCheckedTime=1459893385 Progress=0 CurrentOperation="UPDATE_STATUS_UPDATED_NEED_REBOOT" NewVersion=0.0.0.0 NewSize=186494743
Apr 05 21:57:32 systemd-logind[612]: System is rebooting.

You may opt to disable the automatic update daemon. A temporary way is to stop the update-engine service manually:

sudo systemctl stop update-engine

...but this service starts again automatically when the machine reboots. A permanent way is to change the configuration in the cloud-config file:

#cloud-config
coreos:
  units:
    - name: update-engine.service
      command: stop
    - name: locksmithd.service
      command: stop

Note: Locksmith is the reboot manager for the CoreOS update engine.

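With automatic updates disabled this way, you can still trigger an update manually by running update_engine_client -check_for_update (the client talks to the update-engine service, so start the service first if it is stopped).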
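
If you go the manual route, it helps to know whether an update has already been applied and is only waiting for a reboot. Here is a minimal sketch, assuming that update_engine_client -status reports the current operation using the same UPDATE_STATUS_UPDATED_NEED_REBOOT value that shows up in the locksmithd log line above. The function name needs_reboot is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads update-engine status text on stdin and succeeds
# (exit status 0) if it reports an applied update that is waiting for a reboot.
# Assumption: the status text contains the same
# UPDATE_STATUS_UPDATED_NEED_REBOOT value seen in the locksmithd log.
needs_reboot() {
    grep -q 'UPDATE_STATUS_UPDATED_NEED_REBOOT'
}
```

On the host this could be used as: update_engine_client -status 2>&1 | needs_reboot && echo "reboot pending". That way you pick the moment for the reboot yourself instead of being woken up by it.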

Of all these options, though, the recommended one is to keep automatic updates and define a maintenance window in the cloud-config file: a window of time during which a reboot can occur.

#cloud-config
coreos:
  locksmith:
    window-start: Thu 02:00
    window-length: 1h

Note: In this example, the window is every Thursday between 02:00 and 03:00.

Well, with the CoreOS side figured out, this led us to another problem. The host had rebooted into a later version of Docker. After some digging, we found that the default docker0 IP is no longer 172.17.42.1 for new installs (> v1.9). That was just a magic, undocumented IP which, FWIW, we had hardcoded in a few places.

After this IP changed in the Docker update, someone even sent a PR to docker/libnetwork (https://github.com/docker/libnetwork/pull/649/files), which was closed without merging. Obviously! No more magic, they say!

To fix this, you can take either of the following approaches:

  • Use the --bip flag to set the address, in CIDR notation, of the dynamically created bridge (docker0): sudo docker daemon --bip="172.17.42.1/16"

  • Look up the default gateway IP inside the container; whatever that IP is, it is the one you can use to talk to the host.
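
The second option can be sketched as a small shell helper that runs inside the container. This is a minimal sketch, assuming the container image ships the ip tool and awk. The function names parse_default_gateway and host_ip are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads routing-table text in the format printed by
# "ip route" on stdin and prints the gateway of the first default route.
parse_default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Inside a container on the default bridge network, the default gateway is
# normally the docker0 address on the host.
host_ip() {
    ip route | parse_default_gateway
}
```

Calling host_ip inside a container prints whatever address docker0 currently has, so nothing has to be hardcoded and nothing breaks the next time the default moves.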

In case you are confused by anything I wrote above, do comment. All hail hydra!

Copyleft Jaipradeesh