Debugging ethernet before NFS boot

I’m trying to boot Linux from U-boot on an embedded ARM board using a filesystem on a remote machine served via NFS. It appears that the ethernet connection is not coming up correctly, which results in a failure to mount the NFS share. However, I know that the ethernet hardware works, because U-boot loads the kernel via TFTP.

How can I debug this? I can try tweaking the kernel, but that means recompiling the kernel for every iteration, which is slow. Is there a way that I can make the kernel run without being able to mount an external filesystem?

Solution 1

You can compile an initramfs image into the kernel (General Setup -> Initial RAM filesystem and RAM disk (initramfs/initrd) support -> Initramfs source file(s)). That option takes a file in a special list format, like this one (my init for x86):

dir /bin                                    0755 0 0
file    /bin/busybox                        /bin/busybox    0755 0 0
file    /bin/lvm                        /sbin/lvm.static    0755 0 0
dir /dev                                    0755 0 0
dir /dev/fb                                 0755 0 0
dir /dev/misc                               0755 0 0
dir /dev/vc                                 0755 0 0
nod /dev/console                                0600 0 0    c  5   1
nod /dev/null                               0600 0 0    c  1   3
nod /dev/snapshot                               0600 0 0    c 10 231
nod /dev/tty1                               0600 0 0    c  4   0
dir /etc                                    0755 0 0
dir /etc/splash                             0755 0 0
dir /etc/splash/natural_gentoo                      0755 0 0
dir /etc/splash/natural_gentoo/images                   0755 0 0
file    /etc/splash/natural_gentoo/images/silent-1680x1050.jpg  /etc/splash/natural_gentoo/images/silent-1680x1050.jpg  0644 0 0
file    /etc/splash/natural_gentoo/images/verbose-1680x1050.jpg /etc/splash/natural_gentoo/images/verbose-1680x1050.jpg 0644 0 0
file    /etc/splash/natural_gentoo/1680x1050.cfg        /etc/splash/natural_gentoo/1680x1050.cfg        0644 0 0
slink   /etc/splash/tuxonice                    /etc/splash/natural_gentoo              0755 0 0
file    /etc/splash/luxisri.ttf                 /etc/splash/luxisri.ttf                 0644 0 0
dir /lib64                                  0755 0 0
dir /lib64/splash                               0755 0 0
dir /lib64/splash/proc                          0755 0 0
dir /lib64/splash/sys                           0755 0 0
dir /proc                                   0755 0 0
dir /mnt                                    0755 0 0
dir /root                                   0770 0 0
dir /sbin                                   0755 0 0
file    /sbin/fbcondecor_helper                 /sbin/fbcondecor_helper                 0755 0 0
slink   /sbin/splash_helper                 /sbin/fbcondecor_helper                 0755 0 0
file    /sbin/tuxoniceui_fbsplash               /sbin/tuxoniceui_fbsplash               0755 0 0
file    /sbin/tuxoniceui_text                   /sbin/tuxoniceui_text                   0755 0 0
dir /sys                                    0755 0 0
file    /init                           /usr/src/init   0755 0 0

I haven’t used it on ARM, but it should work. /init is the file where you can put startup commands. The rest are the various files it needs (busybox, etc.).
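
As a rough illustration of what such an /init could contain for this particular problem, here is a minimal sketch that generates one. It is not from the original answer and makes several assumptions: busybox is statically linked, built for the target architecture and installed at /bin/busybox as in the list above; the interface is called eth0; and the script is written to ./init (a placeholder path, so the "file /init ..." line in the list would have to point at wherever you actually put it). The generated script mounts /proc and /sys, shows the kernel's Ethernet messages, tries to bring the interface up, and then leaves you at a shell on the console, so the network can be inspected without any root filesystem mounted.

```shell
#!/bin/sh
# Write a small debugging /init for the initramfs.
# INIT_OUT is a placeholder path; adjust it to match the list file.
INIT_OUT="${INIT_OUT:-./init}"

cat > "$INIT_OUT" <<'EOF'
#!/bin/busybox sh
# Applets are called through /bin/busybox explicitly because the list
# above installs only the busybox binary, with no applet symlinks.

# Pseudo-filesystems needed by dmesg, ifconfig and friends.
/bin/busybox mount -t proc proc /proc
/bin/busybox mount -t sysfs sysfs /sys

# What did the kernel report about the Ethernet driver?
/bin/busybox dmesg | /bin/busybox grep -i eth

# Assumption: the interface is named eth0.
/bin/busybox ifconfig eth0 up
/bin/busybox ifconfig -a

# Hand the console to an interactive shell instead of mounting a root fs.
exec /bin/busybox sh
EOF

chmod 0755 "$INIT_OUT"
```

Because the shell runs as PID 1, exiting it will panic the kernel; that is usually acceptable for a throwaway debugging image.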

Solution 2

A few things that come to mind:

  • Use tcpdump, wireshark or other Ethernet packet inspector to see whether the board is sending packets to the wrong address or not sending anything at all.
  • What do you have on the serial console (if there is one)?
  • Try connecting a remote kernel debugger.
  • Try running inside a simulator, if you have a simulator that you can reproduce your problem in.
  • Instead of just fetching a kernel, put a boot-and-root filesystem in flash memory, or load a root filesystem to a RAM disk.
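
For the packet-inspection suggestion, a sketch along these lines could be run on the NFS server. The helper names board_filter and watch_board are made up for illustration, and the interface name and MAC address you pass in are your own. The filter keeps ARP, DHCP/BOOTP (UDP 67/68), the portmapper (111) and NFS (2049) traffic to or from the board.

```shell
#!/bin/sh
# board_filter MAC
# Print a pcap filter matching boot-related traffic to or from one MAC.
# Note: mountd often listens on a different, dynamically assigned port;
# use just "ether host MAC" to capture everything the board sends.
board_filter() {
    printf 'ether host %s and (arp or udp port 67 or udp port 68 or port 111 or port 2049)\n' "$1"
}

# watch_board INTERFACE MAC
# Run tcpdump (needs root) with numeric output and link-level headers.
watch_board() {
    tcpdump -n -e -i "$1" "$(board_filter "$2")"
}
```

Calling watch_board with the server's interface and the board's MAC while the board boots should show whether the kernel sends any DHCP or NFS requests at all, and to which address.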

Solution 3

This post is regarding the network issues brought up in the question, not about kernel debugging.

If your switch supports Spanning Tree Protocol (STP), keep in mind that STP may not activate the Ethernet port on the switch for 6 seconds or more while STP does its work. This delay may start over every time the host resets its Ethernet port, which can happen multiple times: at power-up, at the DHCP request, when the kernel loads the network drivers, etc. This can interfere with NFS boots for diskless systems, DHCP, kickstart, etc., and has caused plenty of headaches for many sysadmins. For some examples, see RedHat Bug 189795 – DHCP timeouts during Kickstart, and this PXE Guide.

Most high-end switches, such as Cisco and HP ProCurve switches, support STP, and it’s enabled on all ports out of the box.
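
One way to check whether such a delay is what you are hitting is to retry a probe from a shell on the board and see how many attempts fail before traffic gets through. The sketch below is a generic retry helper; the name wait_until is made up, and it assumes a POSIX-style shell with sleep available (under busybox this may mean installing applet links or a shell built to run applets directly).

```shell
#!/bin/sh
# wait_until MAX_ATTEMPTS COMMAND [ARGS...]
# Run COMMAND once per second until it succeeds or MAX_ATTEMPTS is reached.
# Prints how many attempts failed first; returns 0 on success, 1 otherwise.
wait_until() {
    max=$1
    shift
    failed=0
    while [ "$failed" -lt "$max" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "succeeded after $failed failed attempts"
            return 0
        fi
        sleep 1
        failed=$((failed + 1))
    done
    echo "no success within $max attempts"
    return 1
}
```

For example, "wait_until 60 ping -c 1 -W 1 SERVER_IP" run right after bringing the interface up (SERVER_IP being your NFS server) gives a rough idea of how long the port stays blocked. If the probe only starts succeeding after many attempts, the switch port is a likely suspect.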

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.