ext_105594 ([identity profile] unclebooboo.livejournal.com) wrote in [personal profile] rmd 2008-11-12 01:29 pm (UTC)

The link level CRC was a 16 bit checksum. With only a 16 bit CRC, every time a packet gets corrupted by noise you've got about a 1 in 65,536 (2^16) chance that the bad packet will still pass the CRC test.
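
To make the 1 in 65,536 figure concrete, here is a minimal Python sketch that keeps generating random "garbled" packets until one happens to carry the same 16 bit CRC as the original. The CRC variant (CRC-CCITT via the standard library's `binascii.crc_hqx`), the sample payload, and the helper names are all assumptions for illustration; the actual CRC polynomial used on that link isn't stated here.

```python
import binascii
import random


def crc16(data: bytes) -> int:
    # CRC-CCITT from the standard library; an assumed stand-in for the link's 16 bit CRC.
    return binascii.crc_hqx(data, 0)


def find_garbled_match(original: bytes, seed: int = 1, max_tries: int = 2_000_000):
    """Try random payloads until one has the same CRC-16 as `original`.

    Returns (garbled_payload, tries), or (None, max_tries) if none was found.
    """
    rng = random.Random(seed)  # fixed seed so the search is repeatable
    target = crc16(original)
    n = len(original)
    for tries in range(1, max_tries + 1):
        # Random bytes model a packet whose contents were scrambled by noise.
        garbled = rng.getrandbits(8 * n).to_bytes(n, "big")
        if garbled != original and crc16(garbled) == target:
            return garbled, tries
    return None, max_tries


if __name__ == "__main__":
    original = b"hello stat mux!!"  # hypothetical 16-byte payload
    garbled, tries = find_garbled_match(original)
    print(f"original CRC: {crc16(original):#06x}")
    print(f"garbled packet with same CRC found after {tries} tries: {garbled!r}")
```

With 65,536 possible CRC values, a search like this should need on the order of tens of thousands of tries, which is exactly why a busy, noisy link will eventually let a bad packet through.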

By using a logic analyzer to trigger the "stop" on the protocol analyzer, I was able to capture the moment. The last packet received before the crash was always a packet with a "good" CRC, but its actual contents were garbled.

The protocol used by the stat mux assumed that the link level CRC would be adequate, so it didn't do any further error checking on the contents of the packet. Since the product was on its last legs, my boss approved the kludge of adding an extra one-byte checksum inside the link level packet. This turned the once-a-week crash into a once-every-five-years crash.
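
The five year figure falls out of simple arithmetic. A rough sketch, assuming a garbled packet that slips past the CRC looks random to the one-byte checksum (so it passes about 1 time in 256) and that the two checks fail independently; the variable names are just for illustration:

```python
# Fraction of garbled packets that slip past each check (assumed random garbling).
CRC16_MISS = 1 / 2**16      # 16 bit CRC: about 1 in 65,536
CHECKSUM8_MISS = 1 / 2**8   # extra one-byte checksum: about 1 in 256

# Observed before the kludge: roughly one crash per week.
weeks_between_crashes_before = 1

# Only 1 in 256 of the packets that used to cause a crash still gets through.
weeks_between_crashes_after = weeks_between_crashes_before / CHECKSUM8_MISS
years_between_crashes_after = weeks_between_crashes_after / 52

# Combined chance that a single garbled packet passes both checks.
combined_miss = CRC16_MISS * CHECKSUM8_MISS  # 1 in 2**24

print(f"{weeks_between_crashes_after:.0f} weeks, about {years_between_crashes_after:.1f} years")
print(f"combined miss rate: 1 in {round(1 / combined_miss):,}")
```

So one extra byte stretches a weekly crash out to about 256 weeks, a bit under five years.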

Now, the newer link level protocols all use 32 bit CRCs. Furthermore, TCP (which was developed in the bad old days of 16 bit CRCs) includes its own checksum.
