rmd: (fightclubanimated)
rmd ([personal profile] rmd) wrote 2008-11-11 09:14 pm

un. fucking. believable.

so, i've been having some problems with the world's most troublesome 10M connection

i found the problem. after lots and lots and lots of testing.

The patch cable was bad.
So was the one we tried replacing it with.
So was the one we tried replacing that one with.
So was the one we tried replacing the third one with.
So was the one we tried replacing the fourth one with.
So was the one we tried replacing the fifth one with.
So was the one we tried replacing the sixth one with.

at that point, they didn't have any other ST/LC cables that were long enough for this patch.

SEVEN. SEVEN BAD CABLES.

most of which were still in their factory-sealed bags at the time.

the internet told me that microcenter had two in stock. i went there tonite, bought an ST/LC MMF cable, brought it to the data center, and the tech installed it.

I have my 10M link.

and now, i do believe i'm going to go home and have a drink. or possibly seven. one to toast each of the bad cables.

[identity profile] unclebooboo.livejournal.com 2008-11-12 01:29 pm (UTC)
The link level CRC was a 16 bit checksum. With only a 16 bit CRC, every time a packet gets corrupted with noise you've got about a 1 in 65536 chance that the bad packet will pass the CRC test.
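To make the 1 in 65536 figure concrete, here is a rough sketch in Python. It assumes the garbling is effectively random, so a 16 bit check has 2^16 equally likely values, and it uses the CCITT polynomial (0x1021) through `binascii.crc_hqx` purely as a stand-in; the stat mux's actual polynomial and the `frame` contents are not given above and are hypothetical.

```python
import binascii

def undetected_error_odds(crc_bits: int) -> float:
    """Chance that randomly garbled data still matches a crc_bits-wide check."""
    return 1 / 2 ** crc_bits

# Assumption: CCITT polynomial 0x1021 with initial value 0, via crc_hqx.
# The real link-level polynomial is not stated in the comment.
frame = b"hypothetical stat mux payload"
good_crc = binascii.crc_hqx(frame, 0)

# Flip a single bit in the first byte; a CRC always catches a one-bit error.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
single_bit_caught = binascii.crc_hqx(damaged, 0) != good_crc

print(single_bit_caught)               # True
print(1 / undetected_error_odds(16))   # 65536.0
```

Small, tidy errors like the one-bit flip get caught every time. It is the heavily garbled packets that behave like random data and slip through at roughly the rate above.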

By using a logic analyzer to trigger the "stop" on the protocol analyzer, I was able to capture the moment. The last packet received before the crash was always a packet with a "good" CRC, but its actual contents were garbled.

The protocol used by the stat mux assumed that the link level CRC would be adequate, so it didn't do any further error checking on the contents of the packet. Since the product was on its last legs, my boss approved the kludge of adding an extra one-byte checksum inside the link level packet. This turned the once-a-week crash into a once-every-five-years crash.
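The jump from weekly to roughly five-yearly falls out of simple arithmetic: if the extra byte is independent of the CRC, only about 1 in 256 of the packets that fool the CRC will also fool the checksum. The sketch below assumes a plain additive checksum (sum of bytes mod 256); the comment doesn't say which one-byte checksum was actually used, and the function names are made up for illustration.

```python
WEEKS_PER_YEAR = 365.25 / 7

def add_checksum(payload: bytes) -> bytes:
    """Append a one-byte additive checksum (sum of payload bytes mod 256)."""
    return payload + bytes([sum(payload) % 256])

def checksum_ok(packet: bytes) -> bool:
    """Recompute the checksum over everything but the last byte and compare."""
    return len(packet) >= 1 and sum(packet[:-1]) % 256 == packet[-1]

def years_between_crashes(weeks_before: float, extra_bits: int) -> float:
    """Scale the old crash interval by 2**extra_bits and convert to years."""
    return weeks_before * 2 ** extra_bits / WEEKS_PER_YEAR

packet = add_checksum(b"abc")
print(checksum_ok(packet))                      # True
print(checksum_ok(b"abd" + packet[-1:]))        # False
print(round(years_between_crashes(1, 8), 1))    # 4.9
```

One week times 256 is 256 weeks, which is where the "once every five years" comes from.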

Now, the newer link level protocols all use 32 bit CRCs. Furthermore, TCP (which was developed in the bad old days of 16 bit CRCs) includes its own checksum.
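For reference, TCP's checksum is the 16 bit ones' complement "Internet checksum" described in RFC 1071. The sketch below covers only the summing step over a byte string; a real TCP implementation also includes a pseudo-header with the IP addresses, which is left out here. The function name is just for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Example words from RFC 1071: 0001 f203 f4f5 f6f7
sample = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(sample)
print(hex(csum))                                             # 0x220d
print(internet_checksum(sample + csum.to_bytes(2, "big")))   # 0
```

A receiver runs the same sum over the data plus the transmitted checksum; a result of zero means the segment passed. It is a weaker check than a CRC, but it is a second, independent one layered on top of whatever the link provides.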

[identity profile] unclebooboo.livejournal.com 2008-11-12 01:35 pm (UTC)
Oh yeah, this was all happening at the blazing fast speed of 19,200 bits per second. I think that puts Regis's slow 10Mbps connection in perspective...