un. fucking. believable.
Nov. 11th, 2008 09:14 pm
so, i've been having some problems with the world's most troublesome 10M connection
i found the problem. after lots and lots and lots of testing.
The patch cable was bad.
So was the one we tried replacing it with.
So was the one we tried replacing that one with.
So was the one we tried replacing the third one with.
So was the one we tried replacing the fourth one with.
So was the one we tried replacing the fifth one with.
So was the one we tried replacing the sixth one with.
at that point, they didn't have any other ST/LC cables that were long enough for this patch.
SEVEN. SEVEN BAD CABLES.
most of which were still in their factory-sealed bags at the time.
the internet told me that microcenter had two in stock. i went there tonite, bought an ST/LC MMF cable, brought it to the data center, and the tech installed it.
I have my 10M link.
and now, i do believe i'm going to go home and have a drink. or possibly seven. one to toast each of the bad cables.
no subject
Date: 2008-11-12 02:45 am (UTC)
Care to share the brand name? (not that I have a need for cables like that)
no subject
Date: 2008-11-12 02:56 am (UTC)
no subject
Date: 2008-11-12 08:23 am (UTC)
T.
F.
I'm not sure I would have tried 7 times. Sheesh.
no subject
Date: 2008-11-12 02:47 am (UTC)
One of the six video planes in the mixer was flipping to pink every once in a while. Techs had no clue; I told them to check the crimps. Major presentation, and during it the screen went pink. Techs said it was programming: "If it was a crimp the video would drop if I did this," as he wiggled a cable.
We screamed NO! Video went away. Tech 2 tried to reroute to plane 4; however, he had a football game going on plane 4's monitor. So our audience was treated to the PowerPoint going out, then a football game for 5 seconds, then the presentation.
Needless to say they fixed the crimp.
CZ
no subject
Date: 2008-11-12 02:56 am (UTC)
no subject
Date: 2008-11-12 03:40 am (UTC)
no subject
Date: 2008-11-12 04:40 am (UTC)
This reminds me of a story that only readers of this posting would be likely to appreciate. I once spent a couple of months debugging a problem with a Codex stat mux that would crash once a week or so at one customer site in Italy. The field service guy recognized that the PTT had provided a very noisy circuit, so we started by plugging in a noise generator to see if we could recreate the problem. Sure enough, it was the line noise that provoked the crash, but we still didn't know why the box was crashing. Watching on a protocol analyzer, I figured out that the crash happened about once for every 60,000 bad packets! I woke up in the middle of the night with the solution. Wanna guess?
no subject
Date: 2008-11-12 08:55 am (UTC)
no subject
Date: 2008-11-12 01:29 pm (UTC)
A 16 bit CRC leaves a 1 in 65536 chance that the bad packet will pass the CRC test.
By using a logic analyzer to trigger the "stop" on the protocol analyzer, I was able to capture the moment. The last packet received before the crash was always a packet with a "good" CRC, but its actual contents were garbled.
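A rough way to see that rate for yourself is a small simulation. This is only a sketch under stated assumptions: it models a garbled packet as random payload bytes plus a random 16-bit check field, and it uses Python's `binascii.crc_hqx` (CRC-CCITT) as a stand-in, since the comment doesn't say which 16-bit polynomial the stat mux used. The function name, payload length, and seed are all made up for illustration.

```python
import binascii
import random


def count_false_accepts(trials: int, seed: int = 0, payload_len: int = 16) -> int:
    """Count random (garbled) packets whose 16-bit check field happens to match."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        # Assumption: line noise leaves us with effectively random payload bytes
        # and an effectively random received CRC field.
        payload = rng.getrandbits(8 * payload_len).to_bytes(payload_len, "big")
        received_crc = rng.getrandbits(16)
        # CRC-CCITT stands in for whatever 16-bit CRC the real link used.
        if binascii.crc_hqx(payload, 0xFFFF) == received_crc:
            passes += 1
    return passes


if __name__ == "__main__":
    trials = 1_000_000
    passes = count_false_accepts(trials)
    print(f"{passes} of {trials} garbled packets passed the CRC; "
          f"expected about {trials / 65536:.1f}")
```

The exact count depends on the seed, but it should hover around one false accept per 65536 garbled packets, which is in the same ballpark as the "once for every 60,000 bad packets" seen on the protocol analyzer.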
The protocol used by the stat mux assumed that the link level CRC would be adequate, so it didn't do any further error checking on the contents of the packet. Since the product was on its last legs, my boss approved the kludge of adding an extra one byte checksum inside the link level packet. This turned the once a week crash into a once every five years crash.
Now, the newer link level protocols all use 32 bit CRCs. Furthermore, TCP (which was developed in the bad old days of 16 bit CRCs) includes its own checksum.
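The "once a week" to "once every five years" figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes the extra one byte checksum misses a garbled packet about 1 time in 256, independently of the CRC; the function name and the 52-week year are just for illustration.

```python
WEEKS_PER_YEAR = 52


def crash_interval_years(weeks_between_crashes: float, extra_check_bits: int) -> float:
    """Estimate the new interval between crashes after adding extra check bits."""
    # Assumption: each extra check bit halves the chance that a garbled packet
    # slips through, and the extra checksum fails independently of the CRC.
    return weeks_between_crashes * (2 ** extra_check_bits) / WEEKS_PER_YEAR


if __name__ == "__main__":
    years = crash_interval_years(weeks_between_crashes=1, extra_check_bits=8)
    print(f"about {years:.1f} years between crashes")  # about 4.9 years
```

One crash a week stretched by a factor of 256 is 256 weeks, or a bit under five years, which matches the figure above.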
no subject
Date: 2008-11-12 01:35 pm (UTC)
no subject
Date: 2008-11-12 11:07 am (UTC)
the ST fiber connection has a bayonet lock like a coax connection, with the barrel connector that you have to twist to lock down into place.
the design of the failing cables was that the exterior part of the connector just turned, and the interior (the fiber) was on some kind of spring-loaded thing and it pushed down into the connector when you secured it. which is kind of a questionable design choice, i'd say.
the one i got had the fiber stationary and the exterior of the connector pulled forward with spring resistance without changing anything about the fiber.
as for the 60K packets, was it an internal counter?
no subject
Date: 2008-11-12 06:41 am (UTC)
no subject
Date: 2008-11-12 02:07 pm (UTC)
no subject
Date: 2008-11-12 02:10 pm (UTC)
the data center guys had a cable tester, but it didn't have a female ST connector on it, so they couldn't test the patch cable.
i pushed them to do at least some troubleshooting with a laser pointer. i think that's around when they started swapping in every cable they had.
no subject
Date: 2008-11-12 03:25 pm (UTC)
But THE NEXT ONE STAYED OOP!
no subject
Date: 2008-11-13 01:23 am (UTC)
no subject
Date: 2008-11-14 06:55 am (UTC)