Big Bubbles (no troubles)

What sucks, who sucks and you suck

No-Reboot, No Kidding

Seen on Solaris Central: JNI Ships ‘No-Reboot’ Solaris Driver

This takes me back. The first and last time I had experience with a Solaris driver from JNI, it too was “no-reboot”. As in, the system would no longer boot once the driver was installed.

I took crash dumps. I gathered Explorer outputs. I called Sun and opened a call. My friend and former colleague Clive kindly had the call diverted to him and proceeded to plunge headfirst into the dumps. Coming up for air a short time later, he averred that the problem “seemed” to lie in the JNI driver code because the Solaris kernel execution around it had been fine.

We called JNI in New York with our findings. They said no, it worked on their test system and therefore it was a Solaris issue. Um, we said, are you sure you wouldn’t mind looking at this d… No, it’s a Solaris problem, they insisted. (In the background, there was the low rumble of the client getting antsy, almost drowned out by the shrill screeches of their reseller panicking and wanting me to be on site every day so I could stare at the server as it crashed again. This died down when their account manager managed to get himself banned from the site. Ha. Ha. Haw.)

A pleasant young lady from Sun got in touch to ask me for my feelings (as a third-party Sun consultant, it was a rarity ever to be asked for a view on anything by Sun). She apologised for coming into the call raw, but she had been off with back problems. I sympathised and said that this call was giving me pains in the lower back region too. I said that I was about as unsure as everyone else at this point, but that Clive had examined the crash dumps and argued a strong case for the problem lying with the JNI code, complete with relevant extracts from the stack trace. I didn’t understand his argument (not being fluent in kernelese) but at least he had backed up his claim. The JNI guys, on the other hand, felt it was sufficient merely to deny responsibility and point the finger at Solaris. The crash dump? No, it’s a Solaris problem. But the crash happening after their driver initialised? It’s a Solaris problem. (We couldn’t argue with their incisive logic…well, we might have done if they’d stayed on the phone long enough.)

Stasis. I moved on to other clients. A short while later, I heard that JNI had found a problem with their driver when loaded in systems with less than the 2GB of RAM that their test system had installed.

(Disclaimer: I have no idea what their latest “No-Reboot” driver is or the general quality level of their product line. Who knows, it may just work. My point is that, in a four-way cluster-fuck between a client, a reseller, a systems vendor and a third-party supplier, trust no one, but if you have to trust anyone, put your money on the guy who shows you what he’s talking about.)