Frederik's Blog

Random thoughts of a Linux sysadmin

Today there was a leap second at 23:59:60 UTC. On one of my systems, this caused a high CPU load starting from around 02h00 GMT+2 (which corresponds to the time of the leap second). ksoftirqd and a java (glassfish) process were using lots of CPU time. This system was running Debian Squeeze with kernel 2.6.32-45. The problem is very easy to fix: just run

# date -s "`date`"

and everything will be fine again. I found this solution on the Linux Kernel Mailing List: http://marc.info/?l=linux-kernel&m=134113389621450&w=2. Apparently a similar problem can happen with Firefox, Thunderbird, Chrome/Chromium, Java, MySQL, VirtualBox and probably other processes.
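
The fix above can be wrapped in a small script that only resets the clock when the kernel log actually mentions a leap second. This is a minimal sketch: the function names are made up for illustration, and the grep pattern is an assumption about the wording of the kernel message, so check your own dmesg output first.

```shell
#!/bin/sh
# Succeed if the kernel log text on stdin mentions a leap second insertion.
# The pattern is an assumption about the wording of the kernel message.
leap_second_logged() {
    grep -qi 'inserting leap second'
}

# Reset the system clock to its own current value (must run as root).
# Set DRY_RUN=1 to only print the command instead of running it.
reset_clock() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo 'date -s "$(date)"'
    else
        date -s "$(date)"
    fi
}

# Usage (as root):
#   dmesg | leap_second_logged && reset_clock
```

Setting DRY_RUN=1 lets you see what would be executed before touching the clock on a production machine.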

I was a bit surprised that this problem only happened on this particular machine, because I have several other servers running similar kernel versions.

I just got a second monitor at home and wanted to configure the two monitors with my NVidia graphics card. You can set up TwinView in the NVidia Settings application, but I did not like that solution: the next time I restarted X, all the settings were lost and the second monitor powered off. GNOME also did not seem to behave correctly when the monitors went into standby and I unlocked the desktop: the desktop appeared to be shifted across the monitors. The latter might be a bug in gnome-settings-daemon 3.2 rather than in NVidia’s driver, however.

However, since the 330 beta series, the NVidia proprietary driver finally supports RandR 1.3, so you can configure a dual-screen setup with the configuration tools provided by your desktop. This driver is currently available in Debian Experimental. To install it (make sure you have experimental in your apt sources.list first, of course), run this command:

# apt-get install -t experimental xserver-xorg-video-nvidia

I also pulled in gnome-settings-daemon and gnome-control-center version 3.4 which appeared in Debian Sid today:

# apt-get install -t unstable gnome-settings-daemon gnome-control-center

Now reboot your system (to be sure the new NVidia kernel and X drivers are loaded), and then go to System Tools – Preferences – System Settings (or run gnome-control-center in a terminal window) – Display. Enable the two monitors, set the optimal (highest) resolution, drag them into the right position, click Apply, and confirm everything is working fine. Now you have a nice multi-monitor setup without needing to mess with NVidia’s TwinView and without having to create a script to get the right settings applied automatically when X is started.

Personally I absolutely do not like the gnome-shell in GNOME 3. I actually even hate it: it is slow, messy and cumbersome to use, and I have the feeling that the developers are not listening to criticism. Obvious, trivial and well-known design bugs are totally ignored (bug 662738 is an example).

For that reason, I went looking for an alternative desktop. KDE is way too bloated for a netbook with 1 GB of RAM, while XFCE is not as polished as a traditional GNOME 2.32 desktop. The best alternative I could find right now was to replace the GNOME Shell with a custom panel or dock implementation. In the end I chose cairo-dock: it is written in C, so it is probably not as memory hungry as AWN (which uses Python) or Docky (which uses Mono, which I also consider a possible patent minefield). Cairo-dock is also actively maintained. I paired cairo-dock with the compiz window manager to get a nice-looking desktop.

The Artificial Intelligence lab opened its new website a few days ago. The website is based on Drupal and a PostgreSQL database. Unfortunately, some modules (such as the biblio module) have bugs when used with a PostgreSQL database instead of MySQL, but I hope that the last bugs will be fixed in the near future.

Recently I installed a server with a Supermicro SMC2108 RAID adapter, which is actually an LSI MegaRAID SAS 9260. LSI created a command line utility called MegaCLI for Linux to manage this adapter. You can download it from their support pages. The downloaded archive contains an RPM file. I installed mc and rpm on Debian with apt-get, and then extracted the MegaCli64 binary (for x86_64) to /usr/local/sbin, and libsysfs.so.2.0.2 from the Lib_utils RPM to /opt/lsi/3rdpartylibs/x86_64/ (that’s the location where MegaCli64 looks for this library).

Here are some useful commands:

View information about the RAID adapter

For checking the firmware version, battery back-up unit presence, installed cache memory and the capabilities of the adapter:

# MegaCli64 -AdpAllInfo -aAll

View information about the battery backup unit state

# MegaCli64 -AdpBbuCmd -aAll

View information about virtual disks

Useful for checking RAID level, stripe size, cache policy and RAID state:

# MegaCli64 -LDInfo -Lall -aALL

View information about physical drives

# MegaCli64 -PDList -aALL

Patrol read

Patrol read is a feature which tries to discover disk errors before it is too late and data is lost. By default it runs automatically (with a delay of 168 hours between patrol reads) and takes up to 30% of IO resources.

To see information about the patrol read state and the delay between patrol read runs:
# MegaCli64 -AdpPR -Info -aALL

To find out the current patrol read rate, execute
# MegaCli64 -AdpGetProp PatrolReadRate -aALL

To reduce patrol read resource usage to 2% in order to minimize the performance impact:
# MegaCli64 -AdpSetProp PatrolReadRate 2 -aALL

To disable automatic patrol read:
# MegaCli64 -AdpPR -Dsbl -aALL

To start a manual patrol read scan:
# MegaCli64 -AdpPR -Start -aALL

To stop a patrol read scan:
# MegaCli64 -AdpPR -Stop -aALL

You could use the above commands to run patrol read in off-peak times.
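
One way to do that is a small wrapper script called from cron, after automatic patrol read has been disabled with the -Dsbl command shown above. The sketch below is only an illustration: the script name, its install path and the 01:00 to 06:00 Saturday window are assumptions, not something MegaCli prescribes.

```shell
#!/bin/sh
# patrol_cmd ACTION: print the MegaCli64 command line for "start" or "stop".
patrol_cmd() {
    case "$1" in
        start) echo "MegaCli64 -AdpPR -Start -aALL" ;;
        stop)  echo "MegaCli64 -AdpPR -Stop -aALL" ;;
        *)     echo "usage: patrol_cmd start|stop" >&2; return 1 ;;
    esac
}

# Example root crontab entries (the time window and the script path
# /usr/local/sbin/patrol-read.sh are assumptions):
#   0 1 * * 6  /usr/local/sbin/patrol-read.sh start
#   0 6 * * 6  /usr/local/sbin/patrol-read.sh stop

# When called with an argument, run the matching command (as root).
if [ $# -gt 0 ]; then
    cmd=$(patrol_cmd "$1") || exit 1
    $cmd
fi
```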

Migrate from one RAID level to another

In this example, I migrate the virtual disk 0 from RAID level 6 to RAID 5, so that the disk space of one additional disk becomes available. The second command is used to make Linux detect the new size of the RAID disk.

# /usr/local/sbin/MegaCli64 -LDRecon -Start -r5 -L0 -a0
# echo 1 > /sys/block/sda/device/rescan
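
To check that the rescan picked up the new capacity, you can read the size the kernel reports in /sys/block/sda/size, which is expressed in 512-byte sectors. The helper below is a small sketch for converting that number to GiB; the function name is made up for illustration and sda is simply the device used in the example above.

```shell
#!/bin/sh
# sectors_to_gib SECTORS: convert a count of 512-byte sectors to whole GiB.
sectors_to_gib() {
    # 1 GiB = 1024 * 1024 * 1024 bytes = 2097152 sectors of 512 bytes
    echo $(( $1 / 2097152 ))
}

# Usage:
#   sectors_to_gib "$(cat /sys/block/sda/size)"
```

Run it before and after the rescan and compare the two values.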

Create a new RAID 5 virtual disk from a set of new hard drives

First we need to know the enclosure and slot number of the hard drives we want to use for the new RAID disk. The first command lists them. Then I add a virtual disk using RAID level 5, followed by the list of drives I want to use, specified in enclosure:slot syntax.

# MegaCli64 -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry'
# MegaCli64 -CfgLdAdd -r5'[252:5,252:6,252:7]' -a0

View reconstruction progress

When reconstructing a RAID array, you can check its progress with this command.
# MegaCli64 -LDRecon -ShowProg -L0 -a0

(replace -L0 by -L1 for the second virtual disk, and so on)

Configure write-cache to be disabled when battery is broken

# MegaCli64 -LDSetProp NoCachedBadBBU -LALL -aALL

Change physical disk cache policy

If your system is not connected to a UPS, you should disable the physical disk cache in order to prevent data loss. To view the current setting:

# MegaCli64 -LDGetProp -DskCache -LAll -aALL

To disable it:

# MegaCli64 -LDSetProp -DisDskCache -LAll -aALL

To enable it (only do this if you have a UPS and redundant power supplies):

# MegaCli64 -LDSetProp -EnDskCache -LAll -aALL

More information

http://ftzdomino.blogspot.com/2009/03/some-useful-megacli-commands.html
https://twiki.cern.ch/twiki/bin/view/FIOgroup/DiskRefPerc
http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
http://kb.lsi.com/KnowledgebaseArticle16516.aspx

Today I was getting this error when installing a new kernel on a server running Debian:

/usr/sbin/grub-probe: error: Couldn't find PV pv2. Check your device.map.

The error can be reproduced by running the update-grub command.

The day before, a new RAID disk was added to this server, so I suspected this could be the cause. The file /boot/grub/device.map contained a reference to the first RAID disk as (hd0) but did not contain a reference to the new RAID disk. I ran

# ls -l /dev/disk/by-id/

to find out which SCSI ID referred to sdb (the new RAID disk), and then added the following line to device.map:


(hd1) /dev/disk/by-id/scsi-3600304800087c4f015fb4f2e4cc7a8e5

Now installing the new kernel works fine!
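
Instead of reading through the ls output by eye, the matching by-id name can be looked up with a short loop. This is a sketch, not part of the original fix: the function name is made up, and it may print more than one name (for example a wwn- link next to the scsi- one), in which case you pick the one you want in device.map.

```shell
#!/bin/sh
# by_id_for DEVICE [DIR]: print the by-id symlinks that resolve to DEVICE.
# DIR defaults to /dev/disk/by-id; the parameter exists only to ease testing.
by_id_for() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            echo "$link"
        fi
    done
}

# Usage:
#   by_id_for /dev/sdb
```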

So, how did the almost two-month-long struggle with HP’s support end (see part 1, part 2)?

On Wednesday evening, I received a mail that a 160 GB SSD was sent and we received it on Thursday morning. Also during the same week, we received a 500 GB 7200 RPM hard drive, which was meant to be a temporary replacement until the 160 GB SSD was available again.

So things are finally solved for good. I am just surprised that the 160 GB SSD suddenly became available so quickly now (it was pretty useless to send a 500 GB disk if the SSD would arrive only a few days later). Is this just coincidence or did the complaining convince HP to finally make a real effort to find a replacement quickly? We will probably never know.

In Belgium, we can fill out our tax form online on the Tax-on-web site using a smartcard reader and our electronic identity card. Unfortunately, things are rather complicated to set up, partly because the eID authentication is based on SSL renegotiation, a feature which is disabled by default in recent Firefox versions because it can be insecure. It is a bit disappointing that we have to rely on potentially vulnerable technologies to authenticate with our eID, but there is not much choice if you do not want to fill out the paper forms (or are too late, so that the electronic way is the only option).

First we need to make sure the smartcard reader works. I have a Dell Latitude E6400 laptop with a Broadcom smartcard reader, which is supported by the ccid driver that the pcscd package in Debian requires. Note that the Broadcom 5880 as delivered by Dell in its Latitude laptops has buggy firmware by default. You will need to update it by running a Windows tool. More information can be found on the ccid driver website or on the eID website. Note that Windows also suffers from this problem, so even if you use Windows, you might need to install this update.

If you are using the traditional USB smartcard reader distributed by the government, which is an ACS ACR38, you will need the acr38u driver.

# apt-get install pcscd pcsc-tools libacr38u

To verify that the smartcard reader is working correctly, start up pcsc_scan and insert a smartcard (your eID or even a credit card is fine). Some diagnostic information about the card you inserted should appear automatically in your console. Press ctrl-C to exit pcsc_scan.

Now that the smartcard reader is working, we need to install the middleware and the Firefox plug-in:

# apt-get install beidgui beid-mozilla-plugin

Start up Firefox and open the menu Tools – Preferences. Click on the Advanced section and load the Encryption tab. Now click on Security Devices and click on the Load button. Enter a name (for example beid), and enter the path to the beid pkcs11 module. On Debian Wheezy it is: /usr/lib/libbeidpkcs11.so.3.5.2 . Be sure to check the filename, it might be different if you are using another version. If you cannot find it, try to run in a terminal:

# find / -name "*beidpkcs11*"

This command can also be used on Mac OS X, where the configuration procedure is actually similar to Linux.

To check whether the middleware is working correctly, you can load up beidgui and let it read your eID.

Now because tax-on-web uses SSL renegotiation, which is disabled by default in newer Firefox versions, we need to add an exception to Firefox’ configuration. Type about:config in the URL bar, confirm that you will be careful, and look for the setting security.ssl.renego_unrestricted_hosts. Double click on it, and enter the value ccff02.minfin.fgov.be

Now we need to make Firefox identify itself as version 3.5, otherwise the tax-on-web site will still complain that your browser is unsupported. Install the User Agent Switcher add-on, then in the Tools menu, under User Agent Switcher, click on Edit user agents and then on New user agent. Type Firefox 3.5 as description, and in the user agent replace Firefox/5.0 by Firefox/3.5 and in the app version 5.0 by 3.5. Now go to taxonweb.be, and then go to the Tools menu and change your user agent to Firefox 3.5. Now you should be able to identify yourself with your eID card. After using the tax-on-web site, do not forget to set your user agent back to the default user agent.

Health insurance CM with eID

The health insurance organisation CM also offers the possibility to log in to its website by the eID. To make it work, you use the same procedure as above, with one difference: the security.ssl.renego_unrestricted_hosts setting should also contain online.cm.be now. You can add multiple hosts by separating them by a comma, so you can set it to ccff02.minfin.fgov.be,online.cm.be
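
If you prefer not to edit the setting by hand in about:config, the same value can be written as a user_pref line in the user.js file of your Firefox profile. The helper below is a sketch that only builds that line from a list of hosts; the function name is made up, and PROFILE in the usage comment is a placeholder for your own profile directory.

```shell
#!/bin/sh
# renego_pref HOST...: print a user.js line that allows SSL renegotiation
# for the given hosts, joined by commas.
renego_pref() {
    hosts=$(printf '%s,' "$@")
    hosts=${hosts%,}
    printf 'user_pref("security.ssl.renego_unrestricted_hosts", "%s");\n' "$hosts"
}

# Usage (PROFILE is a placeholder for your profile directory):
#   renego_pref ccff02.minfin.fgov.be online.cm.be >> ~/.mozilla/firefox/PROFILE/user.js
```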

Two weeks ago I wrote about my struggle with HP’s customer service. To summarize: HP was unable to replace a failed 160 GB SSD because it was not in stock and was unable to provide me any other alternative even after one month. In the end, a 250 GB SSD was promised, but it also was not delivered.

  • Friday morning, 1 July, I call back HP’s support service. It seems that they still need approval from the Customer Relations Team (CRT) Belgium to send me a 250 GB SSD instead of the 160 GB one, but were unable to get an answer from CRT. The guy on the phone tries several times to call CRT while I am waiting, but the call is always dropped after one minute. In the end, he cannot do anything more than contact CRT by e-mail.
  • Friday afternoon, I receive an e-mail from CRT. CRT Belgium & Luxembourg seems to be located in Sofia (Bulgaria), but they are answering me in Dutch. They approve the replacement of the 160 GB SSD by a 250 GB one and apologize for the long delay. Finally I start having some hope that things will be fixed now.
  • Friday evening, I take a look at the case log on HP’s support site. I am dismayed to read that a few hours after CRT approved replacement by a 250 GB SSD, it appears that the 250 GB SSD is also unavailable! The case log mentions that I was informed about the delay, but I had not had any contact with HP after Friday morning, so the only way I discover the new delay is by logging into HP’s site and reading the case log.
  • Monday morning, 4 July, I reply to CRT and to the support case that I do not accept the new delay and I demand an immediate solution. I receive a message in which they apologize for the delays and inform me that people of superior departments are looking for a solution “with appropriate priority”. I also receive a message from our HP distributor asking whether this problem is still pending. I confirm to them on Wednesday 6 July that this is still the case and they will transfer my complaint to HP Belgium.
  • I finally get a reaction from HP on Friday 8 July. They inform me that they will send me a 500 GB 7200 RPM disk instead of the 160 GB SSD, which is not deliverable. The disk will arrive on Monday 11 July. I answer that I do not accept this as a final solution to the problem, because a 7200 RPM disk is much slower and much cheaper than the 160 GB SSD this machine was bought with. In the afternoon I get the answer that the 500 GB 7200 RPM disk will be sent as a temporary replacement then, and that a 160 GB SSD will be ordered too and sent as soon as it is available.
  • As of Sunday 10 July, I have no indication that the 500 GB disk has been sent, so I am quite skeptical that the disk will be there on 11 July. I also have my doubts if and when I will finally receive an SSD.

To be continued…

One month ago, on 25 May, I contacted HP support because an HP EliteBook 8540p (NU486AV) notebook had a broken 160 GB SSD (which is actually an Intel X25-M). The drive was not recognized anymore: neither the BIOS nor a Linux rescue CD could find any connected hard drive. This machine was only a few months old and was bought with a 3-year HP eCare Pack for Next Business Day warranty support. Today, 30 June, HP still has not provided me with any solution, not even a temporary one.

Here is a summary of what happened:

  • When I called HP’s customer service on 25 May, I was promised a replacement SSD for the next day. The helpdesk guy explicitly checked whether the disk was out of stock, and apparently it was not. In the case log this is written as: “Part is NOT on CRT TOP shortage list , Part can be ordered”.
  • The next day I get a mail stating: “Your ordered part is delayed, the delivery date is not yet known.”
  • I do not hear anything from customer support for more than a week. On 6 June, I ask for a status update via HP’s support website. The same day someone calls me back to inform me that the SSD is out of stock. He only offers me an 80 or 120 GB SSD as an alternative, which I obviously do not agree with: I want a disk of at least 160 GB.
  • I hear nothing from customer support for almost 3 weeks. On 20 June I contact them again via the support site, demanding an immediate solution. In the case log this triggers this cryptic message: “PSL Status requested by email to PS”. I do not get any reply.
  • Later that same week I ask my local HP distributor whether they can do something to trigger a solution. I do not get any reply.
  • On 28 June I let HP customer support know via the support site that I am unhappy with their lack of initiative to provide a solution. I do not get any reply.
  • On 29 June I try the HP support chat function. Before entering the chat I have to select my country from a list and provide the serial number of the machine. The chat support guy first asks for some details about my identity and the machine. He apologizes for the fact that I had to wait more than a month for a solution and starts to look at the case log. After looking at the case log he suddenly says that HP chat support is only available for the UK and Ireland. So why do they even let me enter the chat after I chose Belgium as my country, and why did he ask for all the details about the case?
  • I call customer support again by phone. While I am waiting on the phone, a recorded message recommends trying HP’s chat support!
  • The guy on the phone proposes to send me a 250 GB SSD disk. He still needs confirmation from the technical service whether this model is compatible with that laptop. If I did not get any message the same day that would mean that all was OK and then I would have the new SSD the next day.
  • The case log shows these entries after I called:
    Sub-case comment added: Jun 29, 2011 1:15:27 PM
    again no answer at CRT
    consult PS
    Sub-case comment added: Jun 29, 2011 11:19:31 AM
    tried to call CRT belgium >> no answer
    try again in 1 hour after lunch
    Sub-case comment added: Jun 29, 2011 10:50:52 AM
    Cu called back
    595756-001 250GB solid-state drive (SSD) – SATA interface, 2.5-inch form factor
    This part IS supported for this notebook
  • On 30 June, I still do not have a new SSD and nobody contacted me. It seems the whole case is stuck and forgotten again and I will have to call back once again to get the whole process unstuck.

So for the second time, things seem to be stuck at “waiting for PS”. I do not know who or what this “PS” is, but it is clear to me that it is not doing its job.

I can only conclude that HP’s customer support is just worthless. Cases are not followed up and the customer is never informed about the status. HP’s customer service takes no initiative to propose an alternative solution. Instead the customer repeatedly has to take the initiative to make any progress. And even then when customer support is reminded of the problem, they do not do anything to prevent it from getting stuck again. Chat support is totally useless even though it is recommended by HP.

Over the last few months, I have had contact with HP support 3 times for other small problems. Things were all fixed in a reasonable manner, although it always took more than 1 business day to get a replacement, which is a pity. However, what is happening now is simply unacceptable.

Almost all systems I bought the last few years were HP systems. I will definitely re-evaluate this, because reasonable customer support is simply essential with systems used in production in business. I am very unhappy and dissatisfied with HP support so I will consider alternatives in the future.

What are other people’s experiences with HP customer support in Belgium/The Netherlands?