Frederik's Blog

Random thoughts of a Linux sysadmin

The Artificial Intelligence lab opened its new website a few days ago. The website is based on Drupal and a PostgreSQL database. Unfortunately, some modules (such as the biblio module) have bugs when used with a PostgreSQL database instead of MySQL, but I hope that the remaining bugs will be fixed in the near future.

Recently I installed a server with a Supermicro SMC2108 RAID adapter, which is actually an LSI MegaRAID SAS 9260. LSI created a command line utility called MegaCLI for Linux to manage this adapter. You can download it from their support pages. The downloaded archive contains an RPM file. I installed mc and rpm on Debian with apt-get, and then extracted the MegaCli64 binary (for x86_64) to /usr/local/sbin, and the libsysfs.so.2.0.2 from the Lib_utils RPM to /opt/lsi/3rdpartylibs/x86_64/ (that’s the location where MegaCli64 looks for this library).

Here are some useful commands:

View information about the RAID adapter

For checking the firmware version, battery back-up unit presence, installed cache memory and the capabilities of the adapter:

# MegaCli64 -AdpAllInfo -aAll

View information about the battery backup unit state

# MegaCli64 -AdpBbuCmd -aAll

View information about virtual disks

Useful for checking RAID level, stripe size, cache policy and RAID state:

# MegaCli64 -LDInfo -Lall -aALL

View information about physical drives

# MegaCli64 -PDList -aALL

Patrol read

Patrol read is a feature which tries to discover disk errors before it is too late and data is lost. By default it runs automatically (with a delay of 168 hours between patrol reads) and will take up to 30% of IO resources.

To see information about the patrol read state and the delay between patrol read runs:
# MegaCli64 -AdpPR -Info -aALL

To find out the current patrol read rate, execute
# MegaCli64 -AdpGetProp PatrolReadRate -aALL

To reduce patrol read resource usage to 2% in order to minimize the performance impact:
# MegaCli64 -AdpSetProp PatrolReadRate 2 -aALL

To disable automatic patrol read:
# MegaCli64 -AdpPR -Dsbl -aALL

To start a manual patrol read scan:
# MegaCli64 -AdpPR -Start -aALL

To stop a patrol read scan:
# MegaCli64 -AdpPR -Stop -aALL

You could use the above commands to run patrol read in off-peak times.
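
As a minimal sketch of that idea, the helper below picks the -AdpPR action for a given hour. The function name patrol_action and the 01:00-05:59 window are assumptions made for illustration, not something MegaCli provides; adjust the window to your own off-peak hours.

```shell
# Print the -AdpPR action to use for a given hour of the day (0-23).
# The off-peak window 01:00-05:59 is an assumption; change it as needed.
patrol_action() {
    hour=${1#0}        # strip one leading zero, so "08" becomes "8"
    hour=${hour:-0}    # a bare "0" was stripped to nothing: midnight
    if [ "$hour" -ge 1 ] && [ "$hour" -lt 6 ]; then
        echo "-Start"
    else
        echo "-Stop"
    fi
}

# Possible use from a script that cron runs at the start and at the end
# of the window (left commented out so the sketch has no side effects):
# MegaCli64 -AdpPR "$(patrol_action "$(date +%H)")" -aALL
```

Run from cron at 01:00 and at 06:00, such a script would start the scan at night and stop it again in the morning.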

Migrate from one RAID level to another

In this example, I migrate the virtual disk 0 from RAID level 6 to RAID 5, so that the disk space of one additional disk becomes available. The second command is used to make Linux detect the new size of the RAID disk.

# /usr/local/sbin/MegaCli64 -LDRecon -Start -r5 -L0 -a0
# echo 1 > /sys/block/sda/device/rescan

Create a new RAID 5 virtual disk from a set of new hard drives

First we need to know the enclosure and slot numbers of the hard drives we want to use for the new RAID disk. The first command lists them. Then I add a virtual disk using RAID level 5, followed by the list of drives I want to use, specified in enclosure:slot syntax.

# MegaCli64 -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry'
# MegaCli64 -CfgLdAdd -r5'[252:5,252:6,252:7]' -a0
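
To avoid typing the enclosure:slot list by hand, something like the following sketch could build it from the -PDList output. It assumes the output contains lines starting with "Enclosure Device ID:" and "Slot Number:" (the labels the egrep pattern above matches); the function name pd_pairs is made up for this example.

```shell
# Read "MegaCli64 -PDList -aALL" output on stdin and print the drives
# as a comma-separated enclosure:slot list, e.g. 252:5,252:6,252:7.
# Assumes "Enclosure Device ID: N" and "Slot Number: N" lines.
pd_pairs() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace
        /^Enclosure Device ID/ { enc = $2 }      # remember current enclosure
        /^Slot Number/ { printf "%s%s:%s", sep, enc, $2; sep = "," }
        END { if (sep != "") print "" }
    '
}

# Example (commented out, needs the adapter):
# MegaCli64 -PDList -aALL | pd_pairs
```

The list contains every drive on the adapter, including drives that already belong to a virtual disk, so review and trim it before passing it to -CfgLdAdd.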

View reconstruction progress

When reconstructing a RAID array, you can check its progress with this command:
# MegaCli64 -LDRecon -ShowProg -L0 -a0

(replace -L0 by -L1 for the second virtual disk, and so on)

Configure write-cache to be disabled when battery is broken

# MegaCli64 -LDSetProp NoCachedBadBBU -LALL -aALL

Change physical disk cache policy

If your system is not connected to a UPS, you should disable the physical disk cache in order to prevent data loss. To view the current setting:

# MegaCli64 -LDGetProp -DskCache -LAll -aALL

To disable it:

# MegaCli64 -LDSetProp -DisDskCache -LAll -aALL

To enable it (only do this if you have a UPS and redundant power supplies):

# MegaCli64 -LDSetProp -EnDskCache -LAll -aALL

More information

http://ftzdomino.blogspot.com/2009/03/some-useful-megacli-commands.html
https://twiki.cern.ch/twiki/bin/view/FIOgroup/DiskRefPerc
http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
http://kb.lsi.com/KnowledgebaseArticle16516.aspx

Today I was getting this error when installing a new kernel on a server running Debian:

/usr/sbin/grub-probe: error: Couldn't find PV pv2. Check your device.map.

The error can be reproduced by running the update-grub command.

The day before, a new RAID disk was added to this server, so I suspected this could be the cause. The file /boot/grub/device.map contained a reference to the first RAID disk as (hd0) but did not contain a reference to the new RAID disk. I ran

# ls -l /dev/disk/by-id/

to find out which SCSI ID referred to sdb (the new RAID disk), and then added the following line to device.map:


(hd1) /dev/disk/by-id/scsi-3600304800087c4f015fb4f2e4cc7a8e5

Now installing the new kernel works fine!
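
Instead of reading the ls output by eye, the lookup can be scripted. This is only a sketch: the function name byid_for is made up, and it assumes the links you are interested in are named scsi-* as on this server.

```shell
# Print the /dev/disk/by-id/scsi-* link(s) that point to a whole disk.
# $1: kernel device name without /dev/, e.g. sdb
# $2: directory to search (defaults to /dev/disk/by-id)
byid_for() {
    dev=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/scsi-*; do
        [ -L "$link" ] || continue               # skip an unmatched glob
        # The links look like ../../sdb; compare only the last component,
        # so partitions such as sdb1 do not match.
        if [ "$(basename "$(readlink "$link")")" = "$dev" ]; then
            echo "$link"
        fi
    done
}

# Example (commented out; check the output before touching device.map):
# echo "(hd1) $(byid_for sdb)" >> /boot/grub/device.map
```

If a disk has more than one scsi-* link, all of them are printed, so pick one before adding it to device.map.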

So, how did the almost two-month struggle with HP’s support end (see part 1, part 2)?

On Wednesday evening, I received a mail that a 160 GB SSD was sent and we received it on Thursday morning. Also during the same week, we received a 500 GB 7200 RPM hard drive, which was meant to be a temporary replacement until the 160 GB SSD was available again.

So things are finally solved for good. I am just surprised that the 160 GB SSD suddenly became available so quickly now (it was pretty useless to send a 500 GB disk if the SSD would arrive only a few days later). Is this just coincidence or did the complaining convince HP to finally make a real effort to find a replacement quickly? We will probably never know.

In Belgium, we can fill out our tax form online on the Tax-on-web site using a smartcard reader and our electronic identity card. Unfortunately, things are rather complicated to set up, partly because the eID authentication is based on SSL renegotiation, a feature which is disabled by default in recent Firefox versions because it can be insecure. It is a bit disappointing that we have to rely on potentially vulnerable technologies to authenticate with our eID, but there is not much choice if you do not want to fill out the paper forms (or are too late, so that the electronic way is the only option).

First we need to make sure the smartcard reader works. I have a Dell Latitude E6400 laptop with a Broadcom smartcard reader, which is supported by the ccid driver that the pcscd package in Debian requires. Note that the Broadcom 5880 as delivered by Dell in its Latitude laptops has buggy firmware by default. You will need to update it by running a Windows tool. More information can be found on the ccid driver website or on the eID website. Note that Windows suffers from this problem too, so even if you use Windows, you might need to install this update.

If you are using the traditional USB smartcard reader distributed by the government, which is an ACS ACR38, you will need the acr38u driver.

# apt-get install pcscd pcsc-tools libacr38u

To verify that the smartcard reader is working correctly, start up pcsc_scan and insert a smartcard (your eID or even a credit card is fine). Some diagnostic information about the card you inserted should appear automatically in your console. Press ctrl-C to exit pcsc_scan.

Now that the smartcard reader is working, we need to install the middleware and the Firefox plug-in:

# apt-get install beidgui beid-mozilla-plugin

Start up Firefox and open the menu Tools – Preferences. Click on the Advanced section and load the Encryption tab. Now click on Security Devices and click on the Load button. Enter a name (for example beid), and enter the path to the beid pkcs11 module. On Debian Wheezy it is: /usr/lib/libbeidpkcs11.so.3.5.2 . Be sure to check the filename, it might be different if you are using another version. If you cannot find it, try to run in a terminal:

# find / -name "*beidpkcs11*"

This command can also be used on Mac OS X, where the configuration procedure is actually similar to Linux.

To check whether the middleware is working correctly, you can load up beidgui and let it read your eID.

Now because tax-on-web uses SSL renegotiation, which is disabled by default in newer Firefox versions, we need to add an exception to Firefox’ configuration. Type about:config in the URL bar, confirm that you will be careful, and look for the setting security.ssl.renego_unrestricted_hosts. Double click on it, and enter the value ccff02.minfin.fgov.be

Now we need to make Firefox identify itself with version 3.5, otherwise the tax-on-web site will still complain that your browser is unsupported. Install the User Agent Switcher add-on, then in the Tools menu, under User Agent Switcher, click on Edit user agents and then on New user agent. Type Firefox 3.5 as description and in the user agent replace Firefox/5.0 by Firefox/3.5 and in the app version 5.0 by 3.5. Now go to taxonweb.be, and then go to the Tools menu and change your user agent to Firefox 3.5. Now you should be able to identify yourself with your eID card. After using the tax-on-web site, do not forget to set your user agent back to the default user agent.

Health insurance CM with eID

The health insurance organisation CM also offers the possibility to log in to its website by the eID. To make it work, you use the same procedure as above, with one difference: the security.ssl.renego_unrestricted_hosts setting should also contain online.cm.be now. You can add multiple hosts by separating them by a comma, so you can set it to ccff02.minfin.fgov.be,online.cm.be
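
If you prefer not to click through about:config, the same preference can be written to a user.js file in the Firefox profile directory. The sketch below only generates the line; the function name renego_pref is made up, and whether the preference is honoured depends on your Firefox version.

```shell
# Print a user.js line that sets security.ssl.renego_unrestricted_hosts
# to the comma-separated list of the host names given as arguments.
renego_pref() {
    hosts=$(printf '%s,' "$@")   # host1,host2,
    hosts=${hosts%,}             # drop the trailing comma
    printf 'user_pref("security.ssl.renego_unrestricted_hosts", "%s");\n' "$hosts"
}

# Example (commented out; the profile path is a placeholder):
# renego_pref ccff02.minfin.fgov.be online.cm.be >> ~/.mozilla/firefox/PROFILE/user.js
```

Firefox reads user.js at startup, so restart the browser after changing the file.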

Two weeks ago I wrote about my struggle with HP’s customer service. To summarize: HP was unable to replace a failed 160 GB SSD because it was not in stock and was unable to provide me any other alternative even after one month. In the end, a 250 GB SSD was promised, but it also was not delivered.

  • Friday morning, 1 July, I call back HP’s support service. It seems that they still need approval from the Customer Relations Team (CRT) Belgium to send me a 250 GB SSD instead of the 160 GB one but were unable to get an answer from CRT. The guy on the phone tries several times to call CRT while I am waiting, but the call is always dropped after one minute. In the end, he cannot do anything more than contact CRT by e-mail.
  • Friday afternoon, I receive an e-mail from CRT. CRT Belgium & Luxembourg seems to be located in Sofia (Bulgaria), but they are answering me in Dutch. They approve the replacement of the 160 GB SSD by a 250 GB one and apologize for the long delay. Finally I start having some hope that things will be fixed now.
  • Friday evening, I take a look at the case log on HP’s support site. I am dismayed to read that a few hours after CRT approved replacement by a 250 GB SSD, it appears that the 250 GB SSD is also unavailable! The case log mentions that I was informed about the delay, but I have not had any contact with HP since Friday morning, so the only way I discover the new delay is by logging into HP’s site and reading the case log.
  • Monday morning, 4 July, I reply to CRT and to the support case that I do not accept the new delay and I demand an immediate solution. I receive a message in which they apologize for the delays and they inform me that people of superior departments are looking for a solution “with appropriate priority”. I also receive a message from our HP distributor asking whether this problem is still pending. I confirm to them on Wednesday 6 July that this is still the case and they will transfer my complaint to HP Belgium.
  • I finally get a reaction from HP on Friday 8 July. They inform me that they will send me a 500 GB 7200 RPM disk instead of the 160 GB SSD which is not deliverable. The disk will arrive on Monday 11 July. I answer them that I do not accept this as a final solution to the problem because a 7200 RPM disk is much slower and much cheaper than the 160 GB SSD this machine was bought with. In the afternoon I get the answer that the 500 GB 7200 RPM disk will be sent as a temporary replacement then, and that a 160 GB SSD will be ordered too and sent as soon as available.
  • As of Sunday 10 July, I have no indication that the 500 GB disk has been sent, so I am quite skeptical that the disk will be there on 11 July. I also have my doubts if and when I will finally receive an SSD.

To be continued…

One month ago, on 25 May, I contacted HP support because an HP EliteBook 8540p (NU486AV) notebook had a broken 160 GB SSD (which is actually an Intel X25-M). The drive was not recognized anymore: neither the BIOS nor a Linux rescue CD could find any connected hard drive. This machine was only a few months old and was bought with a 3-year HP eCare Pack for Next Business Day warranty support. Today, 30 June, HP still has not provided me with any solution, not even a temporary one.

Here is a summary of what happened:

  • When calling HP’s customer service on the phone on 25 May, I was promised to receive a replacement SSD the next day. The helpdesk guy explicitly checked whether the disk was not out of stock, and apparently it was not. In the case log this is written as: “Part is NOT on CRT TOP shortage list , Part can be ordered”.
  • The next day I get a mail stating: “Your ordered part is delayed, the delivery date is not yet known.”
  • I do not hear anything from customer support for more than a week. On 6 June, I ask for a status update via HP’s support website. The same day someone calls me back to inform me that the SSD is out of stock. He only offers me a 80 or 120 GB SSD disk as an alternative, which I obviously do not agree with: I want a disk of at least 160 GB.
  • I hear nothing from customer support for almost 3 weeks. On 20 June I contact them again via the support site, demanding an immediate solution. In the case log this triggers this cryptic message: “PSL Status requested by email to PS”. I do not get any reply.
  • Later that same week I ask my local HP distributor whether they can do something to trigger a solution. I do not get any reply.
  • On 28 June I let HP customer support know via the support site that I am unhappy with their lack of initiative to provide a solution. I do not get any reply.
  • On 29 June I try the HP support chat function. Before entering the chat I have to select my country from a list and provide the serial number of the machine. The chat support guy first asks for some details about my identity and the machine. He apologizes for the fact that I have had to wait more than a month for a solution and starts to look at the case log. After looking at the case log he suddenly says that HP chat support is only available for the UK and Ireland. Now why do they even let me enter the chat after I chose Belgium as a country, and why did he ask for all the details about the case?
  • I call customer support again by phone. While waiting on the phone, a recorded message recommends trying HP’s chat support!
  • The guy on the phone proposes to send me a 250 GB SSD disk. He still needs confirmation from the technical service whether this model is compatible with that laptop. If I did not get any message the same day that would mean that all was OK and then I would have the new SSD the next day.
  • The case log shows these entries after I called:
    Sub-case comment added: Jun 29, 2011 1:15:27 PM
    again no answer at CRT
    consult PS
    Sub-case comment added: Jun 29, 2011 11:19:31 AM
    tried to call CRT belgium >> no answer
    try again in 1 hour after lunch
    Sub-case comment added: Jun 29, 2011 10:50:52 AM
    Cu called back
    595756-001 250GB solid-state drive (SSD) – SATA interface, 2.5-inch form factor
    This part IS supported for this notebook
  • On 30 June, I still do not have a new SSD and nobody contacted me. It seems the whole case is stuck and forgotten again and I will have to call back once again to get the whole process unstuck.

So for the second time, things seem to be stuck at “waiting for PS”. I do not know who or what this “PS” is, but it is clear to me that it is not doing its job.

I can only conclude that HP’s customer support is just worthless. Cases are not followed up and the customer is never informed about the status. HP’s customer service takes no initiative to propose an alternative solution. Instead the customer repeatedly has to take the initiative to make any progress. And even then when customer support is reminded of the problem, they do not do anything to prevent it from getting stuck again. Chat support is totally useless even though it is recommended by HP.

In the last few months, I have had contact with HP support 3 times for other small problems. Things were all fixed in a reasonable manner, although it always took more than 1 business day to get a replacement, which is a pity. However, what is happening now is simply unacceptable.

Almost all systems I bought the last few years were HP systems. I will definitely re-evaluate this, because reasonable customer support is simply essential with systems used in production in business. I am very unhappy and dissatisfied with HP support so I will consider alternatives in the future.

What are other people’s experiences with HP customer support in Belgium/The Netherlands?

Two years ago I wrote an article presenting some Linux performance improvements. These performance improvements are still valid, but it is time to talk about some new improvements available. As I am using Debian now, I will focus on that distribution, but you should be able to easily implement these things on other distributions too. Some of these improvements are best suited for desktop systems, others for server systems and some are useful for both.

Some news about GNOME 3 and GNOME Shell:

  • The minimize and maximize window decoration buttons have been removed. The reasoning is that these buttons are not really useful: users should use Alt-Tab, the dock or different workspaces to switch between applications, and maximize windows by double clicking on the title bar. As this will also make the desktop more difficult to access, I guess this also means that there are no plans to re-implement desktop icons.
  • The problem with the ellipsis of long application names has been fixed by enlarging the icons in the application browser.
  • On the #gnome-shell IRC channel there was a discussion earlier today about the implementation of shutdown in GNOME Shell. Several developers were in favour of just suspending to RAM by default and not showing a real shut down button by default. After 30 minutes, the system would wake up again and suspend to disk. Several developers did not seem to care about the risks of waking up a laptop while it’s being transported in a bag, or about the fact that suspend is not working properly on all systems.

I am extremely disappointed by these three things. When writing my previous GNOME Shell article, I still had some hope that things would improve, but I am giving up all hope: the GNOME Shell in GNOME 3.0 will definitely not be something I will like to use. I think it is also unacceptable that such important, drastic changes are made just before or even after the UI freeze. I have the feeling that GNOME Shell is purely the work of a few developers and designers who made some radical changes without any feedback or testing by real end users. The user community seems to be completely forgotten in the GNOME 3.0 development process. As only a few distributions are shipping live CDs, which are often rather unstable and rarely have a completely up to date GNOME Shell, only a very small number of users is actually able to test and give feedback.

What will I do now? Skip GNOME 3.0 and hope that GNOME 3.2 will be better, once developers have taken users’ reactions into account? But that means that for more than another 6 months I will not benefit from any improvements to many of my preferred applications. Or use GNOME 3.0 with the old GNOME Panels (but will that give back my desktop icons)? Or shall I finally switch to KDE? Time will tell.

Update: the changes I described here can be seen in screenshots on Webupd8.

Now that I am on the subject of improving performance, I configured some performance improvements for a Mediawiki installation here:

  • Make sure you run the latest Mediawiki version. Mediawiki 1.16 introduced a new localisation caching system which is supposed to improve performance, so you definitely want this to get the best performance.
  • Create a directory where Mediawiki can store the localisation cache (make sure it is writable by your web server). By preference store it on a tmpfs (at least if you are sure it will be big enough to store the cache), and configure it in LocalSettings.php:
    $wgCacheDirectory = "/tmp/mediawiki";
    If /tmp is on a tmpfs, you might add creation of this directory with the right permissions to /etc/rc.local, so that it still exists after a reboot.
  • Enable file caching in Mediawiki’s LocalSettings.php:
    $wgFileCacheDirectory = "{$wgCacheDirectory}/html";
    $wgUseFileCache = true;
    $wgShowIPinHeader = false;
    $wgUseGzip = true;
  • Make sure you have installed some PHP accelerator for caching. I have APC installed and configured it in Mediawiki’s LocalSettings.php:
    $wgMainCacheType = CACHE_ACCEL;
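
For the cache directory on a tmpfs, a helper along these lines could be called from /etc/rc.local. It is a sketch under assumptions: the function name make_cache_dir and the 750 mode are my own choices, and the owner must be the user your web server runs as (www-data on a default Debian Apache).

```shell
# Create the Mediawiki cache directory and its html subdirectory.
# $1: cache directory, e.g. /tmp/mediawiki
# $2: optional owner in chown syntax, e.g. www-data:www-data
make_cache_dir() {
    dir=$1
    owner=${2:-}
    mkdir -p "$dir/html" || return 1
    chmod 750 "$dir" "$dir/html" || return 1
    if [ -n "$owner" ]; then
        chown -R "$owner" "$dir" || return 1    # needs root, as in rc.local
    fi
}

# Example for /etc/rc.local (commented out here):
# make_cache_dir /tmp/mediawiki www-data:www-data
```

The html subdirectory matches the $wgFileCacheDirectory setting above.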

Here is a benchmark before implementing the above configuration (with CACHE_NONE, but APC still installed):

$ ab -kt 30 http://site/wiki/index.php/Page
This is ApacheBench, Version 2.3 < $Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking site (be patient)
Finished 255 requests

Server Software: Apache/2.2.16
Server Hostname: site
Server Port: 80

Document Path: /wiki/index.php/Page
Document Length: 12750 bytes

Concurrency Level: 1
Time taken for tests: 30.084 seconds
Complete requests: 255
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 3344070 bytes
HTML transferred: 3251250 bytes
Requests per second: 8.48 [#/sec] (mean)
Time per request: 117.978 [ms] (mean)
Time per request: 117.978 [ms] (mean, across all concurrent requests)
Transfer rate: 108.55 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 3 6 2.8 7 21
Processing: 88 112 11.1 112 163
Waiting: 66 90 9.1 89 125
Total: 95 118 11.9 118 170

Percentage of the requests served within a certain time (ms)
50% 118
66% 122
75% 125
80% 127
90% 132
95% 138
98% 145
99% 156
100% 170 (longest request)

And here a benchmark after implementing the changes:

$ ab -kt 30 http://site/wiki/index.php/Page
This is ApacheBench, Version 2.3 < $Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking site (be patient)
Finished 649 requests

Server Software: Apache/2.2.16
Server Hostname: site
Server Port: 80

Document Path: /wiki/index.php/Page
Document Length: 12792 bytes

Concurrency Level: 1
Time taken for tests: 30.015 seconds
Complete requests: 649
Failed requests: 0
Write errors: 0
Keep-Alive requests: 0
Total transferred: 8538244 bytes
HTML transferred: 8302008 bytes
Requests per second: 21.62 [#/sec] (mean)
Time per request: 46.248 [ms] (mean)
Time per request: 46.248 [ms] (mean, across all concurrent requests)
Transfer rate: 277.80 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 3 9 3.7 8 29
Processing: 23 37 6.0 37 62
Waiting: 13 23 4.9 24 41
Total: 28 46 7.8 45 82

Percentage of the requests served within a certain time (ms)
50% 45
66% 47
75% 49
80% 50
90% 56
95% 62
98% 68
99% 73
100% 82 (longest request)

So Mediawiki can handle more than 2.5 times as many requests now (21.62 versus 8.48 requests per second).
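
The ratio can also be computed from two saved ab reports. The function names rps and speedup and the file names in the comments are made up for this sketch; it relies only on the "Requests per second:" line shown in the output above.

```shell
# Extract the requests-per-second figure from ab output on stdin.
rps() {
    awk '/^Requests per second:/ { print $4 }'
}

# Print how many times faster "after" is than "before", two decimals.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

# Example with saved reports (commented out; the file names are placeholders):
# speedup "$(rps < before.txt)" "$(rps < after.txt)"
```

With the figures from the two runs above, speedup 8.48 21.62 prints 2.55.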

Some people use Apache’s mod_disk_cache to cache Mediawiki pages, but I prefer Mediawiki’s own caching system because it is more standard and does not require patching Mediawiki, even if it might not get as much benefit as a real proxy or mod_disk_cache.