Expanding the system volume in Windows Server 2003

You sometimes read about tasks that sound simple but if they go wrong you know that you have little chance of unpicking it. Resizing disk partitions is one such task. You trust to the commands and hope – debugging, let alone fixing, a broken partition table isn’t something most of us get a lot of experience of. This is the time when sys admins love virtual machines and the ease with which you can back them up and restore them.

Resizing a Windows Server 2003 system volume (usually the c-drive) is particularly awkward. I only look after two WS 2003 machines, both are virtual, both are due for replacement next year but, as sometimes happens, one of them needed more disk space. The issue is that you can’t do an on-line resize of the system volume on WS 2003. The recommended approach with a virtual WS 2003 machine is to:

  1. Shut down the VM.
  2. Use vmkfstools (I use ESXi 4.1 at present) or an equivalent to expand the appropriate virtual disk file.
  3. Shutdown another WS 2003 VM (called a “helper”) and add the expanded virtual disk to that guest.
  4. Boot the helper VM and mount the expanded disk as a data disk (i.e. not the system volume), which can then have its file system expanded to use the newly added space.

That would be fine for me except I don’t have a spare WS 2003 VM. The only other WS 2003 I have is the peer domain controller and shutting both down, even for a short time, isn’t an option unless the whole infrastructure is going down. The last time that happened was when we moved office premises and even then we considered keeping one up over a WAN link.

So what to do? In the end I settled for using a gparted live image which became my helper O/S. The process is listed in outline below. I first tested the process on a spare WS 2012 VM which was running in an ESXi 5.5 technology demonstration environment I’m building. WS 2012 allows on-line expansion of the system volume, but I figured using gparted would at least allow me to walk through the steps and gain more confidence in the process – which it did.

The process I used was:

  1. Shutdown the guest.
  2. Backup the guest at VM level.
  3. Connect the gparted iso to the guest’s virtual CD drive and set to connect at boot.
  4. Edit the guest’s settings to force it to boot into the BIOS.
  5. Boot the guest into the BIOS and set the virtual CD drive ahead of the normal boot device.
  6. Boot the guest – it comes up running gparted.
  7. gparted was simple to use…
    1. At boot time I selected my British/qwerty keyboard, British English as language and allowed gparted to boot into graphical mode.
    2. The interface shows the disks and the spare space. When selected, the disk to grow has a slider which you can use to expand to the full space available.
    3. Apply the changes.
    4. To exit gparted I had to right click the desktop and select from the shutdown menu. There seemed to be icons on the desktop but they didn’t respond to mouse clicks.
  8. When the guest has shutdown, modify its settings to boot into the BIOS again. Return the boot order to its original configuration, i.e. put the CD drive below the normal boot device.
  9. Disconnect the gparted iso from the virtal CD drive.
  10. Boot the guest up as normal.
  11. Windows Server 2003 recognised that the disk needed checking and ran chkdsk. chkdsk ran without reporting errors and then caused the guest to reboot. (WS 2012 didn’t need to do this).
  12. After reboot the guest started OK but reported a device change had occurred and a reboot was needed. I let that happen and the guest then came up cleanly.

As an aside: the ease with which you can backup and restore virtual machines at “virtual bare metal” level has transformed how I approach many tasks. I am, by nature, very cautious about taking risks with core or even peripheral parts of the infrastructure I look after. If it breaks I’m pretty well on my own and enough things fail without me helping them along. The ability to easily and very quickly backup and restore whole machines has allowed me to try out things I’d never tackle on a physical server – even if I had spare hardware, which I rarely every do. It’s a benefit of virtualisation that I suspect is largely overlooked when cost/benefit analyses are done.


A very restricted shell

I was recently asked to set up access for someone so that he could connect from the internet to a machine running on our company network. Securing the machine on the company network was easy enough but I needed to create a route through to it.
The approach I took was to build a CentOS 6 machine running sshd which was accessible via a NAT rule on our outside firewall to act as a gateway to our network. This meant that the computer to which the user wanted to connect wasn’t directly accessible via the internet which made me happier.

The user had to forward a local port to the required port on the computer he wanted to access on our network. In other words, conventional ssh tunnelling and port forwarding. As a further security restriction, on the gateway computer I used the local iptables firewall to restrict not just the inbound traffic, but the outbound as well, only opening the ports that were needed.

I also decided I wanted to restrict the shell that the user was given on the computer he connected to via ssh. The restricted bash option seemed reasonable (i.e. bash -r option) but even that allowed more than was needed. My solution was to write a very simple c program that acted as the most restricted shell I could think of, i.e. it understands one command only – “exit”.

File: smallsh.c

#include <stdio.h>
#include <string.h>

/* Very simple command shell. Only supports one command "exit". */
/* Output is inelegant if the user types more than 60 chars as command */

/* Version 1.0 14/10/2014 */

  const char cmdEXIT[5] = "exit\n";
  char userCmd[60];
  printf ("Type exit when done\n$ ");
  while (fgets(userCmd, sizeof userCmd, stdin)) {
    if ( ! strncmp( cmdEXIT, userCmd, 5 ) ) {
    } else {
      printf ("I only understand 'exit'\n$ ");

When this was compiled, I moved the result (smallsh) to /usr/local/bin. The program is simple, and if the user types more than 60 characters in a single command at the input prompt the output doesn’t look elegant, but I, and he, can live with that!

The suitable user account was created on the gateway machine with:

useradd -m -d /home/<username> -s /usr/local/bin/smallsh <username>

The sshd_config file then needed the following settings changed from the default and the sshd service needed restarting.

PermitTunnel yes
AllowUsers <usernname>
PermitRootLogin no

(OK, I’ll accept that “PermitRootLogin no” is not needed, but as far as I’m concerned it is).

The end of XP

Smugness isn’t an emotion I often feel about my work, but I’m feeling pretty smug today. I took on the job of getting rid of Windows XP a few months ago and I finished yesterday. Compared to a lot of companies, I’m sure I had an easy job but the 80/20 rule certainly applied. The 80% were easy to work out how to upgrade, the last 20% much less so. Everyone in the company is now using Windows 7, mostly 64bit with the odd 32bit installation here and there for historic reasons. (Personally, I’m using CentOS 6.4, 64bit – but that’s another story.)

Mostly, user’s desktops and laptops had been replaced sufficiently recently that they were already on Windows 7. Vista didn’t exist for us. Any machines bought in that short, unpleasant period were downgraded to XP. In those cases a Windows 7 installation was simple enough. Most Windows machines benefit from a fresh installation every so often anyway, although because no two machines were the same, each had to be installed and then updated individually. That’s a dreadful process, even when you start from the most recent service pack. When you add Office as well, it can be over 10 reboots to get the machine fully patched.

The stability and excellence of VMWare was the cause of some of XP machines I had to deal with. A number of physical machines running XP had been sucked into VMWare over the years as the hardware they ran on started to fall apart. Rather than scrapping the machines and moving the software and services they ran elsewhere, it was just so much easier to virtualise them. Of course, this means they would virtually (if you’ll excuse the pun) run for ever, occasionally moving from ESXi host to ESXi host as the hosts themselves got replaced.

I was told something very early on in my working life by a very diligent, hard working colleague. It is that if you put something in your in-tray and leave it for a while, a good number of tasks just go away without you ever having to look at them. As I got to know this particular person, I couldn’t imagine him ever ignoring something in his in tray, but what he said certainly turned out to be true about a number of the old XP machines that had been turned into virtual machines. When I challenged people to explain what they needed a particular machine for, often it turned out the case for keeping them was remarkably thin or in some cases forgotten completely.

I did have to build a new print server but I was able to get rid of a few ancient printers that it turned out no one had used for ages. People might complain about having to walk further to collect a printout but mostly people like to use the printer that prints best. The nearer, rather crusty printer that gives poorer quality printing tends just to get ignored. The “spare/broken” hardware room is quite full now. The print server build turned out to be quite straight forward because the number of desktop O/S variants is as low as it will ever be. This was definitely the time to learn how to build a print server.

As I got down the the last XP virtual machine I knew it was going to a challenge. It ran some software to monitor and manage a particular piece of hardware. The hardware is still a current model (something we had provided, not something we specified). The software was written in Visual Basic, probably version 6, but quite possibly earlier. It uses DAO and an Access version 3.5 database (yeah, really). I contacted the hardware vendor directly and they supplied a Windows 7/8, 64bit version of the software. It took 3 hours and 6 emails each way but we finally got it working. It’s still an awful bit of software but by the time I got to the end of the project I just wanted this stuff to run, pretty didn’t matter.

So what about conclusions? Well everyone will say the same thing – don’t leave this to the last minute. However, I’d say don’t fear it either. For us, it turned out to be a very good clearing out exercise. Very much more so than I imagined it would be. It made me dig into all the quiet corners of our infrastructure and understand just about everything we have running. As of today, the infrastructure is a lot better understood and documented than it has ever been. The challenge now is to keep it that way.

The effect of predicates in outer joins

Inner joins in SQL are well understood. You relate two tables on a given set of columns and where you get matching values on both tables the rows from both tables are joined. What columns you choose to return are up to you. The tables are of equal precedence in the join.

Outer joins are different in that one table is said to be dominant and every row from that table is returned from the join process. The significance of this rule is the subject of this posting. Where the other, subordinate, table has matching values on the joined columns then the rows are returned as they would be in an inner join. Where no matching values are found in the subordinate table, the rows in the dominant table are returned and the subordinate table column values are returned as null.

Now, I imagine most people familiar with SQL will be broadly familiar with all of that. To illustrate the point of this posting, I need a couple of simple tables, each with one row.

create table tb1
( colint int
, coldate date

create table tb2
( colchar char(1)
, colint int

insert into tb1 values (1,'01/01/2013');
insert into tb2 values ('a',1);

The above syntax will work for most database engines, I’ve tested with SQL Server 2012 and DB2 V9.7.

The data types are arbitrary. I’ve not added any referential constraints or indexes to keep the example simple. Feel free to add constraints if you wish, it makes no difference to the results.

Let’s start with a simple inner join:

select *
from tb1
inner join tb2
on tb1.colint = tb2.colint

As you would expect, the results are:

colint coldate colchar colint
1 2013-01-01 a 1

So far so good, so now let’s try as an outer join with tb1 as the dominant table:

select *
from tb1
left join tb2
on tb1.colint = tb2.colint

As we would expect, the results looks like the same as the inner join.

colint coldate colchar colint
1 2013-01-01 a 1

Now let’s apply a predicate in a WHERE clause:

select *
from tb1
left join tb2
on tb1.colint = tb2.colint
where tb1.coldate is null

This time we get no rows returned. This is important. The outer join returned a row from tb1 as it must to satisfy the rule that in an outer join all the rows from the dominant table are returned, but the WHERE clause is applied after the join has completed. Thus the one row that was returned by the join is filtered out by the WHERE clause.

Now for the more interesting case, where we apply the predicate in the join clause:

select *
from tb1
left join tb2
on tb1.colint = tb2.colint
and tb1.coldate is null

The result this time is:

colint coldate colchar colint
1 2013-01-01 NULL NULL

So what’s happening here? On the face of it we have created a row that doesn’t exist and one that contradicts the predicate.

The database engine has performed the outer join and it has filtered out the one row that was returned and crucially that row was filtered out as part of the join process because that’s where the predicate was applied – as part of the join.

So where has the row come from? It’s come from the definition of an outer join, that the all rows from the dominant table are returned. The join process must return every row from the dominant table. To avoid the contradiction that the join and the predicate applied within the join filtered out our one row, the row from the dominant table is returned without reference to the join or the join predicate. The subordinate table columns are set to null because this part of the processing has nothing to do with that table.

Of course what looks even stranger is that we have applied in the query a predicate that says tb1.coldate is null but we actually have a none null value for tb1.coldate in our result set.

Both queries are logically and syntactically valid. Both have valid occasions for use, but it’s a good idea to know which you want and what the differences are.

Referentially valid table listing from SQL Server

A regular task when working with test and development databases is loading test data into tables where referential constraints are present. Whilst you can often drop or suspend the constraints whilst populating tables it’s often just easier to work out a referentially valid order so that parent tables get populated before their children. This usually only applies to relatively small volumes of data because larger volumes are generally loaded via utilities which don’t go through the SQL processing engine and so ignore the referential constraints. In this case the constraints usually need to be rechecked to reactivate them before the tables can be used. The process also usually avoids logging the rows added so a backup is normally required as well.

For small volumes of data, populating tables with regular insert statements is generally easier but it does require tables to be populated in a referentially valid order.  This is something I do just about every day, copying fairly small amounts of test data around for the development team.  Years ago I wrote for DB2 a Perl script that extracts from the catalog a list of the tables in a given schema in a referentially valid order. What I didn’t have was an equivalent for SQL Server and a recent job I had needed such a list. The data model only consisted of about 120 tables but that was enough for me to write the following  function.

alter function riorder(@schema nvarchar(128))
returns @riorderTabs table
( RILevel     smallint not null
, tablename nvarchar(128) not null
-- This function returns a table of RILevel and tablename for the specified schema
-- RILevel starts at 0 for tables with no RI Parents, and increases by 1 for each
-- 'generation' of tables.
-- It is possible to have "Referential loops", i.e. tabA is a parent of tabB, tabB
-- is a parent of tabC, tabC is a parent of tabA. This is unresolvable
DECLARE @schemaTabs table (tbname nvarchar(128));
DECLARE @level smallint;
DECLARE @tabsAtCurrentLevel table (tbname nvarchar(128));

-- start by getting al list of all the tables in the schema
insert into @schemaTabs
from information_schema.tables
where TABLE_SCHEMA = @schema
and table_type = 'BASE TABLE';

-- @level tracks the current RI level, we start at 0 for tables with no RI parent
set @level = 0;

-- We iterate through the levels until either
-- there are no tables left in @schemaTabs
-- or we find no tables at the current level (an
-- unresolvable referential loop)

declare @candidateTab as nvarchar(128);
declare candidateTabs_cursor cursor forward_only read_only for
select tbname from @schemaTabs;

declare @parentTablesFound as smallint;
declare @tablesAtCurrentLevel smallint;

-- set this to a sensible value so we drop into the main loop
select @tablesAtCurrentLevel = count(*) from @schemaTabs;
while @tablesAtCurrentLevel > 0
   set @tablesAtCurrentLevel = 0;

   -- now we iterate for each table @schemaTabs
   -- and find which have no RI parent table in the @schemaTabs
   -- list of tables
   open candidateTabs_cursor;
   fetch next from candidateTabs_cursor
   into @candidateTab;
   while @@FETCH_STATUS = 0
      -- We are only interested in whether @candidateTab has parent tables
      -- in @schemaTabs, not what they are
       select @parentTablesFound = count(*)
         -- get the parent table
             and constr.UNIQUE_CONSTRAINT_NAME = prnt.CONSTRAINT_NAME
               -- get the child table
               on constr.CONSTRAINT_CATALOG = chld.CONSTRAINT_CATALOG
               and constr.CONSTRAINT_SCHEMA = chld.CONSTRAINT_SCHEMA
               and  constr.CONSTRAINT_NAME = chld.CONSTRAINT_NAME
                     inner join @schemaTabs scTabs
                         on scTabs.tbname = prnt.TABLE_NAME
           where chld.TABLE_NAME = @candidateTab
           and chld.CONSTRAINT_SCHEMA = @schema;

           if @parentTablesFound = 0
              -- no parent tables in @schemaTabs so we note this table name.
                  -- We don't remove it from @schemaTabs yet because if we found
                  -- RI children of this table later in this loop we would think
                  -- these were at the current RI level
                  insert into @tabsAtCurrentLevel
                  values (@candidateTab);
                  set @tablesAtCurrentLevel = @tablesAtCurrentLevel + 1;
           -- fetch next table
           fetch next from candidateTabs_cursor
       into @candidateTab;

   close candidateTabs_cursor;
   -- Update the output list of tables for the current level
   insert into @riorderTabs select @level, tbname from @tabsAtCurrentLevel;
   -- Now we can remove the tables we just found
   -- from @schemaTabs
   delete from @schemaTabs where tbname in (select tbname from @tabsAtCurrentLevel);
   -- and clear down @tabsAtCurrentLevel
   delete from @tabsAtCurrentLevel;
   set @level = @level + 1;


Using the function is simply a matter of passing it the schema of the tables you are interested in:

select * from riorder('<schema>') order by RILevel, tablename

Because the first column is the ‘level’ you can get a reversed order (which is the valid order for deleting data) by using the DESCENDING  word in an ORDER BY clause.

select * from riorder('<schema>') order by RILevel, tablename order by RILevel descending

This script can fail to provide a complete answer if your data model contains a “referential loop”, i.e. table_A is a parent of table_B, table_B is a parent of table_C and table_C is a parent of table_A. This can happen perfectly legitimately but it’s generally a source of pain for your hardworking DBA trying to set up test data. If the above script finds this situation it recognises the loop and stops trying to go any further.

DB2 V9.7 on RHEL 5 as a data source for a Linked Server in SQL Server 2012

(Aside – it’s been far too long since I last posted anything here. As I’m doing more R&D at the moment I’ll try to post a bit more).

I recently wanted to connect DB2 V9.7 running on RHEL 5 to SQL Server 2012 using SQL Server’s linked server technology. At one time I used Linked Servers to connect to a wide range of data sources, but I hadn’t used them for a while. Thinking the world would have moved on a bit, I looked around the Web to see what the current thinking was but largely drew a blank. The only postings I could find seemed to be from people who described connecting to DB2 as difficult but they didn’t elaborate.

I’d be the first to admit that the process is not straightforward, but it’s not made easier by a lot of what is posted on the Web, which is often misleading at best or just wrong. Most references detail connecting to DB2 running on iSeries or z/OS via DB2 Connect. For DB2 running on Linux or Windows you don’t use DB2 Connect, you connect directly to the DB2 instance so the process is a little different.

I will assume the reader is familiar with Linked Servers in general. If not, have a play. I would suggest you start with something simple, like SQL Server talking to SQL Server. This combination is simple to set up and is reliable. I will also assume you know enough about DB2 to be able to catalog a node and database on the Windows server.

To connect to DB2, there are three OLE DB providers that you can use without paying for third party solutions. These are:

The Microsoft OLEDB Provider for DB2.

The IBM OLEDB Provider for DB2.

The Microsoft OLEDB Provider for ODBC using an ODBC connection to DB2.

From painful experience, if you want to use a third party OLEDB provider, insist on getting a free trial before paying for it. I recall some were no better, and indeed were sometimes worse, than the free versions.

The Microsoft OLEDB Provider for DB2 (DB2OLEDB).

This is installed from the SQL Server Feature Pack. It’s not part of a normal SQL Server installation. Quite simply, I can not get this to work. The installation looks OK, but I can not work out a combination of parameters to get a connection from SQL Server to DB2. The examples I can find all look like they are trying to talk to DB2 Connect. Possibly that’s all this Provider can do?

The IBM OLEDB Provider for DB2 (IBMDADB2).

This appears to work. It does need some effort to set it up but it is probably your best option.

Begin by installing the DB2 Runtime Client (RTC) on the server running SQL Server. The Runtime Client is the minimum DB2 client and just includes the network libraries and the basic command line interface. If you’re looking for the download, at the moment, it seems to be called “IBM Data Server Runtime Client for InfoSphere Federation Server V9.7 Multiplatform Multilingual eAssembly”. I promise I’m not making that up, I couldn’t if I’d tried.

Once the RTC is installed, catalog the source DB2 node and database. As the RTC has no graphical interface, you’ll need to use the db2 command line processor. Make sure this is working before you go any further. Be sure you can connect to your database and can select some data back. Both the OLE DB provider and, if you use it, the ODBC connections use your local database directory. Of course what this means is that both the OLE DB provider and the ODBC driver are just thin layers on top of the DB2 connection layer. This is good news as the DB2 connection via the RTC is reliable.

Next, start up SQL Server Management Studio and you should see in the Linked Servers/Providers list IBMDADB2.<your SQL Server instance>.

This next bit is vital – right click the provider and open the properties list. Make sure the “Allow inprocess” is ticked. Not only will this not work if you don’t have that ticked, it will actually crash the SQL Server service when you try to use the provider.

(Just as an aside, if you have SQL Server in a clustered environment, this would be a really good way to test fail over. Finding relatively harmless ways of crashing a database instance to test fail over isn’t always easy.)

Now you create your Linked Server. This is very simple because all the connectivity details are encapsulated in the DB2 database catalog entry you created earlier. All you need to specify for the linked server is

  • The Provider (from the drop down list).
  • Your database name as the Data Source.
  • Some user credentials in the Linked Server Security tab.

To check all is well, click on the linked server, expand Catalog/default/Tables/System Tables and you should see the DB2 catalog tables for your chosen database. In case you’re not so familiar with the DB2 system catalog, there’s no Master database in DB2. Each database has its own set of catalog tables and is completely isolated from the other databases in the instance (think SQL Server 2012 Contained databases).

Queries to this form of Linked Server use a 4 part name in the same way that Linked Servers to SQL Server do. For example:

select * from <Linked Server>.<database>.SYSCAT.TABLES

Be careful with the object name case. Object names are case sensitive. I know in the native DB2 client object names are case insensitive, but through a linked server they are. If you want to avoid 4 part names, create a local view and access the table through that.

So this works for me. As always with linked servers, if you’re planning on using them a lot, test them to death. In particular test the full range of data types that you want to process. Data type translations in Linked Servers are notoriously unreliable, often caused by poor metadata from the OLE DB Provider.


The good news about this type of Linked Server is, when joining tables on the DB2 instance, you can see SQL Server pushing the join down to DB2. The easiest way to test this is to check what SQL statements are being executed in the DB2 instance. I’d recommend doing this on a quiet system so you can pick out your statement easily.

On the DB2 Instance, if the Statement Monitor is off, connect to your source database and start a Statement Monitor for your connection:

$ db2 update monitor switches using statement on

Get a snapshot baseline of the prepared statements and write the snapshot to a file.

$ db2 get snapshot for dynamic sql on <database> > snp.1

Run your query, and get a second snapshot of prepared statements written to a different file.

$ db2 get snapshot for dynamic sql on <database> > snp.2

Now you can compare the two snapshots (I tend to just use diff) to find the statements that were run. Ideally you’ll be quick enough that the second snapshot only differs from the first as a result of your statement.

From a simple join such as:

select *

from <LinkedServer>.<Database>.<Schema>.TB1 t1

inner join <LinkedServer>.<Database>.<Schema>.TB2 t2

on t1.col1 = t2.col2

where t1.col2 like ‘XXX%’

The snapshot comparison shows 3 statements have been executed. The first 2 are of this form:

select * from “<database>”.”<Schema>”.”TB1″ FETCH FIRST 1 ROWS ONLY

select * from “<database>”.”<Schema>”.”TB2″ FETCH FIRST 1 ROWS ONLY

The third statement is a transformed version of your statement. It looks as if SQL Server makes a simple query against each table in your query to return returning one row to get the result set metadata. Then it prepares and runs your actual query. The single row, initial queries should execute very quickly and only add a very small overhead.

If you are using Linked Servers I would very strongly recommend that you check what is being passed down to the source databases. Against database engines such as DB2, Oracle or mySQL it’s simple enough to do and the time spent might well be a great deal less than the time you spend waiting for poorly optimised statements to return results.

The Microsoft OLEDB Provider for ODBC using an ODBC connection to DB2.

ODBC was supposed to have been superseded so long ago that it seems bizarre to be writing about it, but for completeness here we go. This works in part. As with the IBM OLE DB Provider, you need to start by installing the DB2 Runtime Client and again you should catalog the remote node and database that you are interested in.

Next use the IBM DB2 ODBC Driver associated with your local DB2 client instance to create an ODBC System DSN. Use the Test Connection option to make sure you’re getting through to the DB2 instance.

The final stage is to create the linked server using OLE DB Provider for ODBC (MSDASQL). Here the only parameters you need to specify are:

  • The Provider (from the drop down list).
  • Your ODBC DSN as the Data Source.
  • Some user credentials in the Linked Server Security tab.

The only way I can get this to work is by using the openquery function, e.g.

select * from openquery(<ODBC DSN>,’select count(*) from SYSCAT.TABLES’)

Any attempt at using the 4 part name syntax fails with:

Msg 7313, Level 16, State 1, Line 1

An invalid schema or catalog was specified for the provider “MSDASQL” for linked server “<ODBC DSN>”.

Installing IBM Data Studio Version 3.1.1 with Firefox 10 and above on Linux

The recent decision by the developers of Firefox to increase the major version number at an extraordinary rate has, I’m sure, caused problems for lots of developers trying to test for compatible versions. At my employer I know the developers now simply test for old versions rather than new ones.

IBM seem to have fallen foul of the changes too. I’ve just tried to install IBM Data Studio Version 3.1.1 on my desktop which runs a fairly standard CentOS 5.8. CentOS 5.8 comes with Firefox 10.0.4 at present. This is reasonably up to date so I didn’t imagine there would be issues so long as I got past any RHEL vs. CentOS tests in the installer. In fact, the whole install process broke very early. After much digging through the nested installation scripts, I found that the installer knew I was running Firefox but the version test reads:

case "$*" in
*Firefox\ [1-9].*) return 0;;
*Firefox/[1-9].*) return 0;;
*Firefox*) return 1;;
*rv:1.[7-9]*) return 0;;
*rv:[2-9].*) return 0;;
*rv:*) return 1;;
Mozilla*\ 1.[7-9]*) return 0;;
Mozilla*\ [2-9].[0-9]*) return 0;;
*) return 1;;

The problem of course is that this works for Firefox Version 1 through to 9, but fails for Version 10. I didn’t believe that Version 1 is OK but Version 10 isn’t!

If you hit this issue, expand the installer zip file and look for a file called <installerdir>/launchpad/browser.sh and edit the above subroutine. I simply changed it to allow any Firefox version:

case "$*" in
*Firefox\ [1-9].*) return 0;;
*Firefox/[1-9].*) return 0;;
*Firefox*) return 0;; #### this line was changed
*rv:1.[7-9]*) return 0;;
*rv:[2-9].*) return 0;;
*rv:*) return 1;;
Mozilla*\ 1.[7-9]*) return 0;;
Mozilla*\ [2-9].[0-9]*) return 0;;
*) return 1;;

The installation then runs through successfully.

DB2 – Needing more support?

IBM’s recent fanfare to announce the release of DB2 10 and the more muted news that DB2 9.1 reaches End of Support on the 30th of April this year, reminded me that the last time I looked, there was no End of Support date for DB2 9.7. Well, now there is a date; it is September 30, 2014. That’s pretty disappointingly close. It means that the most widely used version of DB2 9, and the one we use throughout our company, will get just over 5 years of support.

Today, 5 years of support for a major piece of software really isn’t very long at all. Most people wait for (at least) the first fix pack before even trying a test installation. There are often ‘a’ fix packs for DB2 which contain fixes for major bugs (I suspect there are often caused by the latest fix pack). You are well advised to wait a couple of months after a fix pack has been released before you try it just to see if there are fixes for the fixes. That reduces the useful support life from around 5 years to more like 4. From past experience, if you read the hype and put a major release of DB2 live within the first year of its General Availability (GA) then you’ll probably just about be the first person to do so.

With DB2 Version 8.1, as you applied fix pack 8 you reached Version 8.2 automatically. To get the full support life for DB2 Version 8 you simply started with Version 8.1 and kept applying fix packs as and when you were able or felt you needed them (fix packs are cumulative so you don’t have to take them all). With DB2 Version 9 all that changed. Going from DB2 Version 9.1 to 9.5 and then to 9.7 is a much bigger process. Each is a fresh install of DB2. So in terms of upgrade effort you can think of DB2 Version 9.1 as Version 9, Version 9.5 as Version 10 and Version 9.7 as Version 11 and so on.

I doubt very many people actually used Versions 9.1, 9.5, 9.7 and 9.8 on the same system. Version 9.5 seemed to get a fairly cool reception as it came only 15 months after Version 9.1, but Version 9.7 seems to have had a good take up, at least according to a DB2 Night Show poll I listened to last year. Version 9.8 is of less interest to most people because it’s major enhancement (the pureScale feature) is fairly specific.

Someone installing DB2 Version 9.7 just 3 months ago gets less than 3 years support. That’s considerably less than most people keep server hardware. If the database server is virtualised then there’s no reason for the server to ever need upgrading for the sake of hardware. Operating system vendors have really taken this to heart. Red Hat (our operating system of choice for DB2) are now promising 10 years of production support for RHEL 5. That gives you over 6 years of support *after* the release of RHEL 6. If IBM were to do the same, that would mean IBM extending DB2 Version 9.7 support by at least another 3 years.

Unless you use DB2 Express-C, DB2 is an expensive bit of software. It is certainly the most expensive software we use. An upgrade cycle for DB2 for all our systems takes about 8 to 9 months (I’ve done it twice). The effort involved can only be justified when we are refreshing hardware and we can do full parallel testing. It is a huge effort for us that is planned up to a year in advance. In a virtualised environment where hardware refresh ceases to be an issue, I just can’t imagine potential customers choosing to use software with such a short life expectancy.

Loss of network interfaces after applying CentOS kernel 2.6.18-308.4.1 on ESXi 4.1 Update 2

One of our hosting company requires us to keep up to date with RHEL 5 kernels on a server they support. As this is a production machine it means that when a new kernel is available I start applying it to development machines and migrate the upgrade through to production over a period of 3-4 weeks.


The first stage of testing is to apply the kernel on various ESXi hosted VMs and a couple of real servers, most of which run CentOS 5. The most recent kernel 2.6.18-308.4.1 seems to work OK, but on all of the ESXi hosts there seems to be an issue with the virtual network interfaces. In all cases I use the VMXNET 3 adapter type. The hosts are running ESXi 4.1 Update 2 with a number of HP provided bundles.


After the application of the updates and the subsequent reboot, only the local loopback network interface starts up. I’ve seen various suggestions as to what may be the cause but I can’t comment on those. To fix the issue, my initial test was to re-install the VMWare Tools which seems to do the trick. Having got that far, on a different guest I tried just rerunning vmware-config-tools.pl. That seems to solve the problem immediately (just restart the network service). The fix appears to be persistent across a reboot.

Using APCUPSD in a small ESXi 4 environment

Like most small office environments, where I work depends on mains electricity without any form of battery/generator backup. For a number of years our supply seemed remarkably unreliable compared with my home 15 miles away. It eventually transpired that one of the three phases into the office was overloaded. After a rebalancing of the phases was carried out by the supply company things improved greatly. By that time though, pretty well device we had was run through a desk side UPS and we had become pretty adept at setting them up.

At the time all our virtual servers were running VMWare Server Version 1, which was probably the best bit of free software I’ll ever use. We ran the hypervisor on CentOS 5. That meant that we had a full Linux OS available to run such things as the UPS monitoring software. It also mean we had a full scripting environment to handle the shut down and start up of the virtual machines either side of a power failure.

All our UPS units are made by APC, so the obvious Linux service to run was APCUPSD. The UPS units were all attached via USB to their relevant hosts. We configured the UPS units so that as soon as a power failure was detected, APCUPSD gave a 3 minute grace period to see if the power would come back. In our experience, if the power failed for more than 30 seconds it might be off for hours so 3 minutes was more than enough to decide if a full shut down was needed.

When we recently moved from VMWare Server 1 to ESXi 4, UPS support was a real headache. With no host OS on which to install APCUPSD we had to sort out how to make a guest monitor the host’s UPS and then manage the shut down of the host and all of the associated guests. This rather felt like sitting on a branch and sawing through the wrong side because the final act of the guest has to be to shut down the host the guest is actually running on and then tell to UPS to shut down. The solution we came up with runs like this:

The UPS monitoring guest consists of:

  • CentOS 5, which gives us a fully featured Linux OS.
  • The vCLI so we can control the host (and thus the other hosted guests).
  • APCUPSD so we can communicate with and control the UPS.
  • A custom APCUPSD doshutdown script in etc/apcupsd/.
  • A custom script that finds and shuts down every guest on a specified ESXi host via vCLI commands.

The UPS is connected to the host via USB and the guest communicates with the UPS via a virtual USB connection. That’s all pretty straightforward, until we get a power failure. At which point we hope things go like this…

  • APCUPSD on the UPS monitoring guest gets notified of the UPS status change to On Battery.
  • After 3 minutes on battery power, the UPS monitoring guest runs the custom doshutdown script which
    • calls a custom local script that sends a shut down signal via the vCli to all of the other guests.
    • sleeps for 45 seconds to ensure all the guests are down. 45 seconds is enough for our guests, but you would need to test this in your own environment if you were trying to do the same.
  • That leaves us with just the UPS monitoring guest running on the host and the host itself.
  • We must now stop the APCUPSD service so that the USB kill power signal is sent. It took me days to realise you must stop the APCUPSD service else the APCUPSD killpower signal is ignored 😦
  • The UPS monitoring guest then runs the APCUPSD killpower command which tells the UPS to shut down, but (and this is the crucial part) the UPS gives you DSHUTD seconds of power before it does so. We set DSHUTD to 90 seconds.
  • The UPS monitoring guest tells its own host to shut down.
  • The host is configured to shut down any running guests, of which the UPS monitoring guest should now be the only one.
  • The UPS monitoring guest gets the shut down signal, which it obeys.
  • The host shuts down. All this has to happen in DSHUTD seconds, which in our case it does easily.
  • Finally, after DSHUTD seconds the UPS shuts down.

As each command runs in the custom doshutdown script, it writes what it’s doing to a log file in /tmp on the UPS monitoring guest. You can’t see this at the time the shut down is happening, but after the event and, in particular, for testing, it’s very good to be able to see that all went well (or not) and how much time each stage took.

Be very careful if, like us, you have multiple ESXi hosts that use the same scripts. The UPS monitoring guests send the shut down commands via the network so they are more than capable of shutting down the wrong host if you give them the wrong set of credentials. Keep the credentials in separate files from the scripts so you can propagate updated versions of the scripts to all interested guests without the risk of them all trying to shut down the same host!