Saturday, October 07, 2023

(Un)safe English locales

I recently spent a frustrating amount of time troubleshooting a Java application which had stopped working properly after an upgrade: Certain features of the product were broken because the application was running on a Linux server configured with the "en_DK" locale.

"en_DK" had been chosen expecting to have system with messages in English, but with currency symbols etc. suitable for Denmark. This makes sense, because it's not uncommon for IT systems to provide more precise (error) messages in English, compared to a small language like Danish.

Unfortunately, there is no universal agreement about which locales exist. Even among Linux distributions, there is not 100% consistency.

The GNU C Library (glibc) seems to have the longest list of recognized locales, including 19 English ones. Java's list of supported locales is somewhat shorter and includes only 11 English locales. Interestingly, Java has an English locale for Malta (en_MT) which glibc does not have.
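A quick way to see which English locales a given Java runtime actually ships data for is to ask the runtime itself. The following is a minimal sketch (the class name is invented); note that modern JDKs bundle CLDR locale data, so the printed list may be considerably longer than the officially supported list:

import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

public class ListEnglishLocales {
    public static void main(String[] args) {
        // Print every installed locale whose language is English and
        // which names a specific country/region.
        Arrays.stream(Locale.getAvailableLocales())
              .filter(l -> "en".equals(l.getLanguage()) && !l.getCountry().isEmpty())
              .sorted(Comparator.comparing(Locale::getCountry))
              .forEach(l -> System.out.println(l + "\t" + l.getDisplayCountry()));
    }
}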

I haven't been able to find macOS's list of locales, but forum posts suggest it has only 6 English locales: en_AU, en_CA, en_GB, en_IE, en_NZ, en_US.

Ignoring Mac for a moment, these are the unsafe English locales, i.e. locales which are not supported by both glibc and Java:

Locale   Country
en_AG    Antigua and Barbuda
en_BW    Botswana
en_DK    Denmark
en_HK    Hong Kong
en_IL    Israel
en_MT    Malta
en_NG    Nigeria
en_SC    Seychelles
en_ZM    Zambia
en_ZW    Zimbabwe

On the other hand, the following locales should be safe:

Locale   Country         Safe even on Mac
en_AU    Australia       🍎
en_CA    Canada          🍎
en_GB    Great Britain   🍎
en_IE    Ireland         🍎
en_IN    India
en_NZ    New Zealand     🍎
en_PH    Philippines
en_SG    Singapore
en_US    USA             🍎
en_ZA    South Africa

Looking beyond unix/POSIX-like systems: Windows recognizes more than 100 English locale identifiers. Windows' list does not invalidate the above safe-list.

For Danes, I suggest using one of the following locales:

  • da_DK (accepting that error messages will sometimes be poorer)
  • en_IE, as the Irish are sane enough to use a 24-hour clock, etc.
  • C, which is the fallback "POSIX system default" locale
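To illustrate the en_IE suggestion, here is a small sketch in Java (class name invented; exact output depends on the JDK's locale data) showing the short time format in en_IE versus en_US:

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class ClockStyles {
    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2023, 10, 7, 21, 30);
        for (Locale locale : new Locale[] { new Locale("en", "IE"), Locale.US }) {
            DateTimeFormatter fmt = DateTimeFormatter
                    .ofLocalizedTime(FormatStyle.SHORT)
                    .withLocale(locale);
            // Typically prints "en_IE: 21:30" and "en_US: 9:30 PM"
            System.out.println(locale + ": " + t.format(fmt));
        }
    }
}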

Tuesday, July 11, 2023

Unbootable RHEL 9 when using BP-028

When installing Red Hat Enterprise Linux 9, you may choose to apply a security profile, such as ANSSI-BP-028 High.

I've recently seen two VMware-virtualized RHEL 9.2 servers fail to boot properly when installed with the ANSSI-BP-028 High profile. Instead, they booted into emergency mode.

The way to fix it (apparently, the profile prevents the vfat kernel module, which is needed to mount the EFI system partition, from being loaded at boot):

Add file /etc/modules-load.d/for_uefi.conf containing a single line:

vfat 

Then run the following two commands:

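# Rebuild the initramfs for the running kernel, so that it picks up
# the new module configuration, and reboot: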
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
reboot

Friday, March 21, 2014

parallel_ntp_scan

NTP-based DDoS attacks are currently fashionable.

I've coded a little application which quickly scans a network for NTP servers. For those found, it rates them according to their susceptibility to being implicated in an amplification DDoS attack.
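To give a feel for the measurement, here is a minimal, hypothetical probe in Java (class name invented; not part of parallel_ntp_scan). It sends an ordinary 48-byte mode-3 client query and reports the response/request size ratio; the dangerous amplification historically came from mode-7 commands like monlist, which may return many large packets for a single small request, but the principle is the same:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class NtpAmpProbe {
    public static void main(String[] args) throws Exception {
        byte[] request = new byte[48];
        request[0] = 0x1B;                 // LI=0, version=3, mode=3 (client)
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(3000);     // don't hang on silent hosts
            InetAddress server = InetAddress.getByName(args[0]);
            socket.send(new DatagramPacket(request, request.length, server, 123));
            byte[] buffer = new byte[4096];
            DatagramPacket response = new DatagramPacket(buffer, buffer.length);
            socket.receive(response);
            System.out.printf("sent 48 bytes, received %d bytes (factor %.1f)%n",
                    response.getLength(), response.getLength() / 48.0);
        }
    }
}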

Saturday, October 01, 2011

What a dying SFP looks like

Fibre Channel (FC) storage is handy, and generally very reliable, in my experience. I certainly do not miss the days of messing around with disks in a server-room. And I like the fact that RAIDs may be cut up into slices (LUNs) which may be shared by many servers, resulting in very efficient use of the disks (if so wanted).

One part about FC that I dislike (in addition to the price tags): SFPs. Why on earth are transceivers not an integral part of a Fibre Channel switch? Having the transceivers be separate units means more electrical contact points, and a potential support mess (it's not hard to imagine a situation where the support contract of an SFP has run out, while the switch itself is still covered).

Anyway: Today, I experienced a defunct SFP for the first time. The following observations may give a hint of how to discover that an SFP is starting to malfunction. The setup is an IBM DS4800 storage system where port 2 on controller B is connected to port 0 on an IBM TotalStorage SAN32B FC switch (which is an IBM-branded Brocade 5100 switch).

Friday morning at 07:49, in syslog: A few messages like this from the FC switch:
raslogd: 2011/09/30-07:49:07, [SNMP-1008], 2113, WWN 10:... | FID 128, INFO, IBM_2005_B32_B,  The last device change happened at : Fri Sep 30 07:49:01 2011

At the same time the storage system started complaining about "Drive not on preferred path due to ADT/RDAC failover", meaning that at least one server had started using a non-optimal path, most likely due to I/O timeouts on the preferred path. And a first spike in the bad_os count occurred for the FC switch port:
[Graph: bad_os count for the FC switch port, showing a first spike]

bad_os is a counter which exists in Brocade switches, and possibly others. Brocade describes it as the number of invalid ordered sets received.

At 10:55, in syslog:
raslogd: 2011/09/30-10:55:02, [FW-1424], 2118, WWN 10:... | FID 128, WARNING, IBM_2005_B32_B, Switch status changed from HEALTHY to MARGINAL
At the same time, there was a slightly larger spike in the bad_os graph.
Coinciding: The storage system sent a mail warning about "Data rate negotiation failed" for the port.

At 17:00: The count for bit-traffic flat-lined (graph not shown), i.e. all traffic had ceased.

At no point did the graphs for C3 discards, encoding errors or CRC errors show any spikes.

The next morning, the optical cable involved was replaced; that didn't help. Inserting another SFP did help, leading to the conclusion that the old SFP had started to malfunction.

Moral: Make sure to not just keep spare cables around. A spare SFP should also be kept in stock.
And monitor your systems: A centralized and regularly inspected syslog is invaluable. Generating graphs for key counters is also mandatory for mature systems operation; one way to collect and display key counts for Brocade switches is to use Munin and a Munin plugin which I wrote.

PS: Brocade documentation states that SFP problems might result in the combination of rises in CRC errors and encoding/disparity errors. This did not happen in this situation.

Wednesday, June 23, 2010

Separation or co-location of database and application

In a classical three-tier architecture (database, application server, client), a choice will always have to be made: Should the database and the application server reside on separate servers, or be co-located on a shared server?

Often, I see recommendations for separation, with vague claims about performance improvements. But I assert that the main argument for separation is bureaucratic (license-related), and that separation may well hurt performance. Sure, if you place the application and the database on separate servers, you gain an easy scale-out effect, but you also end up with a less efficient communication path.

If a database and an application are running on the same operating system instance, you may use very efficient communication channels. With DB2, for example, you may use shared-memory based inter-process communication (IPC) when the application and the database are co-located. If they are on separate servers, TCP must be used. TCP is a nice protocol, offering reliable delivery, congestion control, etc., but it also entails several protocol layers, each contributing overhead.

But let's try to quantify the difference, focusing as closely on the query round-trips as possible.

I wrote a little Java program, LatencyTester, which I used to measure differences between a number of database<>application setups. Java was chosen because it's a compiled, statically typed language; that way, the test-program should have as little internal execution overhead as possible. This can be important: I have sometimes written database benchmark programs in Python (which is a much nicer programming experience), but as Python can be rather inefficient, I ended up benchmarking Python, instead of the database.

The program connects to a DB2 database in one of two ways:
  • If you specify a servername and a username, it will communicate using "DRDA" over TCP
  • If you leave out servername and username it will use a local, shared memory based channel.
After connecting to the database, the program issues 10000 queries which shouldn't result in I/Os, because no data in the database system is referenced. The timer starts after connection setup, just before the first query; it stops immediately after the last query.

The application issues statements like VALUES(...) where ... is a value set by the application. Note the lack of a SELECT and a FROM in the statement. When invoking the program, you must choose between short or long statements. If you choose short, statements like this will be issued:
VALUES(? + 1)
where ? is a randomly chosen host variable.
If you choose long, statements like
VALUES('lksjflkvjw...pvoiwepwvk')
will be issued, where lksjflkvjw...pvoiwepwvk is a 2000-character pseudo-randomly composed string.
In other words: The program may run in a mode where very short or very long units are sent back and forth between the application and the database. The short queries are effectively measuring latency, while the long queries may be viewed as measuring throughput.
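As a sketch of what such a measuring loop can look like (this is not the actual LatencyTester source; the class name is invented, the JDBC URLs assume IBM's DB2 driver on the classpath, and credentials are omitted - a local type 2 connection can use operating system authentication):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Random;

public class MiniLatencyTester {
    public static void main(String[] args) throws Exception {
        // Type 4 driver, DRDA over TCP:     "jdbc:db2://dbhost:50000/SAMPLE"
        // Type 2 driver, may use local IPC: "jdbc:db2:SAMPLE"
        String url = args[0];
        Random random = new Random();
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement("VALUES(? + 1)")) {
            long start = System.nanoTime();   // timer starts after connection setup
            for (int i = 0; i < 10_000; i++) {
                ps.setInt(1, random.nextInt(1000));
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();                // fetch the single result row
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("10000 queries in " + elapsedMs + " ms");
        }
    }
}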

I used the program to benchmark four different setups:
  • Application and database on the same server, using a local connection
  • Application and database on the same server, using a TCP connection
  • Application on a virtual server hosted by the same server as the database, using TCP
  • Application and database on different physical servers, but on the same local area network (LAN), using TCP
The results are displayed in the following graph:

Results from LatencyTester

Clearly, the highest number of queries per second is seen when the database and the application are co-located. This is especially true for the short queries: Here, more than four times as many queries may be executed per second when co-locating on the same server, compared to separating over a LAN. When using TCP on the same server, short-query round-trips run at around half the speed of the local connection.

The results need to be put in perspective, though: While there are clear differences, the absolute numbers may not matter much in the Real World. The average query-time for short local queries was 0.1ms, compared to 0.8ms for short queries over the LAN. Let's assume that we are dealing with a web application where each page-view results in ten short queries. In this case, the round-trip overhead for a local connection is 10x0.1ms=1ms, whereas the round-trip overhead for the LAN-connected setup is 10x0.8ms=8ms. Other factors (like query disk-I/O, and browser-to-server round-trips) will most likely dominate, and the user will hardly notice a difference.

Even though queries over a LAN will normally not be noticeably slower, LANs may sometimes exhibit congestion. And all else being equal, the more servers and the more equipment being involved, the more things can go wrong.

Having established that application/database server-separation will not improve query performance (neither latency, nor throughput), what other factors are involved in the decision between co-location and separation?
  • Software licensing terms may make it cheaper to put the database on its own, dedicated hardware: The fewer CPUs beneath the database system, the lower the licensing costs. The same goes for the application server: If it is priced per CPU, it may be very expensive to pay for CPUs which are primarily used for other parts of the solution.
  • Organizational aspects may dictate that the DBA and the application server administrator each have their "own boxes". Or conversely, the organization may put an effort into operating as few servers as possible, to keep administration work down.
  • The optimal operating system for the database may not be the optimal operating system for the application server.
  • The database server may need to run on hardware with special, high-performance storage system attachments, while the application server (which probably doesn't perform much disk I/O) may be better off running in a virtual server, taking advantage of the flexible administration that virtualization offers.
  • Buying one very powerful piece of server hardware is sometimes more expensive than buying two servers which add up to the same horsepower. But it may also be the other way around, especially if cooling, electricity, service agreements, and rack space are taken into account.
  • Handling authentication and group memberships may be easier when there is only one server. E.g., DB2 and PostgreSQL allow the operating system to handle authentication if an application connects locally, meaning that no authentication configuration needs to be set up in the application. (Don't you just hate it when passwords sneak into the version control system?)
  • A misbehaving (e.g. memory-leaking) application may disturb the database if the two are running on the same system. Operating systems generally provide mechanisms for resource-constraining processes, but they can often be tricky to set up.
Summarized:

Pro separation:
  • Misbehaving application will not be able to disturb the database as much.
  • May provide for more tailored installations (the database gets its perfect environment, and so does the application).
  • If the application server is split into two servers in combination with a load balancing system, each application server may be patched individually without affecting the database server, and with little or no visible down-time for the users.
  • May save large amounts of money if the database and/or the application is CPU-licensed.
  • Potentially cheaper to buy two modest servers than to buy one top-of-the-market server.
  • Allows for advanced scale-out/application-clustering scenarios.

Con separation:
  • Slightly higher latency.
  • Less predictable connections on shared networks.
  • More hardware/software/systems to maintain, monitor, backup, and document.
  • Potentially more base software licensing fees (operating systems, supporting software).
  • Potentially more expensive to buy two servers if cooling and electricity are taken into account.
  • Prevents easy packaging of a complete database+application product, such as a turn-key solution in a single VMware or Amazon EC2 image.
Am I overlooking something?
Is the situation markedly different with other DBMSes?

Monday, December 28, 2009

PostgreSQL innovation: Exclusion constraints

It seems like the next generation of PostgreSQL will have a new, rather innovative feature: Exclusion constraints. The feature is explained in a video from a presentation at a recent San Francisco PostgreSQL Users' Group; the presentation couples the new feature with the time period data type.

In a perfect World where databases implement the full ISO SQL standard (including non-core features), exclusion constraints could be nicely expressed as SQL assertions, but the perfect World hasn't happened yet. And I think that I know the reason for this: SQL assertions seem like a strong cocktail of NP-hard problems - they are probably very hard to implement in an efficient way.

In our less than perfect World, it's nice that PostgreSQL will soon offer a way to specify exclusion constraints beyond what the UNIQUE constraint can express.
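To give a feel for the feature, using the syntax as it ended up looking in later PostgreSQL releases (and with invented table/column names): the following prevents two reservations of the same room from overlapping in time - something a plain UNIQUE constraint cannot express:

CREATE EXTENSION btree_gist;  -- GiST support for plain equality on "room"

CREATE TABLE reservation (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);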

The presenter, Jeff Davis, has an interesting blog, by the way. And on the subject of video, Vimeo has a little collection of PostgreSQL video clips that looks interesting.

Sunday, October 04, 2009

SQL comparison update: PostgreSQL 8.4

I finally got around to adjusting my SQL comparison page, so that the improvements in PostgreSQL 8.4 are taken into account. PostgreSQL is now standards-compliant in the sections about limiting result sets; actually, PostgreSQL is the first DBMS that I know of which supports SQL:2008's OFFSET + FETCH FIRST construct for pagination. PostgreSQL's new support for window functions is also very nice.
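For example, SQL:2008 pagination - page three of a result set, ten rows per page - can be expressed like this (table and column names are invented):

SELECT title
FROM   posts
ORDER  BY posted_on DESC
OFFSET 20 ROWS
FETCH FIRST 10 ROWS ONLY;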

My page doesn't cover common table expressions (CTEs) yet, but it's certainly nice to see more and more DBMSes (including PostgreSQL, since version 8.4) supporting them. Even non-recursive CTEs are important, because they can really clean up SQL queries and make them more readable.
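As an example of that clean-up effect, a non-recursive CTE names an intermediate result once, instead of repeating a subquery (table and column names are made up):

WITH recent_totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM   orders
    WHERE  order_date > CURRENT_DATE - 30
    GROUP  BY customer_id
)
SELECT c.name, r.total
FROM   customers c
JOIN   recent_totals r ON r.customer_id = c.customer_id;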

SQL comparison updates: Oracle 11R2, diagnostic logs

I finally found some time to update my page which compares SQL implementations. I performed a general update regarding Oracle, now that Oracle 11R2 has been released. And I added a new (incomplete) section which describes where the different DBMSes store diagnostic logs.

It turned out that very little has changed in Oracle since generation 10. The only remarkable new SQL feature in Oracle 11 is support for CTEs and proper CTE-based recursive SQL (introduced in version 11, release 2) -- but as I don't cover this topic on my page, updating the Oracle items was mostly a question of updating documentation links. Oracle still doesn't support the standard by having a CHARACTER_LENGTH and a SUBSTRING function, for example. This is simple, low-hanging fruit, Oracle(!) Sigh. It seems like Oracle's market position has made them (arrogantly) ignore the SQL standard.
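For the record, the kind of gap meant is this (column and table names invented) - the ISO SQL spelling versus what Oracle requires:

-- ISO SQL:
SELECT CHARACTER_LENGTH(name),
       SUBSTRING(name FROM 2 FOR 3)
FROM   t;

-- Oracle's non-standard equivalents:
SELECT LENGTH(name),
       SUBSTR(name, 2, 3)
FROM   t;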