Sebastian Kirsch: Blog

Sunday, 17 June 2007

Paging Agent Smith!

Filed under: — Sebastian Kirsch @ 16:00

If you are working in an environment where cell phones are used as pagers, you will be familiar with the problem: Everyone’s cell phone is going off all the time, and you cannot tell whether it was your own or someone else’s. This is made worse if everyone carries a company cell phone, since they are usually the same model and therefore all have the same SMS sound. Or if you can choose between several sounds, there is usually exactly one sound that is annoying enough to make sure you don’t miss a page – to wake you up during your first REM phase if need be. (Not that I’m of much use when I’m paged during deep sleep, but that’s a different topic.) My personal phone (a Sony Ericsson K750i) does not have a single suitable ringtone.

I solved this problem by using Morse2MIDI, a web site that generates a simple MIDI file from the characters you type in, rendered in Morse code. MIDI files can be played by most cell phones and are ridiculously small (my current ringtone is 618 bytes). The generator accepts strings of up to 100 characters. A space inserts a pause; one space is about 1/3 of a second long.

So my cell phone now goes “bip-bip-bip beep-beep beep-bip-beep” three times in a row when I get paged. That should be unique enough, since those are my initials in Morse code. And if I train myself to respond only to that sequence of beeps, I can safely ignore all other pagers in the area. It is pretty loud and annoying too, so I probably won’t miss it even if I’m in a different room.
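
The beep pattern can be reproduced with a few lines of Python. This is only a sketch of the letter-to-beeps mapping, not of how Morse2MIDI works internally; the function name `to_beeps` and the three-entry table are made up for illustration, and the letters S, M and K are simply what the pattern quoted above decodes to.

```python
# Minimal sketch: spell letters as "bip" (dot) and "beep" (dash).
# Only the letters needed here are included; extend MORSE for other text.
MORSE = {"S": "...", "M": "--", "K": "-.-"}

def to_beeps(text):
    """Return the beep pattern for text, letters separated by spaces."""
    words = {".": "bip", "-": "beep"}
    return " ".join(
        "-".join(words[symbol] for symbol in MORSE[letter])
        for letter in text.upper()
    )

print(to_beeps("SMK"))
```

Typing the letters with a space between them into the generator should then add the short pauses between letters.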

The only remaining gripe is that the Sony Ericsson stops playing the ringtone after 10 seconds. Some older Nokias had a “pager mode” in which a text message makes the phone ring until you look at the message.

It’s the old lament – we now have cell phones that can surf the internet and play music, but you cannot use them anymore for useful stuff like making phone calls or receiving text messages …

Monday, 16 April 2007

How to mess with Python’s mind

Filed under: — Sebastian Kirsch @ 15:53
>>> x = True
>>> True = False
>>> False = x
>>> True
False
>>> False
True

This is legal code in Python 2.4.

Apparently, assignment to None was explicitly disallowed some versions back. Not so for True and False.
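
For comparison: in Python 3, True, False and None are keywords, and assigning to them is rejected at compile time. The following sketch is meant for a Python 3 interpreter, not for the 2.4 session shown above; the helper name `can_assign` is made up for illustration.

```python
def can_assign(name):
    """Return True if `name = 0` compiles as an assignment statement."""
    try:
        compile(name + " = 0", "<test>", "exec")
    except SyntaxError:
        return False
    return True

for name in ("True", "False", "None", "x"):
    print(name, can_assign(name))
```

Using `compile` rather than `exec` means nothing is actually assigned, so the check is safe to run in a live session.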

Thursday, 21 December 2006

The trouble with X11 authentication

Filed under: — Sebastian Kirsch @ 09:12

I used to have problems with the X11 server of Mac OS X 10.4 (X11.app): After a while, applications could no longer authenticate to the X server – programs that were running still worked, but I could not start any new X11 applications. If I restarted the X server, it worked for a while, then authentication would fail again.

The best explanation I found for this behaviour was this: The X server uses MIT magic cookies for authentication, and the cookies become invalid after the IP address changes. This is on a laptop, so I move between different networks all the time, and every time I did, the X authentication would fail.

In previous versions of X11.app, there was a handy checkbox in the preferences to turn off authentication; unfortunately, this is gone in 10.4. The only way of turning off authentication is via the command line:

$ defaults write com.apple.x11 no_auth 1
$ defaults write com.apple.x11 nolisten_tcp 1

The first command turns off authentication; the second one disables connections to the X server via TCP (programs can then only use the /tmp/.X11-unix/X0 Unix domain socket to talk to the X server, which is faster and which they do anyway). This limits the possibilities for mischief somewhat.

The downside of this is that you cannot use X via the network anymore – although you could get around this limitation by allowing TCP connections and firewalling the port off instead. And it is marginally unsafe, since any local program can connect to the X server, even if they were started by other users. But since this is practically a single user machine anyway, I do not care about the last part.

Tuesday, 12 December 2006

OLPC

Filed under: — Sebastian Kirsch @ 09:47

Yesterday, I got my hands on an OLPC:

This was one of the prototypes that are about to go into production. Unfortunately, it didn’t have a battery pack, so I could not see it in action, but just getting my hands on one was pretty awesome. It is very small and light, and feels more like a toy or an appliance than like a real laptop. All of the electronics are housed in the display, and the bottom of the case has just the battery, the keyboard and the touchpad. The display can be twisted around to function as a kind of tablet computer – I would guess that this is intended mostly for reading. The two “ears” on either side of the screen are the wifi antennae.

They built in some nifty technical details. For example, there are a number of ways to save energy. The network controller can do mesh routing with the CPU turned off, and the display can hold an image with the CPU turned off. There are no moving parts (main storage is flash), and the keyboard is dust- and water-resistant. The display has a monochrome mode and a color mode; the monochrome mode is about 200dpi (enough for comfortable reading), while the color mode has a lower resolution. I don’t know exactly what processor it has, but it’s equivalent to a 400-500MHz x86 CPU (i.e. enough for most tasks). You can find more info about the specs on the OLPC hardware wiki page.

And the most important part (as a friend pointed out): I can see some kid caring about this device. It’s cool. It looks good. It doesn’t feel fragile. I could definitely see someone giving this device to a six-year-old for school. (And he wouldn’t have to worry about being beaten up for it, like he would for having an iBook.) I could see myself buying one just because it’s cool. It’s probably the only laptop you can take to the beach and be reasonably certain that it will still work afterwards.

Sunday, 01 October 2006

Nothing to show for it except Gravis

Filed under: — Sebastian Kirsch @ 00:14

I got my PowerBook, which had been at Gravis for repair, back on Tuesday morning. According to the technician, the mainboard was replaced, everything was tested (including 1GB RAM modules), and the machine was also cleaned free of charge. Many thanks for the latter in particular; the display has not been this clean in a long time.

I noticed only one small flaw when I tried it out at home. No, two actually: The first was that the power adapter connector no longer worked. The machine ran on battery but not on the power adapter, and the battery was not being charged either. D’oh.

The second was that my 1GB RAM module still did not work. After I installed it, the machine no longer booted. The startup chime still sounded, but after that the display stayed black and the hard disk did not spin up. Double-D’oh.

So I walked back across the street to Gravis with my notebook, which had been repaired into a broken state. If I did not happen to live right across from the store but, say, 20km away, I would have been thoroughly annoyed at this point. But what to do if the replacement MLB turned out to be broken? Last time, Apple had taken 10 days to deliver the spare part, and with my move coming up that would have been very, very tight. Fortunately, the old, basically functional MLB had not been sent back yet; if in doubt, it could have been reinstalled – then I would at least have had a usable notebook, even if not one with more memory.

This time I handed in the RAM module along with the machine – I just wanted a working notebook and was fed up with installing and removing RAM, testing, and worrying whether it would work. I actually got it back the same day. Could it be that I had become a little abusive? Sarcastic? That I had let them feel my anger? No, I cannot imagine that; I am far too even-tempered and polite for that.

The verdict: The technician had forgotten to connect the DC board. But: My 1GB module still does not work – it is defective. Other 1GB DIMMs supposedly work. But without tests of my own, I will probably never be able to say with certainty what failed to work in which combination.

So I now have a PowerBook with a new MLB and have had a lot of stress, but still no more memory. All that trouble, and nothing to show for it.

But the following story takes the cake: I was told that Gravis could only order spare parts once the machine had been handed in, since the defective part has to be sent back within three days. That is why I waited until my vacation to have my PowerBook repaired – I did not want to go without it for 10 days on the off chance.

Now, I spent some time at Gravis on Tuesday, and during that time not just one but two customers were offered the option of ordering the spare part in advance, so that it could be swapped in as quickly as possible. The only requirement was that the machine be in the store the next day, when the spare part arrived.

Ah.

Oh.

Had I not suggested exactly that?

Had I not been told that exactly that was not possible?

I think anyone other than me would have blown their top in this situation. What do you do when the staff have so obviously made a f…ool of you? I was simply at a loss for words. I could not say anything more. Every further word would have been one too many.

Monday, 25 September 2006

Gravis, the service desert

Filed under: — Sebastian Kirsch @ 13:53

My PowerBook is now more than two years old and has served me faithfully during that time. I bought it back then from Gravis – one of the largest German Apple resellers, which conveniently has a store in Bonn right across the street from me. And to avoid any trouble, I even bought the so-called Gravis Safety Pack along with it, which offers “fully comprehensive cover” against hardware defects and the like.

Since the 768MB of memory had become a bit too small for me, I wanted to upgrade it to 1.25GB. This procedure is described in the manual and takes a few simple steps and a few minutes. Unfortunately, the PowerBook still recognized only 768MB afterwards – so half of the newly installed 1GB memory module was not being used. Instead of sending the memory back right away, I first tried my luck at the Gravis store across the street.

The diagnosis of the technician there: RAM socket broken, mainboard (MLB – Main Logic Board) has to be replaced. Thanks to the hardware protection plan, this would not even cost me anything but would be covered under warranty.

So far, so good – but then came the bombshell: I was asked to leave the notebook there for 10-12 days so that they could order and install the spare part. How long it would take would depend on the delivery time for the spare part. That was unacceptable for me at that point; right now I simply cannot do without my main work machine for 10 days. I do not own a replacement either, since the PowerBook is my only Apple computer.

I then proposed that they order the spare part, and once it arrived, I would bring the PowerBook over and they could install it within a day or two. Unfortunately that was not possible either, I was told: Gravis has to send the defective part back to Apple within three days of receiving the spare part, otherwise Apple bills them for it. That is why spare parts are generally ordered only once the defective machine has been accepted.

To my mind, this practice in itself is extremely customer-hostile. The goal should be to keep the time the customer has to do without the machine as short as possible – but here that time is prolonged artificially and pointlessly. In this case, it makes no difference to me as a customer whether the delay is due to Gravis or to an Apple policy – as the largest German Apple reseller, Gravis should be able either to negotiate suitable terms with Apple or to build up a suitable pool of spare parts itself. Or, if in doubt, to bear the resulting costs itself – because what exactly did I pay for with the “Safety Pack”?

Imagine a car dealer demanding that you park your car on his lot for two weeks while he waits for a spare part – nobody would agree to that, and certainly not without getting a replacement car. Besides, it is much easier to do without a car: Either I use public transport to get to work, or I rent a car. I cannot simply rent a notebook – and it also holds all the data I need for my daily work, so it is actually even more important than a car.

Fortunately, a solution to this problem was in sight: I was going on vacation for 10 days two weeks later anyway, and the repair could take place during that time. I was assured that repairing the machine in 10 days would be no problem, and that I could pick it up after returning from vacation. So I made my last backups and went on vacation.

I came back yesterday evening and went to the store this morning to pick up my notebook, which I had just done without for 10 days – but, as one might almost have expected: The notebook had of course not been repaired yet. The spare part had only just arrived, and the technician needed another day or two to install and test it.

At that point I discreetly blew my top. Not only does Gravis consider itself unable to obtain the spare part in time to guarantee a smooth repair process – after I gave them 10 days, the machine is still not repaired. The schedule was foreseeable, and the further delay need not have happened.

This is not how I picture professional business conduct, and it is not what I am used to from my previous experience with professional computer manufacturers. Especially when I take out additional hardware protection and pay good money for it, I expect a certain level of service – and not a repair process with built-in delays. Professional service looks different.

Incidentally, I also ran into a wall when I asked whether they could install the memory (not bought at Gravis) that had revealed the problem while they were at it: No, for that they would have to charge me at least two service units at 20EUR each. Thank you very much, for that money I will install the memory myself – I trust myself to manage that much.

Service at Gravis? No such thing.

Wednesday, 23 August 2006

SMF talk @ LUUSA

Filed under: — Sebastian Kirsch @ 01:09

On Thursday, I am giving a talk at the LUUSA on the Service Management Facility of Solaris 10 and OpenSolaris. Anyone interested is welcome as always; the slides are available here.

Friday, 03 March 2006

Oracle auth mysteries

Filed under: — Sebastian Kirsch @ 14:15

Judging from my experience with Oracle databases, one of the most arcane mysteries is this: How does Oracle do authorization and authentication? Especially where processes at the “fringe” of normal operations are concerned – for example when connecting to an idle instance or when authenticating as a system user.

For those unfamiliar with Oracle, it ships with two system users: “sys” with a password of “change_on_install”, and “system” with a password of “manager”. These passwords can be changed after initializing a new Oracle instance, and will be stored in the database. They can also be stored in an external password file for remote logins.

But what if no Oracle instance exists? New Oracle instances are created by Oracle itself – it’s kinda like pulling yourself out of the swamp by your own bootstraps (or by your own hair, if you are Baron Münchhausen). To do this, you use sqlplus, the Oracle command-line tool, and connect to an idle instance. This instance is started by sqlplus upon logon, but does not contain a database. So where does the authorization info come from? How does Oracle determine that you are allowed to start up an idle instance?

To start an idle instance, you need the “SYSDBA” privilege. When no database exists, this privilege is inferred from the group membership of the operating system user starting sqlplus. This group is called the “SYSDBA” group in the Oracle documentation, and it is usually set to “dba”. So if you create a Unix group “dba” and create a user who is a member of this group, this user has the “SYSDBA” privilege as far as Oracle is concerned.
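
Whether a given operating system user would qualify can be checked from outside Oracle. The following Python sketch looks up a Unix group and tests membership; it assumes the default group name “dba” and a Unix system, uses only the standard `grp` and `pwd` modules, and says nothing about what Oracle does internally. The function name `user_in_group` and the user name “oracle” are hypothetical.

```python
import grp
import pwd

def user_in_group(user, group):
    """Return True if `user` belongs to the Unix group `group`."""
    try:
        group_entry = grp.getgrnam(group)
        user_entry = pwd.getpwnam(user)
    except KeyError:
        # Unknown group or unknown user: no membership.
        return False
    # Membership is either the primary group or a listed supplementary member.
    return user_entry.pw_gid == group_entry.gr_gid or user in group_entry.gr_mem

# "oracle" is only an example user name; substitute your own.
print(user_in_group("oracle", "dba"))
```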

You can then (as the aforementioned user) set the environment variable ORACLE_SID to the SID of the new instance and execute “sqlplus sys as sysdba” or “sqlplus / as sysdba”. sqlplus will start up an idle Oracle instance and connect to it. You can then use this instance to create a new database. (Do not set the environment variable TWO_TASK – this variable is only used for non-local connections, and you can only connect to an idle instance via a local connection.)

Here is the rub: How does Oracle determine the name of the unix group which has the SYSDBA privilege?

This information is hidden in the file $ORACLE_HOME/rdbms/lib/config.s or $ORACLE_HOME/rdbms/lib/config.c, depending on the platform; one is an assembler file, the other is a C source file. On Solaris, it is an assembler file (presumably because Solaris does not ship with a C compiler out of the box), and there is a part in it that looks like this:

.L12:
/* 0x0008         15 */         .ascii  "dba\0"
/* 0x0014         20 */         .align  8
.L13:
/* 0x0014         22 */         .ascii  "dba\0"

So to change the Unix group with the SYSDBA privilege, you have to change the string constants in this assembler file, assemble it (there’s a makefile in the directory that does this for you), and then relink the oracle binary. How is that for a (seemingly simple) configuration change?

This information applies to Oracle 9 and Oracle 10; if you are still on Oracle 8, you have to use “sqlplus /nolog” and then say “connect internal” to connect to an idle instance. (As far as I remember; I do not have an Oracle 8 instance handy at the moment.) The privilege is called “CONNECT INTERNAL” instead of “SYSDBA”. But changing the group name is done by the same means.

More info about this can be found in this AskTom article. If there is anything wrong with this article, please contact me; I am no Oracle DBA, I just spent many happy hours chasing after Oracle login problems.

Saturday, 18 February 2006

Practical things

Filed under: — Sebastian Kirsch @ 17:03

I recently bought a rather practical device. The background is this: I am sometimes a bit forgetful, and when I switch on the coffee machine in the morning, I often forget to switch it off again – and then I come home in the evening and am annoyed because it is still running. Or I want a coffee in the evening, switch the machine on, do something else while it heats up – and three hours later I realize that I completely forgot about it, and of course no longer feel like coffee. Not only does it pointlessly eat power the whole time (and not exactly a small amount), on top of that such a machine poses a fire hazard that should not be underestimated.

But, as I said, there is a rather practical device for this. It is a timer switch that automatically switches off again a fixed, preset time after being switched on. Mine, for example, is set to one hour, i.e. when I switch on the coffee machine, it is automatically switched off again after one hour. I bought this marvel for 15€ at P+M Elektronik, my declared favorite shop for all things electrical.

To recoup those 15€, the machine would have to run pointlessly for quite a long time. But on the other hand, it eases my conscience enormously not to have to keep asking myself, “Did I switch off the coffee machine this morning or not?” And that is definitely worth 15€ to me.

Here is a photo of the device:

[ Timer switch ]

Wednesday, 15 February 2006

Lucene demo @ LUUSA

Filed under: — Sebastian Kirsch @ 10:13

Last Thursday I gave a short talk at the LUUSA about Lucene, an open source library for text retrieval. The talk was set up as a demo, i.e. it was meant to show the minimal steps needed to implement indexing and search in Lucene – true to the motto “Show us the code”.

I was particularly pleased with the discussion after the talk, which was very pleasant and lively – and which showed me that I had not “lost” my audience. For example, questions came up about indexing strategies, stemming algorithms, index size, and parallelization.

I enjoyed this so much that I am already thinking about what my next talk could be about. For example, I could offer a few Solaris 10 topics, such as the Service Management Facility (SMF) or Zones.

The slides and the code for the talk are also online, by the way.

Sunday, 22 January 2006

Freedom for programmers?

Filed under: — Sebastian Kirsch @ 02:01

Sometimes the usual suspects do write things that I cannot resist commenting on. Marcus, for instance, writes under the heading Design happens:

if you take a set of good programmers (a project team) and tell them what to do (the feature list), then the design emerges all by itself.

That sounds good at first, but unfortunately has one serious drawback:

This method already fails at step 1. “If you take a set of good programmers…” Unfortunately, this set of good programmers is usually not readily at hand. There are not that many good programmers. More precisely, there are only very, very few good programmers.

And: Software design methods are not developed for good programmers. A Mozart did not have to attend a conservatory to write symphonies – but do we therefore claim that conservatories are superfluous, and that one only has to hand good musicians an instrument, and then, given freedom and clearly defined tasks, they will compose masterpieces? Do we claim that mathematics departments are superfluous because one would only have to hand good mathematicians (like Srinivasa Ramanujan) a beginner’s book on differential calculus, and they would then reinvent the foundations of numerical analysis by themselves?

No, software engineering was not invented for the few prodigies. Software engineering was invented so that non-prodigies have a methodology by which they can bring a project to completion – perhaps not as elegantly as a prodigy, perhaps not as bug-free, perhaps not as quickly, but in such a way that it gets finished. Nobody would hold it against a Mozart if he disregarded the established conventions. But at the same time, nobody would think of generalizing his way of working and holding it up as a model. Likewise, the “good programmers” mentioned above should have a healthy sense of self-assessment and not promote methods that work for themselves as universally valid.

And if formal processes really only get in the way, and if good programmers really only need freedom to let successful software emerge: How do we then explain the absolutely catastrophic success rates in software engineering? How do we explain that in 1994 only 16% of all software projects were completed on time and on budget? And that 31% of all projects were never completed at all? (Figures from the Chaos Report.) Did the programmers have too little freedom?

Addendum: To anyone who was impressed by Hackers and Painters, I recommend (to put things back into perspective a bit) a look at Dabblers and Blowhards.

Wednesday, 18 January 2006

Linux DVD burner troubles

Filed under: — Sebastian Kirsch @ 20:59

Recently, I had much fun trying to set up an IDE DVD burner under Linux.

After asking a couple of friends, who told me, “Buy LG or BenQ, those are good", I decided on a BenQ DW1640. Little did I know …

Strangely enough, burning DVDs and CDs worked fine from the start. The drive just had trouble reading media – regardless of whether it was a DVD or a CD, whether it was burned or pressed. I would get I/O errors, the media wouldn’t mount, and in some cases, the machine locked up completely or threw a kernel oops.

Obviously, this is something that is not supposed to happen. A Linux box is not supposed to throw a kernel oops or lock up just because of a DVD burner. And googling for the error messages did not turn up much useful material either.

So I tried the usual elimination strategy: I tried two different mainboards (one a P3-800, one a P4-1.5GHz), I tried two different Linux versions (2.6.5 and 2.6.14.6), I tried different media (see above). CDs written by the DVD burner would read perfectly in the CDROM drive (which the machine also has), but would not read in the DVD burner. Nothing helped.

In the end, I found my answer buried in an Ubuntu bug report:

It seems that the BenQ DW1640 does not like PIO mode. You have to use it in DMA mode. No DMA, no dice. And you need a proper 80-conductor cable. If you don’t, you will get all sorts of strange errors. While testing, I even had a very helpful message telling me that “This drive is not supported by this version of the driver.” – which is quite clearly not the case. It’s just another IDE drive, ya know?

According to the bug report, a firmware upgrade may help with this problem. But of course, BenQ in their infinite wisdom only supply the firmware upgrade software for Windows (specifically, Windows 98/Me/2000/XP – if you have NT or Windows 95, you’re out of luck). Whatever happened to supplying firmware upgrade utilities on small bootable DOS floppies? Also according to the bug report, this problem is shared by drives with the Philips Nexperia PNX7860E chipset, including the Plextor 740A.

So, thank you very much, BenQ and Linux, for two days of fun tracking down this problem!

For the record (and for Google) this is the kind of error message I got in the log file:


Jan 17 17:32:42 kernel: hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 17 17:32:42 kernel: ide: failed opcode was: unknown
Jan 17 17:32:42 kernel: hdc: drive not ready for command
Jan 17 17:32:42 kernel: hdc: ATAPI reset complete
Jan 17 17:32:42 kernel: hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Jan 17 17:32:42 kernel: ide: failed opcode was: unknown
Jan 17 17:32:42 kernel: hdc: drive not ready for command
Jan 17 17:32:42 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jan 17 17:32:42 kernel: printing eip:
Jan 17 17:32:42 kernel: c015e050
Jan 17 17:32:42 kernel: *pde = 00000000
Jan 17 17:32:42 kernel: Oops: 0002 [#1]
Jan 17 17:32:42 kernel: Modules linked in: edd evdev joydev st sd_mod sr_mod ide_cd cdrom nvram speedstep_lib thermal ipv6 processor fan button battery ac paride parport intel_agp agpgart sg scsi_mod uhci_hcd i2c_i801 i2c_core 3c59x mii usbcore dm_mod
Jan 17 17:32:42 kernel: CPU: 0
Jan 17 17:32:42 kernel: EIP: 0060:[<c015e050>] Not tainted VLI
Jan 17 17:32:42 kernel: EFLAGS: 00010287 (2.6.14.6)
Jan 17 17:32:42 kernel: EIP is at create_empty_buffers+0x20/0x80
Jan 17 17:32:42 kernel: eax: 00000000 ebx: c1134fa0 ecx: 00000000 edx: 00000000
Jan 17 17:32:42 kernel: esi: 00000000 edi: dffea384 ebp: 00010000 esp: d64adce0
Jan 17 17:32:42 kernel: ds: 007b es: 007b ss: 0068
Jan 17 17:32:42 kernel: Process dd (pid: 11848, threadinfo=d64ac000 task=de3cd030)
Jan 17 17:32:42 kernel: Stack: c1134fa0 00000000 c015eba0 22e8ca00 dffede00 de5cd01c dffede08 00000020
Jan 17 17:32:42 kernel: 00000001 dfff3800 00000096 00000000 00000000 00000096 dffea388 dffea2e4
Jan 17 17:32:42 kernel: c0162ba0 c1134fa0 c021852f c1134fa0 00000000 dffea388 c1134fa0 dffea384
Jan 17 17:32:42 kernel: Call Trace:
Jan 17 17:32:42 kernel: [<c015eba0>] block_read_full_page+0x270/0x320
Jan 17 17:32:42 kernel: [<c0162ba0>] blkdev_get_block+0x0/0x80
Jan 17 17:32:42 kernel: [<c021852f>] radix_tree_insert+0xbf/0x110
Jan 17 17:32:42 kernel: [<c0144b2c>] read_pages+0x4c/0x100
Jan 17 17:32:42 kernel: [<c01427ac>] __alloc_pages+0x17c/0x410
Jan 17 17:32:42 kernel: [<c0144c84>] __do_page_cache_readahead+0xa4/0x100
Jan 17 17:32:42 kernel: [<c0144e31>] blockable_page_cache_readahead+0x51/0xd0
Jan 17 17:32:42 kernel: [<c01450b6>] page_cache_readahead+0x146/0x1a0
Jan 17 17:32:42 kernel: [<c013e896>] do_generic_mapping_read+0x326/0x490
Jan 17 17:32:42 kernel: [<c013ea00>] file_read_actor+0x0/0xf0
Jan 17 17:32:42 kernel: [<c013ec92>] __generic_file_aio_read+0x1a2/0x210
Jan 17 17:32:42 kernel: [<c013ea00>] file_read_actor+0x0/0xf0
Jan 17 17:32:42 kernel: [<c013ee05>] generic_file_read+0x95/0xc0
Jan 17 17:32:42 kernel: [<c014d45a>] do_no_page+0x6a/0x2a0
Jan 17 17:32:42 kernel: [<c012ed00>] autoremove_wake_function+0x0/0x50
Jan 17 17:32:42 kernel: [<c015b667>] vfs_read+0xd7/0x180
Jan 17 17:32:42 kernel: [<c015b9f1>] sys_read+0x41/0x70
Jan 17 17:32:42 kernel: [<c0102eaf>] sysenter_past_esp+0x54/0x75
Jan 17 17:32:42 kernel: Code: 8d 74 26 00 8d bc 27 00 00 00 00 56 89 ce b9 01 00 00 00 53 89 c3 e8 40 f7 ff ff 89 c1 89 c2 8d b6 00 00 00 00 8d bf 00 00 00 00 <09> 32 89 d0 8b 52 04 85 d2 75 f5 89 48 04 8b 03 a8 08 75 06 8b
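
The status byte in the first lines of this log can be decoded by hand. The sketch below assumes the standard ATA status register bit layout; the three flag names that appear in the log are taken from it, while the remaining names are written from memory of the kernel’s IDE driver and should be treated as assumptions. The function name `decode_status` is made up for illustration.

```python
# ATA status register bits, most significant first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(status):
    """Return the names of all flags set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]

print(decode_status(0x58))
```

For 0x58 this yields the same three flags the kernel printed; the error bit (0x01) is not among them.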

Saturday, 26 November 2005

Upgrade to iTunes 6

Filed under: — Sebastian Kirsch @ 20:49

I’m slowly catching up on my upgrade backlog now that I have some more free time (might even get around to upgrading my desktop PC to Debian sarge one of these days), and today I installed iTunes 6.0.1. Well, I tried to anyway.

After installing it via Software Update, about one fourth of my music library was suddenly missing. All the files were in their respective directories, they just didn’t turn up in iTunes. (NB: Those were all files that I ripped from CD, not purchased via iTMS.)

On top of that, Software Update somehow hadn’t registered that the new iTunes was already installed and still offered it to me for upgrade. So I installed it again, this time from the downloadable disk image. This time, Software Update registered the upgrade, but I had to reboot during the installation. I have to reboot to install a new iTunes version? Why is that? It’s just a music player!

But my songs were still missing, and nothing I found on the web described this problem.

In the end, I dragged the “iTunes Music” folder to the “Library” in iTunes. This caused iTunes to re-import the songs, but of course, all the ratings etc. that are stored in iTunes, not in the MP3 file itself, were lost.

And people wonder why I’m reluctant to upgrade.

Wednesday, 28 September 2005

Don Knuth doesn’t validate

Filed under: — Sebastian Kirsch @ 11:30

Prof. Donald E. Knuth is one of the über-gods of computer science; most people know him as the author of numerous computer science papers, of “The Art of Computer Programming”, and of the TeX typesetting system, and as the editor of the Journal of Algorithms and the ACM Transactions on Algorithms.

Knuth is known to be very fastidious in his work. The quality of the TeX system is legendary; Knuth pays a reward for every bug found in TeX, which started at $2.56 (one hexadecimal dollar), and doubled every year (until it was frozen at $327.68). TeX is essentially frozen, and has been in use unchanged for 20 years. While the first three volumes of The Art of Computer Programming have already been published, at the age of 67, Knuth is still preparing the remaining four volumes, expecting to work on them for another 20 years.
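
The two reward figures fit together: $2.56 is 2^8 cents, and $327.68 is 2^15 cents. A quick check in Python, working in cents to avoid floating-point rounding; the number of doublings is derived here from the two amounts alone and is not taken from any source.

```python
reward_cents = 256          # $2.56, i.e. 2**8 cents
final_cents = 32768         # $327.68, i.e. 2**15 cents

doublings = 0
while reward_cents < final_cents:
    reward_cents *= 2
    doublings += 1

print(doublings, reward_cents / 100)
```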

Unfortunately, he is also known for under-estimating timescales: When he started work on TEX, he expected to complete the project during his sabbatical; in fact, it took about eight years. The first draft for The Art of Computer Programming was supposed to be a book on compiler design, but turned into a 3000 page manuscript on the fundamentals of computing.

Recently, Knuth clashed with the administrators of W3 Validator, a validation service for HTML documents. The complete email exchange started with this message. Knowing a little about Knuth’s personality and mode of work, I found it quite hilarious.

Oh, the reason why this message is not sent from Knuth’s own email address is that he doesn’t have one. This lucky person quit using email in 1990; he has a secretary for that.

(Thanks to ScottyTM)

Tuesday, 09 August 2005

FrOSCon ahoy!

Filed under: — Sebastian Kirsch @ 21:22

I am pleased to report that my local Linux user group, the LUUSA (Linux/Unix User Group St. Augustin), is co-organizing a conference next year, and the web pages have just gone on-line.

The FrOSCon (Free and Open Source Conference) (German pages: FrOSCon) is taking place on 29 and 30 April 2006, in St. Augustin, near Bonn, Germany. A call for papers is expected in November.

Spread the word, and all that.

Wednesday, 29 June 2005

Belgium, Netherlands, what’s the difference?

Filed under: — Sebastian Kirsch @ 23:21

It seems that Google Maps needs a little update in European geography:

How to get this view: Just open Google Maps and drag the map over to Europe. If you zoom in one level, the descriptions get reversed.

Monday, 06 June 2005

Mixed feelings

Filed under: — Sebastian Kirsch @ 23:50

I am in a quandary this evening: Should I rejoice because Debian Sarge has been released, or should I mourn that Apple is switching to Intel processors?

Realistically speaking, the Sarge release does not solve any of Debian’s problems – it just gives the developers some breathing room. I wrote the Heise Newsticker announcements for the two previous releases 2.2 (2000/08/15) and 3.0 (2002/07/20), and even back then, we were lamenting the fact that the release cycles of Debian were much too long. Oliver Diedrich is making the same point in his Newsticker announcement for Debian 3.1: Sarge is out of date on the day it is released. Sure, it is an improvement over Woody, but everyone who wanted to have an even marginally up-to-date Linux has already switched to a different distribution long ago.

But well, who am I to talk? I have not updated my Woody box for almost a year, I have not even used it much except for watching TV and checking my mail. Ever since I got my PowerBook. Which brings us to the second topic of this post:

Apple is switching to Intel processors. After this had been rumoured and predicted for years, it has finally come true. The worst part: John C. Dvorak was right after all, and he only missed the timeframe by about 8 months.

I have always been a fan of the PowerPC – I was amazed how much power IBM could squeeze out of this thing. I am working at a Sun shop, and Sun has needed more than twice as many CPUs in the past to outperform an IBM PowerPC box. IBM pioneered dual-core chips and multi-chip modules with its POWER4 and POWER5 architectures. (Dual core chips have been available in the x86 world for less than two weeks.)

In comparison, the only processor innovation at Intel in the last couple of years, the Itanium with its EPIC architecture, was a monumental flop. As regards processors for desktops and workstations, Intel had to be dragged kicking and screaming into the 64 bit world by the unanticipated success of AMD’s Opteron processor. And even with Intel’s version of the x86-64 architecture (called EM64T), the Opteron has the better architecture, especially as regards multi-processor configurations.

What does this mean for Apple? On the surface, not much. Apple products will have the same design, the same manufacturing quality, and the same ease of use, regardless of whether they are powered by an IBM processor or an Intel processor. But for the more technical audience, they will lose a lot of their appeal – and in the long run, I expect that this will influence the typical home users as well. The slogans will no longer be “Think different” and “Computers for the rest of us”, but “Think almost the same” and “Computers for those who want a prettier box for their living room”.

It is interesting to note that Steve Jobs pulled a similar trick once before, when porting the NeXTStep operating system from its original 68k platform to x86 (yes), SPARC and HP-PA. NeXT was a company similar to Apple, in that it was a combined software and hardware manufacturer. Their innovative software (a modern Unix running on a Mach microkernel, the Objective-C language) combined with forward-looking hardware (the NeXT Cube, optical drives) was legendary. But the commoditization of the hardware platform eventually led to the demise of NeXT and the sale of the company to Apple, which reused large parts of NeXT’s technology to create MacOS X.

Another precedent is SGI’s brief flirtation with x86 processors for workstations, which served no purpose but to accelerate the demise of the company. Nowadays, SGI is producing MIPS R16000 and Itanium systems, but no x86 systems anymore. SGI’s x86 systems were regarded as nothing but a gimmick; the usual reaction was “Why should I buy SGI to get the same processor as a Dell box for twice the price?”

As for me, I am not very deeply entrenched in MacOS X as an operating system. Most of the apps I use are cross-platform, and the few that are not are easily replaced. I bought my PowerBook because it was the only real Unix notebook on the market at that time. I am quite happy with MacOS X since it runs all my required applications, and allows me to develop code that will easily run on Unix-like operating systems. But in the future, I will take a good, long look at Apple’s offerings and think about whether Apple still provides a significant advantage over an Intel notebook running Linux.

Saturday, 14 May 2005

MacOS X: Disabling iPhoto launch

Filed under: — Sebastian Kirsch @ 13:15

I don’t use iPhoto on MacOS X, but iPhoto keeps popping up every time I connect my digital camera, which is very annoying. I searched in vain in the iPhoto preferences and in the System Preferences for a way to disable this, but couldn’t find any. (Other applications, like Mail.app or Safari, have an option in their preferences, for example “Default Web Browser:” or “Default Mail Client:”.)

It turned out that the option is hidden in the Preferences of “Image Capture”, not in iPhoto – an application I’d never used or opened before. Congratulations to Apple!

According to this page, there used to be a pane in System Preferences for choosing the default applications before MacOS X 10.3. In 10.3, these options were moved to the individual applications’ preferences. This seems counter-intuitive, to say the least. Why not make it available in both places?

Fortunately, RCDefaultApp makes this feature available again. Perhaps the author can also find a way to make actions like the iPhoto launch configurable via this panel.

Oh, another tip I got today: How to make hidden folders accessible in Finder: Click on the “Go” menu, then on “Go to folder”, and enter the full pathname. I have not found a way to make all hidden folders visible in Finder, in the same way that they are visible on the console. This is still very annoying. If someone has a fix for this, I would be grateful to hear it.

Monday, 09 May 2005

Microtypography

Filed under: — Sebastian Kirsch @ 22:52

Since I am currently writing my diploma thesis, a few acquaintances are also working on diploma and doctoral theses, and I am known as a LaTeX expert, I am getting a lot of questions about typesetting, typography and LaTeX at the moment. The answer often lies in the proverbial “knowing where it is written”: which FAQ, which book or which document contains the answer.

A popular topic is microtypography: Which kind of dash goes where, which space goes where, which quotation marks go where, and does the punctuation mark go before or after the closing parenthesis? These and many other questions are covered in an essay by Marion Neubauer, which appeared as Part I and Part II of “Feinheiten bei wissenschaftlichen Publikationen” in Die TeXnische Komödie.

I usually try not to spend too long on typographic minutiae. Some basics are indispensable, though, for example the correct use of punctuation marks and spacing. I avoid abbreviations as far as possible; in running text they save time neither when writing nor when reading. I set acronyms in small caps, using the following command:

\DeclareRobustCommand{\abbrv}[1]{\textsc{\MakeLowercase{#1}}}

Apart from that, I have adopted logical markup on a large scale. I practically never select fonts explicitly; instead, I define a macro that expresses the function of the font change: for example \forgn for foreign-language text, which is usually set in italics, or \defn for definitions, which I set in bold. I also have a macro for quotation marks, which lets me change the style later and ensures that every opening quotation mark is closed again:

\DeclareRobustCommand{\q}[1]{`#1'}

For texts with many mathematical formulas, I even go so far as to create descriptive macros for all frequently used variables and functions: an instance variable x from an instance space X, for example, is defined as the macros \inst and \Inst. This is more typing, but it has several advantages: For one, the variables are automatically named consistently. If there are collisions between the symbols used, they can be spotted and fixed early; there is no danger of overlooking something when renaming a variable. If you later decide to follow a different notation, you can change it without any problems.
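A minimal sketch of what such descriptive math macros might look like. Only the names \inst and \Inst come from the text above; the definitions themselves and the sample sentence are assumptions made for illustration:

```latex
% Hypothetical preamble definitions; only the macro names are from the post.
\newcommand{\inst}{x}   % a single instance
\newcommand{\Inst}{X}   % the instance space

% In the body, formulas refer to the role of a symbol, not the letter:
Let $\inst \in \Inst$ be an instance and $f\colon \Inst \to \{0, 1\}$ a classifier.
```

Switching to a different notation then means editing the two definitions in the preamble, and every formula follows.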

Tuesday, 26 April 2005

A layman’s take on aspect-oriented programming

Filed under: — Sebastian Kirsch @ 14:12

During my classes on software technology, we were also treated to a short introduction to aspect-oriented programming. Aspect-oriented programming is the latest trend[1] in software engineering: It aims to implement separation of concerns not in different classes or objects, but as different “aspects” of the same object. Typically, the cited uses are a “logging/debugging aspect”, a “security aspect”[2], a “persistence aspect” etc.

What’s an aspect? Twenty years ago, people were asking the same thing about object-oriented programming: What’s an object? As no coherent definition of an aspect has yet emerged, I will describe aspects by their most common implementation, AspectJ, a Java dialect. AspectJ introduces the concept of join point interception: at certain points in the program flow, additional code (the aspect) may be introduced into the program. “Certain points” in this case are, basically, every method call. One may write a function (an aspect, also called “advice”) that will be executed before certain method calls, after certain method calls, or even around certain method calls. This function will be executed in the context of the method call in question, receive all of its arguments, and may basically do what it wants. The methods to which an aspect may be applied can be specified using wildcards.
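To make the idea of before/after advice selected by wildcard a bit more concrete, here is a rough sketch in Python rather than AspectJ. It is not how AspectJ works internally; the helper names `advise` and `weave`, the `Account` class and the `tx_*` naming convention are all made up for this illustration:

```python
import fnmatch
import functools


def advise(method, before=None, after=None):
    """Return a wrapper that runs `before`/`after` around one method call."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if before is not None:
            before(method.__name__, args)      # advice sees the arguments
        result = method(self, *args, **kwargs)
        if after is not None:
            after(method.__name__, result)     # advice sees the return value
        return result
    return wrapper


def weave(cls, pattern, before=None, after=None):
    """Apply the advice to every method of `cls` whose name matches `pattern`."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and fnmatch.fnmatchcase(name, pattern):
            setattr(cls, name, advise(attr, before, after))


class Account:                                 # hypothetical base program
    def __init__(self):
        self.total = 0

    def tx_deposit(self, amount):
        self.total += amount
        return self.total

    def tx_withdraw(self, amount):
        self.total -= amount
        return self.total

    def report(self):
        return f"total: {self.total}"


log = []                                       # a crude "logging aspect"
weave(
    Account,
    "tx_*",                                    # wildcard selecting the methods
    before=lambda name, args: log.append(("before", name, args)),
    after=lambda name, result: log.append(("after", name, result)),
)

account = Account()
account.tx_deposit(100)
account.tx_withdraw(30)
account.report()                               # not matched, so not logged
```

Note that `Account` itself contains no logging code at all; the behaviour is attached from the outside, which is both the appeal of the technique and the reason for the unease described below.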

There is a workgroup for AOP and OOP at Bonn University that has developed an even more general aspect language called LogicAJ, as well as tools for transforming Java source code and run-time interception of Java method calls.

Obviously, this is a very powerful technique. The first time I saw it, I said to the instructor, “We’ve been doing that in Perl for years, manipulating the symbol table of other modules and exchanging their subroutines for our own. And we always felt bad about it.”[3] I was wary of it; it somehow went against my instincts. (That’s not to say I couldn’t become used to it; I also became used to function and operator overloading in C++.)

My reasoning was thus: I perceive a trend towards more and more structural and semantic cues in programming languages, in order to help the programmer to provide better structure for his code. In LISP, we had functions and lists. That was it; LISP is one of the most syntax-free languages – but on the other hand, it allowed you to do everything, to program in any way you wanted. In Java, one of the most successful languages at the moment, we have a distinction between classes and interfaces, objects, object methods and variables, class methods and variables, static variables, final variables, subclasses and subinterfaces, etc. ad nauseam. Some languages provide specialized syntax for member initialization (for example C++), for choosing the appropriate function depending on the type of the argument (function overloading, in C++ or Java) or even according to the value of the argument (pattern matching, for example in the ML language).

And now people are using this artificial structure to put a totally generic system on top of the language. Which, predictably, leads to problems. The application of aspects to a base program is euphemistically called “weaving” – which works all right and is understandable as long as you have only one aspect. But what happens when several aspects are applied to the same base program? How do they interact? In what order should they be executed? What if one aspect relies on functionality provided by a different aspect? Can an aspect be applied to another aspect (leading to a meta-aspect)? These problems are still unsolved. I doubt that they will ever be solved – the formal semantics of even the simplest imperative languages are thorny enough.
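The ordering question can be shown with a toy example. The following Python sketch uses two made-up wrappers standing in for aspects (`clamp_to_zero` and `charge_fee` are invented names, not taken from any AOP system) and applies them to the same function in both orders:

```python
import functools


def clamp_to_zero(func):
    """Aspect 1 (hypothetical): never let the result go negative."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return max(0, func(*args, **kwargs))
    return wrapper


def charge_fee(func):
    """Aspect 2 (hypothetical): subtract a flat fee of 5 from the result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs) - 5
    return wrapper


def balance_after(balance, withdrawal):
    """The base program: plain arithmetic, unaware of any aspect."""
    return balance - withdrawal


# Same base function, same two aspects, different weaving order.
fee_then_clamp = clamp_to_zero(charge_fee(balance_after))
clamp_then_fee = charge_fee(clamp_to_zero(balance_after))

result_a = fee_then_clamp(10, 20)   # (10 - 20) - 5 = -15, clamped to 0
result_b = clamp_then_fee(10, 20)   # (10 - 20) clamped to 0, then 0 - 5 = -5
```

Neither aspect is wrong on its own, yet the woven program returns 0 in one order and -5 in the other, and nothing in `balance_after` tells the reader which one to expect.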

Slashdot picked up this problem in a recent article. The article is based on papers from Passau University that attack the soundness of aspect-oriented programming. They even go as far as titling one paper “Aspect-Oriented Programming Considered Harmful” (recalling Edsger Dijkstra’s seminal paper “Go To Statement Considered Harmful”). They state some of the problems of aspects much more succinctly than I could.

In a way, aspect-oriented programming is the antithesis to another trend in language design – functional programming. Functional programming tries to minimize or even remove side effects of code, specifying that a function with the same input data will always return the same data – regardless of the order in which function calls happen. Aspect-oriented programming does exactly the opposite – the whole purpose of an aspect is introducing side effects to existing code. One may argue which way is better – no side effects or all side effects.

Functional programming does have some advantages: for one, it is usually much easier to debug and to verify than imperative code (and, by extension, object-oriented and aspect-oriented code), owing to the fact that it’s virtually free of side effects. Some organizations that tried it reported a marked increase in programmer productivity; Ericsson springs to mind with the Erlang language. A good overview of the characteristics and advantages of functional languages is in the corresponding Wikipedia article.

As I said in the headline, I am just a layman in the field of programming language design. I am not particularly attached to a specific programming language, but I have seen lots of them. Why I just seem to perceive all their worst points, I don’t know; I guess I just have a critic in me.

Footnotes:
[1] It’s hard not to say “fad” instead of “trend”.
[2] Whatever security actually is in this context.
[3] It turns out that there is actually a Perl module for aspect-oriented programming that does exactly that; unlike in Java, there is no need to change the Perl interpreter itself in order to implement aspect-oriented programming. The same is true for LISP, where aspects can be introduced using macros. In fact, LISP with its macro mechanism was the first language to introduce the idea of code as a “different kind of data” that can be manipulated by other code.

Friday, 01 April 2005

Data Structures in Python

Filed under: — Sebastian Kirsch @ 19:05

Modern programming languages, especially scripting languages like Perl and Python, exhibit a suspicious lack of data structures. They provide arrays (or lists), and associative arrays, which are often implemented using a hash table with a fixed number of buckets.

More complicated data structures are missing: you won’t find binary search trees (such as AVL trees or red-black trees), or priority queues (like Fibonacci heaps), or tree structures for implementing sets in their standard libraries, nor data structures for common problems, such as graph structures. And this is despite the fact that the standard library of those languages is usually huge, providing everything from HTTP clients and modules for manipulating emails to XML-RPC interfaces. (In contrast, the C++ Standard Template Library provides a set and priority queue implementation, as does the Moscow ML standard library.)

Why is that? Is it because everyone who needs a specialised data structure is supposed to implement it himself? Or are the performance gains deemed to be negligible because the built-in data structures are heavily optimized and therefore “fast enough”? Or because nobody expects these languages to be used in performance-critical environments anyway? I don’t know …

Common wisdom in data structure design is that an algorithm that is in a lower complexity class will always outperform one in a higher class for large problems, regardless of its overhead. So a search tree implementation in Python should be able to outperform the built-in hash table implementation for large data sets regardless of how well optimized the hash table is.

For example: A lookup in a hash table with 8 buckets (the standard size in Perl) and about 10000 items will need on average about 600 operations. (On average 1250 items in one bucket, and the right one will be found after searching on average half of the items.) Hash tables are only appropriate when the number of items that will be stored is about the same as the number of buckets. A lookup in a balanced search tree of the same size will need at most 14 operations (because a balanced binary tree with 10000 nodes has a depth of log2(10000) ≈ 13.3).
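The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, taking the fixed bucket count from the paragraph at face value and assuming items are spread evenly over the buckets; the function names are invented for the example:

```python
import math


def expected_chain_probes(items, buckets):
    """Average comparisons for a successful lookup in a chained hash table,
    assuming evenly filled buckets and that half a chain is scanned."""
    return (items / buckets) / 2


def max_tree_comparisons(items):
    """Number of levels in a perfectly balanced binary search tree,
    i.e. floor(log2(items)) + 1 for items >= 1."""
    return items.bit_length()


hash_cost = expected_chain_probes(10_000, 8)   # 1250 items per bucket, half scanned
tree_cost = max_tree_comparisons(10_000)       # 2**13 <= 10000 < 2**14
depth_estimate = math.log2(10_000)             # the "13.3" quoted in the text
```

With these assumptions the hash lookup comes out at 625 comparisons (the “about 600” above) against 14 for the tree.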

Every computer science student has implemented these data structures at least once, and most have before the end of their second year – so it cannot be the shortage of capable programmers or maintainers that prevents them from being included in the standard library.

Of course, there are Perl packages on CPAN that implement trees and sets and whatnot. Usually, they are at version 0.1 or 0.3 and have not seen a release for a couple of years – and look like they were implemented by a computer science student before the end of his second year. By putting those algorithms in the standard library, one would ensure that the code is properly maintained and follows the coding standards.

For Python, there is an online book called Data Structures in Python, with downloadable source code. The author has a Ph.D. in computer science, was an associate professor at the University of Waterloo, and published two books on data structures and algorithms with OO languages, namely Java and C++. These books, plus their C#, Python and Ruby versions, are all available online. However, the source code is copyrighted by the author and is not licensed for further use, so one cannot use it in one’s own projects.

Sunday, 27 March 2005

Email clients

Filed under: — Sebastian Kirsch @ 15:59

“All mail clients suck. This one just sucks less.” (Michael R. Elkins, author of mutt)

How true …

My email setup has grown over the years and is somewhat involved – I started out with an account on a German mailbox network, and in order to use that under Linux, I had to have a gateway between that mailbox network and RFC822 mail. That meant having an email system and a news server on my home box. (This was about ten years ago.) Three years later, I signed up with INKA e.V. to have email and news delivered via UUCP – which also meant having a complete email system. I used mutt for years on my home box to read the mail as it was delivered to my local user account.

When I got my PowerBook, that changed – I did not want to replicate this email setup on my notebook (I do not even know whether UUCP will work under MacOS X); instead, I wanted to use my Linux box as a mail server, and access it from my notebook via IMAP. I also wanted to replicate all my emails (about 1.7GB in 800 mailboxes, collected over ten years) via IMAP. Since replication and offline mode are officially supported in IMAP, that should be possible. I just had to find a piece of software that would let me do that …

This turned out more difficult than I imagined.

I tried Mozilla Thunderbird, and while it supports offline mode, it has to be enabled individually for each and every folder. Not good.

Next came Mail.app, which is included in MacOS X. Its support for offline mode is good (one checkbox, and it synchronizes every mailbox; plus, different accounts can be taken offline and online individually.) It also downloaded all of my email without a hitch. Unfortunately, it turned out to be rather buggy: It would occasionally forget that it had moved a mailbox to a new place, and complain that it couldn’t find it. The upgrade to 10.3.8 broke it completely; it started to open so many connections to my email server that the server shut down, presuming that the client was caught in a loop. I also couldn’t get it to work with the Courier IMAP server on which one of my accounts resides; Courier uses . as a path separator and groups all mailboxes as subfolders of INBOX, and that didn’t seem to go over well with Mail.app.

There’s also Mulberry, which I used a couple of years ago at work under Windows, and which is supposed to support IMAP quite well. I realized that there’s a MacOS X version available, so I gave it a try. It is written using Carbon, so it doesn’t follow the usual OSX interface guidelines very well; it has a rather quaint feel. The latest stable release is a couple of years old, so I tried a recent beta that was already quite stable. Unfortunately, synchronisation is quite buggy in Mulberry as well: I sometimes wasn’t able to apply local changes to the remote IMAP server, and had to disable the synchronisation of local changes completely in the end. The interface also takes some getting used to, and you cannot take accounts online and offline individually.

I also tried Eudora, which is rather popular at my current employer. The configuration and use of Eudora was completely incomprehensible to me – yes, I did not read the manual, but I could not even get far enough with it to find out whether it supports the features I need. It’s considerably different from all other mail clients I tried.

Well, where did I end up in the end? You probably guessed it already: mutt – with the help of a couple of other tools.

Mutt by itself is just a mail reader – it does have IMAP support for reading mails online, but it does not support offline operation and synchronisation between a local and a remote store, and it cannot send email without an external program. After all, mutt is a Unix program, and the philosophy is “one tool for one job”. Mutt is a mail reader, not a mail synchronizer or a mail sender. There are other tools for those jobs.

The most powerful IMAP synchronizer I could find was Offlineimap. Offlineimap is very peculiar at first glance: The homepage lives on a gopher server (the link above is to a gopher-http gateway; the gopher itself is also directly accessible.) Version control for the source code is done using GNU arch. It is written in Python. But despite those peculiarities, it is very powerful, very easy to configure and fast. Some drawbacks: It cannot add folders on the IMAP server (but if you add a folder on the IMAP server, it will synchronize that to a local folder.) And mucking about with the synchronization info can be very dangerous to your email: I once failed to delete all the synchronization info before a configuration change, and offlineimap proceeded to delete my INBOX on the IMAP server. D’oh! Deleting all the synchronization info is OK, though, and will not lead to mail loss.

For sending email, I use msmtp, an SMTP client that was developed for use with mutt. It connects directly to the remote SMTP server (with SSL and TLS and authentication and all that, and support for several accounts) and delivers your mail.

Several accounts can be configured in mutt via hooks that change the personal info (From header, signature files, mail server) according to the current folder. I’m still tweaking my mutt setup, but it looks promising. And hopefully, I’ll soon be able to read emails in a useful fashion again.

Saturday, 19 March 2005

Interview with Tim Bray, Inventor of XML and RDF

Filed under: — Sebastian Kirsch @ 11:11

ACM Queue is proving itself again as a source of worthwhile information on the current state and trends in computer science and IT. This time, it has an interview with Tim Bray, the inventor of XML and RDF.

I find that I share many of Bray’s views, especially his pessimism about the so-called Semantic Web. Having read AI and computational linguistics in the course of my studies, I concur with Bray when he says, “You know, K[nowledge] R[epresentation] didn’t suddenly become easy just because it’s got pointy brackets.” He also mentions Doug Lenat, founder of the Cyc project, which produced one of the largest and most complete knowledge bases and has been going on for more than 15 years. There is an open offshoot called OpenCyc which contains part of the software and of the knowledge base. Cyc is an admirable project, but its applications are scarce, despite the enormous amount of energy that has been spent on it. It’s unlikely that knowledge representation in XML will change that – KR touches on many very deep and thorny problems in machine reasoning, inference, logic, and computational linguistics.

Bray also mentions the applications of XML and its relationship to older information exchange formats like ASN.1, and declares that the key point of XML is that it doesn’t describe the format of the data, but its meaning.

Personally, I am happy about the widespread adoption of XML – writing parsers is extremely hairy, and defining a document format in a non-ambiguous manner is doubly so. An open interchange format can give a boost to the development of applications, since the tedious tasks of serialization and transport can be delegated to the XML library. On the other hand, I do think that XML is excessively noisy for some applications, and that more efficient interchange formats exist, for example for purely numerical data.

Wednesday, 16 February 2005

SHA-1 Broken

Filed under: — Sebastian Kirsch @ 11:26

Bruce Schneier reports that SHA-1 has been broken. SHA-1 is the Secure Hash Algorithm, a 160-bit hash function designed by the National Security Agency in 1995. Schneier’s entry is short on details, as the original paper has not yet been published. The attack was devised by Xiaoyun Wang, Yiqun Lisa Yin and Hongbo Yu from Shandong University in China, who previously published attacks on the MD5 hash algorithm.

Of course, one has to take into account that a cryptographer’s “broken” is different from anyone else’s “broken”. For one, it doesn’t mean that all applications using SHA-1 will magically stop working, or will suddenly be insecure.

The attack on SHA-1 is a collision attack, not a preimage attack. A collision attack is an attack in which two different data streams are produced that hash to the same value – and therefore can be used interchangeably as far as the hash value is concerned. In contrast, a preimage attack would allow the generation of a data stream that hashes to a specific, given hash value. Because the attack on SHA-1 is a collision attack, it does not apply when SHA-1 is used in a message authentication code. It only affects its use in digital signatures.
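The difference between the two kinds of attack is easier to see on a toy scale. The sketch below has nothing to do with the actual attack on SHA-1: it simply brute-forces a deliberately weakened hash (SHA-1 truncated to 16 bits, an assumption made purely so that the search finishes quickly), and the function names are invented for the example:

```python
import hashlib
from itertools import count


def tiny_hash(message: bytes, bits: int = 16) -> int:
    """The first `bits` bits of SHA-1 as an integer (truncated for the demo)."""
    digest = hashlib.sha1(message).digest()          # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(bits: int = 16):
    """Collision search: ANY two different messages with the same hash."""
    seen = {}
    for i in count():
        message = f"msg-{i}".encode()
        value = tiny_hash(message, bits)
        if value in seen:
            return seen[value], message, i + 1       # i + 1 hashes computed
        seen[value] = message


def find_preimage(target: int, bits: int = 16):
    """Preimage search: a message that hashes to ONE given value."""
    for i in count():
        message = f"pre-{i}".encode()
        if tiny_hash(message, bits) == target:
            return message, i + 1


first, second, collision_tries = find_collision()
target = tiny_hash(b"the message someone else signed")
forged, preimage_tries = find_preimage(target)
```

For an n-bit hash, the collision search typically succeeds after roughly 2^(n/2) attempts, whereas the preimage search needs on the order of 2^n. That gap is why a collision result says little about attacks where the target hash value is fixed in advance.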

Another thing is the scale of the attack. It reduces the number of hash computations needed to find two data streams with the same hash value from (theoretical, brute force) 2^80 to 2^69. This is a factor of about 2000. The practical effect of this is rather negligible, as 2^69 is still a pretty large number.
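For a feeling of the scale, the numbers can be checked in a few lines of Python. The hashing rate used here is a made-up round figure, chosen only to turn the operation count into a time span:

```python
HASH_BITS = 160

brute_force = 2 ** (HASH_BITS // 2)    # birthday bound for a 160-bit hash: 2^80
attack = 2 ** 69                       # work factor quoted for the new attack
speedup = brute_force // attack        # 2^11

# Hypothetical rate: one billion SHA-1 computations per second.
ASSUMED_HASHES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
years_needed = attack / ASSUMED_HASHES_PER_SECOND / SECONDS_PER_YEAR
```

The speedup is exactly 2^11 = 2048, and even at the assumed rate the remaining 2^69 computations would still take many thousands of years on a single machine.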

So the attack on SHA-1 is noteworthy primarily because it dispels the belief that SHA-1 is just as secure as a random function and can only be attacked by brute force. This in itself is not surprising – in fact, it is the basic premise of cryptanalysis: That a better method than brute force is possible. The “major, major cryptanalytic result” touted by Schneier seems to be that attacks on the SHA family of hash functions were not previously known.

So, what to do now, since SHA-1 is “broken”? As a software developer, I would not be too alarmed by this result. As detailed above, the hypothetical attack only concerns digital signatures – all the other areas where SHA-1 is used are still safe (for example, password hashing or integrity checking for files). And even then, the required effort is still too large for casual usage.

As a cryptanalyst, I would look forward to the publication of the paper. And like everyone else, I would start a bet on the time of the first attack on the lesser-known hash algorithms like RIPEMD-160.

Saturday, 12 February 2005

AuthImage: CAPTCHA! Gotcha!

Filed under: — Sebastian Kirsch @ 13:55

I added CAPTCHA support to my blog now, to curb blog spam. A CAPTCHA is a kind of reverse Turing test: It is intended to allow humans to access a certain function, but keep out automated programs (for example, programs that deposit spam in the comments section of a blog). Nowadays, this is usually done with an image that contains a short word, or some characters, but twisted, distorted and with a distracting background. This is an attempt to foil OCR (optical character recognition) programs.

I used the AuthImage plugin for Wordpress, by Gudlyf. The installation was relatively straight-forward, but I had to make a couple of changes:

  • authimage-inc/image.veriword.php: This file used the tag <? to introduce PHP sections instead of the more common <?php; this prevented my web server from processing it properly.
  • authimage-inc/class.veriword.php: I added cache control headers to the outputImage method to prevent caching of the image.
  • The README.txt file mentions wp-comments.php and wp-comments-popup.php as places where to add the CAPTCHA section, but forgets wp-comments-reply.php.

I also added a short text explaining how to get a new image if you can’t decipher the current one, and that a comment with the wrong code will still appear on the web site; it may simply take a while. This way, people who enter the right code get instant gratification, and those who can’t decipher the image will still get their comment posted. (And the CAPTCHAs from this system can be really hard.)

A short anecdote from the history of CAPTCHAs: A couple of years ago, German email provider web.de had a free SMS gateway. They tried to limit abuse of this system by providing a simple CAPTCHA: An image with a word (without distortion or background noise) that one had to enter in order to send the SMS. It turned out that this system could be defeated with a 60-line shell script, using lynx and the free OCR system gocr. The complete details are here.

Friday, 11 February 2005

Python string handling

Filed under: — Sebastian Kirsch @ 01:30

LUUSA runs the feed aggregator PlanetPlanet! on planet.luusa.org, which also subscribes to my weblog’s feed.

PlanetPlanet is written in Python, and we got bitten by some peculiarities in Python string handling, specifically the conversion between byte strings and unicode strings.

For some reason, it appears that the feed parser puts all parts of the content into byte strings (even if they contain unicode characters), but sometimes, very rarely, constructs unicode strings. These typically contain hyperlinks with, let’s say, “strange” URLs, for example URLs with query strings. In this case, it was the URL http://ithaka.ikp.uni-bonn.de/cgi-bin/lv/view.pl?lvNummer=3919&semDir=winter0405. I haven’t been able to identify the exact cause yet.

When the feed parser tries to merge the byte strings and the unicode strings, an error occurs, and the offending feed is ignored.

I found a very strange workaround for this problem: By converting all unicode strings to byte strings (unicodestring.encode("utf-8")) and all byte strings to unicode (bytestring.decode("utf-8", "ignore")), I was able to make the error disappear. I still don’t know what caused it, and why this method caused it to disappear.
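For illustration, here is a minimal sketch of the general idea of normalizing everything to one string type before joining. It is not the change I made to feedparser. It is written for Python 3, where mixing the two types raises a TypeError instead of triggering the implicit ASCII conversion that Python 2 attempts, and the helper names to_text and join_pieces are made up for this example.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings leniently."""
    if isinstance(value, bytes):
        # errors="ignore" silently drops bytes that are not valid in the
        # given encoding, mirroring the "ignore" argument in the workaround.
        return value.decode(encoding, errors="ignore")
    return value


def join_pieces(pieces):
    """Join a mix of byte strings and text strings into one text string."""
    return "".join(to_text(piece) for piece in pieces)


# Hypothetical feed fragments: two byte strings and one text string.
pieces = [
    b'<a href="http://example.org/view.pl?a=1&b=2">',
    "B\u00fcrger",
    b"</a>",
]
print(join_pieces(pieces))
```

The point of the sketch is only that the conversion happens in one place, at the boundary, so that the joining code never sees mixed types.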

Our version of PlanetPlanet uses feedparser.py 2.7.6 by Mark Pilgrim; the error occurs in the output method of the class BaseHTMLProcessor of feedparser.py. There’s a version 3.3 of feedparser on sourceforge; we’ll have to see whether it’s a drop-in replacement for our version, and whether it fixes the problem.

Mark Pilgrim, the author of feedparser, also has a few choice words to say about Python and unicode:

I had a flash of insight and suddenly the entirety of Python’s Unicode support became clear to me. I coded madly for several hours until it faded. It’s entirely possible that that’s just the LSD talking, but thanks to the magic of open source, everyone can now share in my good trip.

Tuesday, 08 February 2005

Guardian Unlimited on tag-based information sharing

Filed under: — Sebastian Kirsch @ 18:28

The Guardian has an article on tag-based information sharing under the title “Steal this bookmark!” (though I don’t know what tagging has to do with Abbie Hoffman). They write about del.icio.us, Flickr and 43 Things, as well as the tag-tracking studies by Technorati. They also do a good job of describing the problems with tags – namely, that they are a complete nightmare to AI researchers who try to fit the whole world into a carefully constructed, strictly hierarchical, redundancy-free and consistent semantic network. On the other hand, tags seem to work, whereas carefully constructed … etc. don’t.

Monday, 31 January 2005

The Sapir-Whorf hypothesis

Filed under: — Sebastian Kirsch @ 14:50

I learned the correct name today for a theory I have known for a very long time: The concept that your language shapes and influences your thoughts is formally called the Sapir-Whorf hypothesis. I have also heard it attributed to Ludwig Wittgenstein and Ferdinand de Saussure; more information is on Wikipedia.

The Sapir-Whorf hypothesis has had an influence on science as well as literature. George Orwell’s classic “1984” contains a language called Newspeak that is designed to eradicate disloyal thought by abolishing words like “free”. The reasoning is that once you cannot fight for freedom, because the very word does not exist in your language, rebellion becomes pointless. Another example is Samuel R. Delany’s “Babel-17”, where a fictional language denies its speakers independent thoughts and transforms them into traitors and terrorists; it is used as a weapon of war. The notion that certain works lose their meaning when translated to another language (for example, the Quran) is also an example of this principle.

The validity of the Sapir-Whorf hypothesis is under dispute; if it were entirely true, then translation between different languages would be impossible. Chomsky’s quest for a universal grammar that is innate to every human also runs counter to the Sapir-Whorf hypothesis. But neurolinguistic research suggests that certain areas of the brain are indeed influenced by a speaker’s native language.

Leaving the realm of human languages, how does this hypothesis apply to computer programming languages? From a purely functional point of view, all programming languages are equal – they are all Turing complete, which means that anything that can be computed in one of them can be computed in any of the others. The computer does not care which programming language is used.

The reason why there are still hundreds of different languages and language dialects is not the computer, but the human who has to read and write the code. So the fundamental problem in language design is not designing a language that will allow you to do as much as possible, but one that will allow you to think about your problems in the most meaningful way.

The people who invent those languages appear to have been aware of their effect on the human mind for quite some time. Everyone knows Edsger Dijkstra’s quote, “The use of COBOL cripples the mind; its teaching should therefore be regarded as a criminal offense.” And of course the famous “A FORTRAN programmer can write FORTRAN in any language.” The implication is that your first programming language indeed shapes your mind and determines how you write code in subsequently learned languages.

In my experience, the language you use does indeed affect how you think about problems. I started out with imperative programming languages (C, Pascal), went on to object-oriented languages (Modula-3, Delphi, C++), learned functional programming in my first year at university (with Standard ML), and acquired a host of languages on the way (among them Perl, Python, PHP, PostScript, SQL, and a number of “lesser” languages like Bourne Shell or awk). The only thing I never properly learned is a logic programming language like Prolog (though I did attend a lecture about automated theorem proving, which essentially detailed the inference process of Prolog).

Object-oriented programmers (especially those reared on UML etc.) tend to see everything as objects that pass messages to each other; objects with a state, a lifetime, and methods that can be called. While this is suited to most modeling tasks, good object-oriented design for other domains is very difficult, as no readily apparent mapping between parts of the problem and a hierarchy of objects exists. That is one reason, I think, why all the exercises in classes on object-oriented software design are always about modeling some part of the real world, and not about modeling an abstract concept. Tasks like stream processing or symbol manipulation are hard to capture properly in object-oriented models.

Purely functional programming is another extreme; it is extremely well suited for symbol manipulation and stream processing. When I learned it, functional programming inspired a view in my mind of lists and other data structures, flowing through functions and getting mangled and massaged on the way. I solved some exercises for the above mentioned lecture on theorem proving in ML, and the code was exceptionally clear and concise; some theorem provers were less than 20 lines of code. But once imperative elements enter the language, and the need to keep state arises, the functional model becomes rather unwieldy. There are object-oriented functional languages (OCaml being the most prominent example), but I have no experience with them.
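To make the difference in mindset concrete, here is a small sketch in Python (not ML, and not taken from those exercises) that counts words in two styles. The function names are invented for this example. The first version keeps and updates state in a loop; the second lets the words flow through a split, a filter and a fold without modifying anything in place.

```python
from functools import reduce


def count_imperative(text):
    """Imperative style: a loop that updates a dictionary in place."""
    counts = {}
    for word in text.lower().split():
        if word.isalpha():
            counts[word] = counts.get(word, 0) + 1
    return counts


def count_functional(text):
    """Functional style: the words flow through split, filter and a fold."""
    words = filter(str.isalpha, text.lower().split())
    return reduce(
        # Build a new dictionary at every step instead of mutating one.
        lambda counts, word: {**counts, word: counts.get(word, 0) + 1},
        words,
        {},
    )


sample = "The cat saw the other cat"
print(count_imperative(sample))
print(count_functional(sample))
```

Both functions return the same mapping; what differs is how you have to think about the problem while writing them.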

Further evidence that the code you write influences how you think about and formulate algorithms offline is this: I encountered a number of algorithms that were exceptionally difficult to program in a modern imperative or object-oriented language. They were formulated without loops or subroutines, and used jump instructions excessively. The only explanation I can think of is that maybe the authors were used to programming languages that promote this style, like Fortran or assembly language. Writing a clean C++ or ML version of such an algorithm requires extensive reformulation, whereas other algorithms (for example, the ones mentioned above) could be implemented in ML in a minimum of time, straight from the description.

But the fact that different programming models promote a different approach to problems can also be used to your advantage. If you know a number of languages, then you can – barring external requirements – choose the language that makes it easiest to think about your problem. If you are doing symbol manipulation, choose a functional programming language (or a language with substantial functional features, like Python). If the task is modeling, choose an object-oriented language. If you want to query complex data structures, use a logic language like Prolog. Especially in research, many problems can be solved more efficiently by combining a hodge-podge of different languages and using each one in the area where it is best.

For commercial software development, other factors like maintainability tend to be more important, but even there, using an embedded language for configuration or extension can be an advantage.

There are also hybrid languages, like OCaml, which is a functional object-oriented language, or Curry, which is a logic-functional hybrid. Their effect on the programmer’s psyche and problem-solving technique is unclear to me.

Other programming languages claim to be agnostic of the programming model, for example C++. While C++ is mostly associated with object-oriented programming, Stroustrup maintains that other styles like functional programming are also possible in C++. Because of the poor support for features like garbage collection in C++, it does not provide the convenience of other functional programming environments, though.

Tuesday, 25 January 2005

Blogs and search engines

Filed under: — Sebastian Kirsch @ 11:50

Judging from personal experience, responsible behaviour towards search engines is not every site owner’s first priority. But if a sizeable part of your visitors come from Google, you should pay attention to the way a search engine sees your site, and make sure that the visitors who come from a search engine get the optimal results from your site.

How your site is indexed can be controlled in a number of ways; they are described on the robots.txt web site.

For blogs, the most important thing is that the front page and the archive pages should not be indexed; only pages for the individual posts should be indexed. Why is that? Two reasons: What if Google indexes your front page, and serves it as a result two weeks later, when the indexed content has already dropped off the front page? Users will go to the front page, will not find what they are looking for, and go on to the next result. They will not search the archives of your blog in the hope of finding something relevant. And if Google indexes an archive page, it may present this page as a search result because of keywords that are in two different posts. Users will find their keywords, but the individual posts (each with one of the keywords) will not be relevant, so they will move on to the next result. Sure, one more hit for your web site – but in the end, users will disregard your web site, because “there’s never anything relevant on that site.”

So how do you accomplish this? Put the tag <meta name="robots" content="noindex, follow"> on your front page and archive pages, and <meta name="robots" content="index, follow"> on the pages with the individual posts. This will cause the search engines to not add the front page to their index, but still follow all links on the index page to the individual posts.

There is a thread about how to do this with WordPress in their support forum (WordPress is the blogging software I use.) I use the following code in index.php:

        <?php if ($single) { ?>
                <meta name="robots" content="index,follow" />
        <?php } else { ?>
                <meta name="robots" content="noindex,follow" />
        <?php } ?>

Another technique was detailed in a Slashdot article some weeks ago as a way to curb comment spam: Flag all links in comments with the attribute rel="nofollow", so that a search engine won’t follow them, and so that they won’t contribute to a site’s search engine rankings. That way, comment spam will be pointless. I hope there will be a WordPress plugin for this soon.
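As a rough idea of what such a filter has to do, here is a sketch in Python; it is not a WordPress plugin, and the function name add_nofollow is made up. It uses a regular expression for brevity, which is good enough for an illustration but too naive for arbitrary HTML.

```python
import re

# Matches the opening tag of a link, e.g. <a href="...">, but not <abbr>.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(comment_html):
    """Add rel="nofollow" to every link in a comment that has no rel yet."""

    def fix(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            # Simplification: an existing rel attribute is left untouched.
            # A real filter would have to overwrite it, or a spammer could
            # opt out simply by supplying one.
            return match.group(0)
        return "<a" + attrs + ' rel="nofollow">'

    return _A_TAG.sub(fix, comment_html)


comment = 'Nice post! <a href="http://example.org/">my site</a>'
print(add_nofollow(comment))
```

The filter would run when the comment is displayed (or stored), so that links in the posts themselves remain unaffected.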

Bloggers are often accused of polluting the search engine results, and of misleading the search engine’s ranking algorithms because they link so heavily to each other. (Popular search engines like Google use the number of links that point to a web site to determine its popularity.) It’s our choice to do our share and play nicely with the search engines!

RSS Feed

Filed under: — Sebastian Kirsch @ 09:35

I activated the RSS feed for this blog. Have a look at the right-hand column, under “meta”, or at the <meta> tags.

Providing RSS feeds and permalinks requires some trickery with mod_voodoo (a.k.a. mod_rewrite), and I didn’t get around to doing that until now. Sorry!

Tuesday, 18 January 2005

Illustration software

Filed under: — Sebastian Kirsch @ 22:17

Preparing a paper for a seminar, I’m once again faced with the problem of creating illustrations. Just some basic diagrams of graphs, trees, points, lines, arrows, some formulas.

I have a couple of rather simple requirements: The software should produce decent-looking pictures, should provide means of specifying sizes, spaces and alignments in a meaningful way, and should export to something that can be integrated with pdflatex (because I’m writing the paper in LaTeX).

Oh, and I’m cheap too. I know I could buy professional graphics software (like Adobe Illustrator) for something like EUR700, but could I afford it? No. Would I ever use it to its potential? No. My old favourite, CorelDRAW!, is not available for Mac OS X, unfortunately.

After trying a couple of alternatives – like OpenOffice.org Draw, and OmniGraffle, which came with my PowerBook – and asking a couple of friends, I found myself going back to the traditional tools: MetaPost and XFig.

MetaPost is kind of like a programming language for pictures, with the ability to solve systems of linear equations. That way, you almost never have to specify any coordinates – you just say the equivalent of “Well, those two boxes are 2cm apart, and they are evenly spaced, and the center of this box and that one is in one line”, and the layout is done automatically. MetaPost can process your text using LaTeX, so math is no problem, and you can use the exact same fonts as in your main documents. MetaPost output can be included in pdflatex natively, too.

XFig is more of a traditional drawing program. It was one of the first drawing programs available for SunView, written in 1985, and later ported to X11. That means it has been around for about 20 years! The user interface is somewhat strange (for example, it relies heavily on having a three-button mouse), but overall, it’s rather usable. And one very important point is that it allows exporting to MetaPost – having the same graphics format and the same fonts provides some sort of continuity for graphics that were made with different software.

So I use MetaPost for things that are easy to specify algorithmically (tree structures, for example) and XFig for everything else. Both programs are scriptable, so creating a little workflow that recompiles everything when a drawing changes is not too difficult, using a Makefile.

I still wonder whether my choice of software limits me in my graphical expression; after all, good illustrations are an integral part of a good paper, and can serve to explain much better than text or formulas.

Oh, the topic of the seminar is information extraction; I’m reporting on relation extraction using kernel methods. The paper and slides should appear on my academic page as soon as they’re finished.

Sunday, 16 January 2005

Jeff Dean’s behind-the-scenes look on Google

Filed under: — Sebastian Kirsch @ 21:18

In this one-hour lecture (available as streaming video in Windows Media or Real Video format), Jeff Dean talks about Google’s architecture, why they do things the way they do, and the frameworks they developed for their applications. He also gives a few examples of Google’s current and future projects.

Google is usually pretty secretive about the inner workings of their products; apart from the original papers on PageRank and some engineering papers about the Google File System, little is known about their software. On the other hand, the number of successful projects that came out of Google in the last few years is astonishing: Groups, Images, Gmail, Blogger, Froogle, Desktop Search, localized search, Scholar, etc.

Furthermore, these are difficult things to do. Research Index has been trying to do for years what Google Scholar is doing now: An exhaustive search aid for scientific papers, finding every single place where a certain paper is published, extracting authors, dates, references etc. Research Index always struggled because of lack of funding and computing power. For Google, it seems to work quite well. (I find myself using Google Scholar more often than Research Index, because it’s more reliable.)

So what is the secret behind Google’s success? I haven’t been able to work that out. They hire top-notch people (the list of papers by Google employees is very impressive), they have enormous amounts of computing resources (though Google never published any numbers, estimates range from 70,000 to 100,000 CPUs), access to huge datasets (their whole web index, basically), and the employees are encouraged to actually use those resources for their research projects: Every employee gets to devote 20% of his time to research, and apparently it’s not unheard of to requisition a cluster of a couple of thousand CPUs for a research project.

Do I want to work there? Hell yes.

Friday, 07 January 2005

Chaos everywhere

Filed under: — Sebastian Kirsch @ 01:28

There is a common perception that software projects are notable for their rate of failure, but hard data on this is difficult to come by. Prominent examples in Germany are the recent Toll Collect disaster (which delayed a toll for lorries on German motorways for years) and the introduction of the new unemployment insurance (Arbeitslosengeld-II), which caused the bank account data of thousands of people receiving dole to be garbled, resulting in them not getting their money.

Some data can be found in the Standish Group’s CHAOS reports. They report that in 1994, a mere 16% of all surveyed software projects were completed on time and in budget; 31% of the software projects were never completed at all. The figures for 2000 are slightly better; 28% of the projects succeeded and only 23% failed outright. So, all the advances in software design, object-oriented programming, modeling tools, CASE tools, IDEs etc. managed to decrease the failure rate by about a quarter and increase the success rate by three quarters.

Still, this is a pretty dismal track record. If we built houses the way we developed software in 2000, the first woodpecker would still destroy civilisation. For more than two thirds of the software projects, it means a very bumpy ride to completion, requiring either massive cost overruns or severe delays, and if you are already on that track, you only have a two-thirds chance of ever completing the project.
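For anyone who wants to check the arithmetic, here is a short Python sketch that works through the percentages quoted above. The variable and function names are mine; the input figures are the ones cited from the reports.

```python
def relative_change(old, new):
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


# Percentages of surveyed projects, as quoted in this post.
succeeded = {1994: 16, 2000: 28}
failed = {1994: 31, 2000: 23}

success_change = relative_change(succeeded[1994], succeeded[2000])
failure_change = relative_change(failed[1994], failed[2000])

# Projects in 2000 that were not completed on time and in budget ...
troubled_2000 = 100 - succeeded[2000]
# ... and the share of those that were nevertheless completed at all.
completed_share = (troubled_2000 - failed[2000]) / troubled_2000

print(f"success rate change: {success_change:+.0%}")
print(f"failure rate change: {failure_change:+.0%}")
print(f"troubled projects in 2000: {troubled_2000}%")
print(f"of which eventually completed: {completed_share:.0%}")
```

With these inputs, the success rate rises by 75% and the failure rate falls by roughly 26%; 72% of the projects in 2000 were not a clean success, and about 68% of those were completed anyway.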

The reports also contain a detailed analysis of the cause of failure; I think this should be required reading for any IT project manager. Unfortunately, these reports were not part of any of my software engineering classes.

Wednesday, 22 December 2004

Math software

Filed under: — Sebastian Kirsch @ 09:05

On the subject of open source math software – this one isn’t exactly open source, but it’s free as in beer, nonetheless: The original Graphing Calculator from Mac OS 9 is available for free as a beta version for Mac OS X. Looks neat.

Monday, 23 August 2004

When a dynamic linker isn’t that dynamic

Filed under: — Sebastian Kirsch @ 13:15

Another OS X hurdle for Unix geeks: In OS X, there’s no such thing as a dynamic linker search path.

Every library has its full pathname compiled into it.

This means that for all practical intents, you have to compile a library for the correct path. No relocation. If you compile a library for /usr/lib, you cannot install it in /opt/lib. This is mildly irritating. If you compile a library for /usr/lib, install it in /opt/lib, and then link a program against this library, the link editor will cheerfully link the program against the library in /opt/lib. It’s only when you try to run it that you’ll realize that the dynamic linker is looking for the library in /usr/lib. Oops.

There are two ways around this problem, but neither is entirely satisfactory.

The pathname compiled into the library can be relative to the path of the executable that’s linked against it. This allows for libraries belonging to one package residing in /lib, and the executables residing in /bin. The pathname compiled into the library is @executable_path/../lib.
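To show what such a relative pathname means in practice, here is a small Python sketch that mimics how a pathname starting with @executable_path is turned into an absolute one. It only illustrates the principle and is not how dyld is implemented; the function name and the paths in the example (/opt/pkg/bin/tool, libfoo.dylib) are hypothetical.

```python
import posixpath

PREFIX = "@executable_path/"


def resolve_install_name(install_name, executable):
    """Return the absolute path a compiled-in library pathname refers to.

    install_name is the pathname compiled into the library;
    executable is the absolute path of the program being started.
    """
    if install_name.startswith(PREFIX):
        relative = install_name[len(PREFIX):]
        base = posixpath.dirname(executable)
        # Resolve "..", so bin/../lib becomes lib.
        return posixpath.normpath(posixpath.join(base, relative))
    # Absolute pathnames are used exactly as they were compiled in.
    return install_name


print(resolve_install_name("@executable_path/../lib/libfoo.dylib",
                           "/opt/pkg/bin/tool"))
print(resolve_install_name("/usr/lib/libfoo.dylib", "/opt/pkg/bin/tool"))
```

The whole package can then be moved to a different prefix, as long as the bin and lib directories keep their position relative to each other.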

You can also specify the actual pathname of a library using the -dylib_file flag for the compiler. This is roughly similar to the -R linker flag under Linux or Solaris, or -rpath under Irix. But unlike on Linux or Solaris, the -dylib_file flag doesn’t specify a search path (remember, there’s no such thing as a dynamic linker search path in OS X); rather, it specifies the full pathname of the library. You also have to specify the intended (compiled-in) pathname of the library as a second argument to -dylib_file.

Well, it’s not quite true that the dynamic linker doesn’t have a search path. You can set the environment variable DYLD_LIBRARY_PATH (similar to LD_LIBRARY_PATH) to specify a search path.

So in the end, it’s just like every other version of Unix: The dynamic linker doesn’t find your libraries, you give up after a few tries, sigh, and set DYLD_LIBRARY_PATH. Same procedure as last year.

Friday, 20 August 2004

unison; or: Why I don’t like Mac OS X software

Filed under: — Sebastian Kirsch @ 00:18

I recently bought the most wonderful little toy, a PowerBook G4 12″ with all the bells and whistles. I’m a Unix professional, and since it’s almost impossible to find a Unix notebook, the PowerBook seemed to be a good choice. It’s based on a BSDified Mach kernel, with a BSD userland. Having worked with some half-dozen Unix operating systems, I thought that one more or less wouldn’t make a difference.

Well, I was half right. As long as you only look at the Unix part, working with Mac OS X is indeed quite pleasant and familiar.

The problem starts when you realize that there’s more to Mac OS X than the Unix part. For one thing, there’s the original (pre OS X) Mac OS culture, and there are also people who write for Unix, but don’t understand Unix.

This was exemplified when I recently looked for an application to synchronize my files with my desktop computer. (I’m running Linux on the desktop.)

Asking the local OS X gurus didn’t provide helpful answers. Most answers pointed me to tools for synchronizing with other Macs, which isn’t what I wanted to do. (For example, there’s psync, which uses a Perl module called MacOSX::File. Not likely to work under Linux.)

Now what does an old Unix hand think of for file synchronization? rsync, of course. The problem is that rsync only works in one direction: It can create and maintain a mirror of a filesystem very efficiently, but it can’t synchronize two filesystems. One helpful soul directed me to a tool called syncIt! that was mentioned on a Mac news site that very day. Hm, it’s supposed to be based on rsync, so I thought – what kind of magic does the author work to use it for synchronization?

None, as it turned out. The whole “syncIt!” application – apart from the cool name – is just a shell script that does nothing but run rsync twice: first in one direction, then in the other. All nicely packaged with a Cocoa dialog that asks you for the hostname of the remote computer. That’s it. You can’t even tell it to exclude some files, or browse differences, or use any of the myriad other features one might envision for a synchronization tool. You can’t even save the hostname of the remote site. Nothing. It’s just a fricking shell script. So that’s how you get on the front page of a Mac news site.

That did piss me off a bit. At the risk of sounding elitist, where I come from, we don’t create 100s of KB of “application” just for running two commands. We may pass those two commands around and tell people, “That’s how I do it, have a look at it.” But an application? No. That word is reserved for something that actually achieves something, and does not merely repackage a ten-year-old tool.

But there’s hope. Specifically, there’s unison. Supposedly, it’s a real file synchronization tool, written in OCaml, cross-platform (Unix and Windows), fast, and stable. Sounds good. In principle. It could be better.

I tried unison 2.9.1, compiled with OCaml 3.07 from fink.

The first problem is the Mac’s file system. Mac OS X doesn’t use a regular Unix filesystem, but the regular Mac OS filesystem, called HFS+. The Unix part of Mac OS X just sees a POSIX-like view of HFS+. One of the brain damages that HFS+ shares with Windows is that it doesn’t distinguish the case of filenames – at filesystem level! This is quite contrary to the Unix way of treating filenames: Just stuff anything in it you want, as long as it doesn’t contain “/” or a null byte, we don’t care. Unfortunately, Mac OS X does care.

Being used to this woe of the Windows world, unison has a switch called “ignorecase”. This would come in handy, if not for one thing: Once you activate this switch, unison presumes that you are working on a Windows filesystem. And that means that several filenames that are perfectly legal on HFS+ are presumed to be illegal, for example filenames that contain “:”.

unison detects when you try to synchronize such a file, and aborts the whole synchronization process. But it doesn’t tell you the filename of the offending file. You are left to guess which one of your 2GB of files caused the error. Then you start the synchronization again. To find that half-way through those 2GB, there’s another offending filename that you haven’t thought of yet. I still haven’t managed to weed all of those files from my home directory.
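Until unison reports the offending names itself, a small script can at least find likely candidates up front. The following Python sketch walks a directory tree and lists all names containing characters that Windows does not allow in filenames. Whether unison 2.9.1 rejects exactly this set of characters is an assumption on my part, and the function names are made up.

```python
import os

# Characters that Windows does not allow in file names. ("/" cannot occur
# in a Unix file name anyway.) That unison checks for exactly this set is
# an assumption.
WINDOWS_FORBIDDEN = set('<>:"\\|?*')


def is_windows_unsafe(name):
    """True if name contains a character that is illegal on Windows."""
    return any(ch in WINDOWS_FORBIDDEN for ch in name)


def find_unsafe(root):
    """Yield the paths of all files and directories below root with unsafe names."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_windows_unsafe(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in find_unsafe(os.path.expanduser("~")):
        print(path)
```

Running it over the home directory before starting a synchronization should save a few rounds of trial and error.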

Please, dear unison developers – when you provide a switch, make sure that it does what its name says it does. If the name says “ignorecase”, then it should be set to ignore the case of filenames. If it’s called “windowsfilesystem” or “fatfs”, it should accommodate the quirks of Windows.

Once you have managed that, a more graceful way of failing would be nice. unison is interactive anyway, and it’s supposed to change both replicas. So why don’t you provide a way of resolving those conflicts? For example, a way of renaming the offending files before transmitting them? That would be really dandy.

As of this release, unison does not work with filenames that contain accented characters either. I haven’t been able to work out yet whether this is unison’s, OCaml’s or Mac OS X’s fault. Another class of files to weed from your home directory, because they will cause your synchronization to abort half-way through.

UPDATE: It appears that Mac OS X itself blocks filenames that are not valid Unicode (or UTF-8) strings. A small C program verifies that if you call open("b\374rger",…) (“bürger” in ISO-8859-1), you get an “invalid argument” error. open("buerger\xC2\xA9lars",…) (“buerger©lars” in UTF-8) works correctly, though, and also displays correctly in Finder.
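The same two filenames can be checked without touching the filesystem. This Python sketch (my own helper names, not the C program mentioned above) tests whether a raw filename is valid UTF-8 and shows the kind of conversion a name in ISO-8859-1 would need.

```python
def is_valid_utf8(raw_name):
    """True if the raw bytes of a filename form a valid UTF-8 string."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(raw_name):
    """Re-encode a filename from ISO-8859-1 to UTF-8."""
    return raw_name.decode("iso-8859-1").encode("utf-8")


rejected = b"b\374rger"               # ISO-8859-1, same bytes as in the C call
accepted = b"buerger\xC2\xA9lars"     # UTF-8

print(is_valid_utf8(rejected))
print(is_valid_utf8(accepted))
print(latin1_to_utf8(rejected))
```

The catch is that the conversion only works if you know which character set the name was written in; the bytes themselves don’t tell you.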

So unison would have to include character set conversions for synchronizing between different operating systems. This is not a nice prospect.

I still don’t think that an operating system should impose semantics on filenames at system call level.

Still, unison is more powerful than rsync, and I hope I can use it on a daily basis once those quirks have been worked out.

Next I’ll be trying to put together some kind of backup solution. I haven’t found a suitable native OS X application, so I’ll try to cobble something together using amanda and hfstar. Should be fun …


Copyright © 1999--2004 Sebastian Marius Kirsch webmaster@sebastian-kirsch.org , all rights reserved.