Gavin Greig

Anthropomorphic Hungarian

February 12th, 2012 (05:42 pm)
current location: KY16 8SX

There’s a partially justified tendency among software developers to say “Eeeewwww!” when the subject of Hungarian notation – a naming scheme for variables – comes up. It’s fair to say it has a bad name. I don’t like it either, in its most common form; but I said “partially justified” because it isn’t a simple open and shut case.

What people tend to think of as Hungarian when deriding it is Systems Hungarian, where variable names are prefixed with something identifying the type of the variable. This is how you wind up with identifiers like szName or even lpszName, which indicates a long pointer to a zero-terminated string (piece of text) containing a name. As you can imagine, code with a lot of that sort of stuff in it can become a bit tricky to read. When the variable names are dancing before your eyes, you know it’s time to take a break. Let’s not get into what long pointers are, and use szName for our examples from here on.

It’s also a bit fragile, because if you change the underlying type (say from an int to a double, when you realise that you need to store a real number rather than an integer) and you forget to change the variable name at the same time, then your “self-documenting” code is suddenly no longer quite as helpful as it was.
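To make that fragility concrete, here is a minimal C++ sketch. The function and its variable names are hypothetical, invented purely for illustration, and the prefixes (n for an integer, vec for a vector) are just one plausible Systems Hungarian choice.

```cpp
#include <cassert>
#include <vector>

// Hypothetical function written in Systems Hungarian style.
double averageScore(const std::vector<int>& vecScores) {
    assert(!vecScores.empty());  // averaging nothing is not meaningful

    // First written as `int nTotal`. The type was later changed to double so
    // that the division below keeps its fractional part, but the name was
    // never updated, so the "n" prefix now misleads the reader.
    double nTotal = 0.0;
    for (int nScore : vecScores) {
        nTotal += nScore;
    }
    return nTotal / vecScores.size();
}
```

The code still compiles and behaves correctly; only the documentation carried by the name has gone stale, which is exactly why the problem is easy to miss.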

The Systems Hungarian style was used by the Microsoft Windows development team, and will be recognised by anyone who’s had to deal with the Windows API.

That’s not where Hungarian notation began, though. It was originally proposed by Charles Simonyi while he was working at Xerox PARC. Simonyi was born Simonyi Károly in Hungary (Hungarian surnames come before given names), and later moved to Microsoft, where he became Chief Architect. You can read the content of the original, fairly short proposal in the MSDN library.

Simonyi’s version of Hungarian notation, which was used by Microsoft’s applications division, later became known as Applications Hungarian to differentiate it from the Systems Hungarian developed by the Windows development team.

Applications Hungarian takes a slightly different approach to choosing those prefixes. In Applications Hungarian, the prefix is meant to encapsulate some semantic information about the variable – it’s meant to give some idea what the purpose of the variable is, rather than exactly what type it is. For example, Applications Hungarian would prefer strName to szName, because all you need to know for most purposes is that the name is a string rather than needing to know exactly what type of string it is (many different implementations of strings exist for C/C++).

This emphasis on semantics helps to convey more information about the variable and how it is expected to be used than just its type. Another example would be usName: the us prefix indicates that the variable is an unsafe string (probably user input) and that it needs to be checked carefully before it’s used.

Applications Hungarian is also a bit less fragile than Systems Hungarian, because the information its prefixes contain is less likely to be invalidated by changing the underlying type of the information. To use the name example, you could switch to a different string implementation, but you would still have a string containing a Name.
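A rough C++ sketch of how the us and str prefixes might work together follows. The helper functions are hypothetical, and treating str as "already checked" is an assumption made for this example rather than part of any published prefix list.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns an unsafe string ("us") into one that is safe
// to embed in HTML by escaping the characters that matter there.
std::string strFromUs(const std::string& usInput) {
    std::string strResult;
    for (char ch : usInput) {
        switch (ch) {
            case '<': strResult += "&lt;";   break;
            case '>': strResult += "&gt;";   break;
            case '&': strResult += "&amp;";  break;
            case '"': strResult += "&quot;"; break;
            default:  strResult += ch;       break;
        }
    }
    // Sanity check: no raw angle brackets survive the escaping.
    assert(strResult.find('<') == std::string::npos);
    assert(strResult.find('>') == std::string::npos);
    return strResult;
}

// Hypothetical caller: a us-prefixed value never reaches the output directly.
std::string greetingHtml(const std::string& usName) {
    const std::string strName = strFromUs(usName);
    return "<p>Hello, " + strName + "</p>";
}
```

Both variables have the same underlying type, so the compiler cannot tell them apart; the prefix is what makes a line such as `"<p>" + usName` look wrong at a glance.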

However, both Systems Hungarian and Applications Hungarian share the flaw that they can be a bit cryptic and difficult to read, especially with a proliferation of prefixes. Both have fallen out of favour for this reason, and even Microsoft recommend against use of Hungarian notation when coding against the .NET Framework, which was introduced in the early 2000s.

When Insights moved from C++ to .NET development in C# in 2005, we wanted to come up with a naming convention that would comply with Microsoft’s recommendations, but also contribute to the expressiveness of our code. What we came up with was what I’ve decided to call Anthropomorphic Hungarian.

Microsoft’s recommendations for naming variables and parameters – which you can read in more detail if you wish – at their simplest boil down to:

  1. Use Camel Casing
  2. Don’t use Hungarian notation

Strictly speaking, Camel Casing here means lowerCamelCase, in which the first letter in the first word of a compound variable name isn’t capitalised. Its sibling, UpperCamelCase, in which the first letter in the first word is capitalised, is used for other purposes and is referred to as Pascal Case. We’re not discussing the situations where Pascal Case should apply.

Although Hungarian notation is fairly unequivocally deprecated, lowerCamelCase has a couple of significant drawbacks. It tends to de-emphasise the first word. Using the example of lowerCamelCase itself, this is unhelpful because “lower” is actually the most significant word in the name; the one which differentiates lower camel case from upper camel case. Also, what happens when your variable name is a single word and not a compound word? The resulting variable name entirely in lower case looks a bit incongruous and out of place. Small speed bumps like this can have a surprising effect on the readability of code.

Anthropomorphic Hungarian aims to satisfy the Microsoft guidelines, provide some semantic information about variables, and improve the readability of the code.

To do this, Anthropomorphic Hungarian follows these principles:

  1. It uses a very small set of approved prefixes. One of the things that can go wrong with semantic prefix schemes is that the number of prefixes expands to cover more and more unforeseen types of information, and ultimately the scheme conveys less information as the list of prefixes becomes too difficult to stay on top of.
  2. Although not quite all Anthropomorphic Hungarian prefixes are made up of English words, English words are strongly preferred; sometimes the sort of words you might use if you were trying to explain the code in speech. This increases the readability of the code.
  3. The prefixes should encourage developers to make the rest of the name useful, by starting with English words that can lead into something more expressive.
  4. The prefixes must be short. They should add meaning and ease of reading to the code without being too onerous to type, or taking up too much space.
  5. The prefixes must not contain primary information about the purpose of the variable; they’re a way of recording a small amount of useful, but secondary, metadata. As a result, it doesn’t matter that lowerCamelCase tends to de-emphasise the prefix.
  6. The prefixes are largely, though not exclusively, concerned with scope – where the variable is declared and therefore where it can be used. This is a usage of the older forms of Hungarian that developers are notably reluctant to give up with regard to prefixes for class member variables, and it is also useful in other circumstances.
  7. Classes and interfaces are anthropomorphised. This is the most distinctive feature of Anthropomorphic Hungarian, and the source of the name.

Without further ado, here are Anthropomorphic Hungarian prefixes as used at Insights. As you’ll see, most prefixes are only two or three characters, with a rare maximum of four:

Anthropomorphic Hungarian Prefixes

| Prefix | Meaning |
| --- | --- |
| the | A local variable within a method. |
| in | A parameter that is only passed into the method. |
| my | A member variable of a class. It’s “my” variable from the point of view of the anthropomorphised class. |
| a/an | A member of a collection or a loop control variable (often these are one and the same thing). |
| ui/ux | A member variable of a class that represents a user interface control. We agreed on ui, although my personal preference is ux. This is an example where the prefix is not an English word. |
| ICan | Not for a variable in this case. There are two possible descriptions of the service that an interface provides to a class that implements it. This one, which is usually preferable, is for an interface that provides a particular behaviour. |
| IAmA/IAmAn | Sometimes an interface is defined in such a way that it’s more like a base class than a description of a service. This prefix will be more appropriate for those cases. |

The prefixes for interfaces meet the Microsoft recommendation that the names of interfaces should begin with “I”, but encourage more expressive interface names. When a class implements an interface or interfaces, the interface names in its declaration form a positive statement in English on the part of the anthropomorphised class as to what contracts it satisfies.
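To show how the prefixes listed above read together, here is a small sketch. The convention was devised for C#; it is written in C++ here only to keep the example self-contained, with abstract classes standing in for interfaces. Every class and member name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// "ICan": an interface that provides a particular behaviour.
class ICanTotal {
public:
    virtual ~ICanTotal() = default;
    virtual double total() const = 0;
};

// "IAmA": an interface that is closer to a base class.
class IAmADocument {
public:
    virtual ~IAmADocument() = default;
    virtual std::string title() const = 0;
};

// Read aloud, the declaration says: "Invoice: I am a document, I can total."
class Invoice : public IAmADocument, public ICanTotal {
public:
    explicit Invoice(const std::string& inTitle) : myTitle(inTitle) {}

    void addLine(double inAmount) {
        assert(inAmount >= 0.0);  // this sketch assumes non-negative amounts
        myAmounts.push_back(inAmount);
    }

    std::string title() const override { return myTitle; }

    double total() const override {
        double theTotal = 0.0;                // "the": a local variable
        for (double anAmount : myAmounts) {   // "an": a member of a collection
            theTotal += anAmount;
        }
        return theTotal;
    }

private:
    std::string myTitle;             // "my": member variables of the class
    std::vector<double> myAmounts;
};
```

A caller might write `Invoice theInvoice("March");` followed by `theInvoice.addLine(1.5);`, so the scope of each name stays visible at the point of use.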

Here are a few prefixes that were also agreed at some point, but are seldom or never used:

Seldom Used Prefixes

| Prefix | Meaning |
| --- | --- |
| is | A Boolean value, with the main part of the name expressing the true condition. |
| loop | This prefix was agreed for loop control variables, but in practice a/an has almost always been preferable. |
| out | This prefix was agreed for reference parameters which are only used to pass information out of the method. |
| io | This prefix was agreed for reference parameters which are used to pass information both in and out of the method. |
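For completeness, the seldom-used prefixes might look like this in practice. This is again a hypothetical C++ sketch: C# distinguishes out and ref parameters in the language itself, whereas in C++ both are plain non-const references, so here the prefix is the only thing that records the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function: inValues is only read, outLargest is only written,
// and ioRunningTotal is both read and updated.
void summariseValues(const std::vector<int>& inValues,
                     int& outLargest,
                     int& ioRunningTotal) {
    assert(!inValues.empty());  // outLargest has no meaning for an empty list

    for (std::size_t loopIndex = 0; loopIndex < inValues.size(); ++loopIndex) {
        const int theValue = inValues[loopIndex];

        // "is": the rest of the name states the condition that holds when true.
        const bool isNewLargest = (loopIndex == 0) || (theValue > outLargest);
        if (isNewLargest) {
            outLargest = theValue;
        }
        ioRunningTotal += theValue;
    }
}
```

The index-based loop is used only to show the loop prefix; iterating over the collection directly with an a/an variable would be the more usual choice, as the table notes.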

Much of this isn’t original. “a” has appeared as a popular prefix in Smalltalk for many years, for example, and my thanks go to tobyaw and qidane for introducing me to "the"; but I think that pulling all these together into a fairly tight little convention is novel, and has proven quite successful. It’s not always immediately popular with new developers, but it seems to be something people come round to with a little experience of it.


Posted by: Andrew Patterson (qidane)
Posted at: February 12th, 2012 10:06 pm (UTC)

I like "is" and "loop"; I still use them along with "a" and "the". Not sure they are popular with the others, but at least I try and resist the urge to add them to other people's code to tidy it up.

Posted by: Gavin Greig (ggreig)
Posted at: February 12th, 2012 10:57 pm (UTC)

Yeah, resisting that urge can be hard! I think the reason for "is" being seldom used is that we just don't have all that many Boolean variables. It seems to be more common to have a method call; it's not that we have anything against "is"! There are certainly a few places where it makes sense to have a variable, and in those places "is" is alive and well.

A lot of our loops are controlled by collections, and "a" just seems more natural in that case, so I guess that accounts for the sparsity of "loop" variables.
