Where Has All the Notation Gone?

About Hungarian Notation

Hungarian notation is the practice of using prefix characters to convey a variable’s scope and type. Here are some examples:

 sdtmExpires (static, date/time)

 lblnExpires (local, Boolean)

 rblnExpires (parameter by reference, Boolean)

You can adopt your own scheme or use someone else’s, but whatever you do, be consistent.

Somewhere along the way, creating maintainable code became uncool.

When Microsoft introduced the .NET Framework with its strongly typed languages, many developers chose to drop variable prefixing schemes like "Hungarian notation." Hungarian notation gets its name from the fact that the variable names it produces look like something written in a foreign language, and from the fact that its originator, a Microsoft employee named Charles Simonyi, was from Hungary. Interestingly, Microsoft itself was among the first to suggest that Hungarian notation is no longer necessary: its .NET naming guidelines advise against it.

This article suggests a few reasons why you should reconsider using Hungarian notation in your .NET applications. I believe that, as your .NET application matures into its maintenance cycle, you will discover that you need all the help you can get to figure out what that code you wrote six months ago is doing. And you won’t want to hover over every variable in the IDE to get those clues.

The arguments I’ve heard in favor of dropping Hungarian notation include the following:

  • Strongly typed languages don’t let you make the same assignment errors that you can make with a weakly typed language, so keeping track of variable data types is not necessary.
  • The IDE shows you the data type of the variable.
  • Good variable names should describe semantics, not type.
  • It’s hard to read.
  • It is hard to type, and adds extra keystrokes to your coding.
  • It’s hard to keep track of the right prefix to use. You spend more time trying to figure out how to name variables than you do writing code.

All of these arguments are valid to one degree or another, but now that I’ve had to review code from other developers who don’t use Hungarian notation, I have to conclude that, collectively, these arguments don’t hold water. After researching other developers’ opinions on the matter, I’ve concluded that most of the developers who are the most vehement about getting rid of Hungarian notation just didn’t like using it in the first place.

In spite of the current anti-notation climate, I’ve noticed that prefixing schemes are creeping back into code. Unfortunately, these new schemes aren’t standardized, which leads to still more unmaintainable code. For example, some developers have taken to putting an underscore prefix on private variables, but they don’t do it consistently. Or, worse yet, they use a camel-case variable name for the private variable that holds the value of a public property of the same name in Pascal case (like expirationDate versus ExpirationDate). Now, tell me how that is supposed to improve readability or reduce the likelihood of errors?
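To make the case-only naming problem concrete, here is a minimal C# sketch. The class names Subscription and PrefixedSubscription are hypothetical, and the "m" (member) scope prefix is an assumption made for illustration; only "dtm" for date/time comes from the examples in this article.

```csharp
using System;

// Hypothetical class: the backing field and the property differ only by case.
public class Subscription
{
    private DateTime expirationDate;

    public DateTime ExpirationDate
    {
        get { return expirationDate; }
        set { expirationDate = value; }
    }

    public void Extend(int days)
    {
        // Both names compile here; a single Shift keystroke decides whether
        // the field or the property is being touched.
        expirationDate = ExpirationDate.AddDays(days);
    }
}

// The same class with an assumed "m" (member) + "dtm" (date/time) prefix
// on the private field. The public property keeps its plain name.
public class PrefixedSubscription
{
    private DateTime mdtmExpirationDate;

    public DateTime ExpirationDate
    {
        get { return mdtmExpirationDate; }
        set { mdtmExpirationDate = value; }
    }

    public void Extend(int days)
    {
        mdtmExpirationDate = mdtmExpirationDate.AddDays(days);
    }
}

public static class Program
{
    public static void Main()
    {
        PrefixedSubscription sub = new PrefixedSubscription();
        sub.ExpirationDate = new DateTime(2024, 1, 1);
        sub.Extend(30);
        Console.WriteLine(sub.ExpirationDate.ToString("yyyy-MM-dd"));
    }
}
```

Both classes behave identically. The difference is only in how easily a reviewer can tell the field from the property at a glance.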

Another place you still occasionally see Hungarian notation is in control names (like txtUserName for a user name textbox). The same people who sneer at using notation in code will use it on controls. When I’ve asked what makes controls special, I get a blank stare.

In other cases, I’ve seen variables with nearly identical names that refer to completely different declarations requiring very different handling. In one case, the primary property of an object was copied into a string variable whose name was nearly identical to that of the object itself. Consider HttpCookie cookie = Request.Cookies["test"]. Is the "cookie" variable an object or a string? Granted, the code probably wouldn’t compile if you used it incorrectly, but why create the confusion in the first place? During a code review, the code itself has to be clear regardless of how much the compiler does for you. (Actually, I seriously doubt there are many code reviews going on right now in the rush to push out .NET projects.)
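A short sketch of how prefixes could separate the object from its string value follows. It makes several assumptions: System.Net.Cookie stands in for ASP.NET’s HttpCookie so the code runs outside a web request, the CookieDemo class is hypothetical, and the "str" (string) and "obj" (generic object) type prefixes are illustrative choices combined with the "l" (local) scope prefix.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Copies the cookie's value into a string. The parameter keeps a plain
    // name; only the locals carry prefixes.
    public static string CopyValue(Cookie cookie)
    {
        // "l" = local scope, "obj" = assumed generic prefix for objects.
        Cookie lobjCookie = cookie;

        // "str" = assumed prefix for strings, so the copy of the value
        // cannot be mistaken for the cookie object itself.
        string lstrCookie = lobjCookie.Value;

        return lstrCookie;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for Request.Cookies["test"]; the value is made up.
        Cookie lobjCookie = new Cookie("test", "abc123");
        Console.WriteLine(CookieDemo.CopyValue(lobjCookie));
    }
}
```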

Using Hungarian notation is no panacea for these issues, but it does force you to think about what you are doing, and by its very nature it imparts valuable contextual information.

I’m not saying that you can’t go overboard with Hungarian notation. In fact, most of the guidelines I’ve seen do take it too far, which is why I developed my own simple standards for use at our company.

My philosophy on the matter is straightforward: use notation to indicate scope and usage, but don’t try to invent a prefix for everything. I use a specific prefix for intrinsic types, and a generic prefix for all user-defined types. I’ve seen standards that recommend you come up with a new prefix for every type you create, but I think that road leads to madness (and lack of compliance in your development staff).
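As a rough illustration of this philosophy, the sketch below reuses the prefixes from the examples in this article ("s" static, "l" local, "r" parameter by reference, "dtm" date/time, "bln" Boolean). Everything else is an assumption: the Customer and LicenseChecker types are hypothetical, and "obj" is just one possible choice for the generic user-defined-type prefix.

```csharp
using System;

// Hypothetical user-defined type, used only for this sketch.
public class Customer
{
    public string Name = "";
}

public static class LicenseChecker
{
    // "s" = static scope, "dtm" = date/time.
    private static DateTime sdtmExpires = new DateTime(2030, 1, 1);

    // "r" = parameter passed by reference, "bln" = Boolean.
    // The by-value parameters are left unprefixed.
    public static void Check(DateTime now, Customer customer, ref bool rblnExpires)
    {
        // "l" = local scope.
        bool lblnExpires = now >= sdtmExpires;

        // One generic prefix ("obj", an assumption) covers every
        // user-defined type instead of a new prefix per class.
        Customer lobjCustomer = customer;

        if (lblnExpires)
        {
            Console.WriteLine("License expired for " + lobjCustomer.Name);
        }

        rblnExpires = lblnExpires;
    }
}

public static class Program
{
    public static void Main()
    {
        bool lblnExpires = false;
        Customer lobjCustomer = new Customer { Name = "Contoso" };
        LicenseChecker.Check(new DateTime(2031, 6, 1), lobjCustomer, ref lblnExpires);
        Console.WriteLine(lblnExpires);
    }
}
```

Inside Check, a reader can tell from the names alone which value is shared static state, which is a scratch local, and which will be written back to the caller.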

Saying that you don’t need to know the data type of a variable in a strongly typed language is hogwash. I agree that context will frequently give you the clues you need to identify the data type of a variable, but only when literals are involved. If you really want to understand the code you are looking at, Hungarian notation provides invaluable clues.

I do agree that good variable names should describe semantics, but I’d argue that the data type is frequently a critical part of the semantics. If you have two ways to dereference an object, say a key and an identifier, doesn’t it help to know that the identifier is an integer and the key is a string? Or that the key is a GUID? Ultimately, you have to care about the data type of your variables, particularly if a database is involved.

Okay, so how about scoping? Unless you are reviewing code in the development environment where you can quickly jump to the declaration (a trick some developers don’t even know is possible), you can’t reliably determine the origin of a variable in today’s notation-less environment by just looking at it. Again, I’d make the argument that the scope of a variable is a critical part of its semantics. It is certainly a critical part of understanding the proper usage of the variable.

As for Hungarian notation being hard to read, well, I guess I can’t argue that one much. But I suspect that this argument is really about Hungarian notation being harder to type, not harder to read. Many developers do not know how to touch-type, and for the hunt-and-peck crowd, anything that leads to extra typing and unusual combinations of characters is anathema. As long as your prefixing scheme is relatively simple and consistent, code readability actually improves, at least from a comprehension standpoint.

One place that I agree Hungarian notation gets in the way is in public interfaces. For example, I wouldn’t recommend using Hungarian notation to name the columns of a table or the properties of a class. Their scope is implicit, and developers usually do a better job of providing meaningful names that hint at the underlying data type.

I can go either way with parameters. On the one hand, parameters are a form of public interface, so Hungarian notation should be shunned. Besides, the object browser tells you the parameter data types, and IntelliSense gives you the clues you need as you type your code. On the other hand, parameters are effectively locally declared variables, so notation is quite useful once you drop into the body of the code.

Over time, I think you will start to see a return to some kind of variable prefixing scheme. As today’s code moves out of development and into maintenance, the ability to quickly understand the function and purpose of the code will once again become important. I’m afraid that the code I’m seeing now just won’t cut it during future maintenance cycles.