Jamal's Professional Blog

Friday, 27 June 2008

String.Empty vs. ""

Throughout my years with MS.NET I have occasionally been asked about the difference between C#'s String.Empty and the empty string literal (""). The question also comes up whenever I write C# and .NET coding-style documents, and it is always a challenge because everybody has their own opinion, experience, and taste.
A) string s1 = String.Empty;
B) string s2 = "";
Here’s some advice which might come in handy:
Performance
String.Empty is a public static read-only field which is initialised by the static constructor of the String class. The simplified MSIL generated for statement A is ldsfld String.Empty; this pushes the value of that static field (a reference to the empty string) onto the evaluation stack of your executing method, after invoking the static constructor of the class if it has not already run. It is a 5-byte instruction. Note that the String class assigns the empty literal ("") to the String.Empty field only once, on first access to the class, and this is no big deal; the resulting reference to the constant "" value (from the string pool - see below) is then reused for the lifetime of your application domain, whatever your Windows process or hosting environment is.
On the other hand, C# statement B results in the MSIL instruction ldstr "", again a 5-byte instruction. ldstr pushes a reference to the literal onto the evaluation stack, first adding the literal to the AppDomain's internal string pool (the CLR-internal GlobalStringLiteralMap C++ class) if it is not already in the map. The reason is obvious: more efficient memory usage by sharing string literals in memory - a technique called string interning.
Comparing the two alternatives, statement A is straightforward in the CLR implementation, which makes it simple and fast, whilst statement B goes through the overhead of checking the AppDomain's string pool (an internal hash table), which brings a very small performance penalty. BUT:
  1. The differences between A and B exist at JIT-compile time, and the JIT compiler eventually generates nearly the same result in either case; this also implies there is absolutely no performance penalty when your assembly [not the framework assemblies] is NGENed (using ngen.exe, for example)
  2. During JIT compilation of your executing method there is a very small (nearly zero) performance penalty, which is paid only once. In other words, worrying about performance differences between statements A and B is pointless.
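To make the string pool a bit more tangible, here is a minimal sketch. The class and method names are mine and purely illustrative; the only framework call it relies on is String.Intern( ). It shows that statements A and B yield equal values, and that a string built at run time is a separate object until it is looked up in the pool.

```csharp
using System;

// Illustrative names only; nothing here is part of the framework
// except String.Empty and String.Intern.
public static class EmptyStringSketch
{
    // Statement A compiles to ldsfld, statement B to ldstr;
    // either way the resulting values are equal.
    public static bool SameValue()
    {
        string s1 = String.Empty; // A
        string s2 = "";           // B
        return s1 == s2 && s1.Length == 0 && s2.Length == 0;
    }

    // A string built at run time is a distinct object from the literal,
    // but String.Intern maps both to one pooled instance.
    public static bool InternReturnsPooledInstance()
    {
        string built = new string(new char[] { 'a', 'b', 'c' });
        string literal = "abc";
        return !object.ReferenceEquals(built, literal)
            && object.ReferenceEquals(String.Intern(built), String.Intern(literal));
    }

    public static void Main()
    {
        Console.WriteLine(SameValue());                   // True
        Console.WriteLine(InternReturnsPooledInstance()); // True
    }
}
```

Whichever form you pick, the comparison results are the same; the difference stays in how the reference is obtained.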
Coding Style
Well, whenever it comes to developing an effective coding-style document, I've realised we're not talking only about technology; aesthetics and human nature are involved too! I suppose we all agree that increasing readability, at least to reduce maintenance costs, is a generally accepted coding-style design measure; this may hardly become a baseline for coding-style challenges.
What I'll say here is only my personal opinion and favourite syntax, supported by 16+ years of extensive coding effort - yet it does not necessarily reflect community preference: the String.Empty syntax usually results in higher readability because it is more distinct than the "" syntax. I'll definitely go for the first one, as I feel more comfortable with it when reviewing and skimming someone else’s code. The "" syntax is not visually clear enough [it needs more attention to see whether or not it contains spaces, which can tire the eyes in very lengthy code]. It is also unclear in non-fixed-width fonts where the number of spaces matters [in very rare cases – like when using spaces for print or control alignment]. This is because these fonts usually have narrow space characters, which are not appropriate for extensive coding.


Thursday, 29 May 2008

C# Covariance and Contravariance

Overview
As object-oriented programmers we're all familiar with the concepts of covariance and contravariance; however, not all programmers are comfortable with such industry-standard terms.

Let's review what they stand for. Take a look at the following code fragment:

class Vehicle
{
}
class Car : Vehicle //a derived class
{
}
void StartEngine( Vehicle v ); //simple method

Contravariance
You surely know that we may call the StartEngine( ) method with instances of the Car class. It makes sense, as we expect the compiler to implicitly up-cast the Car instance to Vehicle and then invoke StartEngine( ). Yes, this very simple object-oriented technique is called "Contravariance": a parameter declared with a base type accepts arguments of any derived type. Contravariance is supported in almost all object-oriented programming languages. In C# 2.0, delegates support contravariance too (a method taking a Vehicle can be bound to a delegate declared to take a Car), so that programmers see delegate invocations a bit more like normal method calls. Honestly, I think the C# designers should have done this in the first version of the compiler, especially considering that the implementation didn't require atypical CLR support.

Covariance
And take a look at this method: Vehicle GetMostExpensive( );
It also makes sense to C# programmers that this method may return an instance of Car as its return value. This is simply called "Covariance." Covariance is supported in most OOP languages. In C# 2.0, covariance is supported by delegates in the same way we expect from normal method calls: a method returning Car can be bound to a delegate declared to return Vehicle. Again, it could have been available in the first version of the C# compiler.
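Here is a small sketch of both delegate features as C# 2.0 offers them. The delegate types VehicleFactory and CarHandler and the helper methods are hypothetical names I made up for illustration; only the method-group conversions themselves are language features.

```csharp
using System;

public class Vehicle { }
public class Car : Vehicle { }

// Hypothetical delegate types, named here only for illustration.
public delegate Vehicle VehicleFactory();
public delegate void CarHandler(Car car);

public static class DelegateVarianceSketch
{
    private static int startedCount;

    private static Car BuildCar() { return new Car(); }
    private static void StartEngine(Vehicle v) { startedCount++; }

    // Covariance: BuildCar returns Car, the delegate declares Vehicle.
    public static bool CovariantReturn()
    {
        VehicleFactory factory = BuildCar;
        Vehicle v = factory();
        return v is Car;
    }

    // Contravariance: StartEngine takes Vehicle, the delegate declares Car.
    public static bool ContravariantParameter()
    {
        CarHandler handler = StartEngine;
        int before = startedCount;
        handler(new Car());
        return startedCount == before + 1;
    }

    public static void Main()
    {
        Console.WriteLine(CovariantReturn());        // True
        Console.WriteLine(ContravariantParameter()); // True
    }
}
```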

Covariant Arrays
The good news is that C# supports covariant arrays of reference-types. Here we go:

Car[ ] cars = new Car[ 9 ]; //an array of cars
Vehicle[ ] vehicles = cars; //super-type array
object[ ] vehicleObjects = cars; //object array!

The interesting point is that such type casting is 100% supported by the CLR, and therefore the cast itself adds no overhead to the compiled output (.dll or .exe). However, in keeping with OOP constraints, the CLR knows the real instantiated type of the array and prevents storing an instance of an unexpected type (a String, for instance) in the vehicleObjects array - after all, vehicleObjects is a reference to an instance of type Car[ ]. Such attempts result in run-time errors (an ArrayTypeMismatchException).
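The run-time check can be observed with a short sketch like the following. The class and method names are illustrative; the exception type is the one the CLR throws for a mismatched array store.

```csharp
using System;

public class Vehicle { }
public class Car : Vehicle { }

public static class ArrayCovarianceSketch
{
    // The cast creates no new object: both variables refer to one array.
    public static bool SameArrayObject()
    {
        Car[] cars = new Car[9];
        object[] vehicleObjects = cars;
        return object.ReferenceEquals(cars, vehicleObjects);
    }

    // The array is really a Car[ ], so storing a String is rejected at run time.
    public static bool StoringStringFails()
    {
        object[] vehicleObjects = new Car[9];
        try
        {
            vehicleObjects[0] = "not a car";
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SameArrayObject());    // True
        Console.WriteLine(StoringStringFails()); // True
    }
}
```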

But consider how the following code fails during compilation:

int[ ] intArray = new int[ 9 ]; //value-type array
object[ ] objectArray = intArray; //ERROR!

Although the CLR supports direct covariant array casts for reference types, it doesn't support such behaviour for arrays of value types such as int[ ] and DateTime[ ]. Here I agree with the CLR designers who decided not to support such a feature: value-type arrays store their elements inline, so such a simple-looking cast would trigger unintended memory block relocations and the creation of a new array. I personally believe performance and security issues MUST always be visible to developers, to minimise human mistakes. But the main reason is probably that such a cast would yield a new reference, which defeats the whole idea of a cast. Not to mention that casting int[ ] to object[ ] would first require every element to be boxed into an equivalent object reference.
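If you really need an object[ ] out of an int[ ], the conversion has to be spelled out, which keeps the cost visible. A minimal sketch using Array.ConvertAll( ) follows; the BoxAll( ) helper name is my own.

```csharp
using System;

public static class ValueTypeArraySketch
{
    // Explicit conversion: allocates a new array and boxes every element.
    public static object[] BoxAll(int[] source)
    {
        return Array.ConvertAll<int, object>(
            source,
            delegate(int value) { return (object)value; });
    }

    // The result is a copy, so later changes to the source are not seen.
    public static bool IsIndependentCopy()
    {
        int[] intArray = { 1, 2, 3 };
        object[] objectArray = BoxAll(intArray);
        intArray[0] = 42;
        return objectArray.Length == 3 && (int)objectArray[0] == 1;
    }

    public static void Main()
    {
        Console.WriteLine(IsIndependentCopy()); // True
    }
}
```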

Other Covariant Collections
Do you expect the following lines of code to compile?

List<Car> cars = new List<Car>( ); //list of cars
List<Vehicle> vehicles = cars; //convert!

Nope, it doesn't compile! In fact, there are many differences between a simple array and an instance of the List class. An array is directly supported by the CLR, whilst List is a custom class from the CLR's point of view. There are plenty of custom collection classes available out there, and in this respect they all behave the same as the .NET Framework's List class does.

Such collection classes (including the List class) directly or indirectly rely on the CLR's array type and usually contain one or more instances of this primitive array type. Do you really expect List<Car> to be implicitly cast to List<Vehicle>, as we did with the primitive array, where no wrapper class was instantiated? If "yes", how about FileLogger<Car> and FileLogger<Vehicle>: would you still like that cast to happen? There are many scenarios where such casts would drive you crazy. Simply put, X<Car> is not of type X<Vehicle> and hence should not be cast!

Still, you may realise there are scenarios in which consumers should be able to cast X<Car> to X<Vehicle>. Well, those are not casts, in fact. They're simple conversions in which object references are not maintained. Such functionality can easily be supported by adding appropriate methods, including conversion operators. However, the C# compiler doesn't allow implicit/explicit operators to be used to simulate the array conversion (another wise decision), so implementers have to add normal methods, say List<T>.ConvertTo<SuperType>( ).

Are you happy that the implementers of the framework's List class have already added the generic ConvertAll( ) method?! Hey, don't let it fool you: it's a conversion method which helps you "convert" the individual entries of the original list to some other type, enclosed in a new generic list - it's not a cast-like conversion at all! The framework's List implementers actually haven't added a cast-like method, because of the complexity it involves for both implementers and consuming developers. As a class library design best practice, it's usually not a good idea to have read-write collection wrapper classes for casting purposes! Take a look at the code below:

Car[ ] cars = new Car[ 9 ];
Vehicle[ ] vehicles = cars; //implicit cast

if ( cars == vehicles ) { } //it's true

List<Car> cars = new List<Car>( );
List<Vehicle> vehicles = cars.UpCast<Vehicle>( );

if ( cars == vehicles ) { } //it's FALSE!!!

As you see, we've got a conversion (through internal wrapper classes) in which the object references are not maintained, although the internal primitive arrays are the same CLR objects:

List<Car> cars = new List<Car>( );
List<Vehicle> vehicles = cars.UpCast<Vehicle>( );

if ( cars[ 2 ] == vehicles[ 2 ] ) { } //it's true!!!
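For comparison, this is roughly how the existing ConvertAll( ) method behaves. It is a minimal sketch with illustrative class and method names: the entries are the same objects, but the list is a brand-new one, so later changes to the original are not reflected.

```csharp
using System;
using System.Collections.Generic;

public class Vehicle { }
public class Car : Vehicle { }

public static class ConvertAllSketch
{
    // Converts entry by entry into a new List<Vehicle>.
    public static List<Vehicle> ToVehicles(List<Car> cars)
    {
        return cars.ConvertAll<Vehicle>(delegate(Car c) { return c; });
    }

    public static bool NewListSameElements()
    {
        List<Car> cars = new List<Car>();
        cars.Add(new Car());
        List<Vehicle> vehicles = ToVehicles(cars);
        cars.Add(new Car()); // not reflected in the converted list

        return !object.ReferenceEquals(cars, vehicles)
            && object.ReferenceEquals(cars[0], vehicles[0])
            && vehicles.Count == 1;
    }

    public static void Main()
    {
        Console.WriteLine(NewListSameElements()); // True
    }
}
```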

Though it may seem a missing workaround for the casting problem, it was a sound decision not to add such a method to List, as it brings further complexity to the code and eventually introduces complicated bugs. On the other hand, an UpCast( ) method would have to return an internal wrapper object (of either the same or a derived type) which delegates all operations to the original instance, adding performance issues behind the scenes in more complicated cases like this:
Dictionary<List<List<Car>>,List<List<String>>>

When converting to:
Dictionary<List<List<Vehicle>>,List<List<object>>>
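To give an idea of what such a wrapper involves even in the simplest case, here is a rough sketch of a read-only view. UpCastView is entirely hypothetical (nothing like it exists in the framework), and I've deliberately left out write operations, since those are exactly where the trouble starts.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Vehicle { }
public class Car : Vehicle { }

// Hypothetical read-only view; every operation is delegated to the original list.
public sealed class UpCastView<TDerived, TBase> : IEnumerable<TBase>
    where TDerived : class, TBase
{
    private readonly IList<TDerived> inner;

    public UpCastView(IList<TDerived> inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public int Count { get { return inner.Count; } }

    public TBase this[int index] { get { return inner[index]; } }

    public IEnumerator<TBase> GetEnumerator()
    {
        foreach (TDerived item in inner) yield return item;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class UpCastSketch
{
    public static bool ViewTracksOriginal()
    {
        List<Car> cars = new List<Car>();
        cars.Add(new Car());
        UpCastView<Car, Vehicle> vehicles = new UpCastView<Car, Vehicle>(cars);
        cars.Add(new Car()); // visible through the view, unlike a converted copy

        return vehicles.Count == 2
            && object.ReferenceEquals(cars[0], vehicles[0]);
    }

    public static void Main()
    {
        Console.WriteLine(ViewTracksOriginal()); // True
    }
}
```

Every nesting level in a type like the Dictionary above would need its own wrapper of this kind, and each access would pay for the extra indirection.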

Summary

  1. Casting and conversion are different: in a cast, object references are maintained. In other words, casting creates no object; it only provides a new view of the existing object.
  2. The C# language designers were right not to support casting from GenericType<B> to GenericType<A>, as this is wrong from OOP's point of view
  3. Primitive array casts are supported by both the CLR and C#
    e.g.: Car[ ] can be cast to Vehicle[ ] - all the references are maintained and no object construction occurs
  4. C# implicit/explicit operators cannot be used to simulate cast-like conversions between types on the same inheritance hierarchy path
  5. The .NET Framework designers were right not to add cast-like conversion methods to classes such as List, as they have a few performance and maintainability consequences. Still, more advanced developers may demand this from their framework for certain scenarios.


Monday, 28 April 2008

C# vs. C♯

It is pronounced C Sharp, but why is it written C# with the number sign? Why isn’t it written C♯, with the real musical sharp sign?

C# is Microsoft’s widely known object-oriented programming language. The name is taken from the C programming language and Microsoft’s standard programming language, C++ - C plus plus. The latter part (the sharp sign) comes from the world of music; in musical notation, sharp (♯) means higher in pitch by a semitone. Surely you have noticed how the name “C++” depicts a higher-level “C”. And I like the way “C♯” or “C#” does the same; a beautiful mixture of computer software and music building a name for the newborn programming language!

The following is copied from Section 6, Acronyms and abbreviations, of ECMA-334 (4th Edition) – C# Language Specification:
“The name C# is written as the LATIN CAPITAL LETTER C (U+0043) followed by the NUMBER SIGN #(U+0023)”

As you notice, it states that the official name of the language is written C#, with the number sign. The number sign (#) was selected by Microsoft to replace the musical sharp sign, and that's the outcome we’re all now familiar with.

The decision was made because of the technical difficulties in displaying the musical sharp sign (♯), which is not supported by all fonts, browsers, and applications, and also because it is not easily accessible on keyboards. The surprising part is that Microsoft sometimes uses the original name (with the musical sharp sign) in some ads and artwork, for example on Microsoft Visual C# 2003 boxes.


Some Background
C# was publicly announced at Microsoft’s Professional Developers Conference (PDC) in 2000. The lead designer and architect of the C# programming language is the famous Danish software engineer Anders Hejlsberg, who joined Microsoft in 1996 and was very famous for his work on Turbo Pascal and Delphi – Borland’s amazing development tools at the time. Microsoft offered Anders a bonus of $1,000,000 to join Microsoft. One year later, Borland lodged a complaint that Microsoft had hired some of Borland's employees in order to take Borland’s secrets. Fortunately, the story had a happy ending at last, especially for developers like me who were passionate about both parties.

In August 2000, Microsoft, Hewlett-Packard, and Intel submitted the C# programming language to the European Computer Manufacturers Association (ECMA), and ECMA eventually released ECMA-334 – C# Language Specification in December 2001. It was then passed to the International Organization for Standardization (ISO) in 2002, and it became an ISO standard in 2003 – ISO/IEC 23270. Currently the best standard-compliant implementation of standard C# is the Microsoft C# Compiler, which is employed by IDEs such as Microsoft Visual C# and Borland C# Builder.
