A Joke

humor, programming No Comments

Why do computer scientists often confuse Christmas and Halloween?

Because Oct 31 = Dec 25

I’m not going to explain this one. If you don’t get it, you’re not going to find it funny anyway. :-P

Stack Overflow

programming 1 Comment

Stack Overflow is a new site that aims to be the Wikipedia of programming questions and topics. It takes the basic concept of a question/answer messageboard on programming topics, adds in voting and wiki-editing, puts a very good user interface on it, and makes it available for free. The result is an excellent site to find solutions to software development problems of all sorts.

Joel Spolsky is one of the founders; he’s written his own post about the launch.

I’ve been in the beta for about a month now, and have been listening to the podcasts since their inception. I’m really impressed with the quality of the site and how well they’ve achieved their goals. Their aim is a lofty one: be the best site on the entire Internet for finding answers to programming questions. Even at this early stage, I think they’ve already accomplished that; it’s now just a matter of time before Google confirms it.

If you develop software, it’s in your best interest to familiarize yourself with the site and discover its capabilities. As time goes on it could easily become your best resource on the Web for solving tricky problems effectively and efficiently.

By the way, here’s my user page, for your enjoyment.

.NET Generics and Type Inference Landmine

programming 4 Comments

Let’s say that you have a class that takes a generic type parameter:

public class Foo<T> {
  public void accept(T someObject) { }
  public void doSomething() { }
}

It’s difficult to call doSomething() on a collection of Foo<T>s when you don’t know what T is and/or there are multiple types used for T. To work around this, you can create a typeless interface:

public interface FooTypeless {
  void doSomething();
}

The declaration of Foo<T> now becomes:

public class Foo<T> : FooTypeless

Now you can have this method:

public void doSomethingOnAllFoos(
  IEnumerable<FooTypeless> foos) {

  foreach( FooTypeless foo in foos ) {
    foo.doSomething();
  }
}

Also, say that you have the following extension method:

// (Must be declared inside a static class, like any extension method)
public static IEnumerable<T> toEnumerable<T>(this T obj) {
  // Any IEnumerable implementation should work here;
  // but we'll get to that in a bit
  return new T[] { obj };
}

Type inference would let you write this:

Foo<String> aFoo = new Foo<String>();
doSomethingOnAllFoos( aFoo.toEnumerable() );

However, this code does not work; you get a compile error:

cannot convert from ‘System.Collections.Generic.IEnumerable<Foo<string>>’ to ‘System.Collections.Generic.IEnumerable<FooTypeless>’

This occurs even though Foo<string> implements FooTypeless. Even though the generic type arguments are compatible, the constructed IEnumerable types aren’t, at least according to the compiler. (We humans can see that IEnumerable is an interface that could safely be converted, but the existing compiler cannot.)

Now, you can do this:

IEnumerable<FooTypeless> someFoos
  = new Foo<String>[] { foo };

…but not this:

IEnumerable<Foo<String>> someStringFoos
  = new Foo<String>[] { foo };
IEnumerable<FooTypeless> someFoos
  = someStringFoos;

You may be tempted to try casting to work around this. For example, this works:

IEnumerable<Foo<String>> someStringFoos
  = new Foo<String>[] { foo };
IEnumerable<FooTypeless> someFoos
  = (IEnumerable<FooTypeless>) someStringFoos;

However, this only works if someStringFoos is assigned an array. The following does not work:

IEnumerable<Foo<String>> someStringFoos
  = new List<Foo<String>> { foo };
IEnumerable<FooTypeless> someFoos
  = (IEnumerable<FooTypeless>) someStringFoos;

If you try this, you’ll get an InvalidCastException. Furthermore, this generates a compiler error:

IEnumerable<FooTypeless> someFoos
  = new List<Foo<String>> { foo };

Currently I have no idea why the array works but the List doesn’t; it doesn’t make a lot of sense to me.
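
One likely explanation, offered as a sketch rather than a definitive answer: arrays of reference types are covariant in .NET, so a Foo<String>[] can already be treated as a FooTypeless[], and every FooTypeless[] is an IEnumerable<FooTypeless>. List<T> has no such rule. The calls counter and the ArrayCovarianceDemo class below are hypothetical names added only for this demonstration.

```csharp
using System;
using System.Collections.Generic;

public interface FooTypeless {
  void doSomething();
}

public class Foo<T> : FooTypeless {
  public int calls; // hypothetical counter, only here to observe the calls
  public void accept(T someObject) { }
  public void doSomething() { calls++; }
}

public static class ArrayCovarianceDemo {
  // True when a Foo<string>[] can be viewed as a FooTypeless[]
  public static bool ArrayIsTypelessArray() {
    object stringFoos = new Foo<string>[] { new Foo<string>() };
    return stringFoos is FooTypeless[];
  }

  public static int CallAll() {
    Foo<string> foo = new Foo<string>();
    // Array covariance: Foo<string>[] converts implicitly to FooTypeless[] ...
    FooTypeless[] asArray = new Foo<string>[] { foo };
    // ... and any FooTypeless[] is an IEnumerable<FooTypeless>
    IEnumerable<FooTypeless> someFoos = asArray;
    foreach (FooTypeless f in someFoos) {
      f.doSomething();
    }
    return foo.calls;
  }

  public static void Main() {
    Console.WriteLine(ArrayIsTypelessArray());
    Console.WriteLine(CallAll());
  }
}
```

As far as I know, C# 4.0 and .NET 4.0 later declared IEnumerable<T> as covariant, so newer compilers accept the List conversion as well; the behavior described in this post applies to the compilers that came before that.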

I originally discovered this by using type inference with my toEnumerable() extension method. An easy way around these problems is to avoid the type inference completely and explicitly specify the type of IEnumerable you want:

Foo<String> aFoo = new Foo<String>();
doSomethingOnAllFoos( aFoo.toEnumerable<FooTypeless>() );

This works fine, even if toEnumerable() uses a non-array type (such as List) as its IEnumerable implementation. Execution-wise, nothing has changed, but the generic type used in the call can make or break the code.
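
Here is a minimal, self-contained sketch of that workaround, assuming a List-backed toEnumerable(). The calls counter and the class names EnumerableExtensions and ExplicitTypeArgumentDemo are hypothetical and exist only for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public interface FooTypeless {
  void doSomething();
}

public class Foo<T> : FooTypeless {
  public int calls; // hypothetical counter, only here to observe the calls
  public void doSomething() { calls++; }
}

public static class EnumerableExtensions {
  // Deliberately backed by List<T> rather than an array
  public static IEnumerable<T> toEnumerable<T>(this T obj) {
    return new List<T> { obj };
  }
}

public static class ExplicitTypeArgumentDemo {
  public static void doSomethingOnAllFoos(IEnumerable<FooTypeless> foos) {
    foreach (FooTypeless foo in foos) {
      foo.doSomething();
    }
  }

  public static int Run() {
    Foo<string> aFoo = new Foo<string>();
    // T is stated as FooTypeless, so the result is an
    // IEnumerable<FooTypeless> and no conversion is needed afterwards
    doSomethingOnAllFoos(aFoo.toEnumerable<FooTypeless>());
    return aFoo.calls;
  }

  public static void Main() {
    Console.WriteLine(Run());
  }
}
```

The conversion from Foo<string> to FooTypeless happens on the single object being passed in, where it is an ordinary reference conversion, instead of on the collection.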

Stuff like this starts to make Java’s type erasure look not so ugly in comparison. :-P

Stupid .NET Tricks #13

programming No Comments

In Java, making a field “final” means that you can only assign a value to it once. It’s an important part of making a class immutable. It helps to prevent bugs too: make a field final and you’ll get a compiler error if you leave it unassigned or try to reassign it anywhere.

.NET has a similar concept with the “readonly” keyword for fields. However, there’s one important difference compared to Java: a “readonly” field can only be assigned in its declaration or in the class’s constructor, but it can be assigned multiple times within that constructor. The only restriction it places is that the field can’t be reassigned outside of the constructor. You don’t even get a compiler error for not assigning it at all; you only get a compiler warning (which can be turned off).

This has encouraged bugs at least twice in my code: I assigned a value to a field twice within the constructor and then got unexpected results due to the incorrect object being used. One of these was introduced while resolving a conflict during a Subversion merge (and thus it was less than obvious that it had been introduced).
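
A minimal sketch of the kind of mistake this allows. The Account class and its parameter names are hypothetical, not taken from my actual code:

```csharp
using System;

public class Account {
  private readonly string owner;

  public Account(string owner, string fallbackOwner) {
    this.owner = owner;
    // Compiles without complaint: a readonly field may be assigned
    // any number of times inside the constructor, so this second
    // assignment silently replaces the first one.
    this.owner = fallbackOwner;
  }

  public string Owner {
    get { return owner; }
  }

  public static void Main() {
    Console.WriteLine(new Account("alice", "bob").Owner);
  }
}
```

The equivalent Java class with a final field would be rejected at compile time because of the second assignment.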

Stupid .NET Tricks #12

programming 1 Comment

Disclaimer: There may, in fact, be a good reason for the following .NET behavior. However, if there is, it’s certainly not clear to me. I pose a question at the bottom; dear lazyweb, please explain.

.NET’s type inference lets you avoid specifying the type of a generic parameter on a method call some, but not all, of the time. Consider the following infrastructure:

public String getSomeString() {
  return "Some String";
}

public T returnTheValue<T>(T value) {
  return value;
}

public delegate T Factory<T>();

public T createSomeValue<T>(Factory<T> factory) {
  return factory();
}

The following code works fine:

Console.WriteLine(returnTheValue("A String"));

The compiler is smart enough to know that, since you’re passing in a String for T, String is used for T everywhere else in the method (including the return type). It therefore doesn’t require you to explicitly state that returnTheValue is using String for T, like so:

Console.WriteLine(returnTheValue<String>("A String"));

The above code does work, and if you specify an incompatible type (e.g., int), then you get an error. However, the explicit type argument isn’t necessary, thanks to type inference.

The following code also works:

Console.WriteLine( returnTheValue( getSomeString() ) );

There’s no difference between using a literal and calling a method (they’re both dealing with values).

The following does not work:

Console.WriteLine( createSomeValue( getSomeString ) );

Here we’re not passing a value, but passing a method that will create/return a value. In this case, the .NET compiler returns the following error message:

The type arguments for method ‘createSomeValue<T>( Factory<T> )’ cannot be inferred from the usage. Try specifying the type arguments explicitly.

“Specifying the type arguments explicitly” means doing this, which works:

Console.WriteLine( createSomeValue<String>( getSomeString ) );

What I don’t understand is why the type inference works in one case and not the other. We know that getSomeString() satisfies Factory<String>, and thus we know that the return value is a String. If returnTheValue() can infer that its generic type parameter is String based on the String argument, why can’t createSomeValue() infer that its generic type parameter is String based on its Factory<String> argument? It already knows for sure that it’s getting a Factory; it just needs to know what type of Factory.

Is there some inherent limitation to the .NET type inference system that makes it impossible to know this? Or is this simply that the .NET implementors didn’t carry the concept one step further?
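
In the meantime, there are two ways to get the call to compile, sketched below under the assumption that the methods are static members of a hypothetical InferenceDemo class. The first is the explicit type argument shown above. The second wraps the method in a lambda, since the compiler does infer T from the return type of a lambda body; I haven’t checked this against every compiler version.

```csharp
using System;

public delegate T Factory<T>();

public static class InferenceDemo {
  public static string getSomeString() {
    return "Some String";
  }

  public static T createSomeValue<T>(Factory<T> factory) {
    return factory();
  }

  // Workaround 1: state the type argument explicitly
  public static string ViaExplicitTypeArgument() {
    return createSomeValue<string>(getSomeString);
  }

  // Workaround 2: pass a lambda instead of the bare method name;
  // T is inferred as string from the lambda's return value
  public static string ViaLambda() {
    return createSomeValue(() => getSomeString());
  }

  public static void Main() {
    Console.WriteLine(ViaExplicitTypeArgument());
    Console.WriteLine(ViaLambda());
  }
}
```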

.NET, Generics, and Wildcards

programming 7 Comments

Java’s generic system (with all of its type erasure horror) is generally regarded as being bad, but it does have one feature that is very useful: wildcard generic types. The lack of wildcards in .NET trips me up time and time again.

Today, the culprit was Nullable<T>. I want to have a method that will effectively do a ToString() of the Nullable’s value, but return some other string if the Nullable does not contain a value. (I happen to want to make it an extension method, but that’s irrelevant here).

I want to do this:

int? foo = null;
foo.ToStringNullable("No Value"); // returns "No Value"
foo = 42;
foo.ToStringNullable("No Value"); // returns "42";

The code I want to write is this:

public static String ToStringNullable(
  this Nullable<?> nullable,
  String inCaseOfNull) {

  String value;
  if (nullable == null || !nullable.HasValue) {
    value = inCaseOfNull;
  }
  else {
    value = nullable.ToString();
  }
  return value;
}

Note the <?>: this is a Java-ism that does not work in .NET. Java has the wildcard type “?” for generics: it means accept any type. .NET doesn’t have this. Instead, I’d be forced to write:

public static String ToStringNullable<T>(
  this Nullable<T> nullable,
  String inCaseOfNull)
  where T : struct

…which in turn means that, in any call where the compiler doesn’t infer T for me, I’d have to redundantly specify the type of the first parameter, like so:

int? foo = null;
// returns "No Value"
foo.ToStringNullable<int>("No Value");

…which in turn gives me the opportunity to make mistakes like:

int? foo = null;
foo.ToStringNullable<bool>("No Value");

(although this does generate a compiler error, so it’s not so bad — just tiresome).

(Also note: The “where T : struct” at the end is an additional requirement for the use of Nullable; it’s not important to the argument.)
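
For reference, here is a minimal sketch of the generic version as a complete extension method. The class names NullableExtensions and NullableDemo are hypothetical. As far as I can tell, type inference treats the receiver of an extension method like any other value argument, so in a simple call like the one in Describe() the explicit <int> can be left off.

```csharp
using System;

public static class NullableExtensions {
  public static string ToStringNullable<T>(
    this T? nullable,
    string inCaseOfNull)
    where T : struct {

    return nullable.HasValue
      ? nullable.Value.ToString()
      : inCaseOfNull;
  }
}

public static class NullableDemo {
  public static string Describe(int? value) {
    // No <int> here: T is inferred from the int? receiver
    return value.ToStringNullable("No Value");
  }

  public static void Main() {
    Console.WriteLine(Describe(null));
    Console.WriteLine(Describe(42));
  }
}
```

This still requires the method itself to be generic, which is the part a wildcard would let me skip.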

One would legitimately think that you could use <Object> in place of <T>. However, this is not allowed in .NET. The problem is that some, but not all, operations are typesafe when doing this. Consider:

// This is conceptually sound, but problematic
List<Object> aListOfObjects = new List<String>();

// This won't ever work
aListOfObjects.Add( -1 );

// However, there's nothing conceptually wrong with this
aListOfObjects.Contains( -1 );

This unsolved problem is why I didn’t tag this post as a “Stupid .NET Trick.” The .NET language team made a reasonable decision to forgo wildcards so that they could avoid the erasure mess of Java. It was a tradeoff, and probably a good one overall — but that doesn’t mean that it doesn’t cause problems.

I’ve come up with one way around this problem:

  1. Declare an interface
  2. Give this interface a name like [Type]Typeless. For example, if you had a type Foo<T>, you might name your interface FooTypeless
  3. Let your generic-typed class implement the interface (ex: Foo<T> : FooTypeless)
  4. Create signatures (methods, properties) in the interface for all of the members in the implementing class that happen to not use the generic type parameters. (ToString() is an example, although a bad one since it’s inherited from Object. I’ll provide a better example below.)
  5. Whenever you need to access the non-generic-type members of the class, use the interface instead.

Thus, I would do something like this:

public interface NullableTypeless {
  bool HasValue { get; }
  // Not necessary, but provided for example purposes
  String ToString();
}
public struct Nullable<T> : NullableTypeless {}

Note that NullableTypeless does not have any type-specific members such as T Value { get; }.

Then my extension method could be written like so:

public static String ToStringNullable(
  this NullableTypeless nullable,
  String inCaseOfNull ) {

  String value;
  if (nullable == null || !nullable.HasValue) {
    value = inCaseOfNull;
  }
  else {
    value = nullable.ToString();
  }
  return value;
}
// This would work fine:
int? foo = null;
foo.ToStringNullable("No Value");

This works acceptably well in my own code. Unfortunately, since I can’t rewrite .NET, this option isn’t available in the case of Nullable.

Ideally I’d like to see some sort of modifier for methods that says “this method is safe to use if the value supplied to a generic argument is higher up on the inheritance tree than the type that was actually used to create the class”. You could use this on a method-by-method basis. For example, List<T>.Contains(T) would be tagged with it (since it doesn’t matter what the argument is; everything has Equals()) but List<T>.Add(T) would not (since it can only accept members that are the same as or lower down than the List’s inherent type). I don’t think this will ever happen in .NET though.

Calling .NET Return-Type-Based Overloaded Methods

programming 1 Comment

This came dangerously close to a “Stupid .NET Trick”, but I’ve figured out an elegant solution. As far as I can see though, it wasn’t documented on Google, so I decided to post it here.

In Java (and, I suspect, many other languages), a method signature is determined by the method’s name and its parameters, but not by its return type. Thus, you cannot have two methods that differ only by their return type:

public boolean convertTo(String value) { ... }
public int convertTo(String value) { ... }

Here, the method signatures are identical, and thus the compiler sees two duplicate methods, not an overload. The usual way around this is to suffix the return type to the end of the method name.

C# has nearly the same restriction; you cannot write two (simple) methods in the same class that differ only by return type. However, the .NET runtime does not have this restriction. C# (and VB, and presumably the other CLR languages) take advantage of this by allowing overloading based on return types when implementing different interfaces.

For example: Consider these interfaces:

public interface Foo {
    void perform();
}
public interface Bar {
    int perform();
}

This doesn’t work:

public class NoInterfaces {
    public void perform() {
    }
    public int perform() {
      return -1;
    }
}

If you declare a class that implements both Foo and Bar, you must provide both perform() methods. C# gives you a way past the signature conflict by allowing you to specify the source interface of the duplicate method (an explicit interface implementation). However, you cannot make both methods public: you must choose at most one (and potentially zero) method to make public, and the others must have no access modifier at all:

public class BothInterfaces : Foo, Bar {
  public void perform() {
  }

  int Bar.perform() {
    return -1;
  }
}

Note that the Bar version of BothInterfaces.perform() (or any other method without an access modifier) is very nearly uncallable, even from within the class itself (that is: it’s even less accessible than a private method). There is a way to call it though, which I’ll discuss below.

To further complicate things: you can in fact declare overloaded public methods with differing return types if they’re declared at different levels of inheritance:

public class ImplementsFoo : Foo {
  public void perform() {
    throw new NotImplementedException();
  }
}

public class ExtendsFooImplementsBar
  : ImplementsFoo, Bar {
  public new int perform() {
    return -1;
  }
}

ExtendsFooImplementsBar gains “public void perform()” by virtue of subclassing ImplementsFoo, and also must declare “public int perform()” when implementing Bar. The trick is that ExtendsFooImplementsBar.perform() must hide (not overload/override) ImplementsFoo.perform() (hence the “new” keyword in the ExtendsFooImplementsBar declaration). The hidden ImplementsFoo.perform() is not accessible by external callers; however it is accessible from within ExtendsFooImplementsBar by calling base.perform().

I’ve seen this crop up in two places:

  1. System.Collections.IEnumerable declares GetEnumerator(), which returns System.Collections.IEnumerator. System.Collections.Generic.IEnumerable<T> declares GetEnumerator(), which returns System.Collections.Generic.IEnumerator<T>. Since System.Collections.Generic.IEnumerable<T> also inherits from System.Collections.IEnumerable, it also must declare System.Collections.IEnumerator GetEnumerator() (without the generic type parameter).

    This isn’t much of a problem, as:

    1. You almost always want to use the generic-supporting version whenever possible.
    2. IEnumerator GetEnumerator() can (and should) almost always simply call IEnumerator<T> GetEnumerator() and return its value (as the generic-supporting IEnumerator is also a non-generic IEnumerator by virtue of the inheritance).
  2. Iesi.Collections.Generic.Set<T> (and its subclasses) inherits from System.Collections.Generic.ICollection<T>, and thus inherits ICollection<T>’s void Add(T). However, it also implements Iesi.Collections.Generic.ISet<T>, which in turn declares bool Add(T).

    Both methods have good reasons for declaring their respective return types. Set’s bool Add() returns true or false depending on whether the value passed in is already in the Set. (Keep in mind that Sets can only contain a particular value once. Knowing whether the Add() call actually added the value can be useful, especially in sorted implementations of Set.) On the other hand, that functionality is not useful in ICollection (as determining the results of the Add() may be meaningless or expensive for other collection implementations), so returning void from Add() is appropriate for the most part. Set definitely should be implementing ICollection for interoperability purposes.

    A problem occurs when you need to access the void version of Set.Add() instead of the bool version. This is most common when using the method with a delegate. Consider this:

    // This is a standard .NET delegate
    public delegate void Action<T>(T target);
    
    public void performOnAll(Action<Foo> action) {
      // Perform some action on multiple objects in
      // some sort of data structure
      // (perhaps the results of a database query?)
    }
    
    List<Foo> aList = new List<Foo>();
    // This works fine
    performOnAll(aList.Add);
    
    HashedSet<Foo> aSet = new HashedSet<Foo>();
    // This doesn't work because HashedSet.Add()
    // returns bool, not void
    performOnAll(aSet.Add);
    
    // Instead, you're forced to do this:
    performOnAll(delegate(Foo foo){
      aSet.Add(foo);
    });
    // (Or use a lambda in .NET 3.5)
    

    Iesi could have avoided this problem by renaming “bool Add()” to something like “bool AddIfPossible()”… but they didn’t. Fortunately, there’s a way around this.

Casting to the Rescue

There’s a relatively simple solution to this complicated problem: cast the reference to one of its superclasses/interfaces and you can access the methods from that interface.

int result;
BothInterfaces bothInterfaces = new BothInterfaces();
// This method returns void and thus
// can't be assigned to result
((Foo)bothInterfaces).perform();
result = ((Bar)bothInterfaces).perform();

ExtendsFooImplementsBar extendsFooImplementsBar
  = new ExtendsFooImplementsBar();
// This method returns void and thus
// can't be assigned to result
((Foo)extendsFooImplementsBar).perform();
result = ((Bar)extendsFooImplementsBar).perform();

All of these calls access the method from the respective interface as expected.

This also works for my Set/delegate problem:

performOnAll( ( (ICollection<Foo>) aSet).Add);
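
The same situation can be sketched without Iesi, assuming the framework’s own HashSet<T> (added in .NET 3.5), whose public Add() returns bool while its ICollection<T>.Add() returns void. The performOnAll() below is a hypothetical stand-in that takes an Action<string> and feeds it three fixed values.

```csharp
using System;
using System.Collections.Generic;

public static class SetAddDemo {
  // Hypothetical stand-in for performOnAll(): applies the action
  // to a few fixed values, one of which is a duplicate
  public static void performOnAll(Action<string> action) {
    foreach (string value in new[] { "a", "b", "a" }) {
      action(value);
    }
  }

  public static int Run() {
    HashSet<string> aSet = new HashSet<string>();

    // performOnAll(aSet.Add);
    // ^ does not compile: HashSet<T>.Add() returns bool, not void

    // Casting selects the void Add() declared by ICollection<T>
    performOnAll(((ICollection<string>) aSet).Add);

    // The duplicate "a" is ignored by the set
    return aSet.Count;
  }

  public static void Main() {
    Console.WriteLine(Run());
  }
}
```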

Lastly, it also allows you to call the otherwise uncallable zero-access-modifier method when implementing multiple interfaces:

public class BothInterfaces : Foo, Bar {
  public void perform() {
  }

  int Bar.perform() {
    return -1;
  }

  public void test() {
    // Can call the void version normally
    perform();
    // Can call Bar's perform via casting
    int result = ( (Bar) this).perform();
  }
}

All in all, it’s a pretty good solution… but understanding the problem it solves is quite sticky.

Stupid .NET Tricks #11

programming No Comments

TextWriter is an abstract class that has heavily-overloaded Write() and WriteLine() methods for all of the more basic types such as Int32, Boolean, and String. The idea is that it takes these values, converts them to character representations (given a specified Encoding), and then outputs the results (usually to a StringBuilder or a Stream). It’s equivalent to Java’s PrintWriter. However, unlike PrintWriter, there’s no underlying output source / OutputStream: you’re free to do whatever you like with the resulting characters.

The stupidity lies in the interface and its accompanying documentation. TextWriter is an abstract class, but the only abstract member is the Encoding property’s get accessor; you get to decide how and when you want the Encoding to be specified. All of the Write() and WriteLine() methods, however, are not abstract: they’re simply overridable. That’s not necessarily bad, but there’s little to no documentation on what any of these 36 methods actually do (specifically, whether they call any of the other Write() methods and thus don’t need to be overridden in any subclass). For example:

  • Write(char[]):
    • “This method does not search the specified String for individual newline characters (hexadecimal 0x000a) and replace them with NewLine.” (Gee thanks. Does it also not cure cancer?)
    • “This default method calls Write and passes the entire character array.” (If it’s passing the entire character array to Write(), doesn’t that mean it’s recursive? I can guess that they mean it passes each member of the array to Write(char), but that’s not what the documentation actually says.)
  • Write(Decimal): “dd” (And that’s all it says.)
  • Write(Double): “The text representation of the specified value is produced by calling ToString.” (But once you do that conversion from Double to String, where does the String go? Is it used anywhere or simply dropped?)
  • Write(String): “This version of Write is equivalent to Write .” (Yes, the URLs are identical. I’m glad to see they understand the Reflexive Property.)
  • WriteLine(char) states that it (effectively) calls Write(char) and then WriteLine(), which is exactly what it should be doing. Unfortunately, none of the other WriteLine() overloads state this; all they say is that they write the converted characters plus the newline (not necessarily calling the appropriate Write() methods). Thus you should be overriding these methods too.
  • Write(char): “This default method does nothing, but derived classes can override the method to provide the appropriate functionality.” (If it does nothing, then why not make it an abstract method and communicate the intent explicitly?)

The documentation for the TextWriter class itself is no help either:

A derived class must minimally implement the Write method to make a useful instance of TextWriter.

(The link points to the list of all the Write() overloads, not one specific method.)

All Microsoft needed to do was:

  1. Make Write(char) abstract, to show that derived classes need to do something with a character
  2. Document that Write(char[]) calls Write(char) for every element in the array
  3. Document that Write(String) calls Write(char[]) after converting the String to a char[]
  4. Document that all of the other overloads call Write(String) after calling .ToString() on the target
  5. Document that all of the WriteLine() methods call their equivalent Write() method and then call WriteLine() to write the newline character.
  6. Leave all the Write() and WriteLine() methods as overridable, to allow for changes in behavior / optimizations.

Then, a TextWriter implementation would only have to override one method (i.e., Write(char)) and reliably get all of the conversion behavior for free. That’s almost exactly what PrintWriter already does (except that it defers to an aggregated Writer/OutputStream to write its characters rather than deferring to its own abstract method). Instead, we get this mess.
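
For illustration, here is a minimal sketch of what such a single-method implementation looks like. CapturingWriter is a hypothetical class, and the sketch assumes that the default Write() overloads really do funnel into Write(char), which is exactly the behavior the documentation fails to promise.

```csharp
using System;
using System.IO;
using System.Text;

public class CapturingWriter : TextWriter {
  private readonly StringBuilder captured = new StringBuilder();

  // The only abstract member of TextWriter
  public override Encoding Encoding {
    get { return Encoding.UTF8; }
  }

  // The one method we would like to be sufficient
  public override void Write(char value) {
    captured.Append(value);
  }

  public string Captured {
    get { return captured.ToString(); }
  }

  public static string Demo() {
    CapturingWriter writer = new CapturingWriter();
    writer.Write("abc");              // Write(String)
    writer.Write(42);                 // Write(Int32)
    writer.Write(new[] { 'x', 'y' }); // Write(char[])
    return writer.Captured;
  }

  public static void Main() {
    Console.WriteLine(Demo());
  }
}
```

If any of the inherited overloads did not end up in Write(char), its output would silently go missing from Captured, and nothing in the documentation would tell you which overload to blame.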

Bonus Points: Write(Object) simply does a .ToString() on the given object, which is reasonable. Most of the other overloads (for Boolean, Double, Int32, Int64, Single, UInt32, UInt64, and probably Decimal) state that they do exactly the same thing… thus making them unnecessary. It’s clear that the .NET API designers were trying to emulate Java’s PrintWriter, which is forced to create the equivalent overloads due to the separation between primitive and Object types. Of course, if you’re implementing your own TextWriter, you still have to override all of the superfluous methods (and don’t forget the WriteLine()s!).

Extra credit: There’s Write(String, Object) to format the Object argument given the formatting String argument (which is a very common way to write out dates and numbers in varying ways). There’s also a Write(String, Object, Object) method that does the exact same thing but with two Object arguments, and then a Write(String, Object, Object, Object) that does the same thing but with three Object arguments. There’s no method that takes four arguments though; I guess they considered that excessive.

Must Have

business, programming No Comments

I saw “24 hours”, “Must Have”, and “Crap”, and immediately thought “software development rush job.” That’s not quite the intent this Indexed card had, but I think it’s still appropriate.

Update: Hrm, I can’t hotlink to the image, and I’m not going to make a copy on my server… so I guess you’ll have to click the link if you want to see the card. That lessens the impact. :-(

Stupid .NET Tricks #10

programming No Comments

Some .NET classes (System.Data.DataSet is the one I’m currently using) define a property called “Namespace”. In VB, “Namespace” (note the capital “N”) is a reserved word used to associate a class with a particular namespace. Since it’s a reserved word, it’s not a valid identifier in VB… meaning that the MS classes that contain the property “Namespace” were not written in VB (at least without some sort of compiler hack).

Currently I’m trying to reimplement a (legacy) class that subclasses DataSet; I want to break the DataSet inheritance and replace it with custom members. Due to the “Namespace” conflict, I can’t do this in VB, although it wouldn’t be a problem in C#, and isn’t a problem for all of the other properties in the old class.

So much for language independence.

Update: I spoke too soon; VB does have a mechanism for specifying identifier names that match keywords: just enclose the identifier in brackets. It’s unfortunate that it’s necessary, but it’s easy enough to implement once you know the trick.
