7.1 Iterating Over Collections
In computing, there are many different kinds of collections ranging from
simple data structures, such as arrays or linked lists, to more
complex ones, such as red/black trees and priority queues. While the
internal implementation and external characteristics of these data
structures vary widely, the ability to traverse the contents of the
collection is an almost universal need. In the FCL, this is supported
via a pair of interfaces (IEnumerable and
IEnumerator) that allow different data structures
to expose a common traversal API.
7.1.1 IEnumerable and IEnumerator Interfaces
To expose an enumerator, a
collection
implements IEnumerable. This interface allows
clients to retrieve an
enumerator,
typed as an IEnumerator interface reference. The
enumerator is a logical cursor that can be used to iterate over the
elements of the collection in a forward-only manner. The interfaces
look like this:
public interface IEnumerable {
IEnumerator GetEnumerator( );
}
public interface IEnumerator {
bool MoveNext( );
object Current {get;}
void Reset( );
}
The IEnumerable interface has
a single GetEnumerator( )
method that returns an IEnumerator interface
reference, with the enumerator positioned before the first element in
the collection. The IEnumerator interface provides
standard mechanisms (the MoveNext( ) and
Reset( ) methods) to iterate over the collection
and to retrieve the current element (the Current
property) as a generic object reference.
The collection can then be enumerated as follows, or as in the next
code example:
MyCollection mcoll = new MyCollection( );
...
// Using IEnumerator: substitute your typename for XXX
IEnumerator ie = myc.GetEnumerator( );
while (myc.MoveNext( )) {
XXX item = (XXX)myc.Current;
Console.WriteLine(item);
...
}
C# also provides the foreach statement, which
works on any collection that either implements the
IEnumerable interface, or provides an accessible
GetEnumerator( ) method that returns a type with
accessible MoveNext( ) and Reset(
) methods. Using foreach can simplify
iteration code, as follows:
MyCollection mcoll = new MyCollection
...
// Using foreach: substitute your typename for XXX
foreach (XXX item in mcoll) {
Console.WriteLine(item);
...
}
7.1.2 Implementing IEnumerable and IEnumerator
To allow your data types to
be
enumerated via the standard mechanisms, implement
IEnumerable and IEnumerator,
taking care to follow the interface's semantic
contract. IEnumerator is often implemented as a
nested helper type, which is initialized by passing the collection to
the constructor of the IEnumerator:
using System.Collections;
public class MyCollection : IEnumerable {
// ...
int[ ] data;
public virtual IEnumerator GetEnumerator ( ) {
return new MyCollection.Enumerator(this);
}
private class Enumerator : IEnumerator {
MyCollection outer;
int currentIndex = -1;
internal Enumerator(MyCollection outer) {
this.outer = outer;
}
public object Current {
get {
if (currentIndex = = outer.data.Length)
throw new InvalidOperationException( );
return outer.data[currentIndex]; // boxed!
}
}
public bool MoveNext( ) {
if (currentIndex > outer.data.Length)
throw new InvalidOperationException( );
return ++currentIndex < outer.data.Length;
}
public void Reset( ) {
currentIndex = -1;
}
}
}
7.1.3 Optimizing foreach
One of the downsides of implementing IEnumerable
and IEnumerator, as shown in the previous example,
is the return type of IEnumerator.Current is
object. This approach has two main disadvantages:
Clients need to cast the object reference to a
more specific type, which is both inefficient compared to a more
strongly typed interface for homogenous collections, and may also
cause TypeCastExceptions to be thrown if the casts
fail. When a collection contains value types, returning them as object
references results in their being boxed and unboxed, which is
inefficient, creates excess garbage on the heap, and makes it
impossible to edit the contents of the collection directly.
Fortunately, there are some ways to resolve these issues. Although
the foreach statement is designed to work with
types that implement IEnumerable and
IEnumerator, the C# compiler actually allows
foreach iteration over types that merely provide a
type-compatible subset of the methods and semantics in
IEnumerable and IEnumerator. It
does this by looking first at the type used for the collection in the
foreach statement to see if it has an accessible
method called GetEnumerator( ) that takes no
parameters, and then returns a publicly accessible type. If it does,
it then looks at the return type to see if it has loosely equivalent
functionality to the IEnumerator interface. To do
this, the compiler checks whether the type has an accessible
MoveNext method (taking no parameters and
returning a bool result) and an accessible
Current property (matching the type of the
individual element in the foreach statement). If
all of these requirements are met, the C# compiler emits IL code to
use these types and methods directly, without going through the
interfaces. This means that client code does not need to downcast
from object, and in the case of collections
returning value types, no boxing occurs. Modifying the previous
example to work this way looks like the following:
using System.Collections;
public class MyCollection {
// ...
int[ ] data;
public Enumerator GetEnumerator ( ) {
return new MyCollection.Enumerator(this);
}
public class Enumerator {
MyCollection outer;
int currentIndex = -1;
internal Enumerator(MyCollection outer) {
this.outer = outer;
}
public int Current {
get {
if (currentIndex = = outer.data.Length)
throw new InvalidOperationException( );
return outer.data[currentIndex]; // no boxing!
}
}
public bool MoveNext( ) {
if (currentIndex > outer.data.Length)
throw new InvalidOperationException( );
return ++currentIndex < outer.data.Length;
}
}
}
The problem with this approach is that when used in languages other
than C#, this type does not appear to be a collection since it does
not implement the standard interfaces. Fortunately, explicit
interface implementation can be used to support both approaches
concurrently, as follows:
using System.Collections;
public class MyCollection : IEnumerable {
// ...
int[ ] data;
public Enumerator GetEnumerator( ) {
return new MyCollection.Enumerator(this);
}
IEnumerator IEnumerable.GetEnumerator( ) {
return this.GetEnumerator;
}
public class Enumerator : IEnumerator {
MyCollection outer;
int currentIndex = -1;
internal Enumerator(MyCollection outer) {
this.outer = outer;
}
public int Current {
get {
if (currentIndex = = outer.data.Length)
throw new InvalidOperationException( );
return outer.data[currentIndex];
}
}
public bool MoveNext( ) {
if (currentIndex > outer.data.Length)
throw new InvalidOperationException( );
return ++currentIndex < outer.data.Length;
}
object IEnumerator.Current {
get { return this.Current; }
}
bool IEnumerator.MoveNext( ) {
return this.MoveNext( );
}
void IEnumerator.Reset( ) {
currentIndex = -1;
}
}
}
This approach allows C# clients to bind to the more efficient,
type-safe version of the methods, while other languages such as
VB.NET bind to the more generic interface-based implementations.
7.1.4 IDictionaryEnumerator Interface
Later in this chapter we discuss the use of dictionary data
structures. The IDictionaryEnumerator interface is
a standardized interface used to enumerate over the contents of a
dictionary, in which each element has both a key and a value. The
Entry property is a more type-safe version of the
Current property, and the Key
and Value properties provide access to the
element
keys and values directly. The interface looks like this:
public interface IDictionaryEnumerator : IEnumerator {
DictionaryEntry Entry {get;}
object Key {get;}
object Value {get;}
}
|