Object Graph Traversal by Recursive Reflection

Every software engineer faces with this problem of traversing an object graph and processing it, at least once in his or her career :). If you are a software engineer and did not have to solve this puzzle yet, believe me that day will eventually come… If you are not a software engineer and still you had to solve this problem please send me the details, I would be very curious about how you got yourself into that situation :))

In my case, I needed to implement a public .net api as part of a public SDK in Github, basically needed to implement my own Serializer, that takes an object as a parameter, traverse its nodes and end nodes (properties) and process these as it traverses them.I put no type, interface or inheritance constraint on the input object, the type of the input object is base type of all base types “object”. That makes the api super flexible and powerful, basically you are telling the client code, send me whatever type of object you want and I am going to process it, but on the other side it is a very bold statement because the object can really be anything, we have no compile time knowledge or run time type constraints of its type, nor do we know its object graph and hierarchical structure. It can be a simple object with simple properties directly under the object root, or it can be a complex object with a depth of many layers (practically infinite) with all kinds of complex /nested/composite properties, these nested complex properties can be reference type (a class) or value type (ie. struct) and there may be a full or partial recursive loops within the object graph, to name a few of the interesting points.

So this is an object traversal/processing problem where I need to visit every intermediate and end node property of the input object and process these properties. The output of this processing logic could not be “log something to command line/logging infrastructure” either, the output generated by processor had to be accumulated and returned to the caller.

I solved this problem by recursive reflection. For more interested, I implemented a depth first traversal with structural and multiple recursion technique. You can have a look at different types of recursion in this good and short wiki article (https://en.wikipedia.org/wiki/Recursion_(computer_science)#Types_of_recursion).

The public facing API is called Flatten. It takes an object as I mentioned before, processes all of its intermediate and end nodes, lets call that function EndNodeProcessor.

The algorithm passes each node it visits to EndNodeProcessor function, in that way the responsibility of traversing the object graph and processing of each node is clearly separated between the recursive algorithm and the EndNodeProcessor function.

EndNodeProcessor’s exact signature depends on what do you want to do with the exact node and what kind of information you need from the algorithm to be passed in, as a minimum you need the nodes to be passed so EndNodeProcessor should take the node as a parameter, in my case I also needed the Type information of the current node, so that’s the second parameter.

The return type of your EndNodeProcessor also defines the return type of the public method that does traversal and node processing. (In my case it is called Flatten).

Lets say EndNodeprocessor takes each visited node as a parameter, processes them and returns an object of type of T back to the recursive algorithm, the algorithm then collects these objects and persists them in a collection. This collection is then returned back from the public api. In my case this collection is Dictionary<string, T>. In my case just an IEnumerable<T> was not enough, I also needed a way to recompose the original object back so I associated each output with a key of string type in a Dictionary <string, T>.

So the signature of EndNodeProcessor is: T EndNodeProcessor (object Node, Type type)

Note that  as mentioned, the Type parameter is something specific to my requirements, as a minimum it needs to take the node as object parameter. The return type T is a pseudo concept, you would need to decide what exact type you want the processor to return and replace the T with that exact type. I specifically left out the actual implementation of the EndNodeProcessor function because what it does with the passed in Node object is specific to the problem you want to solve and is decoupled from the traversal algorithm.

Let’s look at the code:

public static Dictionary<string, T> Flatten(object root)
{

if (root == null) return null;

//Here within the public method we instantiate Dictionary which will hold the output values generated by the NodeProcessor.

Dictionary<key, value> propertyDictionary = new Dictionary<key, value>();

//We create a HashSet to keep track of antecedents during object traversal. This is to detect complete (all the way back to root node) or partial recursive loops (back to an internal node) within object graph and terminate the recursion. If we do not do that  we will end up an infinite loop during recursion, will run out of stack memory and cause a Stack Overflow exception eventually. We like StackOverflow as a web site, we do not want any stack over flows in any other shape or form 🙂

HashSet<object> antecedents = new HashSet<object>(new ObjectReferenceEqualityComparer());

//We will pass the propertyDictionary to the private overloaded recursive method which is also called Flatten, the recursive Flatten method will update this Dictionary by reference. Dictionary is a reference type therefore stored on the heap and so any updates to the dictionary during the recursion will persist after all recursion ends and the private Flatten method returns.

return Flatten(propertyDictionary, root, string.Empty, antecedents);
}

private static bool Flatten(
Dictionary<string, EntityProperty> propertyDictionary,
object current,
string objectPath,
HashSet<object> antecedents)
{

// Recursion Termination Condition: if passed in object is null, return. This heuristically means that we have processed an end node property and we are already at the bottom of the object graph. There is no deeper layer to process.

if (current == null)
{
return true;
}

// If we reach here, that means we are either in the middle of the object graph or we are actually at the deepest layer, at the end node property but we have not yet processed it. We need to make a decision. We first get the type of the current node.

Type type = current.GetType();

// Then we assume we are at an end node property and attempt to process it by passing the current object to our NodeProcessor along with its type.
T output = EndNodeProcessor(current, type);

// If EndNodeProcessor managed to process the current object we infer it was an end node property. This logic could also be put inside the EndNodeProcessor directly.

if (output != null)
{

// We update the passed in PropertyDictionary with the output generated by the NodeProcessor.
propertyDictionary.Add(objectPath, output);

// Recursion Termination Condition: We processed an end node and now we can return from this recursion.

return true;
}

// If we reach this part of the code there are 2 options. Either we are in an intermediate node in the object graph (ie. not an end node property), therefore our EndNodeProcessor returned null. Or we are actually in an end node but the EndNodeProcessor failed to process the request. we need to make a decision. First thing we check is if we are in an intermediate node.

IEnumerable<PropertyInfo> propertyInfos = type.GetProperties();

if (!propertyInfos.Any())
{

// If we are here then the current node does not have any child nodes (properties in our example) of its own then it means that we are actually on an end node property but EndNodeProcessor failed to process that node. This is an unexpected failure so we terminate overall method execution at this point by throwing an exception.
throw new SerializationException(string.Format(CultureInfo.InvariantCulture, UnsupportedPropertyTypeForEntityPropertyConversion, type, objectPath));
}

//Recursive Reference Detection: If we are at this stage in the code we are at an intermediate node in the object graph. We need to make sure we detect recursive loops in the passed in object and avoid going into infinite recursive traversal within this part of the object graph.

bool isAntecedent = false;

// If current type is a value type ie. we are at a struct type intermediate property then there is no danger of recursive reference, recursive reference by definition only happens on reference type s (ie. a child reference type property and its parent are pointing to the same point in memory, therefore creating a circular loop.)

if (!type.IsValueType)
{

// antecedents is a custom HashSet<object> which I initialized with my own EqualityComparer implementation (ObjectReferenceEqualityComparer()).  ObjectReferenceEqualityComparer uses purely the object references to detect object equality and avoids any overloaded GetHashCode() implementations (if any) of input objects. The hashset contains all of the child properties of the current parent property. We update this hashset also recursively as we traverse in the object graph. We add all child nodes under the current node to the hashset as we traverse recursively in this part of the object graph where current node becomes the root.  And we remove all of the nodes from the hashset after we process them.

If any time during this traversal, antecdents.Contains(current) returns true, it means that we saw this reference before within ths sub tree and hence we detected a recursive reference. So we terminate. We could here decide what other things we do with recursive references, may be take an option to alter that (ie. skip those as opposed to terminate the execution etc.). In fact Circular Reference Detection is actually a topic of its own, so I may write up another article just about it to go over that part of the solution
if (antecedents.Contains(current))
{
throw new SerializationException(string.Format(CultureInfo.InvariantCulture, RecursiveReferencedObject, objectPath, type));
}

// we detect that current node is not in our hashset and so we add it to the hashset. We also set isAntecedent flag to true because we are at this stage in the code in an internal node so it is antecedent to child properties.

antecedents.Add(current);
isAntecedent = true;
}

// This is where we apply multiple recursion, we call this private Flatten method on all of the child properties of the current node.

bool success = propertyInfos
.Where(propertyInfo => !ShouldSkip(propertyInfo))
.All(propertyInfo =>
{
return Flatten(
propertyDictionary,
propertyInfo.GetValue(current, index: null),
string.IsNullOrWhiteSpace(objectPath) ? propertyInfo.Name : objectPath + propertyNameDelimiter + propertyInfo.Name,
antecedents);
});

// Once we are here in the code, we completed processing all child nodes (and their child nodes and their child nodes, all the way to their end nodes ) that are under the current property in the object graph. Since we completed the processing all nodes under this sub tree of the object starting from the current node downwards, we can safely remove the current node now from the antecedents hashset, there is no risk of circular reference beyond this ppoint in relatio to the current node.

if (isAntecedent)
{
antecedents.Remove(current);
}

// Here we return this is also the exit point when all intermediate and end node properties under the object root is traversed and we processed the entire object graph successfully.

return success;
}

 

 

 

 

 

 

2 thoughts on “Object Graph Traversal by Recursive Reflection

  1. Dogo, thanks for a great article. I followed most of it, but it looks like there’s a few things missing. The actual EndNodeProcessor function seems missing and the initial Flatten function has OutputTypeOfEndNodeProcessor… is this correct? Any chance you could update the code to remove a few of these issues? Thanks!

    Like

    1. Hi, thanks for the comments. I updated the post and added more detail about the EndNodeProcessor, have a look at the paragraphs just before the code starts. In terms of the points you made, I specifically left out the implementation of EndNodeProcessor because it is specific to the actual problem, for example in my case it converts the input Node object into a specific type that can be saved on Azure Table Storage service, it would be too much and unrelated detail to put that implementation in this post. The OutputTypeOfEndNodeProcessor is also a pseudo concept. It is you who decides that exact output type, in my case the output type was a class from Azure Storage SDK called EntityProperty. Hope this helps.

      Like

Leave a comment