I've been thinking a lot the last while about how best to reduce the footprint for a "bare" EObject. The most basic implementation we have is BasicEObjectImpl. It's the abstract base class that all modeled objects should extend; we can and do add methods to InternalEObject so extending this ensures binary compatibility. While BasicEObjectImpl declares no fields, it implements all reusable logic in a highly factored way. EObjectImpl extends it and declares fields for the commonly used features, leaving a properties holder field to point at an instance that in turn declares fields for the less commonly used features. Unfortunately, calling eContents() and eCrossReferences() causes this holder to be fluffed up. Of course derived classes can and do specialize that to instead always create a new list---these lists are just views---and thereby avoid this, but the general problem is that once the properties holder is allocated, it sticks around. For example, when you turn an object into a proxy while unloading a resource, it shows up to hold the proxy URI. Let's take closer peek.
Here's a bit of simple math, based on a 32 bit JVM, for the memory footprint of a bare fully fluffed up EObjectImpl. An object with no fields has 8 bytes of heap overhead. Each int field and each non-primitive field has 4 bytes of overhead. As such, EObjectImpl with its 5fields has minimally 8 + 5 x 4 = 28 bytes. Once you call EObject.eAdapters(), the adaper list is fluffed up. It uses a relatively smart implementation that has no backing array until there is at least one element in the list, but still it has 8 + 4 x 4 = 24 bytes of overhead. Now call EObject.eContents() and EObject.eCrossReferences(). Originally it was assumed that not many clients would use these but in actual fact they ended up being exceedingly powerful and heavily used in the famework itself and by clients. After the calls, the properties holder is fluffed up so add 8 + 6 x 4 = 32 bytes along with, for each list, 8 + 3 x 4 = 20 bytes. Recall the lists are views, so they never change their footprint regardless of the size. Add that all up, and we're at 124 bytes. To put that in perspective, that's as much space as used by a 44 character-length String. Of course I've already pointed out that it's trivial to reduce this to 76 bytes by not caching the contents and cross references lists, but that's still a lot of nuts to stuff in those little cheeks.
I've been thinking a lot about what's the best possible thing we could do to reduce this and came up with the design that's prototyped in 252501. If you want to store data on the object itself---maps could be used, but that will make the footprint worse, not better---minimally you'd need at least one field. That field would reference a structure that holds only the necessary fields. You could trivially use an array, but casting is very expensive. You'd be shocked if you measured it. Unfortunately most people don't measure performance of micro operations and simply assume something that looks trivial is also trivially cheap. Wrong. An array is also not ideal for storing primitives. Ideally you'd want something from which you could fetch the data without down casting from Object. In any case, fewer trips to empty out the cheeks would be good.
Imagine instead having a class for each combination of fields you require. You'd need quite a few of them, unfortunately, but that would be space optimal while also avoiding casting. When you needed to set a field, you'd create whatever class instance is required to hold that field along with the other fields already set, or, when unsetting a field, you'd create a smaller instance without that field. The thought of writing all those classes is unbearably stupid, so of course generating them is the order of the day; work smarter, not harder. That way you can easily modify the pattern without the mind numbing tedium of writing 2^6 classes by hand. I'm pretty happy with the result, though not so happy with the 75K it adds to the jar. I need to think if there are ways to reduce the amount of byte code...
Some cool things that came out of this is the fact that the eDeliver flag, i.e., whether or not the notifications will be produced, does not require any storage; the implementation class simply returns true or false from its hard coded method as appropriate. The patch in the bugzilla shows the templated used to generate this and CompactEObjectImpl shows the template's result as well as how it's used to fully implement a modeled object. Note that I hacked DynamicEObjectImpl to use this new base class only so I could run the core test suite to verify CompactEObjectImpl's correct behavior. It's a good feeling to know all is well.
In combination with this storage approach, I've also created a new array-backed list implementation that avoids ever modifying the backing array; it allocates an array of exactly the required size and never modifies it after populating it correctly. Now, instead of caching a list implementation for the eAdapters, I can cache only the array of adapters themselves, which of course is null for an empty list of adapters. Add to that the trick of checking, when the adapters array is replaced with a new one, if that replacement is equal to the one held by the container, and caching the container's instance instead, and we've ensured that a tree of objects, where each object has the same list of adapters, uses a single shared array instance. This is particularly valuable for ChangeRecorder, EContentAdapter, ECrossReferenceAdapter, and any adapter, like UML's CacheAdapter, that uniformly adds itself to the entire tree of objects. Are you all perked up for the final result?
If we do the same calculations for this implementation, a fully fluffed up CompactEObjectImpl takes only 8 + 1 x 4 = 12 bytes of storage. I think that's impossible to beat. I know of a large company that will be very happy with this. Isn't open source grand? If we could just reduce the byte code for all those darned storage subclasses, without paying the cost of casting, I'd be ecstatic. Anyone out there with creative ideas? A contribution would make a nice birthday present. Speaking of which, happy birthday Darin.
Use and Misuse of Statistics
4 weeks ago