GAE/J and JDO/JPA

Thursday, January 28, 2010
In April 2009 Google announced AppEngine for Java, providing support for JDO and JPA. Since then people have had the chance to try it and identify its shortcomings. This has led to many misconceptions about JDO and JPA (though mainly JDO), because people assume that what Google provide is a full and true representation of these standards; it isn't. This blog entry attempts to correct some of these misconceptions, and to suggest some areas where Google could remedy the situation. All of the following items would help in providing a true reflection of these persistence standards and aid GAE/J users in having (much more) portable applications.


JDO/JPA : Lack of support for many methods and operators in JDOQL/JPQL


GAE/J only supports a relatively small subset of the available methods and operators of JDOQL/JPQL. This is because the underlying datastore cannot evaluate certain operations in its queries. The problem is that this gives the impression that JDOQL/JPQL are somehow weak. What would make way more sense would be to evaluate in the datastore everything that can be evaluated there, set a flag while compiling the query if it contains any feature that the datastore cannot handle and, if so, run the DataNucleus in-memory query evaluator on the resultant instances. This would provide a transparent interface to JDOQL/JPQL and mean that people don't have to build custom queries just to get around GAE/J shortcomings.
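
As a rough illustration of that idea, here is a minimal sketch in plain Java. Everything in it is hypothetical (the class names, the `datastoreSupported` flag, and the list standing in for the datastore); it is not the DataNucleus in-memory evaluator or the GAE/J plugin API, and it assumes the filter is a simple AND of clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch only: none of these names come from DataNucleus or GAE/J.
final class HybridQuerySketch {

    /** One AND-ed filter clause, flagged at compile time as datastore-evaluable or not. */
    static final class Clause<T> {
        final Predicate<T> test;
        final boolean datastoreSupported;

        Clause(Predicate<T> test, boolean datastoreSupported) {
            this.test = test;
            this.datastoreSupported = datastoreSupported;
        }
    }

    /** Applies only the clauses whose flag matches {@code datastoreSide}. */
    private static <T> List<T> filter(List<T> input, List<Clause<T>> clauses, boolean datastoreSide) {
        List<T> out = new ArrayList<>();
        for (T candidate : input) {
            boolean ok = true;
            for (Clause<T> c : clauses) {
                if (c.datastoreSupported == datastoreSide && !c.test.test(candidate)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                out.add(candidate);
            }
        }
        return out;
    }

    /** Runs what the datastore can handle, then finishes in memory only if flagged. */
    static <T> List<T> execute(List<T> stored, List<Clause<T>> clauses) {
        // Step 1: the "datastore" (here just a list) evaluates the supported clauses.
        List<T> results = filter(stored, clauses, true);

        // Step 2: flag set while "compiling": is there anything the datastore could not do?
        boolean needsInMemory = false;
        for (Clause<T> c : clauses) {
            if (!c.datastoreSupported) {
                needsInMemory = true;
            }
        }

        // Step 3: only then pay for in-memory evaluation of the remaining clauses.
        return needsInMemory ? filter(results, clauses, false) : results;
    }
}
```

The caller sees one query and one result list; whether part of the filter ran in memory is an implementation detail.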

JDO : No support for input candidate collection for JDOQL


GAE/J does not support passing in a candidate collection and querying over that collection. This is a trivial thing to support, particularly since the code necessary to do it was contributed some time ago, yet it isn't in the current plugin.

JDO : pm.getExtent doesn't handle subclasses


The current pm.getExtent() implementation in GAE/J doesn't support the subclasses flag. The root cause is that the underlying datastore doesn't support a single query to retrieve a class and its subclasses. The simple solution would be to run "n" queries on the datastore, one for each of the possible subclass types, and merge the results. This would be simple to implement, and would provide correct JDO behaviour so users don't see any shortcoming.
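
The "n queries and merge" approach can be sketched as follows. This is an assumption-laden illustration in plain Java: `ExtentSketch`, `queryExactKind` and the map standing in for the datastore are all made up for the example, and a real plugin would take the list of subclasses from its metadata rather than from what happens to be stored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: not the GAE/J plugin or the JDO Extent API.
final class ExtentSketch {

    static class Animal { }
    static class Dog extends Animal { }

    /** Stand-in datastore: one query can only return instances of exactly one kind. */
    private final Map<Class<?>, List<Object>> byExactKind = new LinkedHashMap<>();

    void store(Object o) {
        byExactKind.computeIfAbsent(o.getClass(), k -> new ArrayList<>()).add(o);
    }

    /** The only query the stand-in datastore supports: a single exact kind. */
    List<Object> queryExactKind(Class<?> kind) {
        return byExactKind.getOrDefault(kind, new ArrayList<>());
    }

    /** Emulates the subclasses flag by running one query per matching kind and merging. */
    <T> List<T> getExtent(Class<T> candidate, boolean subclasses) {
        List<T> merged = new ArrayList<>();
        for (Class<?> kind : byExactKind.keySet()) {
            boolean wanted = subclasses
                    ? candidate.isAssignableFrom(kind)   // candidate or any subclass
                    : candidate.equals(kind);            // candidate only
            if (!wanted) {
                continue;
            }
            for (Object o : queryExactKind(kind)) {
                merged.add(candidate.cast(o));
            }
        }
        return merged;
    }
}
```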

JDO/JPA : exposing Google-specific id classes


With GAE/J there is some flexibility on what is allowable as a PK field: Long, String, and "Key". The first two are standard classes and standard JDO. The last is environment specific. Firstly, GAE/J ought to allow short/int/Short/Integer/long for true JDO/JPA operation so that users see no difference. Secondly, the "id" exposed to the user should be a JDO or JPA id and should be portable. When we implemented support for persisting to db4o we didn't expose db4o's id, instead wrapping it internally, so the user has no unexpected classes popping up that prevent their migration elsewhere later on.

JDO/JPA : one entity group per transaction


With GAE/J you have a restriction on the number of "entity groups" that can be enlisted in any transaction. OK, but why expose this to the user and restrict what they do? The logical way to do it would be to have multiple "internal" transactions for a JDO/JPA transaction, each of them covering a particular entity group. Since the underlying datastore doesn't provide ACID transactions anyway, there is little impact in doing this. It would then mean that users don't have to split their persistence code apart just to get it to run, and hence that the code is portable.
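
A minimal sketch of that splitting, in plain Java. The names (`MultiGroupTxSketch`, `Datastore.commitGroup`, the string keys) are invented for illustration and are not the App Engine datastore API; note also that the commit across groups is not atomic, which is the trade-off accepted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: one user-level transaction fanned out into
// one internal transaction per entity group at commit time.
final class MultiGroupTxSketch {

    static final class Write {
        final String entityGroup;
        final String key;
        final String value;

        Write(String entityGroup, String key, String value) {
            this.entityGroup = entityGroup;
            this.key = key;
            this.value = value;
        }
    }

    /** Stand-in datastore: each internal commit may touch a single entity group only. */
    static final class Datastore {
        final Map<String, String> data = new LinkedHashMap<>();
        int internalCommits = 0;

        void commitGroup(String entityGroup, List<Write> writes) {
            for (Write w : writes) {
                if (!w.entityGroup.equals(entityGroup)) {
                    throw new IllegalArgumentException("one entity group per internal transaction");
                }
            }
            for (Write w : writes) {
                data.put(w.key, w.value);
            }
            internalCommits++;
        }
    }

    private final Datastore datastore;
    private final List<Write> pending = new ArrayList<>();

    MultiGroupTxSketch(Datastore datastore) {
        this.datastore = datastore;
    }

    /** The user enlists objects from any entity group without restriction. */
    void put(String entityGroup, String key, String value) {
        pending.add(new Write(entityGroup, key, value));
    }

    /** Groups pending writes by entity group and commits each group separately. */
    void commit() {
        Map<String, List<Write>> byGroup = new LinkedHashMap<>();
        for (Write w : pending) {
            byGroup.computeIfAbsent(w.entityGroup, g -> new ArrayList<>()).add(w);
        }
        for (Map.Entry<String, List<Write>> e : byGroup.entrySet()) {
            datastore.commitGroup(e.getKey(), e.getValue());
        }
        pending.clear();
    }
}
```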

Unowned relations


This term is being used in GAE/J seemingly where you have a Collection<id> and so no real relation, although the ids relate to other objects. This is perfectly representable in JDO/JPA as a Collection<Long>, or Collection<String> or even Collection<Object>. With one of these "relations" the onus is on the user to manage the relation.
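
To make "the onus is on the user" concrete, here is a small plain-Java sketch. `Reader`, `Book` and `resolve` are hypothetical names, persistence annotations are left out, and a map stands in for looking objects up by id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: an "unowned relation" held as a collection of ids.
final class UnownedRelationSketch {

    static final class Book {
        final Long id;
        final String title;

        Book(Long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    static final class Reader {
        // Just ids: the persistence layer sees a Collection<Long>, not a relation.
        final List<Long> bookIds = new ArrayList<>();
    }

    /** The user resolves the ids, and has to cope with ids that no longer exist. */
    static List<Book> resolve(Reader reader, Map<Long, Book> booksById) {
        List<Book> books = new ArrayList<>();
        for (Long id : reader.bookIds) {
            Book b = booksById.get(id);
            if (b != null) {
                books.add(b);   // dangling ids are silently skipped in this sketch
            }
        }
        return books;
    }
}
```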

Support for types persistable as String


DataNucleus has, for some time, provided a mechanism for defining how to persist a type as a String (and retrieve its value from the String) - see ObjectStringConverter in DataNucleus "core" code. GAE/J could easily provide support for this in their plugin (if a type is not natively supported then check if there is an ObjectStringConverter and use that) and this would mean that many more Java types are persistable using AppEngine.
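
The fallback described in the parentheses could look roughly like this. The `Converter` interface below is a local stand-in written for this example and only modelled on the idea of ObjectStringConverter; check the DataNucleus source for the real interface and its exact signatures. The set of "natively supported" types is likewise an assumption.

```java
import java.util.Currency;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: a local stand-in, not the DataNucleus interface.
final class StringConverterSketch {

    /** Converts a type to and from its String form. */
    interface Converter<T> {
        String toString(T value);
        T toObject(String stored);
    }

    /** Example converter for a type assumed not to be natively storable. */
    static final Converter<Currency> CURRENCY = new Converter<Currency>() {
        @Override
        public String toString(Currency value) {
            return value.getCurrencyCode();
        }

        @Override
        public Currency toObject(String stored) {
            return Currency.getInstance(stored);
        }
    };

    private final Map<Class<?>, Converter<?>> converters = new HashMap<>();

    <T> void register(Class<T> type, Converter<T> converter) {
        converters.put(type, converter);
    }

    /** Native types pass through; anything else falls back to a registered converter. */
    @SuppressWarnings("unchecked")
    <T> Object toDatastoreValue(T value) {
        if (value instanceof String || value instanceof Long) {
            return value;   // assumed natively supported for this sketch
        }
        Converter<T> converter = (Converter<T>) converters.get(value.getClass());
        if (converter == null) {
            throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
        }
        return converter.toString(value);
    }
}
```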

Documentation : @Persistent


In the GAE/J docs, every field has @Persistent marked against it. This is totally unnecessary, and you only need @Persistent for a non-standard field type. It leads to people believing that you must specify this to get something persisted, and so when they want to have a field not persisted they just remove the annotation. Please update the docs to reflect the minimal configuration required so we give a fair reflection of JDO and its spec. For example
@PersistenceCapable
public class MyClass
{
    @PrimaryKey
    Long id;

    String name;

    double value;

    ...
}


Package naming "org.datanucleus.*"


This plugin is provided by Google not DataNucleus. It's currently packaged as "org.datanucleus.store.appengine". This leads to people believing that DataNucleus itself is at fault for its shortcomings. This is unacceptable and we own the domains datanucleus.org/datanucleus.com. Please rename your packages ASAP.



Nowhere have we seen any attribute of the GAE/J BigTable datastore that cannot be handled by the JDO or JPA APIs. The JDO API (and metadata) in particular was designed as generic, and there is nothing in a "NoSQL" datastore that should cause it any problems with representation. We challenge anyone to define where there is such a problem area and it can then be addressed (there's a JIRA open on the Apache JDO project for just this situation); if you really can come up with a problem area then it's in all of our interests to understand it and tackle it.

12 comments:

  1. I totally agree! Google needs to put some more effort into App Engine to make it usable.

  2. This comment has been removed by the author.

  3. As you read, DataNucleus did not provide that plugin, and are not responsible for it - Google are. Google provide support on their plugin. DataNucleus (1.1) can be used to provide persistence to Google BigTable using their plugin. The version of DataNucleus needed for use of that plugin is so old that we wouldn't support it anyway.

    In the same way there is also another 3rd party plugin for persistence to MongoDB using DataNucleus; we don't support that either. We "support" (commercially, or otherwise) the plugins that we write.

  4. Thx Andy ... I was just "misled" by the reference to Google BigTable as being supported by DataNucleus AccessPlatform product (which is obviously not the case).

  5. Andy,

    from all that i have read and seen of the datanucleus claims, datanucleus dev(s) have offered a bristly or self-righteous tone which always says "we challenge"

    perhaps just sitting down, and coercing the google starter-efforts into the "correct" datanucleus frame of mind and supporting and standing behind that effort is the real challenge. for such clever and obviously dedicated fellows, how hard could it be?

    you'd certainly pick up a consulting windfall

  6. Glamdring,
    if only you knew ;-) I actually worked with Google to get their plugin to the state it needed to be in April 2009, answering whatever questions they had. During the period April til about Oct 2009 we made plenty of positive suggestions (that would have entailed little work on their part) and in some cases even patches on how it could reach its full potential; these were not taken up.

    The problem Max et al have is (or at least my impression is, from the outside) that Google haven't given them the time to invest in their plugin. It isn't down to capabilities, since they know their datastore better than anyone. Maybe it falls into a common Google pattern of get some code out fast, but then the time taken to get it beyond beta is way too long.

    I only wrote this blog entry after a year of frustrated efforts to get them to provide a fair reflection of the standards, and of DataNucleus. As with any situation, only a certain amount of time spent on encouragement is possible, and beyond that we just have to draw a line under it. So now the DataNucleus docs (2.0, 2.1) make very little mention of GAE/J, and instead other offerings like Cassandra or HBase will get more emphasis, since at least in those areas there are people being positive about the whole thing and trying to reflect things well.

    The same thing has happened with many other people raising issues on GAE/J, and starring the issues in their tracker, yet not getting any action further than that.

    In the meantime JDO will move forward and look at how better to accommodate such "NoSQL" datastores, and maybe learn something from these experiences, and we'll contribute to that.

    Obviously if Google wanted to participate they'd be welcome, and if they'd like help with their plugin they'd also be welcome. But the ball is in their court in that regard. Consulting windfalls are not the primary motivation. at. all.

  7. Really..
    Choosing JPA on GAE/J has been a tragedy for me.. I doubt if any real project can be built on App engine's current JPA support. It's a partial JPA implementation that does not support lots of primitive things.

    Some examples are
    - http://code.google.com/p/datanucleus-appengine/issues/detail?id=86 - Support for unowned relations
    - http://code.google.com/p/datanucleus-appengine/issues/detail?id=20 - Support for @Temporal (support querying based on date/time properties)
    - http://code.google.com/p/datanucleus-appengine/issues/detail?id=210 - JPA entity listener callbacks are not invoked when using Query API

  8. It's not explicitly documented anywhere, so I'm more than a little surprised to get this exception:

    org.datanucleus.store.appengine.query.DatastoreQuery$UnsupportedDatastoreFeatureException ... Unsupported method while parsing expression: InvokeExpression{[PrimaryExpression{ancestorKeys}].isEmpty()}

    Am I doing something incorrect or is this "isEmpty()" operator really an unsupported feature?


    Here's the /J:


    Query query = pm.newQuery(Element.class);
    query.setFilter("domain == domainParam && ancestorKeys.isEmpty()");
    query.declareParameters("String domainParam");

    Running this within Spring 3.0.2.RELEASE, in a JDO callback object, on the appspot machine.

  9. > Am I doing something incorrect or is this "isEmpty()"
    > operator really an unsupported feature?

    If the "ancestorKeys" is a collection or map field type then the syntax is correct. Obviously a question for Google, not DataNucleus. If you look at the source code for their plugin
    http://code.google.com/p/datanucleus-appengine/source/browse/trunk/src/org/datanucleus/store/appengine/query/DatastoreQuery.java#877

    they make no specific reference to "isEmpty"; in fact they only seem to bother about "contains", "startsWith" and "matches"

  10. Thanks for advising google

  11. Is it possible to retrieve a list of objects of any class simply by defining a kind of relationship? For example, I want to get all objects that have a "parent-child" relationship no matter what class they are.

  12. Andy, one feature of Google's Datastore is support for asynchronous/non-blocking database operations, where the result of the operation will be picked up by a later thread [1]. How would this look in JDO?

    cheers,
    David

    [1] http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/AsyncDatastoreService.html
