Tuesday, April 7, 2009

EMML Best practices

EMML is a DSL (domain-specific language) for creating data-mashup services. Here are my thoughts on some best practices to follow when architecting EMML-based mashup data services.

EMML Virtual Services
This is a reusable services layer consisting of EMML service wrappers for complex web services and other data sources. It creates high-level facades for complex web-service interfaces and keeps Mashup Services loosely coupled to native Web/Data services. This layer can also help enforce service infrastructure policies, e.g., rate limiting, data caching, service versioning, etc.

EMML Utility Services
The utility service layer contains reusable services that hold no business logic. This avoids redundant copies of common logic in the higher-level Mashup Services layer.
Common candidates for utility services are:
Persistence Service
Web clipping service
Domain-specific authentication services [e.g. Salesforce / Google authentication]
Data validation/transformation service
Notification service
Logging service

EMML Mashup Services
EMML Mashup services contain the core data-mashup logic. Data originating from downstream service layers is combined in interesting ways. Data produced by Mashup services is typically consumed and rendered by Mashlets / Widgets. Since this is a composable service layer, Mashup services can also consume other Mashup services. Depending on the complexity of the mashup logic, these services may be created in the Eclipse-based Mashup Studio or composed visually using Wires.

EMML offers a rich set of declarative primitives for filtering, merging, joining, annotating, sorting and transforming data. A rich set of XPath functions is available for use in these declarative EMML statements. Function categories include comprehensive string, date, number and aggregate functions. Refer to http://www.w3.org/TR/xpath-functions/ for the full list.
In addition to these standard functions, Presto provides the ability to add domain-specific custom functions to the XPath library. Once added, these custom functions can be called just like standard XPath functions. This provides a powerful mechanism for creating a library of reusable logic.

EMML Macros
Macros provide the power to extend the EMML language with domain-specific vocabulary. They are essentially reusable EMML snippets that contain finer-grained logic than EMML utility services. Additionally, Macros may be made available in Wires for performing user actions, so Macros enrich the Wires-based visual style of programming.
Macros can be created at global or package scope. This allows Macro reuse both across and within business organizations.

XPath functions

Most needs for Java-style imperative logic in Mashup services can be eliminated by using standard XPath functions in EMML statements. In addition, custom XPath functions specific to a business domain can be created and added to the Presto XPath library. These domain-specific functions can then be used like any standard XPath function in EMML statements. This provides a very powerful mechanism to encapsulate and reuse logic across Mashup Services. Custom XPath functions are written in Java.

EMML Refactoring
During the development life cycle, programming artifacts (EMML in this instance) typically tend to grow larger. At a certain point, they become incomprehensible and tedious to maintain. It is essential to refactor such large EMML scripts into more modular, reusable EMML services. In a refactoring process, the extracted artifacts may include Utility EMML services, EMML Macros or custom XPath functions.

EMML Policy
By setting appropriate configurations, it is possible to disable the use of SQL, directinvoke or scripting logic in EMML scripts. This is a good way to constrain arbitrary use of specific resources. It can typically be enforced for EMML scripts in the Mashup Services layer. Any complex logic required in the Mashup Services layer can be addressed by leveraging EMML Macros, Utility Services or XPath functions.

EMML Parallel
The Parallel construct in EMML provides a mechanism for executing coarse-grained parallel tasks. The EMML concurrency model should be viewed as a message-passing concurrency model rather than a shared-state, lock-based model. So parallel tasks should typically be used for isolated work that involves no sharing of data between tasks. For example, they may be used to invoke multiple services in parallel and subsequently merge the data returned from those services. Importantly, avoid using the same shared variable across Parallel tasks.
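
The isolation rule is not specific to EMML. As a rough illustration in Python (this is not EMML syntax), the sketch below runs two independent tasks in parallel and merges their results only after both have returned. The `fetch_customers` and `fetch_orders` functions and their data are hypothetical stand-ins for real service invocations.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_customers():
    # Hypothetical stand-in for invoking one service.
    return [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]


def fetch_orders():
    # Hypothetical stand-in for invoking a second, unrelated service.
    return [{"customer_id": 1, "total": 250}, {"customer_id": 2, "total": 75}]


def run_in_parallel(tasks):
    """Run each zero-argument task in its own thread; return results in task order.

    Each task returns its own value. No task writes to a variable that
    another task reads, so no locking is needed.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def merge(customers, orders):
    # The merge happens only after both parallel tasks have finished.
    totals = {order["customer_id"]: order["total"] for order in orders}
    return [dict(customer, total=totals.get(customer["id"])) for customer in customers]


def build_report():
    customers, orders = run_in_parallel([fetch_customers, fetch_orders])
    return merge(customers, orders)
```

The point of the design is that the only place where data from the two tasks meets is `merge`, which runs after the parallel section has ended.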

EMML namespaces
A common error is failing to account for namespaces in XPath expressions used in various EMML statements. Hence, make sure to specify the appropriate namespace prefix/URI in XPath expressions. If you choose to ignore namespaces, a ‘*’ wildcard namespace may be used in the XPath expression.

In a filter expression, a wildcard namespace is specified by using ‘*’ in place of the namespace prefix. In XPath 2.0 syntax this is a name test of the form *:name, which matches elements called name in any namespace.
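
For readers who want to experiment with the idea outside EMML, here is a small Python sketch using the standard library's ElementTree (Python 3.8 or later). ElementTree spells the wildcard as `{*}name`, where XPath 2.0 would use `*:name`. The feed content and the `urn:example:feed` namespace are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document whose elements all live in a default namespace.
FEED = """
<feed xmlns="urn:example:feed">
  <item><title>First</title></item>
  <item><title>Second</title></item>
</feed>
"""

root = ET.fromstring(FEED)

# Without a namespace the path matches nothing: the elements are
# really named {urn:example:feed}item, not item.
unqualified = root.findall("item")

# Option 1: bind a prefix to the namespace URI and use it in the path.
ns = {"f": "urn:example:feed"}
qualified = [title.text for title in root.findall("f:item/f:title", ns)]

# Option 2: ignore namespaces with the {*} wildcard.
wildcard = [title.text for title in root.findall("{*}item/{*}title")]
```

Here `unqualified` comes back empty, which is the typical symptom of the namespace error described above, while the two corrected lookups both find the titles.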

EMML ‘nanny’ scripts
EMML offers a rich declarative vocabulary for doing data mashups. However, it is natural for Java developers to fall back on an imperative programming style when coding data-mashup logic. They need to be educated about the EMML language features and the XPath function library available to them. Scripting logic should be considered only as a last resort. Even then, it is best to create EMML service wrappers that contain such scripting logic, so that higher-level EMML mashup services can reuse it.
In addition, ‘nanny’ EMML scripts can be written to monitor the use of scripting logic in every EMML mashup script. Since EMML is encoded in XML, this EMML-on-EMML approach is feasible.

Such ‘nanny’ scripts would traverse EMML scripts to enforce policy constraints, for example:
check whether script or sql tags are being used
check whether any disallowed Java interfaces/classes are being consumed in a script
check whether any disallowed endpoints are being accessed using directinvoke
check whether any disallowed databases are being accessed using datasource
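
Because an EMML script is an XML document, checks like these can be prototyped with any XML tooling. The Python sketch below walks a script and reports policy violations. It is an illustration only: the `endpoint` and `name` attribute names, the sample script and the `urn:example:emml` namespace are assumptions made for the example rather than the actual EMML schema, and the Java class check is left out.

```python
import xml.etree.ElementTree as ET

BANNED_TAGS = {"script", "sql"}


def local_name(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only the name.
    return tag.rsplit("}", 1)[-1]


def find_violations(emml_text, banned_endpoints=(), banned_datasources=()):
    """Return a list of human-readable policy violations found in one script."""
    violations = []
    for element in ET.fromstring(emml_text).iter():
        name = local_name(element.tag)
        if name in BANNED_TAGS:
            violations.append(f"<{name}> is not allowed")
        elif name == "directinvoke":
            # Assumption: the target URL is carried in an "endpoint" attribute.
            endpoint = element.get("endpoint", "")
            if any(endpoint.startswith(prefix) for prefix in banned_endpoints):
                violations.append(f"directinvoke to disallowed endpoint {endpoint}")
        elif name == "datasource":
            # Assumption: the database is identified by a "name" attribute.
            if element.get("name") in banned_datasources:
                violations.append(f"disallowed datasource {element.get('name')}")
    return violations


# Made-up script used only to exercise the checks.
SAMPLE = """
<mashup xmlns="urn:example:emml" name="demo">
  <datasource name="payroll"/>
  <directinvoke endpoint="http://internal.example/admin"/>
  <script type="text/javascript">var x = 1;</script>
</mashup>
"""
```

Calling `find_violations(SAMPLE, banned_endpoints=["http://internal.example/"], banned_datasources=["payroll"])` reports one violation for each of the three child elements, in document order.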

EMML Error Handling
Service invocations to external entities are fraught with reliability issues caused by communication failures or a provider's poor QoS. Hence, it's best to be prudent about error handling and fault tolerance.
When invoking a service using invoke or directinvoke, specify a timeout attribute. This ensures that the invocation does not wait forever and is aborted after a certain period. Upon an invocation error, it is best to fail over to a semantically equivalent alternate service. Typically, this fail-over service may be hosted in-house and return previously cached or fall-back data. EMML offers primitives such as onerror=continue/abort and the faultcode and faultexception variables for error handling.
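
The timeout-then-fail-over pattern can be sketched outside EMML as well. The Python below is a generic illustration, not EMML's onerror mechanism: `invoke_with_failover` and the three service functions are hypothetical, and the calls are simulated rather than made over the network.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time


def invoke_with_failover(primary, fallback, timeout_seconds):
    """Call primary(); if it fails or exceeds the timeout, return fallback().

    Returns a (result, source) pair so callers can tell which path was taken.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(primary)
        return future.result(timeout=timeout_seconds), "primary"
    except FutureTimeout:
        return fallback(), "fallback (timeout)"
    except Exception as error:
        return fallback(), f"fallback ({type(error).__name__})"
    finally:
        # Do not block on a primary call that is still running.
        pool.shutdown(wait=False)


def slow_quote_service():
    # Simulates a provider that responds too slowly.
    time.sleep(0.5)
    return {"price": 101.5, "stale": False}


def failing_quote_service():
    # Simulates a communication failure.
    raise ConnectionError("provider unreachable")


def cached_quote_service():
    # Stand-in for an in-house service returning previously cached data.
    return {"price": 100.0, "stale": True}
```

Returning the source alongside the result lets the consumer decide how to present data that came from the cached fall-back rather than the live provider.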

EMML Testing
All Mashup Services are exposed through a REST or JUMP (HTTP + JSON) interface. Any HTTP client-based load-testing tool can be used for automated regression testing and load testing of Mashup Services.
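
As a minimal example of such a check, the Python sketch below fetches a service's JSON output over HTTP and verifies that each record has the expected fields. The function names and field names are placeholders, since the actual address and payload depend on how a given service is published.

```python
import json
import urllib.request


def fetch_json(url, timeout_seconds=10):
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return json.load(response)


def records_missing_fields(records, required_fields):
    """Return (index, missing_field_names) for every record lacking a required field."""
    problems = []
    for index, record in enumerate(records):
        missing = sorted(set(required_fields) - set(record))
        if missing:
            problems.append((index, missing))
    return problems
```

A regression test would call `fetch_json` on the mashup's published URL and assert that `records_missing_fields` returns an empty list. A load-testing tool can then issue the same request concurrently.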

1 comment:

partha said...

Hi Raj,
Very informative article. I have a question. How do we enable editorial intervention in Presto? I mean, if there is a requirement to edit (manually and/or programmatically) the feed content after aggregation and before publishing, how do we do that? Also, is there a caching mechanism in Presto, wherein we can store the output mashups and also specify the refresh time for each output?