2017/12/13

Be lazy with your data

jBPM comes with really handy feature called pluggable variable persistence strategy (or shorter marshalling strategies). This is the mechanism responsible for persisting process instance and task variables to their data store - whatever kind of data store you use.

An important thing to keep in mind when using marshalling strategies is the fact that each time variable is read it will read that from the data store which might be:

  • file system
  • data base
  • REST service
  • document management system e.g. ECM
  • and more
depending on the accessibility of this service and its performance this might not be a big deal but if that data is loaded many times it might become an issue. Especially if the data is of fair size like documents. 
Imagine process instance operates on bunch of documents (word, pdf, etc). Each of this document is of 2MB size which will give 20 MB for 10 documents. That means every time user interact with this process instance all documents will be loaded from external system. And this is not a problem (or it is actually how it should behave) when these documents are needed for the work user intend to perform. 

But what if (s)he is not going to use all documents or even not a single one? Or if it's not a user but an timer is firing of to send reminder to users?

Should all documents be loaded then? Obviously it does not make much sense though the problem with this decision is how would process instance know if given data will be used or not? And because of that it simply loads all variables whenever process instance (or task) is loaded.
This is default mechanism for process instance and task variables that are stored as part of them as it does not make much sense to separate them since their life cycle is bound to each other - always loaded and stored together. So this does not bring any overhead or additional communication effort.

Though situation looks slightly different for variables that are actually stored externally - like physical documents stored in ECM system. In this case, documents (including their content - 2MB each) will be loaded regardless if the data will be used or not. Moreover, in high volume systems this might put unnecessary load on ECM to constantly load and store documents which actually didn't change.

jBPM 7.6 comes with support for lazy load of variables to resolve these issues. Though it won't magically apply to all your variables that are stored externally, but will provide all the support from the engine side to take advantage of lazy loading. Since the mechanism for loading variables will differ based on the back end data store it's up to user to apply this principle which is rather simple as follows:
  • your variable must implement org.kie.internal.utils.LazyLoaded
  • your marshaler strategy needs to provide kind of service responsible for loading content of the variable - user by load method of the variable
  • your marshalling strategy should not load content by default but set that service on the variable so can be used to lazy load when needed
  • to avoid unnecessary operations to store variable, implement sort of tracking system on your data to identify if variable has changed and store it only if it did, your marshaler should check this tracking mechanism in marshal method and should reset it after loading variable in unmarshal method

jBPM 7.6 provides this mechanism as part of its support for documents (jbpm-document module). The DocumentImpl implements both LazyLoaded and tracking mechanism to resolve both issues (too often load and too often store). This extremely improves overall performance as it reduces number of reads and writes to document service at the same time provides content on demand and only when needed.

I'd like to encourage everyone who stores process or task variables externally to make it lazy loaded and tracked to improve the performance of your system and reduce load on your back end data store.

2017/11/24

Case management - mention someone in comments and we will notify them

Another post around case management shipped with jBPM 7. This time some additions for case comments that improves communication and collaboration.

jBPM 7.5 brings in case comment notifications.

That means case participants can be notified directly when they are being mentioned in the case comments. What's really worth noting is that this is completely integrated with case roles. In fact when you mention someone you mention by case role, there is no need to know who is actually assigned to the given role.



That enables smooth collaboration and plays nicely with dynamic nature of case instances and case role assignments.

Comment notifications

Comment notification processing is provided as case event listener that is NOT registered by default though it can be easily registered via deployment descriptor (as event listener).



This listener has three properties that are configurable but comes with defaults for easier setup

  • sender (defaults to cases@jbpm.org)
  • template (defaults to mentioned-in-comment)
  • subject (defaults to You have been mentioned in case ({0}) comment)
{0} place holder will be replaced with actual case id that comment belongs to.

So if you would like to use different values for sender, template and subject, you can provide them as constructor parameters when registering the listener in deployment descriptor:

new org.jbpm.casemgmt.impl.event.CommentNotificationEventListener(
"no-reply@domain.org", 
"custom-template", 
"New comment has been added to case {0}");

Comment notifications are published only when there are any roles mentioned. Roles are then resolved to users or groups assigned to the mentioned roles and then used to publish notification.

First it will attempt to publish notification with the template name it has defined, and if not found then it will send the comment body as raw text.

When sending with templates it will provide following parameters that can be used in the template to be replaced with actual values:
  • _CASE_ID_ - case id of the case the comment belongs to
  • _AUTHOR_ - author of the comment
  • _COMMENT_ - comment text (body)
  • _COMMENT_ID_ - unique id of the comment
  • _CREATED_AT_ - date when the comment was added
Then in the template you can refer to these parameters like this:
You have been mentioned in case comment by ${_AUTHOR_} most likely requires your attention

Notification publishers

Notifications are build as pluggable component and by default jBPM 7.5 comes with email based notification publisher. Notification publishers are discovered on runtime based on ServiceLoader mechanism, so as long as you have jbpm-workitems-email on class path then you have email based notification publisher.

Notification publisher interface (org.kie.internal.utils.NotificationPublisher) comes with three methods:
  • publish notification with raw body
  • publish notification built from a template
  • indication if given publisher is active

Email notification publisher

Email based notification publisher uses two important components to do its work:

  • java mail session that can either be retrieved from JNDI of application server (recommended as it hides the complexity needed to properly setup and secure communication with mail server) or based on property file
  • user info implementation to resolve user ids to email addresses
Since email publisher is discovered on runtime via ServiceLoader then there is no easy way to pass in other objects when initialising it. So it relies on some convention:
  • java mail session
    • register mail session in JNDI via application server configuration and then point the JNDI name via system property so jBPM can pick it up
      • -Dorg.kie.mail.session=java:jboss/mail/jbpmMailSession
    • create email.properties file on the root of the class path with following properties
      • mail.smtp.host
      • mail.smtp.port
      • mail.username
      • mail.password
      • mail.tls
  • user info implementation is automatically retrieved from the environment via org.jbpm.runtime.manager.impl.identity.UserDataServiceProvider

To handle templates there is an extra component (TemplateManager) that allows to deal with html based templates managed with free marker to render actual body of the notifications.

Email templates

Emails can be build with templates to allow better looking notifications than just raw content. TemplateManager that comes with email notification publisher can be used to define custom templates and then use them for your notifications and also email (that are send via EmailWorkItemHandler).

To make use of email templates you need to set the location where those templates are stored via system properties:
  • org.jbpm.email.templates.dir - this property should point to an absolute file system path and must resolve to a directory.
  • org.jbpm.email.templates.watcher.enabled - optionally you can enable watcher thread that will load updated, added or removed templates without the need to restart the server
  • org.jbpm.email.templates.watcher.interval - when using watcher you can also control the poll interval for it, it defaults to 5 and can be changed with this property that represents number of seconds
Emails are processed by free marker so make sure you define templates according to rules for free marker. Each time comment notification is issued it will attempt to use template named mentioned-in-comment and then if failed will send the comment as raw notification.

Implementing your custom notification publisher

Since notification publishers are pluggable you might want to implement your own, e.g. SMS notification. To do that follow these simple steps and off you go:
  • implement org.kie.internal.utils.NotificationPublisher your custom class with the mechanism chosen to send notification
  • create file name org.kie.internal.utils.NotificationPublisher under META-INF/services and paste the fully qualified class name of your implementation
  • usually it's a good practice to implement isActive method with system property to allow to disable given publisher on runtime without a need to remove it from class path 
  • package everything in a jar file and place it on class path e.g. kie-server.war/WEB-INF/lib
And that's all, your notification publisher is now active and will be used whenever someone is mentioned in the comments.

See it in action

Below is a screencast showing it in action using Case management showcase app.


As usual stay tuned and comments are more than welcome.

2017/11/14

Tune async execution in Kie Server

jBPM comes with handy feature for executing async jobs - essentially it allows to put a wait state almost at any place in the process definition. Either by marking node to be executed asynchronously or by using async work item handler.
For improved performance JMS based trigger has been provided (long time ago ... in version 6.3 - read up more on it here) and since then it's being used more and more.

With this article I'd like to explain how to actually tune it so it performs as expected (instead relying on defaults). It will focus on the JMS tuning as this brings the most powerful execution model.

So let's define simple use case to test this.


Basic process that has two script nodes (Hello and output) and work item node - Task 1. Task 1 is backed by async work item handler meaning it will use jBPM executor to execute that work asynchronously. The output script node will then hold execution (by using Thread.sleep) for 3 seconds for the sake of the test. This aims at making sure that given thread will not be directly available for other async jobs so it will actually illustrate tuned environment compared with defaults.

Default configuration

in WildFly (tested on version 10) there are two sides of the story:
  • The number of sessions that the JCA RA can use concurrently to consume messages.
  • The number of MDB instances available to concurrently receive messages from the JCA RA's sessions.
First is controlled by the "maxSession" activation configuration property on the MDB.
Second is controlled by the bean-instance-pools configured for MDBs in the "ejb3" subsystem in the server configuration (e.g. standalone-full.xml)

So maxSession activation configuration property is not set at all on KIEServer MDBs (none of them) and thus a default value is used which is 15.
Then "ejb3" subsystem default configuration is set to be derived from worker pools. Which will then differ based on your actual environment


The actual values can be inspected using jconsole (make sure you use jconsole from WildFly instead of default that comes with Java).

Below is the default setting for consumer count - refers to maxSession for given MDB



Below is default pool size derived from worker pools.


Execution on default configuration

Knowing what is the default configuration lets test its execution. The test will consists of executing 100 instances of the above process which will then result in executing 100 async jobs. With default settings we know that max number of concurrently executed async jobs will be 15 so total execution time will certainly be longer than expected.

And here are the results:
  • start time:09:44:37,021
  • end time: 09:44:58,513
Total execution time to complete all 100 process instances was around 21 seconds. Quite a lot as each instance sleeps for 3 seconds so the time to complete all 100 seconds should not take 7 times more time then it should.

Though this makes perfect sense as it will process these instances in 15 instances batches. As that is the limit to how many concurrent messages can be consumed.


Tuning time

It's time to do some tuning to make this execute much faster than on the defaults configuration. As mentioned, there are two settings that must be altered:

  • maxSession for KieExecutorMDB
  • max pool size in the "ejb3" subsystem
Activation config property can be set either via annotation on the MDB class or via xml descriptor. Since we don't want to recompile the code we'll go with the xml descriptor that overrides the annotations if they overlap.
To do so, edit kie-server.war/WEB-INF/ejb-jar.xml and add/merge following code:

<ejb-jar id="ejb-jar_ID" version="3.1"
      xmlns="http://java.sun.com/xml/ns/javaee"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                          http://java.sun.com/xml/ns/javaee/ejb-jar_3_1.xsd">


    <enterprise-beans>    
    <message-driven>
      <ejb-name>KieExecutorMDB</ejb-name>
      <ejb-class>org.kie.server.jms.executor.KieExecutorMDB</ejb-class>
      <transaction-type>Bean</transaction-type>
      <activation-config>
        <activation-config-property>
          <activation-config-property-name>destinationType</activation-config-property-name>
          <activation-config-property-value>javax.jms.Queue</activation-config-property-value>
        </activation-config-property>
        <activation-config-property>
          <activation-config-property-name>destination</activation-config-property-name>
          <activation-config-property-value>java:/queue/KIE.SERVER.EXECUTOR</activation-config-property-value>
        </activation-config-property> 
        <activation-config-property>
          <activation-config-property-name>maxSession</activation-config-property-name>
          <activation-config-property-value>100</activation-config-property-value>
        </activation-config-property> 
      </activation-config>
    </message-driven>
  </enterprise-beans>
</ejb-jar>

adjust as needed value of the maxSession config property - here set to 100.

Edit standalone-full.xml of the server and navigate to "ejb3" subsystem. Configure the strict max pool for pool named mdb-strict-max-pool. To do that add new attribute called max-pool-size with value that you want to have. Please also remove the attribute derive-size as they are mutually exclusive - you cannot have both present in the strict-max-pool element.

 <pools>
    <bean-instance-pools>
        <strict-max-pool name="slsb-strict-max-pool" derive-size="from-worker-pools" instance-acquisition-timeout="5" instance-acquisition-timeout-unit="MINUTES"/>
        <strict-max-pool name="mdb-strict-max-pool" max-pool-size="200" instance-acquisition-timeout="5" instance-acquisition-timeout-unit="MINUTES"/>
    </bean-instance-pools>
</pools>
That's all that is needed to tune the async execution for jBPM in Kie Server.

To validate, let's look at jconsole to see if the actual values are as they were set

First look at consumer count

Next, looking at pool size for mdbs

All looks good, max session that was set to 100 (instead of default 15) is now set as consumer count, and max pool size is now set to 200 (instead of default 128).


Execution on the tuned configuration

so now let's rerun exactly the same scenario as before and measure the results - starting 100 process instances of the above process definition.

And here are the results:
  • start time:09:47:40,091
  • end time: 09:47:45,986
Total execution time to complete all 100 process instances was bit more than 5 seconds. This proves that once the environment is configured properly it will result in expected outcome.

Note that this was tested on rather loaded local laptop so on actual server like machine will produce much better results.

Final words regarding tuning

last but not least I'd like to stress out few other things to keep in mind when going for more throughput for KIE Server:
  • make sure your data source is configured with enough connection - it defaults to 20
  • make sure your JVM is configured properly and will have enough memory (to say the least)
  • make sure number of threads in particular pool is configured properly - according to needs
  • make sure you don't use Singleton runtime strategy for your project (kjar) - use either per process instance or request

And that's it for today. Hope this will be helpful to get the performance and throughput the way you need it.

2017/10/31

Sub cases for case instance and ... process instance

This time I'd like to present support for Sub Cases. It actually provides very powerful concept as it gives option to compose advanced cases that consists of other cases. That way a large and complex case can be split into multiple layers of abstraction (and even multiple case projects). Similar as process can be split into multiple sub processes.

Let's take a look at what case is - or to be more precise, a case definition. Case definition is set of activities that might happen while the case instance is active. It is completely dynamic and thus what's in the definition is not the only activities that might be invoked in case instance. Dynamic activities can also be included on runtime (on already active case instance).

Case instance, is a single instance of case definition and thus should encapsulate given business context. All case instance data is stored in case file which is then accessible to all process instances that might participate in the given case instance. Each case instance is completely isolated from other and same applies for case file. Only owning case instance can access case file.

So with that short recap of case terminology let's take a look at sub cases. Sub case is another case definition that is invoked from within another case instance (or even regular process instance). It has all the capabilities of regular case instance:
  • has dedicated case file
  • isolated from any other case instance
  • its own set of case roles
  • its own case prefix
  • etc
Sub case activity is available in the process designer, in category Cases as shown in the below screenshot


Sub case supports following set of input parameters that allow to properly configure it and start it:
  • CaseDefinitionId - mandatory case definition id that should be started
  • DeploymentId - optional deployment id (container id in context of kie server) that indicates where the case definition (that should be started) belongs to
  • Independent - optional indicator that tells the engine if the case instance is independent and thus main case instance will not wait for its completion - defaults to false
  • DestroyOnAbort - optional indicator that tells engine how to deal when the sub case activity was aborted - cancel case or destroy - defaults to true (will destroy the sub case - remove case file)
  • Data_XXX - optional data mapping from this case instance/process to sub case, where XXX is the name of the data in sub case this mapping should target. Can be given as many times as needed
  • UserRole_XXX - optional user to case role mapping. this case instance role names can be referenced here - meaning an owner of main case can be mapped to an owner of the sub case that way whoever was assigned to main case will be automatically assigned to sub case, where XXX is role name and value is the user role assignment
  • GroupRole_XXX - optional group to case role mapping. this case instance role names can be referenced here - meaning a participants of main case can be mapped to participants of the sub case that way whatever group was assigned to main case will be automatically assigned to sub case, where XXX is role name and value is the group role assignment
  • DataAccess_XXX - optional data access restrictions where XXX is the name of data item and the value is access restrictions.


Regardless of the settings of the Independent flag, there will always be output variable available named:
  • CaseId - this is the case instance id of the sub cases after it was started.
In addition to that, current snapshot of case file data will also be given. And thus can be mapped to main case data. They will be named exactly the same as in sub case instance's case file. As presented in above screen shot, sub case file item named name is mapped upon completion to main case instance case file item named name.

You can take a look at following screencast to see it in action


Sub case support also applies to regular processes. That allows users to start a case from within a process instance. This can be seen in the next screen cast


As usual, comments are more than welcome


2017/10/27

Improved KIE Server documentation

In the upcoming 7.5 release, users will finally see more information about the available REST endpoints of KIE Server. The runtime documentation has been completely rewritten and is now based on Swagger. It replaces previous docs and thus is available at exact same location

http://localhost:8080/kie-server/docs

You might need to adjust the host, port and context path depending on your system setup.

So what's new in this documentation?

  • finally it actually documents the endpoints and not only list them
  • presents only enabled KIE server extensions' endpoints
  • is built as KIE Server extension so it can be disabled, e.g. production deployments
  • provides direct possibility to try given endpoint out
  • shows all possible parameters and they expected values
  • gives curl like examples if needed to try it out from command line

Personally I think that the most important features are:
  • possibility to directly try it out
  • shows only endpoints that are active at given server
Second feature is very useful as it does not make unnecessary noise when some of the extensions are disabled. For instance when KIE Server is used as decision service (BRM and DMN extensions only) then there is no need to show endpoints from BPM (which is the most verbose one - in terms of number of endpoints).

Take a look at this short screencast that illustrates it in action - showing how you can actually use the Swagger docs to run processes and user tasks.



An advantage is that now there is a rule in place that makes it mandatory to describe that way every and each REST endpoint being added to KIE Server.

Custom extensions can do that too, just make sure your custom endpoint class has swagger annotation on the type level and once added to KIE Server (and restarted) they will show up in the docs.

@Api(value="NAME OF YOUR RESOURCE")

An example can be found here 

It's all thanks to contribution


I would like to thank Duncan Doyle who was the initiator of this work and actually contributed the replacement of the docs. Another great example that contribution is a win win :)

Note that this is first version of the docs and I'd like to encourage everyone to contribute to improve the docs. Now it's way easier than it was before so ... it's up to you too now :)

2017/10/17

Case management improvements - data authorisation

It's been a while since the last article about case management so I thought it's the time to refresh that topic a bit. And the best way to do it is with the face lift of sample case application - IT Orders App.

So what's new in case management since last time?

  • case file authorisation
  • case comments authorisation
  • case close with comment 
  • index for case file items for searching 

So let's start with the most significant improvements - case file and case comments authorisation. Prior to this, all data and comments were visible to all participants of the case instance. So as long as user has access to case instance (is assigned to any of the roles) she will have access to all the data in the case file. Similar will have access to all comments.

This actually prevents of using rather common mechanism within data driven applications - access control and visibility. Let's take into account situation where there are sensitive data that only certain roles in the case instance should be able to view or maybe there are needs to post private comments. Without the authorisation within the case instance users would not be able to deal with such use cases.

jBPM 7.5 will bring solution to this by allowing users to define restrictions for both comments and data within case file. 

Case comments authorisation 

Case comment can be restricted to roles within the case instance when the comment is:
  • created 
  • updated
Whenever comment is updated or removed authorisation checks are performed to make sure that the action is done by a privileged user. 
When comments are retrieved they are filtered by authorisation to ensure comments eligible for given user will be returned.

Case file authorisation 

Case file data is protected individually, each item in the case file can have its own access restrictions. Similar to comments, whenever data is put into the case file or removed from it, authorisation checks are performed. When case instance or its data is retrieved it is filtered by authorisation to ensure only eligible data will be present in the data set.

Case file data access restrictions can be set in following ways:
  • within case definition by setting custom meta data called customCaseDataAccess that supports multiple case file items with one or more roles for them. It expects following format:

         item:role1,role2;another:role2,role3
  • when creating a case instance (start a case) access restrictions can be given as part of the case file, next to data and role assignments
  • when putting data into a case file access restrictions can be provided

Close case with comment

Case instance can be completed when there are no more activities to be performed and the business goal was achieved or it can be closed prematurely. In the second case, it's quite often needed to make a comment why the case instance is being closed. This feature has now been provided as part of jBPM 7.5 so that can help to keep track of various scenarios that led to case instance being closed.


Indexing of case file items

Case file data is really powerful when case instance is running, rules can be built on top of them, they can be added or removed at any point in time. But when it comes to searching by that data, going through each case instance to find out if it matches given criteria or not does not sound like a good idea. 
To provide more efficient support for such operation, case file data is indexed (similar to how process instance variables are). They are stored in data base table and constantly being updated so it represents the latest state of the case file.

This allows then to easily and efficiently:
  • find case instances that contain given data (by name and value)
  • filter case file by data name or type
  • build up UI based on the index rather than loading all the data - keep in mind that case file might contain any type of information like large documents etc

Some of these features are included in the face lift of the IT Order Case App that you can watch below.


As usual, comments are more than welcome... ideas for improvements and real case studies for case mgmt even more :)

2017/10/11

where did the authoring go? getting started with workbench and kie server on 7.3 and onwards

Recently there were number of questions on mailing list about authoring perspective missing in recent versions of jBPM and/or drools... that lead me to take few minutes to make this write up and attached screen case to show that ... it was not removed :)

It was actually redesigned for easier access and to improve user experience when using the workbench. Though for people who get used to the "old" way could be a bit lost at first. So here it goes:



What you can see in this screen cast is:

  • where to go to see the projects
  • how to create new project
  • how to import new project 
  • how to import project from external git server
  • how to find the settings and various configuration options of the project
  • how to build and deploy
  • where to find deployed projects into execution servers
  • where to find process definitions
  • where to start new process instance
Nothing fancy but might get you started much easier after moving from previous version.

Comments are more than welcome in terms of improvements and feedback about usability.

2017/09/08

Integrate systems with processes - jBPM workitems

Processes are almost always completed in several steps, or activities. These activities work together to achieve common goals that define the process they are part of. Let’s just jump straight into an example:
Mary's goal is to go on vacation. To achieve this goal she has to complete several activities such as schedule time off with her employer, book a flight and a hotel, notify her friends on social media about her travels, find transportation from the airport to her hotel, and so on. There can be also a number of unforeseen activities that happen during her travels such as updating health insurance info in case of an illness/injury or checking the weather forecast in case of a storm at her destination location. The point here is that all of  Mary’s activities involve existing systems such as booking, bank, online systems etc.. Mary’s activities have to be able to integrate with these systems in order to achieve her goal.
If we look at Mary’s activities making up a process which we have to define and implement and then look at any process runtime engine out there it is clear that none come out-of-the box with every single integration point Mary needs. It’s clear that the same pretty much applies to any process out there so we need an easy and intuitive way to define and implement integration points to other existing systems.
So how do we deal with this integration of processes and systems in jBPM? Well the good news is that jBPM out-of-the-box provides integration with all existing and future systems out there. The bad news is that's not really true. What jBPM does provide however is the ability for you to define and implement these integration points rather quickly and start using them right away to first of course test and then run your processes. This integration is done through (to use the jBPM lingo) “custom work items”, process activities where you as developer can define how Mary books her flight, what information she will need to make the booking, what company she will book from, etc etc.
Some people when reading this might think like “Oh, here comes yet another proprietary integration technology that I’ll be stuck with forever and eternity.” but that is far from the case. The short answer here is that the integration between systems and processes with jBPM is made of two parts, the markup part (BPMN2) which is completely portable and the definition/implementation part which is just your own implementation code, so it’s reusable between different systems. For people that are concerned about the setup time for all this, be assured that it is not bad as jBPM provides tooling (Maven archetype, installation of custom work items in the workbench etc) that help you along the way to get your job done. All of this will be explained in detail in the rest of this text so if you have gotten this far and are still interested in what jBPM has to offer here stick around and read along. As always comments and suggestions are more than welcome, and in case you are a cool developer that would like to contribute to this or any other parts of jBPM here is the starting page to get your started on that: https://www.jbpm.org/.
  1. Let’s get started - jBPM workitems module
If you are starting to use or are already a jBPM pro you should/will know where the code is on github: https://github.com/kiegroup/jbpm. jBPM is made up of many different modules but the one we are concerned about here is the jbpm-workitems module: https://github.com/kiegroup/jbpm/tree/master/jbpm-workitems. This module in itself contains many more modules so let’s take a look at what’s there:
  1. Pre-existing integration points: jbpm-workitems module contains a number of pre-existing integration points such as Email, Jabber, Rest, Webservice, etc.  These are the “out-of-the-box” integration points that you can use as-is or start extending for your own process needs.
  2. Light-weight core: the jbpm-workitems-core module contains the core utilities (and test utilities) to start creating and testing your new custom work item with jBPM. All custom workitem modules extend this core.
  3. Workitems Maven archetype: jbpm-workitems-archetype is a Maven archetype you can use to easily create your own workitems maven project. We will go through all steps on how to use this archetype in the sections below so don’t worry. This archetype not only helps with workitems project generation but also helps you with automatic generation of your integration configuration definition so all you have to be concerned about is your own implementation of the integration. Another nice thing about using the workitems archetype is that the maven project it creates for you has the same structure as any other module jbpm-workitems so if you can are are willing to contribute your integration solution to jBPM and share with the whole jBPM community the archetype makes that very simple for you.
      2. Learning by doing - let’s help Mary!
For the sake of all of our times here we will focus on one specific part of our process to help Mary during her vacation planning activities. We will help her determine what type of clothes she should pack depending on the weather forecast during the time of her travels. In our example this will be the only activity we implement as an integration point and we will integrate with the free Yahoo weather service (via an also free Java api for this service) to find out everything we need to help Mary make her clothing decisions.
    a) Install jBPM
There are a number to start using jBPM locally. You can find the download links and instructions  here. You can also clone the jBPM git repo directly and install it to your local Maven repo with:
mvn clean install -DskipTests
Whatever installation path you chose at the end you should have the jBPM workitems archetype registered locally and be able to start using it.
    b) Build your workitems module
Create a directory (let’s say “myworkitems”) where you want your workitems module to be. Inside that directory let’s run:
Here is the Maven archetype:generate command so you can cut/paste it. Make sure to change the artifactId and the classPrefix to whatever you want (you can also change the version of your workitem if needed, it defaults to 1.0).
The archetype:generate command will prompt you to confirm the given inputs through the above command. Just type “Y” to confirm if everything is per your liking. The command should produce output similar to this:
You should now see the “weather-workitem” maven project generated for you inside your myworkitems directory. This is our base workitems project so let’s import it into your favorite IDE and start working on it (I use IntelliJ but you can easily upload maven projects inside Eclipse etc).
The workitem archetype has generated for us a multi-module Maven project which already includes our base workitem handler (the Java code that will get executed when our process reaches our weather workitem, a base JUnit test for our workitem handler as well as Maven assembly instructions on how to package our workitem so that it is easily uploaded to the Kie Workbench or the jBPM workitem repository.
    c) Implement workitem details
Now let’s expand on top of what the generic project that the jbpm workitems archetype has created for us to start plugging in the integration logic to the Yahoo weather service. For this let’s start editing our WorkItemHandler class:
Ok, so this need a little explanation. A WorkItemHandler is a class that will get executed during process execution when the process reaches our weather activity. The jBPM runtime will then call our handlers executeWorkItem method passing in a WorkItem instance which includes our activity’s input parameters (which we specify). After our integration implementation code is completed we can chose to return back some results, or simply complete our workitem handler by calling the completeWorkItem method on the passed in WorkItemManager instance.
I’m sure you also noticed all the Java annotations on top of our class definition. They are here to help us generate our workitem configuration metadata which then is used by process editors (such as the jbpm-designer) inside the Kie workbench to allow us to drag/drop our workitem activity as we can any other BPMN2 type onto our canvas during process modelling. This configuration is done in a separate file and uses MVEL format but we don’t have to be concerned with it as we can use simple annotations to define our workitem configuration and our project already has assembly instructions to generate this configuration file and package it properly for us.
There are 3 annotations that we need to tackle in order to correctly set up our workitem configuration, namely @Wid, @WidParameter, and @WidMavenDepends (by the way Wid stands for workitem definition). The @Wid annotation is the parent annotation where we specify configuration information about our workitem such as its display name, description, display icon, the Java handler which should be executed when our workitem is active during process execution (defaultHandler), the input and output parameters of our workitem as well as all maven dependencies we need in order to properly execute our workitem implementation during process execution. With this information both the BPMN2 editor as well as the jBPM runtime will be able to know how to display our workitem as well as how to execute it.
The workitem archetype we used has already set up all the parameters with default values for us. All we need to do now is expand on that and implement our own integration with the weather provider system.
After some “fun” coding our workitem handler could look like this:
Let’s see what we have changed from the default handler code generated by the archetype. Well first we have defined one input and one output for our workitem namely LocationZip and WhatToWear. LocationZip is the information Mary has to provide and is the zip code of the location where she is traveling to. WhatToWear is the response of our workitem once we have the weather information for her location from the our weather integration system of choice. We have also updated the mavendepends configuration section to add maven dependency to the Java api which we use here to communicate with the Yahoo weather system. This is needed so that all proper dependencies are available when our workitem handler is being executed.
The actual implementation in the executeWorkItem method is rather simple (and just a test impl for this example). We contact our weather service passing it the zip code Mary provided and retrieve the weather forecast. The evaluation of this data should ideally be implemented via business rules (Drools) in a separate Business Rule Task inside our process but for the sake of this example we make a simple (and most likely wrong) assumption that if the average temperature in the next 10 days at the specified locations is less than 20 degrees Celsius it will be “cold” and if it’s above it will be “warm” enough to pack a swimsuit. Yes, rather simplistic but you can go all out and implement this any way you feel in your workitem handler.
By the way, don’t forget to also add the maven dependency of all apis you may use in your projects pom.xml. For this example we had to just add to the dependencies section:
<!-- weather provider api (yahoo) -->
   <dependency>
     <groupId>com.github.fedy2</groupId>
     <artifactId>yahoo-weather-java-api</artifactId>
     <version>2.0.2</version>
   </dependency>
At this point you can also update the already generated workitem test class and run the JUNIT tests to make sure that everything is fine.
    d) Build, deploy and start using your new workitem
You have your workitem implementation done and tested and now we are ready to build our workitem project and start using it inside the Kie Workbench. Yay!.
First thing let’s build our project. Under our project directory run:
mvn clean install
This will create build our workitem project, run our tests and produce a zip file which includes everything we need to start using our workitem. The zip file we need is located after project build in your $project_dir$/target directory and is called for our example jbpm-workitems-weather-workitem-1.0.zip. Create a new directory anywhere on your local file system and then let’s unzip our jbpm-workitems-weather-workitem-1.0.zip there, you should have something like:
The file structure we extracted from the zip file not only includes already all the files we need to start using our workitem in the jBPM tooling and the Kie Workbench but also already has the exact format needed to be uploaded to the jBPM Workitem Repository.  We will go into this repository in more detail in the next writings but if you are already familiar with it then note that you can upload the files here as-is and there is no changes needed to start installing your workitem via the jbpm-designer in the workbench.
e) Install your workitem to the Kie Workbench
In order for our workitem implementation to be recognized and available inside the Kie Workbench we need to install it first. There are a couple of ways you can do this depending on what’s best for you:
  1. Install on appserver startup: This can be done by adding the following options to the appserver startup command:
./standalone.sh -Dorg.jbpm.service.repository=file:///Users/tsurdilovic/devel/tmp/myworkitems/weather-workitem/jbpm-workitems-weather-workitem/target/tmp -Dorg.jbpm.service.servicetasknames=WeatherWorkitem
-Dorg.jbpm.service.repository specifies the location on our file system to the directory where we extracted our workitem zip file, and -Dorg.jbpm.service.servicetasknames specifies the names of all our workitems (in this case just one) that we want to install once the server starts up.
With this approach on app server startup our workitem will be installed directly.
   
    2) Create a business process in Kie workbench and install it from your workitems repo:
With this option you can either first upload the contents of our created zip file to your remote jBPM workitem repository or you can use the file system directly to point jBPM designer to install our workitem assets directly from where we extracted the zip file. Here is how that would look:

This allows you to connect to your jBPM workitems repository via url and will list all available workitems that are in the repository. We can now click on the little wrench icon next to our weather workitem and the workbench will install it for us. Once installed you will need to save and re-open your process in order for it to be present in the palette under “Service Tasks” (note that if in our handler you specify a category for our workitem configuration our workitem will be under that section in the palette.
f) Create our business process and execute it
Once we have started our appserver running the kie workbench war and have given it instructions to install our created workitem we can go ahead and create a new business process that can look something like this:
As you can see here our workitem was installed and is ready to be used within the jbpm-designer palette on the left-hand side. Remember the icon we specified in the @Wid annotation of our workitem handler class? Well I changed the default icon that is provided by the archetype to a more “weather-ish” icon here and you can do the same (any png you want, prefered size is 16x16 pixels). Our simple business process is very simple - when the process starts Mary will be prompted to enter in a zip code of the location she is traveling to. The process then will run through our workitem handler executeWorkItem method when our workitem is activated and then map our workitem results to a process variable which is then just printed to the console via the “print results” script task...and that’s it!
You will notice that since we asked our appserver to automatically install our workitem it took care of setting our projects dependencies for us as well as add the handler information in the projects deployments descriptor as well for us:
Now all that is left is for you to compile and deploy your project (you will need execution server running on the same or another app server and connect it to your workbench) and start playing around with your test process.
I hope this text shows how easy it is to start integrating different services inside your processes in jBPM. We plan to further increase the automation and lessen the amount of work you have to do to get up and running with this and many other aspects of jBPM in the near future.

2017/08/24

Cloud runtime architectures for jBPM

In the days where more and more software is moving to cloud, I'd like to take a moment (or two) to describe various runtime architectures that jBPM can be deployed with.

This article mainly talks about version 7 and onwards though some of the aspects are also applicable for version 6.

Terminology



  • Admin Console - is workbench (or it's lighter version that includes only runtime views) with embedded controller
  • Controller - is KIE Controller used with KIE Servers that are running in managed mode
  • Smart Router - is a optional component that acts as kind of intelligent load balancer as it can both route requests to individual KIE Servers and aggregate data from different KIE Servers
  • Managed KIE Server - KIE Server that is connected to Controller and takes the configuration from the controller, overriding anything it has locally (even included in the image)
  • Unmanaged KIE Server - KIE Server that runs completely standalone, does not require any other component to be fully functional
  • Managed Smart Router - router that is connected to Controller though it owns the right to dynamically update server template it represents 


Architecture 1: Immutable unmanaged KIE Servers with Smart Router and Admin Console

This architecture is as cloud native as possible, it promotes the immutable execution servers paradigm that means both execution server itself and all KJARs should be colocated and included in the image itself (with all dependencies). That way any instance of that service (regardless when it starts) will always be identical.
It then uses Managed Smart Router to benefits from routing and aggregation. Smart Router will dynamically update Controller with new containers coming in so Admin Console can properly setup clients to interact with it.


In this architecture end users interact with KIE Servers always via Smart Router - either by using Admin Console and its runtime views or by another application.

Individual KIE Servers can come and go at anytime and register/unregister in the Smart Router. These KIE Servers might be with the same id - meaning representing the same image or different images. In this case (since images are immutable) they will represent other set of kjars.

As an example, let's look at the diagram above:

  • There are three KIE Servers behind the router - each represents independent image (meaning it has different kjars included)
    • KIE Server - ABC
    • KIE Server - DEF
    • KIE Server - GHI
  • There could be multiple instances of given KIE Server image - multiple PODs in OpenShift terminology
  • KIE Server starts completely independent from each other and smart router (though it will constantly attempt to register in Smart Router in case it was not up at the time KIE Server started)
  • KIE Containers to be started is included in the image


Architecture 2: Immutable managed KIE Servers with Admin Console and optionally Smart Router

Immutable managed architecture is slight variation of the first architecture though all KIE Servers are managed by controller thus Smart Router is then optional component as users can access individual KIE Servers directly as they are managed. Smart Router is needed when end users should be able to look at all KIE Servers at once instead of grouped by server template.

The images are still immutable, meaning they include KJARs that should be running, but the final word what set of KJARs (included in the image) should be started has controller. This is to secure that both Admin Console and KIE Server have same set of KIE Containers defined.

In practice, this architecture requires additional step in the deployment (as part of the deployment pipeline) to create immutable server template in the controller that would match kjars included in the image. Main reason for this is to protect the Admin Console from being affected by wrong image connecting with template id that it should not. So Admin Console completely relies on the controller configuration rather than runtime.


In this architecture, users can use both router or individual KIE Servers to interact with their capabilities. Same rules apply as for first architecture when it comes to KIE Server images and their instances. Whenever new instance of the KIE Server starts it register itself in the controller and optionally in the Smart Router.

Architecture 3: "Empty" managed KIE Servers with Admin Console and optionally Smart Router

Another architecture moves into another direction, instead of making the images immutable it promotes the dynamic behaviour of KIE Server. "Empty" KIE Server means that the image is without any KJARs included, it's pure KIE Server runtime that when started has nothing deployed to it.

With managed capabilities, controller can dynamically instruct KIE Servers what needs to be deployed. So the additional deployment step (as in architecture 2) can be used. By making the server template immutable similar architecture as in 2 can be achieved, though it might be impacted by differences in downloaded artefacts - this might be especially visible in case snapshots are used.

Benefit is that there is single image of KIE Server used (per release version) that, when started, is given list of parameters that defines it behaviour:

  • KIE Server ID that refers to server template in controller
  • switches to turn off capabilities
  • url to controller
  • optionally url to smart router


The main difference here is that controller should have server templates defined (with kie containers) for all the KIE Servers in the diagram:
  • ABC
  • DEF
  • GHI
In case the server template is not there, upon connection from KIE Server such template will be created, though it will be completely empty meaning nothing to execute on it.

Another flavour of this architecture could be that the server template is not immutable and thus at any point in time new kie containers can be added or removed. This will then make the environment completely dynamic which might be actually a good fit for development environments.

Interaction with KIE Servers is exactly the same as for architecture 2, can be done directly to individual servers or via smart router.


Architecture 4: Immutable unmanaged KIE Servers with Smart Router

This is another aspect of the immutable images though simplified as there is no need for controller and thus Admin Console is not used. With that in mind users will still have KIE Server image per set of KJARs to ensure immutability though there is no "managed" client for it.

This architecture targets mainly setups where there will be other components (applications/services) interacting with KIE Servers via Smart Router.


KIE Servers behave exactly the same as in architecture 1 and allow to add new instances or images at any time. Smart Router will constantly update the routing table to make sure it provides access to all available server with efficient balancing.


Architecture 5: "Empty" unmanaged KIE Servers with Smart Router

This architecture is pretty much the same as above (4) though it does provide the dynamic deployment feature. So instead of having immutable images of KIE Server and all KJARs included, it  gives at start "empty" KIE Server image that can be manually (since it is unmanaged) configured - deploy/undeploy KJARs at any point in time. 

This gives the most flexibility but at the same time it requires the most manual configuration of the runtime environment. So it is most likely suitable for simple environments. Though it still might be good fit for some use cases.


Conclusion 


Of course final selection of the architecture will depend on number of factors but overall recommendation would be to follow the order of these architectures in the article. 

The most cloud "friendly" seems to be the first one as it nicely fits into the continuous delivery approach with least additional steps. 

Second one adds ability to control individual servers from the controller and use admin console for selected server templates in isolation.

Third, provides really flexible (though not immutable) environment with small number of KIE Server images to managed. Might be an option for certain uses cases where the dynamic behaviour of the business logic is required but does not require complete image redeployment.

Forth, removes the controller and admin console from the landscape so might be a good fit for lighter setups where the business automation is used with another UI and managed completely by the cloud infrastructure - immutable images.

Fifth, is most likely just for the environments where deployment is managed by external system and thus keeps track of all possible KIE Servers being active.

Hope this gives a nice overview and helps with selecting the right runtime architecture based on requirements.

2017/08/23

Elasticsearch empowers jBPM

As a follow up on article that introduced NoSQL experimental support for jBPM, this article aims at illustrating one potential integration to enhance search capabilities and potentially routing support for larger environments.

Elasticseach will be used as additional data store where both process instances and tasks will end up being indexed. Please keep in mind that at this point in time it is rather basic integration though has already proven to be extremely valuable. Before jumping into details let's look at what use cases this integration brings:
  • ability to collect process instances and tasks from different sources - e.g. different execution servers connected to different dbs
  • ability to search for process instances and tasks using full text search - indexed values etc
  • ability to search for process instances and tasks by their variables, multiple variables (both name and value) in single request
  • ability to retrieve variables with search results in single request
  • and all other things that Elasticsearch provides :)

Implementation


The actual implementation to integrate Elasticsearch with jBPM based on PersistenceEventManager hooks is actually simple - it consists of single class that implements EventEmitter interface - ElasticSearchEventEmitter

It utilises Elasticseach REST API - to be precise its _bulk REST endpoint. It does push all events in single HTTP call. This consists of both types of instances
  • ProcessInstanceView
  • TaskInstanceView
all views are serialised as JSON documents. This integration uses:
  • http://localhost:9200 as the location where Elasticsearch server is
  • jbpm as the name of the index
  • processes as the type for ProcessInstanceView documents
  • tasks as the type for TaskInstanceView documents

Location of the Elasticsearch server and name of the index is configurable via system properties:
  • org.jbpm.integration.elasticsearch.url
  • org.jbpm.integration.elasticsearch.index

There is one more file in the project and this is the ServiceLoader services file providing information on emitter implementation for discovery on runtime.

ElasticSearchEventEmitter is delivering actual events in an async way to not hold back thread that was used to execute the process so the impact (performance wise) on process engine is minimal. Moreover thanks to default PersistenceEventManager implementation, this emitter will only be invoked when transaction is completed successfully, meaning in case process instance is rolled back that information won't be in Elasticsearch.

Installation


For this who would like to try this out, first of all install Elastisearch on your box (or wherever you prefer as you can point it to any server via system property).
Next, build the elastisearch-jbpm project locally (it's not yet included in the regular jBPM builds) and drop it into KIE Server web app (inside WEB-INF/lib) and that's it!

Now when you execute any processes you will have it's data in Elasticsearch as well so you can nicely query them in very advanced way.


In action

Let's now look at short screen cast that shows this in action. This demo illustrates still rather small subset of data (around 12 000 process instances and 12 000 tasks instances) that will be queried. Anyway, what this will show is:
  • speed of execution
  • query in a way that neither JPA nor jBPM advanced queries allows to do without additional setup
  • data retrieved directly from the query


In details:

  • first search for all active process instances was done in workbench - this uses data sets / advanced queries - though it is slightly slower due to it collects execution errors so that does affect the performance and it's under investigation
  • Then it does the same query over KIE Server REST api - that uses JPA underneath 
  • Last it does the same query over Elasticsearch
  • Next it shows a bit more advanced queries by multiple variables, people assignment etc

What can be found in the screencast illustrates benefits but on small scale, more will be seen where there are several independent execution servers so you can search across them.


Main difference is that Elasticsearch directly returns process instance variables. Similar for user tasks, though it does provide much more information - both task inputs and outputs plus people assignments - e.g. potential owners, business admins and excluded owners.

Expect more integration with other NoSQL data stores to come... so stay tuned.

NoSQL enters jBPM ... as an experiment ... so far

Quite frequently there are questions around jBPM if there is anyway to use NoSQL as data store for persistable setup. From the very beginning persistence in KIE projects (drools and jBPM) was designed to be pluggable. In versions prior to 7 it was though rather tight integration which resulted in dependencies to JPA being still needed. With version 7 persistence layer was refactored (thanks to Mariano De Maio who did majority of work) and enabled much cleaner integration with different (than default) persistence store.

That opened the door for more research on how to utilise NoSQL data stores to benefit the overall projects. With that in mind, we started to think what options are valuable and initial set of them are as follows:

  • complete replace of JPA based persistence layer with another data store (e.g. NoSQL)
  • enhance persistence layer with additional data store tailored with its capabilities 

Replacement of default persistence layer with NoSQL - MapDB

When it comes to the first approach, it's rather self explanatory - it completely replaces entire persistence layer thus freeing it up from any JPA based mechanism. This actually follows Mariano's work on providing persistence mechanism based on MapDB. You can find that work here that provides rather complete replacement of JPA and covers:
  • drools use cases - persistence of KieSession
  • jBPM use cases - persistence of 
    • KieSession, 
    • WorkItem, 
    • ProcessInstance, 
    • Task
  • jBPM runtime manager use cases - mainly around PerProcessInstance and PerCase strategies
  • jBPM services use cases - additional implementation of RuntimeDataService and DeploymentService to take advantage of MapDB store - does not persist all audit log data so some of the methods from RuntimeDataService (like node instances or variables related) won't work
  • KIE Server use cases - an alternative implementation of jBPM KIE Server extension that uses MapDB as backend store instead of RDBMS - though it does have limited capabilities - only operations on process instances and tasks are supported, no async execution (jBPM executor)
The good thing with MapDB is that it's a transactional store so it fits nicely with jBPM infrastructure. 

Though it didn't prove (with basic load tests*) to be faster than RDBMS based store. Quite the opposite it was 2-3 times slower on single box. But that does not mean there is no value in that. 

Personally I think the biggest value of this experiment was to illustrate that a complete replacement of the persistence layer is possible (up to KIE Server). Although it is quite significant work required to do so and there might be some edge cases that could limit or change available features.

Nevertheless it's an option in case some environments can't use RDBMS for whatever reason.

* basic load tests consists of two types of requests - 1) just to start a process with human task, 2) start a process with human task and complete it.


Enhance persistence layer with additional data store

Alternative approach (and in my opinion that brings much more value and less work) is to enhance the persistence layer with additional data store. This means that default and used by the internal services data store is still JPA and thus requires RDBMS though it can be offloaded for certain use cases to another data store as it might be much better suited for that.

Some of the use cases we are exploring are:
  • aggregation of data from various execution servers (different dbs)
  • aggregation of business data and process data
  • analytics e.g. BAM, stream processing, etc
  • advanced search capabilities like full text search
  • replication across data centres for searchability 
  • routing across data centres that runs individual process engines
  • and more... in case you have any ideas feel free to comment

This was sort of possible already in jBPM by utilising event listeners (ProcessEventListenr, TaskLifeCycleEventListener) though it was slightly too fine grained and required to have a bit of plumbing code to deal with how the engine behaves - mainly around transactions. 

So to ease with this work, jBPM provides few hooks to allow easier integration and let developers to only focus on actual integration code with external data store instead of knowing all the details in the process engine. 

So the main two hook points are:
  • PersistenceEventManager - that is responsible for receiving information from the engine when instances (ProcessInstance, Task) are in anyway updated - created, updated or deleted. The other responsibility is to collect all those events and at some point push to the event emitter implementation for actual delivery to external data store.
  • EventEmitter - this is the interface that must be implemented to activate the PersistenceEventManager - if there is no emitter found PersistenceEventManager acts in no-op way. Event emitter has two main responsibilities:
    • provides EventCollection implementation that decides how to deal with events that are added (new instance), updated (updated instance), removed (deleted instance) - different implementation of the EventCollection can decide on individual events e.g. in case single instance is added and removed in the same scope (transaction) then collection can decide to drop it from itself and deal only with still active instances.
    • integrates with the external data store - encapsulate client api of the external system 


Implementations that comes out of the box


PersistenceEventManager

There is a default PersistenceEventManager provided that integrates with transactions. That means there is no need (in most of the cases) to implement new PersistenceEventManager. Default implementation collects events from single transaction and deliver them to emitter at:
  • beforeCompletion of the transaction, manager will invoke deliver method of the emitter - this is mainly to give a complete list of events in case emitter wants to send these events in transactional way - for example JMS transactional delivery
  • afterCompletion of the transaction, this will again deliver same list of events as on beforeCompletion and is more for emitters that can't send events in transactional way e.g. REST/HTTP call. Manager will invoke:
    • apply method of the emitter in case transaction was successfully committed
    • drop method of the emitter in case transaction was rolled back


EventCollection

There is also default EventCollection implementation BaseEventCollection that will collect all events (instances regardless of their event type - create, update, delete) though will eliminate duplicates, meaning it will have only the last state of the instance.

Events

Now let's take a look what is an event - this is maybe a bit overused term but it does fit well in this scenario - it is fired when things happen in the engine - these events mainly represent instances that process engine is managing:
  • ProcessInstance
  • Task
Currently only these two types are managed but the hooks within the engine allow to plugin more, for example async jobs.

As soon as instance is updated (created, updated, deleted) that instance is wrapped with an InstanceView type and delivered to PersistenceEventManager - over its dedicated method representing type of the event - create, update remove.

InstanceView will have dedicated implementation to provide access to individual instance details though every implementation will always provide the link to the actual source of this view. Why there is a need for the *View types? Mainly to simplify consumption of them - InstanceView type is designed to be serialisable - for example to JSON or XML without too much hassle.

Out of the box there are two implementations of the InstanceView:
InstanceView might decide when the data should be copied from source though at latest it will be invoked by the PersistenceEventManager before calling the deliver method - so it's important that in case InstanceView implementation copies data earlier it should mark as copied itself to avoid double copy.



That concludes the introduction into how jBPM looks into support for NoSQL. Following article will show some of the implementations of the second approach to empower jBPM with additional capabilities.

I'd like to encourage everyone to share their opinion on how NoSQL would provide value for jBPM or what use cases you see are good fit for NoSQL and thus jBPM should support that better.