Oracle SOA / Java blog

Minikube on KVM on Linux Mint 19.1

In a previous blog post I wrote about running Minikube on Windows. I ended with the suggestion that getting Minikube working might be much easier on Linux. Thus I installed Linux Mint (as dual-boot) on my laptop and gave it a shot. The steps I took to get it working are described here.

Challenges installing Mint

UEFI

First I had some issues getting Mint installed as a dual-boot OS, since I already had a UEFI Windows 10 installation. I needed to figure out how to create a USB boot disk which would also be UEFI compatible. I used Rufus for this and wrote the ISO to the USB stick in DD mode. I installed the boot loader on my primary disk (which was selected by default). For ease of use I did not create a separate swap partition (my laptop has 32 GB of RAM) or home partition. Also make sure Secure Boot is disabled (a BIOS setting) and that Windows 10 fast startup is disabled as well.

Graphics

I have a fairly new NVIDIA graphics card (GTX 1060) and the open source NVIDIA driver (Nouveau) did not work for me, so I needed to boot with the 'nomodeset' kernel parameter. Even after installing the proprietary drivers and rebooting, it still did not work. I added a repository with newer graphics drivers (see below) and used the Mint Driver Manager to install a newer version. After that it worked.

sudo add-apt-repository ppa:graphics-drivers/ppa

Minikube on Mint

Installing KVM

There are of course various virtualization technologies available. I'm used to VirtualBox but thought I'd try out KVM for a change. I installed KVM and related things using the following:

sudo apt-get update
sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils libvirt-clients libvirt-daemon-system apt-transport-https ebtables iptables dnsmasq virt-manager

Next I added my current user to the libvirt group. This is required so the user is allowed to use virtualization. You can use newgrp to activate the group in the current session without having to log out and log in again. Libvirt provides an abstraction over various virtualization technologies, making it easier to manage them with tools such as virt-manager, virsh or OpenStack.
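The commands for this look roughly as follows (a sketch; on some distributions the group is called libvirtd instead of libvirt):

sudo usermod -aG libvirt $USER
# Activate the new group membership in the current shell without logging out
newgrp libvirt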


I had some struggles with firewalld. I uninstalled it. You need to restart libvirtd after uninstalling firewalld for Minikube to be able to create VMs.
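For reference, the steps were roughly the following (assuming firewalld was installed as a regular package):

sudo apt-get remove --purge firewalld
sudo systemctl restart libvirtd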

Installing Docker

After having installed KVM, I installed Minikube and related tools. In order to work with Minikube you don't just need Minikube itself but also some additional tools such as kubectl and Docker.

The installation of Docker is described here. Linux Mint is Ubuntu based so the following works:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install -y docker-ce

Installing Minikube

Install Minikube
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && sudo install minikube /usr/local/bin/

Install the KVM2 driver
curl -LO https://storage.googleapis.com/minikube/releases/latest/docker-machine-driver-kvm2   && sudo install docker-machine-driver-kvm2 /usr/local/bin/

Start Minikube
minikube start --vm-driver=kvm2
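Once the start completes, a quick sanity check can be done with something like the following (assuming kubectl is installed and configured to talk to the Minikube cluster):

minikube status
kubectl cluster-info
kubectl get nodes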



Minikube dashboard

First start the dashboard with

minikube dashboard

Next start a proxy

kubectl proxy

And open it in a browser by going to: http://localhost:8001/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/#!/overview?namespace=default


Have fun!

Filesystem events to Elasticsearch / Kibana through Kafka Connect / Kafka

Filesystem events are useful to monitor. They can indicate a security breach. They can also help you understand how a complex system works by looking at the files it reads and writes.

When monitoring filesystem events, you can expect a lot of data to be generated quickly. The events might be interesting to process for different systems and at a different pace. It would also be nice if you could replay events from the start. Enter Kafka. In order to put the filesystem events (written to an output file) in Kafka, the Kafka Connect FileStreamSource connector is used. In order to get the data from Kafka to Elasticsearch, the Kafka Connect ElasticsearchSinkConnector is used. Both connectors can be used without an Enterprise license.
Filesystem events

In order to obtain filesystem events, I've used inotify-tools. On Ubuntu like systems they can be installed with

sudo apt-get install inotify-tools

inotify-tools contains two CLI utilities: inotifywait, which can be used to output events to a file in a specific format, and inotifywatch, which can generate statistics. Since we'll do our analyses in Kibana, we want the individual events from inotifywait. Using the following command, the /home/developer directory is watched and events are written in JSON format to a file called /tmp/inotify.txt

inotifywait -r -m /home/developer -o /tmp/inotify.txt --timefmt "%FT%T%z" --format '{"time": "%T","watched": "%w","file":"%f","events":"%e"}' 

Events can look like:

{"time": "2019-02-26T13:52:15+0000","watched": "/home/developer/","file":"t","events":"OPEN"}
{"time": "2019-02-26T13:52:15+0000","watched": "/home/developer/","file":"t","events":"ATTRIB"}
{"time": "2019-02-26T13:52:15+0000","watched": "/home/developer/","file":"t","events":"CLOSE_WRITE,CLOSE"}
{"time": "2019-02-26T13:52:17+0000","watched": "/home/developer/","file":"t","events":"DELETE"}

In the above example I touched /home/developer/t and removed it. When watching your home folder, it is interesting to see what is happening there! I could not add a more precise timestamp to the events since the format string is printf-like and uses strftime for date/time formatting, and strftime does not support anything more fine-grained than seconds. The order of the events is correct though. I also could not add more specific information myself, since the set of allowed replacement variables is limited. Want to know more? man inotifywait

Filesystem events to Kafka

To install the Confluent platform I did:

wget -qO - https://packages.confluent.io/deb/5.1/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.1 stable main"
sudo apt-get update
sudo apt-get -y install confluent-platform-2.11

Starting Kafka and related can be done with

confluent start

Using the Kafka Connect FileStreamSource connector (available without an Enterprise license), it is relatively easy to monitor the file which is written by inotifywait. Kafka Connect can run in distributed mode and in standalone mode. Since it needs to save information on what it has already processed, storage is required. In standalone mode this can be a file; in distributed mode these are Kafka topics. I chose standalone mode since removing the offset file (to load the events again from the start) is quite easy. A drawback of standalone mode is that the connector cannot be monitored from the Kafka Control Center. A benefit of running distributed is that it can easily run in containers, since the connector itself is stateless; the state is in Kafka.

I used the following filesource.properties:

name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/inotify.txt
topic=connect-test

And the following worker.properties

offset.storage.file.filename=/tmp/example.offsets
bootstrap.servers=localhost:9092
offset.flush.interval.ms=10000
rest.port=10082
rest.host.name=localhost
rest.advertised.port=10082
rest.advertised.host.name=localhost
internal.key.converter=org.apache.kafka.connect.storage.StringConverter
internal.value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
plugin.path=/usr/share/java

There is also a JsonConverter available, but that wrapped my JSON into a string (escaping characters) and added schema information (unless disabled), neither of which I wanted. I did not find a way to disable the behavior that turned my JSON message into an escaped string, and this caused issues on the Elasticsearch side. In my use-case Kafka Connect should be a 'dumb pipe', and the StringConverter does a nice job at that!

To start this connector, I can do:

/usr/bin/connect-standalone worker.properties filesource.properties
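Before wiring up Elasticsearch, you can verify that events actually arrive in the connect-test topic by consuming it from the command line, for example:

kafka-console-consumer --bootstrap-server localhost:9092 --topic connect-test --from-beginning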

Elasticsearch and Kibana

For this I've used the following docker-compose.yml file

version: '3.3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1
    container_name: elasticsearch
    environment:
      - node.name=es01
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      nproc: 65535
      memlock:
        soft: -1
        hard: -1
    cap_add:
      - ALL
    privileged: true
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
    ports:
      - 9200:9200
      - 9300:9300
  kibana:
    image: docker.elastic.co/kibana/kibana-oss:6.6.1
    container_name: kibana
    environment:
      SERVER_NAME: localhost
      ELASTICSEARCH_URL: http://elasticsearch:9200/
    ports:
      - 5601:5601
    ulimits:
      nproc: 65535
      memlock:
        soft: -1
        hard: -1
    cap_add:
      - ALL
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s

Starting it is easy with

docker-compose up

Next you can access Kibana at http://localhost:5601

Getting data from Kafka to Elasticsearch

In order to get the data from the connect-test topic to Elasticsearch, we can again use a standalone Kafka Connect connector: the ElasticsearchSinkConnector, which is also available without an Enterprise license. You can configure it as follows:

elasticsearchsink.properties

name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=connect-test
key.ignore=true
schema.ignore=true
connection.url=http://localhost:9200
type.name=kafka-connect

worker2.properties

offset.storage.file.filename=/tmp/example2.offsets
bootstrap.servers=localhost:9092
offset.flush.interval.ms=10000
rest.port=10083
rest.host.name=localhost
rest.advertised.port=10083
rest.advertised.host.name=localhost
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
key.converter.schemas.enable=false
plugin.path=/usr/share/java

You can start the connector with:

/usr/bin/connect-standalone worker2.properties elasticsearchsink.properties

Some differences compared to the FileStreamSource connector worker.properties are:
  • a different port for the REST API is used
  • a different offset.storage.file is used
  • the key and value converters are JSON instead of String
View the results in Kibana

Next you can view the results in Kibana


Also notice that in order for the data from Elasticsearch to be visible in Kibana, not only does data need to be available, but an index also needs to exist. Since a JSON document is offered, a default index is created.
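You can also check directly in Elasticsearch that the sink connector created an index and that it contains documents. By default the index should be named after the topic (here connect-test), so something like the following can be used:

curl -s 'http://localhost:9200/_cat/indices?v'
curl -s 'http://localhost:9200/connect-test/_search?pretty&size=3'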

Using Python to performance test an Oracle DB

Performance testing is a topic with many opinions and complexities. You cannot do it in a way which will make everyone happy. It is not straightforward to compare measurements before and after a change. Environments are often not stable (neither the system itself nor its surroundings). The situation at the start of a test is also often not the same as at the end, for example because the test writes data to a database.

There are various ways to look at performance. You can look at user experience, generate load similar to what application usage produces or you can do more basic things like query performance. What will you be looking at? Resource consumption and throughput are the usual suspects.

I'll look at a simple example in this blog post. I'll change database parameters and look at throughput of various actions which are regularly performed on databases. This takes away the complexity of distributed systems. I used a single Python script for this which can be downloaded here.


Summary of conclusions: Exposing database functionality using a DAD is not so much influenced by the tested settings. Setting FILESYSTEMIO_OPTIONS to SETALL improved the performance of almost all database actions. This has also been observed at different customers. Disabling Transparent HugePages and enabling the database to use HugePages seemed to have little effect. PL/SQL native compilation also did not cause a massive improvement. From the tested settings FILESYSTEMIO_OPTIONS is the easiest to apply. Query performance and actions involving a lot of data improved with all (and any of) these settings.

Test setup

System used

I did not test with parallel load or a true production-like application load, just plain operations. I did this to determine the low-level effects of certain changes. I conducted the test on RHEL 7.6 within a VirtualBox environment. The environment was assigned 16 GB of RAM and 6 (of 12 available) cores. I've used a 12cR2 database for this test.

What does a database do?

Actually this question is difficult to answer when looking at an Oracle database. It can do a lot. It can send e-mails, provide webservices, store data, process data, do complex calculations (including machine learning). For this example I'm looking at some basic features and a slightly more advanced one (which is already deprecated in 12c);
  • Provide a client with a connection
  • Create and remove database objects (table, sequence, trigger)
  • Select, insert, delete, commit data
  • Host PL/SQL procedures using an HTTP interface using the PL/SQL gateway
Testing some database settings

Which settings would be interesting to change? There are some settings which often provide better performance but if set incorrectly, can cause detrimental effects. The below settings are also often considered 'scary' since they alter basic database behavior. These are (among others of course, relevant Oracle support notes are also mentioned):
  • HugePages / Transparent HugePages
    ALERT: Disable Transparent HugePages on SLES11, RHEL6, RHEL7, OL6, OL7, and UEK2 and above (Doc ID 1557478.1). HugePages on Linux: What It Is... and What It Is Not... (Doc ID 361323.1)
    Enabling HugePages and disabling transparent HugePages is described here.
  • FILESYSTEMIO_OPTIONS
    Database Initialization Parameters for Oracle E-Business Suite Release 12 (Doc ID 396009.1). ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1)
    Setting the FILESYSTEMIO_OPTIONS parameter is described here.
  • PL/SQL native compilation
    Database Initialization Parameters for Oracle E-Business Suite Release 12 (Doc ID 396009.1).  How To Convert the Entire DB From INTERPRETED Mode to NATIVE or Vice Versa for PL/SQL (Doc ID 1340441.1)
Some worries about these settings

HugePages

The main worry about using HugePages is that AMM (Automatic Memory Management) can no longer be used. Thus you are required to manually determine, set and monitor the SGA and PGA sizes. Luckily this is relatively easy and can be done using advisors in Enterprise Manager or by using the queries described in Tuning SGA_TARGET Using V$SGA_TARGET_ADVICE (Doc ID 1323708.1) and, for the PGA, here. Within the SGA, memory can still be managed automatically by using ASMM (see: ASMM versus AMM and LINUX x86-64 Hugepages Support (Doc ID 1134002.1)). Disabling Transparent HugePages is even recommended by Oracle Support in an Alert. Using HugePages requires changing some things on your OS. This can be challenging if the OS and the database are managed by different people, departments or even organizations (think Cloud).

FILESYSTEMIO_OPTIONS

A lot has been written about the different settings for this option. Important to know is that if you run on a very old OS with an outdated kernel, database block corruption can occur (see the above mentioned note). I've also heard doubts about whether asynchronous IO can be trusted as much as synchronous IO, since the database might expect data to be written to disk which actually isn't yet. These doubts are unfounded. Tom Kyte indicates here that even with asynchronous IO, the database still waits for a callback from the filesystem; requests are just handed to the filesystem more than one at a time. This is faster than using the overhead of multiple database writer processes to achieve the same.

PL/SQL native compilation

With native compilation, PL/SQL is stored in a compiled state in the database, so it executes faster since no interpretation is needed at the moment of execution. A drawback is that PL/SQL debugging no longer works.

How can you use Python to connect to the database?

Python, being a very popular programming language (the most popular according to some reports), can of course easily do all kinds of interesting things with the Oracle database, and has been able to do so for quite a while. You can use the cx_Oracle Python module (available through pip) as an easy way to do various things with the Oracle database. It does require an Oracle Client which needs to be available in the path. I recommend this one here. If you want to try it out yourself, you can get the free Oracle XE database. For a Python IDE you can use whatever you like, for example PyCharm, Spyder or Jupyter (do mind that indentation has meaning in Python and IDE support for this helps).

Using cx_Oracle is relatively easy.

import cx_Oracle

# Connection details; adjust these to your own environment
db_hostname = 'localhost'
db_port = '1521'
db_sid = 'XE'
db_username = 'TESTUSER'
db_password = 'TESTUSER'

# EZConnect style connect string: user/password@host:port/sid
dbstring = db_username + '/' + db_password + '@' + db_hostname + ':' + db_port + '/' + db_sid

con = cx_Oracle.connect(dbstring)
cur = con.cursor()
cur.execute('select to_char(systimestamp) from dual')
res = cur.fetchone()
con.close()

This creates a connection, executes a query and fetches the result into the Python script.

An alternative is to use the service instead of the SID.

dbstring = cx_Oracle.makedsn(db_hostname, db_port, service_name=db_sid)
con = cx_Oracle.connect(user=db_username, password=db_password, dsn=dbstring)
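The 'function [...] finished in ... ms' lines shown in the results further down come from timing the test functions. Below is a minimal sketch of how such a timing can be done (the actual script linked above may differ; runtestcon here reuses the connection code from the first snippet):

import time
import functools

def timed(func):
    # Print the wall clock duration of a function call in milliseconds
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print('function [%s] finished in %d ms' % (func.__name__, (time.time() - start) * 1000))
        return result
    return wrapper

@timed
def runtestcon():
    # Create a connection, execute a query, fetch the result and close the connection
    con = cx_Oracle.connect(dbstring)
    cur = con.cursor()
    cur.execute('select to_char(systimestamp) from dual')
    cur.fetchone()
    con.close()

runtestcon()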

Create and test the PL/SQL gateway

Accessing your environment

Make sure your database listener is listening on the specific interface you are connecting to. For example:

Also make sure your firewall is not blocking your ports. Use below for development only. In a more managed environment, you'll want to specifically open a single port.

sudo systemctl stop firewalld
sudo systemctl disable firewalld

Else you might get errors like:


Or something similar from a Python script.

In order to easily start the database when your server starts (container + pluggable) you can look here.

Configuring the embedded PL/SQL gateway

The following here describes how you can enable the Embedded PL/SQL Gateway. This can be done with a few commands. Do mind that in 12c you have pluggable databases.

(as system user, my database was called ORCLPDB)

ALTER SESSION SET CONTAINER = ORCLPDB;
ALTER PLUGGABLE DATABASE OPEN READ WRITE;

Check if the HTTP port for XDB is already set:

SELECT DBMS_XDB.gethttpport FROM dual;

If it is not enabled, enable it:

EXEC DBMS_XDB.sethttpport(8080);

This will register itself with the listener and make configured Database Access Descriptors (DADs) available on that port.

In order to create a DAD the following can be used:

BEGIN
  DBMS_EPG.create_dad (
    dad_name => 'my_epg_dad',
    path     => '/my_epg_dad/*');
END;

Next create a user

CREATE USER testuser IDENTIFIED BY Welcome01;
GRANT CONNECT,RESOURCE,DBA TO testuser;

Authorize your user to use the DAD

BEGIN
  DBMS_EPG.authorize_dad (
    dad_name => 'my_epg_dad',
    user     => 'TESTUSER');
END;

Make debugging easier

exec dbms_epg.set_dad_attribute('my_epg_dad', 'error-style', 'DebugStyle');

Set the username

BEGIN
  DBMS_EPG.set_dad_attribute (
    dad_name   => 'my_epg_dad',
    attr_name  => 'database-username',
    attr_value => 'TESTUSER');
END;

Next enable the user anonymous to allow unauthenticated access to the service. Yes, you are right, we are unlocking user ANONYMOUS on the container database instead of the pluggable database. The status of the ANONYMOUS user in the pluggable database is irrelevant for the gateway. See here.

ALTER SESSION SET CONTAINER=CDB$ROOT;
ALTER USER anonymous ACCOUNT UNLOCK;
ALTER USER anonymous IDENTIFIED BY Welcome01 CONTAINER=ALL;

Next add allow-repository-anonymous-access to the XDB configuration.

SET SERVEROUTPUT ON
DECLARE
  l_configxml XMLTYPE;
  l_value     VARCHAR2(5) := 'true'; -- (true/false)
BEGIN
  l_configxml := DBMS_XDB.cfg_get();

  IF l_configxml.existsNode('/xdbconfig/sysconfig/protocolconfig/httpconfig/allow-repository-anonymous-access') = 0 THEN
    -- Add missing element.
    SELECT insertChildXML
           (
             l_configxml,
            '/xdbconfig/sysconfig/protocolconfig/httpconfig',
            'allow-repository-anonymous-access',
            XMLType('<allow-repository-anonymous-access xmlns="http://xmlns.oracle.com/xdb/xdbconfig.xsd">' ||
                     l_value ||
                    '</allow-repository-anonymous-access>'),
            'xmlns="http://xmlns.oracle.com/xdb/xdbconfig.xsd"'
          )
    INTO   l_configxml
    FROM   dual;

    DBMS_OUTPUT.put_line('Element inserted.');
  ELSE
    -- Update existing element.
    SELECT updateXML
           (
             DBMS_XDB.cfg_get(),
             '/xdbconfig/sysconfig/protocolconfig/httpconfig/allow-repository-anonymous-access/text()',
             l_value,
             'xmlns="http://xmlns.oracle.com/xdb/xdbconfig.xsd"'
           )
    INTO   l_configxml
    FROM   dual;

    DBMS_OUTPUT.put_line('Element updated.');
  END IF;

  DBMS_XDB.cfg_update(l_configxml);
  DBMS_XDB.cfg_refresh;
END;
/

Now you can access whatever TESTUSER is allowed access to by going to http://db_host:8080/my_epg_dad/username.procedure?parameter=value

Should you have issues, the following script helps: rdbms/admin/epgstat.sql. Also the following displays the XDB configuration.

DECLARE
  l_configxml XMLTYPE;
  l_value     VARCHAR2(5) := 'true'; -- (true/false)
BEGIN
  l_configxml := DBMS_XDB.cfg_get();
  dbms_output.put_line(l_configxml.getCLOBVal());
END;

TESTUSER is allowed to create procedures in this example so first I create a procedure to determine the configured DAD and endpoint. Next I create a procedure which can be accessed by an unauthenticated user anonymous (grant execute on ... to public), call this procedure a number of times and remove it again.

Also note that the SHARED_SERVERS and DISPATCHERS database parameters are important for the performance of this gateway. The performance can easily be improved by increasing SHARED_SERVERS to 10 and DISPATCHERS to (pro=tcp)(dis=5).
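For example (a sketch; run as a privileged user and verify the values for your own environment):

ALTER SYSTEM SET shared_servers=10 SCOPE=BOTH;
ALTER SYSTEM SET dispatchers='(pro=tcp)(dis=5)' SCOPE=BOTH;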

Performance results

Before any changes

Testing create connection, create cursor, query, close connection
function [runtestcon] finished in 4525 ms
function [runtestcon] finished in 4712 ms
function [runtestcon] finished in 4584 ms
Average 4607 ms

Testing create cursor, query, return result
function [runtestindb] finished in 12362 ms
function [runtestindb] finished in 12581 ms
function [runtestindb] finished in 12512 ms
Average 12485 ms

Testing creating and removing objects
function [runtestcreateremoveobjects] finished in 3204 ms
function [runtestcreateremoveobjects] finished in 3145 ms
function [runtestcreateremoveobjects] finished in 3189 ms
Average 3180 ms

Inserting data, commit, deleting data, commit
function [runtestinsdel] finished in 3318 ms
function [runtestinsdel] finished in 3232 ms
function [runtestinsdel] finished in 3303 ms
Average 3285 ms

Inserting single row, commit, delete single row, commit
function [runtestinsdelcommit] finished in 6258 ms
function [runtestinsdelcommit] finished in 4592 ms
function [runtestinsdelcommit] finished in 4468 ms
Average 5106 ms

Testing DAD EPG
function [urltest] finished in 51584 ms
function [urltest] finished in 51707 ms
function [urltest] finished in 51711 ms
Average 51667 ms

Transparent HugePages

For disabling transparent hugepages, I've followed the instructions here.

There did not seem to be any clear change in performance. Perhaps a slight improvement in some areas.

Testing create connection, create cursor, query, close connection
function [runtestcon] finished in 4459 ms
function [runtestcon] finished in 4472 ms
function [runtestcon] finished in 4115 ms
Average 4349 ms

Testing create cursor, query, return result
function [runtestindb] finished in 12396 ms
function [runtestindb] finished in 12575 ms
function [runtestindb] finished in 9895 ms
Average 11622 ms

Testing creating and removing objects
function [runtestcreateremoveobjects] finished in 3158 ms
function [runtestcreateremoveobjects] finished in 3713 ms
function [runtestcreateremoveobjects] finished in 3215 ms
Average 3362 ms

Inserting data, commit, deleting data, commit
function [runtestinsdel] finished in 3210 ms
function [runtestinsdel] finished in 3295 ms
function [runtestinsdel] finished in 3332 ms
Average 3279 ms

Inserting single row, commit, delete single row, commit
function [runtestinsdelcommit] finished in 4427 ms
function [runtestinsdelcommit] finished in 5208 ms
function [runtestinsdelcommit] finished in 4955 ms
Average 4863 ms

Testing DAD EPG
function [urltest] finished in 51649 ms
function [urltest] finished in 51731 ms
function [urltest] finished in 51720 ms
Average 51700 ms

HugePages

For me the recommended setting from the script at Doc ID 401749.1 was: vm.nr_hugepages = 1187

I set it slightly higher: vm.nr_hugepages = 1189
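Setting the kernel parameter can, for example, be done like this (assuming /etc/sysctl.conf is used):

echo "vm.nr_hugepages = 1189" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p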

Added two lines to /etc/security/limits.conf (1189*2048)

* soft memlock 2435072
* hard memlock 2435072

After a reboot I made sure HugePages were enabled:

[oracle@rhel ~]$ cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
HugePages_Total:    1189
HugePages_Free:       12
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

To be sure the database was only using the HugePages, I did:

ALTER SYSTEM SET use_large_pages=only SCOPE=SPFILE;

If the HugePages cannot be allocated, the database will not start with this setting.

Testing create connection, create cursor, query, close connection
function [runtestcon] finished in 4880 ms
function [runtestcon] finished in 5307 ms
function [runtestcon] finished in 4551 ms
Average 4912 ms

Testing create cursor, query, return result
function [runtestindb] finished in 11790 ms
function [runtestindb] finished in 13198 ms
function [runtestindb] finished in 11889 ms
Average 12292 ms

Testing creating and removing objects
function [runtestcreateremoveobjects] finished in 3099 ms
function [runtestcreateremoveobjects] finished in 3089 ms
function [runtestcreateremoveobjects] finished in 3101 ms
Average 3096 ms

Inserting data, commit, deleting data, commit
function [runtestinsdel] finished in 3087 ms
function [runtestinsdel] finished in 3123 ms
function [runtestinsdel] finished in 3293 ms
Average 3168 ms

Inserting single row, commit, delete single row, commit
function [runtestinsdelcommit] finished in 5324 ms
function [runtestinsdelcommit] finished in 6059 ms
function [runtestinsdelcommit] finished in 4784 ms
Average 5389 ms

Testing DAD EPG
function [urltest] finished in 51735 ms
function [urltest] finished in 51690 ms
function [urltest] finished in 51791 ms
Average 51739 ms

FILESYSTEMIO_OPTIONS

The default setting for this parameter was NONE on my installation. I changed it to SETALL.

Testing create connection, create cursor, query, close connection
function [runtestcon] finished in 4323 ms
function [runtestcon] finished in 4240 ms
function [runtestcon] finished in 4192 ms
Average 4252 ms

Testing create cursor, query, return result
function [runtestindb] finished in 12303 ms
function [runtestindb] finished in 10631 ms
function [runtestindb] finished in 9692 ms
Average 10875 ms

Testing creating and removing objects
function [runtestcreateremoveobjects] finished in 3096 ms
function [runtestcreateremoveobjects] finished in 3165 ms
function [runtestcreateremoveobjects] finished in 3183 ms
Average 3148 ms

Inserting data, commit, deleting data, commit
function [runtestinsdel] finished in 3049 ms
function [runtestinsdel] finished in 3059 ms
function [runtestinsdel] finished in 3064 ms
Average 3057 ms

Inserting single row, commit, delete single row, commit
function [runtestinsdelcommit] finished in 4266 ms
function [runtestinsdelcommit] finished in 4339 ms
function [runtestinsdelcommit] finished in 4346 ms
Average 4317 ms

Testing DAD EPG
function [urltest] finished in 51887 ms
function [urltest] finished in 52329 ms
function [urltest] finished in 51834 ms
Average 52017 ms

This setting caused improvement in almost all areas! Except the DAD which remained the same.

PL/SQL native compilation

I did the following:

ALTER SYSTEM SET PLSQL_CODE_TYPE=NATIVE SCOPE=both;

Startup the database in upgrade mode (STARTUP UPGRADE)
Run the following

@$ORACLE_HOME/rdbms/admin/dbmsupgnv.sql TRUE

Followed by

@$ORACLE_HOME/rdbms/admin/utlrp.sql

alter session set container = orclpdb

Run the following

@$ORACLE_HOME/rdbms/admin/dbmsupgnv.sql TRUE

Followed by

@$ORACLE_HOME/rdbms/admin/utlrp.sql

Checking can be done with:

select name, type, plsql_code_type from user_plsql_object_settings

Testing create connection, create cursor, query, close connection
function [runtestcon] finished in 4511 ms
function [runtestcon] finished in 4417 ms
function [runtestcon] finished in 4308 ms
Average 4413 ms

Testing create cursor, query, return result
function [runtestindb] finished in 11973 ms
function [runtestindb] finished in 12444 ms
function [runtestindb] finished in 12446 ms
Average 12287 ms

Testing creating and removing objects
function [runtestcreateremoveobjects] finished in 3285 ms
function [runtestcreateremoveobjects] finished in 3381 ms
function [runtestcreateremoveobjects] finished in 3296 ms
Average 3320 ms

Inserting data, commit, deleting data, commit
function [runtestinsdel] finished in 3126 ms
function [runtestinsdel] finished in 3129 ms
function [runtestinsdel] finished in 3141 ms
Average 3132 ms

Inserting single row, commit, delete single row, commit
function [runtestinsdelcommit] finished in 4248 ms
function [runtestinsdelcommit] finished in 4291 ms
function [runtestinsdelcommit] finished in 4328 ms
Average 4289 ms

Testing DAD EPG
function [urltest] finished in 51650 ms
function [urltest] finished in 51693 ms
function [urltest] finished in 51671 ms
Average 51671 ms

Conclusions


Some of the differences were marginal, but the FILESYSTEMIO_OPTIONS setting caused an improvement in almost all areas. The settings seemed to have little effect on the response times of the DAD; most of the time the actions performed there were probably not limited by PL/SQL processing time, disk IO or memory. Combining the settings does not give much better performance. Query performance and actions involving a lot of data improved with all (and any one of) these settings.

Of course you can do many other load tests, also in parallel, and do other things to the database like space management, statistics, indexes, SQL tuning, etc. Most of these require effort. The tested settings are also not always straightforward to apply: native compilation could require a long period of downtime and HugePages require settings at the Linux level, while FILESYSTEMIO_OPTIONS only requires a restart.

Generally speaking, from the suggested settings, I would go just for FILESYSTEMIO_OPTIONS = SETALL for most systems if performance becomes an issue. On my system the default was NONE. The setting is easy to apply and will improve most database actions in probably most environments.

Python is a great language to implement performance tests against an Oracle database. It does require some manual scripting; this is just a quick example. The database challenges (specifically with the PL/SQL gateway and the CDB/PDB structure of the multitenant database) were far greater than the effort required to write the test.

6 tips to make your life with Vagrant even better!

HashiCorp Vagrant is a great tool to quickly get up and running with a development environment. In this blog post I'll give some tips to make your life with Vagrant even better! You can find an example which uses these tips here.
Tip 1: The disksize plugin

Ever created a VM with a specific disk size and, after using it for a while, found out you would have liked a larger disk? You can resize the VDI in for example VirtualBox, however that is not sufficient to increase the size of the logical volumes and partitions within the VM and make the extra space available for use. If you ever tried to manually assign the extra space (like I did here), you'll know this can be some work and has some risks involved.

Enter the vagrant-disksize plugin.

Install it with

vagrant plugin install vagrant-disksize

And you can use it in your Vagrantfile like:

config.disksize.size = '75GB'

It will figure out what to do and do it!

Tip 2: The vbguest plugin

Getting tired of installing guest additions after each and every new installation and keeping them updated after a new version of VirtualBox is released? Mounting an ISO, installing the required dependencies, running the installer... You don't have to worry about those things anymore! The vagrant-vbguest plugin takes care of this for you.

Install it with:

vagrant plugin install vagrant-vbguest

And add to your Vagrantfile a line like:

config.vbguest.auto_update = true

This installs the correct version of the guest additions and fixes dependencies.

Tip 3: The reload plugin

The vagrant-reload plugin which is available here allows you to incorporate VM reboots inside your Vagrantfile. This can be quite powerful since it for example allows things like replacing kernel boot parameters and continue with the parameters set after the reboot.

First install the plugin:

vagrant plugin install vagrant-reload

And add something like below to your Vagrantfile and you're set to go!

config.vm.provision :reload

You can add provisioners above and below this command. This is something which is not possible with tools like kickstart (RHEL derivatives) and preseed (Debian derivatives) which automate the installation of Linux OSs. HashiCorp Packer also has a similar option (for Windows and probably also for Linux).

Tip 4: Install missing plugins

I want to give you my Vagrantfile and want you to be able to get started with it as soon as possible. However my Vagrantfile uses the above mentioned plugins. You need to install them first! Luckily this is easy to automate within your Vagrantfile like below:

unless Vagrant.has_plugin?("vagrant-disksize")
  puts 'Installing vagrant-disksize Plugin...'
  system('vagrant plugin install vagrant-disksize')
end

unless Vagrant.has_plugin?("vagrant-vbguest")
  puts 'Installing vagrant-vbguest Plugin...'
  system('vagrant plugin install vagrant-vbguest')
end

unless Vagrant.has_plugin?("vagrant-reload")
  puts 'Installing vagrant-reload Plugin...'
  system('vagrant plugin install vagrant-reload')
end

Inspiration from the Oracle provided Vagrantfiles here.

Tip 5: Execute a command once after a reboot

Suppose you want to have a specific command executed only once after a reboot. This can happen because a specific command requires for example the vagrant user not to be logged in. You can do this with a pre-script (prepare), the reload plugin and a post script (cleanup). The below example should work for most Linux distributions:

Inside your Vagrantfile:

config.vm.provision :shell, path: "prescript.sh"
config.vm.provision :reload
config.vm.provision :shell, path: "postscript.sh"

Pre-script: prescript.sh

chmod +x /etc/rc.d/rc.local
echo 'COMMAND_YOU_WANT_TO_HAVE_EXECUTED' >> /etc/rc.d/rc.local

Post-script: postscript.sh

chmod -x /etc/rc.d/rc.local
sed -i '$ d' /etc/rc.d/rc.local

Tip 6: Installing docker and docker-compose

For installing docker and docker-compose (and running containers) there are 3 main options.

Using Vagrant plugins

For docker and docker-compose, first install the vagrant-docker-compose plugin:

vagrant plugin install vagrant-docker-compose

Then add the provisioners to your Vagrantfile (the docker provisioner is built into Vagrant, the docker_compose provisioner comes from the plugin):

config.vm.provision :docker
config.vm.provision :docker_compose

Using these plugins, you might have to look into how you can force it to use a specific version should you require it.

Using a provisioning script and OS repositories

This is the option I usually take. It provides more flexibility.

For Oracle Linux 7:

#Docker
yum install yum-utils -y
yum-config-manager --enable ol7_addons
yum-config-manager --enable ol7_optional_latest
yum -y update
/usr/bin/ol_yum_configure.sh
yum -y update
yum install docker-engine -y
systemctl start docker
systemctl enable docker

For Ubuntu bionic:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
apt-cache policy docker-ce
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
apt-get update
apt-get -y install apt-transport-https ca-certificates curl gnupg-agent software-properties-common docker-ce docker-compose

Using the docker / docker-compose provided convenience scripts

#docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

#docker-compose
sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

Beware that the above docker-compose installation does not automatically get updated (in contrast to docker and the first 2 installation options).

Securing Oracle Service Bus REST services with OAuth2 (without using additional products)

OAuth2 is a popular authentication framework. As a service provider it is thus common to provide support for OAuth2. How can you do this on a plain WebLogic Server / Service Bus without having to install additional products (and possibly have to pay for licenses)? If you just want to implement and test the code, see this installation manual. If you want to know more about the implementation and choices made, read on!

OAuth2 client credentials flow

OAuth2 supports different flows. One of the easiest to use is the client credentials flow. It is recommended to use this flow when the party requiring access can securely store credentials. This is usually the case when there is server to server communication (or SaaS to SaaS).

The OAuth2 client credentials flow consists of an interaction pattern between 3 actors which each have their own role in the flow.
  • The client. This can be anything which supports the OAuth2 standard. For testing I've used Postman
  • The OAuth2 authorization server. In this example I've created a custom JAX-RS service which generates and returns JWT tokens based on the authenticated user.
  • A protected service. In this example I'll use an Oracle Service Bus REST service. The protection consists of validating the token (authentication using standard OWSM policies) and providing role based access (authorization).
When using OAuth2, the authorization server returns a JSON message containing (among other things) a JWT (JSON Web Token).

In our case the client authenticates using basic authentication to a JAX-RS servlet. This uses the HTTP header Authorization which contains 'Basic' followed by Base64 encoded username:password. Of course Base64 encoded strings can be decoded easily (e.g. by using sites like these) so never use this over plain HTTP!

When this token is obtained, it can be used in the Authorization HTTP header using the Bearer keyword. A service which needs to be protected can be configured with the following standard OWSM policies for authentication: oracle/http_jwt_token_service_policy and oracle/http_jwt_token_over_ssl_service_policy and a custom policy for role based access / authorization.
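Calling a protected service with the obtained token then looks something like the following (placeholder token and endpoint):

curl -H "Authorization: Bearer <token>" https://host:port/some-protected-rest-service/resource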

JWT

JSON Web Tokens (JWT) can look something like:

eyJraWQiOiJvYXV0aDJrZXlwYWlyIiwiYWxnIjoiUlMyNTYifQ.eyJzdWIiOiJ3ZWJsb2dpYyIsImlzcyI6Ind3dy5vcmFjbGUuY29tIiwiZXhwIjoxNTQwNDY2NDI4LCJpYXQiOjE1NDA0NjU4Mjh9.ZE8wMnFyjHcmFpdswgx3H8azVCPtHkrRjqhiKt-qZaV1Y5YlN9jAOshUnPIQ76L8K4SAduhJg7MyLQsAipzCFeT_Omxnxu0lgbD2UYtz-TUIt23bjcsJLub5pNrLXJWL3k7tSdkcVxlyHuRPYCvoLhLZzCksqnRdD6Zf9VjxGLFPktknXwpn7_aOAdzXEatj-Gd9lm321R2BdFL7ii9sXh9A1KL8cblLbhLlrXGwTF_ifTxuHSBz1B_p6xng6kmOfIwDIAJQ9t6KESQm8dQQeilcny1uRmhg4o85uc4gGzhH435q1DRuHQm22wN39FHbNT4WP3EuoZ49PpsTeQzSKA

This is not very helpful at first sight. When we look a little bit closer, we notice it consists of 3 parts separated by a '.' character. These are the header, body and signature of the token. The first 2 parts can be Base64 decoded.

Header

The header typically consists of 2 parts (see here for an overview of fields and their meaning): the token type and the signing algorithm. In this case the header is

{"kid":"oauth2keypair","alg":"RS256"}

kid refers to the key id. In this case it provides a hint to the resource server on which key alias to use in its key store to validate the signature.

Body

The JWT body contains so-called claims. In this case the body is

{"sub":"weblogic","iss":"www.oracle.com","exp":1540466428,"iat":1540465828}

The subject is the subject for which the token was issued. www.oracle.com is the issuer of the token. iat indicates an epoch at which the token was issued and exp indicates until when the token is valid. Tokens are valid only for a limited duration. www.oracle.com is an issuer which is accepted by default so no additional configuration was required.

Signature

The signature contains a hash of the header and body of the token, signed with the private key of a public/private key pair. If the header or body is altered, signature validation will fail.

Challenges

Implementing the OAuth2 client credentials flow using only a WebLogic server and OWSM can be challenging. Why?
  • Authentication server. Bare WebLogic + Service Bus do not contain an authentication server which can provide JWT tokens.
  • Resource Server. Authentication of tokens. The predefined OWSM policies which provide authentication based on JWT tokens (oracle/http_jwt_token_service_policy and oracle/http_jwt_token_over_ssl_service_policy) are picky to what tokens they accept.
  • Resource Server. Authorization of tokens. OWSM provides a predefined policy to do role based access to resources: oracle/binding_permission_authorization_policy. This policy works for SOAP and REST composites and Service Bus SOAP services, but not for Service Bus REST services.
How did I fix this?
  • Create a simple authentication server to provide tokens which conform to what the predefined OWSM policies expect. By increasing the OWSM logging and checking for errors when sending in tokens, it becomes clear which fields are expected.
  • Create a custom OWSM policy to provide role based access to Service Bus REST resources
Custom components

Authentication server

The authentication server has several tasks:
  • authenticate the user (client credentials) 
    • using the WebLogic security realm
  • validate the client credentials request
    • using Apache HTTP components
  • obtain a public and private key for signing 
    • from the OPSS KeyStoreService (KSS)
  • generate a token and sign it 
Authentication

Servlet user authentication on WebLogic Server consists of 2 configuration files.

A web.xml. This file indicates
  • which resources are protected
  • how they are protected (authentication method, TLS or not)
  • who can access the resources (security role)

The weblogic.xml indicates how the security roles map to WebLogic Server roles. In this case any user in the WebLogic security realm group tokenusers (which can be in an external authentication provider such as for example an AD or other LDAP) can access the token service to obtain tokens.


Validate the credentials request

From Postman you can do a request to the token service to obtain a token. This can also be used if the response of the token service conforms to the OAuth2 standard.

By default certificates are checked. With self-signed certificates / development environments, those checks (such as host name verification) might fail. You can disable the certificate checks in the Postman settings screen.


Also Postman has a console available which allows you to inspect requests and responses in more detail. The request looked like


Thus this is what needed to be validated: an HTTP POST request with a body of content type application/x-www-form-urlencoded containing grant_type=client_credentials. I've used the Apache HTTP Components org.apache.http.client.utils.URLEncodedUtils class for this.

After deployment I of course needed to test the token service. Postman worked great for this but I could also have used Curl commands like:

curl -u tokenuser:Welcome01 -X POST -d "grant_type=client_credentials" http://localhost:7101/oauth2/resources/tokenservice

Accessing the OPSS keystore

Oracle WebLogic Server provides Oracle Platform Security Services.


OPSS provides secure storage of credentials and keys. A policy store can be configured to allow secure access to these resources. This policy store can be file based, LDAP based and database based. You can look at your jps-config.xml file to see which is in use in your case;


You can also look this up from the EM


In this case the file based policy store system-jazn-data.xml is used. Presence of the file on the filesystem does not mean it is actually used! If there are multiple policy stores defined, for example a file based and an LDAP based, the last one appears to be used.

The policy store can be edited from the EM


You can create a new permission:

Codebase: file:${domain.home}/servers/${weblogic.Name}/tmp/_WL_user/oauth2/-
Permission class: oracle.security.jps.service.keystore.KeyStoreAccessPermission
Resource name: stripeName=owsm,keystoreName=keystore,alias=*
Actions: read

The codebase indicates the location of the deployment of the authentication server (Java WAR) on WebLogic Server.

Or when file-based, you can edit the (usually system-jazn-data.xml) file directly

In this case add:

<grant>
  <grantee>
    <codesource>
      <url>file:${domain.home}/servers/${weblogic.Name}/tmp/_WL_user/oauth2/-</url>
    </codesource>
  </grantee>
  <permissions>
    <permission>
      <class>oracle.security.jps.service.keystore.KeyStoreAccessPermission</class>
      <name>stripeName=owsm,keystoreName=keystore,alias=*</name>
      <actions>*</actions>
    </permission>
  </permissions>
</grant>

At the location shown below


Now if you create a stripe owsm with a policy based keystore called keystore, the authentication server is allowed to access it!



The name of the stripe and the name of the keystore are the defaults used by the predefined OWSM policies. Thus when using these, you do not need to change any additional configuration (WSM domain config, policy config). OWSM only supports policy based KSS keystores. When using JKS keystores, you need to define credentials in the credential store framework and update the policy configuration to point to the credential store entries for the keystore password, key alias and key password. The provided code for accessing the keystore / keypair is currently KSS based. Inside the keystore you can import or generate a keypair. The current Java code of the authentication server expects a keypair with alias oauth2keypair to be present in the keystore.


Accessing the keystore and key from Java

I defined a property file with some parameters. The file contained (among some other things relevant for token generation):

keystorestripe=owsm
keystorename=keystore
keyalias=oauth2keypair

Accessing the keystore can be done as is shown below.

            // Access the OPSS keystore from a privileged block, since the
            // KeyStoreService requires the KeyStoreAccessPermission grant shown above
            AccessController.doPrivileged(new PrivilegedAction<String>() {
                public String run() {
                    try {
                        JpsContext ctx = JpsContextFactory.getContextFactory().getContext();
                        KeyStoreService kss = ctx.getServiceInstance(KeyStoreService.class);
                        // Open the keystore in the stripe defined in the property file
                        ks = kss.getKeyStore(prop.getProperty("keystorestripe"), prop.getProperty("keystorename"), null);
                    } catch (Exception e) {
                        return "error";
                    }
                    return "done";
                }
            });

When you have the keystore, accessing keys is easy

            PasswordProtection pp = new PasswordProtection(prop.getProperty("keypassword").toCharArray());
            KeyStore.PrivateKeyEntry pkEntry = (KeyStore.PrivateKeyEntry) ks.getEntry(prop.getProperty("keyalias"), pp);

(my key didn't have a password but this still worked)

Generating the JWT token

After obtaining the keypair at the keyalias, the JWT token libraries required instances of RSAPrivateKey and RSAPublicKey. That could be done as is shown below

            RSAPrivateKey myPrivateKey = (RSAPrivateKey) pkEntry.getPrivateKey();
            RSAPublicKey myPublicKey = (RSAPublicKey) pkEntry.getCertificate().getPublicKey();

In order to sign the token, an RSAKey instance was required. I could create this from the public and private key using a RSAKey.Builder method.

            RSAKey rsaJWK = new RSAKey.Builder(myPublicKey).privateKey(myPrivateKey).keyID(prop.getProperty("keyalias")).build();

Using the RSAKey, I could create a Signer

JWSSigner signer = new RSASSASigner(rsaJWK);

Preparations were done! Now only the header and body of the token. These were quite easy with the provided builder.

Claims:

JWTClaimsSet claimsSet = new JWTClaimsSet.Builder()
.subject(user)
.issuer(prop.getProperty("tokenissuer"))
.expirationTime(expires)
.issueTime(new Date(new Date().getTime()))
.build();

Generate and sign the token:

SignedJWT signedJWT = new SignedJWT(new JWSHeader.Builder(JWSAlgorithm.RS256).keyID(rsaJWK.getKeyID()).build(), claimsSet);
signedJWT.sign(signer);
String token = signedJWT.serialize();


Returning an OAuth2 JSON message could be done with

String output = String.format("{ \"access_token\" : \"%s\",\n" + "  \"scope\"        : \"read write\",\n" +  "  \"token_type\"   : \"Bearer\",\n" + "  \"expires_in\"   : %s\n}", token,expirytime);

Role based authorization policy

The predefined OWSM policies oracle/http_jwt_token_service_policy and oracle/http_jwt_token_over_ssl_service_policy create a SecurityContext which is available from the $inbound/ctx:security/ctx:transportClient inside Service Bus. Thus you do not need a custom identity asserter for this!

However, the policy does not allow you to configure role based access and the predefined policy oracle/binding_permission_authorization_policy does not work for Service Bus REST services. Thus we need a custom policy in order to achieve this. Luckily this policy can use the previously set SecurityContext to obtain principles to validate.

Challenges

Providing the correct capabilities in the policy definition was a challenge. The policy should work for Service Bus REST services. Predefined policies provide examples, however they could not be exported from the WSM Policies screen. I did a 'Create like' of a predefined policy which provided the correct capabilities and then copied those capability definitions to my custom policy definition file. Good to know: some capabilities required the text 'rest' to be part of the policy name.

Also I encountered a bug in 12.2.1.2 which is fixed with the following patch: Patch 24669800: Unable to configure Custom OWSM policy for OSB REST Services. In 12.2.1.3 there were no issues.

An OWSM policy consists of two deployments

A JAR file

  • This JAR contains the Java code of the policy. The Java code uses the parameters defined in the file below.
  • A policy-config.xml file. This file indicates which class is implementing the policy. Important part of this file is the reference to restUserAssertion. This maps to an entry in the file below
A policy description ZIP file
  • This contains a policy description file. 
The description ZIP file contains a single XML file which answers questions like;
  • Which parameters can be set for the policy? 
  • Of which type are the parameters? 
  • What are the default values of the parameters?
  • Is it an authentication or authorization policy?
  • Which bindings are supported by the policy?
The policy description file contains an element which maps to the entry in the policy-config.xml file. Also the ZIP file has a structure which is in line with the name and Id of the policy. It is like;


Thus the name of the policy is CUSTOM/rest_user_assertion_policy
This name is also part of the contents of the rest_user_assertion_policy file. You can also see there is again a reference to the implementation class and the restUserAssertion element which is in the policy-config.xml file is also there. The capabilities of the policy are mentioned in the restUserAssertion attributes.


Finally

As mentioned before, the installation manual and code can be found here. Of course this solution does not provide all the capabilities of a product like API Platform Cloud Service, OAM, OES. Usually you don't need all those capabilities and complexity and just a simple token service /policy is enough. In such cases you can consider this alternative. Of course since it is hosted on WebLogic / Service Bus, it needs some extra protection when exposed to the internet such as a firewall, IP whitelisting, SSL offloading, etc.

Performance! 3 reasons to stick to Java 8 for the moment

It is a smart thing to move to newer versions of Java! Security updates and new features are just two of the reasons, and there are many more. Performance might be a reason to stick to Java 8 though. In this blog post I'll show some results of performance tests I have conducted showing Java 11 has slower startup times and slightly slower throughput compared to Java 8 when using the same Java code. Native images (a GraalVM feature) have greatly reduced startup time and memory usage at the cost of throughput. You can only compile Java 8 byte-code to a native image though (at the moment).

Worse performance for Java 11

I've done extensive measurements on the performance of different JVMs and different microservice frameworks running on them. You can browse the scripts used here and view a presentation which describes the test set-up here. The test: I created minimal implementations of several frameworks, compiled them to Java 8 and Java 11 byte-code respectively, ran them in Docker containers on a specific JVM and put load on them. I sent a message, waited for a response and then sent the next message (all synchronous). I ran the same test, which took 15 minutes per framework/JVM combination (millions of requests), several times and the results are reproducible. I paid specific attention to making sure the load test and the JVM used separate resources. Also, I first wrote the measurements to memory and only wrote them to disk at the end of the test. I did this to be able to measure sub-millisecond differences between the JVMs and frameworks and not my disk performance.

Slightly worse throughput (response times)

For every framework (Akka, Vert.x, Spring Boot, WebFlux, Micronaut, Quarkus, Spring Fu) I noticed that the performance on Java 11 was slightly worse than on Java 8 (tested on Azul Zing, Oracle JDK, OpenJDK and OpenJ9), though the difference was less than a tenth of a millisecond. For OpenJ9 it is beneficial to go to Java 11 though when running Akka or Spring Boot.


This could be related to the garbage collection algorithm which is used by default: for Java 8 this is Parallel GC while for Java 11 this is G1 GC. G1 GC with 2 GB of heap performs slightly worse during my test than Parallel GC, as you can see in the graph below. When reducing the heap though, G1 GC starts to outperform Parallel GC.


I have not tried Java 11 with Parallel GC (-XX:+UseParallelGC) to confirm the drop in performance is caused by the GC algorithm. As you can see though, the performance difference from Java 8 to Java 11 is consistent over the different JVMs (with the exception of Akka and Spring Boot on OpenJ9) using different GC algorithms. For example Zing 11 performs slightly worse than Zing 8, and Zing did not change its default GC algorithm. Thus it is likely that, when using the same algorithm for OpenJDK and Oracle JDK 8 and 11, JDK 11 would still perform slightly worse.

Worse startup time


As you can see in the above graph for the same Spring Boot application running on a JVM with 2Gb of heap, startup time for Java 11 was longer. I just showed the graph for Spring Boot but the same holds true for the other microservice frameworks.

GraalVM native compilation only supports Java 8

GraalVM 19 is currently (1st of June 2019) only available in a Java 8 variant. The native compilation option is still in an early adopters phase at the moment, but, even though throughput is worse (see the above graphs, Substrate VM), start-up time is much better! Throughput might be worse because pre-compilation could prevent some runtime optimizations (I have not confirmed this). See for example the below graph of Quarkus startup time on different JVMs. Also it is said memory usage is better when using native images, but I've not confirmed this yet with my own measurements.


If you want to use native images, for the moment you are stuck on Java 8. A good use-case for native images can be serverless applications where startup time matters and memory footprint plays a role in determining how much the call costs you.

Finally

Java 11 specific features were not used

These tests were performed using the same code but compiled to Java 8 and Java 11 byte-code respectively to exclude effects of compilation (although OpenJDK was used to compile to Java 8 and 11 byte-code in all cases). Usually you would write different code (using new features) when writing it to run on Java 11. For example, the Java Platform Module System (JPMS) could have been implemented or maybe more efficient APIs could have been used. Thus potentially performance on Java 11 could be better than on Java 8.
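
As an illustration of what such a change could look like (this is an assumption on my part, not code that was part of the tests), the java.net.http.HttpClient, which became a standard API in Java 11, could replace older HttpURLConnection based code:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Java11HttpClientExample {
    public static void main(String[] args) throws Exception {
        // java.net.http.HttpClient is only available from Java 11 onwards
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/hello"))
                .GET()
                .build();
        // Synchronous call; sendAsync is also available and returns a CompletableFuture
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}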

Disclaimer

Use these results at your own risk! My test situation was specific and might differ from your own situation, thus performing your own tests and basing your decisions on those results is recommended. This blog post just indicates that moving from Java 8 to Java 11 might not improve performance and prevents you from using native compilation at the moment.

How to achieve a graceful shutdown of workers in Python and JavaScript running in Docker containers

You might encounter a situation where you want to fork a script during execution, for example if the number of forks is dependent on user input or another specific situation. I encountered such a situation in which I wanted to put load on a service using multiple concurrent processes. In addition, when running in a Docker container, only the process with PID=1 receives a SIGTERM signal. If it has terminated, the worker processes receive a SIGKILL signal and are not allowed a graceful shutdown. In order to do a graceful shutdown of the worker processes, the main process needs to manage them and only exit after the worker processes have terminated gracefully. Why do you want processes to be terminated gracefully? In my case because I store performance data in memory (disk is too slow) and only write the data to disk when the test has completed.

This seems relatively straightforward, but there are some challenges. Also I implemented this in JavaScript running on Node and in Python. Python and JavaScript handle forking differently.

Docker and the main process

There are different ways to start a main process in a Dockerfile. A best practice (e.g. here) is to use ENTRYPOINT exec syntax which accepts a JSON array to specify a main executable and fixed parameters. A CMD can be used to give some default parameters. The ENTRYPOINT exec syntax can look like:

ENTRYPOINT ["/bin/sh"]

This will start sh with PID=1.

ENTRYPOINT also has a shell syntax. For example:

ENTRYPOINT /bin/sh

This does something totally different! It actually executes '/bin/sh -c /bin/sh' in which the first /bin/sh has PID=1. The second /bin/sh will not receive a SIGTERM when 'docker stop' is called. Also, a CMD after the second ENTRYPOINT example will be ignored, while in the first case it is appended as default arguments. A benefit of using the shell variant is that variable substitution takes place. The below examples can be executed with the code put in a Dockerfile and the following command:

docker build -t test .
docker run test

Thus

FROM registry.fedoraproject.org/fedora-minimal
ENV greeting hello
ENTRYPOINT [ "/bin/echo", "$greeting" ]
CMD [ "and some more" ]

Will display '$greeting and some more' while /bin/echo will have PID=1.

While

FROM registry.fedoraproject.org/fedora-minimal
ENV greeting hello
ENTRYPOINT /bin/echo $greeting
CMD [ "and some more" ]

will display 'hello' and you cannot be sure of the PID of /bin/echo.

You can use arguments in a Dockerfile. If however you do not want to use the shell variant of ENTRYPOINT, here's a trick you can use:

FROM azul/zulu-openjdk:8u202
VOLUME /tmp
ARG JAR_FILE
COPY ${JAR_FILE} app.jar
ENTRYPOINT ["java","-jar","/app.jar"]

You can use the argument in a copy statement and make sure the target file is always the same. This way you can use the same ENTRYPOINT line while running in this case an app.jar file which is determined by a supplied argument.

Forking and signal handling

Python

See the complete sample code here

In Python you need only two imports to do signal handling: os and signal

I can fork a process by calling

pid=os.fork()

What this does is split the code execution from that point forward, with one important difference between master and worker process. The value of pid in the worker (the child) is 0, while in the master it is the PID of the newly forked worker, which is greater than 0. You can base logic on the value of pid which is specific to master or worker. Do not mistake the pid variable for the result of os.getpid(); both processes have their own os.getpid() value, which is greater than 0 in both cases.

If you want the master to be able to signal the workers, you can save the pid of the workers in a variable in the master. You can register a signal handler using: signal.signal(signal.SIGINT, exit_signal_handler). In this case the function exit_signal_handler is called when SIGINT is received. You can kill a worker by doing os.kill(worker_pid, signal.SIGINT) in the cleanup procedure of the master. Do not forget to wait until the worker is finished with finished = os.waitpid(worker_pid, 0), or else the master might be finished before the worker, causing the worker to be killed in a not so graceful manner.

JavaScript

See the complete sample code here

In JavaScript, forking might be a less obvious thing to do compared to Python, since in JavaScript it is a good practice to code as much non-blocking as possible. The event loop and the workers which pick up tasks will take care of threading for you. It is a common misconception that JavaScript running on Node is single threaded. It is not; there are multiple worker threads handling tasks. Every fork in this example has its own thread pool and worker threads, thus the total number of threads JavaScript uses when forking is (much) higher than with Python.

A drawback of counting on the workers which pull events from the event loop is that it is difficult to obtain fine-grained control and thus predictability. If I put something on the event loop, I won't have a guarantee about when it will be picked up. Also I can't be sure the callback handler is called immediately after execution. In my case I needed that control, so I eventually gave up on Node. I did however create an implementation similar to the Python one described above.

In JavaScript you can use cluster and process to provide you with forking and signal handling capabilities

var cluster = require('cluster');
var process = require('process');

Forking can be done with:

cluster.fork();

This works differently than with Python though. A new process with a new PID is created. This new process starts the code from the beginning, with some differences: cluster.isMaster is false in the workers, and the master contains an array of workers: cluster.workers. This can be used to signal the workers and wait for them to have gracefully shut down. Also do mind that master and workers do not share variable values, since the worker is started as an entirely new process and does not split execution at the fork command like with Python.

Signal handling can be done like:

process.on('SIGTERM', () => {
    mylogger("INFO\t"+pid+"\tSIGTERM received");
    cleanup();
});

Signalling the workers can be done with:

for (const id in cluster.workers) {
      mylogger("INFO\t"+pid+"\tSending SIGINT to worker with id: "+String(id));
      cluster.workers[id].process.kill();
}

Calling process.kill() on the worker waits until the worker has gracefully shut down. Do mind that in the cleanup function you need to do different things for the master and the workers. Also mind that the id of the worker in the master is not the PID; the PID is process.pid.

Putting it together

In order to put everything together and make sure that when a docker stop is issued even the workers get a chance at a graceful shutdown, several things are needed:

  • The master needs to have PID=1 so it can receive the SIGTERM which is issued by docker stop. This can be achieved by using the ENTRYPOINT exec syntax
  • The master needs a signal handler for SIGTERM in order to respond and inform the workers
  • The master needs to know how to signal the workers (by pid for Python and by id for JavaScript). In JavaScript an array of workers is available by default. In Python you need to keep track yourself.
  • The master needs to signal the workers
  • The master needs to wait for the workers to finish with their graceful shutdown before exiting itself. Else the workers are still killed in a not so graceful manner. This works out of the box in JavaScript. In Python, it needs an explicit os.waitpid.
  • The workers need signal handlers to know when to initiate a graceful shutdown
You now know how to do all of this in Python and JavaScript with available sample code to experiment with. Have fun!

A transparent Spring Boot REST service to expose Oracle Database logic

Sometimes you have an Oracle database which contains a lot of logic and you want to expose specific logic as REST services. There are a variety of ways to do this. The most obvious one to consider might be Oracle REST Data Services. ORDS needs to be hosted on an application server and requires some specific configuration. It is quite powerful though (it for example supports multiple authentication mechanisms like OAuth) and it is a product supported by Oracle. Another option might be using the database embedded PL/SQL gateway. This gateway however is deprecated for APEX and difficult to tune (believe me, I know).

Sometimes there are specific requirements which make the above solutions not viable, for example if you do not want to install and manage an application server, or have complex custom authentication logic implemented elsewhere which might be difficult to translate to ORDS or the embedded PL/SQL gateway. Also, ORDS and the PL/SQL gateway are not container-native solutions, thus features like automatic scaling when load increases might be difficult to implement.

You can consider creating your own custom service in for example Java. The problem here however is that it is often tightly coupled with the implementation. If for example parameters of a database procedure are mapped to Java objects or a translation from a view to JSON takes place in the service, there is often a tight coupling between the database code and the service.

In this blog post I'll provide a solution for a transparent Spring Boot REST service which forwards everything it receives, without this tight coupling, to a single generic database procedure which handles all REST requests. The general flow of the solution is as follows:
  • The service receives an HTTP request from a client
  • Service translates the HTTP request to an Oracle database REST_REQUEST_TYPE object type
  • Service calls the Oracle database over JDBC with this Object
  • The database processes the REST_REQUEST_TYPE and creates a REST_RESPONSE_TYPE Object
  • The database returns the REST_RESPONSE_TYPE Object to the service
  • The service translates the REST_RESPONSE_TYPE Object to an HTTP response
  • The HTTP response is returned to the client

How does it work?

What is a REST request? Well... REST is an architectural style. You're also not talking about SOA or EDA requests are you? We're talking about HTTP requests in this case but the method can be applied to other protocols like gRPC, JMS, Kafka if you like. This requires some changes to the code though.

If we want a transparent solution which forwards requests to the database and returns responses from the database, we first have to know what a request and a response are.

What is an HTTP request?

You can read some basics on HTTP requests here. The following picture, taken from the previously mentioned link, gives a nice summary:


An HTTP request consists of:
  • A method: GET, POST, etc.
  • A URL
  • A list of HTTP headers such as Content-Type, Host, Accept. Security people like these because web-browsers tend to interpret them. See for example the OWASP Secure Headers Project
  • A body
What is an HTTP response?

The below image has been taken from here.


Generally speaking, an HTTP response consists of:
  • A status code
  • HTTP headers
  • A body
So what does the service do?

It accepts HTTP requests and translates them to Oracle objects which describe the HTTP request. These are then used to call a database procedure over JDBC (protocol translation). Additionally, you can use the service to do things like security, request logging, service result caching, etc.
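
To give an idea of what the receiving side can look like, below is a minimal sketch of a catch-all Spring Boot controller. The class and endpoint names are my own assumptions; the actual code is in the linked repository.

import javax.servlet.http.HttpServletRequest;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DispatcherController {

    // Accept every HTTP method and every path below /api/v1/
    @RequestMapping("/api/v1/**")
    public ResponseEntity<String> dispatch(HttpServletRequest request,
                                           @RequestHeader HttpHeaders headers,
                                           @RequestBody(required = false) String body) {
        // Everything needed to build a REST_REQUEST_TYPE: method, URL, headers and body
        String method = request.getMethod();
        String url = request.getRequestURI();
        // ... call the database dispatcher here and translate REST_RESPONSE_TYPE back ...
        return ResponseEntity.ok("{}");
    }
}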

Oracle database

PL/SQL limitations

PL/SQL has some limitations to deal with. For example, you cannot define object types in package specifications. And you cannot create an associative array (for storing HTTP headers) inside an Oracle Object type.

How to work around these limitations

In order to deal with these limitations, an Oracle Object structure is a good choice. See here. In a body of a package, you can then use these types. Also they can be transported over JDBC. The service (of which you can view the code here) calls the procedure with the required parameters.

Java service

JDBC limitations

JDBC in general does not provide specific Oracle database functionality and datatypes. The Oracle JDBC driver in addition has some limitations of its own (read the FAQ): Oracle JDBC drivers do not support calling arguments or return values of the PL/SQL types TABLE (now known as indexed-by tables), RESULT SET, RECORD, or BOOLEAN, and there are currently no plans to change this. Instead people are encouraged to use RefCursors, Oracle Collections and Structured Object Types. I decided to use Object types since they are easy to use in the database and allow nesting. The main challenge in the service was constructing the correct Objects.
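
As a sketch of how such Objects can be constructed and passed over JDBC, assuming the type and package names from the DDL shown later in this post (this is not the exact repository code):

import java.sql.Array;
import java.sql.CallableStatement;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.Struct;
import javax.sql.DataSource;
import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleTypes;

public class RestDispatcherDao {

    private final DataSource dataSource;

    public RestDispatcherDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public String dispatch(String method, String url, String body) throws Exception {
        try (Connection connection = dataSource.getConnection()) {
            OracleConnection oracleConnection = connection.unwrap(OracleConnection.class);

            // A single HTTP header as HTTP_HEADER_TYPE
            Struct header = oracleConnection.createStruct("HTTP_HEADER_TYPE",
                    new Object[]{"Content-Type", "application/json"});
            // The headers collection as HTTP_HEADERS_TYPE
            Array headers = oracleConnection.createOracleArray("HTTP_HEADERS_TYPE", new Object[]{header});
            // The request as REST_REQUEST_TYPE (method, URL, headers, body)
            Struct request = oracleConnection.createStruct("REST_REQUEST_TYPE",
                    new Object[]{method, url, headers, body});

            try (CallableStatement call = oracleConnection.prepareCall("{call gen_rest.dispatcher(?, ?)}")) {
                call.setObject(1, request);
                call.registerOutParameter(2, OracleTypes.STRUCT, "REST_RESPONSE_TYPE");
                call.execute();

                Struct response = (Struct) call.getObject(2);
                Object[] attributes = response.getAttributes();
                // attributes[0] = HTTP_STATUSCODE, attributes[1] = HTTP_HEADERS, attributes[2] = HTTP_BODY
                Clob responseBody = (Clob) attributes[2];
                return responseBody == null ? null : responseBody.getSubString(1, (int) responseBody.length());
            }
        }
    }
}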

How to run this example

Get a database

First of course have a running Oracle database. You can use an existing database in your application landscape or, for testing purposes, install one yourself. If you're familiar with Vagrant, an easy way to get up and running quickly can be found here. If you're not familiar with Vagrant, you can also install an Oracle database in a Docker image. For that you have two options: build it yourself (see here) or download it from Oracle's container registry. If you do not care about having the database isolated from the rest of your system, you can also install it outside VirtualBox/Docker. I recommend XE if you want to go this path since the other database versions require more steps to install.

Create a database user

First login as a system user and create a user which is going to contain the dispatcher.

create user testuser identified by Welcome01;
grant dba,resource,connect to testuser;

Of course in an enterprise environment, you want to be a bit more specific with your grants.

Create the database objects


CREATE OR REPLACE TYPE HTTP_HEADER_TYPE AS OBJECT
(
name VARCHAR2(255),
value VARCHAR2(2014)
);
/
CREATE OR REPLACE TYPE HTTP_HEADERS_TYPE AS TABLE OF HTTP_HEADER_TYPE;
/
CREATE OR REPLACE TYPE REST_REQUEST_TYPE AS OBJECT
(
HTTP_METHOD VARCHAR2(16),
HTTP_URL VARCHAR2(1024),
HTTP_HEADERS HTTP_HEADERS_TYPE,
HTTP_BODY CLOB
);
/
CREATE OR REPLACE TYPE REST_RESPONSE_TYPE AS OBJECT
(
HTTP_STATUSCODE NUMBER,
HTTP_HEADERS HTTP_HEADERS_TYPE,
HTTP_BODY CLOB
);
/

Create the database package. 

Below is a minimal example. Of course you can write your own implementation here; as long as the specification remains the same, your Java code does not require changing.
CREATE OR REPLACE PACKAGE gen_rest AS

    PROCEDURE dispatcher (
        p_request    IN    rest_request_type,
        p_response   OUT   rest_response_type
    );

END gen_rest;
/
CREATE OR REPLACE PACKAGE BODY gen_rest AS

    PROCEDURE dispatcher (
        p_request    IN    rest_request_type,
        p_response   OUT   rest_response_type
    ) AS
        l_httpheader    http_header_type;
        l_httpheaders   http_headers_type := http_headers_type();
    BEGIN
        l_httpheader := http_header_type('Content-Type', 'application/json');
        l_httpheaders.extend;
        l_httpheaders(l_httpheaders.count) := l_httpheader;
        p_response := rest_response_type(200, l_httpheaders, '{"response":"Hello World"}');
    END dispatcher;

END gen_rest;
/

Download and install the JDBC driver locally
  • Download the driver from here
  • Make sure you have Apache Maven installed and present on the path. If you're familiar with Chocolatey, use: 'choco install maven'.
  • Install the JDBC driver to your local Maven repository
    mvn install:install-file -Dfile="ojdbc8.jar" -DgroupId="com.oracle" -DartifactId=ojdbc8 -Dversion="19.3" -Dpackaging=jar
Download and run the Java service

You can find the code here. The actual Java code consists of two classes and a configuration file. The configuration file, application.properties, contains information required by the Hikari connection pool to be able to create connections. This is also the file you need to update when the database has a different service name or hostname.

The service itself is a Spring Boot service. After you have downloaded the code you can just run it like any old Spring Boot service.

Go to the folder where the pom.xml is located

mvn clean package
java -jar .\target\RestService-1.0-SNAPSHOT.jar

Now you can open your browser and go to http://localhost:8080/api/v1/blabla (or any URL after /v1/)



Finally

Considerations

This setup has several benefits:
  • There is only a single location which has business logic
    Business logic is located in the database and not in the service. You might argue that this is not the location where the logic should be, however in my opinion it is better to have it in a single location than distributed over two locations. If the current situation is that the database contains the logic, it is often easiest to keep it there. In the long term however, this causes a vendor lock-in.
  • A custom service is flexible
    • The Java service is container ready and easily scalable.
    • The Java service is thin/transparent. You know exactly what happens (not much) and it has the potential to be a lot faster than products which provide more functionality which you might not need.
    • The service can be enriched with whatever custom functionality you like. Products such as ORDS and the PL/SQL gateway are often more difficult to extend and you are not allowed to alter the (closed source) products themselves.
  • Not so tight coupling between service and database.
    The service code does not need to change when the database code changes; only a single version of the service is required. If the messages which are exchanged change (because of changes in the database code), the service does not need to be changed. If the service is built by another team than the database code, these teams do not need to coordinate their planning and releases.
There are of course some drawbacks:
  • Some changes still require redeployment of the service
    If the database itself changes, for example gets a new hostname or requires a new JDBC driver to connect to, the service most likely needs to be redeployed. In a container environment however, you can do this with a rolling zero-downtime upgrade.
  • Custom code is your own responsibility
    The service is quickly put together custom code which has not proven itself for production use. I can only say: 'trust me, it works.. (probably) ;)'
    • There has not been extensive testing. I didn't take the effort of mocking an Oracle database (JDBC, Oracle database with custom objects, procedures) in a test. Sorry about that.
    • Documentation is limited to this blog post and the comments in the code. 
    • There is no software supplier you can go to for support, to report bugs, or to shift the responsibility of having to deal with issues to.
  • Your database developers will create functionality
    You're completely dependent on your database developers to implement service functionality. This can be a benefit or drawback, dependent on the people you have available.
  • This solution is Oracle database specific
    You're going to use PL/SQL to implement services. It is not easily portable to other databases. If you do not have a specific reason to implement business logic in your database, do not go this way; cleanly split data and logic, preferably in different systems.
JSON in the Oracle database

The example service which has been provided offers little functionality. Functionality is of course customer specific. A challenge can be to process a body and formulate a response from the database. A reason for this is that the request and response body might contain JSON. JSON functionality has only recently been introduced in the Oracle database: a few packages/procedures in 12c and a lot more functionality in 18c and 19c. 11g however offers close to nothing. For 11g there are some alternatives to implement JSON. See for example here. Installing APEX is the easiest. This provides the APEX_JSON package which has a lot of functionality. This package is part of the APEX runtime so you do not need to install the entire development environment. An alternative is the open source library PL/JSON here or, if you don't care about breaking license agreements, you can use the following (of course without any warranties or support).

Suggested improvements to the Java service

The sample service is provided as a minimal example. It does not catch errors and create safe error messages from them. This is a security liability since information on the backend systems can arrive at the user of the service. Also, as indicated, the service is not secured; anyone who can call the service can access the database procedure. I've not looked at tuning the connection pool yet. Of course you should pay attention to the PROCESSES, SESSIONS and OPEN_CURSORS settings of the database, among others, especially if the service receives lots of calls and has a lot of instances. I've not looked at behavior at high concurrency. The service could be re-implemented using for example Spring WebFlux and reactive JDBC drivers to make a single instance more scalable. Of course you can consider implementing a service result cache, preferably using an external cache (to share state over service instances).

Apache Camel and Spring Boot: Calling multiple services asynchronously and merging results

Sometimes you have multiple services you want to call at the same time and merge their results when they're all in (or after a timeout). In Enterprise Integration Patterns (EIP) this is a Splitter followed by an Aggregator. I wanted to try and implement this in Spring Boot using Apache Camel, so I did. Since this is my first quick try at Apache Camel, I might not have followed all best practices. I used sample code from Baeldung's blog and combined it with this sample of sending async requests using Futures. You can browse my code here.


Run the sample

First clone https://github.com/MaartenSmeets/camel-samples.git

Next
mvn clean package
java -jar .\target\simplerouter-1.0-SNAPSHOT.jar

What does it do?

I can fire off a request to an api-splitter endpoint with a JSON message. It then calls 2 services in parallel (using separate threads) and merges their results. The combined result is returned.


It is pretty fast; response times on my laptop are around 10 ms.

How does it work?

Two separate services are hosted to accept requests: the api-splitter and the api itself. Next to those there are the api-docs, which are in Swagger v2 format, so you can easily import them into a tool like Postman.


Camel uses Components. 'http' is such a Component and so is 'direct'. These components have their own specific configuration. There is a REST DSL available to expose endpoints and indicate which requests are accepted and where they should go. The DSL can indicate to which component a request should go. 'direct' components can be called from within the same CamelContext; this is where most of the 'heavy lifting' happens in my example. The from and to syntax for forwarding requests is pretty straightforward and the RouteBuilder is easy to use.
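
As a sketch, a route definition combining the REST DSL with a 'direct' endpoint could look something like the following. The endpoint names and URLs here are assumptions and differ from the actual sample code.

import org.apache.camel.builder.RouteBuilder;
import org.springframework.stereotype.Component;

@Component
public class ApiSplitterRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Expose the api-splitter endpoint via the REST DSL and hand requests to a 'direct' route
        rest("/api-splitter")
            .post()
            .to("direct:splitter");

        // The 'direct' route is where the heavy lifting happens: in the actual sample the
        // downstream api is called twice asynchronously (using Futures) and the results are merged
        from("direct:splitter")
            .to("http://localhost:8080/api?bridgeEndpoint=true")
            .log("Received: ${body}");
    }
}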

There is an object available to map the JSON request to in this example. You're not required to map the JSON to a Java object, but for processing inside your Java code this can come in handy. The api-splitter calls a direct:splitter which creates two Futures to do the async calls to the local api (which maps the JSON to a Java object and does some processing). The result, when received, is then parsed to JSON and the results from both services are merged in a single array. Below is a small picture of how this looks.


Finally

A nice first experience with Apache Camel. I was having some challenges with TypeConverters and getting the body in the correct shape/type but beyond that the experience was quite good. It is relatively easy to use (in this small scenario), integrates well with Spring Boot and my first impression is that it is quite flexible. Also the default logging provides useful information on the service call stack.

Of course what I didn't implement was a message agnostic routing mechanism, and I haven't checked what happens when one of the services doesn't respond; if you want to provide a timeout you can do so in the HTTP Component. The code around the Futures will require some nice exception handling though, in order to return a nice message if one of the calls fails and the other doesn't.

Microservice framework startup time on different JVMs

When developing microservices, a fast startup time is useful. It can for example reduce the amount of time a rolling upgrade of instances takes and reduce build time thus shortening development cycles. When running your code using a 'serverless' framework such as for example Knative or FnProject, scaling and getting the first instance ready is faster.

When you want to reduce startup time, an obvious thing to look at is ahead of time (AOT) compilation such as provided as an early adopter plugin as part of GraalVM. Several frameworks already support this out of the box such as Helidon SE, Quarkus and Micronaut. Spring will probably follow with version 5.3 Q2 2020. AOT code, although it is fast to startup, still shows differences per framework. Which framework produces the native executable which is fastest to start?

If you need specific libraries which cannot be natively compiled (not even when using the Tracing Agent), using Java the old-fashioned JIT way is also an option. You will not achieve start-up times near AOT start-up times but by choosing the right framework and JVM, it can still be acceptable.

In this blog post I'll provide some measures which I did on start-up times of minimal implementations of several frameworks and an implementation with only Java SE. I've looked at both JIT and AOT (wherever this was possible) and ran the code on different JVMs.

Disclaimer

These measures have been conducted on specific hardware (my laptop) using specific test scripts on specific though minimal implementations (but comparable). This is of course not the same as a full blown application running in production on specialized hardware. Use these measures as an inspiration to get a general idea about what differences between startup time might be. If you want to know for sure if these differences are similar for you, conduct your own tests which are representative for your situation on your hardware and see for yourself.

Setup

At the end of this blog post you can find a list of framework versions which have been used for my tests. The framework implementations which I've used can be found here.

Measuring startup time

In order to determine the startup time, I looked at the text line in the logging where the framework indicated it was ready.
  • Helidon SE
    WEB server is up!
  • Micronaut
    Startup completed in
  • Microprofile (Open Liberty)
    server is ready to run a smarter planet
  • Spring Boot and related
    JVM running for
  • Vert.x
    Succeeded in deploying verticle
  • Akka
    Server online at
  • Quarkus
    started in
I wanted to measure the time between the java command to run the JAR file and the first occurrence of the above lines. I found the magic on how to do this here. Based on this I could execute the following to get the wanted behavior.

expect -c "spawn JAVACOMMAND; expect \"STRING_TO_LOOK_FOR\" { close }"> /dev/null 2>&1

Next I needed to time that command. In order to do that, I did the following:

ts=$(date +%s%N)
expect ...
echo ADDITIONAL_INFO_ABOUT_MEASURE,$((($(date +%s%N) - $ts)/1000000)) >> output.txt

I did this instead of using the time command because of the higher accuracy and because of the way I piped the expect output to /dev/null.

Implementing a timeout

I noticed sometimes the expect command left my process running. I did not dive into the specifics as to why this happened, but it caused subsequent tests to fail since the port was already claimed. I installed the 'timelimit' tool and specified both a WARN and KILL signal timeout (timelimit -t30 -T30). After that I did a 'killall -9 java' just to be sure. The tests ran a long time and during that time I couldn't use my laptop for other things (it would have disturbed the tests). Having to redo a run can be frustrating and is time consuming. Thus I want to be sure that after a run the java process is gone.

JVM arguments

I used the following JVM arguments:

java8222cmd="/jvms/java-8-openjdk-amd64/bin/java -Xmx1024m -Xms1024m -XX:+UseG1GC -XX:+UseStringDeduplication -jar"
java1104cmd="/jvms/java-11-openjdk-amd64/bin/java -Xmx1024m -Xms1024m -XX:+UseG1GC -XX:+UseStringDeduplication -jar"
javaopenj9222="/jvms/jdk8u222-b10/bin/java -Xmx1024m -Xms1024m -Xshareclasses:name=Cache1 -jar"
javaoracle8221="/jvms/jdk1.8.0_221/bin/java -Xmx1024m -Xms1024m -XX:+UseG1GC -XX:+UseStringDeduplication -jar"

The script to execute the test and collect data

I created the following script to execute my test and collect the data. The Microprofile fat JAR generated a large temp directory on each run which was not cleaned up after exiting. This quickly filled my HD. I needed to clean it 'manually' in my script after a run.

Results

The raw measures can be found here. The script used to process the measures can be found here.

JIT


The results of the JIT compiled code can be seen in the image above. 
  • Of the JVMs, OpenJ9 is the fastest to start for every framework. Oracle JDK 8u221 (8u222 was not available yet at the time of writing) and OpenJDK 8u222 show almost no difference.
  • Of the frameworks, Vert.x, Helidon and Quarkus are the fastest. 
  • Java 11 is slightly slower than Java 8 (in case of OpenJDK). Since this could be caused by different default garbage collection algorithms, I forced them both to G1GC. This has been previously confirmed here. Those results cannot be compared to this blog post one-on-one since the tests in this blog post have been executed (after 'priming') on JVMs running directly on Linux (and on different JVM versions). In the other test they were running in Docker containers. In a Docker container, more needs to be done to bring up an application, such as loading libraries which are present in the container, while when running a JVM directly on an OS, shared libraries are usually already loaded, especially after priming.
  • I also have measurements of Azul Zing. However, although I tried using the ReadyNow! and Compile Stashing features to reduce startup time, I did not manage to get the startup time even close to the other JVMs. Since my estimate is I must have done something wrong, I have not published these results here.
AOT


JIT compiled code startup time does not appear to correspond to AOT code start-up time in ranking. Micronaut does better than Quarkus in the AOT area. Do notice the scale on the axis. AOT code is a lot faster to startup compared to JIT code.

Finally

File sizes

See the below graph for the size of the fat JAR files which were tested. The difference in file size is not sufficient to explain the differences in startup time. Akka for example is quite fast to start up but its file size is relatively large. The Open Liberty fat JAR is huge compared to the others, but its start-up time is much less than would be expected based on that size. The no-framework JAR is not shown but it was around 4Kb.


Servlet engines

The frameworks use different servlet engines to provide REST functionality. The servlet engine alone however says little about the start-up time as you can see below. Quarkus, one of the frameworks which is quick to start-up, uses Reactor Netty, but so do Spring WebFlux and Spring Fu which are not fast to start at all.


Other reasons?

An obvious reason Spring might be slow to start can be the way it does its classpath scanning during startup. This can cause it to be slower than it could be otherwise. Micronaut for example processes annotations at compile time, taking away this work during startup and making it faster to start.

It could be that a framework reports itself to be ready while the things it hosts might not be fully loaded. Can you trust a framework which says it is ready to accept requests? Maybe some frameworks only load specific classes at runtime when a service is called and others preload everything before reporting ready. This can give a skewed measure of the startup time of a framework. What I could have done is fire off the start command in the background, then fire off requests to the service and record the first time a request is successfully handled as the start-up time. I however didn't. This might be something for the future.
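
A sketch of what such a measurement could look like is shown below. This is just an illustration assuming a /hello endpoint; it is not part of the test scripts.

import java.net.HttpURLConnection;
import java.net.URL;

public class TimeToFirstResponse {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/hello");
        long start = System.nanoTime();
        // Keep polling until the service answers successfully; that moment is the 'real' startup time
        while (true) {
            try {
                HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                connection.setConnectTimeout(100);
                connection.setReadTimeout(100);
                if (connection.getResponseCode() == 200) {
                    break;
                }
            } catch (java.io.IOException e) {
                // Service is not up yet; try again shortly
            }
            Thread.sleep(5);
        }
        System.out.println("First successful response after " + ((System.nanoTime() - start) / 1_000_000) + " ms");
    }
}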

Calling an Oracle DB stored procedure from Spring Boot using Apache Camel

There are different ways to create data services. The choice for a specific technology to use, depends on several factors inside the organisation which wishes to realize these services. In this blog post I'll provide a minimal sample on how you can use Spring Boot with Apache Camel to call an Oracle database procedure which returns the result of an SQL query as an XML. You can browse the code here.


Database

How to get an Oracle Database?

Oracle provides many options for obtaining an Oracle database. You can think of using the Oracle Container Registry (here) or use an XE installation (here). I decided to build my own Docker image this time. This provides a nice and quick way to create and remove databases for development purposes. Oracle provides prepared scripts and Dockerfiles for many products including the database, to get up and running quickly.
  • git clone https://github.com/oracle/docker-images.git
  • cd docker-images/OracleDatabase/SingleInstance/dockerfiles
  • Download the file LINUX.X64_193000_db_home.zip from here and place it in the 19.3.0 folder
  • Build your Docker image: ./buildDockerImage.sh -e -v 19.3.0
  • Create a local folder, for example /home/maarten/dbtmp19c, and make sure anyone can read, write and execute to/from/in that folder. The user from the Docker container has a specific userid and by allowing anyone to access it, you avoid problems. This is of course not a secure solution for production environments! I don't think you should run an Oracle Database in a Docker container for other than development purposes. Consider licensing and patching requirements.
  • Create and run your database. The first time it takes a while to install everything. The next time you start it is up quickly.
    docker run --name oracle19c -p 1522:1521 -p 5500:5500 -e ORACLE_SID=sid -e ORACLE_PDB=pdb -e ORACLE_PWD=Welcome01 -v /home/maarten/dbtmp19c:/opt/oracle/oradata oracle/database:19.3.0-ee
  • If you want to get rid of the database instance
    (don't forget the git repo though)
    docker stop oracle19c
    docker rm oracle19c
    docker rmi oracle/database:19.3.0-ee
    rm -rf /home/maarten/dbtmp19c
    Annnnd it's gone!

Create a user and procedure to call

Now you can access the database with the following credentials (from your host). For example by using SQLDeveloper.
  • Hostname: localhost 
  • Port: 1522 
  • Service: sid 
  • User: system 
  • Password: Welcome01
You can create a testuser with

alter session set container = pdb;

-- USER SQL
CREATE USER testuser IDENTIFIED BY Welcome01
DEFAULT TABLESPACE "USERS"
TEMPORARY TABLESPACE "TEMP";

-- ROLES
GRANT "DBA" TO testuser ;
GRANT "CONNECT" TO testuser;
GRANT "RESOURCE" TO testuser;

Login to the testuser user (notice the service is different)
  • Hostname: localhost 
  • Port: 1522 
  • Service: pdb 
  • User: testuser 
  • Password: Welcome01
Create the following procedure. It returns information of the tables owned by a specified user in XML format.

CREATE OR REPLACE PROCEDURE GET_TABLES 
(
  p_username IN VARCHAR2,RESULT_CLOB OUT CLOB 
) AS
p_query varchar2(1000);
BEGIN
  p_query := 'select * from all_tables where owner='''||p_username||'''';
  select dbms_xmlgen.getxml(p_query) into RESULT_CLOB from dual;
END GET_TABLES;

This is an easy example on how to convert a SELECT statement result to XML in a generic way. If you need to create a specific XML, you can use XMLTRANSFORM or create your XML 'manually' with functions like XMLFOREST, XMLAGG, XMLELEMENT, etc.

Data service

In order to create a data service, you need an Oracle JDBC driver to access the database. Luckily, recently, Oracle has put its JDBC driver in Maven central for ease of use. Thank you Kuassi and the other people who have helped making this possible!

        <dependency>
            <groupId>com.oracle.ojdbc</groupId>
            <artifactId>ojdbc8</artifactId>
            <version>19.3.0.0</version>
        </dependency>

The Spring Boot properties which are required to access the database:
  • spring.datasource.url=jdbc:oracle:thin:@localhost:1522/pdb
  • spring.datasource.driver-class-name=oracle.jdbc.OracleDriver
  • spring.datasource.username=testuser
  • spring.datasource.password=Welcome01
The part of the code which actually does the call, prepares the request and returns the result is shown below. 


The template for the call is the following:

sql-stored:get_tables('p_username' VARCHAR ${headers.username},OUT CLOB result_clob)?dataSource=dataSource

The datasource is provided by Spring Boot / Spring JDBC / Hikari CP / Oracle JDBC driver. You get that one for free if you include the relevant dependencies and provide configuration. The format of the template is described here. The example illustrates how to get parameters in and how to get them out again. It also shows how to convert a Clob to text and how to set the body to a specific return variable.
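
As an illustration, a route using this template could look something like the following sketch. The endpoint name and the way the OUT parameter is picked from the body are assumptions, not the exact sample code.

import org.apache.camel.builder.RouteBuilder;
import org.springframework.stereotype.Component;

@Component
public class GetTablesRoute extends RouteBuilder {

    @Override
    public void configure() {
        from("direct:getTables")
            // The 'username' header is bound to the IN parameter of the stored procedure
            .to("sql-stored:get_tables('p_username' VARCHAR ${headers.username},OUT CLOB result_clob)?dataSource=dataSource")
            // The OUT parameters come back in the body as a Map; pick the CLOB under the 'result_clob' key.
            // The actual sample additionally converts the CLOB to a String before returning it.
            .setBody(simple("${body[result_clob]}"));
    }
}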

Please mind that if the query does not return any results, the OUT variable is null; getting anything from that object will then cause a NullPointerException. Do not use this code as-is! It is only a minimal example.

You can look at the complete example here and build it with maven clean package. The resulting JAR can be run with java -jar camel-springboot-oracle-dataservice-0.0.1-SNAPSHOT.jar. 

Calling the service

The REST service is created with the following code:


It responds to a GET call at http://localhost:8081/camel/api/in
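
A minimal sketch of what such a REST DSL definition could look like is shown below. The /camel prefix and port come from the Spring Boot / camel-servlet configuration, and the route name is an assumption; the actual code is in the repository.

import org.apache.camel.builder.RouteBuilder;
import org.springframework.stereotype.Component;

@Component
public class RestApiRoute extends RouteBuilder {

    @Override
    public void configure() {
        // The servlet component hosts the endpoint on the embedded web server
        restConfiguration()
            .component("servlet");

        // GET /camel/api/in triggers the route which calls the stored procedure
        rest("/api")
            .get("/in")
            .to("direct:getTables");
    }
}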


Finally

Benefits

Creating data services using Spring Boot with Apache Camel has several benefits:
  • Spring and Spring Boot are popular in the Java world. Spring is a very extensive framework providing a lot of functionality ranging from security, monitoring, to implementing REST services and many other things. Spring Boot makes it easy to use Spring.
  • There are many components available for Apache Camel which allow integration with diverse systems. If the component you need is not there, or you need specific functionality which is not provided, you can benefit from Apache Camel being open source.
  • Spring, Spring Boot and Apache Camel are solid choices which have been worked at for many years by many people and are proven for production use. They both have large communities and many users. A lot of documentation and help is available. You won't get stuck easily.
There is a good chance that when implementing these two together, you won't need much more for your integration needs. In addition, individual services scale a lot better and usually have a lighter footprint than for example an integration product running on an application server platform.

Considerations

There are some drawbacks to using these products to realize data services though:
  • Spring / Spring Boot do not (yet) support GraalVM's native compilation out of the box. When running in a cloud environment where memory usage or start-up time matter, you could save money by for example implementing Quarkus or Micronaut. Spring will support GraalVM out of the box in version 5.3, expected Q2 2020 (see here). Quarkus has several Camel extensions available but not the camel-sql extension, since that is based on spring-jdbc.
  • This example might require specific code per service (depending on your database code). This is custom code you need to maintain and it might have overhead (build jobs, Git repositories, etc). You could consider implementing a dispatcher within the database to reduce the number of required services. See my blog post on this here (consider not using the Oracle object types for simplicity). Then however you would be adhering to the 'thick database paradigm', which might not suit your tastes and might cause a vendor lock-in if you start depending on PL/SQL too much. The dispatcher solution is likely not portable to other databases.
  • For REST services on Oracle databases, implementing Oracle REST Data Services is also a viable and powerful option. Although it can do more, it is most suitable for REST services and only on Oracle databases. If you want to provide SOAP services or are also working with other flavors of databases, you might want to reduce the amount of different technologies used for data services to allow for platform consolidation and make your LCM challenges not harder than they already might be.

Oracle Database: Write arbitrary log messages to the syslog from PL/SQL

Syslog is a standard for message logging, often employed in *NIX environments. It allows separation of the software that generates messages, the system that stores them, and the software that reports and analyzes them. Each message is labeled with a facility code, indicating the software type generating the message, and assigned a severity level.

In *NIX systems syslog messages often end up in /var/log/messages. You can configure these messages to be forwarded to remote syslog daemons. Also, a pattern which is often seen is that the local log files are monitored and processed by an agent.

Oracle database audit information can be sent to the syslog daemon. See for example the audit functionality. If you however want to use a custom format in the syslog, or write an entry to the syslog which is not related to an audit action, this functionality will not suffice. How to achieve this without depending on the audit functionality is described in this blog post. PL/SQL calls database hosted Java code, which sends a UDP message to the local syslog daemon. You can find the code here.


Syslog functionality

There are different ways to send data to the syslog.
  • By using the logger command
  • Using TCP
  • Using UDP
You can execute shell commands from the Oracle database by wrapping them in Java or C or by using DBMS_PIPE (see here). When building a command-line however to log an arbitrary message, there is the danger that the message will contain characters which might break your logger command or worse, do dangerous things on your OS as the user running your database. You can first write a file to a local directory from the database and send that using the logger command, but this is a roundabout way. Using UDP and TCP is more secure and probably also performs better (although I haven't tested this).

TCP, in contrast to UDP, works with an acknowledgement of a message. This is done in order to provide the sender some confirmation that the packet has been received. With UDP it is 'fire-and-forget' for the sender and you do not know if the receiver has received the packet. UDP is faster, as you can imagine, since no confirmation is sent.

In this example I will be using UDP to send a message to the local syslog. In order to allow this, rsyslog needs to be installed. 

For Fedora this can be done with:

dnf install rsyslog

Next configure UDP access by uncommenting the below two lines in /etc/rsyslog.conf

$ModLoad imudp
$UDPServerRun 514

If the daemon is not running, start it with:

systemctl start rsyslog

If you want to start it on boot, do:

systemctl enable rsyslog

You might have to configure your firewall to allow access from localhost/127.0.0.1 to localhost/127.0.0.1 UDP port 514

Java in the Oracle Database

The Oracle database has out of the box packages to do TCP (UTL_TCP). However, there is no such functionality for UDP available. In order to provide this, I've written a small Java class. It can be installed using just PL/SQL code. I've tried this on Oracle DB 19c (using the following Vagrant box) but it is likely to work on older versions.

Create a testuser

First create a testuser and grant it the required permissions:

create user testuser identified by Welcome01;
/
grant connect,dba,resource to testuser;
/
begin
dbms_java.grant_permission( 'TESTUSER', 'SYS:java.net.SocketPermission', 'localhost:0', 'listen,resolve' );
dbms_java.grant_permission( 'TESTUSER', 'SYS:java.net.SocketPermission', '127.0.0.1:514', 'connect,resolve' );
end;
/

Register the Java code

Now create the Java code

SET DEFINE OFF
create or replace and compile
 java source named "SysLogger"
 as
import java.io.*;
import java.net.*;

public class Syslog {

// Priorities.
public static final int LOG_EMERG = 0; // system is unusable
public static final int LOG_ALERT = 1; // action must be taken immediately
public static final int LOG_CRIT = 2; // critical conditions
public static final int LOG_ERR = 3; // error conditions
public static final int LOG_WARNING = 4; // warning conditions
public static final int LOG_NOTICE = 5; // normal but significant condition
public static final int LOG_INFO = 6; // informational
public static final int LOG_DEBUG = 7; // debug-level messages
public static final int LOG_PRIMASK = 0x0007; // mask to extract priority

// Facilities.
public static final int LOG_KERN = (0 << 3); // kernel messages
public static final int LOG_USER = (1 << 3); // random user-level messages
public static final int LOG_MAIL = (2 << 3); // mail system
public static final int LOG_DAEMON = (3 << 3); // system daemons
public static final int LOG_AUTH = (4 << 3); // security/authorization
public static final int LOG_SYSLOG = (5 << 3); // internal syslogd use
public static final int LOG_LPR = (6 << 3); // line printer subsystem
public static final int LOG_NEWS = (7 << 3); // network news subsystem
public static final int LOG_UUCP = (8 << 3); // UUCP subsystem
public static final int LOG_CRON = (15 << 3); // clock daemon
// Other codes through 15 reserved for system use.
public static final int LOG_LOCAL0 = (16 << 3); // reserved for local use
public static final int LOG_LOCAL1 = (17 << 3); // reserved for local use
public static final int LOG_LOCAL2 = (18 << 3); // reserved for local use
public static final int LOG_LOCAL3 = (19 << 3); // reserved for local use
public static final int LOG_LOCAL4 = (20 << 3); // reserved for local use
public static final int LOG_LOCAL5 = (21 << 3); // reserved for local use
public static final int LOG_LOCAL6 = (22 << 3); // reserved for local use
public static final int LOG_LOCAL7 = (23 << 3); // reserved for local use

public static final int LOG_FACMASK = 0x03F8; // mask to extract facility

// Option flags.
public static final int LOG_PID = 0x01; // log the pid with each message
public static final int LOG_CONS = 0x02; // log on the console if errors
public static final int LOG_NDELAY = 0x08; // don't delay open
public static final int LOG_NOWAIT = 0x10; // don't wait for console forks

private static final int DEFAULT_PORT = 514;

/// Use this method to log your syslog messages. The facility and
// level are the same as their Unix counterparts, and the Syslog
// class provides constants for these fields. The msg is what is
// actually logged.
// @exception SyslogException if there was a problem
@SuppressWarnings("deprecation")
public static String syslog(String hostname, Integer port, String ident, Integer facility, Integer priority, String msg) {
try {
InetAddress address;
if (hostname == null) {
address = InetAddress.getLocalHost();
} else {
address = InetAddress.getByName(hostname);
}

if (port == null) {
port = new Integer(DEFAULT_PORT);
}
if (facility == null) {
facility = 1; // means user-level messages
}
if (ident == null)
ident = new String(Thread.currentThread().getName());

int pricode;
int length;
int idx;
byte[] data;
String strObj;

pricode = MakePriorityCode(facility, priority);
Integer priObj = new Integer(pricode);

length = 4 + ident.length() + msg.length() + 1;
length += (pricode > 99) ? 3 : ((pricode > 9) ? 2 : 1);

data = new byte[length];

idx = 0;
data[idx++] = '<';

strObj = Integer.toString(priObj.intValue());
strObj.getBytes(0, strObj.length(), data, idx);
idx += strObj.length();

data[idx++] = '>';

ident.getBytes(0, ident.length(), data, idx);
idx += ident.length();

data[idx++] = ':';
data[idx++] = ' ';

msg.getBytes(0, msg.length(), data, idx);
idx += msg.length();

data[idx] = 0;

DatagramPacket packet = new DatagramPacket(data, length, address, port);
DatagramSocket socket = new DatagramSocket();
socket.send(packet);
socket.close();
} catch (IOException e) {
return "error sending message: '" + e.getMessage() + "'";
}
return "";
}

private static int MakePriorityCode(int facility, int priority) {
return ((facility & LOG_FACMASK) | priority);
}
}
/

Make the Java code available from PL/SQL

create or replace
procedure SYSLOGGER(p_hostname in varchar2, p_port in number, p_ident in varchar2, p_facility in number, p_priority in number, p_msg in varchar2)
as
language java
name 'Syslog.syslog(java.lang.String,java.lang.Integer,java.lang.String,java.lang.Integer,java.lang.Integer,java.lang.String)';

Test the Java code

DECLARE
  P_HOSTNAME VARCHAR2(200);
  P_PORT NUMBER;
  P_IDENT VARCHAR2(200);
  P_FACILITY NUMBER;
  P_PRIORITY NUMBER;
  P_MSG VARCHAR2(200);
BEGIN
  P_HOSTNAME := NULL;
  P_PORT := NULL;
  P_IDENT := 'Syslogtest';
  P_FACILITY := NULL;
  P_PRIORITY := 1;
  P_MSG := 'Hi there';

  SYSLOGGER(
    P_HOSTNAME => P_HOSTNAME,
    P_PORT => P_PORT,
    P_IDENT => P_IDENT,
    P_FACILITY => P_FACILITY,
    P_PRIORITY => P_PRIORITY,
    P_MSG => P_MSG
  );
--rollback; 
END;

Now check your local syslog (often /var/log/messages) for entries like

Oct 26 14:31:22 oracle-19c-vagrant Syslogtest: Hi there

Considerations

TCP instead of UDP

This example uses UDP. UDP does not have guaranteed delivery. You can just as well implement this with TCP. Using TCP you do not require custom Java code in the database but you do require Access Control List (ACL) configuration and have to write PL/SQL (using UTL_TCP) to do the calls to rsyslog. An example on how this can be implemented, can be found here.

Custom audit logging to syslog

Using the Oracle feature Fine Grained Auditing (FGA), you can configure a handler procedure which is called when a policy is triggered. Within this procedure you can call the PL/SQL which does syslog logging. The PL/SQL procedure has a SYS_CONTEXT available which contains information like the user, proxy user and even the SQL query and bind variables which triggered the policy (when using DB+EXTENDED logging).

If you want to store what a certain user has seen, you can use Flashback Data Archive (FDA) in addition to FGA. This feature is available for free in Oracle DB 12c and higher. In older versions this depends on the Advanced Compression option. If you combine the FDA and the FGA, you can execute the original query on the data at a certain point in time (on historic data). You can even store the SYS_CONTEXT in the FDA which allows for a more accurate reproduction of what happened in the past. When using these options, mind the performance impact and create specific tablespaces for the FDA and FGA data.

Microservices: What do you need to tweak to optimize throughput and response times

Performance tuning usually goes something like this:
  • a performance problem occurs
  • an experienced person knows what is probably the cause and suggests a specific change
  • baseline performance is determined, the change is applied, and performance is measured again
  • if the performance has improved compared to the baseline, keep the change, else revert the change
  • if the performance is now considered sufficient, you're done. If not, return to the experienced person to ask what to change next and repeat the above steps
This entire process can be expensive. Especially in complex environments where the suggestion of an experienced person is usually a (hopefully well informed) guess. This probably will require quite some iterations for the performance to be sufficient. If you can make these guesses more accurate by augmenting this informed judgement, you can potentially tune more efficiently.

In this blog post I'll try to do just that. Of course a major disclaimer applies here since every application, environment, hardware, etc is different. The definition of performance and how to measure it is also something which you can have different opinions on. In short what I've done is look at many different variables and measuring response times and throughput of minimal implementations of microservices for every combination of those variables. I fed all that data to a machine learning model and asked the model which variables it used to do predictions of performance with. I also presented on this topic at UKOUG Techfest 2019 in Brighton, UK. You can view the presentation here.

Method

Variables

I varied several things
  • wrote similar implementations in 10 frameworks (see code here)
  • varied the number of assigned cores
  • varied the assigned memory
  • varied the Java version (8,11,12,13)
  • varied the JVM supplier (OpenJ9, Zing, OpenJDK, OracleJDK)
  • varied the garbage collection algorithm (tried all possible algorithms for every JVM / version)
  • varied the number of concurrent requests


What did I do?

I measured response times and throughput for every possible combination of variables. You can look at the data here.


Next I put all data into a Random Forest Regression model, confirmed it was accurate and asked the model to provide me with feature importances: which feature was most important in the generated model for determining the response time and throughput? These are then the features to start tweaking first. The features with low feature importance are less relevant. Of course, as I already mentioned, the model has been generated based on the data I've provided. I had to make some choices, because even when using tests of 20s each, testing every combination took over a week. How accurate will the model be when looking at situations outside my test scenario? I cannot tell; you have to check for yourself.

Which tools did I use?

Of course I could write a book about this study: describe the details of the method used, explain all the different microservice frameworks tested, elaborate on the test tooling used, etc. I won't. You can check the scripts yourself here and I already wrote an article about most of the data here (in Dutch though). Some highlights:

- I used Apache Bench for load generation. Apache Bench is not highly regarded by some, but it did the job well enough; when comparing results to for example wrk, there is not much difference (see here)
- I used Python for running the different scenarios. Easier than Bash, which I used before.
- For analyzing and visualization of the data I used Jupyter Notebook.
- I first did some warm-up / priming before starting the actual tests
- I took special care not to use virtualization tools such as VirtualBox or Docker
- I also looked specifically at avoiding competition for resources, even though I measured on the same hardware as where I generated load. Splitting the load generation and the service over different machines would not have worked since the performance differences were sometimes pretty small (sub-millisecond). These differences would be lost when going over a network.

Results

Confirm the model is accurate

In the below plot I've shown predicted values against actual values. The diagonal line indicates perfect accuracy. As you can see, the accuracy of the model is pretty high. Also the R^2 value (coefficient of determination) was around 0.99 for both response times and throughput, which is very nice!


Feature importance

The below graphs show the results for feature importance of the different variables.



However I noticed feature importance becomes less accurate when the number of different classes differs per variable. In order to fix that I also looked at permutation feature importance. Permutation feature importance is determined by calculating the reduction in model accuracy when a specific variable is randomized. Luckily this looked very similar:


Conclusion

As you can see, the feature importance of the used framework/implementation was highest. This indicates the choice of implementation (of course within the scope of my tests) was more important than for example the JVM supplier (Zing, OpenJ9, OpenJDK, OracleJDK) for the response times and throughput. The JVM supplier was more important than the choice for a specific garbage collection algorithm (the garbage collection algorithm did not appear to be that important at all, although when memory became limiting, it did appear to become more important). The Java version did not show much difference.

The least important feature during these tests was the number of assigned cores. Apparently assigning more cores did not improve performance much. Because I found this peculiar, I did some additional analyses on the data and it appeared certain frameworks are better at using more cores or dealing with higher concurrency than others.



You can check the notebook here.

Apache Camel + Spring Boot: Different components to expose HTTP endpoints

Apache Camel is an open source integration framework that allows you to integrate technologically diverse systems using a large library of components. A common use-case is to host HTTP based endpoints. Those of course come in several flavors and there is quite a choice of components to use.

In this blog post I'll take a look at what is available and how they differ with respect to flexibility to define multiple hosts, ports and URLs to host from using a single CamelContext. Depending on your use-case you will probably be using one of these. You can find my sample project here.

Components

REST DSL

The REST DSL is not an actual component but can be considered a wrapper for several components. It has integration with for example the Swagger module to generate documentation. It uses a RestConfiguration which can only occur once in a CamelContext (when using Spring Boot in combination with Apache Camel). The RestConfiguration specifies things like the base URL, the port and the component used as consumer / HTTP server.

Although easy to use, it is also limiting in that only a single RestConfiguration can be used within a CamelContext and that using multiple CamelContexts within a single application is not supported (see here). Thus this will not allow you to host services on different ports from the same Spring Boot / Apache Camel application. Also it is not allowed to define the same base path for different services in two different component routes. You can run on a different port than the base Apache Camel / Spring Boot servlet engine though.

restConfiguration().component("netty-http").host("0.0.0.0").port(8084);
rest("/url1").get().id("test_route1").to("log:dummylog");

restConfiguration().component("netty-http").host("0.0.0.0").port(8085);
rest("/url2").get().id("test_route2").to("log:dummylog");

A small example of using the REST DSL above. Mind that both services will be run on port 8085 since there is only one restConfiguration! The REST DSL is powerful since it wraps several other components. If I want to switch to for example jetty instead of netty-http, I only have to change the component specification in the REST configuration and no other code (if I made sure not to use implementation specific features of the different components). You can also externalize the component specification to make switching your HTTP server as simple as changing a property file.
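
To make that externalization concrete, here is a minimal sketch (not taken from the sample project) of a RouteBuilder that reads the component name from a Spring property; the property name rest.component and the default netty-http are assumptions chosen for this example.

import org.apache.camel.builder.RouteBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class ExternalizedRestRoute extends RouteBuilder {

    // 'rest.component' is a property name chosen for this example (e.g. netty-http, jetty, undertow)
    @Value("${rest.component:netty-http}")
    private String restComponent;

    @Override
    public void configure() {
        restConfiguration().component(restComponent).host("0.0.0.0").port(8084);
        rest("/url1").get().id("test_route1").to("log:dummylog");
    }
}

Switching the HTTP server is then a matter of changing the property value in application.properties, as long as no implementation specific features are used.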

You can find the documentation here and an example on how to use it here.

REST

The REST component is quite powerful. It allows you to easily expose REST endpoints using various other components which implement the org.apache.camel.spi.RestConsumerFactory such as the http, undertow, servlet, netty-http, spark-rest and jetty. Also it has a flexible URI syntax. It integrates tightly with the REST DSL.

If you do not want to use the RestConfiguration class, you can directly configure the component from the CamelContext. The RestComponent however does not have a setPort method. I could not find a quick way to create multiple rest components running on different ports. This is similar to using the REST DSL; see below.

RestComponent sc = getContext().getComponent("rest",RestComponent.class);
sc.setConsumerComponentName("jetty");
from("rest:get:/:url1").id("test_route1").to("log:dummylog");

Mind that you explicitly need to specify the component to use if it is not present in the RestConfiguration and you have multiple components providing the RestConsumerFactory on your classpath. Otherwise you will encounter the following error:

Caused by: java.lang.IllegalArgumentException: Multiple RestConsumerFactory found on classpath. Configure explicit which component to use

HTTP

The camel-http component (in Apache Camel 2.x use camel-http4 instead) allows you to consume HTTP services. It does not allow you to host services. For that the jetty component is recommended.
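
As a quick illustration of this producer-only role, the snippet below (a sketch, to be placed inside a RouteBuilder's configure method) calls an existing service periodically; the timer endpoint and target URL are placeholders.

// camel-http (http4 in Camel 2.x) can only act as a producer: it calls a service, it does not host one
from("timer:poll?period=5000")          // trigger a call every 5 seconds
    .to("http://localhost:8080/people") // placeholder URL of an existing HTTP service
    .to("log:httpresponse");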

Jetty

You can use the Jetty component to create multiple routes with the same or different ports and specify different URLs for the same port in different routes.

The below example will work.
from("jetty:http://0.0.0.0:8083/url1").id("test_route1").to("log:dummylog");
from("jetty:http://0.0.0.0:8083/url2").id("test_route2").to("log:dummylog");

Jetty has the challenge that Jetty libraries are regular inhabitants of most application servers. If you want the classes from your deployment to be used instead of the (often outdated) application server versions, you might need to manipulate dependencies and class loading (create web.xml files, deployment descriptors, etc).

Servlet

The servlet component allows you to host services at different endpoints, however you don't have the option to configure different ports or URLs outside of the path specified in the property camel.component.servlet.mapping.context-path and server-port in application.properties when using the Apache Camel Servlet Spring Boot starter. You cannot easily run on a different port than the base Apache Camel / Spring Boot servlet engine.

//Both OK, however path below camel.component.servlet.mapping.context-path and uses the same port
from("servlet:/url1").id("test_route1").to("log:dummylog");
from("servlet:/url2").id("test_route2").to("log:dummylog");

Netty-http

The Netty-http component facilitates HTTP transport using the Netty component. The Netty component is low-level and socket based while netty-http allows you to easily do HTTP with Netty. Netty-http allows you to use the same port in different routes as long as the NettyServerBootstrapConfiguration is the same. This means the configuration of the route is the same; they use the same parameters in the URI. This creates in my opinion a lot of flexibility.

In the below example you can see that hosting on the same port with different URLs works without challenges.

The relevant code:
from("netty-http:http://0.0.0.0:8083/url1").id("test_route1").to("log:dummylog");
from("netty-http:http://0.0.0.0:8083/url2").id("test_route2").to("log:dummylog");


Undertow

The Undertow component allows you to host HTTP and WebSocket endpoints.

from("undertow:http://0.0.0.0:8083/myapp1").id("test_route1").to("log:dummylog");
from("undertow:http://0.0.0.0:8084/myapp2").id("test_route2").to("log:dummylog");

Using Undertow has the benefit of not conflicting with most application server libraries (such as WebLogic) if you want to deploy your Apache Camel / Spring Boot application to one of those.

Spark REST

The Spark REST component and Spark Framework have nothing to do with Apache Spark!
If you want to interface with Apache Spark, you should use the Spark component. The Spark REST component allows you to use  Spark Framework. This is a micro framework for creating web applications in Kotlin and Java 8. Mind that the support for Java 8 in Apache Camel 3.x is best effort; you should be running Apache Camel 3.x on Java 11!

It uses Jetty and provides an alternative syntax to the REST DSL. You can however use the REST DSL but still use the Spark REST component. In that case you can obtain the raw Spark request by using the getRequest method of org.apache.camel.component.sparkrest.SparkMessage and do Spark specific things with it.
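
A minimal sketch of what that could look like inside a RouteBuilder's configure method is shown below; the userAgent header name is made up for this example.

from("spark-rest://get:url1").id("test_route1")
    .process(exchange -> {
        // Obtain the Camel message as a SparkMessage to get access to the raw Spark request
        org.apache.camel.component.sparkrest.SparkMessage sparkMessage =
            exchange.getIn(org.apache.camel.component.sparkrest.SparkMessage.class);
        spark.Request rawRequest = sparkMessage.getRequest();
        // Do something Spark specific, for example read the user agent
        exchange.getIn().setHeader("userAgent", rawRequest.userAgent());
    })
    .to("log:dummylog");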

Supplying the port option in the URI did not work:

Caused by: org.apache.camel.ResolveEndpointFailedException: Failed to resolve endpoint: spark-rest://get:url1?port=8083 due to: There are 1 parameters that couldn't be set on the endpoint. Check the uri if the parameters are spelt correctly and that they are properties of the endpoint. Unknown parameters=[{port=8083}]

I discovered the port is a property of the Component instance and not the Endpoint. The Component instance can be obtained from the CamelContext. However, since there is a single Component instance of spark-rest in the CamelContext, you can only supply a single port. Using the RestConfiguration for this of course does not help either since there is also only one instance of that class available in the CamelContext. A workaround for this is adding more components to the CamelContext with different names and a different configuration.

SparkComponent sc = getContext().getComponent("spark-rest",SparkComponent.class);
sc.setPort(8083);

SparkConfiguration sc2config = sc.getSparkConfiguration();
SparkComponent sc2 = new SparkComponent();
sc2.setSparkConfiguration(sc2config);
sc2.setPort(8084);
getContext().addComponent("spark-rest2",sc2);

from("spark-rest://get:url1").id("test_route1").to("log:dummylog");
from("spark-rest2://get:url2").id("test_route2").to("log:dummylog");

This way you can run on multiple ports/URLs using a distinct Component instance per port. This code is however Spark specific and if you want to switch to for example plain Jetty, you have to change it. Also there is the application server challenge since it uses Jetty.

Finally

Summary

Please mind there are many other considerations for choosing a specific component. You can think of specific features like WebSocket support, performance, security related features, blocking (jetty, spark, servlet) or non-blocking (netty, undertow) nature, resource consumption, etc.

The below table summarizes my findings on components which can be used to run as Apache Camel HTTP server within Spring Boot. Red means no (mind that for 'expected issues' a 'no' is a good thing ;), green means yes, yellow means 'it depends' or 'not easily'. Undertow comes out quite nicely as being a flexible component!


Apache Camel 2.x vs Apache Camel 3.x

The differences between Camel 2.x and Camel 3.x are well described in the migration guide. Two things I directly encountered are listed below.

groupId of artifacts

In Apache Camel 2.x most of the Spring Boot related artifacts such as the starters are present under groupId org.apache.camel. For Apache Camel 3.x the documentation specifies org.apache.camel.springboot as the groupId. Here only artifacts for the 3.x version are present.

netty4-http and camel-http4

Apache Camel 2.x has netty-http, which is deprecated, and netty4-http, which is the newer version of the component and the one to use. For the camel-http component it is similar: don't use it, but use camel-http4. In Apache Camel 3.x however there is no '4' version of either component, just netty-http and http. Thus for Apache Camel 2.x you should use netty4-http and camel-http4 while for Apache Camel 3.x you should use netty-http and http. This is also something to mind when migrating.
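
A small sketch of how the endpoint URIs change between the versions (assuming the corresponding components are on the classpath):

// Apache Camel 2.x: use the '4' variants (camel-netty4-http, camel-http4)
from("netty4-http:http://0.0.0.0:8083/url1").to("http4://localhost:8080/people");

// Apache Camel 3.x: the '4' suffix is gone (camel-netty-http, camel-http)
from("netty-http:http://0.0.0.0:8083/url1").to("http://localhost:8080/people");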

pgAdmin in Docker: Provisioning connections and passwords

pgAdmin is a popular open source and feature rich administration and development platform for PostgreSQL. When provisioning Postgres database environments using containers, it is not unusual to also provision a pgAdmin container.

The pgAdmin image provided on Docker Hub does not contain any server connection details. When your pgAdmin container changes regularly (think about changes to database connection details and keeping pgAdmin up to date), you might not want to enter the connections and passwords manually every time. This is especially true if you use a single pgAdmin instance to connect to many databases. A manual step also prevents a fully automated build process for the pgAdmin container.

You can export/import connection information, but you cannot export passwords. It is a bother, especially in development environments where the security aspect is less important, to look up passwords every time you need them. How to fix this and make your life a little bit easier?

In this blog I'll show how to create a simple script to automate creating connections and supply password information so the pgAdmin instance is ready for use when you login to the console for the first time! This consists of provisioning the connections and provisioning the password files. You can find the files here.

Getting started

In order to test creating connections, I need both a Postgres database and a pgAdmin instance. Docker-compose files are quite suitable for this. I used the following docker-compose.yml:

version: '3.5'

services:
  postgres:
    container_name: postgres_container
    image: postgres:12.1
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-postgres}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-Welcome01}
      PGDATA: /data/postgres
    volumes:
       - postgres:/data/postgres
    ports:
      - "5432:5432"
    networks:
      - postgres
    restart: unless-stopped
  
  pgadmin:
    container_name: pgadmin_container
    image: dpage/pgadmin4:4.16
    environment:
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_DEFAULT_EMAIL:-pgadmin4@pgadmin.org}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_DEFAULT_PASSWORD:-admin}
    volumes:
       - pgadmin:/root/.pgadmin
    ports:
      - "${PGADMIN_PORT:-5050}:80"
    networks:
      - postgres
    restart: unless-stopped

networks:
  postgres:
    driver: bridge

volumes:
    postgres:
    pgadmin:

After you've done docker-compose up, you'll see two containers:

[maarten@localhost postgres]$ docker ps
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                           NAMES
331f18838528        dpage/pgadmin4:4.16   "/entrypoint.sh"         17 hours ago        Up 54 minutes       443/tcp, 0.0.0.0:5050->80/tcp   pgadmin_container
7e7d3f3a51d2        postgres:12.1         "docker-entrypoint.s…"   17 hours ago        Up 54 minutes       0.0.0.0:5432->5432/tcp          postgres_container

Provisioning connections

In order to provision connections, you can use a JSON file and load that. The JSON file I've used is listed below:

servers.json

{
    "Servers": {
        "1": {
            "Name": "pgadmin4@pgadmin.org",
            "Group": "Servers",
            "Host": "postgres",
            "Port": 5432,
            "MaintenanceDB": "postgres",
            "Username": "postgres",
            "SSLMode": "prefer",
            "PassFile": "/pgpassfile"
        }
    }
}

In this case I'm creating a single connection to a database with hostname postgres. This is equal to the service name in the docker-compose.yml file above. Also important to note is that I'm referring to a password file: PassFile /pgpassfile.

If you already have a running container with connections, you can export them as follows:

docker exec -it pgadmin_container python /pgadmin4/setup.py --dump-servers /tmp/servers.json
docker cp pgadmin_container:/tmp/servers.json .

If you have a running container and want to import connections, you can do the following:

docker cp servers.json pgadmin_container:/tmp/servers.json
docker exec -it pgadmin_container python /pgadmin4/setup.py --load-servers /tmp/servers.json

You can import multiple server JSON files after each other (importing a servers.json file adds the servers) and even add servers with the same name. This is not recommended though.

Provisioning passwords

You now have connections. Passwords however cannot be exported in such a way (see here, 'Password fields cannot be imported or exported'). You have specified a password file however when creating the connection (the PassFile parameter). So how does this file need to look and where does it need to go inside the container?

In order to find out I created a folder using the pgAdmin web interface and searched for the folder name.


[maarten@localhost postgres]$ docker exec -it pgadmin_container find / -name 'New Folder'
/var/lib/pgadmin/storage/pgadmin4_pgadmin.org/New Folder

This location (/var/lib/pgadmin/storage/pgadmin4_pgadmin.org) is a per user location and not per connection. Per connection, a file in that directory can be specified, with '/' in the PassFile setting referring to /var/lib/pgadmin/storage/pgadmin4_pgadmin.org/.

The format of the file is specified here. In my example, the postgres database and user are postgres and the password is Welcome01. Thus my password file (pgpassfile) is:

postgres:5432:postgres:postgres:Welcome01

When an instance is created from the image for the first time, the directory /var/lib/pgadmin/storage/pgadmin4_pgadmin.org does not exist yet. It should be created with permissions 700, owned by user and group pgadmin, if we want to provision it beforehand.

docker exec -u pgadmin:pgadmin -it pgadmin_container mkdir -m 700 /var/lib/pgadmin/storage/pgadmin4_pgadmin.org

When using docker cp to copy files into a container, the root user is used. It is not possible to specify a user/group to use within the container with this command. The directory created above is not writable by root. Thus in order to place the password file in the correct location we first copy it to /tmp, set the correct user/group and then from within the container move it to the correct location. 

docker cp pgpassfile pgadmin_container:/tmp/pgpassfile
docker exec -it -u root pgadmin_container chown pgadmin:pgadmin /tmp/pgpassfile
docker exec -it pgadmin_container mv /tmp/pgpassfile /var/lib/pgadmin/storage/pgadmin4_pgadmin.org

The file can only be used when it has permissions 600 (only readable and writable by pgadmin), so after we have moved the file, we need to set the correct permissions.

docker exec -it pgadmin_container chmod 600 /var/lib/pgadmin/storage/pgadmin4_pgadmin.org/pgpassfile

It would have been nice if the docker cp command had supplied functionality to set the target user, group and permissions in order to avoid such a workaround.

Running the example

In order to run this complete example and see it in action you can do the following:

git clone https://github.com/MaartenSmeets/db_perftest.git
cd db_perftest/pg_provision
bash ./create.sh

Now you can go to localhost:5050 and login with user pgadmin4@pgadmin.org and password admin. When you open the Servers entry on the left and the connection you have created, you don't need to enter a password.


If you want to remove the environment to start over again, you can do:

bash ./remove.sh

Finally

Podman and Docker

Podman is an alternative to docker and the default in recent versions of Fedora and Red Hat. Podman supports rootless containers without the need for a socket connection/daemon and uses systemd instead.

I first tried this with podman instead of docker on Fedora 31. I decided to go back to docker for several reasons:
  • I couldn't get podman to start containers. I kept getting the following error, even after having tried several podman configuration settings and kernel parameters: [conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied. 
  • This probably wasn't going to be the last challenge I would encounter using podman. Many people appear to be having issues (see for example a recent post here). 
  • I did not want to waste time on rewriting the docker-compose file to podman commands.
In order to get docker to work on Fedora 31 (Fedora switched to CgroupsV2 so Docker does not work out of the box anymore):

sudo yum install docker-ce
sudo systemctl enable docker
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot

After this, docker worked without difficulties. I'll take a look at podman again in the future when using it is less challenging.

HTTP benchmarking using wrk. Parsing output to CSV or JSON using Python

wrk is a modern HTTP benchmarking tool. Using a simple CLI interface you can put simple load on HTTP services and determine latency, response times and the number of successfully processed requests. It has a LuaJIT scripting interface which provides extensibility. A distinguishing feature of wrk compared to for example ab (Apache Bench) is that it requires far less CPU at higher concurrency (it uses threads very efficiently). It does have less CLI features when compared to ab. You need to do scripting to achieve specific functionality. Also you need to compile wrk for yourself since no binaries are provided, which might be a barrier to people who are not used to compiling code.

Parsing the wrk output is a challenge. It would be nice to have a feature to output the results, in consistent units, to a CSV or JSON file. More people have asked for this and the answer was: do some LuaJIT scripting to achieve that. Since I'm no Lua expert and, to be honest, I don't have any people in my vicinity who are, I decided to parse the output using Python (my favorite language for data processing and visualization) and provide you with the code so you don't have to repeat this exercise.

You can see example Python code of this here.   
wrk output

See for example the following output of running wrk against a custom API:

Command: wrk --timeout 20s -d20s -c65536 -t5 http://localhost:8080/people

Output:
Running 20s test @ http://localhost:8080/people
  5 threads and 65536 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   877.63ms  645.81ms   4.68s    69.42%
    Req/Sec   270.73    164.19     1.72k    72.01%
  21424 requests in 20.10s, 10.00MB read
  Socket errors: connect 64519, read 0, write 0, timeout 0
  Non-2xx or 3xx responses: 12271
Requests/sec:   1065.82
Transfer/sec:    509.59KB


If you want to obtain data which can be used for analyses, it helps if the results are in the same units. This is not the case with wrk. For example, the Req/Sec field contains the average 270.73 but the max 1.72k. If you want to have both of them in the same units, 1.72k needs to be multiplied with 1000. The same applies to the Latency where the average is 877.63ms and the Max 4.68s. For really short durations, it can even go to us (microseconds). Here the factor is also 1000 but when the latency increases, this can go to minutes and hours, for which you have to multiply by 60. The Transfer/sec amount is for small amounts in Bytes but can go to KB, MB, GB, etc. The factor to be used here is 1024.
For my analyses I wanted all durations to be in milliseconds, all amounts of data in bytes, and all counts as absolute numbers without a suffix.

Looking at the wrk source here I found the cause of this challenge: a C file describing the units. This source is input for the parsing since it indicates the scope of the units.
  • For numbers: base, k, M, G, T, P
  • For amounts of data: K, M, G, T, P
  • For durations: us, ms, s, m, h
The lines in the wrk output are more or less structured, so they are suitable for extracting the numbers and suffixes with regular expressions.

Obtaining numbers and suffixes

I used regular expressions to obtain the numbers and suffixes and output them as a dict, a Python datatype for an associative array.


Regular expressions are far easier to write than to read. I parse every line and check whether it is one of the following:

Output:
Running 20s test @ http://localhost:8080/people
  5 threads and 65536 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   877.63ms  645.81ms   4.68s    69.42%
    Req/Sec   270.73    164.19     1.72k    72.01%
  21424 requests in 20.10s, 10.00MB read
  Socket errors: connect 64519, read 0, write 0, timeout 0
  Non-2xx or 3xx responses: 12271
Requests/sec:   1065.82
Transfer/sec:    509.59KB


These lines contain the most relevant data. When I have the line, I try to be as flexible as possible in obtaining the numbers. For example, the decimal separator can be there or not and a suffix can be there or not. Thus for an optional suffix I use \w*: it can be there and if it is there, it can be multiple characters, like for example ms or MB. The case of the suffix might also differ. Some values are not always present in the wrk output, such as the Socket errors line. For those I fill in defaults (0 for no errors).

Of course when executing the command, I have information available like the number of threads, connections, duration and information about what I am testing. 

The output of the function is something like:

{'lat_avg': 877.63, 'lat_stdev': 645.81, 'lat_max': 4680, 'req_avg': 270.73, 'req_stdev': 164.19, 'req_max': 1720, 'tot_requests': 21424, 'tot_duration': 20010.0, 'read': 10485760, 'err_connect': 64519.0, 'err_read': 0.0, 'err_write': 0.0, 'err_timeout': 0.0, 'req_sec_tot': 1065.82, 'read_tot': 521820.16}

Normalizing data

The next parsing challenge is normalizing the data. For this I created three functions, one for each type of data: get_ms for durations, get_bytes for amounts of data and get_number for plain numbers.

For example, the function to get a number (see the example code for the other functions here):

def get_number(number_str):
    # Split the value into a float part and an optional suffix, e.g. '1.72k' -> 1.72 and 'k'
    x = re.search(r"^(\d+\.*\d*)(\w*)$", number_str)
    if x is not None:
        size = float(x.group(1))
        suffix = (x.group(2)).lower()
    else:
        # Not a recognizable number; return the input unchanged
        return number_str

    # Multiply by the factor belonging to the (metric) suffix
    if suffix == 'k':
        return size * 1000
    elif suffix == 'm':
        return size * 1000 ** 2
    elif suffix == 'g':
        return size * 1000 ** 3
    elif suffix == 't':
        return size * 1000 ** 4
    elif suffix == 'p':
        return size * 1000 ** 5
    else:
        # No (known) suffix; return the number as-is
        return size


As you can see, this function also requires some flexibility. It is called with a string containing a float plus an optional suffix. I use a similar tactic as with parsing the wrk output lines: first I apply a regular expression to the input, next I apply the calculation relevant to the specific suffix. If there is no suffix, the number itself is returned.

Creating a CSV line

When you have a Python dict, it is relatively easy to make a CSV line from it. I created a small function to do this for me:

def wrk_data(wrk_output):
    # Output the values as a single CSV line in a fixed column order
    fields = ['lat_avg', 'lat_stdev', 'lat_max', 'req_avg', 'req_stdev', 'req_max',
              'tot_requests', 'tot_duration', 'read', 'err_connect', 'err_read',
              'err_write', 'err_timeout', 'req_sec_tot', 'read_tot']
    return ','.join(str(wrk_output.get(field)) for field in fields)


It is just a single return statement concatenating the values. This does have the liability that if certain values are missing, wrk_output.get returns None and the CSV line will contain the literal text 'None'. It expects all the data to be there or to have default values. Luckily this should always be the case.

Running the example

First download the wrk sources here (git clone https://github.com/wg/wrk.git) and compile them by executing the 'make' command in the cloned repository. Most *NIX systems should have make already installed. I tried it on a pretty bare Ubuntu installation and did not need to install additional dependencies to get this to work.

You can obtain my Python code here and can execute it using python3. First of course update the wrk command path line at the top of the script.

The actual processing is done in the main function:

def main():
    print("****wrk output: \n\n")
    wrk_output = execute_wrk(1, 2, 100, 5, 10, 'http://www.google.com')
    print(str(wrk_output) + "\n\n")
    print("****wrk output dict: \n\n")
    wrk_output_dict = parse_wrk_output(wrk_output)
    print(str(wrk_output_dict) + "\n\n")
    print("****wrk output csv line: \n\n")
    wrk_output_csv = wrk_data(wrk_output_dict)
    print(str(wrk_output_csv))


execute_wrk(1, 2, 100, 5, 10, 'http://www.google.com') executes wrk and captures the output. You can of course adjust the parameters to your use-case (don't make Google do too much work for your tests!). The parameters are the following:
  • cpuset
    Which CPU core to use. Example value 1
     
  • threads
    The number of threads wrk should use. Should be greater than the number of cores. Example value 2
  • concurrency
    The number of concurrent requests to keep running. Example value 100.
  • duration
    The duration of the test. Example value 5 means 5 seconds.
  • timeout
    How long is a request allowed to take. Example value 10 means 10 seconds
  • url
    The URL to call. Example value http://www.google.com
****wrk output:

Running 5s test @ http://www.google.com
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   576.71ms  612.14ms   4.57s    88.82%
    Req/Sec    89.09     41.48   171.00     66.67%
  869 requests in 5.03s, 45.55MB read
Requests/sec:    172.61
Transfer/sec:      9.05MB
 


wrk_output_dict is the result of parsing the output to a dictionary using parse_wrk_output (of course fields are in random order). This calls get_number and other functions to normalize the values. Printing the dict gives you a string which is valid JSON, except that the ' characters should be ".

****wrk output dict:

{'lat_avg': 576.71, 'lat_stdev': 612.14, 'lat_max': 4570.0, 'req_avg': 89.09, 'req_stdev': 41.48, 'req_max': 171.0, 'tot_requests': 869.0, 'tot_duration': 5030.0, 'read': 47762636.8, 'req_sec_tot': 172.61, 'read_tot': 9489612.8, 'err_connect': 0, 'err_read': 0, 'err_write': 0, 'err_timeout': 0}


wrk_output_csv is the CSV output line. The complete output can look like:

****wrk output csv line:

576.71,612.14,4570.0,89.09,41.48,171.0,869.0,5030.0,47762636.8,0,0,0,0,172.61,9489612.8

Secure browsing using a local SOCKS proxy server (on desktop or mobile) and an always free OCI compute instance as SSH server

Oracle provides several services as 'always free'. In contrast to Azure and Amazon, these include compute instances which remain 'forever' free to use. Although there are some limitations on CPU, disk and network resources, these instances are ideal to use as a remote SSH server and, with a little effort, as a connection target for a locally running SOCKS proxy server. When you configure a browser to use that SOCKS proxy, your web traffic will be sent through a secure channel (SSH tunnel) towards the OCI instance and the OCI instance will appear as your browser's client IP for remote sites you visit.

An SSH server in combination with a locally running SOCKS proxy server allows you to browse the internet more securely from for example public Wifi hotspots by routing your internet traffic through a secure channel via a remote server. If you combine this with DNS over HTTPS, which is currently at least available in Firefox and Chrome, it will be more difficult for other parties to analyse your traffic. Also it allows you to access resources from a server outside of a company network which can have benefits for example if you want to check how a company hosted service looks to a customer from the outside. Having a server in a different country as a proxy can also have benefits if certain services are only available from a certain country (a similar benefit as using a VPN or using Tor) or as a means to circumvent censorship.

Do check what is allowed by your company and your ISP, and what is legal within your country, before using such techniques though. I of course don't want you to do anything illegal and blame me for it ;)

How to configure OCI

The example configuration is based on Oracle Linux 7 but will most likely be the same for RHEL and CentOS. Mind that creating always free instances is only possible in your home region and that changing your home region after account creation is currently not possible. See here.

When configuring the OCI instances, there are some challenges when you are not that experienced with cloud providers, such as creating an SSH key pair and making the instance accessible from the internet. After the instance is created, there are also some measures to take to keep the instance updated and to make using it as a SOCKS proxy from a remote location easier by moving the SSH server to port 443 (which is usually used for HTTPS traffic).

Create an instance

Creating an OCI instance is relatively easy but consists of several steps.

Prepare the SSH public and private keys

First prepare an SSH key. There are several tools which allow you to do this. The below screenshots are from MobaXterm. You can also use PuttyGen, keytool (a command line tool), KeyStore Explorer, etc. I prefer MobaXterm since next to generating keys, it is also a powerful SSH client, provides a Linux like environment and has a nice SSH tunnel manager.




Do not supply a password. Next to saving the public and private key using the respective buttons, also save the top part starting with ssh-rsa. This is the part which OCI needs to configure the instance. The private key is the thing you use to login from a client.

Create the instance



Why Oracle Linux? I was having some difficulties with the Ubuntu image and I suspect running an Oracle OS on Oracle Cloud might make things easier in the future.

In the below step you copy the previously saved public key.


Now start creating the instance and wait until it is ready


Create a public IP

When using the free tier, you only have a single public IP address. You can create 2 compute instances though. I recommend using different accounts on a single compute instance if you want to allow different users to access it.




Assign the public IP

 


Note that if an IP is already assigned, you first have to indicate no public IP, apply and then change the setting to the wanted public IP.


Confirm client connectivity

You can confirm you can access your instance with MobaXterm using a regular SSH connection. You use the assigned public IP at port 22 and your private key to login with user opc. The screenshot indicates port 443, but that is after you change it as described below. It starts out with port 22, the default SSH port.




Make sure it auto-updates and restarts when necessary

Since your OCI instance will be accessible at a public IP address and has an open SSH port, it will be bashed with hack attempts. You can keep the SSH port closed until a certain sequence of connection attempts is executed (port knocking) but you might not be able to execute those through a company proxy server. If you keep the port open, it is important to keep your system updated in order to reduce the number of vulnerabilities which can be abused to gain access. Since manual maintenance of environments is no hobby of mine and I do like my system to remain up to date and do not care about reboots once in a while, I've automated this.

The below commands are based on RHEL 7 and variants like OL 7 and CentOS 7

sudo yum -y install yum-cron yum-utils
sudo systemctl enable --now yum-cron.service
sudo systemctl start yum-cron.service
sudo sed -i 's/apply_updates = no/apply_updates = yes/g' /etc/yum/yum-cron.conf
echo "$(echo '* 11 * * * /usr/bin/needs-restarting -r || sudo shutdown -r' ; crontab -l)" | crontab -

This does several things

  • It checks for updates regularly (interval specified in yum-cron.conf)
  • It applies the updates
  • It checks if updates require a restart daily using the needs-restarting command which is part of yum-utils
  • It executes the restart when required
Change the SSH port


Company proxy servers almost never block port 443. This is the port used to access HTTPS websites. In order to give you maximum flexibility to access your OCI instance, it is recommended to run the SSH server on port 443.

Change the port

sudo sed -i 's/#Port 22/Port 443/g' /etc/ssh/sshd_config
sudo semanage port -m -t ssh_port_t -p tcp 443
sudo firewall-cmd --permanent --zone=public --add-port=443/tcp
sudo firewall-cmd --reload
sudo systemctl restart sshd.service

Update the security list

In the below screenshots port 443 is publicly accessible.



Configure a local SOCKS proxy server

Linux / Unix (should probably also work on Mac)

This is by far the easiest since you don't need more than an SSH client which is there usually by default. Execute a command like:

nohup ssh -i ~/oraclecloudalwaysfree.key -D 8123 -f -C -v -N opc@132.145.250.238 -p 443

And you get an SSH SOCKS server which is available at localhost port 8123. Of course change this to your own IP and refer to your own private key. Output will be saved in ~/nohup.out. If the connection fails, you can check that file for the cause.

  • -D 8123 starts a SOCKS 4 and SOCKS 5 compliant proxy server on port 8123
  • -i indicates the private key to use
  • -f indicates background execution of SSH
  • -C requests compression of data
  • -v gives verbose output. Useful for debugging
  • -N indicates no remote command needs to be executed. we just need the tunnel functionality
  • -p indicates the port to connect to on the remote host. 
  • opc@132.145.250.238 indicates the user and host to connect to

MobaXterm

I've used MobaXterm before to login using SSH normally. MobaXterm also has an easy to use tunnel interface




The last two icons indicate to MobaXterm to start the tunnel when the application is started and to automatically reconnect upon disconnect.

Android: ConnectBot

ConnectBot is an Android app which allows you to create SSH connections to remote servers, use private keys to login and configure SSH tunnels. If you have a rooted Android phone, you can even use the ProxyDroid app to configure the SOCKS proxy server globally and not per app. The process on how to configure this is described here. For a secure connection to OCI, first load your private key in ConnectBot. Next create a connection to opc@yourhost. Next add a port forward of type Dynamic (SOCKS) with source port 8080. This will start a local SOCKS proxy server available at port 8080, which is what you can configure in web browsers.

iPhone

For iPhone it is probably also possible to run a SOCKS proxy locally and connect to it from a browser but since I have no iPhone available I'll leave that to others. You can read for example some discussion on this here.

Others

Bitvise SSH client can also easily be used to configure SSH tunnels. See my blog post about this here.

Configure clients to use the SOCKS proxy server

Firefox desktop

In Firefox on a desktop this is easy.



Firefox mobile

For Firefox on a mobile device this is slightly harder, and in for example Chrome these settings are not available at all. In Firefox the same settings as described above are available, but not from a GUI. The following describes the steps you need to take.

In the firefox URL bar, type 'about:config' and press enter to access advanced settings
Search for 'socks' and set the following settings:

  • network.proxy.socks = 127.0.0.1
  • network.proxy.socks_port = 8080
  • network.proxy.socks_remote_dns = true

Search for 'proxy.type' and set the following setting:
  • network.proxy.type = 1
Now confirm you can access the web using your OCI instance by going to


Torrent client on mobile

If you are looking for a torrent client which can run on your mobile phone and supports using a SOCKS server, check out Flud or tTorrent. I'm using Flud.

  • Open Flud
  • Go to Menu > Settings > Network > Proxy Settings
  • Enter the settings as shown below
    • Proxy type: SOCKS5
    • Host: localhost
    • Port: 8080
  • Make sure to check 'Use proxy for peer connections' and uncheck 'Requires authentication'
  • Click 'Apply Proxy'
  • Done!

Performance of relational database drivers. R2DBC vs JDBC

R2DBC provides non-blocking reactive APIs to relational database programmers in Java. It is an open specification, similar to JDBC. JDBC however uses a thread per connection while R2DBC can handle more connections using fewer threads (and thus potentially use less memory). This could also mean threads are available to do other things, like handling incoming requests, and less CPU is required because fewer threads mean fewer context switches. This seems compelling in theory, but does R2DBC actually outperform JDBC and use fewer resources, or are the benefits only present under specific conditions? In this blog post I'll try to find that out.
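
To give an impression of the programming model, below is a minimal sketch (not taken from my test implementations) of querying Postgres through the R2DBC SPI with Project Reactor; the connection URL, table and column names are placeholders.

import io.r2dbc.spi.Connection;
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class R2dbcQueryExample {
    public static void main(String[] args) {
        // Placeholder URL; assumes the r2dbc-postgresql driver is on the classpath
        ConnectionFactory connectionFactory =
                ConnectionFactories.get("r2dbc:postgresql://localhost:5432/mydb");

        Flux<String> names = Flux.usingWhen(
                Mono.from(connectionFactory.create()),                            // obtain a connection, non-blocking
                connection -> Flux.from(connection
                                .createStatement("SELECT name FROM people LIMIT 10") // placeholder query
                                .execute())
                        .flatMap(result -> result.map((row, metadata) -> row.get("name", String.class))),
                Connection::close);                                               // release the connection asynchronously

        // Nothing runs until someone subscribes; blocking here is only for demo purposes
        names.doOnNext(System.out::println).blockLast();
    }
}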

I did several load-tests on REST services with a Postgres database back-end and varied
  • assigned cores to the load generator and service
  • connection pool sizes and with/without connection pool for R2DBC
  • concurrency (the number of simultaneous requests to be processed)
  • driver (JDBC or R2DBC)
  • framework (Spring, Quarkus)
I measured
  • response times
  • throughput
  • CPU used
  • memory used
What is there to gain in theory

Threads consume resources

Using less threads means
  • using less memory; threads require memory
  • using less CPU; less context switches
Thus in theory higher performance using the same resources at high concurrency.

Memory

Java threads have their own stack and thus require memory. Using less threads means your process will use less memory.

In Java 8, a single thread would cause around 1Mb of memory to be reserved and committed (read here). In Java 11 and higher this has improved; memory allocation for threads has become less aggressive. Around 1Mb per thread will still be reserved, but it will no longer directly be mapped to actual RAM, meaning the RAM can be used for other things (and will only be claimed when used), which is a definite improvement. I would expect that applications using many threads running on Java 8 would benefit in terms of memory usage from going to Java 11.
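
As a simple way to observe this yourself, the sketch below starts a large number of idle threads; inspecting the process (for example via /proc/PID/smaps) then shows the reserved and committed memory. The thread count and sleep durations are arbitrary, and the exact numbers depend on the JVM version and the -Xss setting.

public class ManyThreads {
    public static void main(String[] args) throws InterruptedException {
        // Start 1000 threads which do nothing but sleep; each one reserves its own stack
        for (int i = 0; i < 1000; i++) {
            Thread thread = new Thread(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException ignored) {
                }
            });
            thread.start();
        }
        // Keep the process alive so its memory usage can be inspected
        Thread.sleep(60_000);
    }
}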

CPU

Having a large number of concurrent threads running also has additional CPU cost due to context switching (read here). CPUs consist of cores and cores can host a fixed number of threads (see here), usually 2 threads per core (when using hyper-threading). My laptop has 6 cores so my system can run 12 threads simultaneously. Applications however are not limited to using only 12 threads. A scheduler assigns a portion of CPU thread time to an application thread and after that period has passed, another thread gets a turn. This switch has a CPU cost. The more threads you have, the more these switches take place. There is usually an optimum number of application threads where the benefit of concurrency outweighs the additional CPU cost of context switches. If you cross that optimum, adding more application threads will reduce overall performance.

When you can handle more connections using less threads, you save CPU time which would otherwise be required to accommodate for context switches.

What did I measure?

I've created functionally similar implementations of a service with a database back-end (Postgres). I did requests on the services which returned 10 database records per request. You can find the sample implementations I used here.

I've used:
  • JaxRS with RxJava using JPA and a JDBC driver using the Hikari connection pool
  • Quarkus with RESTEasy using a JDBC driver with the AgroalPool connection pool
  • Quarkus with RESTEasy using a R2DBC driver with the R2DBC connection pool
  • Spring Boot using JPA JDBC and Hikari connection pool
  • Spring Boot using Spring REST Data with JPA, JDBC and Hikari connection pool
  • Spring Boot WebFlux with Spring Data using an R2DBC driver and no connection pool
  • Spring Boot WebFlux with Spring Data using an R2DBC driver and the R2DBC connection pool

I've assigned 1,2 and 4 CPUs to the service and tested with connection pool sizes of 5, 20 and 100. 100 was the maximum number of connections the Postgres database would allow (a default setting I did not change).

I compiled and ran the services on OpenJDK 11 with 2Gb of memory assigned and G1GC. The tests did not hit the memory limit, thus garbage collection was limited.

wrk

I've used wrk to perform HTTP benchmarking tests at concurrency of 1, 2, 4, 10, 25, 50, 75, 100. wrk is more efficient in using CPU than for example Apache Bench when running at higher concurrency. Also I assigned 1,2 and 4 cores to the load generator (wrk). At the start of each test, I first 'primed' the service so it could build up connections, create threads and load classes by providing full load for a single second. After that I started the actual test of 60 seconds. From the wrk output I parsed (amongst other things) throughput and response times. This is described in my blog post here.

Measures

I've measured response time, throughput, CPU usage and memory usage. CPU is measured using /proc/PID/stat which is described here. Memory is measured using /proc/PID/smaps which is described roughly here. Private, virtual and reserved memory did not differ much thus I mostly looked at private process memory.

What were the results?

I've tested all the combinations of variables I've mentioned above (30 hours of running tests of 60 seconds each). You can find the raw data here. For every line in the findings, I could have shown a graph, but that would be too much information. If you want to have a specific question answered, I recommend loading the data in Excel yourself (it is plain CSV) and play around with a pivot table + pivot graph (do a bit of data exploration).

Effect of the R2DBC connection pool

I tested with and without an R2DBC connection pool using Spring Boot WebFlux.
  • Memory usage when using a connection pool was significantly higher than when not using a connection pool
  • CPU usage when using the R2DBC connection pool was significantly higher compared to not using the pool
  • The connection pool size did not matter much
  • Average latency was a lot higher (around 10x) when not using a pool
  • The number of requests which could be processed in 60 seconds when using a pool was a lot higher
  • Assigning more or less CPUs to the service or the load generator did not change the above findings
Summary: using an R2DBC connection pool allows higher throughput, shorter response times at the cost of higher memory and CPU consumption.


Blocking Quarkus JDBC vs non-blocking Quarkus R2DBC

Now it became more difficult to reach general conclusions
  • JDBC with a small connection pool was able to process most requests during the one minute test
  • At no concurrency (only one request running at the same time) JDBC outperformed R2DBC with about 33% better response times
  • There is an optimum concurrency where R2DBC starts to outperform JDBC in number of requests which can be processed in a minute. When you go higher or lower with concurrency, JDBC seems to do better
  • When concurrency is increased R2DBC did better with a large connection pool while JDBC started to perform worse when the connection pool was increased
  • Response times of R2DBC were generally worse than those with JDBC
  • JDBC took a lot more memory and CPU than R2DBC. This difference became larger at high concurrency.
Perhaps a concurrency of 100 was not enough to make R2DBC shine. R2DBC seems to react differently to connection pool sizes than JDBC with respect to response times and throughput. When short on resources, consider R2DBC since it uses less CPU and memory (likely due to it using less threads or using available threads more efficiently).

This graph was taken at a concurrency of 100. JDBC uses more memory than R2DBC
Quarkus vs Spring Boot vs Spring Boot WebFlux vs JaxRS/JavaRx

Response times and throughput
  • A complete blocking stack Quarkus + RESTEasy + JDBC gives best response times at a concurrency of 100 and also best throughput.
  • When using Spring Boot, you can get best response times and throughput at high concurrency by using WebFlux with an R2DBC driver and pool. This is a completely non-blocking stack which uses Spring Data R2DBC.
  • When using Quarkus, JDBC gives best performance at high concurrency. When using Spring Boot Webflux, R2DBC gives best performance at high concurrency.
  • Spring Data REST performs worse compared to 'normal' Spring Boot REST services or WebFlux. This is to be expected since Spring Data REST gives you more functionality such as Spring HATEOAS.
  • Non-blocking services with JAX-RS + RxJava and a blocking backend give very similar performance to a completely blocking service and backend (Spring Boot JPA using JDBC).
  • A statement like 'a completely non-blocking service and backend performs better at high or low concurrency than a blocking service and backend' cannot be made based on this data.



Summary: For best response times and throughput in Spring Boot use WebFlux + R2DBC + the R2DBC connection pool. For best response times and throughput in Quarkus use a blocking stack with JDBC.
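
For reference, below is a minimal sketch (not the actual test code) of such a non-blocking Spring Boot WebFlux + Spring Data R2DBC stack; the Person entity and the /people endpoint are assumptions for this example.

import org.springframework.data.annotation.Id;
import org.springframework.data.repository.reactive.ReactiveCrudRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

// Assumed entity mapped to a 'person' table
class Person {
    @Id
    private Long id;
    private String name;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Spring Data R2DBC generates a non-blocking implementation of this interface
interface PersonRepository extends ReactiveCrudRepository<Person, Long> {
}

@RestController
class PersonController {
    private final PersonRepository repository;

    PersonController(PersonRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/people")
    Flux<Person> people() {
        // Returns a reactive stream; no thread is blocked while waiting for the database
        return repository.findAll();
    }
}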

Resources used
  • Quarkus with R2DBC uses least memory but Quarkus with JDBC uses most memory at high concurrency
  • Spring Boot memory usage at high concurrency between JDBC and R2DBC or between normal and WebFlux services does not differ much


  • When CPU is limited, Quarkus using R2DBC or JDBC are quite efficient in their usage. Spring Boot Webflux without an R2DBC pool however uses least CPU. Spring Data REST uses most CPU.

Of course when you want to further reduce resource usage, you can look at native compilation of Quarkus code to further reduce memory and disk-space used. Spring Framework 5.3 is expected to also support native images but that is expected to be released in October of 2020.

Finally

To summarize the results
  • For Quarkus
    stick to JDBC when throughput/response times are important (even at high concurrency)
    consider R2DBC when you want to reduce memory usage
  • For Spring (Boot)
    Consider Webflux + R2DBC + R2DBC pool when response times, throughput and memory usage are important
Also
  • R2DBC cannot yet be used in combination with JPA
  • If you use an application server, most likely you are tied to JDBC and cannot easily switch to R2DBC
  • Currently, there are only R2DBC drivers for a handful of relational databases (Postgres, MariaDB, MySQL, MSSQL, H2). The Oracle database driver is noticeably lacking. New versions of the driver however already contain extensions to make reactive access possible. Keep an eye on this thread
  • Project Loom will introduce Fibers to the Java language. Using fibers, the resources used to service database requests could in theory be further reduced. What will the impact of the introduction of Fibers be to this mix? Will R2DBC adopt fibers? Will JDBC adopt fibers (and what will this mean for R2DBC)? Will a new standard emerge?

The size of Docker images containing OpenJDK 11.0.6

When running Java applications in containers, you need to be careful with your resources. If you're not careful with layering your images (something you can manage with for example Google's Jib), you can quickly run into disk-space issues when running multiple instances, especially when your base image and/or application gets updated regularly. One of the ways you can save resources is by using small base images. In this blog post I determined the uncompressed size of several base images containing OpenJDK 11.0.6 which are available on Docker Hub.


Used script

I used the following Bash script to determine the size of the downloaded images (uncompressed / on my disk). Be careful when running it for yourself! The script cleans your Docker environment.

strings=(
adoptopenjdk:11.0.6_10-jdk-hotspot-bionic
azul/zulu-openjdk-alpine:11.0.6
azul/zulu-openjdk:11.0.6
openjdk:11.0.6-slim
openjdk:11.0.6-jdk-slim-buster
adoptopenjdk/openjdk11:x86_64-ubi-minimal-jdk-11.0.6_10
adoptopenjdk/openjdk11:jdk-11.0.6_10-ubuntu-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-ubuntu
adoptopenjdk/openjdk11:jdk-11.0.6_10
adoptopenjdk/openjdk11:jdk-11.0.6_10-ubi-minimal
adoptopenjdk/openjdk11:jdk-11.0.6_10-ubi-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-ubi
adoptopenjdk/openjdk11:jdk-11.0.6_10-debianslim-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-debianslim
adoptopenjdk/openjdk11:jdk-11.0.6_10-debian-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-debian
adoptopenjdk/openjdk11:jdk-11.0.6_10-centos-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-centos
adoptopenjdk/openjdk11:jdk-11.0.6_10-alpine-slim
adoptopenjdk/openjdk11:jdk-11.0.6_10-alpine
mcr.microsoft.com/java/jdk:11u6-zulu-alpine
mcr.microsoft.com/java/jdk:11u6-zulu-centos
mcr.microsoft.com/java/jdk:11u6-zulu-debian10
mcr.microsoft.com/java/jdk:11u6-zulu-debian8
mcr.microsoft.com/java/jdk:11u6-zulu-debian9
mcr.microsoft.com/java/jdk:11u6-zulu-ubuntu
)

for i in "${strings[@]}"; do
echo "$i">> output.txt
docker run --name jdk -it "$i" cat /etc/os-release | grep PRETTY_NAME | tail -n1 >> output.txt
docker images | awk '{print $NF}' | tail -n1 >> output.txt
docker stop `docker ps -qa`
docker rm `docker ps -qa`
docker rmi -f `docker images -qa `
docker volume rm $(docker volume ls -q)
docker network rm `docker network ls -q`
done

dos2unix output.txt
cat output.txt | paste -d, - - -;

Results



As you can see
  • The Alpine Linux based images are smallest. 
  • The RHEL/CentOS based images are (generally) largest. 
  • The Microsoft images are generally larger than images with the same OS libraries from other suppliers which have been looked at. 
  • The difference between the largest and smallest image is about 3 times. 
  • The slim images, as their name implies, are smaller than their not so slim siblings.
Of course the images, although they all contain the same JDK, differ in functionality. The bigger ones probably have more tools available inside of them. Having a smaller image however, likely also makes it safer to use since there is less inside which can be abused.

Oracle SOA: Sending delayed JMS messages

Sometimes you might want to put something on a JMS queue and make it available to consumers only after a certain period has passed. How can you achieve this using Oracle SOA Suite?

Queue or connection factory configuration. Works but not message specific

You can set the Time-to-Deliver on the queue or on the connection factory. This indicates a period during which the message is visible in the WebLogic console but will not be seen by consumers (they will have state 'delayed').
  • Queue overrides
    • On queue level you can configure a Time-to-Deliver override. This will delay all messages which are sent to the queue. In this case however, we wanted to tweak the delay per message.
  • Connection Factory
    • On connection factory level you can configure the default Time-to-Deliver. This delay will be given by default to all messages using the specific connection factory. If you want to use multiple delays on the same queue you can connect to it using multiple connection factories. This again is configuration which is not message specific
JMSAdapter. Sending delayed messages is not possible

Producing delayed messages can be done by calling the relevant Java classes (dependent on your JMS implementation) as described here. When implementing Oracle SOA solutions however, it is more common to use the JMSAdapter instead of directly calling Java code. With the JMSAdapter you can set and get specific JMS header properties. See for example here.
  • JMSProperties
    • At first I tried to set the JMS header DeliveryTime. This header however is calculated when a message is produced to a queue or topic. I could not set this property externally.
    • I also tried the property JMS_OracleDelay which can be used with the Oracle AQ JMS implementation. This did not work with a JMS implementation which used a JDBC persistent store.
By setting specific JMS properties using the JMSAdapter, I did not manage to get this working. Maybe there was some other way using the JMSAdapter? I discovered the JMSAdapter does not call the relevant Java method to produce delayed messages (a feature the AQAdapter does provide). The JMSAdapter thus could not be used to achieve the required functionality. The method which needs to be called is setTimeToDeliver on weblogic.jms.extensions.WLMessageProducer, as sketched below.
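Below is a minimal sketch of what such a call looks like. This is not what the JMSAdapter does; it illustrates the kind of code the Spring component described further on can execute. The package, class names, JNDI names and the plain text payload are assumptions; replace them with the connection factory, queue and message type of your own environment.

// DelayedMessageProducer.java - the interface the Spring component exposes to the BPEL process
package nl.example;

public interface DelayedMessageProducer {
    void send(String payload, long delayMillis) throws Exception;
}

// DelayedMessageProducerImpl.java - the implementation which enqueues a message with a delay
package nl.example;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;
import weblogic.jms.extensions.WLMessageProducer;

public class DelayedMessageProducerImpl implements DelayedMessageProducer {

    // Example JNDI names; use the ones configured in your own domain
    private static final String CF_JNDI = "jms/MyConnectionFactory";
    private static final String QUEUE_JNDI = "jms/MyQueue";

    public void send(String payload, long delayMillis) throws Exception {
        // Running inside WebLogic, so a default InitialContext gives access to the JNDI tree
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup(CF_JNDI);
        Queue queue = (Queue) ctx.lookup(QUEUE_JNDI);
        Connection connection = cf.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            // WebLogic specific: the time-to-deliver (delay) is set on the producer, not on the message
            ((WLMessageProducer) producer).setTimeToDeliver(delayMillis);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}

When this class is exposed through the Spring component (see further on), the delay can be passed per message from the BPEL process.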

Consuming messages

Using the JMSAdapter however, we can pick up delayed messages. A benefit of using the JMSAdapter is that you can easily configure threading (even a singleton over a clustered environment) and the delay between messages which are consumed. See for example the below snippet from the composite.xml:

    <binding.jca config="MyServiceInboundQueue_jms.jca">
      <property name="minimumDelayBetweenMessages">10000</property>
      <property name="singleton">true</property>
    </binding.jca>

This makes sure only one message every 10 seconds is picked up from the queue.

BPEL Java embedding. Producing JMS messages without extending the BPEL engine classpath is not possible

Oracle SOA BPEL provides a feature to embed Java code. We thought we could use this to produce JMS messages with a delay, since from Java we could call the required method. It appeared however that the classpath used by the BPEL engine was limited: classes like javax.jms.* were not available. We could have added additional libraries by configuring the BpelcClasspath property in the System MBean Browser to make these standard Java EE libraries available. See here. We did not want to do this however, since it would make automatic deployment more challenging and we could not rule out side effects.

Spring component

It appeared that the classpath available from the Spring component did contain the javax.jms.* classes! We did fear that the context in which the Spring component runs could make it difficult to access the relevant connection factory and queue. Luckily this did not appear to be an issue. An additional benefit of using the Spring component is encapsulation of the Java code and better maintainability. Also, in the BPEL process the callout to the Java code is explicitly visible in the form of an invoke.

In order to create a Spring component, the following needs to be done. See here for a more elaborate example.
  • Create a JDeveloper Java project with a library as dependency which contains the javax.jms.* classes, such as 'JAX-WS Web Services'. For SOA Suite 11g make sure you indicate the Java SE version is 1.6. Create a deployment profile to be able to package the code as a JAR file.
  • Implement the Java code to put a message on the queue (similar to the producer sketch shown earlier). See for example here, compile the code and create a JAR file from it.
  • For JDeveloper 11g make sure the Oracle JDeveloper Spring, WebLogic SCA Integration plugin is installed.
  • Copy the previously created JAR file to the SCA-INF/lib subdirectory of your composite project folder.
  • In the composite editor, create a Spring component and add XML code to its bean definition file; see the sketch after this list for an example.
  • The Spring component will display an interface. Drag it to the BPEL process where you want to use it. An XSD/WSDL will be generated for you and you can use an assign and invoke to call this service. If you update the interface file / replace the JAR file, you can remove the Spring component interface, add it again to the bean definition xml file, re-wire it to the BPEL component and it will regenerate the WSDL and XSD files.
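A minimal sketch of such a bean definition file is shown below, assuming the interface and implementation class from the earlier producer sketch. The service and bean names are placeholders; the type attribute must point at the Java interface the component exposes.

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:sca="http://xmlns.oracle.com/weblogic/weblogic-sca">
      <!-- Expose the bean as an SCA service so it can be wired to the BPEL component -->
      <sca:service name="DelayedMessageProducerService"
                   target="delayedMessageProducer"
                   type="nl.example.DelayedMessageProducer"/>
      <!-- The implementation class packaged in the JAR under SCA-INF/lib -->
      <bean id="delayedMessageProducer" class="nl.example.DelayedMessageProducerImpl"/>
    </beans>

After dragging the service onto the BPEL component, the generated WSDL operation should correspond to the send method of the interface.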
Summary

  • The JMSAdapter does not allow enqueueing messages with a specific delay (time-to-deliver).
  • A (default) time-to-deliver (delay) can be configured on the queue but also on the connection factory.
  • The Spring component uses a different classpath than Java embedded in BPEL.
  • The Spring component can access the InitialContext which in turn allows access to WebLogic's JNDI tree.
  • Using the Spring component it is relatively easy to enqueue messages with a message-specific delay.