There are many libraries out there that contain reusable code, but only a few of them persist data in relational database. Even if they do so, they behave normally as standalone project. It is hard to use their tables in custom queries of the project that embeds them.
Our goal is to create a technology stack that allows to embed logically related table sets into separate modules. There are many use-cases in which reusable persistent data structure can be defined. With a well defined API, complicated queries can be eased.
A reusable module with relational database tables must meet the following requirements:
- The module must provide an API that makes it possible to easily create or modify the records of the tables
- It must be possible to reference the tables with foreign keys from other modules
- The module should provide API to easily extend SQL queries
During a project many parts can be separated that can be implemented in a reusable way. This post shows some practical use-cases and the way how they can be implemented.
Imagine that you have a User entity with the following fields:
Task: Show a filtered and paginated list of users on a website. The list should be ordered by the localized country name
- Select all the users that meet the filter criteria. If we have 1 million results, than download all of them into memory.
- Translate the country name based on a ResourceBundle for each user
- Order the list of users
- Show the current page (Twenty records from the million)
High performance solution with the re-usable Localization module:
Create a table called localized_data with the following fields:
- default_locale (boolean)
Fill this table with the country codes and with their translations. Example:
We have the original SQL:
SELECT ..., c.country_name, ... FROM user u
We would like to change the SQL in the way that:
- Country name is selected from the localized_data table
- The query gives back 20 rows beginning with the row index 10.000 of the result set.
- If the country name is not available in the locale of the user (e.g.: en_US), select the country name with language of the user (e.g.: en)
- If the country name is not available in the language of the user, select the country name with a default language
- If the country name not available at all in the localized_data table, get the country code
The modified SQL would look like this:
SELECT ..., COALESCE( (SELECT ld.value FROM localized_data ld WHERE ld.key = u.country_code AND ld.locale = :userLocale), (SELECT ld.value FROM localized_data ld WHERE ld.key = u.country_code AND ld.locale = :userFallbackLocale), (SELECT ld.value FROM localized_data ld WHERE ld.key = u.country_code AND ld.default_locale), u.country_code) AS country_name, ... FROM user u ORDER BY country_name LIMIT 20 OFFSET 10000;
The SQL above looks difficult. Re-implementing this query at each place is almost impossible. It is time to find a technology that supports Object-Oriented SQL queries to be able to separate complex query parts to reusable functions. Some solutions that allow this:
In former times we used JPA Criteria API. Recently we changed to QueryDSL due to its benefits. The original query would look like this with QueryDSL:
QUser user = new QUser("u"); SQLQuery query = new SQLQuery(...); resultList = query.from(user).list(..., user.countryCode, ...);
It would be hard to write down the COALESCE logic every place where we want to have localized data as part of the result set. Let’s make a function called createLocalizedCoalesce(DSLExpression<String> key, Locale locale). With that function, it will be easy to write the robust query:
QUser user = new QUser("u"); Coalesce countryNameExpression = createLocalizedCoalesce(user.countryCode, Locale locale); DslExpression<String> countryNameAs = countryName.as(); SQLQuery query = new SQLQuery(...); resultList = query.from(user) .orderBy(countryNameAs.asc()) .limit(20).offset(10000) .list(..., countryNameExpression, ...);
Building a query with the complex localized_data logic is not that difficult anymore.
Initialize database schema
Defining tables, views, versions
We chose QueryDSL to write queries. However, we need a tool that can populate the database schema.
We investigated lots of the technologies that supports creating and modifying tables and Liquibase won. In our solution, each module contains a liquibase changelog file that defines the database schema. In the localization module it looks like the following:
<changeSet id="1.0.0" author="everit"> <createTable tableName="localized_data" schemaName="org.everit.osgi.localization.schema"> <column name="localized_data_id" type="bigint" autoIncrement="true"> <constraints primaryKey="true" nullable="false" /> </column> <column name="default_locale" type="boolean" > <constraints nullable="false" /> </column> <column name="key_" type="varchar(255)" /> <column name="locale_" type="varchar(10)" /> <column name="value_" type="varchar(2000)" /> </createTable> <addUniqueConstraint columnNames="key_, locale_" constraintName="unique_key_locale" schemaName="org.everit.osgi.localization.schema" tableName="localized_data" /> </changeSet>
Initializiation of the database
Our stack is based on OSGi and Declarative Services. Using the right modules and components we are able initialize the database before the business logic is activated. This goal can be achieved by the configuration of the following components:
- First of all, we need a JDBC driver that registers a DataSourceFactory as an OSGi service
- Everit DSF component picks up the DataSourceFactory service and registers an XADataSource OSGi service
- Everit – Commons DBCP component picks up the XADataSource service and by instantiating a BasicManagedDataSource it registers a DataSource OSGi service
- Everit Liquibase DataSource component
- picks up the DataSource service that was registered by Everit – Commons DBCP component
- tries to initialize/validate the database with the specified Liquibase changelog file
- if the initialization/validation is successful, the DataSource object will be registered again as an OSGi service but with different service properties
- Based on service properties, business components can pick up the DataSource that was registered by Liquibase DataSource component
With a configuration like this, we can be sure that the database is up-to-date when the business components are activated.
How does the Liquibase DataSource component know, where it should find the changelog file?
Each bundle that contains one or more Liquibase changelog files. They must also provide one or moreliquibase.schema capability. The capability can have the following attributes:
- name: The “name” of the schema that identifies the changelog file. By “schema” I mean the identifier of a logicaly related set of tables, views and sequences. The meaning of “schema” capability attribute is not the same as a database schema.
- resource: The path of the changelog file within the bundle
- custom attributes: Any custom attribute may be specified. They can be useful if we want to allow special filters on the consumer side (e.g.: version, dbtype, …)
In case of localization, the capability MANIFEST header looks like the following:
Everit – Liquibase DataSource component should be configured to pull this capability. To do that, simply specify the following at the liquibase.schema configuration property: org.everit.osgi.localization.schema.
It is possible to add directives in the configuration of the Liquibase DataSource component. E.g.: org.everit.osgi.localization.schema;filter:=(version>=1.0.0)
Referencing tables and views from other modules
By using Liquibase it is possible to include a changelog file from other changelogs. In case we use the Liquibase jar that is bundled by Everit, it is also possible to include changelog files from other bundles. To do that:
- The consumer bundle must have a Require-Capability MANIFEST header that refers to the capability offered by the producer bundle. By doing that, a wire will be created between the bundles. Our solution finds inclusions based on these wires.
- the include tag must define the path with the “eosgi:” prefix and add the name of the liquibase.schema capability as value instead of the path
If we want to use the Localization module in our project, we should include the liquibase changelog in the following way in the changelog of the application:
And specify the following requirement in the MANIFEST of the application bundle:
The wiring and inclusion above works transitively. E.g.: If you include authorization tables in your application, you include its dependencies transitively as well. As resource is included in authorization, the resource table will be created/validated during the application initialization.
Generating the QueryDSL metadata classes
Writing QueryDSL metadata classes manually would be a hard task. There is a tool called LQMG that generates the Metadata classes from the Liquibase capabilities for us. This tool has the following steps:
- Starts an embedded OSGi container (Equinox)
- Deploys the specified bundles. Those bundles should be specified that contain liquibase capabilities.
- Starts an embedded H2 server
- Based on the specified liquibase capabilities, initializes the embedded H2 database. This is done in the same way as Liquibae DataSource works
- Generates the QueryDSL metadata classes from the H2 databsae
It is also possible to specify renaming rules in LQMG to generate class names and properties that are easier to read in the source code.
If we use lqmg-maven-plugin, the dependencies of the projects will be handled as the OSGi bundles.
The Localization module will be available on GitHub soon.
The Resource project is already available at GitHub. Resource has no own functionality. It is simply a table that has a generated id. Resource is often used in other modules to wire their functionality to each other.
Databsae Sequence could have been used as well, but we wanted to let the developers find out if any other module uses the same resource id. In case a foreign key references the resource record, it cannot be deleted until all module allows it.
The authorization module has already been implemented by Everit based on JPA-OSGi technology stack. As we are changing our technology stack, it will be available soon based on Liquibase-QueryDSL at GitHub.
To have a reusable authorization module, we can use the following relational schema:
Defines a permission between two resources.
- authorized_resource: The one, who has a permission on something. E.g.: user, role, user group, external system, etc.
- target_resource: Resources which other resources can run action on. It might be a document that a user can open and edit or a user who can be deleted or modified
- action: The action of the permission. E.g.: If the authorized resource was a user and the target resource was a document, actions could be ‘open’, ‘edit’, ‘delete’, …
It is possible to define inheritance between authorized resources. The child resource inherits all of the permissions of the parent. A parent could be a user group, role or organization, while a child could be a user or user group. The table is abstract enough to let the designer of the application decide, how to describe roles, groups, users and the relation between them.
Functionality of authorization
First of all, the authorization module contains functions that make it possible to manipulate permissions between resources and the permission inheritance DAG. It also contains a permissionChecker that can
- calculate the inheritance and provide it from cache next time
- provide permission information for an action between two resources
- from the database
- from cache
And last but not least, the authorization component provides functionality to extend QueryDSL based queries with permission restrictions. Imagine the following query:
SELECT ... FROM document d JOIN document.attachment a;
We can define permissions on every entity that contains a foreign key to the resource table. Every entities can be found in a SQL statement behind FROM and JOIN parts may be authorized or target resource. In this case we can define permissions on documents and on their attachments. We define the following function:
Predicate createPermissionPredicate( Expression<Long> authorizedResourceIdExpr, Expression<Long> targetResourceIdExpr, String action);
The function generates the following predicate (in QueryDSL of course):
EXISTS(SELECT 1 FROM permission p WHERE p.authorized_resource_id IN (?, ...) AND p.target_resource_id = d.resource_id AND action IN (?, ...))
By using QueryDSL and the function above, it is easy to extend every queries with authorization restrictions.
This is more a concept for us at the moment. We realized that full-text searching together with permission checks cannot be implemented effectively if the index is outside of the database. We saw huge document management systems that did the following:
- Doing a full-text search with a technology like Lucene
- Checking permissions on each record one-by-one
- Showing twenty records on the user interface from the middle of the result set
And saw them die on requests that had larger result sets :).
Every modern DBMS has built in full-text search support:
- MySQL – Sphinx
- PostgreSQL – TSearch2
- MSSQL – Some Microsoft stuff 🙂
- Oracle – Oracle Text
All of these allow to have FTS as part of a complex SQL statement. We believe that by using the FTS and the authorization modules together the two logic can be migrated into one SQL statement easily. By doing that, only the number of final results have to be transported from the database.
After the developer starts to design projects with this concept, he/she will find many reusable parts. Some ideas: historical tables with data mining functions, audit records, currencies with exchange rates, geolocations, gtfs, …