Logging practices on running solver processes in Java

We implemented a java computer program that solves a certain mathematical problem. The program runs for a long time and produces a textual representation of the best solution found.

It was the opportunity to discover the importance of adopting two usually unnoticed components: the Simple Logging Facade for Java (SLF4J) and the Logback logging framework. By employing both components a step further than they were originally designed, we increased our coding, debugging, testing and validation productivity in means far away than expected for traditional logging. All benefits were nearly for free and required no additional coding.

Six important principles

We were looking for following important six principles to aid us while developing our computer program:

  1. Produce friendly human readable messages when run manually from the console, but not when integrated to the enterprise application.
  2. During test phase, optionally print relevant data and decisions when debugging or validating a certain module.
  3. Print progress messages that might be easily parsed by the enterprise application.
  4. Measure the execution time of relevant operations and print the results to be easily parsed by a profiling tool.
  5. At important checkpoints, print a data dump that allows restarting the calculation from this point on. Such data dump saves us from starting a prolonged calculation from scratch just to debug or test a late phase of the calculation.
  6. At the end of the execution, print one or more representations of the solution.

All these principles imposed great risks: polluting the program with functionality unrelated to its main purpose, increasing code complexity and increasing effort to maintain the additional functionality  However, the next topic shows that those risks were mitigated to nearly zero.

A clever logging configuration

We were pleased to discover that SLF4J and Logback endowed us with all principles nearly for free. By a clever choice of the logger hierarchy, and a proper Logback configuration, one may adjust the behavior of each principle according to each context the program run in. Further components from Logback Classic allow one to redirect data to virtually anywhere (as stdout, stderr, file, rotating files, sockets) by properly setting appenders and formatters.

Form our experience, we would recommend that, for each module of the java program, there should be a base logger and a child logger for each principle. For example, a module “A” might define loggers as described below:

  • Root logger:
    • When running the program manually, shall accept all messages. The appender prints all messages to stdout. We prefer adjusting loggers for each module or component individually to turn off messages that are not required.
    • For integration, we recommend a rolling file appender. Or a file that is unique for each execution. In a more advanced scenario, the calling application created the java program as a POSIX process. To follow the POSIX convention, the Logback configuration should write a copy of error messages to the stderr channel, too. This allows the calling process to parse the stderr channel for error messages and meta-data, and to provide adequate feedback to the user.
  • Logger A: The base logger associated to the module. As usual, this logger one may resort to this logger to print local data and decisions with trace or debug level, which relates to principle 2. Or to print warn, errormessages and exceptions. This logger also inherits messages form all child loggers, if they are not otherwise configured.
    • When running the program manually to test or validate this module, the logger is set to accept trace or debug level. A dedicated appender (with no additivity) might filter out messages to a separated file to prevent polluting the root logger with trace and debug messages.
    • During manual execution without testing or validation, this logger is set to accept info level. The root appender is inherited.
    • For integration, this logger is set to accept only warn and error levels. The root appender is inherited.
  • Logger A.h: Instead of printing human readable messages to stdout, the program shall use a special logger for this purpose, normally with infolevel. Relates to principle 1.
    • When running the application manually, a dedicated appender (with no additivity) filters out messages from this logger to the console (stdout).
    • For integration, this logger is just turned off to prevent polluting the parent logger with user messages.
  • Logger A.p: Used only to print progress messages, normally with infolevel. Relates to principle 3.
    • When running the application manually, a dedicated appender (with no additivity) filters out messages from this logger to the console (stdout).
    • For integration, assuming the calling application created the java program as a POSIX process, a dedicated appender (with no additivity) may filter out messages from this logger to the stdout channel, using a format that may be easily recognized and parsed. The calling application observes the stdout channel for progress messages and provides proper feedback to the user, as updating a progress bar.
  • Logger A.t: Used only to print execution time measures. Relates to principle 4.
    • When running the application manually for validation, a dedicated appender (with no additivity) filters out messages from this logger into a file to be analyzed later by a profiling tool.
    • Else, this logger might just be turned off or ignored to inherit the parent logger.
  • Loggers A.c.1, A.c.2, A.c.3…: Our convention establishes that each logger is used only once to output a set of data that constitutes a checkpoint of the module. Relates to principle 5. For our application, a complete XML document was written as only one log message.
    • When running the application manually, a dedicated appender (with no additivity)  may filter out messages from this logger into a file if we are interested in creating checkpoints.
    • Else, this logger might just be turned off. The parent logger A.C groups the loggers together, to easy turning them all off at once.
  • Loggers A.r.1, A.r.2, A.r.3: Similarly to the previous item, our convention establishes that each logger is used only once to output a set of data that constitutes the calculation result. Relates to principle 6. For our application, a complete XML document was written as only one log message.
    • When running the application manually, a dedicated appender (with no additivity)  may filter out messages from this logger into a file. There may be several loggers to print the same data, one for each supported representation (eg. html, text, xml).
    • For integration, assuming that the program was called as a POSIX process, the appender might write to stdout to be captured by the calling application.  Or write to a file that is read later by the calling application.

Further discussion

Extensive logging was one of the techniques applied to assure algorithmic correctness as early as possible (of course, combined with detailed unit testing and with assertions checking for invariants). Manual debugging would be nearly impracticable due to the complexity of data and the long execution time.

All output data and messages are leveraged through SLF4J, to allow arbitrary configuration of its behavior with Logback. No message is ever written directly to stdout or stderr. No data is ever written directly to a file or any other device.

In order to write checkpoints or result data, additional care must be taken. Data will be output as structured text (html or xml) or even as a binary stream. The formatter must not insert additional characters and write the message exactly as generated by the program.

Conclusion

We have shown how to take advantage of SLF4J and Logback to incorporate six important principles into our mathematical solver program. Those principles were used to increase productivity and to save time while debugging, testing and validating the application. Further, no additional code had to be written to support those important principles, but at the expend of a slightly more complex Logback configuration with several dedicated appenders.

SymmetricDS syncronizing two tables with different names

While working with our newly discovered SymmetricDS tool, I realized that their manual does not clearly state how to synchronize two databases if their table names do not match. That might happen while matching tables between databases of two different companies, each one with its own table name policies. Blame those Database Administrators!

Here is how I accomplished synchronized two tables with different names:

First, I did the usual configuration: I defined two node groups A and B, defined a link between those node groups, defined a channel and a router for each direction (A to B  and B to A).

The description of triggers requires more configuration effort: Since the table names are different on each database, and since each trigger definition is bound to a table name, I had to define one trigger for each table name. Here, I suppose to databases: A and B, where the equivalent tables are called TableA and TableB.

insert into sym_trigger
   (trigger_id,source_table_name,channel_id,last_update_time,create_time)
values
   ('TriggerA','TableA','channel',current_timestamp,current_timestamp),
('TriggerB','TableB','channel',current_timestamp,current_timestamp);

As usual, the trigger needs to be associated with a router. Additional care is required: TriggerA is applicable only on database A and the reported changes are meaningful to be routed to database A to B. TriggerB is handled similarly.

insert into sym_trigger_router
   (trigger_id, router_id, initial_load_order, create_time, last_update_time)
values
   ('TriggerA', 'routerAtoB', 1, current_timestamp, current_timestamp),
('TriggerB', 'routerBtoA', 1, current_timestamp, current_timestamp);

Then, a table transformation must be defined: From node A to node B, table is to renamed from TableA to TableB. And from node B to node A, table is to renamed from TableB to TableA. The 'IMPLED' column policy means that all columns are to be copied without changes. I assume that both tables have the same column names and types. The delete action is ‘DEL_ROW’ and is not the main subject of this article. I prefered to run the transformation during the ‘Load’ transformation point.

insert into sym_transform_table
   (transform_id,source_node_group_id,target_node_group_id,transform_point,source_table_name,target_table_name,delete_action,column_policy)
values
   ('TransfAtoB', 'nodeA', 'nodeB', 'Load', 'TableA', 'TableB', 'DEL_ROW', 'IMPLIED'),
('TransfBtoA', 'nodeB', 'nodeA', 'Load', 'TableB', 'TableA', 'DEL_ROW', 'IMPLIED');

How SymmetricDS successfully replaced our complicated client-server architecture

With SymmetricDS , we avoided investing effort in designing, implementing and testing a complex, asynchronous communication protocol over web-service layer. Instead, we achieved a simple, intuitive interface that allowed us to deliver out solution within few days.

Our application started with a local standalone server that processes long running transactions. It naturally evolved to a distributed architecture. While benefits of the new architecture were clear to us and our customers, an issue arose : we would force our customers to enable their applications with boring and non intuitive web-services to communicate with our transaction server.

In short, the following requirements were considered:

  • The customer application creates data to be queued to our transaction server. Those data must be somehow uploaded to their respective transaction server instance that is running the queue.
  • Furthermore, if the customer updates data that was already sent to the server instance, then these changes must be reflected on the server, too (for shortness, I will omit details, but the transaction server handles transactions with input data that changes over time).
  • Finally, as soon as the transaction server instance concludes the transaction, the resulting status must be returned to the costumer application.

Since the transactions are long running, the requirements would probably resort to a complex, asynchronous communication protocol over web-service layer.

Motivated by the SymmetricDS features, we decided to adopt conventional database tables as communication a interface for our transaction server. The customer application is expected to insert and update data that shall be processed by the transaction server. The same table records the transaction status as soon as the transaction is complete. The transaction server contains identical tables as a queue of pending and completed tasks. SymmetricDS  does all the work of running the required web-services between customer and the transaction server to keep the tables consistent and synchronized.

I confess that such solution is really less fascinating than popular web-service/rest architectures. But the solutions was immediately understood by junior software analysts and inexperienced programmers. And allowed us to deliver the solution about 10 times faster than expected.

Eclipse Subversive does not frustrate me anymore… Now I use EGit!

A considerably time ago, I wrote down my frustration on using the Eclipse Subversive Plugin. At that time, I was annoyed just because refactoring did not work properly. Now, Subversive has gone worse.

SVN itself does a good job tracking history of moved and copied files, even if directories are moved or renamed. By 2009, the Subversive plugin stopped supporting this features and a simple directory rearrangements may now lead your local copy to an irreversible inconsistent state.

Since then, people were tired of gathering Subversion plugins and dependencies over several repositories. But why using Eclipse central repository, if you can reinvent the default install procedure? Subversion is now able to add automatically the repositories and gather dependencies for you. The hack, however, works only once. If you are required to reinstall Subversion, then it might be easier to reinstall whole Eclipse.

Subversion introduced potentially useful keyboard shortcuts that never worked. And staying for weeks on the same workspace is not recommended: label decorators will eventually disappear until you reset all your preferences in your workspace. People used to carry their ‘portable’ workspace with them will notice that suddenly Subversion suddenly stops recognizing SVN projects checked out by it.

Meanwhile, there has been increasing network concerns. Subversion does not follow proxy configuration from Eclipse preferences. Recently, connections over SSH started to fail randomly, leaving incomplete and inconsistent checkouts.

Faced by all that many ‘challenges’, I found myself wondering about the benefits to continue with Subversive. A trial with with EGit (Eclipse Git Plugin) was smooth and pleasurable. I do not have to tell that I moved all my repositories to Git.

I still like SVN, although I already consider it an ‘aged’ technology. There are many nice tools that work well with SVN. However, I cannot recommend Subversive anymore if you want to make good use of Eclipse+Java+SVN . Instead, Eclipse+Java+Git are an appealing trade off for those looking for simplicity, power and productivity. Just install the EGit plugin.

Firefox memory footprint

Firefox is releasing one new version after the other. And each new version, Mozilla claims to have enhanced memory usage to reduce the amount of resource required to load each page. Is that a valid claim?

As a curious computer engineer, I decided to install several legacy Firefox versions and compare the amount of memory required to load two greedy pages: http://www.gmail.com and http://www.spiegel.de.

Here are the results.

One observes that at Firefox 3.6, reading emails from gmail suddenly doubled memory usage. And from Firefox 4.0 on, it requires the triple of memory. I suspect that gmail also loads more content on newer versions since Firefox enhanced javascript support.

But a similar behavior also happened for http://www.spiegel.de news journal, although not as intense as for gmail: memory has grown nearly 3 times!

Although the results are no scientific measures, I believe that they show the greedy memory consumption tendencies from Firefox, and that probably Mozilla was not really successful to “enhance memory usage”.

ILOG OPL syntax highlighting for gEdit

This article offers the syntax highlighting definition file for gEdit 2.16.
Read more of this post

ILOG OPL cannot distinguish apples from oranges

I used to think about OPL that promotes obligatorily mathematical rigor as one would expect from a mathematical modeling language. I was misunderstood: it seems that OPL deludes itself when comparing apples and oranges. OPL still requires enhancements that address serious idiosyncrasies how the model is handled.

Read more of this post

Follow

Get every new post delivered to your Inbox.