FAQ

What is the format of the configuration file for an AWS Blu Age Compare Tool project?

The JSON format. A Compare Tool project file uses the .bacmp extension and contains JSON.

Which operating system (OS) should the AWS Blu Age Compare Tool be run on?

Linux and Windows

Can I mix Binary and FlatFile modes in one .bacmp file to compare EBCDIC vs ASCII data?

No. You cannot compare two files that do not share the same name and format.
Both the Binary and FlatFile comparison modes of the Blu Age Compare Tool have the following limitations:

  • Matching filenames required: the tool expects identical filenames in the left and right folders. It cannot compare files with different names (e.g., file.ebc vs file.asc).
  • Single configuration per file: the tool applies the same processing configuration to both sides of the comparison, so it cannot handle two encoding formats simultaneously.
  • No cross-format support: when different file extensions are configured, the tool searches for each filename in both folders, resulting in file-not-found errors.
 
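A common workaround, outside the tool itself, is to normalize both sides to a single encoding and filename before running the comparison. Below is a minimal sketch using the standard iconv utility; the filenames and the IBM-1047 code page are illustrative assumptions, not tool requirements:

```shell
# Sketch: normalize encoding and filenames before running the Compare Tool.
# All paths below are placeholders; IBM-1047 is one common EBCDIC code page.
mkdir -p left right

# Fabricate a small EBCDIC sample purely for illustration.
printf 'HELLO WORLD\n' | iconv -f ASCII -t IBM-1047 > file.ebc
printf 'HELLO WORLD\n' > right/file.dat

# Convert the EBCDIC side to ASCII and give both sides the same filename,
# so the tool finds a matching file in each folder.
iconv -f IBM-1047 -t ASCII file.ebc > left/file.dat
```

After this staging step, both folders contain file.dat in the same encoding, so a single FlatFile configuration applies to both sides.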

My browser froze while viewing a comparison report that contains VB records with large arrays. What should I do?

Enable variable-length (VB) column aggregation in your configuration file by adding these two parameters:     
 

"isVBColumnsToAggregate": true,
"minVBOccursToAggregate": 100


This will aggregate VB fields with more than 100 occurrences into single formatted columns with lazy loading, preventing browser freezing. The HTML report will initially show 20 array elements with an "Expand" button to load more on demand.     
If you still experience performance issues, lower the minVBOccursToAggregate value to 50 or 25 to aggregate even smaller arrays. If your VB fields have fewer occurrences and don't need aggregation, set isVBColumnsToAggregate to false.     
Note: this option is currently available only for Binary file comparison.

When should I use the Histogram algorithm instead of the Myers diff algorithm?

Use the Histogram algorithm for better performance with large files and with file splitting:

{
 "compareAlgo": "Histogram",
 "splitLines": 100000,
 "splitSize": 500
}

Use Histogram when:

  • File size > 1GB
  • Using splitLines or splitSize options
  • Processing high-volume comparisons
  • Memory-constrained environments

Performance: Histogram is significantly faster than Myers for large files and handles split processing more efficiently.

How do I capture heap dumps and handle memory issues when the tool exits with an OutOfMemoryError or runs for days?

Configure JVM for memory debugging:

java -Xmx32g -Xms32g \
    -XX:+UseG1GC \
    -XX:+HeapDumpOnOutOfMemoryError \
    -XX:HeapDumpPath=/tmp/heap_dump.hprof \
    -jar CompareTool.jar config.bacmp

Prevent memory issues:

{
 "compareAlgo": "Histogram",
 "splitLines": 50000,
 "splitSize": 100,
 "maximumDifferences": 1000000,
 "comparisonDataStorage": "OnDisk"
}

Memory requirements:

  • Small files (<1GB): 8GB heap
  • Medium files (1-5GB): 32GB heap
  • Large files (>5GB): 48GB heap

If an OutOfMemoryError still occurs: reduce the splitLines and splitSize values, increase the heap size, or switch to the OnDisk storage mode.
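For the long-running case, a heap dump can also be requested from a live process without waiting for an OutOfMemoryError, using the standard JDK jcmd utility. The process-matching pattern below is an assumption based on the launch command shown above:

```shell
# Sketch: capture a heap dump from an already-running Compare Tool JVM.
# Requires a JDK (jcmd ships with the JDK, not the JRE).
PID=$(jcmd -l 2>/dev/null | awk '/CompareTool\.jar/ {print $1; exit}')
if [ -n "$PID" ]; then
    # Ask the live JVM to write a heap dump; the output path is a placeholder.
    jcmd "$PID" GC.heap_dump /tmp/heap_dump_live.hprof
else
    echo "no running CompareTool JVM found"
fi
```

The resulting .hprof file can be analyzed with the same tools (for example, Eclipse Memory Analyzer) as a dump produced by -XX:+HeapDumpOnOutOfMemoryError.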

Why do my SORT output files differ between the mainframe and the modernized application when the data content is identical?

This is a common observation across modernization projects and is caused by non-deterministic sort behavior in the IBM SORT program.

Root cause:

According to the official IBM documentation, when the EQUALS keyword is not specified in the SORT control card, the order of records with equal keys is not guaranteed. This means two successive runs of the same SORT - even on the same mainframe - may produce different orderings for records sharing the same key values. This non-determinism naturally carries over when comparing mainframe output against the modernized application's output.

How to handle this in the Compare Tool:

Use a business key that corresponds to the entire record (i.e., all fields concatenated) as the sort key for comparison. This approach ensures that:

  1. Both files (mainframe and modernized) are sorted in a fully deterministic order - since the key covers the complete record, no two records can have equal keys unless they are truly identical;
  2. The Compare Tool can then verify line-by-line that the content is indeed identical.

Recommended practice:

  • Before running a comparison, re-sort both output files using the full record as the sort key;
  • If your original SORT control card uses a partial key, do not rely on the relative order of equal-key records for validation purposes;
  • If preserving the original sort order matters for your business logic, consider adding the EQUALS keyword to your mainframe SORT control card to enforce stable sorting, then replicate the same behavior in the modernized application.
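The re-sort step above can be sketched with the standard sort utility; with no key option, sort(1) keys on the entire line, which for line-oriented output is exactly the full-record key described above. The filenames and sample records are placeholders:

```shell
# Sketch: make both outputs deterministic before feeding them to the Compare Tool.
# Fabricated sample: equal-key records ('KEY1') in different relative orders,
# as a mainframe SORT without EQUALS may legitimately produce.
printf 'KEY1 REC-A\nKEY1 REC-B\nKEY2 REC-C\n' > mainframe.txt
printf 'KEY1 REC-B\nKEY2 REC-C\nKEY1 REC-A\n' > modernized.txt

# No -k option: the whole line (the full record) is the sort key.
sort -o mainframe_sorted.txt  mainframe.txt
sort -o modernized_sorted.txt modernized.txt

# The re-sorted files are now byte-identical.
cmp mainframe_sorted.txt modernized_sorted.txt && echo 'identical'
```

Point the Compare Tool at the *_sorted.txt files instead of the raw SORT outputs.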