Open Mainframe Project Summer Mentorship Series: Midterm Updates – At this midpoint, our selected mentees are reporting in. Below, you’ll learn what they’ve built, the challenges they’ve overcome, and their goals for the rest of the summer. We’re proud of every contribution and eager to see what comes next. Hear from A. Vijay Aditya, Indian Institute of Technology (BHU), Varanasi.
Mainframe systems power some of the most mission-critical workloads across industries like banking, insurance, and government. However, maintaining these legacy systems is becoming increasingly expensive due to aging technology, lack of skilled developers, and outdated programming languages. To tackle this, organizations are adopting AI-driven modernization pipelines that automate discovery, analysis, decision-making, and transformation of legacy applications.
This blog introduces a structured, multi-agent approach to automate mainframe modernization and optimize costs:
Mainframe Application Discovery and Analysis:
Before any modernization work can begin, organizations must first understand what exists inside the mainframe. This includes identifying COBOL programs, JCL scripts, datasets, copybooks, VSAM files, and more. This process, called discovery, is often manual and tedious. The Discovery Agent automates this step using existing mainframe utilities and APIs, extracting all relevant information for further analysis.
Agent 1 – Discovery Agent:
The Discovery Agent is the foundational AI component in a mainframe modernization pipeline. It automates the tedious, manual process of inventorying legacy applications by using native mainframe tools and secure data transfer methods.
1. Connect to Mainframe
- Use a terminal emulator like IBM PCOMM, x3270, or TN3270 to connect to the z/OS system via TSO (Time Sharing Option).
2. Use mainframe utilities to list datasets and discover applications:
Tool | Purpose | How it is used |
---|---|---|
IEHLIST | Lists PDS/PDSE members and Volume Table of Contents (VTOC) entries | Lists datasets and directories within partitioned datasets; helpful for examining the contents of COBOL sources, copybooks, etc. |
IDCAMS | Lists and manages VSAM datasets and system catalogs | Accesses and manages catalog entries and VSAM dataset structures; essential for identifying structured data schemas and transactional data. |
ISPF 3.4 | Manual dataset browsing via the TSO interface | Helpful during setup and validation; lets developers manually inspect dataset structures and filter dataset names interactively. |
z/OSMF REST APIs | Enables automated dataset discovery using REST endpoints | Allows programmatic remote access to dataset metadata and content; ideal for integrating into AI agents. |
This helps extract all the relevant datasets, such as:
- COBOL, PL/I, Assembler source code
- JCL scripts (e.g., SYS1.JCL.LIB)
- Copybooks (e.g., SYS1.CPYLIB)
- DB2/VSAM/IMS schemas (e.g., SYS1.DBDLIB)
- BMS screen definitions
- Security logs (SMF, SDSF)
- Transactional programs (CICS modules)
3. Methods to Extract the Data:
Method | Use Case |
---|---|
FTP | Transfers datasets from the mainframe to a PC or local VM over TCP/IP. Used in scripts and batch jobs. |
IND$FILE | Transfers files using 3270 terminal emulators like x3270. Ideal for quick file transfers from a TSO session. |
z/OSMF REST API | Provides secure RESTful access to datasets; supports both metadata browsing and file downloads. |
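As a rough sketch of the z/OSMF route, the Python snippet below lists datasets under a high-level qualifier and downloads one PDS member over the REST files API (the host, credentials, and dataset names are placeholders):

```python
import requests

ZOSMF = "https://zosmf.example.com"          # placeholder host
AUTH = ("TSOUSER", "password")               # placeholder credentials
HEADERS = {"X-CSRF-ZOSMF-HEADER": "true"}    # required by z/OSMF REST services

# List all datasets under a high-level qualifier
resp = requests.get(
    f"{ZOSMF}/zosmf/restfiles/ds",
    params={"dslevel": "SYS1.COBOL.*"},
    headers=HEADERS, auth=AUTH,
)
for item in resp.json().get("items", []):
    print(item["dsname"])

# Download the source of one PDS member as text
member = requests.get(
    f"{ZOSMF}/zosmf/restfiles/ds/SYS1.COBOL.SOURCE(PAYROLL1)",
    headers=HEADERS, auth=AUTH,
)
open("PAYROLL1.cbl", "w").write(member.text)
```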
4. Parsing & Preprocessing:
After extracting the raw datasets from the mainframe (using tools like FTP, IND$FILE, or z/OSMF REST APIs), the Discovery Agent performs parsing and preprocessing to make the data usable for AI-driven analysis.
4.1. Parsing COBOL, JCL, and Copybooks:
Use custom or open-source parsers (like COBOL-Parser, ANTLR COBOL grammar) to tokenize and analyze each COBOL file.
- Break the code into tokens (e.g., keywords, variable declarations, data definitions)
- Extract PERFORM, CALL, IF, MOVE, OPEN, and CLOSE statements
- Map out control flow and subroutine calls
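A production pipeline would use a real grammar (e.g., the ANTLR COBOL grammar), but a minimal regex-based sketch of this tokenizing step might look like the following (the file name and verb list are illustrative):

```python
import re
from collections import Counter

# Statements the Discovery Agent cares about when mapping control flow
VERBS = ("PERFORM", "CALL", "IF", "MOVE", "OPEN", "CLOSE")

def extract_statements(path):
    """Count key COBOL verbs and collect CALL targets from one source file."""
    counts, call_targets = Counter(), []
    for line in open(path, encoding="utf-8", errors="ignore"):
        code = line[6:72]  # columns 7-72 hold code in fixed-format COBOL
        for verb in VERBS:
            if re.search(rf"\b{verb}\b", code, re.IGNORECASE):
                counts[verb] += 1
        m = re.search(r"\bCALL\s+'([A-Z0-9-]+)'", code, re.IGNORECASE)
        if m:
            call_targets.append(m.group(1))
    return counts, call_targets

counts, calls = extract_statements("PAYROLL1.cbl")
print(counts, calls)
```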
4.2. Extracting Metadata:
- For JCL: Parse EXEC, DD, and JOB statements to understand job flow and data usage.
- For Copybooks: Identify common data definitions (01, 05 levels) reused across multiple programs.
- For VSAM: Extract key field definitions and dataset references.
4.3. Dependency Mapping:
- Match programs to the copybooks they include
- Identify batch jobs referencing VSAM/DB2 tables
- Create call graphs: which programs call which others
4.4. Parse SMF/SDSF Logs:
Understand application runtime behavior and costs.
- From SMF Type 30/70/110: Extract job execution time, CPU usage, number of runs, abends
- From SDSF logs: Pull job status (completed, abended), frequency, errors.
4.5. Generate Dependency Graphs
Map how programs and datasets are linked.
- Trace calls between COBOL programs (CALL) and JCL→Program→Dataset chains.
- Use tools like NetworkX (Python) to visualize and store graphs.
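A minimal NetworkX sketch of such a graph (the program, job, and dataset names are invented):

```python
import networkx as nx

G = nx.DiGraph()

# COBOL program-to-program calls (CALL statements)
G.add_edge("PAYROLL1", "CALCBONUS", type="CALL")
# JCL -> program -> dataset chains
G.add_edge("PAYJOB", "PAYROLL1", type="EXEC")
G.add_edge("PAYROLL1", "VSAM.SALARY", type="READS")

# Fan-in/fan-out feed straight into the complexity metrics below
print("fan-in of PAYROLL1:", G.in_degree("PAYROLL1"))
print("fan-out of PAYROLL1:", G.out_degree("PAYROLL1"))
nx.write_gexf(G, "dependencies.gexf")  # persist for the Analysis Agent
```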
4.6. Compute Technical Complexity:
Quantify how difficult modernization will be.
Metrics:
- Lines of Code (LOC)
- Cyclomatic complexity (number of branches)
- Copybook count
- Fan-in/Fan-out (program coupling)
4.7. Normalize and Package as JSON
Structure all parsed and enriched data into a machine-readable format for the Analysis Agent. Once parsing is complete, all extracted metadata is saved in a structured JSON format. This becomes the input to the Analysis Agent.
Fig. 1: Sample output schema for the Discovery Agent
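As a hypothetical illustration of that schema (every field name and value below is invented for illustration, echoing the metrics described above):

```json
{
  "program": "PAYROLL1",
  "language": "COBOL",
  "lines_of_code": 3840,
  "cyclomatic_complexity": 46,
  "copybooks": ["PAYREC", "EMPREC"],
  "database_calls": ["DB2.EMPLOYEES", "VSAM.SALARY"],
  "called_by": ["REPORTGEN", "HRMODULE"],
  "jcl_jobs": ["PAYJOB"],
  "execution_frequency": "daily",
  "last_modified": "2009-03-17"
}
```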
Fig. 2: Architecture of the Discovery Agent
Agent 2 – Analysis Agent:
The Analysis Agent consumes the parsed metadata from the Discovery Agent and performs intelligent scoring and ranking using LLMs, rule-based logic, and usage metrics.
Step 1: Cost Estimation Metrics
Extract these metrics from SMF logs, catalog metadata, and the parsed COBOL, PL/I, or Assembler files using static analysis or LLMs.
Metric | Source | What it indicates |
---|---|---|
MIPS usage | SMF Type 30/70 logs | CPU cost per program; higher MIPS = more expensive to run. |
Execution frequency | SMF logs / JCL job triggers | How often the program runs; more frequent = higher total cost. |
Dataset size | IDCAMS / z/OSMF metadata | Disk/storage cost; larger datasets = more storage cost. |
Aging components | last_modified from code parsing | Flags outdated programs; very old code is riskier and costlier to maintain. Add an aging_penalty if the code is old (say, more than 5 years). |
Cost Score Formula: cost_score = mips_usage × frequency_multiplier + (dataset_size_mb / 100) + aging_penalty
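A small Python sketch of this formula (the frequency multipliers and the aging penalty value are assumptions; the formula itself fixes only the overall shape):

```python
# Assumed multipliers: how much a run cadence amplifies CPU cost
FREQUENCY_MULTIPLIER = {"daily": 30, "weekly": 4, "monthly": 1}

def cost_score(mips_usage, frequency, dataset_size_mb, years_since_modified):
    """Cost Score = MIPS x frequency multiplier + storage term + aging penalty."""
    aging_penalty = 10 if years_since_modified > 5 else 0  # assumed penalty
    return (mips_usage * FREQUENCY_MULTIPLIER.get(frequency, 1)
            + dataset_size_mb / 100
            + aging_penalty)

print(cost_score(mips_usage=2.5, frequency="daily",
                 dataset_size_mb=480, years_since_modified=14))
# 2.5*30 + 4.8 + 10 = 89.8
```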
Step 2 : Business Value Estimation
- Identify programs involved in mission-critical business logic
- Flag programs that handle sensitive data
- Determine usage relevance based on dependencies and frequency
- Give a numerical business value score (0 to 10)
What the Agent Looks At:
1. Business Rule Mining from COBOL
- Use LLMs (e.g., Code LLaMA, Mistral) with prompts such as:
“Explain the business purpose of this code:
IF EMPLOYEE.TYPE = 'P' THEN CALCULATE_BONUS”
- This reveals:
  - Whether the logic belongs to a key domain such as Payroll, Insurance, or LoanApproval.
  - The intent of the program, even if the naming is ambiguous.
Example Insight:
IF EMPLOYEE.TYPE = 'P' THEN CALCULATE_BONUS
(means the app handles payroll bonuses → high business value)
2. Sensitive Dataset Detection:
In mainframe modernization, one of the key factors that influences whether an application should be prioritized for modernization is how sensitive the data it accesses is. Applications that handle personally identifiable information (PII) or financial data are generally considered business-critical, and their modernization needs to be handled carefully, securely, and often urgently.
For example, if the Discovery Agent output includes fields like (in JSON):
"database_calls": ["DB2.EMPLOYEES", "VSAM.SALARY"]
- You can infer that it accesses salary or PII data.
- We mark such applications as having high business sensitivity.
3. Dependency Check / Program Fan-In
Count how many other programs call this one:
"called_by": ["REPORTGEN", "HRMODULE"]
- More incoming calls = higher business utility.
4. Usage Frequency
Already captured via:
"execution_frequency": "daily"
- Daily programs typically = operational backbone.
Business Value Scoring Formula
score = w1 * is_critical_rule + w2 * is_sensitive_data +
w3 * frequency_multiplier + w4 * dependency_count
You can normalize this to a 0–10 scale per application.
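A hedged sketch of that scoring in Python (the weights w1–w4 and the frequency mapping are assumed values, not given by the formula):

```python
# Assumed weights; tune them per portfolio
W1, W2, W3, W4 = 4.0, 3.0, 0.2, 0.5

FREQ = {"daily": 10, "weekly": 5, "monthly": 2}  # assumed mapping

def business_value(is_critical_rule, is_sensitive_data, frequency, dependency_count):
    raw = (W1 * is_critical_rule        # 1 if the LLM flags a key business rule
           + W2 * is_sensitive_data     # 1 if PII/financial datasets are touched
           + W3 * FREQ.get(frequency, 1)
           + W4 * dependency_count)     # fan-in from the call graph
    return min(raw, 10.0)               # clamp to the 0-10 scale

print(business_value(1, 1, "daily", 2))  # 4 + 3 + 2 + 1 = 10.0
```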
Fig. 3: Sample output schema for the Analysis Agent
Step 3: Complexity Scoring – Step-by-Step
Goal:
Assign a complexity score (0–10) to each application based on its structure and technical depth. This score influences whether the app should be:
- Rehosted (low complexity)
- Refactored (moderate complexity)
- Rewritten (very high complexity)
Formula:
Complexity Score = w1 * LOC + w2 * Cyclomatic + w3 * Copybooks + w4 * Fan-Out
Where:
- w1, w2, w3, w4 are weights that control the importance of each metric (e.g., w1 = 0.25, w2 = 0.4, w3 = 0.2, w4 = 0.15)
- The raw values are normalized (to 0–1, per the thresholds below) before applying weights; the weighted sum is then scaled to 0–10
Example Thresholds for Normalization:
Metric | Normalization Logic |
---|---|
Lines of Code (LOC) | LOC / 10,000 (capped at 10) |
Cyclomatic Complexity | complexity / 50 (capped at 10) |
Copybook Count | count / 10 (capped at 10) |
Fan-Out / Call Depth | depth / 5 (capped at 10) |
Example Calculation
Content of the program.json file:
{
  "lines_of_code": 3840,
  "cyclomatic_complexity": 46,
  "copybook_count": 4,
  "fan_out": 3
}
Normalize:
- LOC: 3840 / 10000 ≈ 0.384
- Cyclomatic: 46 / 50 ≈ 0.92
- Copybooks: 4 / 10 = 0.4
- Fan-out: 3 / 5 = 0.6
Apply weights (w1=0.25, w2=0.4, w3=0.2, w4=0.15):
Complexity Score = 0.25 * 0.384 + 0.4 * 0.92 + 0.2 * 0.4 + 0.15 * 0.6
= 0.096 + 0.368 + 0.08 + 0.09 = 0.634 → Scaled to 6.34 / 10
So the app has moderate-to-high complexity → it may be tagged as "Refactor".
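A short Python sketch that reproduces this calculation, with the caps and weights taken from the thresholds above:

```python
import json

WEIGHTS = {"loc": 0.25, "cyclomatic": 0.4, "copybooks": 0.2, "fan_out": 0.15}
CAPS = {"loc": 10_000, "cyclomatic": 50, "copybooks": 10, "fan_out": 5}

def complexity_score(program):
    """Weighted sum of normalized metrics, scaled to 0-10."""
    norm = {
        "loc": min(program["lines_of_code"] / CAPS["loc"], 1.0),
        "cyclomatic": min(program["cyclomatic_complexity"] / CAPS["cyclomatic"], 1.0),
        "copybooks": min(program["copybook_count"] / CAPS["copybooks"], 1.0),
        "fan_out": min(program["fan_out"] / CAPS["fan_out"], 1.0),
    }
    return 10 * sum(WEIGHTS[k] * norm[k] for k in WEIGHTS)

program = json.load(open("program.json"))
print(round(complexity_score(program), 2))  # 6.34 for the example above
```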
Output:
Fig. 4: Sample output schema
Step 4 : Application Ranking & Tagging
Goal:
The Analysis Agent ranks and tags each application into one of four modernization categories:
- Modernize Now
- Refactor
- Rehost
- Retire
This classification simplifies decision-making and enables downstream agents to focus only on the most important applications.
Tagging Logic:
Each app is scored on a combination of:
- Business Value (0–10)
- Complexity Score (0–10)
- Cost Estimate (numeric or scaled)
- Execution Frequency (e.g., daily, monthly)
Then it is tagged using business rules like:
Condition | Tag |
---|---|
High value (>7), high cost, frequent use, moderate complexity | Modernize Now |
High value, high complexity (>6), moderate cost | Refactor |
Moderate value (3–7), low complexity, high cost | Rehost |
Low value (≤2), low frequency, old last-modified date, low complexity | Retire |
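As a sketch, these rules might be coded as follows (the cut-offs for "high cost" and "frequent use" are assumptions; the table fixes only the value and complexity thresholds):

```python
def tag_application(value, complexity, cost, frequency):
    """Map scores to a modernization tag, mirroring the rules above."""
    frequent = frequency in ("daily", "weekly")   # assumed definition
    high_cost = cost > 50                          # assumed threshold
    if value > 7 and high_cost and frequent and complexity <= 6:
        return "Modernize Now"
    if value > 7 and complexity > 6:
        return "Refactor"
    if 3 <= value <= 7 and complexity <= 3 and high_cost:
        return "Rehost"
    if value <= 2 and not frequent and complexity <= 3:
        return "Retire"
    return "Review manually"                       # fallback, not in the table

print(tag_application(value=8.2, complexity=6.3, cost=89.8, frequency="daily"))
# -> "Refactor"
```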
Final Output Schema:
Fig. 5: Sample output schema
For another program:
Fig. 6: Sample output schema
Fig. 7: Architecture of the Analysis Agent
Agent 3 – Roadmap Generator
Goal:
To generate custom modernization roadmaps for only those applications that are:
- Costly
- Complex
- Business-critical
Step 1: Filter Only Modernize-Worthy Apps
Apply filters like:
if modernization_score > 7.5 and business_value > 8.0:
    shortlist_for_roadmap = True
Step 2: Group Apps by Recommendation
Create separate treatment paths for:
Recommendation | Strategy |
---|---|
refactor | Code transformation with logic retention |
rehost | Lift-and-shift to cloud or containers |
rewrite | Full redesign using a modern stack |
retire | Document and decommission safely |
Step 3: Fetch Source Artifacts
- Pull relevant code artifacts (COBOL, JCL, copybooks)
- Access associated database schemas, e.g.:
- DB2 DDL scripts
- VSAM definitions
- IMS hierarchical schemas
These can be extracted from earlier Discovery Agent stages and stored in a local folder or vector store.
Step 4: Load into Vector DB (for RAG)
Use a vector database (e.g., FAISS, ChromaDB) to store:
- Code snippets (functions, modules)
- Business rules (from COBOL)
- JCL job steps
- Database structures (table schemas, keys, constraints)
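A minimal ChromaDB sketch of this indexing step (the collection name, documents, and metadata are invented for illustration):

```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("mainframe_assets")

# Index code snippets and schema fragments extracted by the Discovery Agent
collection.add(
    ids=["payroll1-rule1", "payroll1-ddl"],
    documents=[
        "IF EMPLOYEE.TYPE = 'P' THEN CALCULATE_BONUS",
        "CREATE TABLE DB2.EMPLOYEES (EMP_ID INT PRIMARY KEY, TYPE CHAR(1))",
    ],
    metadatas=[
        {"program": "PAYROLL001", "kind": "business_rule"},
        {"program": "PAYROLL001", "kind": "db2_schema"},
    ],
)

# At prompt time, retrieve the most relevant context for the LLM
hits = collection.query(query_texts=["payroll bonus calculation"], n_results=2)
print(hits["documents"])
```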
Step 5: Prompt the LLM via RAG
The prompt now includes:
You are a COBOL modernization expert.
Here is the parsed code of PAYROLL001 and its related DB2 schema.
This app is marked for refactor (Score: 9.1).
Based on the dependency depth, MIPS cost, and code structure, generate a phased roadmap to refactor this application safely.
RAG will enrich the prompt with:
- Extracted business rules (IF EMP.TYPE = 'P' THEN …)
- Schema relationships (foreign keys, VSAM file mappings)
- JCL job dependencies
Step 6: Output a Detailed Roadmap (in JSON / Markdown)
Fig. 8: Architecture of the Roadmap Generator Agent
Agent 4 – Modernization Executor
Goal
Automate the transformation, code generation, and testing process using the roadmap + access to actual source code and databases.
1. From Discovery Agent Outputs
All the code assets were already extracted by the Discovery Agent using tools like:
Source | Example Output | Extraction Method |
---|---|---|
COBOL programs | SYS1.COBOL.SOURCE | via FTP / IND$FILE / z/OSMF |
JCL scripts | SYS1.JCL.LIB | via FTP or IND$FILE |
Copybooks | SYS1.CPYLIB | via FTP or IND$FILE |
DB schemas | SYS1.DBDLIB, VSAM mappings | via IDCAMS, parsed manually |
These files are now stored on your PC or cloud VM, usually in:
- A local directory structure: ./discovered_assets/COBOL/, ./discovered_assets/JCL/
- OR a metadata database (like SQLite/NoSQL)
- OR a vector store (like FAISS/ChromaDB for RAG)
2. Code Understanding via LLM (Contextual RAG):
Feed both roadmap + code context into the LLM using a Retrieval-Augmented Generation (RAG) approach.
Use a hybrid prompt, for example:
"Based on the roadmap for PAYROLL001, refactor the COBOL logic to a Java microservice using REST architecture. Here's the original code + copybooks."
LLM reads:
- Legacy COBOL logic (with copybooks resolved)
- Data source references (VSAM, DB2)
- JCL entry point for batch or transactional execution
This step helps the model understand what transformation is required.
3. Code Generation
LLM generates:
- Java/Python/C# source code
- API controllers (if modernized into microservices)
- Entity classes or DTOs mapped from copybook fields
- DB Access layer (e.g., using JPA/Hibernate for DB2)
4. Auto-Test Creation
For each generated module:
- Generate test cases (e.g., JUnit, PyTest)
- Include edge-case tests (e.g., null checks, max length fields)
- Simulate integration with dummy DB or mock endpoints
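As an illustration, a generated PyTest module for a hypothetical calculate_bonus function might look like this (the function and its rules are invented stand-ins for generated code):

```python
import pytest

# Stand-in for the generated module; in the pipeline this would be
# `from payroll_service import calculate_bonus` (hypothetical name).
def calculate_bonus(employee_type, salary):
    if salary is None:
        raise ValueError("salary is required")
    return salary * 0.10 if employee_type == "P" else 0

def test_permanent_employee_gets_bonus():
    # Mirrors the legacy rule: IF EMPLOYEE.TYPE = 'P' THEN CALCULATE_BONUS
    assert calculate_bonus("P", 50_000) > 0

def test_contractor_gets_no_bonus():
    assert calculate_bonus("C", 50_000) == 0

def test_null_salary_rejected():
    # Edge case: a null check the legacy COBOL handled implicitly
    with pytest.raises(ValueError):
        calculate_bonus("P", None)
```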
5. Run & Validate in Sandbox
Use a CI/CD runner (e.g., GitHub Actions + Docker) to:
- Compile code
- Run unit tests
- Log results
6. Failure Handling and Feedback Loop
If a test fails:
- Capture the test logs
- Prompt the LLM, for example:
"Fix this error in method calculateBonus(): java.lang.NullPointerException"
- The LLM refactors and re-tests until the test passes.
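A hedged sketch of that loop (llm_fix is a placeholder for whatever LLM call the pipeline uses; the test command and retry cap are assumptions):

```python
import subprocess

def llm_fix(source: str, error_log: str) -> str:
    """Placeholder: send code + failing log to an LLM, return patched code."""
    raise NotImplementedError  # hypothetical; wire up your LLM client here

def repair_loop(path="payroll_service.py", max_attempts=5):
    for attempt in range(max_attempts):
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True                      # all tests pass; stop iterating
        source = open(path).read()
        patched = llm_fix(source, result.stdout + result.stderr)
        open(path, "w").write(patched)       # re-test the patched code next loop
    return False                             # escalate to a human after N tries
```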
Technologies Used:
- LLM: Claude, GPT-4, StarCoder, DeepSeek-R1
- Code interpreter: OpenAI Python tool / Codeium
- Vector DB: FAISS / Chroma
- CI/CD: GitHub Actions, Docker, PyTest/JUnit
- LangChain: For chaining prompts and transformation logic
- CrewAI / LangGraph: For orchestrating code generation, testing, and debugging agents
- Dust / AutoGen / SemanticRouter: For task routing and LLM-test loop feedback
- GitHub Copilot / VS Code API: For embedding suggestions and tests directly in IDE
Conclusion:
During the first half of this project, I focused on understanding the fundamental architecture and operational workflow of mainframe systems. I extensively studied how legacy applications are structured (e.g., COBOL, JCL, VSAM), how they interact with datasets and transaction managers like CICS, and the types of metadata and logs available within the mainframe environment.
Building on this understanding, I designed a multi-agent architecture for automating mainframe modernization using AI. This architecture includes four AI agents: Discovery, Analysis, Roadmap Generation, and Modernization Execution—each targeting a specific phase of the modernization lifecycle. These designs were inspired by a combination of existing mainframe documentation and modern AI-driven automation strategies, especially leveraging LLMs, RAG pipelines, and vector databases for intelligent transformation.
So far, my work has been conceptual and architectural. The second phase of the project will focus on implementing and validating this architecture through prototypes, automation workflows, and proof-of-concept testing.
The goal is to move from theoretical design to a demonstrable system that automates code understanding, transformation, and testing—ultimately optimizing mainframe costs through targeted modernization.
References:
https://www.geeksforgeeks.org/computer-networks/file-transfer-protocol-ftp-in-application-layer/
https://docs.zowe.org/stable/web_help/index.html?p=zowe_zos-files_download_data-set
https://www.geeksforgeeks.org/dbms/job-control-language-jcl-utilities/
https://www.mainframestechhelp.com/utilities/idcams/listcat.htm
https://www.ibm.com/docs/en/zos/2.4.0?topic=guide-using-zosmf-rest-services
https://www.ibm.com/docs/en/zos/3.1.0?topic=3-data-set-list-utility-option-34
https://wearecommunity.io/communities/india-devtestsecops-community/articles/1243