Open Mainframe Project Summer Mentorship Series: Midterm Updates – At this midpoint, our selected mentees are reporting in. Below, you’ll learn what they’ve built, the challenges they’ve overcome, and their goals for the rest of the summer. We’re proud of every contribution and eager to see what comes next. Hear from Advith Krishnan of SRM Institute of Science and Technology, Potheri, below.
In large organizations, the most valuable business insights are often the hardest to access: they lie buried in large datastores, mainframe archives, and siloed log systems, or trapped in schema-heavy SQL tables. The truth about how systems behave, when they break, and why they recover yields infrastructure insights that aid preemptive problem identification, maintenance planning, and trend analysis. The challenge? Making sense of those vast amounts of data without the simplicity of natural language summaries. That’s the problem we’re taking on with the RAG to Riches project.
Hi! I’m Advith Krishnan, an AI engineer and researcher passionate about building complex systems from the ground up. This summer, as part of the Open Mainframe Project’s Modernization Working Group, I’m designing a Retrieval-Augmented Generation (RAG) framework that can pull structured, semi-structured, and unstructured data from both legacy mainframe systems and modern databases, and answer high-context natural language questions like:
- “Why did our login services degrade last quarter?”
- “Which user sessions are at risk based on last year’s I/O trend?”
- “What caused last Friday’s IMS downtime, and how did the fallback system respond?”
Project Goals
The goal of the RAG to Riches project is to build an LLM-powered, connector-based analytics framework that sits on top of organizational data silos (especially mainframes, SQL backends, time-series systems, and logs) and converts them into a dynamic knowledge base. When a user asks a question, the framework should know what kind of question it is, identify whether the answer lies in historical logs, recent metrics, or previously cached knowledge, and, critically, respect access-control boundaries at all stages. Some key necessities for the framework are as follows:
- Perform scalable, low-latency retrieval across legacy and modern systems using a modular, connector-based architecture (see the sketch after this list).
- Support structured and semantic queries via a centralized planner, while enforcing fine-grained RBAC for data isolation.
- Design pipelines that chunk and tag data for efficient storage and relevant retrieval, feeding into a unified vector store that enables fast, metadata-aware search at scale.
- Handle queries about infrastructure-level insights, root-cause analysis, and trend analysis (in general, anything related to a system), and surface the results as natural language summaries.
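To make the connector idea concrete, here is a minimal sketch of what a modular connector interface might look like. The `DataConnector` base class, the `Db2Connector` and `SplunkConnector` stubs, and the registry are illustrative names and assumptions, not finalized components of the framework.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Record:
    """A normalized record returned by any connector."""
    text: str                                               # raw content to be chunked/embedded later
    metadata: dict[str, Any] = field(default_factory=dict)  # source, timestamp, tags, ...

class DataConnector(ABC):
    """Common interface that every source-specific plugin implements."""
    source_name: str

    @abstractmethod
    def fetch(self, query: dict) -> list[Record]:
        """Run a source-native query and return normalized records."""

class Db2Connector(DataConnector):
    source_name = "db2"
    def fetch(self, query: dict) -> list[Record]:
        # Placeholder: would translate `query` into SQL and call the DB2 driver.
        return [Record(text="<db2 row>", metadata={"source": self.source_name})]

class SplunkConnector(DataConnector):
    source_name = "splunk"
    def fetch(self, query: dict) -> list[Record]:
        # Placeholder: would issue a search against the Splunk REST API.
        return [Record(text="<splunk event>", metadata={"source": self.source_name})]

# A simple registry lets the planner pick connectors by source name.
CONNECTORS: dict[str, DataConnector] = {
    c.source_name: c for c in (Db2Connector(), SplunkConnector())
}
```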
Progress up to Mid-Term
My mentorship officially kicked off on May 30, 2025, under the guidance of my mentor, Dr. Vinu Russell Viswasadhas. We dedicated the mid-term phase to system architecture ideation. In the first week of June, we reviewed existing approaches to these problems and identified their limitations. Over the following two weeks, we discussed, debated, and designed our framework’s architecture, identifying several key considerations along the way. Some of the critical design discussions that emerged are as follows:
- Who is this system for?
While originally envisioned for sysadmins and infrastructure architects, we realized its scope could be much broader. Analysts, compliance officers, and even support teams can find a use case for this framework. This required us to rethink how we could generalize intent across user types and prompt types, leading to the need for a query understanding layer and a query planning layer.
- How do you retrieve massive data slices from mainframes?
One approach was to front-load retrieval via smart caching. We considered cache stores like RediSearch, LangChain vector stores, etc., but later agreed that an optimized retrieval flow paired with cache stores would make a bigger difference. That’s where our Level-1/Level-2 retrieval strategy emerged. If a similar question has been asked before and the relevant context is stored, we serve it instantly from a Vector Knowledge Graph (VKG). If not, we fall back to deep retrieval via connectors (at the cost of speed, but with higher specificity). We then summarize the retrieved data through a unified vector store, which creates or updates contextual embeddings to keep conversational memory persistent, and store the newly generated context for the question back into the VKG for reuse (see the sketch after this list).
- How do we dynamically plan a retrieval path?
Rather than hardwiring query execution logic, we chose to design a Query Planner that outputs a Directed Acyclic Graph (DAG) action flow depending on the query type, source system, and user permissions. This allows a prompt like “Find all auth errors in the past 48 hours” to be routed differently depending on whether it’s a first-time query or a repeat one, and on whether the requester has permission to access auth logs.
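As a rough illustration of the Level-1/Level-2 strategy described above, the sketch below shows the cache-or-fallback decision. The similarity threshold, the `vkg` and `deep_retrieve` helpers, and the embedding function are assumptions made for illustration, not settled implementation details.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # assumed cutoff for treating a cached answer as reusable

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_context(question: str, embed, vkg, deep_retrieve) -> str:
    """Level-1: look for a semantically similar, previously answered question in the
    Vector Knowledge Graph; Level-2: fall back to deep retrieval via the connector
    layer, then write the new context back for reuse."""
    q_vec = embed(question)

    # Level-1: cheap lookup against cached (question_vector, context) pairs.
    best = max(vkg.entries(), key=lambda e: cosine(q_vec, e.vector), default=None)
    if best is not None and cosine(q_vec, best.vector) >= SIMILARITY_THRESHOLD:
        return best.context                     # served instantly from the VKG

    # Level-2: slower but more specific retrieval through the source connectors.
    context = deep_retrieve(question)

    # Store the freshly built context so repeat questions hit Level-1 next time.
    vkg.add(vector=q_vec, context=context)
    return context
```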
We then devised a multi-layered architecture that includes:
- Query Understanding Layer (QUL) – Classifies prompt type, extracts entities, and infers intent.
- Query Planner – Plans the query execution path based on retrieval method, user access, and data type.
- RAG Engine – Handles both Level-1 (cached VKG lookup) and Level-2 (deep semantic + structured retrieval).
- Model Context Protocol (MCP) – Manages context assembly, agent triggers, tool execution, actions, etc.
- Unified Vector Store & VKG – Stores all semantically chunked and tagged data, enabling fast similarity + metadata-based queries.
- Database Connector Layer – Modular plugins to fetch and transform data from sources like DB2, IMS, Splunk, PostgreSQL, etc.
- Role-based Access Control (RBAC) Middleware – Enforces secure, scoped access at the retrieval and synthesis layers.
To simplify system behavior, we split the architecture into two general flows: prompt flow and data flow. In the prompt flow, user queries are first passed through the Query Understanding Layer to extract intent and entities, then routed by the Query Planner to determine whether to retrieve from cached knowledge (Level-1) or perform deep retrieval (Level-2). The RAG Engine executes this plan to query the data. The Model Context Protocol (MCP) then assembles the final context and instructs the LLM to generate a response, an action, a tool execution, and so on.
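To show how these layers might compose in the prompt flow, here is a minimal sketch of a query plan expressed as a small DAG of named steps. The step names, the `QueryPlan` structure, and the "trend" intent label are illustrative assumptions, not the final planner design.

```python
from dataclasses import dataclass, field

@dataclass
class QueryPlan:
    """A tiny DAG: each step name maps to the steps it depends on."""
    steps: dict[str, list[str]] = field(default_factory=dict)

    def add(self, step: str, depends_on: list[str] | None = None) -> None:
        self.steps[step] = depends_on or []

def plan_query(intent: str, cache_hit: bool, allowed_sources: set[str]) -> QueryPlan:
    """Build an execution path from query type, cache state, and permissions."""
    plan = QueryPlan()
    plan.add("understand")                              # QUL: intent + entity extraction
    if cache_hit:
        plan.add("vkg_lookup", ["understand"])          # Level-1: cached context
        plan.add("assemble_context", ["vkg_lookup"])
    else:
        # Level-2: fan out only to sources the requester is allowed to touch.
        for source in sorted(allowed_sources):
            plan.add(f"retrieve:{source}", ["understand"])
        plan.add("assemble_context", [f"retrieve:{s}" for s in sorted(allowed_sources)])
    if intent == "trend":                               # hypothetical intent label
        plan.add("aggregate_metrics", ["assemble_context"])
        plan.add("generate_response", ["aggregate_metrics"])   # handed to the MCP / LLM
    else:
        plan.add("generate_response", ["assemble_context"])
    return plan

# Example: a first-time root-cause query from a user who may only read auth logs.
print(plan_query("root_cause", cache_hit=False, allowed_sources={"auth_logs"}).steps)
```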
In the data flow, data from mainframes, databases, and logs is fetched via connectors, chunked, tagged with metadata, and stored in a Unified Vector Store. Retrieved chunks feed the RAG engine, and relevant results are stored in the Vector Knowledge Graph (VKG) for reuse. Guided by the query plan and RBAC, the data flow orchestrates the retrieval of only the necessary data.
Around the fourth week of June, we considered different software frameworks that could be plugged into each layer of the architecture and critically evaluated them. The technology stack is still open for discussion and finalization, but we considered options like Rasa or fastText for the Query Understanding Layer, Neo4j paired with Qdrant for the VKG layer, LlamaIndex + MiniLM for the RAG engine layer, and so on. With a general idea of the system architecture, we planned to commence implementation right away.
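As an example of how one candidate stack could back the unified vector store, the snippet below embeds tagged chunks with a MiniLM sentence-transformer and stores them in Qdrant with metadata payloads for filtered search. This is only one of the options under evaluation; the collection name, the chunk texts, and the payload fields are placeholders.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")    # 384-dimensional embeddings
client = QdrantClient(":memory:")                  # in-memory instance for experimentation

client.create_collection(
    collection_name="unified_store",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Placeholder chunks standing in for connector output that has already been chunked and tagged.
chunks = [
    {"text": "<IMS log excerpt>", "source": "ims_logs"},
    {"text": "<login latency metric summary>", "source": "metrics"},
]

client.upsert(
    collection_name="unified_store",
    points=[
        PointStruct(id=i, vector=model.encode(c["text"]).tolist(), payload=c)
        for i, c in enumerate(chunks)
    ],
)

# Metadata-aware search: only consider chunks that came from IMS logs.
hits = client.search(
    collection_name="unified_store",
    query_vector=model.encode("Why did IMS go down on Friday?").tolist(),
    query_filter=Filter(must=[FieldCondition(key="source", match=MatchValue(value="ims_logs"))]),
    limit=3,
)
for hit in hits:
    print(hit.payload["text"], hit.score)
```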
Dr. Vinu and I have been gathering different data sources like mainframe emulators, online databases, logs, JSON data, etc., before implementation and framework testing, and we are attempting to generate a dataset for this project that will be directly used by the prompt flow. We are designing the framework under the assumption that the data connector layer is functional, allowing us to focus on language understanding, planning, and context augmentation.
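Since we are assuming a functional connector layer for now, one way to stand in for it during testing is a mock connector that emits synthetic log records. The component names, severity mix, and field names below are invented purely for experimentation and do not represent any real system.

```python
import random
from datetime import datetime, timedelta, timezone

COMPONENTS = ["ims", "db2", "auth", "batch"]   # assumed subsystems for test data
SEVERITIES = ["INFO", "WARN", "ERROR"]

def synthetic_log_records(n: int = 100, seed: int = 42) -> list[dict]:
    """Generate fake, timestamped log records shaped like connector output."""
    rng = random.Random(seed)
    start = datetime.now(timezone.utc) - timedelta(days=30)
    records = []
    for i in range(n):
        ts = start + timedelta(minutes=rng.randint(0, 30 * 24 * 60))
        comp = rng.choice(COMPONENTS)
        sev = rng.choices(SEVERITIES, weights=[0.7, 0.2, 0.1])[0]
        records.append({
            "text": f"{sev} {comp} event #{i}",
            "metadata": {"source": f"{comp}_logs", "severity": sev,
                         "timestamp": ts.isoformat()},
        })
    return records

# These records can be chunked, tagged, and pushed through the data flow
# exactly as real connector output would be.
sample = synthetic_log_records(5)
```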
Plan for Final-Term
For the second half of the mentorship, we aim to create a working prototype with at least a few data connectors. Our immediate plan is to set up the vector store, followed by implementing the RAG engine. Once that’s in place, we’ll test it against sample datasets, then integrate the Query Understanding and Planning layers once the RAG engine consistently delivers meaningful responses. Next, we’ll work on improving retrieval performance by experimenting with chunking strategies, optimizing vector search, and enriching metadata tagging for more precise filtering. This phase will also involve evaluating the memory efficiency of the Vector Knowledge Graph (VKG) and developing a learning loop that caches repeated queries to accelerate future retrievals.
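One of the chunking strategies we expect to experiment with is a simple fixed-size window with overlap that carries source metadata onto every chunk. A rough sketch follows, with the window and overlap sizes as assumptions to be tuned rather than chosen values.

```python
def chunk_with_overlap(text: str, metadata: dict, size: int = 512, overlap: int = 64) -> list[dict]:
    """Split text into overlapping windows, propagating metadata to each chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    chunks, step = [], size - overlap
    for idx, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + size]
        if not piece:
            break
        chunks.append({
            "text": piece,
            "metadata": {**metadata, "chunk_index": idx, "char_offset": start},
        })
    return chunks
```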
Once the prompt and data flows are functional, we will implement the data connector layer to enable full end-to-end testing. With both flows validated, our focus will shift to tightening the system through the integration of RBAC middleware for secure, scoped access and the Model Context Protocol (MCP) for orchestrating final response generation, formatting, and potential agentic actions. These components will ensure both compliance and contextual intelligence, enabling the system to adapt dynamically to different user roles, query types, and data sensitivity levels.
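For the RBAC middleware, one simple model we are considering is a role-to-source mapping enforced before retrieval, so a query plan never touches sources a requester cannot see. The roles and source names below are hypothetical; real policies would come from the organization’s identity and authorization systems.

```python
# Hypothetical role -> allowed data source mapping.
ROLE_SOURCES = {
    "sysadmin":   {"ims_logs", "db2", "metrics", "auth_logs"},
    "analyst":    {"metrics", "db2"},
    "compliance": {"auth_logs"},
}

def scope_sources(requested: set[str], roles: set[str]) -> set[str]:
    """Intersect the sources a query needs with the sources the user may access."""
    allowed = set().union(*(ROLE_SOURCES.get(r, set()) for r in roles)) if roles else set()
    return requested & allowed

# Example: an analyst asking a question that would normally touch auth logs
# gets a plan scoped down to metrics and DB2 only.
print(scope_sources({"auth_logs", "metrics", "db2"}, {"analyst"}))
```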
By breaking the implementation into clear, prioritized phases, we aim to make significant progress and deliver a system that can serve as a robust foundation for enterprise-scale retrieval and reasoning across complex infrastructure landscapes. We also plan to release a white paper by the end of the mentorship, detailing the system’s architecture, design rationale, and potential applications across hybrid legacy-modern environments.