Data Engineering–Driven World Models for Consequence-Aware Agentic Systems







Create Date 26/01/2026
Last Updated 26/01/2026


Description

Data Engineering–Driven World Models for Consequence-Aware Agentic Systems

Brahma Reddy Katam

Technical Lead, Data Engineering and Advanced Computing

Abstract: Large Language Models (LLMs) have enabled a new generation of intelligent agents capable of generating queries, automating workflows, and assisting with data engineering tasks through natural language interaction. Despite these advances, most LLM-based agents remain fundamentally reactive: they predict text rather than anticipate the operational consequences of their actions. In production-scale data platforms, actions such as schema changes, table optimizations, or compute scaling directly affect performance, cost, and system reliability. Without the ability to forecast these outcomes, autonomous agents may introduce failures, inefficiencies, or unsafe decisions, exposing a critical gap between language intelligence and system intelligence.

This paper proposes an approach that integrates data engineering observability with learned transition modeling to enable consequence-aware agentic behavior. We introduce Data Engineering–Driven World Models, where agents learn state-transition behavior of data platforms using historical telemetry, system metrics, and action–outcome logs. Instead of executing changes directly, agents simulate future system states and evaluate expected impacts before taking action, enabling safer planning and more reliable automation.
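The idea of learning state-transition behavior from action–outcome logs and simulating an action before executing it can be illustrated with a minimal sketch. This is not the paper's implementation: the `PlatformState` fields, the `TransitionModel` class, the action names, and the mean-delta estimator are all hypothetical stand-ins chosen for illustration, and a real world model would likely be a richer learned model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformState:
    """Hypothetical two-signal snapshot of a data platform."""
    runtime_s: float  # job runtime in seconds (assumed metric)
    cost_usd: float   # infrastructure cost (assumed metric)


class TransitionModel:
    """Toy world model: learns the mean state change caused by each action."""

    def __init__(self):
        self._sums = defaultdict(lambda: [0.0, 0.0])  # action -> [d_runtime, d_cost]
        self._counts = defaultdict(int)               # action -> number of log rows

    def fit(self, logs):
        """logs: iterable of (state_before, action, state_after) tuples."""
        for before, action, after in logs:
            sums = self._sums[action]
            sums[0] += after.runtime_s - before.runtime_s
            sums[1] += after.cost_usd - before.cost_usd
            self._counts[action] += 1

    def predict(self, state, action):
        """Simulate the next state without touching the real platform."""
        n = self._counts.get(action, 0)
        if n == 0:
            return state  # no evidence for this action: predict no change
        d_runtime, d_cost = self._sums[action]
        return PlatformState(state.runtime_s + d_runtime / n,
                             state.cost_usd + d_cost / n)


# Hypothetical historical action-outcome logs.
logs = [
    (PlatformState(100.0, 10.0), "compact_table", PlatformState(80.0, 11.0)),
    (PlatformState(120.0, 12.0), "compact_table", PlatformState(90.0, 13.0)),
    (PlatformState(100.0, 10.0), "scale_up", PlatformState(60.0, 18.0)),
]

model = TransitionModel()
model.fit(logs)

current = PlatformState(200.0, 20.0)
simulated = model.predict(current, "compact_table")
print(simulated)  # PlatformState(runtime_s=175.0, cost_usd=21.0)
```

The agent inspects `simulated` instead of applying `compact_table` directly; an action with no history falls back to predicting no change, which is one conservative choice among several.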

To operationalize this concept, we present the Data System Digital Twin (DSDT) architecture, which combines observability pipelines, structured state encoding, machine learning–based world models, and planning modules with LLM interfaces. The framework continuously captures runtime and cost signals, learns system dynamics, and selects the action with the best predicted outcome through simulation-based reasoning. A prototype implementation on a lakehouse environment demonstrates improvements in runtime efficiency, infrastructure cost, and failure prevention compared with rule-based and LLM-only approaches. This work shows that combining world models with strong data engineering foundations provides a practical pathway toward safe, self-optimizing, and autonomous data platforms.
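A rough sketch of how structured state encoding, a world model, and a planning module might fit together is given below. It is an assumption-laden illustration, not the DSDT architecture itself: `encode_state`, `score`, `plan`, `toy_world_model`, the scoring weights, the `max_risk` threshold, and the action effects are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


def encode_state(telemetry: Dict[str, float]) -> State:
    """Structured state encoding: keep only the signals the planner scores."""
    keys = ("runtime_s", "cost_usd", "failure_risk")
    return {k: float(telemetry.get(k, 0.0)) for k in keys}


def score(state: State, w_runtime: float = 1.0, w_cost: float = 5.0) -> float:
    """Lower is better. The weights are illustrative assumptions."""
    return w_runtime * state["runtime_s"] + w_cost * state["cost_usd"]


def plan(state: State,
         actions: List[str],
         world_model: Callable[[State, str], State],
         max_risk: float = 0.2) -> Optional[str]:
    """Simulate each candidate action and return the best safe one.

    Returns None (do nothing) when no action beats the current state.
    """
    best_action, best_score = None, score(state)
    for action in actions:
        predicted = world_model(state, action)
        if predicted["failure_risk"] > max_risk:
            continue  # reject actions predicted to be unsafe
        candidate = score(predicted)
        if candidate < best_score:
            best_action, best_score = action, candidate
    return best_action


def toy_world_model(state: State, action: str) -> State:
    """Stand-in for a learned model: fixed additive effects per action."""
    effects = {
        "compact_table": {"runtime_s": -25.0, "cost_usd": 1.0, "failure_risk": 0.01},
        "scale_up": {"runtime_s": -40.0, "cost_usd": 8.0, "failure_risk": 0.02},
        "drop_column": {"runtime_s": -50.0, "cost_usd": 0.0, "failure_risk": 0.6},
    }
    delta = effects.get(action, {})
    return {k: v + delta.get(k, 0.0) for k, v in state.items()}


telemetry = {"runtime_s": 100, "cost_usd": 10, "failure_risk": 0.05, "rows": 1e9}
state = encode_state(telemetry)
chosen = plan(state, ["compact_table", "scale_up", "drop_column"], toy_world_model)
print(chosen)  # compact_table
```

In this toy setup `drop_column` is rejected on predicted failure risk and `scale_up` does not improve the weighted score, so the planner picks `compact_table`. An LLM interface would sit on top of `plan`, proposing candidate actions and explaining the choice.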

Keywords

Agentic AI, World Models, Data Engineering, Autonomous Systems, Data System Digital Twin, Predictive Modeling, Intelligent Agents, Lakehouse Optimization, Consequence-Aware Planning, Data Platform Automation


Why IJSREM?

Publication Time Period

Publication Procedure

Processing Fees

Follow Us

Working Hours

Contact Us


