Deploying Digital Twins at City Scale
- Linker Vision
- Aug 21
- 4 min read
Updated: Aug 25
Digital twins have long been a powerful concept, but they’ve been hard to scale. Traditional approaches often require months of modeling and deep domain knowledge just to set up a basic environment. For most cities, that meant digital twins stayed locked in pilot projects.
Today, we can do things differently. By leveraging aerial imagery and satellite data, Linker Vision enables cities to build digital twins in a fraction of the time and cost. No IoT sensor networks. No ground remapping. Just the data cities already have.
From Months to Days: Accelerating Digital Twin Creation
Our platform uses Unreal Engine and NVIDIA Omniverse™ as the foundation for rapid, photorealistic 3D environment generation. Working closely with our partners, we automatically process high-resolution aerial imagery through several precise stages: 2D satellite image capture, 3D terrain generation from DSM/DTM elevation models, and reconstruction of 3D buildings with detailed architectural styles and landmark features. This structured pipeline ensures geographic accuracy and faithful reconstruction without requiring cities to integrate multiple tools or manage disparate systems.
NVIDIA Omniverse excels in simulation fidelity, geospatial coherence, and real-time collaboration across teams. Unreal Engine complements it with ultra-realistic rendering and performance-optimized visualization for large-scale urban scenes. With NVIDIA Omniverse Converter, we ingest processed data into a simulation-ready environment aligned with real-world coordinates. Then, using NVIDIA Omniverse and Unreal Engine, we build high-resolution digital twins that accurately reflect the city’s terrain, infrastructure, and visual identity.
This combination dramatically reduces setup time, allowing us to deliver city-scale models in days rather than months, and making digital twins more accessible to city stakeholders.
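The staged pipeline above can be sketched in code. This is a minimal illustration only: the function names, data structures, and stub values are hypothetical, not our production tooling. It shows the shape of the flow, including the common trick of deriving building heights as DSM minus DTM.

```python
from dataclasses import dataclass

@dataclass
class TerrainModel:
    """Elevation grids derived from aerial imagery (illustrative structure)."""
    dsm: list  # digital surface model: terrain plus buildings/vegetation
    dtm: list  # digital terrain model: bare-earth elevation

def capture_imagery(region: str) -> dict:
    """Stage 1: gather 2D satellite/aerial tiles for a region (stub)."""
    return {"region": region, "tiles": ["tile_0_0", "tile_0_1"]}

def build_terrain(imagery: dict) -> TerrainModel:
    """Stage 2: derive DSM/DTM elevation models from the imagery (stub values)."""
    return TerrainModel(dsm=[[12.0, 15.5]], dtm=[[10.0, 10.2]])

def reconstruct_buildings(terrain: TerrainModel) -> list:
    """Stage 3: extrude 3D buildings; height above ground is DSM minus DTM."""
    heights = [s - t
               for srow, trow in zip(terrain.dsm, terrain.dtm)
               for s, t in zip(srow, trow)]
    return [{"id": i, "height_m": h} for i, h in enumerate(heights)]

def build_city_twin(region: str) -> dict:
    """Run the three stages end to end and return a simulation-ready bundle."""
    imagery = capture_imagery(region)
    terrain = build_terrain(imagery)
    buildings = reconstruct_buildings(terrain)
    return {"region": region, "terrain": terrain, "buildings": buildings}
```

In practice each stage is a heavyweight photogrammetry or reconstruction step, and the output is ingested into Omniverse via the converter rather than kept as Python dictionaries.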

A Living Interface for Urban Management
A digital twin functions as a unified interface for citywide operations. Leveraging our expertise in vision AI and vision-language models (VLMs), we integrate real-time streams from diverse sources such as CCTV networks, drones, robot dogs, and bus-mounted cameras. These live feeds are analyzed, interpreted, and translated into meaningful insights, then visualized directly within the digital twin.
By projecting events into a spatially grounded context, we enable city personnel to understand what’s happening, where it’s happening, and how best to respond. This tight feedback loop shortens decision time and turns raw data into rapid action.
Additionally, with an appropriate data ingestion layer, external systems such as traffic control, environmental sensors, and utility networks could be synchronized with the 3D environment, allowing events and conditions to dynamically update the model over time.
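One way to picture such an ingestion layer is below: a geolocated event from any source is indexed into the twin's spatial model so operators can query what is happening near a location. The classes and the simple grid-cell indexing are an assumption for illustration, not our actual architecture.

```python
from dataclasses import dataclass, field

@dataclass
class CityEvent:
    source: str    # e.g. "cctv", "drone", "traffic_control"
    kind: str      # e.g. "flooding", "congestion"
    lat: float
    lon: float
    severity: int  # 1 (minor) .. 5 (critical)

@dataclass
class TwinEventIndex:
    """Minimal stand-in for the twin's event layer: events keyed by grid cell."""
    cell_size_deg: float = 0.01
    events: dict = field(default_factory=dict)

    def _cell(self, lat: float, lon: float) -> tuple:
        # Snap coordinates to a coarse grid so nearby events share a cell.
        return (round(lat / self.cell_size_deg), round(lon / self.cell_size_deg))

    def ingest(self, event: CityEvent) -> None:
        """Project an incoming event into its spatial cell."""
        self.events.setdefault(self._cell(event.lat, event.lon), []).append(event)

    def events_near(self, lat: float, lon: float) -> list:
        """Return events in the same cell as the queried location."""
        return self.events.get(self._cell(lat, lon), [])
```

A real deployment would anchor events to the twin's world coordinates and stream them into the rendered scene; the point here is only the pattern of spatially grounding every incoming feed.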
Furthermore, the digital twin could evolve into a platform that also supports predictive insights. Coupling the twin with simulation engines or AI modules opens up possibilities for forecasting potential future scenarios, including disaster response, infrastructure stress testing, or mobility planning, enabling cities to explore what-if situations before they unfold.
Breaking Silos with Shared Visual Intelligence
In many cities, digital twins and camera networks are owned by different bureaus, and each department monitors only its own feeds: Transportation focuses on traffic, Water on floods, and so on. Meanwhile, conventional vision models embedded in these systems are limited to detecting pre-defined events, offering little flexibility for more nuanced or cross-cutting urban situations.
VLMs change this dynamic. By interpreting the meaning and context of complex visual scenes, they can detect and reason about multiple types of incidents within a single frame. For example, a camera feed from a riverside road could reveal both flooding and traffic disruption, triggering alerts for the Water Bureau, Police Department, and Transportation Bureau simultaneously. Each alert is enriched with semantic understanding: event summary, location, and assessed severity.
This semantic reasoning capability allows VLMs to fill the gap left by traditional models, empowering cities to break down operational silos and respond more quickly and effectively to evolving situations.
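The riverside-road example can be made concrete with a small routing sketch: one VLM frame summary fans out to every responsible bureau. The routing table, field names, and the sample analysis are hypothetical; a real system would take the VLM's structured output and the city's own responsibility matrix.

```python
# Hypothetical routing table: which bureaus handle which incident type.
ROUTING = {
    "flooding": ["Water Bureau", "Police Department"],
    "traffic_disruption": ["Transportation Bureau"],
}

def route_alerts(frame_analysis: dict) -> list:
    """Fan one VLM frame summary out to every responsible bureau."""
    alerts = []
    for incident in frame_analysis["incidents"]:
        for bureau in ROUTING.get(incident["type"], ["City Operations Center"]):
            alerts.append({
                "bureau": bureau,
                "summary": incident["summary"],
                "location": frame_analysis["camera_location"],
                "severity": incident["severity"],
            })
    return alerts

# A riverside camera frame where the VLM reports two concurrent incidents.
analysis = {
    "camera_location": "Riverside Rd @ Bridge 3",
    "incidents": [
        {"type": "flooding", "summary": "Water over roadway", "severity": "high"},
        {"type": "traffic_disruption", "summary": "Stalled vehicles", "severity": "medium"},
    ],
}
alerts = route_alerts(analysis)  # 3 alerts: Water, Police, Transportation
```

The key contrast with a fixed-class detector is that a single frame yields several semantically distinct incidents, each routed independently.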

Training AI for City Readiness: Synthetic Data from Digital Twins
Beyond real-time operations, the spatial coherence and photorealism of digital twins make them a powerful foundation for training AI models that must interpret complex urban scenarios.
We generate synthetic data by adjusting various parameters within our virtual environments. While tools like NVIDIA Omniverse Replicator support parts of this workflow, we extend the process using domain-specific simulation tools to model edge-case conditions, such as atypical vehicle behavior or complex traffic interactions, where precise environmental context and realistic agent behavior are essential for learning value.
To further enhance the diversity of our synthetic datasets, we integrate NVIDIA Cosmos™ into our generation pipeline. After creating base environments and agent behaviors, NVIDIA Cosmos enables us to apply natural-language prompts to vary conditions like lighting, weather, and scene composition. This allows us to efficiently generate multiple versions of the same scenario, helping AI models generalize better across edge cases without extensive manual tuning.
This boosts AI model performance without relying on costly manual data collection. Photorealistic renderings and physically based simulation engines ensure that the data reflects real-world physics and behavior. It’s a scalable way to prepare AI agents for rare but critical incidents.
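The variation step can be sketched as plain domain randomization: take one base scenario and emit many copies with different lighting and weather. The variation axes and function below are illustrative only; in our pipeline these conditions are driven through simulation tooling and NVIDIA Cosmos prompts rather than a Python loop.

```python
import random

# Illustrative variation axes; real pipelines expose many more parameters.
LIGHTING = ["dawn", "noon", "dusk", "night"]
WEATHER = ["clear", "rain", "fog"]

def generate_variants(base_scenario: dict, n: int, seed: int = 0) -> list:
    """Produce n randomized copies of one scenario for training data."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    return [
        {
            **base_scenario,
            "variant_id": i,
            "lighting": rng.choice(LIGHTING),
            "weather": rng.choice(WEATHER),
        }
        for i in range(n)
    ]
```

Because the underlying geometry and agent behavior stay fixed, every variant shares the same ground-truth labels, which is what makes this cheaper than collecting and annotating new real-world footage.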

Forging Resilient and Responsive Urban Futures
By turning static infrastructure into dynamic, vision-powered digital twins, cities gain a new level of situational awareness and responsiveness. We’re enabling faster, more coordinated action in the face of real-world challenges.
This is how we move from pilot projects to scalable, citywide AI. From fragmented views to unified action. From passive monitoring to adaptive governance.
Looking ahead, these digital foundations can unlock a new era of intelligent urban operations—from predictive maintenance and crowd-aware mobility planning to synthetic training environments for next-gen AI agents. As data grows and urban systems become more complex, digital twins will serve as the connective tissue between sensing, decision-making, and action—making cities not only smarter, but truly adaptive to the needs of their people.
▶ Contact us to explore how digital twins can benefit your industry:
▶ See how NVIDIA spotlights our city-scale digital twin deployment: