Object detection in remotely sensed satellite imagery is increasingly important for urban planning, disaster management, and environmental monitoring in smart-city settings. This paper presents an ontology-guided deep learning framework that couples a lightweight YOLOv8 detector with an ontology reasoning module for semantic scene interpretation. The system detects five urban-environment classes (residences, roads, shorelines, swimming pools, and vegetation) in Sentinel-2 MSI imagery collected over the southern Durban metropolitan region of KwaZulu-Natal, South Africa. The dataset comprises 92 annotated images resized to 640 × 640 pixels and partitioned into 61 training, 21 validation, and 10 testing images; augmentation expands each split 100-fold, to 6,100 training, 2,100 validation, and 1,000 testing images. The visual recognition component uses a YOLOv8 architecture with a C2f-based backbone and neck and anchor-free detection heads, while the semantic layer represents hierarchical class relations, object adjacency, and interpretable scene semantics as RDF/OWL concepts queried through SPARQL. On the proposed dataset, the YOLOv8 model attains 68% precision, 60% recall, 43% mAP@50, and 17.5% mAP@50–95; the highest class-specific precision is observed for swimming pools (62.7%) and the highest class-specific mAP@50 for shorelines (99.5%). The ontology remains lightweight and scalable, with a maximum depth of inheritance of 3 and a maximum of 4 children per class, which keeps reasoning efficient and computationally inexpensive. By combining object detection with structured semantic inference, the framework provides an interpretable analytical layer for smart-city land-cover understanding, disaster-aware urban monitoring, and knowledge-driven scene analysis.
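
The two ontology size figures quoted above, maximum depth of inheritance and maximum number of children, can be computed from a class hierarchy as in the minimal sketch below. The subclass edges and the names `UrbanObject`, `BuiltUp`, and `Natural` are hypothetical and chosen only for illustration; they are not the ontology described in this paper. Depth is assumed here to count subclass edges from a root class down to its deepest descendant, and ontology tools may count it differently (for example, by including a top-level class).

```python
from collections import defaultdict

# Hypothetical (child, parent) subclass edges; NOT the paper's actual ontology.
SUBCLASS_OF = [
    ("BuiltUp", "UrbanObject"),
    ("Natural", "UrbanObject"),
    ("Residence", "BuiltUp"),
    ("Road", "BuiltUp"),
    ("SwimmingPool", "BuiltUp"),
    ("Shoreline", "Natural"),
    ("Vegetation", "Natural"),
]


def hierarchy_metrics(edges):
    """Return (max depth of inheritance, max number of direct children)."""
    children = defaultdict(list)  # parent -> list of direct subclasses
    parents = {}                  # child -> its single parent
    for child, parent in edges:
        children[parent].append(child)
        parents[child] = parent

    # Roots are classes that have children but no parent of their own.
    roots = [cls for cls in children if cls not in parents]

    def depth(node):
        # Number of subclass edges from node to its deepest descendant.
        kids = children.get(node, [])
        return 0 if not kids else 1 + max(depth(k) for k in kids)

    max_depth = max(depth(r) for r in roots)
    max_children = max(len(kids) for kids in children.values())
    return max_depth, max_children


if __name__ == "__main__":
    print(hierarchy_metrics(SUBCLASS_OF))  # (2, 3) for the hypothetical edges above
```

Small values for both metrics indicate a shallow, narrow hierarchy, which is consistent with the low reasoning cost claimed for the ontology.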