With rapid advances in artificial intelligence, this study develops an AI-driven personalized dance training model to address two persistent limitations of traditional instruction: low movement-recognition accuracy and weak support for individualized guidance. The method employs a dual-branch twin-network supervised learning framework to lift 2D pose estimates to 3D skeletal keypoints, and enhances the ST-GCN architecture with a spatio-temporal attention mechanism to strengthen feature extraction across both space and time. A custom dataset is constructed from 3,500 images sampled from concert and dance videos, covering six movement categories: waist crossing, high lifting, single-arm extension, waving, double-arm extension, and walking. Experimental results show that the improved ST-GCN achieves 93.63% recognition accuracy on the test set, about 14 percentage points higher than a conventional residual network baseline. After incorporating spatio-temporal attention, top-1 accuracy reaches 86.66%, exceeding the original ST-GCN by 5.63 percentage points. Overall, the proposed model is robust to occlusion and viewpoint variation, substantially improves recognition performance, and offers technical support for personalized dance training as well as dance-related health management applications.
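The abstract does not specify how the attention mechanism is wired into ST-GCN. As a rough illustrative sketch only, a spatio-temporal attention block of the kind described could re-weight ST-GCN feature maps along the frame and joint axes between graph-convolution layers; the class name, pooling scheme, and layer choices below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a spatio-temporal attention block for ST-GCN features.
# Feature maps have shape (N, C, T, V): batch, channels, frames, skeleton joints.
import torch
import torch.nn as nn


class SpatioTemporalAttention(nn.Module):
    """Re-weights features along the temporal (T) and spatial (V) axes."""

    def __init__(self, channels: int):
        super().__init__()
        # Temporal attention: score each frame from joint-pooled features.
        self.temporal_fc = nn.Conv1d(channels, 1, kernel_size=1)
        # Spatial attention: score each joint from frame-pooled features.
        self.spatial_fc = nn.Conv1d(channels, 1, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V)
        t_feat = x.mean(dim=3)                           # (N, C, T), pooled over joints
        t_att = self.sigmoid(self.temporal_fc(t_feat))   # (N, 1, T)
        x = x * t_att.unsqueeze(-1)                      # broadcast over joints

        v_feat = x.mean(dim=2)                           # (N, C, V), pooled over frames
        v_att = self.sigmoid(self.spatial_fc(v_feat))    # (N, 1, V)
        return x * v_att.unsqueeze(2)                    # broadcast over frames


# Example: 8 clips, 64 channels, 100 frames, 18 joints (COCO-style skeleton).
if __name__ == "__main__":
    features = torch.randn(8, 64, 100, 18)
    out = SpatioTemporalAttention(channels=64)(features)
    print(out.shape)  # torch.Size([8, 64, 100, 18])
```

In this sketch the two attention branches are applied sequentially (temporal, then spatial); whether the paper uses sequential, parallel, or fused attention is not stated in the abstract.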