The cyber resilience of cyber-physical power grids relies on the swift restoration of cyber-domain components following major disturbances such as natural disasters or man-made attacks. The cyber-domain restoration problem is inherently stochastic due to uncertainty in the initial outage conditions and in the success of restoration actions. Restoration problems are traditionally solved with optimization-based methods such as heuristics and mixed-integer linear programming (MILP); however, these methods are computationally expensive and adapt poorly to dynamic conditions. To address these challenges, this paper formulates the observability recovery problem (ORP) as a Markov decision process (MDP) and solves it with deep reinforcement learning (DRL). Numerical simulations on the IEEE 30-bus system demonstrate that the proposed approach outperforms the heuristic approach in both solution quality and computational efficiency. Moreover, compared to the MILP approach, the proposed method achieves comparable solution quality while requiring significantly less computation time.
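
As a rough illustration of this MDP framing, the sketch below models a toy ORP instance in plain Python, with states tracking which cyber components remain unobserved, actions attempting repairs that fail stochastically, and a reward that penalizes each step and rewards recovering full observability. Tabular Q-learning stands in for the paper's DRL agent; all component names, the failure probability, and the reward values are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical toy ORP as an MDP, solved with tabular Q-learning as a
# lightweight stand-in for a deep RL agent. All quantities are assumed.
import random

COMPONENTS = ("router_1", "router_2", "pmu_1", "pmu_2")  # assumed outaged cyber components
P_REPAIR_FAIL = 0.2   # assumed probability that a restoration action fails
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

def step(state, action):
    """Attempt to restore COMPONENTS[action]; the attempt may fail stochastically."""
    if random.random() < P_REPAIR_FAIL or state[action]:
        return state, -1.0                       # failed or redundant repair: pay a step cost
    nxt = state[:action] + (True,) + state[action + 1:]
    reward = -1.0 + (10.0 if all(nxt) else 0.0)  # bonus when full observability is recovered
    return nxt, reward

Q = {}
def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(5000):                            # Q-learning episodes
    s = (False,) * len(COMPONENTS)               # all components initially out
    while not all(s):
        a = (random.randrange(len(COMPONENTS)) if random.random() < EPS
             else max(range(len(COMPONENTS)), key=lambda x: q(s, x)))
        s2, r = step(s, a)
        target = r + GAMMA * max(q(s2, x) for x in range(len(COMPONENTS)))
        Q[(s, a)] = q(s, a) + ALPHA * (target - q(s, a))
        s = s2

# Greedy rollout of the learned policy over not-yet-restored components.
s = (False,) * len(COMPONENTS)
print("greedy restoration order:")
while not all(s):
    a = max((x for x in range(len(COMPONENTS)) if not s[x]), key=lambda x: q(s, x))
    s, _ = step(s, a)
    print("  attempt restore", COMPONENTS[a])
```

In the paper's setting the state space is far too large for a table, which is where DRL comes in: a neural network replaces the Q dictionary and generalizes across outage configurations, allowing fast online decisions under the dynamic conditions that the optimization-based baselines handle poorly.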