In this paper we present a novel framework for simultaneous detection of click action and estimation of occluded fingertip positions from egocentric viewed single-depth image sequences. For the detection and estimation, a novel probabilistic inference based on knowledge priors of clicking motion and clicked position is presented. Based on the detection and estimation results, we were able to achieve a fine resolution level of a bare hand-based interaction with virtual objects in egocentric viewpoint. The proposed method delivers promising performance under frequent self-occlusions in the process of selecting objects in AR/VR space whilst wearing an egocentric-depth camera-attached HMD.