We present a system that allows users to virtually try on new clothes. It uses a single commodity depth camera to capture the user in 3D. Both the pose and the shape of the user are estimated with a novel real-time template-based approach that performs tracking and shape adaptation jointly. The result then drives a realistic cloth simulation, in which the synthesized clothes are overlaid on the input image. The main challenge is handling missing data and pose ambiguities caused by the monocular setup, which captures less than 50 percent of the full body. Our solution incorporates automatic shape adaptation and novel constraints into the pose tracking. We demonstrate the effectiveness of our system on a number of examples.