We present an image-based reconstruction framework to model real water scenes captured by stereoscopic video. In contrast to many image-based modeling techniques that rely on user interaction to obtain high-quality 3D models, we instead apply automatically calculated physically-based constraints to refine the initial model. The combination of image-based reconstruction with physically-based simulation allows us to model complex and dynamic objects such as fluid. Using a depth map sequence as initial conditions, we use a physically based approach that automatically fills in missing regions, removes outliers, and refines the geometric shape so that the final 3D model is consistent to both the input video data and the laws of physics. Physically-guided modeling also makes interpolation or extrapolation in the space-time domain possible, and even allows the fusion of depth maps that were taken at different times or viewpoints. We demonstrated the effectiveness of our framework with a number of real scenes, all captured using only a single pair of cameras.