Accurate semantic segmentation of surgical video is vital for computer-aided surgery. Because pixel-level segmentation labels are very difficult to obtain from doctors or researchers, semi-supervised algorithms generate pseudo labels to compensate for the lack of annotations. However, most of these algorithms treat videos as sets of independent images and therefore cannot handle issues caused by complex surgical scenes, such as blurred instruments. This paper proposes a novel Cross Supervision of Inter-frame (CSI) method that uses inter-frame information from surgical video to cross-supervise semantic segmentation. Specifically, we design Inter-frame Information Transformation (I2T) modules that mutually transfer features between consecutive frames using class prototypes. For labeled frames, we use the ground truth to supervise the inter-frame features; for unlabeled frames, we propose a cross pseudo loss and a pixel-wise contrastive loss as constraints. Extensive experiments on a publicly available cataract surgery dataset show that our CSI method improves segmentation accuracy by exploiting inter-frame information.
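To make two of the ingredients named above concrete, the following is a minimal NumPy sketch of class prototypes and a cross pseudo loss. The abstract does not give their exact formulation, so everything here is an assumption for illustration: prototypes are taken to be per-class mean feature vectors (masked average pooling), and the cross pseudo loss is taken to be a symmetric cross-entropy in which each prediction is supervised by the hard pseudo labels of the other. The function names and the arguments `logits_a` (prediction from a frame's own features) and `logits_b` (prediction from features transferred from a neighboring frame) are hypothetical, not the authors' implementation.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Per-class mean feature vector (masked average pooling; an assumed form).

    features: (H, W, C) float array, labels: (H, W) int array.
    Returns a (num_classes, C) array; absent classes keep a zero prototype.
    """
    flat_f = features.reshape(-1, features.shape[-1])
    flat_l = labels.reshape(-1)
    protos = np.zeros((num_classes, flat_f.shape[1]))
    for k in range(num_classes):
        mask = flat_l == k
        if mask.any():
            protos[k] = flat_f[mask].mean(axis=0)
    return protos

def softmax(logits):
    """Numerically stable softmax over the last (class) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, target):
    """Mean pixel-wise cross-entropy. logits: (H, W, K), target: (H, W) int."""
    probs = softmax(logits).reshape(-1, logits.shape[-1])
    flat_t = target.reshape(-1)
    picked = probs[np.arange(flat_t.size), flat_t]
    return float(-np.log(picked + 1e-12).mean())

def cross_pseudo_loss(logits_a, logits_b):
    """Symmetric loss: each prediction is trained on the other's argmax labels."""
    pseudo_a = logits_a.argmax(axis=-1)
    pseudo_b = logits_b.argmax(axis=-1)
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)
```

Under these assumptions, the loss is small when the two predictions agree confidently and grows when they disagree, which is the behavior a cross-supervision constraint on unlabeled frames would need.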