# MonSter (CVPR 2025 Highlight)
[Video demo]

## :art: Zero-shot performance on in-the-wild stereo images

Zero-shot generalization on our own captured stereo images.

## Benchmark performance

Comparisons with state-of-the-art stereo methods across five of the most widely used benchmarks.
## Installation

Tested environment:

* NVIDIA RTX 3090
* Python 3.8

### Create a virtual environment and activate it
```Shell
conda create -n monster python=3.8
conda activate monster
```
### Dependencies
```Shell
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm
pip install scipy
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install timm==0.6.13
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install accelerate==1.0.1
pip install gradio_imageslider
pip install gradio==4.29.0
```
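
After installing the packages, a quick sanity check (not part of the original instructions) can confirm that the CUDA build of PyTorch and the pinned libraries import correctly:

```Shell
# Optional check that the cu118 build of PyTorch sees a GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Optional check that the pinned versions resolved (expects timm 0.6.13, mmcv 2.1.0, accelerate 1.0.1)
python -c "import timm, mmcv, accelerate; print(timm.__version__, mmcv.__version__, accelerate.__version__)"
```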
## Required Data
* [SceneFlow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
* [KITTI](https://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
* [ETH3D](https://www.eth3d.net/datasets)
* [Middlebury](https://vision.middlebury.edu/stereo/submit3/)
* [TartanAir](https://github.com/castacks/tartanair_tools)
* [CREStereo Dataset](https://github.com/megvii-research/CREStereo)
* [FallingThings](https://research.nvidia.com/publication/2018-06_falling-things-synthetic-dataset-3d-object-detection-and-pose-estimation)
* [InStereo2K](https://github.com/YuhuaXu/StereoDataset)
* [Sintel Stereo](http://sintel.is.tue.mpg.de/stereo)
* [HR-VS](https://drive.google.com/file/d/1SgEIrH_IQTKJOToUwR1rx4-237sThUqX/view)
## Model weights

| Model | Link |
|:----:|:-------------------------------------------------------------------------------------------------:|
| KITTI (one model for both 2012 and 2015) | [Download 🤗](https://huggingface.co/cjd24/MonSter/resolve/main/kitti.pth?download=true) |
| Middlebury | [Download 🤗](https://huggingface.co/cjd24/MonSter/resolve/main/middlebury.pth?download=true) |
| ETH3D | [Download 🤗](https://huggingface.co/cjd24/MonSter/resolve/main/eth3d.pth?download=true) |
| SceneFlow | [Download 🤗](https://huggingface.co/cjd24/MonSter/resolve/main/sceneflow.pth?download=true) |
| mix_all (mix of all datasets) | [Download 🤗](https://huggingface.co/cjd24/MonSter/resolve/main/mix_all.pth?download=true) |

The mix_all model is trained on all of the datasets listed above and gives the best zero-shot generalization performance.
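
The evaluation commands below reference checkpoints under `./pretrained/`; one straightforward way to place them there (the directory name simply mirrors the paths used in this README) is:

```Shell
mkdir -p pretrained
# Fetch the checkpoints used in the evaluation examples below
wget -O pretrained/sceneflow.pth "https://huggingface.co/cjd24/MonSter/resolve/main/sceneflow.pth?download=true"
wget -O pretrained/mix_all.pth "https://huggingface.co/cjd24/MonSter/resolve/main/mix_all.pth?download=true"
```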
## Evaluation

To evaluate the zero-shot performance of MonSter on Scene Flow, KITTI, ETH3D, Virtual KITTI, DrivingStereo, or Middlebury, run
```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained/sceneflow.pth --dataset <dataset>  # one of: eth3d, kitti, sceneflow, vkitti, driving
```
or use the model trained on all datasets, which generalizes better in the zero-shot setting:
```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained/mix_all.pth --dataset <dataset>  # one of: eth3d, kitti, sceneflow, vkitti, driving
```
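
For example, a concrete invocation that evaluates the mix_all checkpoint on ETH3D looks like this:

```Shell
# Zero-shot evaluation of the mix_all checkpoint on ETH3D
python evaluate_stereo.py --restore_ckpt ./pretrained/mix_all.pth --dataset eth3d
```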
## Submission

To generate the submission files for the KITTI benchmark, run
```Shell
python save_disp.py
```
To generate the submission files for the Middlebury benchmark, run
```Shell
python save_pfm.py
```
To generate the submission files for the ETH3D benchmark, run
```Shell
python save_pfm_eth.py
```
## Training

To train MonSter on Scene Flow, KITTI, ETH3D, or Middlebury, run one of:
```Shell
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_kitti.py        # KITTI
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_eth3d.py        # ETH3D
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_sceneflow.py    # Scene Flow
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch train_middlebury.py   # Middlebury
```
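
The commands above assume four GPUs. To train with a different number of GPUs, adjusting `CUDA_VISIBLE_DEVICES` together with the standard `accelerate launch --num_processes` flag should work; the single-GPU line below is only a sketch and is not the configuration used by the authors:

```Shell
# Example: single-GPU Scene Flow training (slower; not the authors' reference setup)
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 train_sceneflow.py
```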
## Star History