Metadata-Version: 2.4
Name: Vid2Sim
Version: 0.1.0
Summary: Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Author-email: Ziyang Xie <ziyangxie01@gmail.com>
License: MIT
Project-URL: Homepage, https://metadriverse.github.io/vid2sim/
Project-URL: Repository, https://github.com/Vid2Sim/Vid2Sim
Keywords: video,simulation,computer vision,deep learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: torch==2.1.1
Requires-Dist: torchvision==0.16.1
Requires-Dist: numpy>=1.21.0
Requires-Dist: opencv-python>=4.5.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: pillow>=8.0.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: dev
Requires-Dist: torch==2.1.1; extra == "dev"
Requires-Dist: torchvision==0.16.1; extra == "dev"
Requires-Dist: numpy==1.26.3; extra == "dev"
Requires-Dist: opencv-python>=4.5.0; extra == "dev"
Requires-Dist: matplotlib>=3.3.0; extra == "dev"
Requires-Dist: pillow>=8.0.0; extra == "dev"
Requires-Dist: imageio==2.37.0; extra == "dev"
Requires-Dist: plyfile==0.7.0; extra == "dev"
Requires-Dist: transformers==4.0; extra == "dev"
Requires-Dist: tqdm>=4.60.0; extra == "dev"
Requires-Dist: fused_ssim; extra == "dev"
Provides-Extra: all
Requires-Dist: Vid2Sim[dev]; extra == "all"

# 🎬 Vid2Sim 🤖: Realistic and Interactive Simulation from Video for Urban Navigation
> [Ziyang Xie](https://ziyangxie.site/), [Zhizheng Liu](https://scholar.google.com/citations?user=Asc7j9oAAAAJ&hl=en), [Zhenghao Peng](https://pengzhenghao.github.io/), [Wayne Wu](https://wywu.github.io/), [Bolei Zhou](https://boleizhou.github.io/)
>
> [![Paper](https://img.shields.io/badge/Paper-arXiv-red)](https://arxiv.org/abs/2501.06693)
> [![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://metadriverse.github.io/vid2sim/)

Vid2Sim is a novel framework that converts monocular videos into photorealistic and physically interactive simulation environments for training embodied agents with minimal sim-to-real gap.

<p align="center">
  <img src="./assets/teaser.png" width="100%">
</p>


## 🚧 Installation 

```bash
# Clone the repository
git clone https://github.com/Vid2Sim/Vid2Sim.git --recursive
cd Vid2Sim

# Create a new environment
conda create -n vid2sim python=3.10
conda activate vid2sim

# Install dependencies
pip install -e .

# Install reconstruction dependencies
pip install -e submodules/vid2sim-rasterizer
pip install -e submodules/vid2sim-deva-segmentation
pip install -e submodules/simple-knn

# Install RL dependencies
pip install -r src/vid2sim_rl/requirements.txt
pip install -e submodules/ml-agents/ml-agents
# Optional: R3M visual representations
pip install -e submodules/r3m
```

## 🎥 Reconstruct Simulation Environments from Videos
Vid2Sim transforms monocular videos into simulation environments by reconstructing scene geometry and appearance. The generated environments preserve real-world diversity and visual fidelity, minimizing the sim-to-real gap for agent training.


👉 To get started, follow the reconstruction guide in [vid2sim_recon](src/vid2sim_recon/README.md).
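Reconstruction recovers camera poses and scene geometry from monocular frames. As a toy illustration of the underlying projective geometry only (this is not Vid2Sim's reconstruction code, which is far more involved), a minimal pinhole-camera projection in Python:

```python
# Toy pinhole-camera projection: map a 3D point in camera coordinates
# to pixel coordinates. Illustrative only, not part of the Vid2Sim pipeline.

def project_point(point, fx, fy, cx, cy):
    """Project a 3D point (x, y, z) in camera space onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v

# A point 2 m ahead and 1 m to the right, with a 500 px focal length
# and the principal point at the center of a 640x480 image:
u, v = project_point((1.0, 0.0, 2.0), fx=500, fy=500, cx=320, cy=240)
print(u, v)  # 570.0 240.0
```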

## 🤖 Train the Agent in Real-to-Sim Environments

Once an environment is reconstructed, Vid2Sim turns it into an interactive simulation with both realistic visual appearance and physical collision, enabling agent training in diverse situations.

👉 To set up the environment and launch RL training, refer to [vid2sim_rl](src/vid2sim_rl/README.md). 
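Training in these environments optimizes a standard discounted-return objective. As a minimal, repo-agnostic sketch (the function name is illustrative; the actual training loop lives in `src/vid2sim_rl`), computing discounted returns from an episode's rewards:

```python
# Compute discounted returns G_t = r_t + gamma * G_{t+1} for one episode,
# accumulating from the final step backward. Illustrative only.

def discounted_returns(rewards, gamma=0.99):
    returns = []
    g = 0.0
    for r in reversed(rewards):  # walk the episode from its last reward
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]  # restore chronological order

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```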


## 📦 Repository Structure
```
Vid2Sim/
├── data/ # Source data
├── src/
│   ├── vid2sim_recon/ # Reconstruct the simulation environment from video
│   ├── vid2sim_rl/ # Train the agent in real-to-sim environments
├── tools/ # Tool scripts
├── README.md # This file
```


## 📚 Vid2Sim Dataset

The Vid2Sim dataset contains 30 high-quality real-to-sim simulation environments reconstructed from clips drawn from 9 web videos. Each clip is 15 seconds of forward-facing video recorded at 30 fps, providing 450 frames per scene for environment reconstruction and simulation.

We provide the source [video data](https://drive.google.com/drive/folders/1jGmKxZL6hKvjCg6qhM9wmW1_HjMwCUGa?usp=sharing) and [interactive Unity environments](https://drive.google.com/drive/folders/1LCruqb6M3mCgsjaqI1ON6WVoZ-9CmQDY?usp=sharing) for agent training.
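If you prepare your own clips for reconstruction, the frame budget follows directly from duration and frame rate (15 s at 30 fps gives the 450 frames per scene used above). A trivial sanity check, with a hypothetical helper name:

```python
# Sanity-check how many frames a clip of a given duration and frame rate
# should yield before running reconstruction. Helper name is illustrative.

def expected_frames(duration_s: float, fps: int) -> int:
    return int(round(duration_s * fps))

print(expected_frames(15, 30))  # 450
```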

<p align="center">
  <img src="./assets/dataset.png" width="100%">
</p>

## 📝 Citation

If you find this work useful in your research, please consider citing:

```bibtex
@inproceedings{xie2024vid2sim,
  title={Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation},
  author={Xie, Ziyang and Liu, Zhizheng and Peng, Zhenghao and Wu, Wayne and Zhou, Bolei},
  booktitle={CVPR},
  year={2025}
}
```
