Quick Start
To quickly get started and better understand each step of the Cisformer workflow, we recommend downloading the test files from the Cisformer GitHub repository and following the examples below.
Generate Default Config Files
cisformer generate_default_config
This command creates a cisformer_config folder in the current directory, containing:
accelerate_config.yamlatac2rna_config.yamlrna2atac_config.yaml
Cisformer uses Hugging Face Accelerate for distributed training. You may need to modify accelerate_config.yaml based on your environment. See the Accelerate launch guide for more details.
RNA ➝ ATAC
1. Configure Parameters (Optional)
Edit the RNA2ATAC configuration:
cisformer_config/rna2atac_config.yaml
It is recommanded to modify the parameter multiple to at least 40 to get a good performance.
Refer to the Usage page for detailed explanations of each parameter.
2. Preprocess Data
Cisformer expects input scRNA-seq and scATAC-seq data in Scanpy .h5ad format.
cisformer data_preprocess -r test_data/rna.h5ad -a test_data/atac.h5ad -s preprocessed_dataset
-r: RNA.h5adfile-a: ATAC.h5adfile-s: output directory
See the Usage page for additional options and output details.
3. Train the Model
cisformer rna2atac_train -t preprocessed_dataset/cisformer_rna2atac_train_dataset -v preprocessed_dataset/cisformer_rna2atac_val_dataset -n rna2atac_test
-t: training dataset path-v: validation dataset path-n: project name
A save directory will be created (can be customized with -s). Inside, a subdirectory like 2025-05-12_rna2atac_test will contain the trained models. The model from the final epoch is typically the best.
Refer to the Usage page for more training options and output details.
4. Run Prediction
cisformer rna2atac_predict -r preprocessed_dataset/test_rna.h5ad -m save/2025-05-12_rna2atac_test/epoch34/pytorch_model.bin
-r: RNA.h5adfile for prediction-m: trained model checkpoint
The predicted ATAC matrix will be saved to output/cisformer_predicted_atac.h5ad.
Refer to the Usage page for more training options and output details.
ATAC ➝ RNA
1. Configure Parameters (Optional)
Edit the ATAC2RNA configuration:
cisformer_config/atac2rna_config.yaml
Refer to the Usage page for parameter details.
2. Preprocess Data
cisformer data_preprocess -r test_data/rna.h5ad -a test_data/atac.h5ad -s preprocessed_dataset --atac2rna
--atac2rna: specifies the ATAC2RNA directionOther arguments are the same as in RNA2ATAC
See the Usage page for additional options and output details.
3. Train the Model
cisformer atac2rna_train -d preprocessed_dataset/cisformer_atac2rna_train_dataset -n atac2rna_test
-d: training dataset path-n: project name
A subdirectory like 2025-05-12_atac2rna_test will be created under save, containing the trained model.
Refer to the Usage page for more training options and output details.
4. Run Prediction (Optional)
cisformer atac2rna_predict -d preprocessed_dataset/cisformer_atac2rna_test_dataset/atac2rna_0.pt -m save/2025-05-12_atac2rna_test/epoch30/pytorch_model.bin
-d:.pttest dataset-m: trained model checkpoint
The predicted RNA matrix will be saved to output/cisformer_predicted_rna.h5ad.
Refer to the Usage page for more training options and output details.
5. Link Cis-Regulatory Elements to Genes
cisformer atac2rna_link -d preprocessed_dataset/cisformer_atac2rna_test_dataset/atac2rna_0.pt -m save/2025-05-12_atac2rna_test/epoch30/pytorch_model.bin -c test_data/celltype_info.tsv
-d: test dataset.ptfile (must be accompanied bycell_info.tsvin the same folder)-m: trained model-c: TSV file mapping cell barcodes to cell types (no header)
Example celltype_info.tsv:
GTACCGGGTATACTGG-1 CD14 Mono
ACTGAATGTCACCAAA-1 cDC2
AACCTTGCAAACTGTT-1 CD14 Mono
...
Linked cis-regulatory and gene information will be saved in output/cisformer_link/ as .h5ad files, organized by cell type.
Refer to the Usage page for more training options and output details.