conda info -e
conda activate busco_env
conda deactivate

# To display all available datasets
busco --list-datasets

# Running BUSCO
busco -m genome -i INPUT.nucleotides -o OUTPUT -l LINEAGE -c 20         # Genome mode: assessing a genome assembly
busco -m protein -i INPUT.amino_acids -o OUTPUT -l LINEAGE -c 20        # Protein mode: assessing a gene set
busco -m transcriptome -i INPUT.nucleotides -o OUTPUT -l LINEAGE -c 20  # Transcriptome mode: assessing assembled transcripts

# Important parameters
-i  Defines the input file to analyse, which is either a nucleotide FASTA file or a protein FASTA file, depending on the BUSCO mode. As of v5.1.0 the input can also be a directory containing FASTA files, which runs BUSCO in batch mode.
-o  Defines the folder that will contain all results, logs, and intermediate data.
-m  Sets the assessment mode: genome, proteins, transcriptome.
-l  Can be a dataset name, e.g. bacteria_odb10, or a path, e.g. /path/to/bacteria_odb10. In the former case, which is the recommended usage, BUSCO automatically downloads and versions the corresponding dataset. In the latter case, the dataset found at the given path is used. In general, select the most specific lineage available, e.g. for assessing fish data choose the *actinopterygii* lineage rather than the *metazoa* lineage.
    BUSCO downloads the specified dataset automatically at run time. If the download is slow, fetch the dataset manually from https://busco-data.ezlab.org/v5/data/lineages/ and pass its path when calling BUSCO.
-c  Specifies the number of threads to use.
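The choice between a dataset name and a local path for -l can be sketched as a small shell helper. This is only an illustration: `resolve_lineage` and the `$HOME/busco_downloads/lineages` folder are hypothetical names, and the sketch assumes a manually downloaded dataset has already been unpacked into a directory named after the dataset. The final command is printed rather than executed.

```shell
# Hypothetical helper (not part of BUSCO): prefer a manually downloaded,
# already unpacked lineage directory if it exists; otherwise fall back to
# the plain dataset name so that BUSCO downloads the dataset itself.
resolve_lineage() {
    rl_name="$1"   # dataset name, e.g. bacteria_odb10
    rl_dir="$2"    # folder holding unpacked, manually downloaded datasets
    if [ -d "${rl_dir}/${rl_name}" ]; then
        printf '%s\n' "${rl_dir}/${rl_name}"
    else
        printf '%s\n' "${rl_name}"
    fi
}

# Dry run: only print the BUSCO command that would be used.
# The download folder below is an assumed location, adjust it as needed.
lineage="$(resolve_lineage bacteria_odb10 "$HOME/busco_downloads/lineages")"
echo busco -m genome -i INPUT.nucleotides -o OUTPUT -l "$lineage" -c 20
```

Removing the `echo` would run the command for real; keeping the fallback to the bare dataset name preserves the recommended automatic download when no local copy is present.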
# Grant execute permission
chmod 755 summary_BUSCO_results.py

# Place all BUSCO short summary files (short_summary.specific.dataset.label.txt) in a single folder.
# Recommended usage
find . -name 'short_summary.specific*txt' | summary_BUSCO_results.py - > busco_statistics.tsv
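For a quick look at a single result without the summary script, the one-line score string in a short summary file can be picked apart with sed. The sketch below is not how summary_BUSCO_results.py works internally; `busco_field` is a hypothetical helper and the sample values are made up for illustration.

```shell
# Illustrative score line in the C/S/D/F/M/n format used by BUSCO short
# summary files. The numbers are invented for this example.
summary_line='C:98.4%[S:97.6%,D:0.8%],F:0.8%,M:0.8%,n:124'

# Hypothetical helper: print the numeric value of one field
# (C, S, D, F, M or n) from such a line.
busco_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1:\([0-9.]*\).*/\1/p"
}

busco_field C "$summary_line"   # complete BUSCOs (%)
busco_field M "$summary_line"   # missing BUSCOs (%)
busco_field n "$summary_line"   # total number of BUSCO groups searched
```

With the sample line above, the three calls print 98.4, 0.8 and 124, one per line.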
# Run generate_plot.py
python3 generate_plot.py -wd [WORKING_DIRECTORY] [OTHER OPTIONS]

# required arguments:
-wd     Defines the location of your working directory.
# optional arguments:
--no_r  Skips running R; only the R script file is created in the working directory.
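generate_plot.py reads the short summary files found in the working directory, so they usually have to be gathered there first. A minimal sketch of that step follows, assuming the BUSCO output folders sit under one parent directory; `collect_summaries`, `busco_runs` and `busco_plots` are hypothetical names, and the destination should live outside the searched tree so that find does not revisit its own copies.

```shell
# Hypothetical helper: copy every short summary file found below a source
# directory into one working directory for plotting.
collect_summaries() {
    cs_src="$1"    # parent directory containing the BUSCO output folders
    cs_dest="$2"   # working directory later passed to generate_plot.py -wd
    mkdir -p "$cs_dest"
    find "$cs_src" -type f -name 'short_summary.*.txt' -exec cp {} "$cs_dest"/ \;
}

# Example usage (paths are placeholders):
#   collect_summaries ./busco_runs ./busco_plots
#   python3 generate_plot.py -wd ./busco_plots
```

Files with identical names from different runs would overwrite each other in the destination, so distinct output labels per run are assumed.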