安装#
环境准备#
确保显卡驱动正确安装即可,例如在 NVIDIA GPU 设备上,nvidia-smi 的 Driver Version 需要大于 550.127.08
XTuner 安装#
git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e .
不同的任务 xtuner 依赖会有所不同,比如想训练 gpt-oss 模型,需要强制安装 torch2.8。
训练 MoE 模型建议额外安装 GroupedGEMM。
pip install git+https://github.com/InternLM/GroupedGEMM.git@main
如果需要训练 FP8 MoE 模型,除了安装上述 GroupedGEMM 外需要额外安装 AdaptiveGEMM。
pip install git+https://github.com/InternLM/AdaptiveGEMM.git@main
小技巧
对于 Hopper 架构的 GPU,可以额外安装 fa3,并通过 export XTUNER_USE_FA3=1 启用 fa3
此外,XTuner 推荐安装 flash-attn,RL 推荐安装 flash-attn-3,能够显著提升训练速度。可以参考官方文档进行安装。
如果想抢先体验 RL 相关功能,则需要执行下述命令来安装 RL 部分依赖。除此之外,需要安装你选择的推理引擎。以LMDeploy为例,可参考官网文档进行安装。
pip install -r requirements/rl.txt
# 或者直接安装
# pip install -e '.[rl]'
XTuner 校验#
LLM 大模型微调#
单卡启动一次简单的 LLM 微调任务,验证安装是否成功:
备注
运行出现问题?不然看看 常见问题
1torchrun xtuner/v1/train/cli/sft.py --model-cfg examples/v1/config/sft_qwen3_tiny.py --chat_template qwen3 --dataset tests/resource/openai_sft.jsonl
运行成功后,日志如下所示
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 1/13 data_time: 0.0091 lr: 0.000060 time: 0.6024 text_tokens: 4054.0 total_loss: 12.075, reduced_llm_loss: 12.075 max_memory: 5.36 GB reserved_memory: 7.05 GB grad_norm: 15.847 tgs: 6729.9 e2e_tgs: 6630.0
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 2/13 data_time: 0.0065 lr: 0.000060 time: 0.0515 text_tokens: 3964.0 total_loss: 10.779, reduced_llm_loss: 10.779 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 9.420 tgs: 76934.8 e2e_tgs: 11936.3
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 3/13 data_time: 0.0064 lr: 0.000059 time: 0.0505 text_tokens: 3898.0 total_loss: 10.176, reduced_llm_loss: 10.176 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 7.346 tgs: 77123.4 e2e_tgs: 16303.9
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 4/13 data_time: 0.0087 lr: 0.000057 time: 0.0509 text_tokens: 4034.0 total_loss: 9.867, reduced_llm_loss: 9.867 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 6.393 tgs: 79315.5 e2e_tgs: 20124.4
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 5/13 data_time: 0.0090 lr: 0.000053 time: 0.0523 text_tokens: 3991.0 total_loss: 9.639, reduced_llm_loss: 9.639 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 6.080 tgs: 76342.2 e2e_tgs: 23295.4
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 6/13 data_time: 0.0063 lr: 0.000047 time: 0.0522 text_tokens: 3998.0 total_loss: 9.455, reduced_llm_loss: 9.455 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 6.155 tgs: 76605.0 e2e_tgs: 26114.5
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 7/13 data_time: 0.0091 lr: 0.000041 time: 0.0520 text_tokens: 4004.0 total_loss: 9.332, reduced_llm_loss: 9.332 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 5.986 tgs: 77040.2 e2e_tgs: 28515.1
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 8/13 data_time: 0.0087 lr: 0.000034 time: 0.0499 text_tokens: 3966.0 total_loss: 9.307, reduced_llm_loss: 9.307 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 5.740 tgs: 79467.9 e2e_tgs: 30659.3
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 9/13 data_time: 0.0062 lr: 0.000027 time: 0.0501 text_tokens: 4083.0 total_loss: 9.091, reduced_llm_loss: 9.091 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 6.092 tgs: 81479.0 e2e_tgs: 32743.6
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 10/13 data_time: 0.0057 lr: 0.000020 time: 0.0502 text_tokens: 4044.0 total_loss: 9.120, reduced_llm_loss: 9.120 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 6.090 tgs: 80491.7 e2e_tgs: 34594.4
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 11/13 data_time: 0.0111 lr: 0.000014 time: 0.0548 text_tokens: 4042.0 total_loss: 9.287, reduced_llm_loss: 9.287 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 5.432 tgs: 73822.0 e2e_tgs: 35971.9
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 12/13 data_time: 0.0051 lr: 0.000008 time: 0.0509 text_tokens: 4010.0 total_loss: 9.007, reduced_llm_loss: 9.007 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 5.998 tgs: 78743.6 e2e_tgs: 37460.4
[XTuner][RANK 0][2025-09-08 03:04:13][INFO] Step 13/13 data_time: 0.0081 lr: 0.000004 time: 0.0497 text_tokens: 4000.0 total_loss: 9.129, reduced_llm_loss: 9.129 max_memory: 7.06 GB reserved_memory: 8.79 GB grad_norm: 5.567 tgs: 80562.5 e2e_tgs: 38765.3
上述日志显示最大仅需 8G 显存即可运行,如果你还想降低显存占用,可以考虑修改 examples/v1/config/sft_qwen3_tiny.py 中的 num_hidden_layers 和 hidden_size 参数。
MLLM 多模态大模型微调#
单卡启动一次简单的 MLLM 微调任务,验证安装是否成功:
以 Intern-S1 科学多模态为例
1torchrun xtuner/v1/train/cli/sft.py --config examples/v1/config/sft_intern_s1_tiny_config.py
运行成功后,日志如下所示
[XTuner][2025-09-08 03:09:17][INFO] Using toy tokenizer: <xtuner.v1.train.toy_tokenizer.UTF8ByteTokenizer object at 0x7f4c4a256b70>!
[XTuner][2025-09-08 03:09:17][INFO]
============XTuner Training Environment============
XTUNER_DETERMINISTIC: None
XTUNER_FILE_OPEN_CONCURRENCY: None
XTUNER_TOKENIZE_CHUNK_SIZE: None
XTUNER_TOKENIZE_WORKERS: None
XTUNER_ACTIVATION_OFFLOAD: None
XTUNER_USE_FA3: None
XTUNER_GROUP_GEMM: cutlass
XTUNER_DISPATCHER_DEBUG: None
XTUNER_ROUTER_DEBUG: None
==================================================
[XTuner][RANK 0][2025-09-08 03:09:18][WARNING] Model pad_token_id 151645 is different from tokenizer pad_token_id 258. Using tokenizer pad_token_id 258.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [mllm_sft_text_example_data.jsonl] Using dynamic image size: True and max_dynamic_patch: 12 and min_dynamic_patch: 1 and use_thumbnail: True data_aug: False for training.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] Start loading [pure_text]tests/resource/mllm_sft_text_example_data.jsonl with sample_ratio=1.0.
WARNING: input_ids length 4304 exceeds model_max_length 4096. truncated!
WARNING: input_ids length 4639 exceeds model_max_length 4096. truncated!
WARNING: input_ids length 4421 exceeds model_max_length 4096. truncated!
WARNING: input_ids length 5397 exceeds model_max_length 4096. truncated!
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [Dataset] (Original) pure_text/mllm_sft_text_example_data.jsonl: 200 samples.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [mllm_sft_media_example_data.jsonl] Using dynamic image size: True and max_dynamic_patch: 12 and min_dynamic_patch: 1 and use_thumbnail: True data_aug: False for training.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] Start loading [media]tests/resource/mllm_sft_media_example_data.jsonl with sample_ratio=2.0.
WARNING: input_ids length 4171 exceeds model_max_length 4096. truncated!
WARNING: input_ids length 4188 exceeds model_max_length 4096. truncated!
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [Dataset] (Original) media/mllm_sft_media_example_data.jsonl: 44 samples.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [Dataset] Start packing data of ExpandSoftPackDataset.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] Using 8 pack workers for packing datasets.
1it [00:00, 294.11it/s]
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [Dataset] (Original) 244 samples.
[XTuner][RANK 0][2025-09-08 03:09:18][INFO] [Dataset] (Packed) 109 samples.
[FSDP Sharding]: 0%| | 0/8 [00:00<?, ?it/s]/cpfs01/shared/llm_razor/huanghaian/code/refactor_xtuner/xtuner/xtuner/v1/model/utils/checkpointing.py:92: FutureWarning: Please specify CheckpointImpl.NO_REENTRANT as CheckpointImpl.REENTRANT will soon be removed as the default and eventually deprecated.
return ptd_checkpoint_wrapper(module, *args, **kwargs)
[FSDP Sharding]: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 278.75it/s]
[Vision Fully Shard]: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 295.44it/s]
[XTuner][RANK 0][2025-09-08 03:09:19][INFO] FSDPInternS1ForConditionalGeneration(
(vision_tower): FSDPInternS1VisionModel(
(embeddings): InternVLVisionEmbeddings(
(patch_embeddings): InternVLVisionPatchEmbeddings(
(projection): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14))
)
(dropout): Dropout(p=0.0, inplace=False)
)
(encoder): InternS1VisionEncoder(
(layer): ModuleList(
(0-23): 24 x FSDPCheckpointWrapper(
(_checkpoint_wrapped_module): InternS1VisionLayer(
(attention): InternS1VisionAttention(
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(projection_layer): Linear(in_features=1024, out_features=1024, bias=True)
(projection_dropout): Identity()
(q_norm): Identity()
(k_norm): Identity()
)
(mlp): InternVLVisionMLP(
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
)
(layernorm_before): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(layernorm_after): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
(drop_path1): Identity()
(drop_path2): Identity()
)
)
)
)
(layernorm): Identity()
)
(multi_modal_projector): FSDPInternS1MultiModalProjector(
(layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
(linear_1): Linear(in_features=4096, out_features=1024, bias=True)
(act): GELUActivation()
(linear_2): Linear(in_features=1024, out_features=1024, bias=True)
)
(language_model): FSDPQwen3Dense(
(norm): FSDPRMSNorm((1024,), eps=1e-06)
(lm_head): FSDPLMHead(in_features=1024, out_features=300, bias=False)
(layers): ModuleDict(
(0): FSDPCheckpointWrapper(
(_checkpoint_wrapped_module): DenseDecoderLayer(
(self_attn): MultiHeadAttention(
(q_proj): _Linear(in_features=1024, out_features=4096, bias=False)
(k_proj): _Linear(in_features=1024, out_features=1024, bias=False)
(v_proj): _Linear(in_features=1024, out_features=1024, bias=False)
(o_proj): _Linear(in_features=4096, out_features=1024, bias=False)
(q_norm): RMSNorm((128,), eps=1e-06)
(k_norm): RMSNorm((128,), eps=1e-06)
)
(mlp): DenseMLP(
(gate_proj): _Linear(in_features=1024, out_features=4096, bias=False)
(up_proj): _Linear(in_features=1024, out_features=4096, bias=False)
(down_proj): _Linear(in_features=4096, out_features=1024, bias=False)
(act_fn): SiLU()
)
(input_layernorm): RMSNorm((1024,), eps=1e-06)
(post_attention_layernorm): RMSNorm((1024,), eps=1e-06)
)
)
(1): FSDPCheckpointWrapper(
(_checkpoint_wrapped_module): DenseDecoderLayer(
(self_attn): MultiHeadAttention(
(q_proj): _Linear(in_features=1024, out_features=4096, bias=False)
(k_proj): _Linear(in_features=1024, out_features=1024, bias=False)
(v_proj): _Linear(in_features=1024, out_features=1024, bias=False)
(o_proj): _Linear(in_features=4096, out_features=1024, bias=False)
(q_norm): RMSNorm((128,), eps=1e-06)
(k_norm): RMSNorm((128,), eps=1e-06)
)
(mlp): DenseMLP(
(gate_proj): _Linear(in_features=1024, out_features=4096, bias=False)
(up_proj): _Linear(in_features=1024, out_features=4096, bias=False)
(down_proj): _Linear(in_features=4096, out_features=1024, bias=False)
(act_fn): SiLU()
)
(input_layernorm): RMSNorm((1024,), eps=1e-06)
(post_attention_layernorm): RMSNorm((1024,), eps=1e-06)
)
)
...
)
(rotary_emb): RotaryEmbedding()
(embed_tokens): FSDPEmbedding(300, 1024, padding_idx=258)
)
)
[XTuner][RANK 0][2025-09-08 03:09:19][INFO] Total trainable parameters: 494.0M, total parameters: 494.0M
[XTuner][RANK 0][2025-09-08 03:09:19][INFO] Untrainable parameters names: []
[XTuner][RANK 0][2025-09-08 03:09:20][INFO] grad_accumulation_steps: 1
[XTuner][RANK 0][2025-09-08 03:09:21][INFO] Step 1/109 data_time: 0.0178 lr: 0.000100 time: 1.3009 text_tokens: 3915.0 total_loss: 5.852, reduced_llm_loss: 5.852 max_memory: 7.49 GB reserved_memory: 7.86 GB grad_norm: 17.179 tgs: 3009.5 e2e_tgs: 2968.7
[XTuner][RANK 0][2025-09-08 03:09:21][INFO] Step 2/109 data_time: 0.0873 lr: 0.000100 time: 0.4088 text_tokens: 3469.0 total_loss: 5.776, reduced_llm_loss: 5.776 max_memory: 8.18 GB reserved_memory: 9.55 GB grad_norm: 20.855 tgs: 8485.0 e2e_tgs: 4067.1
[XTuner][RANK 0][2025-09-08 03:09:22][INFO] Step 3/109 data_time: 0.0741 lr: 0.000100 time: 0.3770 text_tokens: 3901.0 total_loss: 4.781, reduced_llm_loss: 4.781 max_memory: 8.24 GB reserved_memory: 9.77 GB grad_norm: 21.934 tgs: 10348.8 e2e_tgs: 4977.3
[XTuner][RANK 0][2025-09-08 03:09:22][INFO] Step 4/109 data_time: 0.0181 lr: 0.000100 time: 0.3268 text_tokens: 3286.0 total_loss: 5.629, reduced_llm_loss: 5.629 max_memory: 7.66 GB reserved_memory: 9.77 GB grad_norm: 10.874 tgs: 10054.8 e2e_tgs: 5576.8
[XTuner][RANK 0][2025-09-08 03:09:22][INFO] Step 5/109 data_time: 0.0114 lr: 0.000100 time: 0.3039 text_tokens: 3624.0 total_loss: 5.303, reduced_llm_loss: 5.303 max_memory: 7.59 GB reserved_memory: 9.77 GB grad_norm: 8.132 tgs: 11926.8 e2e_tgs: 6212.8
[XTuner][RANK 0][2025-09-08 03:09:23][INFO] Step 6/109 data_time: 0.0734 lr: 0.000100 time: 0.3455 text_tokens: 3602.0 total_loss: 5.339, reduced_llm_loss: 5.339 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 10.068 tgs: 10424.9 e2e_tgs: 6510.3
[XTuner][RANK 0][2025-09-08 03:09:23][INFO] Step 7/109 data_time: 0.0821 lr: 0.000100 time: 0.3496 text_tokens: 3850.0 total_loss: 4.532, reduced_llm_loss: 4.532 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 5.207 tgs: 11011.1 e2e_tgs: 6780.2
[XTuner][RANK 0][2025-09-08 03:09:24][INFO] Step 8/109 data_time: 0.0774 lr: 0.000100 time: 0.3452 text_tokens: 3514.0 total_loss: 4.550, reduced_llm_loss: 4.550 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 4.633 tgs: 10180.4 e2e_tgs: 6933.3
[XTuner][RANK 0][2025-09-08 03:09:24][INFO] Step 9/109 data_time: 0.0132 lr: 0.000100 time: 0.3076 text_tokens: 4036.0 total_loss: 5.225, reduced_llm_loss: 5.225 max_memory: 7.59 GB reserved_memory: 9.77 GB grad_norm: 6.815 tgs: 13122.0 e2e_tgs: 7332.6
[XTuner][RANK 0][2025-09-08 03:09:24][INFO] Step 10/109 data_time: 0.0165 lr: 0.000100 time: 0.3021 text_tokens: 3603.0 total_loss: 4.614, reduced_llm_loss: 4.614 max_memory: 7.66 GB reserved_memory: 9.77 GB grad_norm: 6.599 tgs: 11928.0 e2e_tgs: 7593.1
[XTuner][RANK 0][2025-09-08 03:09:25][INFO] Step 11/109 data_time: 0.0771 lr: 0.000100 time: 0.3457 text_tokens: 3509.0 total_loss: 4.728, reduced_llm_loss: 4.728 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 7.904 tgs: 10149.9 e2e_tgs: 7648.9
[XTuner][RANK 0][2025-09-08 03:09:25][INFO] Step 12/109 data_time: 0.0128 lr: 0.000100 time: 0.3073 text_tokens: 3624.0 total_loss: 4.677, reduced_llm_loss: 4.677 max_memory: 7.59 GB reserved_memory: 9.77 GB grad_norm: 3.419 tgs: 11792.1 e2e_tgs: 7858.2
WARNING: input_ids length 4171 exceeds model_max_length 4096. truncated!
[XTuner][RANK 0][2025-09-08 03:09:25][INFO] Step 13/109 data_time: 0.0759 lr: 0.000100 time: 0.3480 text_tokens: 4095.0 total_loss: 4.579, reduced_llm_loss: 4.579 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 7.580 tgs: 11767.7 e2e_tgs: 7984.5
[XTuner][RANK 0][2025-09-08 03:09:26][INFO] Step 14/109 data_time: 0.0135 lr: 0.000100 time: 0.3094 text_tokens: 3813.0 total_loss: 4.531, reduced_llm_loss: 4.531 max_memory: 7.59 GB reserved_memory: 9.77 GB grad_norm: 3.034 tgs: 12323.2 e2e_tgs: 8178.5
[XTuner][RANK 0][2025-09-08 03:09:26][INFO] Step 15/109 data_time: 0.0329 lr: 0.000100 time: 0.3346 text_tokens: 4057.0 total_loss: 4.348, reduced_llm_loss: 4.348 max_memory: 7.79 GB reserved_memory: 9.77 GB grad_norm: 2.568 tgs: 12125.5 e2e_tgs: 8334.6
[XTuner][RANK 0][2025-09-08 03:09:26][INFO] Step 16/109 data_time: 0.0170 lr: 0.000100 time: 0.3209 text_tokens: 3852.0 total_loss: 4.243, reduced_llm_loss: 4.243 max_memory: 7.66 GB reserved_memory: 9.77 GB grad_norm: 1.819 tgs: 12003.9 e2e_tgs: 8480.7
WARNING: input_ids length 4188 exceeds model_max_length 4096. truncated!
[XTuner][RANK 0][2025-09-08 03:09:27][INFO] Step 17/109 data_time: 0.0573 lr: 0.000100 time: 0.3553 text_tokens: 4095.0 total_loss: 4.738, reduced_llm_loss: 4.738 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 8.874 tgs: 11525.3 e2e_tgs: 8559.7
[XTuner][RANK 0][2025-09-08 03:09:27][INFO] Step 18/109 data_time: 0.0139 lr: 0.000100 time: 0.3192 text_tokens: 3838.0 total_loss: 4.512, reduced_llm_loss: 4.512 max_memory: 7.59 GB reserved_memory: 9.77 GB grad_norm: 3.116 tgs: 12025.6 e2e_tgs: 8685.5
[XTuner][RANK 0][2025-09-08 03:09:28][INFO] Step 19/109 data_time: 0.0177 lr: 0.000100 time: 0.3119 text_tokens: 3509.0 total_loss: 4.229, reduced_llm_loss: 4.229 max_memory: 7.66 GB reserved_memory: 9.77 GB grad_norm: 2.872 tgs: 11251.3 e2e_tgs: 8764.3
[XTuner][RANK 0][2025-09-08 03:09:28][INFO] Step 20/109 data_time: 0.0630 lr: 0.000100 time: 0.3474 text_tokens: 3897.0 total_loss: 4.070, reduced_llm_loss: 4.070 max_memory: 8.18 GB reserved_memory: 9.77 GB grad_norm: 6.970 tgs: 11217.2 e2e_tgs: 8798.9
[XTuner][RANK 0][2025-09-08 03:09:28][INFO] Step 21/109 data_time: 0.0187 lr: 0.000100 time: 0.3068 text_tokens: 3827.0 total_loss: 4.054, reduced_llm_loss: 4.054 max_memory: 7.66 GB reserved_memory: 9.77 GB grad_norm: 2.400 tgs: 12473.7 e2e_tgs: 8907.0
[XTuner][RANK 0][2025-09-08 03:09:29][INFO] Step 22/109 data_time: 0.0684 lr: 0.000100 time: 0.3509 text_tokens: 4092.0 total_loss: 3.505, reduced_llm_loss: 3.505 max_memory: 8.24 GB reserved_memory: 9.77 GB grad_norm: 2.953 tgs: 11663.1 e2e_tgs: 8944.9
[XTuner][RANK 0][2025-09-08 03:09:29][INFO] Step 23/109 data_time: 0.0919 lr: 0.000100 time: 0.3813 text_tokens: 3960.0 total_loss: 3.896, reduced_llm_loss: 3.896 max_memory: 8.30 GB reserved_memory: 10.59 GB grad_norm: 3.337 tgs: 10384.5 e2e_tgs: 8916.4
[XTuner][RANK 0][2025-09-08 03:09:30][INFO] Step 24/109 data_time: 0.0234 lr: 0.000100 time: 0.3595 text_tokens: 3967.0 total_loss: 4.578, reduced_llm_loss: 4.578 max_memory: 7.72 GB reserved_memory: 10.59 GB grad_norm: 3.265 tgs: 11035.0 e2e_tgs: 8970.3
WARNING: input_ids length 4171 exceeds model_max_length 4096. truncated!
[XTuner][RANK 0][2025-09-08 03:09:30][INFO] Step 25/109 data_time: 0.0734 lr: 0.000100 time: 0.3467 text_tokens: 4095.0 total_loss: 5.200, reduced_llm_loss: 5.200 max_memory: 8.18 GB reserved_memory: 10.59 GB grad_norm: 7.238 tgs: 11810.1 e2e_tgs: 9000.7
上述日志显示最大仅需 10G 显存即可运行,如果你还想降低显存占用,可以考虑修改 examples/v1/config/sft_intern_s1_tiny_config.py 中的 llm_cfg 字典相关参数。
常见问题#
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
解决方法:
pip uninstall opencv-python pip install opencv-python-headless