bedrock-fine-tuning
bedrock-fine-tuning skill from adaptationio/skrillz
Installation
/plugin marketplace add adaptationio/Skrillz
/plugin install skrillz@adaptationio-Skrillz
/plugin enable skrillz@adaptationio-Skrillz
/plugin marketplace add /path/to/skrillz
/plugin install skrillz@local
+ 4 more commands
Skill Details
Amazon Bedrock Model Customization with fine-tuning, continued pre-training, reinforcement fine-tuning (NEW 2025 - 66% accuracy gains), and distillation. Create customization jobs, monitor training, deploy custom models, and evaluate performance. Use when customizing Claude, Titan, or other Bedrock models for domain-specific tasks, adapting to proprietary data, improving accuracy on specialized workflows, or distilling large models to smaller ones.
# Amazon Bedrock Model Customization
Complete guide to customizing Amazon Bedrock foundation models through fine-tuning, continued pre-training, reinforcement fine-tuning, and distillation.
Overview
Amazon Bedrock Model Customization allows you to adapt foundation models to your specific use cases without managing infrastructure. Four customization approaches are available (a quick selection heuristic follows the list):
1. Fine-Tuning (Supervised Learning)
Adapt models to specific tasks using labeled examples (input-output pairs). Best for:
- Task-specific optimization (classification, extraction, generation)
- Improving responses for domain terminology
- Teaching specific output formats
- Typical gains: 20-40% accuracy improvement
2. Continued Pre-Training (Domain Adaptation)
Continue training on unlabeled domain-specific text to build domain knowledge. Best for:
- Medical, legal, financial, technical domains
- Proprietary knowledge bases
- Industry-specific language
- Typical gains: 15-30% domain accuracy improvement
3. Reinforcement Fine-Tuning (NEW 2025)
Use reinforcement learning with human feedback (RLHF) or AI feedback (RLAIF) for alignment. Best for:
- Improving response quality and safety
- Aligning to brand voice and values
- Reducing hallucinations
- Typical gains: 40-66% accuracy improvement (AWS announced 66% gains in 2025)
4. Distillation (Teacher-Student)
Transfer knowledge from larger models to smaller, faster models. Best for:
- Cost optimization (smaller models are cheaper)
- Latency reduction (faster inference)
- Maintaining quality while reducing size
- Typical gains: 80-90% of teacher model quality at 50-70% cost reduction
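Which approach fits usually follows from the data you have. As a rough selection heuristic summarizing the list above (a sketch of the decision logic, not any Bedrock API):

```python
def suggest_customization_approach(
    has_labeled_pairs: bool,       # prompt/completion examples
    has_preference_pairs: bool,    # chosen/rejected responses
    has_domain_corpus: bool,       # unlabeled domain text
    needs_cheaper_inference: bool  # cost/latency is the main driver
) -> str:
    """Map available data and goals to one of the four approaches."""
    if needs_cheaper_inference:
        return 'DISTILLATION'               # prompts only; teacher generates targets
    if has_preference_pairs:
        return 'REINFORCEMENT_FINE_TUNING'  # align with preference data
    if has_labeled_pairs:
        return 'FINE_TUNING'                # supervised task adaptation
    if has_domain_corpus:
        return 'CONTINUED_PRE_TRAINING'     # build domain knowledge
    return 'PROMPT_ENGINEERING'             # no training data: start simpler
```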
Supported Models
| Model | Fine-Tuning | Continued Pre-Training | Reinforcement | Distillation |
|-------|-------------|------------------------|---------------|--------------|
| Claude 3.5 Sonnet | ✅ | ✅ | ✅ (2025) | ✅ (teacher) |
| Claude 3 Haiku | ✅ | ✅ | ✅ (2025) | ✅ (student) |
| Claude 3 Opus | ✅ | ✅ | ✅ (2025) | ✅ (teacher) |
| Titan Text G1 | ✅ | ✅ | ❌ | ❌ |
| Titan Text Lite | ✅ | ✅ | ❌ | ✅ (student) |
| Titan Embeddings | ✅ | ❌ | ❌ | ❌ |
| Cohere Command | ✅ | ❌ | ❌ | ❌ |
| AI21 Jurassic-2 | ✅ | ❌ | ❌ | ❌ |
Note: Availability varies by region. Check AWS Console for latest model support.
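You can also check which base models support a given customization type in your region programmatically; a minimal sketch using the `byCustomizationType` filter of the boto3 `list_foundation_models` call:

```python
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

# List base models that support each customization type in this region
for ctype in ('FINE_TUNING', 'CONTINUED_PRE_TRAINING'):
    response = bedrock.list_foundation_models(byCustomizationType=ctype)
    for model in response['modelSummaries']:
        print(f"{ctype}: {model['modelId']}")
```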
Training Data Formats
Fine-Tuning Format (JSONL)
```jsonl
{"prompt": "Classify the medical condition: Patient presents with fever, cough, and fatigue.", "completion": "Likely viral infection. Recommend rest, hydration, and symptomatic treatment."}
{"prompt": "Classify the medical condition: Patient has chest pain, shortness of breath, and dizziness.", "completion": "Potential cardiac event. Immediate emergency evaluation required."}
{"prompt": "Classify the medical condition: Patient reports persistent headache and light sensitivity.", "completion": "Possible migraine. Consider neurological consultation if symptoms persist."}
```
Requirements:
- Minimum 32 examples (recommended: 1000+)
- Maximum 10,000 examples per job
- Each example: prompt + completion
- JSONL format (one JSON object per line)
- Max 32K tokens per example
Continued Pre-Training Format (JSONL)
```jsonl
{"text": "The HIPAA Privacy Rule establishes national standards for protecting individuals' medical records and personal health information. Covered entities must implement safeguards to ensure confidentiality."}
{"text": "Electronic health records (EHR) systems integrate patient data from multiple sources, enabling comprehensive care coordination. Interoperability standards like HL7 FHIR facilitate data exchange."}
{"text": "Clinical decision support systems (CDSS) analyze patient data to provide evidence-based recommendations. Integration with EHR workflows improves diagnostic accuracy and treatment outcomes."}
```
Requirements:
- Minimum 1000 examples (recommended: 10,000+)
- Maximum 100,000 examples per job
- Unlabeled text only
- JSONL format
- Max 32K tokens per document (longer documents can be split; see the chunking sketch below)
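Documents over the limit can be split before upload. A minimal chunking sketch, assuming a rough words-to-tokens ratio (~1.3 tokens per word) rather than a real tokenizer:

```python
import json

def chunk_document(text: str, max_words: int = 20000) -> list:
    """Split long text into word-count chunks that stay safely under 32K tokens."""
    words = text.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def write_corpus(documents: list, output_path: str):
    """Write chunked documents as continued pre-training JSONL."""
    with open(output_path, 'w') as f:
        for doc in documents:
            for chunk in chunk_document(doc):
                f.write(json.dumps({'text': chunk}) + '\n')
```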
Reinforcement Fine-Tuning Format (JSONL)
```jsonl
{"prompt": "Explain type 2 diabetes to a patient.", "chosen": "Type 2 diabetes is a condition where your body doesn't use insulin properly. This causes high blood sugar. Managing it involves healthy eating, exercise, and sometimes medication.", "rejected": "Type 2 diabetes mellitus is characterized by insulin resistance and relative insulin deficiency leading to hyperglycemia."}
{"prompt": "What should I do if I miss a dose?", "chosen": "If you miss a dose, take it as soon as you remember. If it's almost time for your next dose, skip the missed one. Don't double up. Call your doctor if you have questions.", "rejected": "Consult the prescribing information or contact your healthcare provider immediately."}
```
Requirements:
- Minimum 100 preference pairs (recommended: 1000+)
- Each example: prompt + chosen response + rejected response
- JSONL format
- Max 32K tokens per example
- Ranking score optional (0.0-1.0); see the pair-building sketch below for deriving pairs from scored responses
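If your raw data is scored candidate responses rather than explicit pairs, you can derive chosen/rejected pairs from the scores. A sketch under an assumed input format (the `{"prompt": ..., "responses": [{"text": ..., "score": ...}]}` layout is illustrative, not a Bedrock format):

```python
import json

def scored_candidates_to_pairs(candidates_path: str, pairs_path: str):
    """Turn scored response candidates into preference pairs."""
    with open(candidates_path) as f_in, open(pairs_path, 'w') as f_out:
        for line in f_in:
            record = json.loads(line)
            ranked = sorted(record['responses'],
                            key=lambda r: r['score'], reverse=True)
            if len(ranked) < 2:
                continue  # need both a chosen and a rejected response
            f_out.write(json.dumps({
                'prompt': record['prompt'],
                'chosen': ranked[0]['text'],    # highest-scored response
                'rejected': ranked[-1]['text']  # lowest-scored response
            }) + '\n')
```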
Distillation Format (No Training Data Required)
Distillation uses the teacher model's outputs automatically:
```python
# Configuration only - no training data needed
distillation_config = {
    'teacherModelId': 'anthropic.claude-3-5-sonnet-20241022-v2:0',
    'studentModelId': 'anthropic.claude-3-haiku-20240307-v1:0',
    'distillationDataSource': {
        'promptDataset': {
            's3Uri': 's3://bucket/prompts.jsonl'  # Just prompts, no completions
        }
    }
}
```
Prompt Dataset Format:
```jsonl
{"prompt": "Explain the water cycle."}
{"prompt": "What are the symptoms of the flu?"}
{"prompt": "Describe photosynthesis."}
```
Requirements:
- Minimum 1000 prompts (recommended: 10,000+)
- Teacher model generates completions automatically
- Student model trained to match teacher outputs (a prompt-extraction sketch for reusing existing data follows)
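If you already have a fine-tuning dataset, its prompts can be reused as the distillation prompt set by stripping the completions; a small sketch:

```python
import json

def extract_prompts(finetuning_path: str, prompts_path: str):
    """Strip completions from fine-tuning JSONL to build a prompt-only dataset."""
    with open(finetuning_path) as f_in, open(prompts_path, 'w') as f_out:
        for line in f_in:
            example = json.loads(line)
            f_out.write(json.dumps({'prompt': example['prompt']}) + '\n')

extract_prompts('training_data.jsonl', 'prompts.jsonl')
```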
Quick Start
1. Prepare Training Data
```python
import json
# Fine-tuning examples
training_data = [
    {
        "prompt": "Classify sentiment: This product exceeded my expectations!",
        "completion": "Positive"
    },
    {
        "prompt": "Classify sentiment: Terrible customer service, very disappointed.",
        "completion": "Negative"
    },
    {
        "prompt": "Classify sentiment: The item was okay, nothing special.",
        "completion": "Neutral"
    }
]

# Save as JSONL
with open('training_data.jsonl', 'w') as f:
    for example in training_data:
        f.write(json.dumps(example) + '\n')
```
2. Upload to S3
```python
import boto3
s3 = boto3.client('s3')
bucket_name = 'my-bedrock-training-bucket'
# Upload training data
s3.upload_file('training_data.jsonl', bucket_name, 'fine-tuning/training_data.jsonl')
# Upload validation data (optional but recommended)
s3.upload_file('validation_data.jsonl', bucket_name, 'fine-tuning/validation_data.jsonl')
```
3. Create Customization Job
```python
bedrock = boto3.client('bedrock')
response = bedrock.create_model_customization_job(
    jobName='sentiment-classifier-v1',
    customModelName='sentiment-classifier',
    roleArn='arn:aws:iam::123456789012:role/BedrockCustomizationRole',
    baseModelIdentifier='anthropic.claude-3-haiku-20240307-v1:0',
    trainingDataConfig={
        's3Uri': f's3://{bucket_name}/fine-tuning/training_data.jsonl'
    },
    validationDataConfig={
        'validators': [{
            's3Uri': f's3://{bucket_name}/fine-tuning/validation_data.jsonl'
        }]
    },
    outputDataConfig={
        's3Uri': f's3://{bucket_name}/fine-tuning/output/'
    },
    hyperParameters={
        'epochCount': '3',
        'batchSize': '8',
        'learningRate': '0.00001'
    }
)
job_arn = response['jobArn']
print(f"Customization job created: {job_arn}")
```
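The `roleArn` above must point to an IAM role that the Bedrock service can assume and that can read the training data and write job outputs. A minimal sketch of creating such a role with boto3 (role and policy names are illustrative; scope the S3 resources to your own bucket):

```python
import json
import boto3

iam = boto3.client('iam')

# Trust policy allowing the Bedrock service to assume this role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}
iam.create_role(
    RoleName='BedrockCustomizationRole',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Inline policy granting S3 access to training inputs and job outputs
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-bedrock-training-bucket",
            "arn:aws:s3:::my-bedrock-training-bucket/*"
        ]
    }]
}
iam.put_role_policy(
    RoleName='BedrockCustomizationRole',
    PolicyName='BedrockCustomizationS3Access',
    PolicyDocument=json.dumps(s3_policy)
)
```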
4. Monitor Training
```python
# Check job status
response = bedrock.get_model_customization_job(jobIdentifier=job_arn)
status = response['status'] # InProgress, Completed, Failed, Stopped
print(f"Job status: {status}")
if status == 'Completed':
    custom_model_arn = response['outputModelArn']
    print(f"Custom model ARN: {custom_model_arn}")
```
5. Deploy and Test
```python
bedrock_runtime = boto3.client('bedrock-runtime')
# Use custom model (note: custom models generally require provisioned
# throughput before they can be invoked; see deploy-custom-model below)
response = bedrock_runtime.invoke_model(
    modelId=custom_model_arn,
    body=json.dumps({
        "prompt": "Classify sentiment: I love this product!",
        "max_tokens": 50
    })
)
result = json.loads(response['body'].read())
print(f"Prediction: {result['completion']}")
```
Operations
create-fine-tuning-job
Create a supervised fine-tuning job with labeled examples.
```python
import boto3
import json

def create_fine_tuning_job(
    job_name: str,
    model_name: str,
    base_model_id: str,
    training_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    validation_s3_uri: str = None,
    hyper_params: dict = None
) -> str:
    """
    Create fine-tuning job for task-specific adaptation.

    Args:
        job_name: Unique job identifier
        model_name: Name for custom model
        base_model_id: Base model ARN (e.g., Claude 3 Haiku)
        training_s3_uri: S3 path to training JSONL
        output_s3_uri: S3 path for outputs
        role_arn: IAM role with Bedrock + S3 permissions
        validation_s3_uri: Optional validation dataset
        hyper_params: Training hyperparameters

    Returns:
        Job ARN for monitoring
    """
    bedrock = boto3.client('bedrock')

    # Default hyperparameters
    if hyper_params is None:
        hyper_params = {
            'epochCount': '3',              # Number of training epochs
            'batchSize': '8',               # Batch size (4, 8, 16, 32)
            'learningRate': '0.00001',      # Learning rate (0.00001 - 0.0001)
            'learningRateWarmupSteps': '0'
        }

    # Build configuration
    config = {
        'jobName': job_name,
        'customModelName': model_name,
        'roleArn': role_arn,
        'baseModelIdentifier': base_model_id,
        'trainingDataConfig': {
            's3Uri': training_s3_uri
        },
        'outputDataConfig': {
            's3Uri': output_s3_uri
        },
        'hyperParameters': hyper_params,
        'customizationType': 'FINE_TUNING'
    }

    # Add validation data if provided
    if validation_s3_uri:
        config['validationDataConfig'] = {
            'validators': [{'s3Uri': validation_s3_uri}]
        }

    # Create job
    response = bedrock.create_model_customization_job(**config)
    print(f"Fine-tuning job created: {response['jobArn']}")
    return response['jobArn']

# Example: Fine-tune Claude 3 Haiku for medical classification
job_arn = create_fine_tuning_job(
    job_name='medical-classifier-v1',
    model_name='medical-classifier',
    base_model_id='anthropic.claude-3-haiku-20240307-v1:0',
    training_s3_uri='s3://my-bucket/medical/training.jsonl',
    output_s3_uri='s3://my-bucket/medical/output/',
    role_arn='arn:aws:iam::123456789012:role/BedrockCustomizationRole',
    validation_s3_uri='s3://my-bucket/medical/validation.jsonl',
    hyper_params={
        'epochCount': '5',
        'batchSize': '16',
        'learningRate': '0.00002'
    }
)
```
create-continued-pretraining-job
Create continued pre-training job for domain adaptation.
```python
def create_continued_pretraining_job(
    job_name: str,
    model_name: str,
    base_model_id: str,
    training_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    validation_s3_uri: str = None
) -> str:
    """
    Create continued pre-training job for domain knowledge.

    Args:
        job_name: Unique job identifier
        model_name: Name for custom model
        base_model_id: Base model ARN
        training_s3_uri: S3 path to unlabeled text JSONL
        output_s3_uri: S3 path for outputs
        role_arn: IAM role ARN
        validation_s3_uri: Optional validation dataset

    Returns:
        Job ARN for monitoring
    """
    bedrock = boto3.client('bedrock')
    config = {
        'jobName': job_name,
        'customModelName': model_name,
        'roleArn': role_arn,
        'baseModelIdentifier': base_model_id,
        'trainingDataConfig': {
            's3Uri': training_s3_uri
        },
        'outputDataConfig': {
            's3Uri': output_s3_uri
        },
        'hyperParameters': {
            'epochCount': '1',          # Usually 1 epoch for continued pre-training
            'batchSize': '16',
            'learningRate': '0.000005'  # Lower LR for stability
        },
        'customizationType': 'CONTINUED_PRE_TRAINING'
    }
    if validation_s3_uri:
        config['validationDataConfig'] = {
            'validators': [{'s3Uri': validation_s3_uri}]
        }
    response = bedrock.create_model_customization_job(**config)
    print(f"Continued pre-training job created: {response['jobArn']}")
    return response['jobArn']

# Example: Adapt Claude for medical domain
job_arn = create_continued_pretraining_job(
    job_name='medical-domain-adapter-v1',
    model_name='claude-medical',
    base_model_id='anthropic.claude-3-5-sonnet-20241022-v2:0',
    training_s3_uri='s3://my-bucket/medical-corpus/documents.jsonl',
    output_s3_uri='s3://my-bucket/medical-corpus/output/',
    role_arn='arn:aws:iam::123456789012:role/BedrockCustomizationRole'
)
```
create-reinforcement-finetuning-job
Create reinforcement fine-tuning job with preference data (NEW 2025).
```python
def create_reinforcement_finetuning_job(
    job_name: str,
    model_name: str,
    base_model_id: str,
    preference_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    algorithm: str = 'DPO'  # DPO, PPO, or RLAIF
) -> str:
    """
    Create reinforcement fine-tuning job for alignment (NEW 2025).

    Args:
        job_name: Unique job identifier
        model_name: Name for custom model
        base_model_id: Base model ARN
        preference_s3_uri: S3 path to preference pairs JSONL
        output_s3_uri: S3 path for outputs
        role_arn: IAM role ARN
        algorithm: RL algorithm (DPO, PPO, RLAIF)

    Returns:
        Job ARN for monitoring
    """
    bedrock = boto3.client('bedrock')
    config = {
        'jobName': job_name,
        'customModelName': model_name,
        'roleArn': role_arn,
        'baseModelIdentifier': base_model_id,
        'trainingDataConfig': {
            's3Uri': preference_s3_uri
        },
        'outputDataConfig': {
            's3Uri': output_s3_uri
        },
        'hyperParameters': {
            'epochCount': '3',
            'batchSize': '8',
            'learningRate': '0.00001',
            'rlAlgorithm': algorithm,
            'beta': '0.1'  # KL divergence coefficient
        },
        'customizationType': 'REINFORCEMENT_FINE_TUNING'
    }
    response = bedrock.create_model_customization_job(**config)
    print(f"Reinforcement fine-tuning job created: {response['jobArn']}")
    print("Expected accuracy gains: 40-66% improvement")
    return response['jobArn']

# Example: Improve response quality with preference learning
job_arn = create_reinforcement_finetuning_job(
    job_name='claude-aligned-v1',
    model_name='claude-aligned',
    base_model_id='anthropic.claude-3-5-sonnet-20241022-v2:0',
    preference_s3_uri='s3://my-bucket/preferences/pairs.jsonl',
    output_s3_uri='s3://my-bucket/preferences/output/',
    role_arn='arn:aws:iam::123456789012:role/BedrockCustomizationRole',
    algorithm='DPO'  # Direct Preference Optimization
)
```
create-distillation-job
Create distillation job to transfer knowledge from large to small model.
```python
def create_distillation_job(
    job_name: str,
    model_name: str,
    teacher_model_id: str,
    student_model_id: str,
    prompts_s3_uri: str,
    output_s3_uri: str,
    role_arn: str
) -> str:
    """
    Create distillation job to compress large model knowledge.

    Args:
        job_name: Unique job identifier
        model_name: Name for distilled model
        teacher_model_id: Large model to learn from
        student_model_id: Small model to train
        prompts_s3_uri: S3 path to prompts JSONL
        output_s3_uri: S3 path for outputs
        role_arn: IAM role ARN

    Returns:
        Job ARN for monitoring
    """
    bedrock = boto3.client('bedrock')
    config = {
        'jobName': job_name,
        'customModelName': model_name,
        'roleArn': role_arn,
        'baseModelIdentifier': student_model_id,
        'trainingDataConfig': {
            's3Uri': prompts_s3_uri
        },
        # Teacher model is specified via the distillation customization config
        'customizationConfig': {
            'distillationConfig': {
                'teacherModelConfig': {
                    'teacherModelIdentifier': teacher_model_id
                }
            }
        },
        'outputDataConfig': {
            's3Uri': output_s3_uri
        },
        'hyperParameters': {
            'epochCount': '3',
            'batchSize': '16',
            'learningRate': '0.00002',
            'temperature': '1.0',  # Softmax temperature for distillation
            'alpha': '0.5'         # Balance between hard and soft targets
        },
        'customizationType': 'DISTILLATION'
    }
    response = bedrock.create_model_customization_job(**config)
    print(f"Distillation job created: {response['jobArn']}")
    print(f"Teacher: {teacher_model_id}")
    print(f"Student: {student_model_id}")
    print("Expected: 80-90% teacher quality at 50-70% cost")
    return response['jobArn']

# Example: Distill Claude 3.5 Sonnet to Haiku
job_arn = create_distillation_job(
    job_name='claude-haiku-distilled-v1',
    model_name='claude-haiku-distilled',
    teacher_model_id='anthropic.claude-3-5-sonnet-20241022-v2:0',
    student_model_id='anthropic.claude-3-haiku-20240307-v1:0',
    prompts_s3_uri='s3://my-bucket/distillation/prompts.jsonl',
    output_s3_uri='s3://my-bucket/distillation/output/',
    role_arn='arn:aws:iam::123456789012:role/BedrockCustomizationRole'
)
```
monitor-job
Track training progress and retrieve metrics.
```python
import time
import boto3
from typing import Dict, Any

def monitor_job(job_arn: str, poll_interval: int = 60) -> Dict[str, Any]:
    """
    Monitor customization job until completion.

    Args:
        job_arn: Job ARN to monitor
        poll_interval: Seconds between status checks

    Returns:
        Final job details with metrics
    """
    bedrock = boto3.client('bedrock')
    print(f"Monitoring job: {job_arn}")
    while True:
        response = bedrock.get_model_customization_job(
            jobIdentifier=job_arn
        )
        status = response['status']
        print(f"Status: {status}", end='')

        # Show metrics if available
        if 'trainingMetrics' in response:
            metrics = response['trainingMetrics']
            if 'trainingLoss' in metrics:
                print(f" | Loss: {metrics['trainingLoss']:.4f}", end='')
        print()  # Newline

        # Check terminal states
        if status == 'Completed':
            print("Job completed successfully!")
            print(f"Custom model ARN: {response['outputModelArn']}")
            return response
        elif status == 'Failed':
            print(f"Job failed: {response.get('failureMessage', 'Unknown error')}")
            return response
        elif status == 'Stopped':
            print("Job was stopped")
            return response

        # Wait before next check
        time.sleep(poll_interval)

# Example: Monitor with automatic polling
job_details = monitor_job(job_arn, poll_interval=60)
if job_details['status'] == 'Completed':
    custom_model_arn = job_details['outputModelArn']
    # Download metrics from S3
    output_uri = job_details['outputDataConfig']['s3Uri']
    print(f"Metrics available at: {output_uri}")
```
deploy-custom-model
Provision custom model for inference.
```python
def deploy_custom_model(
    model_arn: str,
    provisioned_model_name: str,
    model_units: int = 1
) -> str:
    """
    Deploy custom model with provisioned throughput.

    Args:
        model_arn: Custom model ARN from training job
        provisioned_model_name: Name for provisioned model
        model_units: Throughput units (1-10)

    Returns:
        Provisioned model ARN for inference
    """
    bedrock = boto3.client('bedrock')
    response = bedrock.create_provisioned_model_throughput(
        provisionedModelName=provisioned_model_name,
        modelId=model_arn,
        modelUnits=model_units
    )
    provisioned_arn = response['provisionedModelArn']
    print(f"Provisioned model created: {provisioned_arn}")
    print(f"Throughput: {model_units} units")
    print("Allow 5-10 minutes for provisioning")
    return provisioned_arn

# Example: Deploy with standard throughput
provisioned_arn = deploy_custom_model(
    model_arn='arn:aws:bedrock:us-east-1:123456789012:custom-model/medical-classifier-v1',
    provisioned_model_name='medical-classifier-prod',
    model_units=2
)

# Wait for provisioning
time.sleep(300)  # 5 minutes (see the polling sketch after this block)

# Use provisioned model
bedrock_runtime = boto3.client('bedrock-runtime')
response = bedrock_runtime.invoke_model(
    modelId=provisioned_arn,
    body=json.dumps({
        "prompt": "Classify: Patient has fever and cough.",
        "max_tokens": 100
    })
)
result = json.loads(response['body'].read())
print(f"Prediction: {result['completion']}")
```
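Instead of sleeping for a fixed five minutes, you can poll until the provisioned model is in service; a sketch using `get_provisioned_model_throughput`:

```python
import time
import boto3

def wait_for_provisioning(provisioned_arn: str, poll_interval: int = 30) -> str:
    """Poll provisioned throughput status until it leaves 'Creating'."""
    bedrock = boto3.client('bedrock')
    while True:
        response = bedrock.get_provisioned_model_throughput(
            provisionedModelId=provisioned_arn
        )
        status = response['status']  # Creating, InService, Updating, Failed
        print(f"Provisioning status: {status}")
        if status != 'Creating':
            return status
        time.sleep(poll_interval)

if wait_for_provisioning(provisioned_arn) == 'InService':
    print("Provisioned model is ready for inference")
```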
evaluate-model
Test custom model performance with evaluation dataset.
```python
import json
import boto3
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from typing import Dict

def evaluate_model(
    model_id: str,
    test_data_path: str,
    output_path: str = None
) -> Dict[str, float]:
    """
    Evaluate custom model on test dataset.

    Args:
        model_id: Custom model ARN
        test_data_path: Path to test JSONL file
        output_path: Optional path to save predictions

    Returns:
        Evaluation metrics dictionary
    """
    bedrock_runtime = boto3.client('bedrock-runtime')

    # Load test data
    test_data = []
    with open(test_data_path, 'r') as f:
        for line in f:
            test_data.append(json.loads(line))

    # Run predictions
    predictions = []
    ground_truth = []
    print(f"Evaluating {len(test_data)} examples...")
    for i, example in enumerate(test_data):
        if i % 10 == 0:
            print(f"Progress: {i}/{len(test_data)}")
        # Invoke model
        response = bedrock_runtime.invoke_model(
            modelId=model_id,
            body=json.dumps({
                "prompt": example['prompt'],
                "max_tokens": 200
            })
        )
        result = json.loads(response['body'].read())
        prediction = result['completion'].strip()
        predictions.append(prediction)
        ground_truth.append(example['completion'].strip())

    # Calculate metrics
    accuracy = accuracy_score(ground_truth, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        ground_truth, predictions, average='weighted', zero_division=0
    )
    metrics = {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'total_examples': len(test_data)
    }

    print("\n=== Evaluation Results ===")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")

    # Save predictions if requested
    if output_path:
        results_df = pd.DataFrame({
            'prompt': [ex['prompt'] for ex in test_data],
            'ground_truth': ground_truth,
            'prediction': predictions
        })
        results_df.to_csv(output_path, index=False)
        print(f"Predictions saved to: {output_path}")

    return metrics

# Example: Evaluate medical classifier
metrics = evaluate_model(
    model_id='arn:aws:bedrock:us-east-1:123456789012:provisioned-model/medical-classifier-prod',
    test_data_path='test_data.jsonl',
    output_path='evaluation_results.csv'
)
```
Hyperparameter Tuning
Fine-Tuning Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| epochCount | 1-10 | 3 | Training passes over dataset |
| batchSize | 4-32 | 8 | Examples per training step |
| learningRate | 0.00001-0.0001 | 0.00001 | Step size for weight updates |
| learningRateWarmupSteps | 0-100 | 0 | Gradual LR increase steps |
Tuning Guidelines:
- Small dataset (<100 examples): Lower epochs (1-2), smaller batch (4-8)
- Medium dataset (100-1000): Standard settings (3 epochs, batch 8-16)
- Large dataset (>1000): Higher epochs (5-10), larger batch (16-32)
- Overfitting signs: Reduce epochs or increase batch size
- Underfitting signs: Increase epochs or raise the learning rate
Example Configurations
```python
# Configuration 1: Small dataset, quick iteration
small_dataset_params = {
    'epochCount': '2',
    'batchSize': '4',
    'learningRate': '0.00002',
    'learningRateWarmupSteps': '10'
}

# Configuration 2: Balanced, general purpose
balanced_params = {
    'epochCount': '3',
    'batchSize': '8',
    'learningRate': '0.00001',
    'learningRateWarmupSteps': '0'
}

# Configuration 3: Large dataset, high quality
large_dataset_params = {
    'epochCount': '5',
    'batchSize': '16',
    'learningRate': '0.000005',
    'learningRateWarmupSteps': '20'
}

# Configuration 4: Continued pre-training
pretraining_params = {
    'epochCount': '1',
    'batchSize': '16',
    'learningRate': '0.000005',
    'learningRateWarmupSteps': '0'
}
```
Data Preparation Best Practices
1. Data Quality
```python
def validate_training_data(data_path: str) -> bool:
    """
    Validate training data quality.

    Checks:
    - JSONL format validity
    - Required fields present
    - Token length within limits
    - Data distribution balance
    """
    import json
    from collections import Counter

    issues = []
    completion_distribution = Counter()

    with open(data_path, 'r') as f:
        for i, line in enumerate(f, 1):
            try:
                example = json.loads(line)
            except json.JSONDecodeError:
                issues.append(f"Line {i}: Invalid JSON")
                continue

            # Check required fields
            if 'prompt' not in example:
                issues.append(f"Line {i}: Missing 'prompt' field")
            if 'completion' not in example:
                issues.append(f"Line {i}: Missing 'completion' field")

            # Track completion distribution
            if 'completion' in example:
                completion_distribution[example['completion']] += 1

            # Check length (word count as a rough proxy, ~1.3 tokens/word)
            prompt_words = len(example.get('prompt', '').split())
            completion_words = len(example.get('completion', '').split())
            total_words = prompt_words + completion_words
            if total_words > 8000:  # Conservative threshold
                issues.append(f"Line {i}: Long example; may approach the 32K token limit")

    # Report issues
    if issues:
        print("Data Validation Issues:")
        for issue in issues[:10]:  # Show first 10
            print(f"  - {issue}")
        if len(issues) > 10:
            print(f"  ... and {len(issues) - 10} more issues")
        return False

    # Check distribution balance
    print("\nCompletion Distribution:")
    for completion, count in completion_distribution.most_common():
        print(f"  {completion}: {count}")

    # Warn about imbalance
    counts = list(completion_distribution.values())
    if max(counts) > 3 * min(counts):
        print("\nWarning: Imbalanced dataset detected")
        print("Consider balancing or stratified sampling")

    print("\nValidation passed!")
    return True

# Example usage
validate_training_data('training_data.jsonl')
```
2. Data Augmentation
```python
def augment_training_data(
    input_path: str,
    output_path: str,
    augmentation_factor: int = 2
):
    """
    Augment training data with paraphrasing and variations.

    Args:
        input_path: Original training data
        output_path: Augmented output file
        augmentation_factor: Multiplier for dataset size
    """
    import json
    import random

    # Load original data
    original_data = []
    with open(input_path, 'r') as f:
        for line in f:
            original_data.append(json.loads(line))

    # Augmentation strategies
    prompt_prefixes = [
        "",
        "Please ",
        "Could you ",
        "I need you to "
    ]

    augmented_data = []
    for example in original_data:
        # Include original
        augmented_data.append(example)
        # Create variations
        for _ in range(augmentation_factor - 1):
            prefix = random.choice(prompt_prefixes)
            augmented_example = {
                'prompt': prefix + example['prompt'],
                'completion': example['completion']
            }
            augmented_data.append(augmented_example)

    # Save augmented data
    with open(output_path, 'w') as f:
        for example in augmented_data:
            f.write(json.dumps(example) + '\n')

    print(f"Augmented {len(original_data)} → {len(augmented_data)} examples")

# Example usage
augment_training_data('training_data.jsonl', 'training_data_augmented.jsonl')
```
3. Train/Validation Split
```python
def split_dataset(
    input_path: str,
    train_path: str,
    val_path: str,
    val_split: float = 0.2
):
    """
    Split dataset into training and validation sets.

    Args:
        input_path: Full dataset JSONL
        train_path: Output training JSONL
        val_path: Output validation JSONL
        val_split: Fraction for validation (0.1-0.3)
    """
    import json
    import random

    # Load data
    data = []
    with open(input_path, 'r') as f:
        for line in f:
            data.append(json.loads(line))

    # Shuffle
    random.shuffle(data)

    # Split
    val_size = int(len(data) * val_split)
    train_data = data[val_size:]
    val_data = data[:val_size]

    # Save
    with open(train_path, 'w') as f:
        for example in train_data:
            f.write(json.dumps(example) + '\n')
    with open(val_path, 'w') as f:
        for example in val_data:
            f.write(json.dumps(example) + '\n')

    print(f"Split: {len(train_data)} training, {len(val_data)} validation")

# Example usage
split_dataset('full_dataset.jsonl', 'training.jsonl', 'validation.jsonl', val_split=0.2)
```
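When the label distribution is imbalanced (see the warning in the validation helper above), a stratified split keeps each completion label proportionally represented in both sets; a minimal sketch for classification-style data:

```python
import json
import random
from collections import defaultdict

def stratified_split(input_path: str, train_path: str, val_path: str,
                     val_split: float = 0.2):
    """Split per completion label so both sets keep the label distribution."""
    by_label = defaultdict(list)
    with open(input_path) as f:
        for line in f:
            example = json.loads(line)
            by_label[example['completion']].append(example)

    train_data, val_data = [], []
    for examples in by_label.values():
        random.shuffle(examples)
        val_size = int(len(examples) * val_split)
        val_data.extend(examples[:val_size])
        train_data.extend(examples[val_size:])

    for path, rows in ((train_path, train_data), (val_path, val_data)):
        with open(path, 'w') as f:
            for example in rows:
                f.write(json.dumps(example) + '\n')
    print(f"Stratified split: {len(train_data)} training, {len(val_data)} validation")
```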
Cost Considerations
Training Costs
Cost Structure:
- Fine-tuning: $0.01-0.05 per 1000 tokens processed
- Continued pre-training: $0.02-0.08 per 1000 tokens processed
- Reinforcement fine-tuning: $0.03-0.10 per 1000 tokens processed
- Distillation: $0.02-0.06 per 1000 tokens processed
Example Calculations:
```python
def estimate_training_cost(
    num_examples: int,
    avg_tokens_per_example: int,
    num_epochs: int,
    cost_per_1k_tokens: float = 0.03
) -> float:
    """
    Estimate training cost.

    Args:
        num_examples: Number of training examples
        avg_tokens_per_example: Average tokens (prompt + completion)
        num_epochs: Training epochs
        cost_per_1k_tokens: Cost rate

    Returns:
        Estimated cost in USD
    """
    total_tokens = num_examples * avg_tokens_per_example * num_epochs
    cost = (total_tokens / 1000) * cost_per_1k_tokens
    print(f"Training Examples: {num_examples:,}")
    print(f"Avg Tokens/Example: {avg_tokens_per_example}")
    print(f"Epochs: {num_epochs}")
    print(f"Total Tokens: {total_tokens:,}")
    print(f"Estimated Cost: ${cost:.2f}")
    return cost

# Example: Fine-tune with 1000 examples
estimate_training_cost(
    num_examples=1000,
    avg_tokens_per_example=500,
    num_epochs=3,
    cost_per_1k_tokens=0.03
)
# Output: ~$45
```
Inference Costs
Provisioned Throughput Pricing:
- Model Units: $X per hour per unit
- Cost varies by base model
- Minimum commitment: 1 month or 6 months
Cost Optimization:
```python
def compare_model_costs(
    requests_per_day: int,
    avg_tokens_per_request: int
):
    """
    Compare on-demand vs provisioned vs distilled model costs.
    """
    # Base Claude 3.5 Sonnet on-demand: $3 input / $15 output per 1M tokens
    base_cost_input = (requests_per_day * avg_tokens_per_request * 30) / 1_000_000 * 3
    # Assume output volume is ~half the input token volume
    base_cost_output = (requests_per_day * avg_tokens_per_request * 0.5 * 30) / 1_000_000 * 15
    base_monthly = base_cost_input + base_cost_output

    # Provisioned throughput: ~$2500/month per unit (varies by model)
    provisioned_monthly = 2500

    # Distilled to Haiku: 50% cost reduction
    distilled_monthly = base_monthly * 0.5

    print(f"Monthly Cost Comparison ({requests_per_day:,} requests/day):")
    print(f"  Base Model On-Demand: ${base_monthly:.2f}")
    print(f"  Provisioned (1 unit): ${provisioned_monthly:.2f}")
    print(f"  Distilled Model: ${distilled_monthly:.2f}")

    # Breakeven analysis
    if base_monthly > provisioned_monthly:
        print(f"\nProvisioned throughput recommended (saves ${base_monthly - provisioned_monthly:.2f}/mo)")
    else:
        print(f"\nOn-demand recommended (saves ${provisioned_monthly - base_monthly:.2f}/mo)")

# Example comparison
compare_model_costs(requests_per_day=10000, avg_tokens_per_request=1000)
```
Related Skills
- bedrock-inference: Invoke foundation models and custom models
- bedrock-knowledge-bases: RAG with custom models
- bedrock-guardrails: Apply safety policies to custom models
- bedrock-agentcore: Build agents with custom models
- claude-cost-optimization: Optimize model selection and costs
- claude-context-management: Manage context for custom models
- boto3-ecs: Deploy custom model inference on ECS
- boto3-eks: Deploy custom model inference on EKS
Additional Resources
- [Bedrock Model Customization Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization.html)
- [Fine-tuning Best Practices](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prepare.html)
- [Custom Model Pricing](https://aws.amazon.com/bedrock/pricing/)
- [Reinforcement Fine-tuning Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-rlhf.html) (2025)