GitIgnore.pro

Data Science .gitignore Lifecycle

Comprehensive .gitignore templates for ML/AI projects, from research and experimentation to production deployment and MLOps workflows.

EDA & Research

Jupyter notebooks & datasets

Model Training

Feature engineering & ML

Experiments

MLflow, W&B tracking

MLOps

Production deployment

Choose Your ML Project Stage

Research & EDA

Exploratory data analysis and research experiments

Jupyter notebooks, raw datasets, exploratory analysis, prototype models

Model Development

Feature engineering and model training workflows

Feature engineering, model training, hyperparameter tuning, cross-validation

Experiment Tracking

MLflow, Weights & Biases, and experiment management

MLflow tracking, W&B experiments, model registry, artifact storage

Production MLOps

Model serving, monitoring, and production deployment

Model serving, CI/CD pipelines, monitoring, A/B testing

# Data Science Research & EDA .gitignore

# Jupyter Notebook checkpoints
.ipynb_checkpoints/
*/.ipynb_checkpoints/*
.jupyter/
.ipython/

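# Raw datasets (usually large and should not be in version control)
# Note: the extension patterns below match every such file in the repo,
# including small configs and samples. Re-include specific files with a
# later negation rule, e.g. !data/sample/*.csv (hypothetical path).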
data/raw/
data/external/
datasets/
*.csv
*.tsv
*.json
*.parquet
*.hdf5
*.h5
*.pickle
*.pkl
*.feather

# Exploratory data analysis outputs
eda_output/
exploratory/
analysis_results/
plots/
figures/
*.png
*.jpg
*.jpeg
*.svg
*.pdf

# Research notebooks and temporary files
research_notebooks/
scratch/
temp_analysis/
prototype/
*.tmp
*.temp

# Data profiling reports
profiling_reports/
data_quality/
*.html
report_*.json

# Virtual environment
venv/
.venv/
env/
ENV/
conda-env/
.conda/

# Python cache
__pycache__/
*.py[cod]
*$py.class
*.so

# Research databases and caches
cache/
.cache/
*.db
*.sqlite3
research.db

# Temporary model files
temp_models/
scratch_models/
*.joblib
*.model

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo

# OS files
.DS_Store
Thumbs.db

# Environment variables
.env
.env.local
research_config.py

# Logs
*.log
research.log
analysis.log

# Statistical analysis outputs
stats_output/
correlation_matrices/
feature_importance/

# Documentation drafts
draft_docs/
research_notes/
*.md.backup

# Testing and validation
test_results/
validation_output/
cross_validation/

ML Framework Integration

TensorFlow / Keras

Deep learning and neural networks

# TensorFlow/Keras
*.tfrecord
*.tfevents.*
tf_logs/
saved_model/
*.pb
*.h5
*.hdf5
checkpoints/
tensorboard_logs/

PyTorch

Dynamic neural networks and research

# PyTorch
*.pth
*.pt
*.ckpt
lightning_logs/
torch_cache/
.torch/
torchvision_cache/

scikit-learn

Traditional ML algorithms

# scikit-learn
*.joblib
sklearn_models/
pipeline_cache/
grid_search_results/
cross_val_cache/

MLflow

Experiment tracking and model registry

# MLflow
mlruns/
mlflow.db
mlartifacts/
model_registry/
mlflow_tracking/

Weights & Biases

Experiment tracking and collaboration

# Weights & Biases
wandb/
.wandb/
wandb-offline/
wandb-metadata.json
wandb-summary.json

Hugging Face

Transformers and NLP models

# Hugging Face
transformers_cache/
.cache/huggingface/
models--*/
tokenizers_cache/
*.safetensors
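
The framework snippets above can be combined into one file for a project that uses several tools. The following is a minimal sketch, assuming a hypothetical helper named build_gitignore and a hypothetical FRAMEWORK_PATTERNS mapping; the patterns themselves are copied from the snippets above, and nothing here is part of an existing tool.

```python
# Hypothetical helper for illustration only; not part of any existing tool.
# The patterns are copied from the framework snippets on this page.
FRAMEWORK_PATTERNS = {
    "tensorflow": ["*.tfrecord", "*.tfevents.*", "tf_logs/", "saved_model/",
                   "*.pb", "*.h5", "*.hdf5", "checkpoints/", "tensorboard_logs/"],
    "pytorch": ["*.pth", "*.pt", "*.ckpt", "lightning_logs/", "torch_cache/",
                ".torch/", "torchvision_cache/"],
    "scikit-learn": ["*.joblib", "sklearn_models/", "pipeline_cache/",
                     "grid_search_results/", "cross_val_cache/"],
    "mlflow": ["mlruns/", "mlflow.db", "mlartifacts/", "model_registry/",
               "mlflow_tracking/"],
    "wandb": ["wandb/", ".wandb/", "wandb-offline/", "wandb-metadata.json",
              "wandb-summary.json"],
    "huggingface": ["transformers_cache/", ".cache/huggingface/", "models--*/",
                    "tokenizers_cache/", "*.safetensors"],
}


def build_gitignore(frameworks):
    """Return .gitignore text for the given framework names.

    Each framework gets a comment header; a pattern that was already
    emitted for an earlier framework is not repeated.
    Raises KeyError for a name missing from FRAMEWORK_PATTERNS.
    """
    lines = []
    seen = set()
    for name in frameworks:
        lines.append(f"# {name}")
        for pattern in FRAMEWORK_PATTERNS[name]:
            if pattern not in seen:
                seen.add(pattern)
                lines.append(pattern)
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(build_gitignore(["pytorch", "mlflow"]), end="")
```

The output is meant to be appended to a stage template such as the Research & EDA one, not to replace it.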

Data Management Strategy

| Data Type | File Extensions | Typical Size | Version Control Strategy |
|---|---|---|---|
| Structured Data | .csv, .tsv, .parquet, .feather | Medium (MB to GB) | Sample data in repo, full data external |
| Unstructured Data | .json, .txt, .pdf, .docx | Large (GB to TB) | Metadata only, data in cloud storage |
| Image Data | .jpg, .png, .tiff, .dicom | Very Large (TB+) | DVC or cloud storage with versioning |
| Audio/Video | .wav, .mp3, .mp4, .avi | Extremely Large (PB) | External storage with metadata tracking |
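
To decide which files belong in external storage rather than in the repository, it can help to list what is already large in the working tree. The sketch below is one way to do that with the standard library; the function name find_large_files and the 50 MB default threshold are assumptions chosen for illustration, not a rule from this page.

```python
import os
from pathlib import Path


def find_large_files(root, threshold_bytes=50 * 1024 * 1024, skip_dirs=(".git",)):
    """Return (relative_path, size_in_bytes) pairs for files at or above
    threshold_bytes, largest first. Directories named in skip_dirs are
    not descended into. The 50 MB default is an arbitrary assumption."""
    root = Path(root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # unreadable or vanished file: skip it
            if size >= threshold_bytes:
                hits.append((path.relative_to(root).as_posix(), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rel_path, size in find_large_files("."):
        print(f"{size / 1_048_576:8.1f} MB  {rel_path}")
```

Files reported here are candidates for a .gitignore entry plus DVC or cloud storage, following the strategies in the table above.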

Data Science Best Practices

Data Handling

Never Commit:

Raw datasets, personal information, API keys, large model files

Always Include:

Data schemas, sample data, preprocessing scripts, feature definitions
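
One way to produce the sample data mentioned above is to copy the header and the first few rows of a raw CSV into a small file that is safe to commit. This is a minimal sketch using only the standard library; the function name write_sample and the default of 100 rows are assumptions, and taking the first rows is not a random sample.

```python
import csv
from itertools import islice


def write_sample(src_path, dst_path, n_rows=100):
    """Copy the header and the first n_rows data rows of a CSV file.

    Returns the number of data rows written (0 for an empty source).
    This takes the head of the file, not a random sample.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0
        writer.writerow(header)
        count = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
    return count
```

Check the sample for personal information before committing it. Because the Research & EDA template ignores *.csv, the sample file also has to be added with git add -f or re-included through a negation rule.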

Model Management

Model Artifacts

Use a model registry (such as MLflow) instead of Git for model versioning

Experiment Tracking

Keep experiment metadata in code, results in tracking systems
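
Keeping experiment metadata in code can be as simple as a versioned configuration object whose values are sent to the tracking system at run time. The sketch below assumes a hypothetical ExperimentConfig dataclass with illustrative fields and derives a stable identifier from it; the field names and the 12-character ID length are assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical experiment settings that live in version control."""
    model: str
    learning_rate: float
    n_estimators: int
    seed: int = 42


def config_id(config):
    """Return a short, stable identifier for a configuration.

    Identical field values always give the same ID, so results stored
    in a tracking system can be matched back to the committed config.
    """
    payload = json.dumps(asdict(config), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


if __name__ == "__main__":
    cfg = ExperimentConfig(model="random_forest", learning_rate=0.1, n_estimators=200)
    print(config_id(cfg), asdict(cfg))
```

With MLflow, for example, the dictionary from asdict(cfg) could be passed to mlflow.log_params inside a run, while metrics and artifacts stay in the tracking store that the .gitignore excludes.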

Accelerate Your ML Development

Start your ML project with the right .gitignore configuration. From research notebooks to production deployment, maintain clean repositories and efficient workflows.