This page provides simple examples to get started accessing data and training models with TableShift. To set up customized experiments or run experiments at scale, see the examples in our GitHub repository.
1. Environment Setup
We recommend using the Conda environment in the TableShift GitHub repo. You can create an environment with all required dependencies via
conda env create -f environment.yml
where environment.yml can be changed to point to the location of the file at the root of the TableShift git repo.
Once the conda environment is created, activate it with
conda activate tableshift
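Before running anything else, it can help to confirm that the environment is actually active in the current shell. The following is a minimal sketch, not part of TableShift: `check_tableshift_env` is a hypothetical helper name, and it relies only on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```shell
# Hypothetical helper (not part of TableShift): report whether the
# "tableshift" conda environment is active in this shell.
check_tableshift_env() {
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    if [ "${CONDA_DEFAULT_ENV:-}" = "tableshift" ]; then
        echo "tableshift environment is active"
    else
        echo "tableshift environment is not active; run: conda activate tableshift"
    fi
}

check_tableshift_env
```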
2. Model Training
TableShift includes implementations of 19 different models (described in the paper and in detail here). To train a model on a publicly available dataset, you can use the example script provided in the examples directory:
python examples/run_expt.py --experiment diabetes_readmission --model xgb
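To train the same model on more than one dataset, the command above can be wrapped in a loop. The sketch below makes several assumptions: `run_all` and `DRY_RUN` are hypothetical names introduced here for illustration, and it assumes that `adult` (the experiment name used in the Ray example on this page) is also accepted by `run_expt.py`. With `DRY_RUN=1` the commands are only printed, which is a cheap way to check them before committing to a long run.

```shell
# Hypothetical wrapper (not part of TableShift): run the example script
# once per experiment name passed as an argument.
run_all() {
    for experiment in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: show the command without executing it.
            echo "python examples/run_expt.py --experiment $experiment --model xgb"
        else
            python examples/run_expt.py --experiment "$experiment" --model xgb
        fi
    done
}

# Print the commands for two experiments; drop DRY_RUN=1 to execute them
# from the root of the TableShift repo.
DRY_RUN=1 run_all diabetes_readmission adult
```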
3. [Optional/Advanced Usage] Distributed hyperparameter tuning with Ray
If you would like to run distributed hyperparameter tuning with Ray, you can run
ulimit -u 127590 && python scripts/ray_train.py \
    --experiment adult \
    --num_samples 2 \
    --num_workers 1 \
    --cpu_per_worker 4 \
    --use_cached \
    --models xgb
Please note that Ray can require careful tuning based on your system resources for optimal performance. For more information, check the Ray Tune documentation.
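One simple sanity check before launching is to compare the CPUs you plan to request against what the machine has online. The sketch below is an illustration under a stated assumption: it treats the total CPU demand as `--num_workers` multiplied by `--cpu_per_worker`, which is inferred from the flag names and is not confirmed by this page. The variable names are chosen here for illustration.

```shell
# Values mirroring the flags in the command above.
num_workers=1
cpu_per_worker=4

# Assumption: total CPU demand is workers times CPUs per worker.
required=$((num_workers * cpu_per_worker))

# Count online CPUs; fall back to nproc, then to 1 if neither tool exists.
available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc 2>/dev/null || echo 1)

if [ "$required" -gt "$available" ]; then
    echo "requested $required CPUs but only $available are online"
else
    echo "requested $required of $available online CPUs"
fi
```

If the first message appears, lowering `--cpu_per_worker` or `--num_workers` is a reasonable first adjustment to try.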