
End-to-end overview

(https://cloud.google.com/ml-engine/docs/technical-overview#end-to-end_overview)
1) Prepare your trainer and data for the cloud
At the heart of the process is your training application (your trainer), written in TensorFlow.
This application is responsible for defining your computation graph, managing the training and validation process, and
exporting your model.
You need to follow a few guidelines about your approach so that it works well with cloud training.
If you have a TensorFlow application that you have been running locally on a single computer, the biggest change you are
likely to need to make is adding support for running it with distributed TensorFlow.
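One concrete piece of that change: the training service describes the cluster to each replica through the TF_CONFIG environment variable, which your trainer can parse to learn its role. A minimal sketch, assuming that variable's usual JSON shape; the host names and the fake value set below are illustrative, since in a real job the service sets TF_CONFIG for you:

```python
import json
import os

# In a real Cloud ML Engine job the service sets TF_CONFIG on every
# replica; here we set a made-up value just to show the parsing.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "master": ["master-0:2222"],
        "worker": ["worker-0:2222", "worker-1:2222"],
        "ps": ["ps-0:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

tf_config = json.loads(os.environ["TF_CONFIG"])
task_type = tf_config["task"]["type"]    # role of this replica
task_index = tf_config["task"]["index"]  # which replica of that role
cluster_spec = tf_config["cluster"]      # host:port lists per role
```

A trainer typically branches on task_type to decide whether this replica coordinates the job, serves parameters, or just trains.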
Make your trainer into a Python package and stage it on Google Cloud Storage where your training job can access it.

2) Train your model


With your trainer packaged, you can begin using Cloud ML Engine to train your model.
The training service allocates resources in the cloud according to the specifications you include with your job request. It installs
your trainer package on each machine it allocates and runs an instance of it there; each running copy is called a replica.
While your trainer runs, it can write output to Google Cloud Storage locations. A trainer typically writes regular checkpoints
during training and exports the trained model at the end of the job.
You can create a model resource, assign your trained model to it, and then deploy a model version. Your hosted model
version can then be used to get predictions for new data.

3) Get predictions
Cloud ML Engine supports two kinds of prediction: online and batch.
Online prediction is optimized for handling a high rate of requests with minimal latency. You give it your data in a JSON
request string and it returns predictions for that data in its response message.
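The JSON request body for online prediction is an object with an "instances" list, one entry per input example. A small sketch of building such a payload; the feature names here are invented for illustration:

```python
import json

# Hypothetical feature names; each dict in "instances" is one input
# example the deployed model version will score.
request_body = {
    "instances": [
        {"age": 25, "hours_per_week": 40},
        {"age": 42, "hours_per_week": 30},
    ]
}

# Serialize to the JSON request string sent to the prediction service.
payload = json.dumps(request_body)
```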
Batch prediction is optimized for getting inferences for large collections of data with minimal job duration.
You put your input data instances in files on Google Cloud Storage and pass their paths to a job request.
The service allocates a cluster of machines to run your prediction job and distributes your input data among them. It saves
your predictions to files in a Google Cloud Storage location that you specify.
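Batch prediction input files in text form are typically newline-delimited JSON, one instance per line. A hedged sketch of preparing such a file locally (feature names invented) before copying it to Cloud Storage:

```python
import json
import os
import tempfile

# Hypothetical input examples for a batch prediction job.
instances = [
    {"age": 25, "hours_per_week": 40},
    {"age": 42, "hours_per_week": 30},
]

# Newline-delimited JSON: the service reads each line as one instance.
input_path = os.path.join(tempfile.mkdtemp(), "prediction_input.json")
with open(input_path, "w") as f:
    for instance in instances:
        f.write(json.dumps(instance) + "\n")
```

You would then copy the file to your bucket (for example with gsutil) and pass its gs:// path in the job request.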

You should have completed these steps before you submit training jobs:
 Configure your development environment.
 Develop your trainer application with TensorFlow.
 Package your application and upload it and any extra dependencies to a Google Cloud Storage bucket (this
step is included in job creation when you use the gcloud command-line tool).

Configure your development environment


Before you can start a job, you need to assemble its configuration details: the required input objects of the
Job resource, including the items in the TrainingInput resource. You can read more about this configuration information
along with the other training concepts. Gather the following information to package your trainer:
 Package path
 Job directory
 Dependency paths
 Module name
 Staging bucket
 Job name
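Collected together, these details map onto fields of the job request. A sketch of that shape; the field names follow the Job and TrainingInput resources, while every value below is a placeholder:

```python
# Placeholder values throughout; substitute your own job ID, bucket,
# module name, and region before submitting.
training_job = {
    "jobId": "my_training_job_001",
    "trainingInput": {
        "packageUris": ["gs://my-staging-bucket/trainer-0.0.1.tar.gz"],
        "pythonModule": "trainer.task",
        "region": "us-central1",
        "jobDir": "gs://my-staging-bucket/job-output",
    },
}
```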

Building your trainer package manually


(https://cloud.google.com/ml-engine/docs/packaging-trainer)
The following structure is commonly used in Cloud ML Engine samples; organizing your project similarly can make the
samples easier to follow:
 Use a main project directory, containing your setup.py file.
 Use a subdirectory named trainer to store your main application module.
 Name your main trainer application module task.py.
 Create whatever other subdirectories you need in your main project directory to implement your application.
 Create an __init__.py file in every subdirectory. These files are used by Setuptools to identify directories with code
to package, and may be empty.
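The layout above can be sketched programmatically. This snippet builds the recommended skeleton in a scratch directory; "my_project" and the scratch location are placeholders:

```python
import pathlib
import tempfile

# Build the recommended layout: setup.py at the project root, the main
# application module under trainer/, and an __init__.py for Setuptools.
root = pathlib.Path(tempfile.mkdtemp()) / "my_project"
(root / "trainer").mkdir(parents=True)

(root / "setup.py").touch()
for name in ("__init__.py", "task.py", "model.py"):
    (root / "trainer" / name).touch()
```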

The breakdown of code is:


 task.py contains the trainer logic that manages the job.
 model.py contains the TensorFlow graph code—the logic of the model.
 util.py, if present, contains code to run the trainer.

If you use the gcloud tool to package your application, you don't need to create a setup.py or any __init__.py files. When
you run gcloud ml-engine jobs submit training, you can set the --package-path argument to the path of your main
project directory, or you can run the tool from that directory and omit the argument altogether.

To manually upload packages:


The easiest way to manually upload your package and any custom dependencies to your Google Cloud Storage bucket is to
use the gsutil tool:

gsutil cp /local/path/to/package.tar.gz gs://bucket/path/

This example command submits a training job using a zipped tarball package (called trainer-0.0.1.tar.gz here) that is
already in a Cloud Storage bucket. The main function is in a module called task.py.

PATH_TO_PACKAGED_TRAINER=gs://$CLOUD_STORAGE_BUCKET_NAME/trainer-0.0.1.tar.gz

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --packages $PATH_TO_PACKAGED_TRAINER \
    --module-name $MAIN_TRAINER_MODULE \
    --region us-central1 \
    -- \
    --trainer_arg_1 value_1 \
    ... \
    --trainer_arg_n value_n
