
Lesson 2:

Learning rate: if it is too small, training takes a long time; if it is too large, accuracy will be low because the big steps overshoot the minimum

Best LR:
lr_find(): increases the learning rate multiplicatively at each mini-batch/iteration.
learn.sched.plot_lr() (LR vs. iterations) shows this schedule

learn.sched.plot() (LR vs. loss): as the LR increases, the loss at first drops quickly, but past the
optimal learning rate the improvement flattens out or the loss starts to get worse.
So, pick the learning rate where the loss is still clearly improving
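A minimal sketch of this workflow, assuming the fastai v0.7 ConvLearner API used in the course; PATH, sz and the architecture are placeholder values:

    from fastai.conv_learner import *

    PATH, arch, sz = "data/", resnet34, 224    # placeholder dataset path, model, image size
    data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
    learn = ConvLearner.pretrained(arch, data, precompute=True)

    learn.lr_find()        # ramp the LR multiplicatively over mini-batches until the loss blows up
    learn.sched.plot_lr()  # LR vs. iterations
    learn.sched.plot()     # loss vs. LR: choose a value where the loss is still falling
    learn.fit(1e-2, 2)     # train with the chosen learning rate for 2 epochs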

Improving the model:


1. Get more Data
2. Data augmentation: aug_tfms=transforms_side_on, max_zoom=1.1
With precompute=True (use the saved precomputed activations) data augmentations have no effect,
so set learn.precompute = False
and then call learn.fit() (see the sketch after this list)
3. cycle_len=1: reset the LR at the start of every epoch. As you get closer to the minimum, decrease the LR (LR annealing).
Cosine annealing is commonly used. SGDR: LR annealing, restart, LR annealing again. Doing this
multiple times helps the optimizer escape sharp minima and settle in a broad, more general one.
4. learn.save(): saves the weights after the last epoch, and learn.load() loads them back
5. Fine-tuning and differential learning rates:
- learn.unfreeze(): unfreeze all the layers
- create an array of differential learning rates, e.g. lrs = np.array([1e-4, 1e-3, 1e-2]) (earlier layers get smaller rates)
- cycle_mult=2: double the length of the cycle after each cycle, so each cycle takes twice as
many epochs as the previous cycle
learn.fit(lrs, n_cycles, cycle_len=1, cycle_mult=2)
6. Test-time augmentation on the validation set (learn.TTA()): make 4 random augmentations,
predict on the augmented and the original images, and take the average, as shown in the sketch after this list
7. Start training on small images and then train on bigger images; this helps avoid overfitting
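A sketch of points 2 and 6, again assuming the fastai v0.7 API (PATH, arch and sz carry over from the earlier sketch); the transforms, learning rate and epoch counts are illustrative:

    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_paths(PATH, tfms=tfms)
    learn = ConvLearner.pretrained(arch, data, precompute=True)

    learn.fit(1e-2, 1)               # quick first pass on precomputed activations
    learn.precompute = False         # augmentations only take effect once precompute is off
    learn.fit(1e-2, 3, cycle_len=1)  # 3 SGDR cycles, restarting the LR every epoch

    log_preds, y = learn.TTA()              # predictions for original + augmented images
    probs = np.mean(np.exp(log_preds), 0)   # average them
    accuracy_np(probs, y)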
Steps:
1. Precompute=True
2. lr_find()
3. Train the last layers with precompute=False, for 2-3 epochs with cycle_len=1
4. learn.unfreeze()
5. Set differential learning rates
6. Train the full network with cycle_mult=2
7. Train on a larger image size (see the sketch below)
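Put together, these steps map onto roughly the following calls (a sketch assuming the fastai v0.7 API; learning rates, cycle counts and image sizes are illustrative):

    from fastai.conv_learner import *

    PATH, arch = "data/", resnet34                                # placeholder dataset and model
    tfms = tfms_from_model(arch, 224, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_paths(PATH, tfms=tfms)

    learn = ConvLearner.pretrained(arch, data, precompute=True)   # 1. precompute=True
    learn.lr_find()                                               # 2. find a good LR from the plot
    lr = 1e-2

    learn.precompute = False
    learn.fit(lr, 3, cycle_len=1)                                 # 3. train the last layers

    learn.unfreeze()                                              # 4. unfreeze all layers
    lrs = np.array([lr / 100, lr / 10, lr])                       # 5. differential learning rates
    learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)                  # 6. train the full network

    tfms_big = tfms_from_model(arch, 299, aug_tfms=transforms_side_on, max_zoom=1.1)
    learn.set_data(ImageClassifierData.from_paths(PATH, tfms=tfms_big))  # 7. larger images
    learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)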

Lesson 3: CNNs
learn.bn_freeze(True): freezes the batch-normalization statistics. If your data is similar to ImageNet and you are using deeper
models, add this.
Convolution: multiply every element of a 3 x 3 kernel with the corresponding element of a 3 x 3 section of the
image and sum the products, sliding the kernel across the whole image.
A typical stack looks like this (see the sketch after the list):
Filter 1
Filter 2
ReLU
Max pool (replace every 2 x 2 part of the grid with its maximum, halving the size)
Filter 3
Filter 4
ReLU
Max pool
...
Get the probabilities
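A minimal numpy sketch of one convolution, ReLU and 2 x 2 max-pool step (the image and the 3 x 3 filter values are made up for illustration):

    import numpy as np

    def conv2d(img, kernel):
        # slide the kernel over the image; at each position multiply elementwise and sum
        kh, kw = kernel.shape
        out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
        return out

    def relu(x):
        return np.maximum(x, 0)          # zero out negative activations

    def max_pool2x2(x):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
        return x.max(axis=(1, 3))        # each 2 x 2 block becomes its maximum, halving the size

    img = np.random.rand(8, 8)                  # toy single-channel "image"
    edge_filter = np.array([[-1., 0., 1.],      # made-up 3 x 3 filter
                            [-1., 0., 1.],
                            [-1., 0., 1.]])
    acts = max_pool2x2(relu(conv2d(img, edge_filter)))
    print(acts.shape)                           # (3, 3): conv gives 6 x 6, max pool halves it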

⁃ Fully connected layer: gives a weight to every activation coming out of the convolutional layers
⁃ Multiply the (flattened) activations by the weight matrix to get n outputs, then apply the softmax function
⁃ Compare the softmax output to the one-hot encoded labels and calculate the error
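A small numpy sketch of those last two bullets (the activation vector and weight values are made up for illustration):

    import numpy as np

    def softmax(z):
        z = z - z.max()                  # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    acts = np.array([0.3, 1.2, -0.5, 0.8])   # flattened activations from the conv layers
    W = np.random.randn(4, 3)                # fully connected weights: one per activation per output
    logits = acts @ W                        # n outputs, here 3 classes
    probs = softmax(logits)

    target = np.array([0.0, 1.0, 0.0])       # one-hot encoded label for class 1
    loss = -np.sum(target * np.log(probs))   # cross-entropy error between softmax and label
    print(probs, loss)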
