Adam#

class vulkpy.nn.Adam#

Bases: Optimizer

Adam Optimizer

See also

vulkpy.nn.SGD

SGD optimizer

Notes

This class implements Adam [adam1]. The algorithm maintains exponential moving averages of the 1st and 2nd order moments of the gradient. The 1st (\(m_t\)) and 2nd (\(v_t\)) order moments are updated as follows;

\[\begin{split}m_t = \beta _1 m_{t-1} + (1 - \beta _1) g_t\\ v_t = \beta _2 v_{t-1} + (1 - \beta _2) g_t ^2\end{split}\]

where \(g_t\) is the gradient.

To mitigate the initial underestimation, bias-corrected \(\hat{m}_t\) and \(\hat{v}_t\) are used for the parameter update.

\[\begin{split}\hat{m}_t = m_t / (1 - \beta _1 ^t)\\ \hat{v}_t = v_t / (1 - \beta _2 ^t)\end{split}\]

Finally, the parameter \(\theta _t\) is updated by

\[\theta _t = \theta _{t-1} - \text{lr} \times \hat{m}_t/(\sqrt{\hat{v}_t} + \epsilon)\]
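
For concreteness, the update rule above can be sketched in plain NumPy. This is a minimal illustration, not vulkpy's actual GPU implementation; the keyword defaults mirror the __init__ documented below.

    import numpy as np

    def adam_step(theta, g, m, v, t, *, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Moving averages of the 1st and 2nd order moments of the gradient.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias correction for the initial underestimation (t starts at 1).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Parameter update.
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v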

References

[adam1]

D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization”, ICLR (Poster) 2015, https://dblp.org/rec/journals/corr/KingmaB14.html

Examples

>>> import vulkpy as vk
>>> from vulkpy import nn
>>> gpu = vk.GPU()
>>> adam = nn.Adam(gpu, lr=0.001, beta1=0.9, beta2=0.999)

Methods Summary

init_state(shape)

Initialize Optimizer state

Methods Documentation

init_state(shape: Iterable[int]) → AdamState#

Initialize Optimizer state

Parameters:

shape (iterable of ints) – Shape of parameter

Returns:

Optimizer state

Return type:

AdamState
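
For illustration, a state is created per parameter tensor (a hedged sketch reusing the adam object from the Examples section above; the shape (128, 64) is an arbitrary example):

>>> state = adam.init_state((128, 64))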

__init__(gpu: GPU, *, lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-08)#

Initialize Adam Optimizer

Parameters:
  • gpu (vulkpy.GPU) – GPU

  • lr (float, optional) – Learning rate. The default is 0.001.

  • beta1 (float, optional) – Exponential decay rate for the 1st moment estimate. The default is 0.9.

  • beta2 (float, optional) – Exponential decay rate for the 2nd moment estimate. The default is 0.999.

  • eps (float, optional) – Small constant added to the denominator for numerical stability. The default is 1e-8.