Adam#
- class vulkpy.nn.Adam#
Bases: Optimizer
Adam Optimizer
See also
vulkpy.nn.SGD
SGD optimizer
Notes
This class implements Adam [adam1]. The algorithm maintains exponential moving averages of the 1st and 2nd order moments of the gradient. The 1st (\(m_t\)) and 2nd (\(v_t\)) order moments are updated as follows:
\[\begin{split}m_t = \beta _1 m_{t-1} + (1 - \beta _1) g_t\\ v_t = \beta _2 v_{t-1} + (1 - \beta _2) g_t ^2\end{split}\]
where \(g_t\) is the gradient at step \(t\).
To mitigate underestimation during the initial steps, the bias-corrected estimates \(\hat{m}_t\) and \(\hat{v}_t\) are used for the parameter update.
\[\begin{split}\hat{m}_t = m_t / (1 - \beta _1 ^t)\\ \hat{v}_t = v_t / (1 - \beta _2 ^t)\end{split}\]
Finally, the parameter \(\theta _t\) is updated by
\[\theta _t = \theta _{t-1} - \text{lr} \times \hat{m}_t/(\sqrt{\hat{v}_t} + \epsilon)\]
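For reference, the update rule above can be written as a minimal NumPy sketch. This is illustrative only; it mirrors the equations in these Notes and is independent of vulkpy's GPU implementation, so the function and variable names are hypothetical.

>>> import numpy as np
>>> def adam_step(theta, g, m, v, t,
...               lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
...     # t is the 1-based step count; g is the gradient at this step.
...     m = beta1 * m + (1 - beta1) * g         # 1st moment moving average
...     v = beta2 * v + (1 - beta2) * g ** 2    # 2nd moment moving average
...     m_hat = m / (1 - beta1 ** t)            # bias-corrected 1st moment
...     v_hat = v / (1 - beta2 ** t)            # bias-corrected 2nd moment
...     return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v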
References
[adam1] D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization”, ICLR (Poster), 2015, https://dblp.org/rec/journals/corr/KingmaB14.html
Examples
>>> import vulkpy as vk
>>> from vulkpy import nn
>>> gpu = vk.GPU()
>>> adam = nn.Adam(gpu, lr=0.001, beta1=0.9, beta2=0.999)
Methods Summary
init_state(shape) – Initialize Optimizer state
Methods Documentation
- init_state(shape: Iterable[int]) → AdamState#
Initialize Optimizer state
- Parameters:
shape (iterable of ints) – Shape of the parameter
- Returns:
Optimizer state
- Return type:
AdamState
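For example, continuing from the Examples section above; the parameter shape here is hypothetical, and the returned AdamState is assumed to hold the per-parameter moving averages \(m_t\) and \(v_t\) described in the Notes.

>>> state = adam.init_state((64, 32))  # one state per parameter tensor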
- __init__(gpu: GPU, *, lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-08)#
Initialize Adam Optimizer
- Parameters:
gpu (vulkpy.GPU) – GPU
lr (float, optional) – Learning rate. The default is 0.001.
beta1 (float, optional) – Decay rate of the 1st moment moving average. The default is 0.9.
beta2 (float, optional) – Decay rate of the 2nd moment moving average. The default is 0.999.
eps (float, optional) – Small constant added to the denominator for numerical stability. The default is 1e-8.