J. Imaging, Vol. 8, Pages 312: Seismic Waveform Inversion Capability on Resource-Constrained Edge Devices

In this work, our DCN architecture is implemented with deep convolutional layers. Specifically, we modified the UNet and InversionNet architectures to be compatible with the dimensions of the input seismic data. Both architectures consist of an encoder and a decoder. The encoder comprises a set of convolution blocks, each containing convolution layers, batch normalization [33], and ReLU [34]. The convolution layers convolve the input seismic data with filters to extract relevant features. Batch normalization normalizes the inputs to a layer (zero mean, unit variance, and decorrelated) for every mini-batch; this stabilizes the training process and makes the network converge much faster. The ReLU activation function accounts for non-linearities by assigning zero to negative input values. The decoder consists of a mixture of convolution and deconvolution blocks. The deconvolution block, also known as a transposed convolution, expands the size of its input by inserting zeros into the input feature maps before convolving.

Modified UNet: Each convolutional layer in the UNet uses a fixed kernel size of 3×3. The channel dimensions used in the convolution layers are 64, 128, 256, 512, and 1024 as the network depth increases. After the convolutions, a max-pooling layer of kernel size 2×2 with stride 2 is applied to reduce the feature maps to half of their previous shape. Deconvolution layers with the same channel dimensions as the convolution layers are applied to the feature maps to expand the output feature maps back to the shape of the input. Specifically, we used a fixed kernel size of 5×5 with stride 2 in the deconvolution layers. Finally, the soft-max function is used to obtain the predicted label, which shows the pixels that belong to the salt body within the seismic data. We used the mean squared error (squared L2 norm) to compute the loss between the predicted label and the ground-truth label, defined by:

L₂(y_g, y_p) = ∑_{i=1}^{n} |y_{p,i} − y_{g,i}|²

(7)

where y_g is the ground truth, y_p is the predicted velocity model, and n is the number of spatial locations in the velocity model. Figure 4 shows the architecture of the UNet used for seismic inversion. Mathematically, the operations in the UNet can be represented by the expression below:

y = UNet(x; Θ) = S(K₂ ∗ P(α(K₁ ∗ x + b₁)) + b₂)

(8)

where UNet(·) represents the non-linear mapping of the network, x denotes the input, y is the output, K₁ and K₂ are the convolution kernels, b₁ and b₂ are the biases, Θ is the set of parameters to be learned, α denotes the activation function (e.g., the Rectified Linear Unit (ReLU) or sigmoid), P denotes the pooling function (e.g., max-pooling), "∗" represents the convolution operation, and S(·) denotes the soft-max function. Since our approach is based on supervised learning, the network has to be fed with input–output pairs (seismic data and their corresponding velocity models). Given that our aim is to predict the velocity models from the seismic data, the UNet learns a non-linear mapping between the seismic data (input) and the corresponding velocity model (output); that is, the model projects the seismic data from the data distribution to the model distribution. The network learns by solving the objective function below:

Θ̂ = argmin_Θ (1/pN) ∑_{n=1}^{N} L₂(v_n, DCN(d_n; Θ)),   for ṽ_n = DCN(d_n; Θ)

(9)

where p denotes the total number of pixels in a velocity model, d_n is the n-th seismic data sample, and L₂(·) is the error between the ground-truth values v_n and the predicted values ṽ_n. To update the learned parameters, the Adam and back-propagation algorithms are used. The Adam optimizer [35] updates the parameters iteratively using:

Θ_t = Θ_{t−1} − α · m̂_t / (√v̂_t + ϵ)

(10)

where m̂_t ← m_t/(1 − β₁ᵗ), v̂_t ← v_t/(1 − β₂ᵗ), α is the positive step size, ϵ = 10⁻⁸, and β₁ and β₂ take their default values of 0.9 and 0.999, respectively, as in the Adam paper.

Modified InversionNet: In this architecture, the encoder is implemented as a stack of 14 convolution layers, with the first layer having a kernel size of 7×1 and the next six layers having a kernel size of 3×3. We used a stride of 2×1 in the first convolution layer to reduce the data dimension to the velocity model dimension. The six convolution layers with kernel size 3×3 extract spatio-temporal features in the data, with a stride of 2 used to downsample the data in each layer. Next, a convolution layer with kernel size 10×2 flattens the feature maps to an output latent vector dimension (512 in this case). The decoder comprises a first deconvolution layer with kernel size 5×13, applied to the latent vector to produce a 5×13×512 tensor, followed by a convolution layer with the same input and output channel dimensions. After the first deconvolution layer, a series of deconvolution–convolution operations is performed with kernel sizes of 4×4 and 3×3 in the deconvolution and convolution layers, respectively. Finally, we use a negative-padding technique with pad dimensions [−7, −8, −9, −10] to crop the feature maps and apply a 3×3 convolution layer to obtain a single velocity map of shape 141×401. Both the convolution and deconvolution layers are followed by batch normalization and the LeakyReLU activation function. The L1 loss function is used to compute the reconstruction error, given below:

L₁(x, y) = (1/n) ∑_{i=1}^{n} |x_i − y_i|

(11)

where y is the ground truth, x is the predicted velocity model, and n is the number of spatial locations in the velocity model.
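The two reconstruction losses in Equations (7) and (11) can be sketched in NumPy as follows. This is a minimal illustration with toy 2×2 "velocity models"; the function and variable names are ours, not from the paper's code:

```python
import numpy as np

def l2_loss(y_pred, y_true):
    # Squared L2 norm between predicted and ground-truth models, as in Eq. (7):
    # sum over all n spatial locations of the squared differences.
    return float(np.sum((y_pred - y_true) ** 2))

def l1_loss(y_pred, y_true):
    # Mean absolute error over the n spatial locations, as in Eq. (11).
    return float(np.mean(np.abs(y_pred - y_true)))

# Toy 2x2 "velocity models" (a real model would be 141x401 as in the text).
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.5, 2.0], [3.0, 3.0]])

print(l2_loss(y_pred, y_true))  # 0.25 + 0 + 0 + 1.0 = 1.25
print(l1_loss(y_pred, y_true))  # (0.5 + 0 + 0 + 1.0) / 4 = 0.375
```

Note that Equation (7) sums rather than averages the squared differences; dividing by n would give the mean squared error proper, which differs only by a constant factor during optimization.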

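The Adam update of Equation (10) can be sketched numerically as below. This is our own minimal NumPy illustration, not the paper's implementation; α, β₁, β₂, and ϵ follow the default values stated in the text:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates: m_hat = m_t / (1 - beta1^t), v_hat = v_t / (1 - beta2^t).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update as in Eq. (10).
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# One step on a single parameter with gradient 2.0.
theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([2.0]), m, v, t=1)
print(theta)  # ~0.999: the first step has magnitude ~alpha regardless of gradient scale
```

After bias correction at t = 1, m̂₁ equals the gradient and √v̂₁ equals its magnitude, so the first update has size ≈ α; this scale-invariance of the step size is a key property of Adam.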