Ming Gong§, Linjun Shou§, Wutao Lin§, Zhijie Sang†, Quanjia Yan‡, Ze Yang†, Daxin Jiang§
§STCA NLP Group, Microsoft, Beijing, China
†Center for Intelligence Science and Technology, BUPT, Beijing, China
‡Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, CAS, Beijing, China
{migon, lisho, wutlin, djiang}@microsoft.com
{woailaosang, yangze01}@bupt.edu.cn
yanquanjia17s@ict.ac.cn
1 https://github.com/Microsoft/NeuronBlocks
[...] which provides a suite of reusable and standard DNN model building components, as well as a set of popular models for common tasks. NeuronBlocks minimizes the duplicated effort of code writing during DNN model training. With NeuronBlocks, users only need to define a simple JSON configuration file to design and train models, which enables engineers to focus more on high-level model design instead of programming language and framework details.

The technical contributions of NeuronBlocks are summarized in three aspects:

• Block Zoo: abstract the most commonly used components of DNNs into standard and reusable blocks.
• Model Zoo: provide a rich scope of model architectures covering popular NLP tasks.
• Platform Compatibility: support both Linux and Windows machines, CPU/GPU chips, as well as GPU platforms like PAI².

2 Related Work

Currently, there are several general-purpose deep learning frameworks, such as TensorFlow (Abadi et al., 2016), PyTorch (Paszke et al., 2017) and Keras (Chollet, 2015), which have gained popularity in the NLP community. These frameworks offer huge flexibility in DNN model design and support various NLP tasks. However, building models with these frameworks still requires a heavy cost to learn the framework details. Therefore, we need higher levels of abstraction to hide the framework details. Other popular deep learning toolkits in NLP include OpenNMT (Klein, Kim, Deng, Senellart, & Rush, 2017), AllenNLP (Gardner et al., 2018), etc. OpenNMT is an open-source toolkit mainly targeting neural machine translation and other natural language generation tasks. AllenNLP provides several pre-built models for NLP tasks such as semantic role labeling, machine comprehension, and textual entailment. However, creating a new model usually requires complex programming work.

[...] all stages, including training, test and inference. NeuronBlocks is built on PyTorch³ and includes two major components: Block Zoo and Model Zoo. In Block Zoo, we abstract and encapsulate the most commonly used components of deep neural networks into standard and reusable blocks. These block units act like building blocks to form various complicated neural network architectures for different NLP tasks. In Model Zoo, we select a rich scope of model architectures covering popular NLP tasks in the form of simple JSON configuration files. With Model Zoo, users can easily navigate these model configurations, select appropriate models for their target task, modify a few configurations if needed, and then start training immediately without any coding effort.

3.1 Overall Framework

The overall framework of NeuronBlocks is shown in Figure 1, which consists of two major components: Block Zoo and Model Zoo. In Block Zoo, we provide commonly used neural network components as building blocks for model architecture design. Model Zoo consists of various DNN models for common NLP tasks, in the form of JSON configuration files.

[Figure 1 (framework overview): Model Zoo (JSONs) — Slot Tagging, Q-Q sim, Domain classifier, Textual Entailment, QnA Relevance, MRC; Block Zoo — Convolution, Pooling, LSTM, GRU, QRNN, Transformer, GloVe, CharEmbedding, PosTagEmbedding; Model Visualizer; training on a GPU box or PAI.]
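To make the JSON-driven design concrete, the sketch below shows only the top-level shape of a Model Zoo configuration; the section names are the ones walked through in Section 3.4 and listed in full in the Appendix, with their bodies left empty here (a minimal sketch, not a complete configuration):

  {
    "inputs": {},
    "outputs": {},
    "training_params": {},
    "architecture": [],
    "loss": {},
    "metrics": []
  }

Everything a user edits to design a new model lives in one of these sections, which is what allows training to start without writing any framework code.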
2 https://github.com/Microsoft/pai
3 https://pytorch.org
3.2 Block Zoo

Blocks for deep learning models are like bricks for a tall building. Block Zoo consists of key components of neural networks. In the current released version, the blocks supported by NeuronBlocks are summarized as follows (more will be added):

• Embedding Layer: Word embedding, character embedding and POS tagging embedding are supported.
• Neural Network Layers: LSTM (Hochreiter & Schmidhuber, 1997), GRU (Chung, Gulcehre, Cho, & Bengio, 2014), Bidirectional LSTM, Bidirectional GRU, QRNN (Bradbury, Merity, Xiong, & Socher, 2016), Bidirectional QRNN, Transformer (Vaswani et al., 2017), CNN, Pooling, Highway network (R. K. Srivastava, Greff, & Schmidhuber, 2015), Encoder-Decoder architecture, etc.
• Attention Mechanism: Linear Attention, Bilinear Attention, Full Attention (Huang, Zhu, Shen, & Chen, 2017), Bidirectional attention flow (Seo, Kembhavi, Farhadi, & Hajishirzi, 2016), etc.
• Mathematical Operations: Add, Minus, Elementwise Multiply, Matrix Multiply, etc.
• Regularization Layers: Dropout (N. Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014), Layer Norm, etc.
• Optimizers: All the optimizers defined in PyTorch are supported by default.
• Loss Function: All the loss functions offered by PyTorch are supported. Additionally, we offer more options, such as Focal Loss (Lin, Goyal, Girshick, He, & Dollár, 2017).
• Metrics: For classification tasks, AUC, Accuracy, Recall, Precision and F1 metrics are supported. For sequence tagging tasks, F1 and Accuracy are supported. For regression tasks, MSE and RMSE are supported. For machine reading comprehension tasks, Exact Match and F1 are supported.

3.3 Model Zoo

NeuronBlocks supports four major NLP tasks (more will be added):

• Text classification, including single sentence and sentence pair classification, such as query domain classification, intent classification, natural language inference, etc.
• Sequence tagging. Given a sentence, classify each token in the sequence into some predefined categories. Common tasks include NER, POS tagging, slot tagging, etc.
• Regression task. Unlike classification, the output of a regression task is a continuous real number.
• Extractive machine reading comprehension. Given a question and a passage, predict whether a span from the passage is a direct answer to the question or not.

3.4 Model Configuration

To train a DNN model or run inference with an existing model, users need to define the model architecture, training data and hyper-parameters in a JSON configuration file. In this subsection, we introduce the details of the configuration file through an example for a question-answer matching task. The configuration file consists of Inputs, Outputs, Training Parameters, Model Architecture, Loss, Metrics, etc.

• Inputs: This part defines the data input configuration, including training data location, data schema, model inputs, model targets, etc.

  "inputs": {
    "use_cache": true,
    "dataset_type": "classification",
    "data_paths": {
      "train_data_path": "./dataset/demo/train.tsv",
      "valid_data_path": "./dataset/demo/valid.tsv",
      "test_data_path": "./dataset/demo/test.tsv",
      "predict_data_path": "./dataset/demo/predict.tsv",
      "pre_trained_emb": "./dataset/GloVe/glove.840B.300d.txt"
    },
    "file_with_col_header": false,
    "pretrained_emb_type": "glove",
    "pretrained_emb_binary_or_text": "text",
    "involve_all_words_in_pretrained_emb": false,
    "add_start_end_for_seq": true,
    "file_header": {
      "question_text": 0,
      "answer_text": 1,
      "label": 2
    },
    "predict_file_header": {
      "question_text": 0,
      "answer_text": 1
    },
    "model_inputs": {
      "question": ["question_text"],
      "answer": ["answer_text"]
    },
    "target": ["label"]
  }
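The same schema adapts to the other task types in Section 3.3 by changing only the column mapping and the model inputs. As a purely hypothetical sketch (the column name "query_text" and input name "query" below are invented for illustration), a single-sentence classifier such as a query domain classifier would declare one text column instead of two:

  "file_header": {
    "query_text": 0,
    "label": 1
  },
  "model_inputs": {
    "query": ["query_text"]
  },
  "target": ["label"]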
• Outputs: This part defines model saving paths, cache paths, log paths, etc.

  "outputs": {
    "save_base_dir": "./models/demo/",
    "model_name": "model.nb",
    "train_log_name": "train.log",
    "test_log_name": "test.log",
    "predict_log_name": "predict.log",
    "predict_fields": ["prediction", "confidence"],
    "predict_output_name": "predict.tsv",
    "cache_dir": ".cache.demo/"
  }

• Training Parameters: In this part, the model optimizer and all other training hyper-parameters are specified. All optimizers defined in PyTorch are supported.

  "training_params": {
    "optimizer": {
      "name": "Adam",
      "params": {
        "lr": 0.001
      }
    },
    "use_gpu": false,
    "batch_size": 30,
    "batch_num_to_show_results": 10,
    "max_epoch": 3,
    "valid_times_per_epoch": 1,
    "max_lengths": {
      "question_text": 30,
      "answer_text": 100
    }
  }

• Model Architecture: This part is the key component of the configuration file, which defines the whole model architecture. Basically, it consists of a list of layers/blocks, based on the blocks supported in Block Zoo, that form the architecture.

  "architecture": [
    {
      "layer": "Embedding",
      "conf": {
        "word": {
          "cols": ["question_text", "answer_text"],
          "dim": 300
        }
      }
    },
    {
      "layer_id": "question_1",
      "layer": "BiLSTM",
      "conf": {
        "hidden_dim": 64,
        "dropout": 0.2,
        "num_layers": 2
      },
      "inputs": ["question"]
    },
    {
      "layer_id": "answer_1",
      "layer": "BiLSTM",
      "conf": {
        "hidden_dim": 64,
        "dropout": 0.2,
        "num_layers": 2
      },
      "inputs": ["answer"]
    },
    {
      "layer_id": "question_2",
      "layer": "Pooling",
      "conf": {
        "pool_axis": 1,
        "pool_type": "max"
      },
      "inputs": ["question_1"]
    },
    {
      "layer_id": "answer_2",
      "layer": "question_2",
      "inputs": ["answer_1"]
    },
    {
      "layer_id": "comb",
      "layer": "Combination",
      "conf": {
        "operations": ["origin", "difference", "dot_multiply"]
      },
      "inputs": ["question_2", "answer_2"]
    },
    {
      "output_layer_flag": true,
      "layer_id": "output",
      "layer": "Linear",
      "conf": {
        "hidden_dim": [128, 2],
        "activation": "PReLU",
        "batch_norm": true,
        "last_hidden_activation": false,
        "last_hidden_softmax": false
      },
      "inputs": ["comb"]
    }
  ]

• Loss: The loss function of the model is defined here.

  "loss": {
    "losses": [
      {
        "type": "CrossEntropyLoss",
        "conf": {
          "size_average": true
        },
        "inputs": ["output", "label"]
      }
    ]
  }

• Metrics: Task metrics are defined here. Users can define a list of metrics that they want to check during the model testing and validation stages. The first metric in the list is the one our toolkit uses to select the best model during training.

  "metrics": ["auc", "accuracy"]

3.5 Workflow

The whole workflow of leveraging NeuronBlocks for model building is quite simple.
Users only need to write a JSON configuration file, by either using existing models from Model Zoo or building new models with blocks from Block Zoo. This [...] config file. Experienced users can even add new [...] training on GPU management platforms like PAI.

[Figure: visualization of the example model architecture — Embedding (dim 300), two-layer BiLSTMs (num_layers 2), and a Linear output layer with hidden_dim [128, 2], PReLU activation, last_hidden_activation/last_hidden_softmax set to False.]
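For example, building a new model rather than reusing the demo one can be as small a change as swapping one encoder block in the architecture list of the Section 3.4 example. The sketch below replaces the first BiLSTM layer with a bidirectional GRU block while keeping the surrounding wiring unchanged; the layer name string "BiGRU" and its conf fields are assumptions modeled on the BiLSTM entry, so the exact names should be checked against Block Zoo.

  {
    "layer_id": "question_1",
    "layer": "BiGRU",
    "conf": {
      "hidden_dim": 64,
      "dropout": 0.2,
      "num_layers": 2
    },
    "inputs": ["question"]
  }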
Table 1: NeuronBlocks' results on the GLUE benchmark development sets. As described in (Wang et al., 2018), for CoLA, we report Matthews correlation. For QQP, we report accuracy and F1. For MNLI, we report accuracy averaged over the matched and mismatched development sets. For all other tasks we report accuracy. All values have been scaled by 100.
4.2 WikiQA

The WikiQA corpus (Yang et al., 2015) is a publicly available dataset for open-domain question answering. This dataset contains 3,047 questions from Bing query logs, each associated with some candidate answer sentences from Wikipedia. As shown in Table 2, we conducted experiments on the WikiQA dataset using CNN, BiLSTM, and attention-based models. The Area Under Curve (AUC) metric is used as the evaluation criterion.

Table 2: NeuronBlocks' results on the WikiQA dataset.

Model                         AUC
CNN (paper baseline)          73.59
CNN-Cnt (paper baseline)      75.33
CNN (NeuronBlocks)            74.79
BiLSTM (NeuronBlocks)         76.73
BiLSTM+Attn (NeuronBlocks)    75.48

[...] build models on top of Model Zoo and Block Zoo. With AutoML support, automatic model architecture design can be achieved for a specific task and dataset.

6 Conclusion

In this paper, we introduce NeuronBlocks, a DNN toolkit for NLP tasks built on PyTorch. NeuronBlocks provides an easy-to-use model training and inference experience for users. Model building time can be significantly reduced from days to hours, or even minutes. We also provide a suite of pre-built DNN models for popular NLP tasks in Model Zoo. The majority of users can simply pick one from Model Zoo to start model training. Experienced users can also add new blocks to Block Zoo, which will greatly benefit advanced users for model building. Besides, a series of experiments on real NLP tasks has been conducted to verify the effectiveness of this toolkit.
References

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/languageunsupervised/language_understanding_paper.pdf.

Seo, M., Kembhavi, A., Farhadi, A., & Hajishirzi, H. (2016). Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Highway networks. arXiv preprint arXiv:1505.00387.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998–6008).

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

Yang, Y., Yih, W., & Meek, C. (2015). WikiQA: A challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 2013–2018).

Appendix

Here is a complete example of a JSON config file:

{
  "license": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT license.",
  "tool_version": "1.1.0",
  "model_description": "This example shows how to train/test/predict.",
  "inputs": {
    "use_cache": true,
    "dataset_type": "classification",
    "data_paths": {
      "train_data_path": "./dataset/demo/train.tsv",
      "valid_data_path": "./dataset/demo/valid.tsv",
      "test_data_path": "./dataset/demo/test.tsv",
      "predict_data_path": "./dataset/demo/predict.tsv",
      "pre_trained_emb": "./dataset/GloVe/glove.840B.300d.txt"
    },
    "file_with_col_header": false,
    "pretrained_emb_type": "glove",
    "pretrained_emb_binary_or_text": "text",
    "involve_all_words_in_pretrained_emb": false,
    "add_start_end_for_seq": true,
    "file_header": {
      "question_text": 0,
      "answer_text": 1,
      "label": 2
    },
    "predict_file_header": {
      "question_text": 0,
      "answer_text": 1
    },
    "model_inputs": {
      "question": ["question_text"],
      "answer": ["answer_text"]
    },
    "target": ["label"]
  },
  "outputs": {
    "save_base_dir": "./models/demo/",
    "model_name": "model.nb",
    "train_log_name": "train.log",
    "test_log_name": "test.log",
    "predict_log_name": "predict.log",
    "predict_fields": ["prediction", "confidence"],
    "predict_output_name": "predict.tsv",
    "cache_dir": ".cache.demo/"
  },
  "training_params": {
    "optimizer": {
      "name": "Adam",
      "params": {
        "lr": 0.001
      }
    },
    "use_gpu": false,
    "batch_size": 30,
    "batch_num_to_show_results": 10,
    "max_epoch": 3,
    "valid_times_per_epoch": 1,
    "max_lengths": {
      "question_text": 30,
      "answer_text": 100
    }
  },
  "architecture": [
    {
      "layer": "Embedding",
      "conf": {
        "word": {
          "cols": ["question_text", "answer_text"],
          "dim": 300
        }
      }
    },
    {
      "layer_id": "question_1",
      "layer": "BiLSTM",
      "conf": {
        "hidden_dim": 64,
        "dropout": 0.2,
        "num_layers": 2
      },
      "inputs": ["question"]
    },
    {
      "layer_id": "answer_1",
      "layer": "BiLSTM",
      "conf": {
        "hidden_dim": 64,
        "dropout": 0.2,
        "num_layers": 2
      },
      "inputs": ["answer"]
    },
    {
      "layer_id": "question_2",
"layer": "Pooling",
"conf": {
"pool_axis": 1,
"pool_type": "max"
},
"inputs": ["question_1"]
},
{
"layer_id": "answer_2",
"layer": "question_2",
"inputs": ["answer_1"]
},
{
"layer_id": "comb",
"layer": "Combination",
"conf": {
"operations": ["origin", "difference",
"dot_multiply"]
},
"inputs": ["question_2", "answer_2"]
},
{
"output_layer_flag": true,
"layer_id": "output",
"layer": "Linear",
"conf": {
"hidden_dim": [128, 2],
"activation": "PReLU",
"batch_norm": true,
"last_hidden_activation": false,
"last_hidden_softmax": false
},
"inputs": ["comb"]
}
],
"loss": {
"losses": [
{
"type": "CrossEntropyLoss",
"conf": {
"size_average": true
},
"inputs": ["output", "label"]
}
]
},
"metrics": ["auc", "accuracy"]
}