
ASPEC

(Asian Scientific Paper Excerpt Corpus)


This page describes known issues in the corpus that users should be aware of.

ASPEC-JE

date expressions at the end of Japanese sentences

Almost none of the date expressions at the end of the Japanese sentences have corresponding expressions in the English sentences.

e.g.
B-94A0894379 ||| 3 ||| 材料開発における発想支援のためには,ユーザの側の 操作が重要であるため,原子レベルで の物質操作のためのインターフェイスを 開発した[1994.8] ||| Because user operation is important for the idea support in material development, an interface for a substance operation at atomic level was developed.

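Running "grep -c -F '[19' file.txt" on the train, dev, devtest and test data gives counts of 1994, 17, 19 and 23, respectively. (The -F option makes grep treat '[19' as a literal string; without it, the unmatched '[' is an invalid pattern.)

For the WAT evaluation, date expressions that have no corresponding expression in the English sentence will be removed.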
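
The removal could be sketched as follows. This is only an illustration, not the procedure WAT actually uses: it assumes each ASPEC-JE line has the four " ||| "-separated fields shown in the example above (ID, sentence number, Japanese, English), and that the date always has the form [YYYY] or [YYYY.M] at the very end of the Japanese field. The names TRAILING_DATE and strip_trailing_date are hypothetical.

```python
import re

# Assumed shape of the date expression: "[1994.8]" or "[1994]" at the end of the sentence.
TRAILING_DATE = re.compile(r"\s*\[\d{4}(?:\.\d{1,2})?\]\s*$")


def strip_trailing_date(line: str, ja_field: int = 2) -> str:
    """Remove a trailing bracketed date from the Japanese field of one corpus line."""
    fields = line.rstrip("\n").split(" ||| ")
    if len(fields) <= ja_field:
        # Line does not have the assumed layout; leave it unchanged.
        return line.rstrip("\n")
    fields[ja_field] = TRAILING_DATE.sub("", fields[ja_field])
    return " ||| ".join(fields)
```

Lines without a trailing date pass through unchanged, so the function can be applied to every line of a file.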


Japanese OOV word "標題"

The Japanese word "標題" appears several times in the dev, devtest and test data but never in the training data, so it is always out-of-vocabulary (OOV).
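
A check of this kind can be reproduced with a plain substring count, without tokenization. The sketch below is a minimal illustration; count_lines_containing and is_oov are hypothetical helper names, and the inputs are any iterables of lines (for example, file objects opened with encoding="utf-8").

```python
def count_lines_containing(lines, needle: str) -> int:
    """Count the lines that contain `needle` as a literal substring (like grep -c -F)."""
    return sum(1 for line in lines if needle in line)


def is_oov(train_lines, eval_lines, word: str) -> bool:
    """True when `word` occurs in the evaluation data but nowhere in the training data."""
    in_eval = count_lines_containing(eval_lines, word) > 0
    in_train = count_lines_containing(train_lines, word) > 0
    return in_eval and not in_train
```

Because the match is on raw substrings, it also counts the word inside longer compounds, which is usually what is wanted for unsegmented Japanese text.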

ASPEC-JC

incomplete Chinese sentences

Some Chinese sentences are incomplete. In the example below, the Chinese side consists only of "。。".

e.g.
NICT_JC_SP-IPSJ-JNL4312017-sec3.-par1-sen31 ||| の位置へ,中央の正規表現では”ki”にマッチする位置へ,最後の正規表現では”kik”にマッチするMigemoは,ユーザが1文字入力するごとに,指定された読みで始まる単語を正規表現に動的に展開してインクリメンタル検索を行う. ||| 。。

Running "grep -c '||| 。。' file.txt" on the train, dev, devtest and test data gives counts of 10, 0, 0 and 1, respectively.

For the evaluation of WAT, these sentences will be removed.
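
Such sentence pairs could be filtered out along the following lines. This is a sketch under assumptions, not the WAT procedure: it assumes the Chinese sentence is the last " ||| "-separated field, as in the example above, and treats a line as incomplete when that field starts with "。。", which approximates the grep pattern. The names is_incomplete_zh and drop_incomplete are hypothetical.

```python
def is_incomplete_zh(line: str) -> bool:
    """True when the Chinese field (assumed to be the last one) starts with '。。'."""
    zh = line.rstrip("\n").split(" ||| ")[-1]
    return zh.startswith("。。")


def drop_incomplete(lines):
    """Return the lines whose Chinese side is not flagged as incomplete."""
    return [line for line in lines if not is_incomplete_zh(line)]
```

Unlike the grep command, this looks only at the last field, so a Japanese sentence that happened to begin with "。。" would not be flagged.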

CHANGE LOG

2014-05-05: date expressions at the end of Japanese sentences


JST (Japan Science and Technology Agency)
Last Modified: 2014-05-15