Development of a Japanese-English Software Manual Parallel Corpus
This page lists the alignment data presented in:
- Tatsuya Ishisaka, Masao Utiyama, Eiichiro Sumita, and Kazuhide Yamamoto (2009). Development of a Japanese-English Software Manual Parallel Corpus. MT Summit.
These alignment data are licensed under Creative Commons Attribution-ShareAlike 3.0 Unported. Note, however, that you must also comply with the licenses of the original Japanese and English texts.
Each entry on this page has the following format.
Name of Software
- English site: URL / license
- Japanese site: URL / license
- Alignment data: je.tgz
je/
  align/: alignment files, each in the format
    SCORE ||| NM ||| JA ||| EN
    ===============================================
    name    meaning
    -----------------------------------------------
    SCORE   Score of the alignment
    NM      N and M are the numbers of Japanese and English sentences, respectively
    JA      Japanese sentences
    EN      English sentences
    ===============================================
  para.txt: 1-1, 1-2, and 2-1 sentence alignments taken from align/*, in the format
    SCORE ||| JA ||| EN
Japanese text is encoded in EUC-JP.
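The file formats described above can be read with a few lines of Python. The following is a minimal sketch, not part of the distributed data: the names `AlignRecord`, `ParaRecord`, `parse_align_line`, `parse_para_line`, and `read_records` are hypothetical. It assumes that each record occupies one line, that SCORE can be parsed as a floating-point number, and that "EUC" means EUC-JP (Python codec `euc_jp`). The NM field is kept as a raw string because its exact notation is not specified here.

```python
from typing import Callable, Iterator, NamedTuple, TypeVar

T = TypeVar("T")


class AlignRecord(NamedTuple):
    """One record from a file under align/ (SCORE ||| NM ||| JA ||| EN)."""
    score: float
    nm: str   # sentence counts, kept verbatim (notation not assumed)
    ja: str
    en: str


class ParaRecord(NamedTuple):
    """One record from para.txt (SCORE ||| JA ||| EN)."""
    score: float
    ja: str
    en: str


def _split_fields(line: str, expected: int) -> list:
    # Fields are separated by "|||"; surrounding whitespace is dropped.
    # Assumption: the sentence text itself never contains "|||".
    fields = [field.strip() for field in line.rstrip("\r\n").split("|||")]
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}: {line!r}")
    return fields


def parse_align_line(line: str) -> AlignRecord:
    score, nm, ja, en = _split_fields(line, 4)
    return AlignRecord(float(score), nm, ja, en)


def parse_para_line(line: str) -> ParaRecord:
    score, ja, en = _split_fields(line, 3)
    return ParaRecord(float(score), ja, en)


def read_records(path: str, parser: Callable[[str], T],
                 encoding: str = "euc_jp") -> Iterator[T]:
    """Yield one parsed record per non-empty line of the file at `path`."""
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():
                yield parser(line)
```

After unpacking je.tgz, the records in para.txt could then be iterated with `read_records("je/para.txt", parse_para_line)`, and a file under align/ with `read_records(path, parse_align_line)`. Passing `encoding` explicitly keeps the decoding step visible, so it is easy to change if a file turns out to use a different encoding.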
Alignment Data
FreeBSD
Gentoo_Linux
JM
JF
NetBeans
PEAR
PHP
PostgreSQL
Python
XFree86
Last updated: Wed Jul 1 14:20:58 JST 2009