Introduction
Welcome to the home page of the KANJIDIC2 project.
The files of this project are copyrighted by the Electronic Dictionary Research and Development Group and are available under the Group's licence.
The KANJIDIC2 project aims to produce a consolidated XML-format kanji database. It combines the information currently in the KANJIDIC (6,355 kanji from JIS X 0208) and KANJD212 (5,801 kanji from JIS X 0212) files (see the overview and documentation of those files), and adds information about the additional kanji in JIS X 0213.
Why do this? Well, XML is a great format for distributing data because many database packages can import XML files, and a growing number of software tools can handle it. Many people want to use the data in KANJIDIC etc. but have trouble handling its format. In addition, I want to take advantage of XML's much richer data structure to add more information and features to the database.
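As a rough illustration of the tooling point, the sketch below reads kanji readings with nothing but Python's standard `xml.etree.ElementTree`. The fragment is hypothetical, and the element and attribute names (`character`, `literal`, `reading`, `r_type`, `ja_on`, `ja_kun`) are assumptions modelled on the KANJIDIC2 DTD; check them against the DTD shipped with the release you use.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element/attribute names are assumed from the
# KANJIDIC2 DTD and should be checked against the actual release.
SAMPLE = """<kanjidic2>
  <character>
    <literal>亜</literal>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_on">ア</reading>
        <reading r_type="ja_kun">つ.ぐ</reading>
        <meaning>Asia</meaning>
      </rmgroup>
    </reading_meaning>
  </character>
</kanjidic2>"""


def readings_by_kanji(xml_text: str) -> dict[str, dict[str, list[str]]]:
    """Map each kanji literal to its readings, grouped by r_type."""
    root = ET.fromstring(xml_text)
    result: dict[str, dict[str, list[str]]] = {}
    for char in root.iter("character"):
        literal = char.findtext("literal")
        readings: dict[str, list[str]] = {}
        # ".//reading" matches <reading> elements at any depth below <character>
        for r in char.iterfind(".//reading"):
            readings.setdefault(r.get("r_type"), []).append(r.text)
        result[literal] = readings
    return result
```

For example, `readings_by_kanji(SAMPLE)["亜"]["ja_on"]` gives `["ア"]`. No format-specific parser is needed, which is the practical advantage over the older line-based files.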
As with the JMdict project, an internal format is used for storage and editing; the XML version is generated from it, as will be the original KANJIDIC/KANJD212 files.
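The single-source idea can be sketched as one record feeding two writers. The internal format itself is not described here, so `KanjiRecord`, `to_xml` and `to_legacy_line` are hypothetical names, the XML element names are assumed from the KANJIDIC2 DTD, and the legacy line is a simplified stand-in (kanji, readings, then meanings in braces); real KANJIDIC lines carry many more fields.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class KanjiRecord:
    """Hypothetical internal record; not the project's actual internal format."""
    literal: str
    on: list[str] = field(default_factory=list)
    kun: list[str] = field(default_factory=list)
    meanings: list[str] = field(default_factory=list)


def to_xml(rec: KanjiRecord) -> str:
    """Emit a <character> element using KANJIDIC2-style names (assumed)."""
    char = ET.Element("character")
    ET.SubElement(char, "literal").text = rec.literal
    group = ET.SubElement(ET.SubElement(char, "reading_meaning"), "rmgroup")
    for r_type, values in (("ja_on", rec.on), ("ja_kun", rec.kun)):
        for v in values:
            ET.SubElement(group, "reading", r_type=r_type).text = v
    for m in rec.meanings:
        ET.SubElement(group, "meaning").text = m
    return ET.tostring(char, encoding="unicode")


def to_legacy_line(rec: KanjiRecord) -> str:
    """Emit a simplified KANJIDIC-like line: kanji, readings, {meanings}."""
    parts = [rec.literal, *rec.on, *rec.kun]
    parts += ["{" + m + "}" for m in rec.meanings]
    return " ".join(parts)
```

With `rec = KanjiRecord("亜", on=["ア"], kun=["つ.ぐ"], meanings=["Asia"])`, `to_legacy_line(rec)` returns `"亜 ア つ.ぐ {Asia}"`. Because both outputs come from the same record, an edit made once appears in every generated file.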
The main documentation is in the form of comments in the DTD; however, an overview is also available. Information about what has changed in each release is on the What's New page.
The Files
Currently available are:
The XSD has been automatically generated from the DTD using dtd2xsd.pl. Don't ask me about it.
General
At this stage the KANJIDIC2 file has been officially released, but please understand that it is still early days for the project and the structure may change, so don't assume anything is set in concrete if you use the file in a project.
One major change, which will be implemented only gradually because it requires a lot of manual work, is to group each set of readings with its matching meanings.
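A minimal sketch of what grouped readings and meanings look like to a consumer, assuming the grouping is expressed as repeated `rmgroup` elements inside `reading_meaning` (names taken from the KANJIDIC2 DTD; verify against your copy). The split of 生 into two groups below is invented for illustration and is not taken from the file.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only: this split of readings and
# meanings into groups does not come from the actual file.
SAMPLE = """<character>
  <literal>生</literal>
  <reading_meaning>
    <rmgroup>
      <reading r_type="ja_kun">い.きる</reading>
      <meaning>live</meaning>
    </rmgroup>
    <rmgroup>
      <reading r_type="ja_kun">なま</reading>
      <meaning>raw</meaning>
    </rmgroup>
  </reading_meaning>
</character>"""


def reading_meaning_groups(char: ET.Element) -> list[tuple[list[str], list[str]]]:
    """Return one (readings, meanings) pair per <rmgroup>, in document order."""
    groups = []
    for grp in char.iterfind("reading_meaning/rmgroup"):
        readings = [r.text for r in grp.findall("reading")]
        meanings = [m.text for m in grp.findall("meaning")]
        groups.append((readings, meanings))
    return groups
```

Until the manual grouping work is done, an entry would simply have a single group holding all of its readings and meanings, and the same code still works.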
The structure allows meanings to be in more than one language. Sources of material exist in French, Portuguese and Spanish, and it would be good to add them.
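One way a consumer might pick out meanings in a single language is sketched below. It assumes, as in the KANJIDIC2 DTD, that each `meaning` carries an optional `m_lang` attribute and that a missing attribute means English; the French, Spanish and Portuguese entries in the fragment are hypothetical examples, not data from the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the m_lang attribute name is assumed from the DTD.
SAMPLE = """<character>
  <literal>水</literal>
  <reading_meaning>
    <rmgroup>
      <meaning>water</meaning>
      <meaning m_lang="fr">eau</meaning>
      <meaning m_lang="es">agua</meaning>
      <meaning m_lang="pt">água</meaning>
    </rmgroup>
  </reading_meaning>
</character>"""


def meanings_in(char: ET.Element, lang: str = "en") -> list[str]:
    """Collect meanings in one language; a missing m_lang is treated as English."""
    return [
        m.text
        for m in char.iterfind(".//meaning")
        if m.get("m_lang", "en") == lang
    ]
```

Treating the attribute as optional keeps existing English-only entries valid while other languages are added incrementally.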
Comments to Jim.
Jim Breen
March 2004