Kobe University Library Digital Archive
https://doi.org/10.24546/81006689
Access count for this item: 2 (tallied 2024-04-24 20:58)
Available files
File: 81006689 (fulltext)
Format: pdf
Size: 11.7 MB
Views: 3
Metadata
Metadata ID: 81006689
Access rights: open access
Publication type: Version of Record
Title: XML Encoding for Spoken Learner (and Other) Corpora: A Modest Approach
Author: Hardie, Andrew
Source title: Learner Corpus Studies in Asia and the World
Volume (issue): 2
Pages: 49-62
Publisher: 神戸大学国際コミュニケーションセンター
Date of issue: 2014-05-31
Date made public: 2014-06-25
Abstract (free to use)
Since the earliest days of corpus linguistics, markup has been used to represent features of corpus texts other than the actual words of the text. The first systems used were somewhat ad hoc, often based on using (sequences of) punctuation marks to indicate para-linguistic properties of texts; over time the field has standardized on markup systems based on tags delimited by <angle brackets>: first SGML, and more recently the derived XML system. While the official standards for the encoding of corpora with XML - most notably the guidelines of the Text Encoding Initiative - are extremely heavyweight, and therefore most suitable for the development of large-scale reference datasets, I argue that a more modest level of XML can productively be applied within the context of corpora developed by individual researchers or small teams. To apply "Modest XML", it is necessary only to comply with certain fundamental rules of XML (such as the layout of tags for regions and for points within the text, the use of attribute-values to add extra information, the use of a single document-level tag in each XML file, and the use of entities to represent certain special characters). However, it is also useful to be aware of certain de facto standard XML tags and attributes very commonly used in spoken corpora (such as <u>, <vocal>, <pause/>, <unclear> and so on). Finally, it is always possible to extend the XML vocabulary that one uses to support the specific needs of a particular corpus development project.
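The fundamental rules listed in the abstract can be illustrated with a small sketch. This is not taken from the paper: the sample transcript, the document-level tag name `text`, and the attributes `id`, `who` and `desc` are assumptions chosen for illustration, and `<vocal>` is shown as a point tag although it could equally be used as a region. The sketch uses Python's standard `xml.etree.ElementTree` parser to check well-formedness and to read the utterances back out.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript (not from the paper). It shows:
#   - one document-level tag (<text>) wrapping the whole file
#   - region tags with open/close pairs (<u>...</u>, <unclear>...</unclear>)
#   - point tags that mark a position (<pause/>, <vocal .../>)
#   - attribute-values carrying extra information (who="S1")
#   - an entity (&amp;) standing in for a special character
SAMPLE = """<text id="demo01">
<u who="S1">I like fish &amp; chips <pause/> very much</u>
<u who="S2">me too <vocal desc="laugh"/> <unclear>maybe</unclear></u>
</text>"""


def is_well_formed(xml_string):
    """Return True if the string parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


def summarize(xml_string):
    """Return the root tag, (speaker, words) pairs, and the set of tags used."""
    root = ET.fromstring(xml_string)
    utterances = []
    for u in root.iter("u"):
        # itertext() yields the text inside <u>, skipping the tags themselves;
        # split/join collapses the whitespace left around point tags.
        words = " ".join("".join(u.itertext()).split())
        utterances.append((u.get("who"), words))
    tags = sorted({el.tag for el in root.iter()})
    return root.tag, utterances, tags


if __name__ == "__main__":
    print(is_well_formed(SAMPLE))
    print(summarize(SAMPLE))
```

A file with two top-level `<u>` elements and no wrapping tag, or one containing a bare `&`, fails the `is_well_formed` check, which is the practical consequence of the single document-level tag rule and the entity rule.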
Resource type: departmental bulletin paper
Language: English
ISSN: 2187-6746
Related information
URI: http://www.solac.kobe-u.ac.jp/