Language ID for Short Texts: Evaluation Report

Abstract

This document reports on a large-scale evaluation of language identification tools applied to short texts (tweets) covering 30 distinct language and script combinations. The project investigated 11 open-source, commercial off-the-shelf (COTS), and government off-the-shelf (GOTS) tools, measuring language and script coverage, accuracy, precision and recall, and performance (speed, scalability, and robustness).

[The report was not publicly released, so the evaluation results cannot be shared.]

Karine Megerdoomian, PhD