NLP Toolkit contains implementations of several Natural Language Processing and Machine Learning algorithms. Although the initial implementations targeted Turkish, the
system currently contains basic modeling of three languages: English, Turkish, and Persian.
The implemented algorithms are:
- Classification Algorithms
  - C4.5 Decision Tree Classifier
  - C4.5 Stump
  - K-Layer Multilayer Perceptron
  - Dummy Classifier
  - K-Nearest Neighbor Classifier
  - Linear Discriminant Analysis
  - Linear Perceptron
  - Multilayer Perceptron
  - Naive Bayes
  - Quadratic Discriminant Analysis
  - Random Classifier
  - Random Forest
  - Support Vector Machine
- Turkish Language Checker
- Turkish Sentence Segmentation
- Turkish Asciifier/Deasciifier
  - Simple Deasciifier Based on Morphological Analysis
  - N-Gram Based Deasciifier
- Information Retrieval[4, 6]
  - Incidence Matrix
  - Inverted Index
  - Positional Posting
  - Skip List
- Turkish Morphological Analysis
  - Finite State Transducer based Morphological Analyzer[7, 8]
- Turkish Morphological Disambiguation
  - Simple Morphological Disambiguation
  - HMM based Morphological Disambiguation
- No Smoothing
- Additive Smoothing
- Interpolated Smoothing
- Laplace Smoothing
- Good Turing Smoothing
- Simple Smoothing
- English Part of Speech Tagger
  - HMM based Part of Speech Tagging
- Syntactic Parser
  - Probabilistic Parser
  - Probabilistic CYK Parser
  - Probabilistic Earley Parser
- Sentence Alignment
- Turkish Spell Checker
  - Simple Spell Checker Based on Morphological Analysis
  - N-Gram Based Spell Checker
- Translation Algorithms
  - IBM Model-1
  - IBM Model-2
  - IBM Model-3
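
To make the Inverted Index and Positional Posting entries in the list above concrete, here is a minimal sketch of a positional inverted index with a two-word phrase query. It is an illustration only and does not reflect the toolkit's own classes or API; the names `build_positional_index` and `phrase_query` and the toy documents are assumptions made for this example.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map each term to {doc_id: [positions]} for a list of token lists."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(documents):
        for position, token in enumerate(tokens):
            index[token].setdefault(doc_id, []).append(position)
    return index


def phrase_query(index, first, second):
    """Return sorted ids of documents where `second` directly follows `first`."""
    hits = []
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    # Only documents containing both terms can contain the phrase.
    for doc_id in first_postings.keys() & second_postings.keys():
        second_positions = set(second_postings[doc_id])
        if any(p + 1 in second_positions for p in first_postings[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the mat sat on the cat".split(),
    "a cat and a mat".split(),
]
index = build_positional_index(docs)
print(phrase_query(index, "the", "cat"))
print(phrase_query(index, "cat", "sat"))
```

Storing positions alongside document ids is what allows phrase queries; a plain inverted index would keep only the document ids per term.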
The system currently provides reader modules for:
- Context Free Grammar
- Probabilistic Context Free Grammar
- Turkish Corpus
- Turkish Dictionary
- English Dictionary
- English-Turkish Dictionary
- Ottoman Dictionary
- Morphological Disambiguation Corpus
- NER Corpus
- POS Tagging Corpus
- English-Turkish Translation Corpora
- TreeBank Corpora[1, 2, 3, 5]
- English WordNet
- Turkish WordNet
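
As an illustration of what a reader module for a Probabilistic Context Free Grammar has to do, the sketch below parses rules written as `LHS -> RHS [probability]`. This line format is an assumption chosen for the example (it resembles the textual notation used by some other NLP libraries); the file formats the toolkit actually reads are not described here, and `parse_pcfg` and `check_normalized` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed rule syntax: "LHS -> SYM1 SYM2 ... [probability]"
RULE_PATTERN = re.compile(r"^\s*(\S+)\s*->\s*(.+?)\s*\[([0-9.eE+-]+)\]\s*$")


def parse_pcfg(lines):
    """Parse rule lines into {lhs: [(rhs_tuple, probability), ...]}."""
    rules = defaultdict(list)
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = RULE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"malformed rule: {line!r}")
        lhs, rhs, prob = match.groups()
        rules[lhs].append((tuple(rhs.split()), float(prob)))
    return dict(rules)


def check_normalized(rules, tolerance=1e-9):
    """Return the left-hand sides whose rule probabilities do not sum to 1."""
    return [lhs for lhs, expansions in rules.items()
            if abs(sum(p for _, p in expansions) - 1.0) > tolerance]


# Hypothetical toy grammar, for illustration only.
grammar_text = """
# toy grammar
S -> NP VP [1.0]
NP -> Det N [0.6]
NP -> N [0.4]
VP -> V NP [0.7]
VP -> V [0.3]
"""
grammar = parse_pcfg(grammar_text.splitlines())
print(grammar["NP"])
print(check_normalized(grammar))
```

Checking that the probabilities for each left-hand side sum to one is a cheap sanity test to run right after loading a grammar, before handing it to a probabilistic parser.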

The bracketed numbers above refer to the following publications:

1. Gorgun, O., O. T. Yildiz, E. Solak, R. Ehsani, "English-Turkish Parallel Treebank with Morphological Annotations and its Use in Tree-based SMT", International Conference on Pattern Recognition Applications and Methods (ICPRAM), pp. 510-516, Rome, Italy, 2016.
2. Yildiz, O. T., S. Candir, E. Solak, R. Ehsani, O. Gorgun, "Constructing a Turkish Constituency Parse TreeBank", International Conference on Computer and Information Sciences (ISCIS), pp. 339-347, Krakow, Poland, 2015.
3. Yildiz, O. T., E. Solak, R. Ehsani, O. Gorgun, "Chunking in Turkish with Conditional Random Fields", International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Cairo, Egypt, 2015.
4. Duzagac, R., O. T. Yildiz, "Context Sensitive Search Engine", International Conference on Computer and Information Sciences (ISCIS), pp. 277-284, Krakow, Poland, 2014.
5. Yildiz, O. T., E. Solak, O. Gorgun, R. Ehsani, "Constructing a Turkish-English Parallel Treebank", Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore, U.S.A., 2014.
6. Yildiz, O. T., A. Okutan, E. Solak, "Bilingual Software Requirements Tracing using Vector Space Model", International Conference on Pattern Recognition Applications and Methods (ICPRAM), Angers, France, 2014.
7. Gorgun, O., O. T. Yildiz, "Using Morphology In English-Turkish Statistical Machine Translation", Signal Processing and Communication Applications Conference (SIU), Antalya, Türkiye, 2012.
8. Gorgun, O., O. T. Yildiz, "A Novel Approach to Morphological Disambiguation for Turkish", International Conference on Computer and Information Sciences (ISCIS), pp. 77-83, London, UK, 2011.
9. Ak, K., O. T. Yildiz, "Unsupervised Morphological Analysis Using Tries", International Conference on Computer and Information Sciences (ISCIS), pp. 69-75, London, UK, 2011.