About

NLP Toolkit contains implementations of several Natural Language Processing and Machine Learning algorithms. Although the initial implementations targeted Turkish, the system currently includes basic models for three languages: English, Turkish, and Persian.

Algorithms

The implemented algorithms are:
  • Classification Algorithms[13]
    • Autoencoder
    • Bagging
    • C4.5 Decision Tree Classifier
    • C4.5 Stump
    • K-Layer Multilayer Perceptron
    • Dummy Classifier
    • K-Nearest Neighbor Classifier
    • Linear Discriminant Analysis
    • Linear Perceptron
    • Multilayer Perceptron
    • Naive Bayes
    • Quadratic Discriminant Analysis
    • Random Classifier
    • Random Forest
    • Rocchio
    • Support Vector Machine
  • Turkish Language Checker
  • Turkish Sentence Segmentation (Training data)
  • Turkish Asciifier/Deasciifier
    • Simple Deasciifier Based on Morphological Analysis
    • N-Gram Based Deasciifier
  • Viterbi
  • Information Retrieval[9, 11]
    • Incidence Matrix
    • Inverted Index
    • Positional Posting
    • Skip List
  • Turkish Morphological Analysis
    • Finite State Transducer based Morphological Analyzer[12, 13]
  • Turkish Morphological Disambiguation[13]
    • Simple Morphological Disambiguation
    • HMM based Morphological Disambiguation
  • N-Gram[14]
    • No Smoothing
    • Additive Smoothing
    • Interpolated Smoothing
    • Laplace Smoothing
    • Good-Turing Smoothing
    • Simple Smoothing
  • English Part of Speech Tagger
    • HMM based Part of Speech Tagging
  • Syntactic Parser
    • CYK Parser
    • Earley Parser
  • Probabilistic Parser
    • Probabilistic CYK Parser
    • Probabilistic Earley Parser
  • Sentence Alignment
  • Turkish Spell Checker
    • Simple Spell Checker Based on Morphological Analysis
    • N-Gram Based Spell Checker
  • Translation Algorithms[12]
    • IBM Model-1
    • IBM Model-2
    • IBM Model-3
  • Word2Vec
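
As an illustration of one entry in the list above, the following is a minimal sketch of additive smoothing for a bigram model. It is not the toolkit's own code or API: the function names, the `<s>` and `</s>` boundary markers, and the `delta` parameter are assumptions made for this example. With `delta=1.0` the estimate reduces to Laplace smoothing.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        # Hypothetical boundary markers so the first and last words get context.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def additive_probability(prev, word, unigrams, bigrams, delta=1.0):
    """Estimate P(word | prev) with additive smoothing.

    delta is added to every bigram count, so unseen bigrams receive a small
    nonzero probability; delta=1.0 corresponds to Laplace smoothing.
    """
    vocabulary_size = len(unigrams)
    numerator = bigrams[(prev, word)] + delta
    denominator = unigrams[prev] + delta * vocabulary_size
    return numerator / denominator


if __name__ == "__main__":
    corpus = [["a", "b"], ["a", "c"]]
    unigrams, bigrams = train_bigram_counts(corpus)
    print(additive_probability("a", "b", unigrams, bigrams))  # seen bigram
    print(additive_probability("a", "a", unigrams, bigrams))  # unseen bigram
```

The vocabulary here is simply every token seen in training, including the boundary markers; a real system would also need a policy for out-of-vocabulary words.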

Data Sets

The system currently supports reading modules for

Download

References

  1. Ertopcu, B., A. B. Kanguroglu, O. Topsakal, O. Acikgoz, A. T. Gurkan, B. Ozenc, I. Cam, B. Avar, G. Ercan, O. T. Yildiz, "A New Approach for Named Entity Recognition", International Conference on Computer Science and Engineering (UBMK), pp. 474-479, Antalya, Turkey, 2017.
  2. Topsakal, O., O. Acikgoz, A. T. Gurkan, A. B. Kanguroglu, B. Ertopcu, B. Ozenc, I. Cam, B. Avar, G. Ercan, O. T. Yildiz, "Shallow Parsing in Turkish", International Conference on Computer Science and Engineering (UBMK), pp. 480-485, Antalya, Turkey, 2017.
  3. Acikgoz, O., A. T. Gurkan, B. Ertopcu, O. Topsakal, B. Ozenc, A. B. Kanguroglu, I. Cam, B. Avar, G. Ercan, O. T. Yildiz, "All-Words Word Sense Disambiguation for Turkish", International Conference on Computer Science and Engineering (UBMK), pp. 490-495, Antalya, Turkey, 2017.
  4. Sasmaz, E., R. Ehsani, O. T. Yildiz, "Hypernym extraction from Wikipedia and Wiktionary", Signal Processing and Communication Applications Conference (SIU), Antalya, Turkey, 2017.
  5. Gorgun, O., O. T. Yildiz, E. Solak, R. Ehsani, "English-Turkish Parallel Treebank with Morphological Annotations and its Use in Tree-based SMT", International Conference on Pattern Recognition Applications and Methods (ICPRAM), pp. 510-516, Rome, Italy, 2016.
  6. Solak, E., O. T. Yildiz, O. Gorgun, R. Ehsani, "Attachment Errors of Nouns after Possessor Clitic", Research in Computing Science, Vol. 90, pp. 173-181, 2015.
  7. Yildiz, O. T., S. Candir, E. Solak, R. Ehsani, O. Gorgun, "Constructing a Turkish Constituency Parse TreeBank", International Conference on Computer and Information Sciences (ISCIS), pp. 339-347, Krakow, Poland, 2015.
  8. Yildiz, O. T., E. Solak, R. Ehsani, O. Gorgun, "Chunking in Turkish with Conditional Random Fields", International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Cairo, Egypt, 2015.
  9. Duzagac, R., O. T. Yildiz, "Context Sensitive Search Engine", International Conference on Computer and Information Sciences (ISCIS), pp. 277-284, Krakow, Poland, 2014.
  10. Yildiz, O. T., E. Solak, O. Gorgun, R. Ehsani, "Constructing a Turkish-English Parallel Treebank", Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore, U.S.A., 2014.
  11. Yildiz, O. T., A. Okutan, E. Solak, "Bilingual Software Requirements Tracing using Vector Space Model", International Conference on Pattern Recognition Applications and Methods (ICPRAM), Angers, France, 2014.
  12. Gorgun, O., O. T. Yildiz, "Using Morphology In English-Turkish Statistical Machine Translation", Signal Processing and Communication Applications Conference (SIU), Antalya, Turkey, 2012.
  13. Gorgun, O., O. T. Yildiz, "A Novel Approach to Morphological Disambiguation for Turkish", International Conference on Computer and Information Sciences (ISCIS), pp. 77-83, London, UK, 2011.
  14. Ak, K., O. T. Yildiz, "Unsupervised Morphological Analysis Using Tries", International Conference on Computer and Information Sciences (ISCIS), pp. 69-75, London, UK, 2011.