Natural Language Processing – Part 1

We live in a multifaceted world of smart devices and applications that don’t just deliver information but, to some extent, understand it. They do that by aggregating enormous amounts of data and presenting it in a human-friendly form. You see it in digital translation, virtual assistants, chatbots, etc.

These devices and applications work, to a large extent, by deploying Natural Language Processing (NLP) techniques and methods, ranging from simple string manipulation to computational linguistics, semantics and machine learning, all trying to convert human text into a machine-understandable form.
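As a minimal sketch of the "simple string manipulation" end of that range, the snippet below lowercases a sentence, splits it into words and counts them. The function name and the sample sentence are made up for illustration; real NLP pipelines do considerably more than this.

```python
import re
from collections import Counter


def to_word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# A made-up sentence, used only to show the idea.
counts = to_word_counts("The cat sat on the mat. The mat was warm!")
print(counts.most_common(3))
```

Word counts like these are one very basic machine-understandable form of a text: the sentence becomes numbers a program can compare.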

There are a couple of challenges, though. First, we cannot rely on hand-written rules alone. We tried that before Machine Learning (ML) was available, but the challenge comes from the semantics and reasoning behind human words: understanding the context and the sentiment of the sentences we use. The same power and flexibility that natural language provides, enabling us to express complex emotions, is actually the biggest obstacle for machines trying to understand the same meaning.
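To see why rules struggle, here is a small sketch of a rule-based sentiment checker. The word lists and the function name are hypothetical, picked only for this illustration. It handles the easy sentence, but negation and sarcasm slip straight through it.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def rule_based_sentiment(text):
    """Label text by counting words from two fixed lists."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("I love this phone"))           # the rule works here
print(rule_based_sentiment("This is not good at all"))     # negation is missed
print(rule_based_sentiment("Oh great, it crashed again"))  # sarcasm is missed
```

You could keep adding rules for "not", for sarcasm, for slang, and so on, but the list never ends. That is the gap machine learning tries to fill.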

The second challenge is ambiguity. We have to admit that, unlike domain languages, natural language is rooted in our culture and bound by time and, in a way, geography. It is a kind of undocumented agreement between the speaker and the listener on a common form of understanding.

Welcome, Machine Learning

And then came machine learning: mathematical algorithms that try to map input data to a desired outcome. They do that by processing a considerable amount of data and looking for patterns that connect the input to the outcome.

Training is an optimization procedure that minimizes the model’s error on the training data, and it comes with a lot of settings to tune (hyperparameters). The fitted model can then be given new data and new text, for which it will try to make predictions, returning labels and probabilities for what the output should look like.
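The fit-then-predict idea can be sketched with a tiny text classifier written from scratch. This is a toy multinomial Naive Bayes, one of many possible models and not necessarily what any real product uses. The class name, the four training sentences and the `alpha` smoothing setting (an example of a hyperparameter) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes, for illustration only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength: a hyperparameter we choose

    def fit(self, texts, labels):
        """Count how often each word appears under each label."""
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict_proba(self, text):
        """Return a {label: probability} dict for a new piece of text."""
        total_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, n_docs in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


# Made-up training data: four sentences with their labels.
texts = ["i love this movie", "what a great film",
         "i hate this movie", "what an awful film"]
labels = ["positive", "positive", "negative", "negative"]

model = TinyNaiveBayes(alpha=1.0).fit(texts, labels)
print(model.predict("a great movie"))
print(model.predict_proba("a great movie"))
```

The model never saw the sentence "a great movie" during training, yet it returns a label and a probability for each possible label, based on the word patterns it counted.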

Like any other ML learning process, there is a delicate balance between learning the patterns in the known data precisely and generalizing well enough to perform on new data the model has not seen before (overfitting and underfitting).
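Here is a rough sketch of the overfitting side of that balance. Both "models" below are hypothetical toys invented for this post: one memorizes whole training sentences, the other learns which label each word tends to appear with. The data is made up too.

```python
from collections import Counter, defaultdict

# Made-up data: sentences the models learn from, and sentences they never saw.
train = [
    ("i love this movie", "positive"),
    ("what a great film", "positive"),
    ("i hate this movie", "negative"),
    ("what an awful film", "negative"),
]
test = [
    ("a great movie", "positive"),
    ("an awful movie", "negative"),
]


class Memorizer:
    """Stores whole sentences: perfect on seen text, a blind guess otherwise."""

    def fit(self, data):
        self.table = dict(data)
        self.default = Counter(label for _, label in data).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.table.get(text, self.default)


class WordVoter:
    """Learns which labels each word appears with, then lets the words vote."""

    def fit(self, data):
        self.votes = defaultdict(Counter)
        for text, label in data:
            for word in text.split():
                self.votes[word][label] += 1
        self.default = Counter(label for _, label in data).most_common(1)[0][0]
        return self

    def predict(self, text):
        tally = Counter()
        for word in text.split():
            tally.update(self.votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else self.default


def accuracy(model, data):
    """Fraction of sentences for which the model returns the right label."""
    return sum(model.predict(text) == label for text, label in data) / len(data)


for model in (Memorizer().fit(train), WordVoter().fit(train)):
    name = type(model).__name__
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer looks perfect on the training data, but on unseen sentences it can only fall back on a constant guess. The word-based model learned something that carries over to new text, and that is what we mean by generalizing.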

And because we trained the model on specific input data, machine learning models are constrained by time and usage. In other words, they have a life cycle. These models can also be retrained on new data to improve them. Has Siri ever asked you to train her?

In the following post, I will detail the steps needed for NLP, starting with text vectorization methods and procedures.

Choosing your IDE for Python and Machine Learning

Learning machine learning is easy if you know some programming concepts (or even if you are a newbie). There are obvious languages you need to learn, and it’s hard to choose the wrong one, since Python is kind of the standard in this domain. This is a good thing, because Python is relatively easy to learn and master in no time (no time in programming is 3 months). Most of the machine learning libraries and APIs are also widely available for Python for free.

You can start with Python on almost any platform: Windows, macOS or Linux.

To code in Python, you will need an IDE (Integrated Development Environment). An IDE is like Microsoft Word: an editor, the application that helps you write your code. The challenge for newbies is deciding which IDE is the best, and the answer is IT DEPENDS. It’s your choice, and the only way to find out is to try some of them and see which one makes sense for you. Still, I have some advice for you.

If you are completely clueless, then go for Anaconda. It’s free, and advanced data scientists use it. If you have some programming background, you may go with Atom or Visual Studio Code and then migrate to Anaconda, because Anaconda lets you execute code line by line, add visual comments, use autocomplete and much more. Get Anaconda for your platform from www.anaconda.com

You don’t need to install Python or anything else. Anaconda comes with batteries included: Python, libraries and everything you need to start. So go ahead, install it, play with it, and we will take it from there.
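Once it is installed, a quick sanity check like the sketch below tells you which Python version you are running and whether a few libraries can be found. The package list is my assumption of what is commonly used for machine learning, so adjust it to whatever you plan to work with.

```python
import importlib.util
import sys

# Libraries commonly used for machine learning in Python. This list is an
# assumption for illustration, not something the installer requires.
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def check_environment(packages):
    """Return {package: True/False} depending on whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python version:", sys.version.split()[0])
for name, found in check_environment(PACKAGES).items():
    print(f"{name:12s} {'found' if found else 'missing'}")
```

If something shows up as missing, it just means that library is not installed in the Python environment you ran the script with.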

Just in case you prefer the old way of programming and like to code in a shell-like IDE, Visual Studio Code (yes, it’s free and supports Python) and Atom are not bad after all. You can get them from the links below, but hey, you will also need to install Python from python.org

Visual Studio Code (works on Windows, macOS and Linux): https://code.visualstudio.com/download

Atom can be downloaded from here: https://atom.io/

Have fun,