DATA SCIENCE PROJECT TOOLS
When you start a machine learning project, the first questions that come to mind are about tools. Which tools can I use? Where can I get them, and how do I install them?
In this blog, we will look at the tools that are available and then describe how to use them.
ENVIRONMENT
The development environment that you use for machine learning may be just as important as the machine learning methods that you use to solve your predictive modelling problem. Anaconda is a widely used distribution that bundles a lot of software to help you build your machine learning application.
The most important and yet simplest application is Jupyter Notebook. First, let’s look at the installation and setup of Anaconda. You need to visit the official site to download the installer.
If you want to code in Python, you need to install the Anaconda distribution of Python. Before downloading any installer, check that your hardware meets the system requirements. Refer to https://docs.anaconda.com/anaconda/install/ for details.
Once you have installed Anaconda, you will get two menu entries: Anaconda Navigator and Anaconda Prompt. Anaconda Prompt is a command-line shell (a program where you type in commands instead of using a mouse). The black screen and text that make up the Anaconda Prompt don’t look like much, but it is really helpful for problem solvers using Python.
You can run Anaconda Prompt on your computer; how you open it depends on the OS you are using.
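One quick way to confirm that the prompt is using the Python you just installed is to start `python` from it and inspect the interpreter. The snippet below is a minimal sketch using only the standard library. The function name `describe_python` is made up for illustration, and the `conda-meta` check is a heuristic assumed here, not an official Anaconda test.

```python
import platform
import sys
from pathlib import Path


def describe_python():
    """Collect basic facts about the interpreter that is currently running."""
    return {
        "version": platform.python_version(),  # e.g. "3.11.5"
        "executable": sys.executable,          # path of the running interpreter
        "os": platform.system(),               # "Windows", "Linux" or "Darwin"
        "machine": platform.machine(),         # CPU architecture string
        # Heuristic (assumption of this sketch): conda environments keep a
        # "conda-meta" directory under the interpreter's installation prefix.
        "looks_like_conda": (Path(sys.prefix) / "conda-meta").is_dir(),
    }


for key, value in describe_python().items():
    print(f"{key}: {value}")
```

If `executable` points into your Anaconda folder, the prompt is picking up the Anaconda Python rather than some other installation.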
Open Anaconda Navigator from the menu, and you will see several applications that you can launch from there.
Why Anaconda only? You can download Python directly from Python.org.
The Anaconda distribution of Python is advantageous because it includes Python as well as about 600 additional Python packages, all free to install. These include many of the most common Python packages used to solve problems. If you download Python from Python.org, you get just Python and the Standard Library, with no additional modules. You could install the extra modules yourself, but why not save a step (or about 600 steps) and just install Anaconda instead?
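To see the difference on your own machine, you can ask Python which packages it can import. The following is a small sketch using the standard library's `importlib.util.find_spec`. The list of names is only an illustrative selection (note that scikit-learn is imported as `sklearn`), and `installed_status` is a helper name invented for this example.

```python
from importlib.util import find_spec

# Import names of a few common data science packages.
# This selection is illustrative, not a list of what Anaconda ships.
CANDIDATES = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]


def installed_status(module_names):
    """Map each top-level module name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in module_names}


# "json" is part of the Standard Library, so it should always be available.
for name, available in installed_status(CANDIDATES + ["json"]).items():
    print(f"{name:12s} {'available' if available else 'missing'}")
```

On a plain Python.org installation you would expect most of these to be reported as missing until you install them yourself, while `json` is always available.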
Integrated Development Environment
- IDLE:- When you install Python, IDLE is also installed by default, which makes it easy to get started in Python. Its major features include the Python shell window (interactive interpreter), auto-completion, syntax highlighting, smart indentation, and a basic integrated debugger.
IDLE is a decent IDE for learning as it’s lightweight and simple to use. However, it’s not optimal for larger projects.
- Sublime Text 3:- It has basic built-in support for Python when you install it. However, you can install packages for debugging, auto-completion, code linting, etc. There are also various packages for scientific development, Django, Flask and so on. Basically, you can customize Sublime Text to create a full-fledged Python development environment as per your need.
You can download and evaluate Sublime Text for an indefinite period of time. However, you will occasionally get a pop-up stating “you need to purchase a license for continued use”.
- Atom:- Atom is an open-source code editor developed by GitHub that can be used for Python development (similar to Sublime Text).
Its features are also similar to those of Sublime Text. Atom is highly customizable. You can install packages as per your need. Some of the commonly used packages in Atom for Python development are autocomplete-python, linter-flake8, python-debugger, etc.
- PyCharm:- This IDE is designed for professional developers. It has two editions:
- Community:- Free, open-source, lightweight, good for scientific development.
- Professional:- Comes with end-to-end support and is best suited for web development.
PyCharm is well suited for Data Science project development because of its code completion, code inspections, error highlighting and fixes, debugging, version control integration and code refactoring. All these features come out of the box.
- Visual Studio Code:- A free, open-source IDE developed by Microsoft. It provides features such as intelligent code completion, linting for potential errors, debugging, unit testing and so on.
VS Code is lightweight and packed with powerful features, which is why it is becoming popular among Python developers.
- Spyder:- An open-source IDE best suited for scientific development. It is readily available with the Anaconda distribution: install Anaconda, and you can launch Spyder from Anaconda Navigator.
- Jupyter Notebook:- An open-source, web-based application that is well suited to beginners in machine learning. It supports data visualization, data transformation, numerical simulation, statistical modelling, machine learning and much more. For details on Jupyter Notebook, refer to https://www.youtube.com/watch?reload=9&v=VV7pRrUj2rA&list=PLeRUz657THGjiO5M1b33JDj-R8e4dDx6f
LANGUAGE
Python and R are the most popular languages for Data Science projects because of their library support. However, data science does not depend on a particular programming language; Java and Julia also offer good library support.
LIBRARIES
Libraries are the heart of machine learning projects. There is a wide variety of libraries available for almost any task you can imagine.
- Pandas
- NumPy
- scikit-learn
- Matplotlib
- Seaborn
- Plotly
- TensorFlow
- Keras
- PyTorch
- Weka
And there are many more.
The choice of library is really important. It is also driven by the language you use, as libraries are not transferable between Python and R.