This tutorial is part of the series “Data Science with Python”, a set of tutorials aimed at helping beginners get started with data science and Python.
Installing Python can be as simple as downloading the installer from the official website. But the more common way to get Python in the data science world is through a package manager like Conda.
Conda is a popular package manager as well as environment manager.
There are 3 main advantages to installing Python through Conda rather than on its own:
- Your work will be reproducible. When you send your project to someone else, you can tell them to install Conda and recreate your environment from a file you share, instead of telling them the versions of all your dependencies by hand. Using a dependency manager makes it easy to share your work.
- You will avoid package installation and dependency problems. As a beginner you are unlikely to run into many dependency conflicts, but it’s still an important advantage.
- You can use it as an environment manager. That is, if different projects you are working on use different dependencies (or different versions), you can create an isolated environment for each project to avoid dependency problems, and easily switch between them.
Download the latest version of Miniconda as per your operating system (from the Conda download page) and install it. Installation for most people is just double-clicking on the downloaded file or running a command on the terminal.
- For Windows, you just have to double-click on the exe file you downloaded.
- For Mac, there are two options:
  - Download the pkg file and double-click to run it, or
  - Download the script, run it in the terminal using bash ~/Downloads/Miniconda3-latest-MacOSX-x86_64.sh, and answer the questions asked.
- For Linux, you need to download the script, run it in the terminal using bash ~/Downloads/Miniconda3-latest-Linux-x86_64.sh, and answer the questions asked.
If you need them, the full installation instructions can be found at https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation.
I’ve deliberately kept the installation instructions minimal because you will find plenty of references and troubleshooting information if you search the internet for ‘miniconda installation on <<operating system>>’. After a successful installation, you should be able to run conda --version in the terminal to see the version of Conda you installed. Also run python --version to see the Python version installed.
(base) ➜ data-science python --version
Python 3.9.5
(base) ➜ data-science conda --version
conda 4.10.3
(base) ➜ data-science
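If you prefer to do the same check from inside Python, here is a minimal sketch. It assumes the conda executable is discoverable on your PATH, which may not hold for every shell setup, and the function names python_version and conda_on_path are my own, chosen only for illustration.

```python
import shutil
import sys


def python_version() -> str:
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


def conda_on_path() -> bool:
    """Return True if a `conda` executable can be found on PATH."""
    return shutil.which("conda") is not None


print("Python version:", python_version())
print("conda found on PATH:", conda_on_path())
```

If the second line prints False right after installing, opening a new terminal window is usually worth trying first, since the installer's changes may not be picked up by a terminal that was already open.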
Writing a Python Program
You need a text editor to write your code. I use Visual Studio Code. You can install helpful extensions to Visual Studio Code if you like, but I’m not using any for this tutorial. The basic way to run a Python program is to write it in your text editor and then run it in the terminal using the Python interpreter.
Now that we have everything we need set up, let’s write and execute a simple Python program that prints ‘Hello World!’ on the screen.
1. Create an Environment
An ‘environment’ is the combination of the Python version and packages you need. For one project you might be using Python 3.9 and packages x, y and z. For another project you might be using Python 2.7 and packages a, b and c. Environments give us a way of keeping these dependencies separate from one another and reduce confusion when switching between projects.
You can also share the environment as an environment.yml file along with your code, so that the recipient can run your project without worrying much about the dependencies. Let’s create an environment named dstut (for data science tutorial) to use along with this set of tutorials.
Execute conda create --name dstut python=3.9 to create the environment. This creates an environment with Python version 3.9. You will be shown the set of default packages it requires and asked to confirm.
Once the environment is created you have to ‘activate’ it every time you need to use it. Execute conda activate dstut to activate the environment we created. You’ll see that the command prompt now includes (dstut) to denote that you are in your new Conda environment.
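Besides the prompt, you can also ask Python itself which environment it is running in. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which Conda sets when an environment is activated. The function name active_conda_env is just an illustrative choice.

```python
import os
import sys


def active_conda_env() -> str:
    """Return the active Conda environment name, or a placeholder if none is set."""
    name = os.environ.get("CONDA_DEFAULT_ENV")
    return name if name else "(no Conda environment active)"


print("Active environment:", active_conda_env())
# sys.prefix is the folder the running interpreter was installed into.
print("Interpreter location:", sys.prefix)
```

Run with the dstut environment active, the first line should name dstut, and the interpreter location should point inside that environment's folder.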
2. Write your Code
Open your text editor and create a file with a single line of code:
print('Hello World!')
Save the file as hello.py.
3. Execute the Program
Switch back to your terminal and execute python hello.py to run your code. If everything is in order, you should see the text Hello World! printed on the terminal.
(dstut) ➜ hello-world python hello.py
Hello World!
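If you’d like to go one small step further, here is a sketch of the same program with the message wrapped in a function. The function name greeting is my own choice for illustration and not something this tutorial depends on.

```python
def greeting() -> str:
    """Return the message our program prints."""
    return "Hello World!"


# This block runs only when the file is executed directly (python hello.py),
# not when the file is imported from another Python file.
if __name__ == "__main__":
    print(greeting())
```

Running this version with python hello.py prints the same text as the one-line program.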
Intro to Jupyter Notebook
A Jupyter notebook is a format where you can create a single document that includes your Python code, widgets, charts and documentation. It’s great for sharing your work. In fact, in academic circles it’s more or less the default way to share your work. Almost every homework assignment I submitted used these notebooks.
So let’s also see how to run our ‘Hello World!’ program using a Jupyter notebook.
I hope you haven’t closed the terminal yet. If you have, open it again, change to your directory (cd <<directory-name>>) and execute conda activate dstut to activate our tutorial environment.
Execute conda install jupyterlab to install Jupyter in the environment. You will be presented with a list of packages and asked to confirm the installation. Go ahead and finish the installation.
Note that you have installed JupyterLab only inside the environment. That is, once you are out of the environment (by closing the terminal or by running conda deactivate), it will be as if there is no JupyterLab on your computer. And whenever you run conda activate dstut, JupyterLab is back again! That’s one of the core functions of an environment manager like Conda.
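You can see this for yourself with a few lines of Python. The sketch below asks the current interpreter whether it can find a package, using the standard library's importlib. The helper name is_installed is hypothetical, picked for this example.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the current interpreter can locate the named top-level package."""
    return importlib.util.find_spec(package) is not None


print("jupyterlab available:", is_installed("jupyterlab"))
```

Assuming the installation above succeeded, running this with dstut active should report True, while running it with an interpreter from an environment where you never installed jupyterlab should report False.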
Okay, now that JupyterLab is installed, execute jupyter notebook to start the notebook server. The interface opens automatically in your default browser. If you need to open the page yourself, or in a different browser, the terminal will show a URL (like http://localhost:8888/?token=5410e03089b55baba71dubidabab57dudu85207ce07380a9). Copy that URL and paste it in your browser’s address bar to open the Jupyter notebook interface.
Creating our Document
Once you’re on the homepage, use the ‘New’ menu to create a new notebook.
On the new document that’s created, there’s one ‘cell’ by default. A Jupyter notebook is a set of such ‘cells’ with types. For now, we are going to have two cells – one for giving a title to our document and one for our ‘Hello World!’ code.
Change the first cell’s type to ‘Markdown’ using the dropdown on the toolbar or by using the ‘Cell -> Cell Type’ menu. Then type # Program to Print 'Hello World!' in the cell. After that, add a new cell using the ‘+’ button on the toolbar or by using the ‘Insert’ menu.
The new cell is of type ‘Code’ by default, which is what we want. If you remember, the code we wrote for hello.py is print('Hello World!'). Type this code in the new cell we created.
Running the Document
Now let’s get the output by using the ‘Run’ command. You can either execute the two cells one by one using the ‘Run’ button on the toolbar, or use the ‘Cell -> Run All’ menu item to run both cells in order.
We can also click on the ‘Untitled’ title on the top, and give it a meaningful name that suits our project.
Close the window and switch back to your terminal. You’ll find the server is still running there. Press Ctrl + C to shut down the server. If you type ls you’ll see that Jupyter has saved your document as a file with an ipynb extension (ipynb stands for IPython Notebook). Then you can run conda deactivate to deactivate our dstut environment, or just close the terminal.
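An ipynb file is a JSON document underneath, so you can peek inside one with plain Python. The sketch below does not read your saved file. Instead it builds a small hand-written stand-in with the same two cells we created, because a real notebook carries more metadata and outputs than shown here. The function name summarize_cells is my own.

```python
import json

# A simplified stand-in for our notebook: one Markdown cell and one Code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Program to Print 'Hello World!'"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello World!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def summarize_cells(nb_text: str) -> list:
    """Return a (cell type, source text) pair for every cell in notebook JSON text."""
    nb = json.loads(nb_text)
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]


text = json.dumps(notebook)
for cell_type, source in summarize_cells(text):
    print(f"{cell_type}: {source}")
```

To try it on your own notebook, you could read the file's contents into a string and pass that to summarize_cells instead of the stand-in text.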
That’s it. Now you know how to set up a Python data science environment, write and execute Python code, and create Jupyter notebooks.