This tutorial is part of the series “Data Science with Python”, a set of tutorials aimed at helping beginners get started with data science and Python.
Consider this article a super fast introduction to Python. I’m not taking the usual feature-by-feature tutorial route, because there is already an unbelievable number of Python tutorials available on the internet. If you are completely new to programming, or you are interested in learning Python in more depth, I advise you to read the official Python tutorial.
This tutorial can also be consumed as a Jupyter notebook, available here. Let’s get started then.
Assignment Statement
- One thing that’s common in programs is to give names to values. This is called ‘assignment’.
- person_name = 'John Smith' is an assignment statement.
- person_name on the left is a ‘variable’ (note that it has no quotes).
- 'John Smith' on the right is a ‘literal’.
- person_age = 25 is also an assignment statement. The only difference is that now we have assigned a number (25) to a variable named person_age.
- total_value = 25 + 35 + 45 is also an assignment statement. First the value of 25 + 35 + 45 is calculated, and then the result is assigned to a variable named total_value.
- Variable names are not put in quotes. Text values like ‘John Smith’ and “This is my chat message” are put in single or double quotes. Numbers and Boolean values (True, False) are not put in quotes.
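The assignment statements described above can be tried out as a short script. This is only a small sketch using the same variable names; print(), which is explained in the next section, simply displays each value.

```python
# Text values (strings) go in quotes; numbers do not.
person_name = 'John Smith'
person_age = 25

# The right-hand side is calculated first, then the result gets a name.
total_value = 25 + 35 + 45

print(person_name)   # John Smith
print(person_age)    # 25
print(total_value)   # 105
```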
The print function
print("Hello World!") prints “Hello World!” to the screen. Text values (like ‘Hello World!’) are called ‘strings’ in Python and should be enclosed in single or double quotes. Numbers and Booleans (True, False) should not be enclosed in quotes.
print() is called a ‘function’ in Python. print is the name of the function, and the text you provide inside the brackets is called an ‘argument’. As you work with Python you will use a lot more functions and even write your own functions.
print("Hello", "World!") prints the same thing as above. You can put any number of items in, and the print() function will print them separated by spaces. Now you have provided two arguments to print.
print("John", "James", "Stuart", "Jacob", sep=", ") prints the four names, but separated by a comma and a space instead of just a space. Now you have provided five arguments to the print function. One of them, sep, is a ‘keyword argument’: an argument that has a name.
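The print calls discussed above can be run together to compare their output. This is a minimal sketch; person_age is borrowed from the assignment section, and the comments show what each line prints.

```python
# One argument: a single string.
print("Hello World!")                 # Hello World!

# Two arguments: print() puts a space between them.
print("Hello", "World!")              # Hello World!

# Strings and numbers can be mixed freely.
person_age = 25
print("Age:", person_age)             # Age: 25

# The keyword argument sep replaces the default space.
print("John", "James", "Stuart", "Jacob", sep=", ")
# John, James, Stuart, Jacob
```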
Dictionaries
In Python, a collection is a bunch of values grouped together. A dictionary is one type of collection. It is a group of values where each value has a key associated with it.
Say we want to store the voting ages in different countries. It would be cumbersome to create and work with lots of variables like usa_voting_age, india_voting_age, singapore_voting_age and so on.
Instead, we create a dictionary. We will name this dictionary voting_ages. The country names will be ‘keys’ and the voting ages will be ‘values’. When we need the voting age of India, we can simply fetch it with voting_ages['India'].
Dictionaries are created by specifying a list of key: value pairs, separated by commas, inside a pair of curly brackets.
voting_ages = {
    "India": 18,
    "USA": 18,
    "China": 18,
    "Australia": 18,
    "Singapore": 21,
    "Malaysia": 21
}
print("Voting age in China is", voting_ages['China'])
Voting age in China is 18
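Dictionaries can also grow after they are created. The sketch below uses a shortened copy of voting_ages; len() and the in operator are standard Python but have not been introduced in this tutorial, so treat them as a preview.

```python
# A shortened copy of the dictionary: country names are keys, voting ages are values.
voting_ages = {
    "India": 18,
    "Singapore": 21,
}

# Fetch a value by its key.
print(voting_ages["India"])        # 18

# Assigning to a key that does not exist yet adds a new entry.
voting_ages["USA"] = 18

# len() counts the entries; 'in' checks whether a key is present.
print(len(voting_ages))            # 3
print("USA" in voting_ages)        # True
print("France" in voting_ages)     # False
```

Checking with in before fetching avoids the error Python raises when a key is missing.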
Looping
How common are tasks like ‘add up all the values in this list’, ‘print all the names from this list’, or ‘check which of the items in this list weigh more than 20 kilograms’? Pretty common, right? A lot of your time as a data science programmer will be spent writing loops. There are different types of loops in Python. Let’s learn one common loop: looping through a collection.
Syntax of a loop:
for variable_name in collection:
    inside_the_loop()
    print(variable_name)
    do_some_more_things()
# outside the loop now
print('Loop finished.')
The line for variable_name in collection: marks the start of a loop. It means ‘execute the following statements for every value in the collection’. Each value in the collection is assigned to variable_name in turn, and then the set of statements underneath it is executed. This is repeated for every element in the collection.
The statements that follow are indented to show that they are part of the loop. The set of statements with the indent is called a ‘block’. The block ends when we stop indenting.
for country in voting_ages:
    # country is a variable which gets each key in the dictionary
    # in this case, each country name.
    print("Voting age in", country, "is", voting_ages[country])
Voting age in India is 18
Voting age in USA is 18
Voting age in China is 18
Voting age in Australia is 18
Voting age in Singapore is 21
Voting age in Malaysia is 21
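The ‘add up all the values’ and ‘heavier than 20 kilograms’ tasks mentioned at the start of this section can be sketched with the same kind of loop. The item_weights dictionary and its numbers are made up for illustration. The if statement has not been covered in this tutorial, and lists and .append() are only explained in the following sections, so read this as a preview.

```python
# Hypothetical data: item names mapped to weights in kilograms.
item_weights = {"suitcase": 23, "backpack": 8, "crate": 31, "handbag": 2}

total_weight = 0     # running total, updated inside the loop
heavy_items = []     # names of items heavier than 20 kg

for item in item_weights:
    weight = item_weights[item]
    total_weight = total_weight + weight   # add up all the values
    if weight > 20:                        # only runs for heavy items
        heavy_items.append(item)

# outside the loop now
print("Total weight is", total_weight)     # Total weight is 64
print("Heavy items:", heavy_items)         # Heavy items: ['suitcase', 'crate']
```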
Preparing Data from Text
One common task in data science is to read a bunch of text line by line and create better quality data from it. For example, say each line of your data looks like "John Smith,35,Male,Australia": name, age, sex and country separated by commas. It would be easier to work with if it was a dictionary with each of those values mapped to its corresponding name.
So that line of text gets converted into a dictionary: {"name": "John Smith", "age": 35, "sex": "Male", "country": "Australia"}.
Obviously you have several lines of such data, so we can create a list of this data for easier processing. A list is another type of collection in Python. Dictionaries have ‘keys’ to access the values, whereas lists don’t have keys; a list is just an ordered collection of values. You can access list values using the looping syntax we saw above, or by using a numerical index, which starts at 0. Lists are created by specifying a bunch of values inside square brackets.
list_names = ["John", "Jacob", "James", "Julie"]
print(list_names[0]) # will print "John"
print(list_names[1]) # will print "Jacob"
# Declaring a list of strings
data_list = [
    "John Smith,35,Male,Australia",
    "Lily Pina,13,Female,USA",
    "Julie Singh,16,Female,India",
    "Rita Stuart,20,Female,Singapore",
    "Trisha Patrick,32,Female,USA",
    "Adam Stork,32,Male,USA",
    "Mohamed Ashiq,20,Male,Malaysia",
    "Yogi Bear,25,Male,Singapore",
    "Ravi Kumar,33,Male,India",
    "Ali Baba,40,Male,China"
]
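Both ways of reading a list, by index and by looping, can be tried on a shortened copy of data_list. This is a small sketch; len() and the negative index -1 (which counts from the end) are standard Python that this tutorial has not introduced.

```python
# A shortened copy of data_list, to keep the example small.
data_list = [
    "John Smith,35,Male,Australia",
    "Lily Pina,13,Female,USA",
    "Julie Singh,16,Female,India",
]

print(len(data_list))    # 3
print(data_list[0])      # John Smith,35,Male,Australia
print(data_list[-1])     # Julie Singh,16,Female,India

# Looping visits the values in order, first to last.
for line in data_list:
    print("Line:", line)
```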
Dot operator
A function associated with a particular object is called a ‘method’. Methods do something with the object they are associated with. For example, strings have a method called ‘split’. It splits a string into multiple parts and returns the parts as a list. To call such methods, we use the dot operator.
names = 'John,Jacob,Jaden,Jill,Jack'
names_as_list = names.split(',')
# names is a string and 'split' is a string method.
# split(',') means split the string considering comma as separator
Now you can loop through the names using the for-loop syntax, like for name in names_as_list:.
Similarly, lists have an .append() method which can be used to add more elements to a list. It is common to declare an empty list using empty square brackets (like my_list = []) and then add elements to it using the append method (like my_list.append(25)).
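The two methods can be combined: split a string into a list, then build a second list with append inside a loop. This is a minimal sketch; upper_names is a hypothetical variable name, and .upper() is another standard string method (it returns the text in capital letters) that is not covered elsewhere in this tutorial.

```python
names = 'John,Jacob,Jaden,Jill,Jack'

# split(',') is a string method: it returns the parts as a list.
names_as_list = names.split(',')
print(names_as_list)     # ['John', 'Jacob', 'Jaden', 'Jill', 'Jack']

# Start with an empty list and append to it inside a loop.
upper_names = []
for name in names_as_list:
    upper_names.append(name.upper())   # upper() is also called with the dot operator

print(upper_names)       # ['JOHN', 'JACOB', 'JADEN', 'JILL', 'JACK']
```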
Data Types
Every variable in Python has a ‘type’ based on the value assigned to it. Handling data types is quite common in data science tasks, because data is usually provided as text and it’s up to the programmer to convert it to whatever type they need. This is important because what Python can do with the data depends on its type.
For example, 25 can be a number, and 25 can also be thought of as a string.
a = 25
b = '25'
# a is a number, and b is a string
print(a * 3) # will print 75 : 3 times 25
print(b * 3) # will print 252525 : the string '25' repeated 3 times
To be clear about these things, we have to check and convert data types wherever required. To convert a string value to an integer value, we use the int() function. For example, b = int('25') will make b a variable of type integer, even though we have given 25 in quotes. If we do b = float('25'), b will be a number with a decimal point (25.0). The other way is also possible, where you convert a number into a string: b = str(25) will make b a string variable even though you have specified 25 without quotes.
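Putting the pieces together, here is one possible sketch of the conversion described in ‘Preparing Data from Text’: each comma-separated line becomes a dictionary, with the age converted to a number. It is not necessarily how the rest of this series does it. The names people, parts and person are hypothetical, and type() is a built-in function, not covered above, that reports the type of a value.

```python
# A shortened copy of data_list from earlier in the tutorial.
data_list = [
    "John Smith,35,Male,Australia",
    "Lily Pina,13,Female,USA",
]

people = []   # will hold one dictionary per line

for line in data_list:
    parts = line.split(',')        # e.g. ['John Smith', '35', 'Male', 'Australia']
    person = {
        "name": parts[0],
        "age": int(parts[1]),      # convert the text '35' into the number 35
        "sex": parts[2],
        "country": parts[3],
    }
    people.append(person)

print(people[0])
# {'name': 'John Smith', 'age': 35, 'sex': 'Male', 'country': 'Australia'}
print(type(people[0]["age"]))      # <class 'int'>
```

Because age is now an integer, it can be used directly in arithmetic and comparisons.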
[Continued in next part…]