New Year Resolutions

Photo by Chandan Chaurasia on Unsplash

I know most of us hardly stick to new year resolutions. But I still like making them. It makes me size up my life and think of what I have and what I miss. For this year, I want to –

1. Get Off the Computer

If you’re the typical developer, you probably have all the other parts of your life on the computer too. Your friends are all online. Your hobby is probably playing video games. Reading books, watching videos, doing your budget, writing down to-do lists: almost everything you can think of, you’re probably doing on the computer. When you think of a side project, it’s probably something to do on a computer – writing a blog 🙄, freelancing as a coder and so on. If you study, you’re probably going to take one of the online courses and study on the computer.

Well, this year I want to get off the computer. Of course I’ve already started an online course and I’m writing a blog. But I don’t think those are taking up much of my computer time. Apart from my work, and a couple of hours for those things, I don’t think I have to sit at my computer at all. A major chunk of my computer time goes to gaming, Facebook, YouTube and Netflix. I can drop those altogether. I resolve to have my entertainment and social life away from the computer. This means I’ll have to let go of some of the new and the cool. Even for studying and blogging, I’m thinking of partially moving them away from the computer. Maybe I can write drafts with pen and paper before I sit down to publish them on the computer. I can also study the online course using a tablet and take notes with pen and paper.

Less computer. More real world. More pen-and-paper.

2. Learn Continuously

This is partially already done because of the nature of development jobs. You just can’t avoid learning. New frameworks and tools pop up every day. Parts of the job are getting obsolete and new parts are getting created. Lines between business and IT are getting blurred. If you’re a developer, the only way you can be successful, or even just survive for long, is if you keep learning new things. This much goes without saying.

But what about the other things? Do you know anything at all besides coding? Have you learnt music or an art? To play a sport? Do you know about economics? Do you know how to cook? Pick one or two skills outside of coding, and work on them regularly. Maybe cook food on weekends. Learn a new language (a spoken language – not another programming language). Join a workshop on repairing cars. Practice by yourself or join a class. Do something that expands your mind. Do something that makes you interesting.

Develop a new skill. Keep improving.

3. Take Health Seriously

Whenever I get pain in my shoulders, neck or lower back, I go through a ‘fitness phase’. I get the pain treated and follow a workout routine for a few days after that. But as the pain is relieved, so is my motivation. Not only fitness, but even looks can become invisible when most of your life is online through a computer. I mean, who cares if you’re fat if no one ever sees you, right? After two years of living alone as a developer, I have literally become that fat guy whose only contact with the outside world is with the food delivery guy. Literally!

I resolve to take health seriously. Prevention is better than cure. I’m targeting a healthy fitness level, good food, moderate regular exercise and frequent breaks for ergonomics. My chiropractor said the best posture is no posture at all. I like the way he put it. You need to move frequently so that you’re not really getting ‘fixed into a posture’ in the first place. That’s the only way to avoid posture-related problems.

Do about half an hour of exercise, almost every day. Include some fruits and vegetables in your diet. Do not keep sitting for more than half an hour at a time. Go to bed early – without your smartphone.

4. Give to Relationships

This one is particularly hard for me because I’m a complete introvert. I am truly fine just being single rather than having to put up with a family. I simply hate even just being around people. But I’m resolving to do this anyway, because I’m old enough to know that this is important. Being around people, having face-to-face interactions, depending on others and being depended upon are immensely more satisfying than a lonely-yet-comfortable life. I, like many other developers, have always avoided social interactions with people. I don’t do small talk, I don’t attend ceremonies, I don’t even go to casual meetings at work.

I try to avoid my family as much as possible. Even though I like to spend time with my little nephews, I’d rather not because it would mean I have to be around the grown ups too. I need to stop being such a loner. Wisdom tells me that friends are important, family is important.

Make more friends and invest some time and effort into the ones that exist. Be in touch with family.

5. Travel

There are so many YouTube channels that make you yearn to just drop your ‘normal’ life and become a nomad with a backpack. But that’s simply impractical and unnecessary. Not everyone can (or should) become a full-time traveller. But I believe vacations are necessary. They create a disconnect that simply can’t be achieved otherwise. I’m at the end of a two-week holiday, but it doesn’t even seem like I left work, because I spent it all at home. Still at my computer. Still thinking about work.

Traveling – without your laptop – actually traveling, acts like a reset for the mind. You don’t have to travel to fancy places or spend a lot of money. Just to a simple place a few hundred kilometers away. Maybe for beautiful views of nature, maybe for spirituality of an ancient temple. Maybe solo, maybe with friends or family. But make it a point to do it.

Travel to a faraway place. At least 3 days together. At least once in 3 months. Create memories.

How to Not Panic in a Job Interview

One of the nice things about staying in the same job for a long time is that you build trust and get picked to interview potential new employees. I’ve so far conducted more interviews to pick junior programmers than I care to count. I will never say no to conducting interviews, because hiring an employee is synonymous with expanding the company. I believe it’s an invaluable contribution to help with screening and selecting the right candidate for your company. That candidate is not only going to contribute a lot of value, but they might also be the superstar who propels your company to greater heights – and their journey in your company would have started with your interview.

The biggest reason I’ve had to reject candidates is obviously incompetence – but quite often I’ve felt that a person may be competent but is stressed and not performing at their best. At these times, it’s quite easy for me to direct the conversation and put them at ease, so that they have a positive interview experience. But there are times when the interviewee just panics. Their mind shuts down, words come out without much thought or imagination, and I sense they just want the interview to end. They don’t even care about getting the job anymore.

Photo by Maxime on Unsplash

Before the Interview

Preparing for the interview is such a no-brainer, and yet it’s shocking how many candidates turn up without preparation. They think an interview is just a conversation and they can simply answer questions from memory. Walking in without preparation is the number one reason interviews end in failure. The interviewer has probably sat down and prepared to interview you. It’s not only in your interest, but also a courtesy to them, that you spend some time properly preparing before your interview.

1. Revise what you know about your skill

If you are about to interview for a programmer job, it’s very important that you study up on your technologies. Do not skip this part. Even if it’s a technology that you’re currently working on, do a revision and check out common interview questions on that topic. When I interviewed for my last job, I actually asked them for a couple of weeks to study up on Java programming. If you have studied well, you will be able to give crisp and confident answers. But if you haven’t, you will be dragging your answers around, trying to shoot all over the place hoping something will match the interviewer’s expectation. That never happens though. If you have put something on your resume, make sure you’ve read up on it enough to answer questions about it.

2. Know Thyself

I know questions like ‘tell me about yourself’, ‘where do you see yourself in 5 years’ and so on seem clichéd. But frankly, what else is an interviewer supposed to ask you? An interview is a situation where you need to be analyzed in a rather short time – often barely enough to get to know a person. There are tons of lists on the internet about ‘frequently asked interview questions’. Select about ten good questions and prepare answers for them. And surely prepare a good answer for ‘tell me about yourself’ – I have a 2 minute answer and a 10 minute answer. Prepare answers for any areas of concern you might have – for example a break in your career, or why you got fired that one time, or why you have a low CGPA. If you prepare answers like that, you’ll be able to control the interview – steer it away from the negatives and guide it towards your strengths.

3. Know about the Job and the Company

Another easy – yet often ignored – part of your preparation is to know about the job and the company you’re interviewing for. Have you even read the job description? There is vital information in the description – it’s not some rough note put up. The HR team and the hiring department work together to properly describe the job and what the interviewers will expect. This job description is a vital tool to help you prepare for the interview. And about the company, you need to know its core purpose and mission. The answer to ‘do you know about our company?’ should be ‘I know it does __ and __, but I’d like to know more’.

During the Interview

1. Remember the Interviewer’s Goal

The interviewer’s goal is to hire you. Often we have to interview way more candidates than we’d like. When I start an interview, I sincerely hope that the candidate gets selected and I can move on to other work. But candidates almost never realize this. They always appear anxious, as if I were there to push them beyond their limits and reject them. That is far from the truth. I’m sitting there hoping to discover what your strengths are. My managers are probably urging me to hire someone soon. There is a project that’s struggling for lack of a developer, and it will be really great if you get hired. If you keep these facts in mind, you’ll be way more confident in your interview.

2. Keep Moving Forward

You might not have answered a question as well as you would have liked, but are you going to let that affect your next question? That’s what a lot of people do. They start focusing on a mistake they made and lose focus on what’s happening at present. Even a little stutter or slip of the tongue derails people. And it gets worse because it accumulates with each question. Doing something wrong in one question makes you lose focus, so you soon make another mistake, and so on, till the point where you are in full panic, have accepted failure and just want the interview to end. Avoid this. Forget that you made a mistake. You don’t have to go back and fix it, or convince the interviewer to forget it. Consciously move on and start your next answer with a fresh and positive mindset.

3. Collect Your Thoughts Before Answering

The pressure to impress the interviewer shows in the candidate’s behavior. I have no idea why, but people always assume they have to answer quickly. It’s not like the interview is a rapid-fire round on a game show. My expectation is almost always the opposite. When I finish stating the question, I always expect the person to pause for a few seconds to compose their answer before responding. Even if the question is straightforward and you don’t need to recollect anything, it’s still useful to pause and take a breath before you start answering. If you answer at a relaxed pace, you appear confident, and you can keep up your energy for longer.

Still Unsure?

If you’ve taken the points I’ve mentioned and still lack the confidence about attending interviews, then the only thing left is to practice.

  1. You can do mock interviews with other people (friends or paid services). But I find them impractical and a bit dramatic. What I would advise is to do mock interviews in your head. Just play out your interviews in your mind and observe yourself from a third-person point of view. Granted, you cannot predict how the interview is actually going to be, or what the interviewer is going to ask you. But picturing your behavior makes you less self-conscious and removes one main source of anxiety during interviews.
  2. Consider each interview as practice. Keep attending interviews for similar jobs or at similar companies. For the first few, don’t expect to get hired, and keep in mind that you are doing it only for practice. You’ll improve with each one. You will get better automatically anyway, but also look back on each interview and find out how you can improve your next performance. Sometimes the only way you can learn is by jumping right in.

Basic Modules: Data Science with Python Part 4

This article is part 4 of the “Data Science with Python” series. You can consider this a general introduction to common modules used in Python for doing data science.

Note that such a list might not always stay relevant. New modules and frameworks keep coming and going. But I believe these modules have proven so effective to the maths and science community that they have made their way to even academic courses. And as such, they will stay relevant for a long time and it would serve you well if you learn them at the beginning of your data science journey.

What are Modules in Python?

Although learning mathematics, data structures and algorithms is a significant part of any data science course, real data science jobs hardly start with those. Maybe if you’re advanced enough they will matter, but not for a beginner. As a beginner data science programmer, you just have to assemble previously developed and tested components to achieve your goals.

So your first set of lessons should be about which modules are best for data science. Or in other words, which modules are most commonly used in data science. There are some modules that have become foundational to any data science work. These are the modules that you would expect any data scientist to be familiar with. These modules are also what you would learn in most data science, machine learning or artificial intelligence courses.

4 Modules I Recommend You Learn First

Numpy

At its core, Numpy is just a better way to work with arrays that hold only one type of data. Python already has arrays by means of the ‘list’ type. That list object is fine for most regular programming tasks. But in data science and analytics, data structures must be optimized to hold much larger datasets than usual. Even homework given in data science courses involves datasets with thousands of records. The Numpy module provides an optimized array object to handle such workloads.

Using Numpy arrays instead of Python lists has 3 main advantages while doing data analytics –

  1. Speed: Numpy (and many other such modules) is faster mainly because internally, most of its functionality is implemented in lower level languages (like C).
  2. Space: Numpy takes advantage of data types – if the values you store in the array are all 8-bit numbers, then that’s all the space they’ll take (with a little bit of fixed overhead). But in Python lists, each of those numbers will take space for a reference and an integer object – which is wasteful when you are working with big data sets.
  3. Functions: Python list functionality is limited to what you would need in a general programming task. But Numpy expands it considerably – including, but not limited to, vector operations, algebra and matrix operations. Most of the time we can do operations on arrays without even writing loops – for example, array * 2 will multiply all values in the array by 2.
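
Here is a minimal sketch of the space and functions points, assuming Numpy is installed and imported under its conventional alias np. The sample numbers are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: five small numbers stored as 8-bit integers
ages = np.array([35, 13, 16, 20, 32], dtype=np.int8)

# Space: each element takes exactly one byte in the array's data buffer
print(ages.itemsize)   # bytes per element
print(ages.nbytes)     # bytes for all elements together

# Functions: arithmetic applies to every element, no loop required
doubled = ages * 2
print(doubled)

# The plain-list equivalent needs an explicit loop
doubled_list = []
for age in [35, 13, 16, 20, 32]:
    doubled_list.append(age * 2)
print(doubled_list)
```

Both versions produce the same numbers. The Numpy one is simply shorter, and on large arrays it is usually much faster.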

Pandas

I like to think of Pandas as the programmatic version of spreadsheet software. It excels at working with tabular data – that is, data that’s arranged in rows and columns. As far as I have seen, Pandas is the first Python module that is introduced to data science students – simply because most of them start with loading data from a CSV file and manipulating / analysing it. You can load data from a variety of sources (like CSV files, Microsoft Excel files, REST APIs etc.), do data-wrangling tasks (like cleaning, enriching, transforming etc.) on the loaded data, and produce analytic output (like summaries, charts etc.) – all using just Pandas.

Even if you are going to do much more advanced tasks like Machine learning, the first steps of loading, analyzing and cleaning data will probably be done by Pandas. If you are doing exploratory analysis (like Business Intelligence reports), I can safely say that Pandas is all you need.

Just as Numpy has the array as its core data structure, Pandas has two core data structures – the dataframe and the series. A series is similar to a one-dimensional array (or simply a list of numbers). In a lot of places, a Pandas series and a Numpy array can be swapped without much difference (but we will see the differences as we advance). A dataframe is like a spreadsheet. It’s used to store data as rows and columns, and provides powerful features to manipulate and analyze tabular data.
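
To make the two data structures a little more concrete, here is a small sketch, assuming Pandas is installed and imported under its conventional alias pd. The records are invented for illustration.

```python
import pandas as pd

# Hypothetical sample records, invented for illustration
people = pd.DataFrame({
    "name": ["John Smith", "Lily Pina", "Julie Singh"],
    "age": [35, 13, 16],
    "country": ["Australia", "USA", "India"],
})

# Selecting one column of a dataframe gives a series
ages = people["age"]
print(type(ages).__name__)

# A series supports whole-column calculations
average_age = ages.mean()
print(average_age)

# Filtering rows works much like a spreadsheet filter
minors = people[people["age"] < 18]
minor_names = minors["name"].tolist()
print(minor_names)
```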

Matplotlib

Data Science, Business Intelligence, or even simple analytics on data – none of it would be complete without neat reports presenting the findings from the analysis. Matplotlib is the library that we use to make charts and other pictorial representations in our reports.

Matplotlib has functionality to render charts on the screen, output charts as image files and even display charts in IPython/Jupyter notebooks. I’ve come across comments that Matplotlib has a steep learning curve, but I don’t think so. You just need to have the patience to learn its foundations rather than hurrying to produce charts – it’s not that difficult.
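
As a rough sketch of the ‘output charts as image files’ part, the snippet below draws a bar chart and writes it out as a PNG. It assumes Matplotlib is installed; the data is invented, and the ‘Agg’ backend is chosen only so that the example runs without opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration
countries = ["USA", "India", "Singapore"]
people_count = [3, 2, 2]

fig, ax = plt.subplots()
ax.bar(countries, people_count)
ax.set_xlabel("Country")
ax.set_ylabel("Number of people")
ax.set_title("People per country")

# Write the chart as PNG into an in-memory buffer;
# passing a file name such as "chart.png" would save it to disk instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```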

Even modern visualization modules like Seaborn actually use Matplotlib underneath. Seaborn is considered to make “prettier” charts than Matplotlib, and even if you would like to use it, I suggest you start by learning Matplotlib. Pandas also has chart producing capabilities – and yes, it uses Matplotlib internally. If you are working on a Python data science project – there’s a very high chance that your output is rendered using Matplotlib. That’s how common this module is for making charts.

Scikit-learn

Scikit-learn, also called sklearn, is the most used library for machine learning. It has functionalities central to machine learning, namely – clustering, regression and classification. It is a collection of a lot of complex algorithms – so many that I would say this is not one module but a collection of several machine learning modules. Scikit-learn also gives some data-wrangling functionality to preprocess your data where Pandas might come up a bit short.

If you do a course related to data science, you will probably learn algorithms like k-means, random forests, nearest neighbors and logistic regression (to name a few). Although you learn how these work internally, a data scientist (or a data science programmer) is never expected to implement these algorithms themselves. They just use a module (probably Scikit-learn), which already has implementations of these algorithms in a generalized, re-usable way. We just need to pick the required components and implement our project using them.

Scikit-learn internally uses Numpy for its processing, and integrates naturally with Pandas and Matplotlib. Not only Scikit-learn – all four modules introduced in this article interoperate well with one another. That is an important reason they have become so popular and useful – each focuses strongly on its own purpose while working well in connection with the others.
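
Here is a small sketch of what ‘picking a component’ can look like, using the k-means implementation that ships with Scikit-learn. It assumes Scikit-learn and Numpy are installed; the ages are made up so that the two groups are obvious.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: ages that fall into two obvious groups.
# Scikit-learn expects one row per record and one column per feature.
ages = np.array([[13], [16], [20], [60], [65], [70]])

# Ask k-means for two clusters; random_state makes the run repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(ages)

print(labels)                  # one cluster label per row
print(model.cluster_centers_)  # the mean age of each cluster
```

Which group gets label 0 and which gets label 1 is arbitrary. What matters is that the three younger ages share one label and the three older ages share the other.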

Conclusion

Most data science projects have a pattern. We acquire data, do numerical and algebraic calculations, run our data science algorithms on it, and finally present our results as visualizations. The four modules that I’ve recommended above map directly to these four tasks: Pandas to acquire data, Numpy for crunching numbers, Scikit-learn for some algorithmic magic, and Matplotlib to add charts to your reports.

Once you have learned those four modules, you can expand your skillset by knowing which direction you are going to go from there. By the time you’ve learned these, you will know what you want to learn next. Some notable mentions are Tensorflow, Scikit-image, Keras, and PyTorch. Four is a pretty small number, because there are countless libraries and modules in the world of Python and data science. But learning these four modules will give you the solid grounding you need to launch your data science journey.

Python Basics: Data Science with Python Part 3

This is the continuation of the Python Basics tutorial. This is the second part of Python Basics, and the third part of the Data Science with Python series. This tutorial can also be consumed as a Jupyter notebook available here. Let’s continue then.

Preparing Data for Analysis

Remember the list of strings that we created in the previous part –

data_list = [
    "John Smith,35,Male,Australia",
    "Lily Pina,13,Female,USA",
    "Julie Singh,16,Female,India",
    "Rita Stuart,20,Female,Singapore",
    "Trisha Patrick,32,Female,USA",
    "Adam Stork,32,Male,USA",
    "Mohamed Ashiq,20,Male,Malaysia",
    "Yogi Bear,25,Male,Singapore",
    "Ravi Kumar,33,Male,India",
    "Ali Baba,40,Male,China"
]

Let’s convert this list of bulky strings into a neat list of dictionaries with proper data type for age. What we are going to do is –

  • Create an empty list to store the processed lines
  • Loop over each string using the for line in data_list: syntax
  • Split each line into its components using the .split() method
  • Create a dictionary with proper field names
    • Use the int() function to convert age into a number. Otherwise it will be stored as a string.
  • Append this dictionary to the processed lines list using the .append() method
processed_data = []
for line in data_list:
    fields = line.split(',')
    processed_data.append({
        'name': fields[0],
        'age': int(fields[1]),
        'sex': fields[2],
        'country': fields[3]
    })

processed_data

Using the processed list

Each element in a collection can be a collection itself. That’s what we have done here. We have created a collection of collections – or rather, a list of dictionaries.

  • processed_data is a list – its elements are accessed using a numerical index (starting with 0)
  • processed_data[0] is the first element of the list. processed_data[1] is the second element and so on. Note that each element is a dictionary (that we appended in the previous step)
  • Elements of a dictionary are accessed using their key names. So processed_data[0]['name'] means to fetch the first element (which is a dictionary) and then fetch the ‘name’ field from it.
print(processed_data[0]['name'], processed_data[0]['age'])
print(processed_data[1]['name'], processed_data[1]['age'])

Stepping into Data Science

Let’s print some statistics from our data. First calculate average age of people in our dataset –

  • Average is sum divided by count.
  • Count can be easily obtained using the len() function that returns the size of any string or collection passed as argument
  • Sum can be obtained by looping through the list and accumulating the age values into a sum variable.
  • Finally divide sum by count to get the average
sum_of_ages = 0
number_of_persons = len(processed_data) # len function gives size of collection
for person in processed_data:
    sum_of_ages = sum_of_ages + person['age']
    
print("Number of persons:", number_of_persons)
print("Average age:", sum_of_ages / number_of_persons)
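
The same calculation can also be written more compactly with Python’s built-in sum() function. In the sketch below, the list is a shortened stand-in for processed_data, so that the snippet runs on its own.

```python
# Shortened stand-in for the processed_data list built earlier
processed_data = [
    {'name': 'John Smith', 'age': 35},
    {'name': 'Lily Pina', 'age': 13},
    {'name': 'Julie Singh', 'age': 16},
]

# sum() adds up every age produced by the expression inside the brackets
sum_of_ages = sum(person['age'] for person in processed_data)
average_age = sum_of_ages / len(processed_data)

print("Average age:", average_age)
```

The explicit loop is still worth understanding first, because the same accumulate-in-a-variable pattern comes up again and again.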

Conditions

Conditions, written with the ‘if’, ‘elif’ and ‘else’ keywords, let us branch our program’s execution. The statement block to be executed for each branch is indented below its condition. Example –

if age < 18:
    person_type = 'kid'
    print('Person is just a kid')
elif age < 60: # elif means else-if
    person_type = 'adult'
    print('Person is an adult')
else:
    person_type = 'senior'
    print('Person is a senior')

The < in age < 18 is a ‘comparison operator’. The other comparison operators are >, <=, >=, == and !=. == is True if both sides are equal. != is True if the two sides are not equal.

Two more operators are in and not in. These are for conditions where you have to check whether a value is in a collection. For example, if country in country_list: or if student not in class_list:.
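
A quick sketch of in and not in at work. The variable names and values here are hypothetical and exist only for this illustration.

```python
# Hypothetical collections, for illustration only
country_list = ['India', 'China', 'Singapore']
country = 'India'

if country in country_list:
    result = 'known country'
else:
    result = 'unknown country'
print(result)

# 'not in' is the negated form
is_missing = 'Spain' not in country_list
print(is_missing)

# 'in' also works on strings, where it checks for a substring
has_smith = 'Smith' in 'John Smith'
print(has_smith)
```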

Let’s use the conditions to report how many of our people are eligible to vote –

number_of_voters = 0
number_of_nonvoters = 0
for person in processed_data:
    # person['country'] gives a country name
    # which can be used to get voting age
    # from the voting_ages dictionary
    if person['age'] >= voting_ages[person['country']]:
        number_of_voters = number_of_voters + 1
    else:
        number_of_nonvoters = number_of_nonvoters + 1

print("Number of voters:", number_of_voters)
print("Number of non voters:", number_of_nonvoters)

Logical Operators

Conditions often need to be combined to be useful. For example, a person should be under 18 years and should be male. To represent combinations of conditions like this, Python has logical operators – and, or and not.

if age < 18 and sex == "Male":
    print("Male Child")

Another example –

if country == "India" or country == "China":
    print("Asia")
elif country == "Spain" or country == "Italy":
    print("Europe")
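
The not operator is not used in the examples above, so here is a small sketch of it. The values are hypothetical.

```python
# Hypothetical values, for illustration only
age = 25
country = "Spain"

# 'not' flips a condition: True becomes False and vice versa
is_adult = not age < 18
print(is_adult)

# Parentheses make a combined condition easier to read
if is_adult and not (country == "India" or country == "China"):
    region_note = "Adult outside India and China"
else:
    region_note = "Everyone else"
print(region_note)
```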

Slicing Operator

Slicing is an operation that can be used on lists and strings in Python. It is quite simple and very useful to quickly fetch a range of items from a list (or a range of characters from a string).

We access list items with their numerical indexes, but we can also give a range inside the square brackets to get multiple items at once – sort of like a sub-list. For example, my_list[2:6] will return the elements from index 2 to 5. Register this – [a:b] means from ‘a’ up to, but not including, ‘b’. Also remember that the index starts with zero.

Using negative values with the slicing operator is also possible. It simply means the elements are counted from the end; in other words, you subtract the values from the length. That is, if the length of the list is l, then [-a:-b] means [l-a:l-b].

Leaving out a value means starting from the beginning or going till the end. That is, [:b] means from the start up to, but not including, ‘b’. And [a:] means from ‘a’ until the end.

my_list = ['apple', 'orange', 'grape', 'melon', 'lemon', 'cherry', 'banana', 'strawberry']
print(my_list[2:6])   # prints ['grape', 'melon', 'lemon', 'cherry']
print(my_list[2:])    # prints ['grape', 'melon', 'lemon', 'cherry', 'banana', 'strawberry']
print(my_list[:6])    # prints ['apple', 'orange', 'grape', 'melon', 'lemon', 'cherry']

print(my_list[-6:-2]) # prints ['grape', 'melon', 'lemon', 'cherry']
print(my_list[:-2])   # prints ['apple', 'orange', 'grape', 'melon', 'lemon', 'cherry']

This works exactly the same when used with strings. For example, if my_name is a string variable, my_name[-2:] means the last two characters of the given string. my_name[:2] means the first two characters.
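
A short sketch of slicing on a string, using a sample name:

```python
my_name = "John Smith"

last_two = my_name[-2:]   # last two characters
first_two = my_name[:2]   # first two characters
middle = my_name[2:6]     # characters at index 2, 3, 4 and 5

print(last_two)
print(first_two)
print(middle)
```

Note that the space between the two names counts as a character too, so it occupies index 4.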

Experiment and learn the slicing operator until you are confident.

Functions

Functions are named, reusable bits of code, which you first define and then call whenever required. Defining our own functions helps modularize our code and reduce duplication. It also makes it convenient to introduce changes in future. For example, let’s introduce a concept of short names for people in our data set. For now, a short name is made by joining the first letter of the first name and the first 4 letters of the last name. So for “John Smith”, the short name will be “JSmit”.

name = "John Smith"
first_letter = name[:1]
last_name = name.split(" ")[1]
short_last_name = last_name[:4]
short_name = first_letter + short_last_name

The above code gives short name for “John Smith”. But when we need short name for “Jacob Nilson”, we have to write the same set of 5 lines of code again. Any time we change our formula for creating short names, we have to change all this code. This is where functions help.

Functions allow us to package processing like this and reuse it as and where it is required. Functions are defined using the def keyword, followed by a function name (in this example ‘short_name’) and then, in brackets, a list of parameters that the function can take as input. Similar to the for loop and if conditions, the set of statements forming the function block is indented below the declaration line.

def short_name(full_name):
    first_letter = full_name[:1]
    last_name = full_name.split(" ")[1]
    short_last_name = last_name[:4]
    short_name = first_letter + short_last_name
    return short_name

print(short_name("John Smith"))
print(short_name("Jacob Nilson"))
print(short_name("James Maroon"))
print(short_name("Jill Jack"))

Now wherever we need this functionality, we can just call this function by its name. We don’t have to write the same code again and again.

Another advantage is when introducing a change, we just make the change in the function definition and it reflects in all places where we have called the function. So whenever you’re implementing a formula or an algorithm, it’s better to define it as a function and then call it wherever required.
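
As a sketch of that second advantage, suppose the formula changes so that the short name keeps only the first 3 letters of the last name. This is a hypothetical change, made up for illustration. Only the function body changes; the calling code stays as it was.

```python
def short_name(full_name):
    first_letter = full_name[:1]
    last_name = full_name.split(" ")[1]
    # Hypothetical new rule: keep 3 letters of the last name instead of 4
    return first_letter + last_name[:3]

# The calling code is unchanged
short_names = []
for name in ["John Smith", "Jacob Nilson"]:
    short_names.append(short_name(name))

print(short_names)
```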

Modules

Functions defined like the above are usually grouped together into a ‘module’ that we import before using the function. For example there are a ton of functions in the math module in Python. Keeping all functions in the global scope is bad for manageability. So we arrange them into modules and ‘import’ them into our programs if and when required.

Say we need the square root function. It’s in the math module. So we can import the math module and call math.sqrt function from it.

import math
print(math.sqrt(25))

Or, we can import only the sqrt function and use it without specifying a module name.

from math import sqrt
print(sqrt(25))

As you advance, you will not only define your own functions, you will also organize your code into modules.

Conclusion

So that concludes my super fast intro to Python course. It’s not really much when compared to the vastness of the Python ecosystem, but it’s a good start to the data science journey we’re taking up. There are more concepts but I find it easier to introduce concepts when we’re about to actually use them for something. So although the Python Basics part is over, I will continue to introduce Python concepts and methods as we progress.

Python Basics: Data Science with Python Part 2

This tutorial is part of the series “Data Science with Python”, a set of tutorials aimed at helping beginners get started with data science and Python.

Consider this article a super fast introduction to Python. I’m not taking the usual feature-by-feature tutorial route, because there’s an unbelievable number of Python tutorials already available on the internet. If you are completely new to programming, or you are interested in learning Python in more depth, I advise you to read the official Python tutorial.

This tutorial can also be consumed as a Jupyter notebook available here. Let’s get started then.

Assignment Statement

  • One thing that’s common in programs is to give names to values. This is called ‘assignment’.
  • person_name = 'John Smith' is an assignment statement. person_name on the left is a ‘variable’ (note that it has no quotes). 'John Smith' on the right is a ‘literal’.
  • person_age = 25 is also an assignment statement. The only difference is that now we have assigned a number (25) to a variable named person_age.
  • total_value = 25 + 35 + 45 is also an assignment statement. First the value of 25 + 35 + 45 is calculated and the result is assigned to a variable named total_value.
  • Variable names are not put in quotes. Text values like ‘John Smith’ and “This is my chat message” are put in single or double quotes. Numbers and Boolean Values (True, False) are not put in quotes.
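The statements described above can be tried together in one short sketch. The variable names come from the bullets, except is_member, which is a made-up name added here only to show a Boolean value.

```python
# Text values (strings) go in quotes; variable names do not.
person_name = 'John Smith'

# Numbers are written without quotes.
person_age = 25

# The right-hand side is calculated first, then the result is assigned.
total_value = 25 + 35 + 45

# Boolean values are written without quotes too.
# (is_member is a hypothetical name used only for illustration.)
is_member = True

print(person_name, person_age, total_value, is_member)
```

Running this should print John Smith 25 105 True.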

The print function

  • print("Hello World!") prints “Hello World!” to the screen. Pieces of text (like ‘Hello World!’) are called strings in Python and should be enclosed in single or double quotes. Numbers and Booleans (True, False) should not be enclosed in quotes.
  • print() is called a function in Python. print is the name of the function and the text you provide inside brackets is called an argument. As you work with Python you will use a lot more functions and even write your own functions.
  • print("Hello", "World!") prints the same thing as above. You can put any number of items in and the print() function will print them separated by spaces. Now you have provided two arguments to print.
  • print("John", "James", "Stuart", "Jacob", sep=", ") prints the four names, but separates them with a comma and a space instead of a single space. Now you have provided five arguments to the print function: four strings and sep. The last one, sep, is a ‘keyword argument’: an argument that has a name.
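As a quick check of the calls described in this list, here is a small sketch that runs two of them side by side. It uses only the built-in print function and the sep keyword argument mentioned above.

```python
# Two positional arguments, separated by a single space by default.
print("Hello", "World!")

# Four positional arguments plus the keyword argument sep.
# sep replaces the default space between the printed items.
print("John", "James", "Stuart", "Jacob", sep=", ")
```

The first call should print Hello World! and the second John, James, Stuart, Jacob.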

Dictionaries

In Python, a collection is a bunch of values grouped together. A dictionary is one type of collection. It holds a set of values where each value has a key associated with it.

Say we want to store a list of voting ages in different countries. It would be cumbersome to create and work with lots of variables like usa_voting_age, india_voting_age, singapore_voting_age and so on.

Instead, we create a dictionary. We will name this dictionary voting_ages. The country names will be ‘keys’ and the voting ages will be ‘values’. When we need the voting age of India, we can simply fetch it by voting_ages['India'].

Dictionaries are created by specifying key: value pairs, separated by commas, inside a set of curly brackets.

voting_ages = {
    "India": 18,
    "USA": 18,
    "China": 18,
    "Australia": 18,
    "Singapore": 21,
    "Malaysia": 21
}

print("Voting age in China is", voting_ages['China'])
Voting age in China is 18

Looping

How common are tasks like ‘add up all the values in this list’, ‘print all the names from this list’, or ‘check which of the items in this list weigh more than 20 kilograms’? Pretty common, right? A large share of your time as a data science programmer will be spent writing loops. There are different types of loops in Python. Let’s learn one common loop – looping through a collection.

Syntax of a loop:

for variable_name in collection:
    inside_the_loop()
    print(variable_name)
    do_some_more_things()
# outside the loop now
print('Loop finished.')

for variable_name in collection: marks the start of a loop. This means ‘execute the following statements for every value in the collection’. Each value in the collection is assigned to variable_name in turn, and then the set of statements underneath it is executed. This is repeated for every element in the collection.

The statements that follow are indented to denote that they are part of the loop. The set of statements with the indent is called a ‘block’. The block ends when we stop indenting.

for country in voting_ages:
    # country is a variable which gets each key in the dictionary
    # in this case, each country name.
    print("Voting age in", country, "is", voting_ages[country])
Voting age in India is 18
Voting age in USA is 18
Voting age in China is 18
Voting age in Australia is 18
Voting age in Singapore is 21
Voting age in Malaysia is 21
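The same kind of loop also handles the ‘add up all the values’ task mentioned at the start of this section. The sketch below is only an illustration: the item_weights dictionary and its numbers are made up for the example.

```python
# Hypothetical item weights in kilograms, made up for this example.
item_weights = {
    "laptop": 2,
    "suitcase": 23,
    "backpack": 7
}

# Start from zero and add each value inside the loop.
total_weight = 0
for item in item_weights:
    # item gets each key; item_weights[item] is the matching value.
    total_weight = total_weight + item_weights[item]

# This line is not indented, so it runs once, after the loop ends.
print("Total weight is", total_weight)
```

This should print Total weight is 32.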

Preparing Data from Text

One common task in data science is to read a bunch of text line by line and create better quality data from it. For example, say each line of your data looks like "John Smith,35,Male,Australia": name, age, sex and country separated by commas. It would be easier to work with if it were a dictionary with each of those values mapped to its corresponding name.

So that line of text gets converted into a dictionary – {"name": "John Smith", "age": 35, "sex": "Male", "country": "Australia"}.

Obviously you have several lines of such data. So we can create a list of this data for easier processing. A list is another type of collection in Python. Dictionaries have ‘keys’ to access the values, whereas lists don’t have keys – a list is just an ordered collection of values. You can access list values using the looping syntax we saw above, or by using a numerical index that starts at 0. Lists are created by specifying a bunch of values inside square brackets.

list_names = ["John", "Jacob", "James", "Julie"]
print(list_names[0]) # will print "John"
print(list_names[1]) # will print "Jacob"

# Declaring a list of strings
data_list = [
    "John Smith,35,Male,Australia",
    "Lily Pina,13,Female,USA",
    "Julie Singh,16,Female,India",
    "Rita Stuart,20,Female,Singapore",
    "Trisha Patrick,32,Female,USA",
    "Adam Stork,32,Male,USA",
    "Mohamed Ashiq,20,Male,Malaysia",
    "Yogi Bear,25,Male,Singapore",
    "Ravi Kumar,33,Male,India",
    "Ali Baba,40,Male,China"
]

Dot operator

A function associated with a particular object is called a ‘method’. Methods do something with the object they are associated with. For example, strings have a method called ‘split’. It splits a string into multiple parts and returns the parts as a list. To call such methods, we use the dot operator.

names = 'John,Jacob,Jaden,Jill,Jack'
names_as_list = names.split(',')
# names is a string and 'split' is a string method.
# split(',') means split the string considering comma as separator

Now you can loop through the names using for-loop syntax like for name in names_as_list:.

Similarly, lists have an .append() method which can be used to add more elements to a list. It is common to declare an empty list using empty square brackets (like my_list = []) and then add elements to it using the append method (like my_list.append(25)).
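Here is a small sketch that combines split, a for-loop and append. The upper() call is another built-in string method, used here only as an example of something to do with each name; the variable name upper_names is made up.

```python
names = 'John,Jacob,Jaden,Jill,Jack'

# split(',') is called on the string using the dot operator.
names_as_list = names.split(',')

# Start with an empty list and add one element per loop iteration.
upper_names = []
for name in names_as_list:
    # upper() returns a copy of the string in capital letters.
    upper_names.append(name.upper())

print(names_as_list)
print(upper_names)
```

The first print should show the five names as a list, and the second the same names in capital letters.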

Data Types

Every variable in Python has a ‘type’ based on the value assigned to it. Handling data types is quite common when doing data science tasks because data is usually provided as text and it’s up to the programmer to convert it to whatever type they need. This is important because what Python can do with the data depends on its type.

For example, 25 can be a number, and 25 also can be thought of as a string.

a = 25
b = '25'
# a is a number, and b is a string
print(a * 3) # will print 75 : 3 times 25
print(b * 3) # will print 252525 : the string '25' repeated 3 times

To be clear about these things, we will have to check and convert data types wherever required. To convert a string value to an integer value, we use the int() function. For example, b = int('25') will make ‘b’ a variable of type integer, even though we have given 25 in quotes. If we do b = float('25'), b will be a number with a decimal point (25.0). The other way is also possible, where you convert a number into a string – b = str(25) will make b a string variable even though you have specified 25 without quotes.
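Putting the pieces from the last few sections together, the sketch below shows one possible way to turn lines of text into dictionaries. It is only an illustration under a few assumptions: it uses a shortened two-line version of data_list, the names records, parts and record are made up, and it is not necessarily how the rest of this series does it.

```python
# A shortened version of the data_list declared earlier.
data_list = [
    "John Smith,35,Male,Australia",
    "Lily Pina,13,Female,USA"
]

# records is a hypothetical name for the list of dictionaries we build.
records = []
for line in data_list:
    # split(',') returns a list of four strings.
    parts = line.split(',')
    record = {
        "name": parts[0],
        # parts[1] is a string like '35'; int() turns it into a number.
        "age": int(parts[1]),
        "sex": parts[2],
        "country": parts[3]
    }
    records.append(record)

print(records[0])
# Arithmetic works because age is now a number, not a string.
print(records[1]["age"] + 1)
```

The last line should print 14, which would not work if the age had been left as the string '13'.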

[Continued in next part…]