Python options for Data Science

Python has become one of the most important tools used in the data science world. This article is for new students of data science who are just getting started and would like to know their options when it comes to installing a Python environment to work on and learn data science. Note: This is not a tutorial for setting up Python – I only try to assemble the different options available to create a Python environment to do your work.

My advice is to try out all the options and settle on the one you find most convenient. Even if one particular way of setting things up suits you, you will still need some familiarity with the other approaches, because you don’t know what kind of projects you might have to work on in the future.

The options available to you as a learner for setting up Python on your computer are –

  1. Plain Vanilla
  2. Python Distributions
  3. pyenv and venv
  4. Cloud Services

Plain Vanilla

The basic way to get started: install Python on your computer, then install whatever frameworks and libraries you need as you go about learning data science. First download Python from the official website and follow the instructions there (mostly just double-clicking the file you downloaded). Then use pip install <package-name> to install the packages you need. As a beginning data science student, the packages you need would be (but not limited to) –

Python Distributions

The above option of having Python installed along with a few libraries through pip is sufficient for most data science students when they are just beginning. Of late, the more convenient option seems to be to use one of the “Python Distributions”. A Python distribution is nothing but a version of Python and a set of packages bundled together, so you don’t have the task of installing or configuring the packages yourself. These are actually quite nice. My favourite one is the Anaconda distribution, but there are several others. A nice list of popular distributions can be found here.

Although Python distributions exist to simplify installation and be a convenience, in my experience installing Python using pyenv and managing projects using venv is more convenient. My personal opinion is that, unless your course or project expects you to use a distribution like Anaconda, it is better to install Python (using pyenv) and use virtual environments to keep your life simple.

pyenv and venv

The above two options, although quite simple and sufficient for most use cases, have one problem. What if you need multiple versions of Python? There are cases where you might need both Python 2 and 3, or different versions of Python 3 itself. Also, what if you need different versions of dependencies? This is where pyenv and venv help. Using pyenv, you can install multiple versions of Python on your computer.

If you are taking this option, you should first install pyenv and then install Python using pyenv. Check out the pyenv repository for instructions on how to install pyenv. pyenv is one of my favourite tools. It allows us to install all the popular versions and distributions of Python and switch between them very easily.

Even if you don’t need pyenv and are always going to use only the latest version of Python, I still recommend working with venv. Simply create a directory for your Python project, then run python -m venv <<env-name>> to create a virtual environment. Then run source <<env-name>>/bin/activate to activate the environment (on Linux or macOS; on Windows the script is <<env-name>>\Scripts\activate). Now your command prompt will change to show that you are inside a virtual environment. Check out the official page for documentation on venv.
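The same step can also be driven from Python itself, which can be handy inside a setup script. This is only a sketch: the function name create_env and the directory name demo-env are placeholders of mine, and with_pip=False is chosen just to keep the example quick (python -m venv installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir: Path, env_name: str = "demo-env") -> Path:
    """Create a virtual environment inside project_dir and return its path."""
    env_dir = project_dir / env_name
    # Roughly what `python -m venv <env-name>` does, minus installing pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Use a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(Path(tmp))
    # Every environment created by venv contains a pyvenv.cfg file.
    has_config = (env_dir / "pyvenv.cfg").is_file()
    print("environment created:", has_config)
```

Activating the environment is still done from the shell, since activation changes the shell session rather than the Python process.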

venv is included with Python 3. If you are working with both Python 2 and Python 3, and you are using pyenv, I suggest you use the pyenv-virtualenv plugin to manage virtual environments rather than venv.
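With several interpreters and environments on one machine, it helps to confirm which one is actually running your code. A small check along these lines should work on Python 3; the helper name in_virtual_env is my own.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


version = ".".join(str(part) for part in sys.version_info[:3])
print("Python version:", version)
print("interpreter path:", sys.executable)
print("inside a virtual environment:", in_virtual_env())
```

Running this before and after activating an environment shows the interpreter path and the last line change.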

Cloud Services

Cloud services are also a popular option to learn data science using Python. The concept is that they provide an environment like Jupyter notebook but it’s hosted online on shared/dedicated machines on the cloud. This is a neat option in that you don’t have to install anything at all on your computer.

There are multiple options – CoCalc, Google Colab and Kaggle are just a few examples. You just have to register, log in and start doing your data science work. These even allow you to have ‘shared notebooks’ where you can collaborate with others on the same file. As a student, this is my favourite approach when I’m doing group assignments. I prefer starting a document on Google Colab and sharing it so my peers can contribute. Then, for submission, I export an ipynb file from it and submit that for grading.

Spyder IDE

If you are from the world of R and have used RStudio, you’ll feel right at home with this IDE. This is also a no-hassle approach to get started with data science. It has a neat toolset, an internal Python installation and pre-installed packages for doing data science work in Python. If your use cases are quite simple and you don’t need to install many third-party packages, this IDE can be a very simple option for you. But I don’t prefer this either, because I try out a lot of Python packages and installing them is not straightforward. Even after I figured out how to install packages, I didn’t want to continue using this IDE. But that’s only my opinion. You should definitely try this one out if you’re still figuring out your favourite setup.

If you’re willing to spend some money, and you like IDEs, you can also take a look at PyCharm. I don’t prefer IDEs for data science anyway. And I definitely won’t advise anyone to spend much money on learning, because there’s a ton of stuff available for free. Once you have learnt to a decent level and are working in a professional capacity, definitely go for paid tools. At that point, they add enough productivity to justify their cost. But not while learning.

Conclusion

So that’s pretty much all the common ways to set up Python. I repeat, try all of them out and then settle on one. The data science world requires that you have familiarity with a broad set of tools. When you search for solutions or samples, you don’t know what you will find. So it’s good to have some basic exposure to several tools.

As for me, my favourite approach is to install Python 3 using pyenv, use venv to manage environments, and run a local Jupyter notebook for doing reports. I use Google Colab for assignments that require collaboration.

Functional Programming Vs. Object Oriented Programming

Object oriented programming has been the de facto programming methodology since the day I learnt that there is something called computer programming. Several of the most popular programming languages are primarily object oriented programming languages. The most commonly asked interview questions for programmers are about object oriented programming.

That is, until functional programming blew up a few years ago. Functional programming languages have been around since the 1960s, mind you, but they gained traction among ‘commercial’ developers only a few years ago. I found a ton of people learning Scala and lambdas and observables and all the associated jargon. One group of us jumped ship and embraced functional programming as the way to go for all our new projects. Another group stuck to the familiar ground of object oriented programming.

So far I’ve done two projects in the functional programming style, both web applications. One in Scala and one in Java (Spring Reactor). Here’s what I’ve learnt so far as contrasts between Object Oriented Programming and Functional Programming.

Core Principle

The first difference that we need to appreciate is the core principle guiding these methodologies. Whenever I see coders struggling to adopt functional programming, it is because they don’t have a grasp of this.

Object Oriented

  • In object oriented programming, we think of everything as objects with state. The flow of the application is dictated by change in state of the objects involved.

Functional

  • In functional programming, we think of everything as operations. The flow of the application is dictated by the chain of operations.
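To make the contrast concrete, here is a small sketch of the same task, totalling a shopping cart, written both ways. The cart example and all the names in it (Cart, add_item, total) are my own illustration, not taken from any particular project.

```python
from dataclasses import dataclass, field


# Object oriented style: the cart is an object whose state changes over time.
@dataclass
class Cart:
    prices: list = field(default_factory=list)

    def add(self, price: float) -> None:
        self.prices.append(price)  # modifies the cart's state in place

    def total(self) -> float:
        return sum(self.prices)


cart = Cart()
cart.add(10.0)
cart.add(5.0)
oo_total = cart.total()


# Functional style: nothing is modified; each operation returns a new value.
def add_item(prices: tuple, price: float) -> tuple:
    return prices + (price,)


def total(prices: tuple) -> float:
    return sum(prices)


# The result comes from chaining operations, not from mutating an object.
fp_total = total(add_item(add_item((), 10.0), 5.0))

print(oo_total, fp_total)
```

Both versions arrive at the same number. The difference is whether the program flow is described as an object changing state or as a chain of operations.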

Design Patterns

If you are like most other programmers, you would have learnt a bunch of design patterns to apply in your projects. And I bet many of them are not just ‘design patterns’, they are ‘object oriented programming design patterns’. You even need to unlearn many of these. If you try to apply these common design patterns when you do functional programming, it will be like hammering a square peg into a round hole.

For the last couple of decades or so, object oriented programming has overshadowed all the other ways of programming, and this has one bad side effect – so much of the learning material depends on real world analogies. We’ve all come across things like the Animal --> Dog --> Dalmatian kind of analogies, right? Well, you need to forget all that and realise that abstractions are only there to lighten the cognitive load.

Computers don’t do dogs and cats. Computers do sets of operations on very basic units of data, like bits and bytes. Functional programming is closer to this attitude in a programmer. You are not working with simulated models of objects. You are taking an input, doing some processing and giving an output. Try to apply that to any programming task that you do. For example, a web service takes a request as an input, does its processing, and then sends a response as an output. To achieve cognitive ease, this can be broken down into several levels by splitting the processing into multiple functions and then chaining them, giving the output of one function as the input to the next.
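The web service example can be sketched as a chain of small functions. This is a toy under my own assumptions: the helper pipeline, the three step functions and the name=Ada request format are all invented for illustration and do not belong to any web framework.

```python
from functools import reduce
from typing import Callable


def pipeline(*steps: Callable) -> Callable:
    """Chain steps left to right: the output of one is the input of the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def parse_request(raw: str) -> dict:
    # Input: a raw request string such as "name=Ada".
    name = raw.split("=", 1)[1] if "=" in raw else ""
    return {"name": name}


def build_greeting(params: dict) -> str:
    # Processing: turn the parsed parameters into a message.
    return f"Hello, {params['name'] or 'stranger'}!"


def to_response(body: str) -> dict:
    # Output: wrap the message in a response structure.
    return {"status": 200, "body": body}


handle = pipeline(parse_request, build_greeting, to_response)
response = handle("name=Ada")
print(response)
```

Each step can be read and tested on its own, and the whole service is just the chain of those steps.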

Choosing between the Two

It is not simple to classify an application, because most applications fall into multiple categories for purposes like these. Think about the core purpose of your application and make your decision based on that.

Object Oriented

  • Object oriented programming is preferable when you are representing a world of objects. For example a simulator, or a video game.

Functional

  • Functional programming works well in scenarios where your application is a processing pipeline. For example, an event stream processor or a data processing API.

When you are trying to choose between the two, do not think of peripheral activities like logging, IO (even database updates). Rather, think of the main purpose your application is solving for its user. Is it giving your user an object-state model that they can manipulate? Or is it providing an engine that transforms their input and gives them an output?

Combining the Two

In the real world, you are probably going to combine the two programming paradigms rather than use strictly one. Functional programming seems like a good fit for a web service, but some of the components are still better represented by an object oriented model. We can think of the web service as a series of functions that take in a request at the beginning and emit a response at the end. But entities like service and repository can still stay as objects. In fact, a pure function has no side effects, but that alone is hardly useful in the real world, right? We almost always need side effects – updating a database, writing to a log file, sending out an email and so on.
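A rough sketch of that combination: the repository stays an object that owns its state and its side effects, while the logic around it is a pure function. The class InMemoryUserRepository and the functions below are hypothetical stand-ins; a real repository would talk to a database.

```python
class InMemoryUserRepository:
    """An object with state; stands in for a database-backed repository."""

    def __init__(self) -> None:
        self._emails = {}

    def save(self, user_id: int, email: str) -> None:
        self._emails[user_id] = email  # the side effect lives here

    def find(self, user_id: int):
        return self._emails.get(user_id)


def normalise_email(email: str) -> str:
    """Pure function: same input, same output, no side effects."""
    return email.strip().lower()


def register(repo: InMemoryUserRepository, user_id: int, raw_email: str) -> dict:
    """Request in, response out; the repository is passed in as an argument."""
    email = normalise_email(raw_email)
    repo.save(user_id, email)
    return {"status": 201, "email": email}


repo = InMemoryUserRepository()
result = register(repo, 1, "  Ada@Example.COM ")
print(result, repo.find(1))
```

The pure part is easy to test in isolation, and the side effect is confined to one clearly named object.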

The Code

The differences between the two programming methods are most visible when you look at the code. Some important differences are –

Object Oriented

  • Objects are the core entities. The program flow is usually instantiating objects and modifying their states. Any processing that is done, is as a means to change some state.
  • Control flow in object oriented programs is done through simple and traditional constructs like loops and if-else blocks.
  • There is a global scope, then there is a session scope, then there is a thread scope. Or simply put, there is always some global state from which you can get your environment variables. The ‘context’ (that holds things like current user, configuration parameters etc) is available from the respective scope.
  • Concepts like threads and concurrency are handled by the application code. Even if you use a multi-threading friendly framework, you still might have to declare what is thread-safe and so on.

Functional

  • Functions are things. You create functions and assemble them in chains. You can assign functions to variables and pass them as arguments to other functions.
  • Control flow in functional programming is done by chaining, filtering and recursion. Streams are preferred to collections.
  • There is only a local scope. In functional programming, the best practice is to provide everything your function needs as arguments. The ‘context’ (that holds things like current user, configuration parameters etc) gets passed in as an argument to all functions that need it.
  • Since a ‘global state’ is not even assumed, programs are by default thread safe. And most functional programming platforms manage concurrency by themselves, upon this same assumption.
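The functional points above can be sketched in a few lines of Python. The document data and the names visible_to and titles are made up for the example. Note how the context is passed in as an argument instead of being read from a global.

```python
from typing import Callable, Iterable, Iterator


def visible_to(context: dict) -> Callable[[dict], bool]:
    """Build a predicate from the context that is passed in as an argument."""
    return lambda doc: doc["owner"] == context["current_user"]


def titles(docs: Iterable[dict]) -> Iterator[str]:
    """Lazily transform a stream of documents into upper-case titles."""
    return map(lambda doc: doc["title"].upper(), docs)


docs = [
    {"owner": "ada", "title": "notes"},
    {"owner": "bob", "title": "draft"},
    {"owner": "ada", "title": "plan"},
]
context = {"current_user": "ada"}

# A function assigned to a variable, then passed as an argument to filter().
is_visible = visible_to(context)

# Control flow by chaining: filter, then map, instead of loops and if-else.
result = list(titles(filter(is_visible, docs)))
print(result)
```

Nothing in the chain reads or writes shared state, which is what makes this style easier to run safely on several threads.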

Conclusion

To put it simply, when you are doing functional programming, don’t think of objects and types – rather, think that you are making a lot of small black boxes. Each one takes an input and gives an output. And then you’re arranging all those black boxes to make a useful application. If you can grasp this, you’re probably going to have a very easy time settling into functional programming.

Please Stop Infinite Scrolling

I think this post is more like a follow-up to my previous post on whether to use SPAs or not. Confession: I started writing that article, and this one, because of a website that completely irritated me. It was a matrimony website. If you didn’t know, matrimony websites are like dating websites, but you skip the dating and go straight to the wedding. The interfaces are made very similar to shopping sites – in fact it actually feels like a shopping site for brides and grooms.

Of course I have a problem with the ideology of those sites, but I’m here to talk about this particular one, and what irked me about it the most. The fact that it was an SPA was obviously off-putting, because the concept of SPA was deliberately and unnecessarily thrust upon the poor unsuspecting wife-shopper. But the infinite scrolling was worse.

Where am I?

I have no idea how much I have already seen. A page with 16 or 20 items which I can check out, with a ‘Next’ button to see the next set of items, would have been so much clearer and easier to use. Without knowing how much I have seen and how many more are left, I don’t even know whether I should keep scrolling or give up. If there are only another 10 items to check out, I’ll continue browsing to see everything. But if there are another 2000 items, I’d probably give up.

There’s no way to know that, when the page does ‘infinite-scrolling’.

How to Get Back to Where I was?

After scrolling through countless items, what if I scroll back to the top for some reason? What if I refresh the page? What if I want to look at an item that I saw a while ago? None of these is easy when you are infinite-scrolling. You have to scroll all the way back up and find that item again. You have to scroll all the way down again if you accidentally refresh the page.

Think about the User

I think the infinite scroll is a classic example of over-engineering a user experience. A classic case where a designer forgot all about user experience and just wanted to be fancy for the sake of their own vanity. Things like that are the bane of user interface design. The designer is so proud of something they did that they completely forget to get feedback from UX testers. Or even from actual unhappy users. The worst thing is, they probably spend more money on doing this than on the simple, easy-to-use traditional way.

Sometimes I Need to Reach the Bottom

This is the problem that I actually faced, and it is what made me go on a rant on my blog. The menu I needed was at the bottom of the page. I had to scroll down to the bottom. Only, when I scrolled, the page just loaded more items and I had to scroll again. Then it loaded more items. After doing that a few more times, a brilliant idea struck me and I pressed the ‘End’ button – on a Mac, the end button takes you instantly to the bottom of the page. And surprise! Before I could move the mouse and click on my menu, the page loaded a bunch of items and the menu went back out of view. Not only could I not click my button, the page also loaded a ton of content that I wasn’t even interested in.

It’s Slow

Making one server request for a page displaying 25 items is often faster than making 25 separate requests, one for each item. Significantly faster. In most implementations of infinite scrolling, the page makes more and more requests as you scroll. Also, the experience of waiting for a second and seeing 25 items is better than watching the items load one by one with a fraction of a second in between. So infinite scrolling is not only actually slower, it also amplifies its own slowness by reminding the user often that there’s something loading.

What’s Better Then?

What’s better is the plain old pagination. Don’t fix what’s not broken!

Do I know how far along I am, browsing the search results? Yes! Because a list of page numbers at the bottom always shows me which page I am on, and among the ‘finite’ number of things on each page, I can easily get back to where I was.
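The logic behind plain pagination is small enough to sketch in a few lines. The function paginate and the 45-item catalog below are illustrative assumptions of mine, not code from any real site.

```python
import math


def paginate(items: list, page: int, per_page: int = 20) -> dict:
    """Return one page of items plus the numbers a page footer needs."""
    total_pages = max(1, math.ceil(len(items) / per_page))
    # Clamp the requested page into the valid range.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "total_items": len(items),
    }


catalog = [f"item-{n}" for n in range(1, 46)]  # 45 items in total
result = paginate(catalog, page=3)
print(f"Page {result['page']} of {result['total_pages']}:", result["items"])
```

Because the page number is a plain value, it can live in the URL, so a refresh or the back button lands on the same set of items.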

I can scroll down and see the footer, use the footer menu if there is one. My browser doesn’t have to load a ton of content that’s not useful at all to me. I get to have a calm peaceful life.

And the page doesn’t have to load and run a lot of JavaScript code if it’s avoiding fancy things like this. No matter what fancy techniques you use on your page, they will never beat speed. A snappy, fast-loading page with a familiar user experience is much better than fancy pages with things like infinite scrolling and animations.

Just do pagination if you’re showing me a catalog. Please. Thank you.

SPA or Not

The latter half of the last decade saw an explosion of SPAs. With the introduction of Angular 2, ReactJS, Vue and a ton of such frameworks, creating highly interactive web pages became very easy. So easy that competing technologies like Java applets and Flash are being pushed into extinction.

Of course, whenever a new UI technology comes along, it looks exciting and a big bunch of people jump on the bandwagon. Some realise that it’s actually not relevant to them, others realise something newer has come along, and finally most of them move on. But this time, a new ‘concept’ was spun out. The concept of SPAs.

What are SPAs?

SPAs or Single Page Applications are a concept, where your whole website is just one HTML page. Pieces of content inside the page get dynamically modified and updated using content from the server. For example, a single page that has a menu at the top and an empty box at the bottom – when you click on a menu item, the page will fetch the respective data from the server and populate the empty box. When you click another menu item, it will fetch different content, and replace the content of the (initially) empty box.

In contrast, traditional web applications are made of several pages. In the traditional style of the above example, each menu item would be a link to the corresponding web page. Clicking on a menu item would just tell the web browser to load an entirely new page. The disadvantage is that it’s a bit slower to load an entire webpage than to populate just the data into an existing container.

How to Decide?

SPAs are not an improvement on the web UI, and as such, it’s incorrect to assume that ‘modern’ websites are SPAs. SPAs are just a different way to make websites. So it’s important to choose whether or not to use them. This has become an important decision because there are significant differences in user experience between SPAs and traditional applications. So much so that this choice can make or break the success of your web application.

When to Prefer an SPA

When you’re making an interactive user interface, where there is communication between the different components of the page, it’s better to do an SPA. For example, a dashboard that shows data as tables and charts. You would probably like the charts to be interactive and respond to clicks on the page – for instance, if you click on a geography, all the charts get redrawn to show data only for that geography. Another example is a drawing application – a large canvas in the centre and a set of tools like pencil, eraser, shapes etc in a toolbar. These kinds of applications are possible only because of the advances in UI frameworks and SPAs.

When to Prefer a Traditional Website

When your audience is going to consume information rather than interact with it, it’s better to do a traditional website. Think of blogs, news websites, video streaming sites, forums – the bulk of the internet. It is an unnecessary complication to do an SPA if the interactivity it brings is not utilised, because it’s way more complex to develop SPAs than normal web pages. There is more possibility of bugs and weird behaviour. More importantly, your page is going to be unnecessarily large and slow to download – SPA frameworks are usually heavy.

Also, if you are making a website where people come to consume information, then you probably depend on search engines to bring you traffic. Well, search engines are not very good with SPAs. Chances are that your website won’t be indexed properly by search engines if it’s entirely an SPA.

How to Choose

By now, it should be obvious that you most likely do not need an SPA, because most websites exist for consumption rather than interaction. Most people come to the internet to read, watch or listen. A smaller portion of usage is interactive applications like posting blog entries, working with documents, editing images and so on. So the choice is simple. If your website is more for reading, watching or listening, then do a traditional site. If your website is more for interacting – filtering, sorting, drill-downs, slice-n-dice, drawing and so on – do an SPA.

How About Both?

The thing is, most of the time your website might have to do both. Think of a shopping website. The shoppers all have to read the pages, look at the product details and read reviews – so it seems like a site where people primarily read information. But it also has to be interactive – filtering products, sorting search results and so on. What to pick in this case?

Such websites can benefit from both approaches, so I would use a combination of both. Start out with a normal traditional website. Then introduce SPA features into the pages where they are necessary. Your website would then be a collection of pages, some of which are mini SPAs. For example, the search results page is a normal page without SPA functionality. But to improve user experience, the product page might have features like commenting, reviewing, browsing multiple product images, buttons to add/remove the product from the shopping cart etc. These can be done SPA-style, so that the user won’t be navigating away from the page to do these little actions.

Still Doubtful?

When in doubt, do a traditional website. It’s easy to get a normal website right. But getting SPAs right is hard work. Wait for circumstances to strongly push you towards SPAs – and then you can refactor your website to be an SPA. Often, when the developers are in doubt, it means there’s not much benefit in increasing complexity. Presenting an SPA when there is no need for one will just make the user experience worse. Where the situation doesn’t demand it, SPAs stick out like sores and sometimes even end up irritating the user. So again, if you’re confused, just do a plain old website and live peacefully ever after.

How to Create Work Life Balance

Quite a significant proportion of people struggle with the concept of work-life balance. No question, it’s the buzzword whenever there’s a meeting between HR and the staff. No question, all of us want it. But even so, it’s something that is so elusive to actually achieve.

Know About Parkinson’s Law

Parkinson’s law is that work expands to fill whatever time is available. If you give more time to a task, you are more likely to put more effort into it, which sometimes might not even be necessary. Have you noticed that some people wait till the last minute and then quickly churn out something to complete their task? And other people start way early, but are still in the same kind of rush towards the end to complete the task?

That’s Parkinson’s law at work. When you have more time, your mind puts more into the task and makes it bigger. When you have less time, your mind prioritizes the subtasks and still gets you to complete it within that time. How to overcome it? Keep milestones with deadlines. What will you complete before 1 o’clock? What will you complete before 3 o’clock? What will you complete by the close of work? Without doing this, you are prone to subconsciously think there’s time available until late night and to complicate your work or improve its quality more than necessary. Surely, quality is important – but remember, you can keep improving the quality of your deliverable indefinitely.

Do Not Overestimate Yourself

One of the biggest problems when we plan is overestimating ourselves. If you have ever indulged in some retrospection, you’d remember how many times you overestimated yourself while planning. For example, when I was in school, I used to plan that I would study one chapter every hour of the day. I’m quite embarrassed to say that this continues even now, more than I’d like it to. Last week I planned to close 8 bugs in my project at work – without even knowing the root cause of those bugs. But it’s okay because I catch myself most of the time – and you should too. Once you overcommit at work, you will quickly find yourself sacrificing other areas to make up for it.

It’s always better to undercommit and overdeliver. Ask for more time than you estimated, finish the task diligently, then use the remaining time to polish your code. That’s how you shine. Not by overworking yourself.

Be a Team Player

Are you often thinking it’s better to complete something yourself instead of passing it on to one of your team mates? Especially if you are in a leading position or you are managing a team? If you are not able to delegate tasks and get them completed, it usually means that you are not a good team player. Think and identify why you are unable to delegate tasks to your team mates. Are you having trust issues with your team? Does your team need more training to contribute better? Are you having communication problems – are you simply shy to talk to people?

The biggest strength of corporate structures is working as teams. If you are not taking advantage of it, then you have a serious problem you need to rectify. The typical workaholic mentality is ‘I can just do it myself in the time it takes to explain it to another person’. Wrong. Even if you are a super-skilled master programmer or something like that, it’s highly unlikely that you’re as efficient as a team working together.

Plan Ahead and Stick To It

Plan ahead for your work. I’ve found my sweet spot in planning weekly. Some people like to plan more, or less, frequently than that. But you should always plan. Beware of becoming a perfectionist and spending too much time and effort on planning. I say this because that’s the reason a lot of people give up on planning. The planning itself should not become a stressful chore. At the beginning of the week, I like to just quickly chalk out a few tasks that I’d have to see completed in the week. Better if you can do it with your team. I’m not saying you should do the same, but I recommend you do something similar at a frequency that suits you. Right off the bat, it reduces stress by a lot because a lot of surprises get avoided.

More importantly, do as much as possible to avoid taking up unplanned work, once you have done this planning. Although there are genuine chances of unplanned – yet important – work turning up, in my experience, we often accept such work because we hesitate to say no. If you frequently find yourself unable to keep up with what you commit, you should try this – make a light and easy plan, keeping in mind it should be significantly easier than you can handle, and then stick to it. Lean towards sticking to your plan, rather than impressing people by handling unplanned activities. If you are building a reputation, let it be that you respect your time and will not let it be taken for granted.

Create a Life Outside of Work

Many people who struggle to create work-life balance simply don’t have a significant enough life outside work. You might have a family, but if you don’t give it importance, it’s as good as non-existent to you. Similarly, you can’t say you have a hobby if you only indulge in it a couple of times a year. Commit yourself to at least a couple of things outside of work. For most people, one of these things can be family. Create habits that keep you connected with them. It can be a little habit like having dinner with your family every day at 7.30 in the evening. Commit to doing activities with your friends.

If you have more things in your life that make you interesting, your mind won’t have to rely on your work performance to feel good about yourself. You’re less likely to link work to your ego. You’re more likely to plan and organize your day in a healthy way. It is absolutely necessary that you have time for face-to-face interactions with other human beings outside the scope of your work.

Let Go of Your Ego

The more committed and successful someone gets at their job, the more their ego gets blown up. Soon they find themselves unable to say no and unable to ask for help. Even if they’re obviously overloaded and struggling to cope, they don’t reach out and express it. It’s as if they believe people will think less of them if they can’t handle their load. This is their ego talking. You need to recognize when your ego is hurting you. The fact is, people who don’t act on these things in time end up hurting themselves more.

Imagine your project is slightly off track, and you hesitate to bring it to attention because you think it would be perceived as your weakness. A few weeks later, it becomes a more serious problem, and your good sense tells you it’s better to bring this to people’s attention. But now it’s harder, because you also have to answer for why you didn’t highlight it earlier. So you will probably try harder to bring the project back on track without letting people know how serious it is. Avoid this mess by keeping a clear head, and never ever give in to your vanity at work. It’s okay to reach out and get help when you are overloaded. It’s okay to say the task might take longer than expected.

If you find yourself in a stressful mess, staying back late day after day, stop for a moment and ask yourself: what’s the worst that could happen if I tell my manager that I’m struggling and need help? Surely it’s not worse than having an unsatisfying life or spoiling your health.

Conclusion

Creating a healthy work-life balance is not so difficult that you need to research the internet about it. It’s easy. You just need awareness. Stop being a robot and pay attention to what you are doing, what you are feeling, and whether you are happy. Then do something about it.