Learning a new language: Go!

Hi!

Just leaving a quick note here that I've started learning a new language: Go (a.k.a. Golang). This post covers the resources I'm using to learn it, plus the project I'll be coding in Go.

Why Go?

Two things stood out to me while researching potential new languages to learn:

  • simplicity (only 25 reserved keywords)
  • performance (compiled code, plus goroutines for lightweight concurrent execution)

I don't see any use cases for Go in my current job, so I'll be doing it mostly for fun & "sharpening the saw". I hope to gain a new perspective from the process, since so far I've only used dynamically typed languages (Python for almost 90% of my day-to-day work, plus some occasional JavaScript for side projects). Adding a statically typed language to my stack would be awesome!

Resources

The resources below are listed in the order I picked them up.

  1. A Tour of Go - an interactive tour of the language that teaches basic syntax and the concepts/idioms used in Go. Since it's interactive, you can try out Go code directly in the browser; no IDE or text editor is required to get started.
  2. Go: The Complete Developer's Guide (Golang) - a Udemy course by Stephen Grider. I haven't gone through the whole course yet, but I really like Stephen's way of explaining things. Plus, the course can be purchased pretty cheaply during one of the "forever-lasting" Udemy promotions.
  3. Learn Go with Tests - an ebook that introduces test-driven development in Go. Even though I'm not using TDD for this app, the book has been a valuable resource that helped me write tests for basically all the Go code I've written so far.
  4. Go In Action (Manning Publications) - so far I've gone through the first two chapters, but I'm already loving the content. The book gives you a good idea of how idiomatic Go should be written, using real-world examples.

Let's not fall into tutorial hell - first project in Go

What is an employee engagement survey?

To avoid ending up in tutorial hell, I wanted a clear vision of my first Go project from the very beginning. I decided to write a calculation app that processes the results of an employee engagement survey. It's a kind of survey that a company sends to its employees, asking about various aspects of the working environment, for example:

  • I would recommend XYZCompany as a great place to work
  • I rarely think about leaving XYZCompany
  • The leaders at XYZCompany keep people informed about what is happening
  • I have access to the things I need to do my job well

And many, many more. Employees answering these questions would usually use a numeric scale, such as a range from 1 to 5, where:

  • 1 means I completely disagree
  • 3 could mean Neither agree nor disagree
  • 5 could mean I completely agree
  • 2 & 4 would be intermediate answers (2 sits between completely disagreeing and having no opinion, 4 between having no opinion and completely agreeing)

Organizational structure and demographic data

Each employee answering such a survey is usually assigned a specific place in the company's organizational structure. For instance, employee X could be working in the Internal Finance team, employee Y could be part of the IT team, and employee Z is the boss, so he or she sits at the very top of the organizational structure (you can also think of it as the root of the organizational structure tree).

[Figure: organizational tree visualization]


Having information about the organizational structure gives you more detailed insights, broken down by organizational unit. For example, you could see that 70% of the Finance team responded that they feel undercompensated (maybe it's time to review their salaries and compare them against the market to prevent potentially valuable employees from leaving).


Another important source of insights is demographic data, such as tenure (e.g. 3 to 6 months, 6 months to 1 year, 1 to 5 years, 5+ years), employee age (< 25, 25 to 35, 35 to 50, 50+), gender and many more.

Putting data together

If we take the data mentioned above and combine it into a single dataset representing survey results, we get a tabular data structure that we could store in a CSV file. The columns would be:

  • EmpID - it's good to have an attribute that uniquely identifies each employee; it could be either an int or a string.
  • OrgNode - this column contains the organizational structure assignment. We'll represent it as a string using the N01.X notation shown in the picture earlier. N01. is the root of the whole org tree. If N01.01. represents Finance, then all Finance team employees would be assigned to a unit starting with the N01.01. prefix.
  • Question data, represented by the Q1 to Qx columns, where x is the total number of questions. Question columns contain int data. Usually every question has the same scale (e.g. 1 to 6), but there might be some exceptions (for example NPS, Net Promoter Score, which uses a 0 to 10 scale). To allow for such exceptions, we'll make our application quite flexible in this area and ask the user to provide a schema describing every question in detail (like minVal and maxVal).
  • Demographic data, represented by the D1 to Dx headers. Like question data, demographic data contains int values, and the value range of each demographic will be described in the schema.
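To make the column layout concrete, here is a minimal sketch of reading such a CSV with the standard library's encoding/csv package and splitting each record into a row value. The Row type, the parseRows helper and the "Q"/"D" prefix rule are hypothetical illustrations, not the app's actual code, and the sketch assumes every cell is filled in (a blank answer would make strconv.Atoi fail).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Row holds one employee's record. The type is hypothetical and simply
// mirrors the EmpID / OrgNode / Qx / Dx column layout.
type Row struct {
	EmpID        string
	OrgNode      string
	Questions    map[string]int // e.g. "Q1" -> 4
	Demographics map[string]int // e.g. "D1" -> 1
}

// parseRows reads a CSV whose header holds EmpID, OrgNode, Q1..Qx and
// D1..Dx columns. Assumption: any other column starting with Q or D is
// treated as question or demographic data, and every cell is non-empty.
func parseRows(data string) ([]Row, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("empty input")
	}
	header := records[0]
	rows := make([]Row, 0, len(records)-1)
	for _, rec := range records[1:] {
		row := Row{Questions: map[string]int{}, Demographics: map[string]int{}}
		for i, col := range header {
			switch {
			case col == "EmpID":
				row.EmpID = rec[i]
			case col == "OrgNode":
				row.OrgNode = rec[i]
			case strings.HasPrefix(col, "Q"), strings.HasPrefix(col, "D"):
				v, err := strconv.Atoi(rec[i])
				if err != nil {
					return nil, fmt.Errorf("column %s, employee %s: %w", col, row.EmpID, err)
				}
				if col[0] == 'Q' {
					row.Questions[col] = v
				} else {
					row.Demographics[col] = v
				}
			}
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	data := "EmpID,OrgNode,Q1,Q2,D1,D2\n" +
		"1,N01.01.,4,5,1,4\n" +
		"2,N01.01.02.,2,3,2,1\n"
	rows, err := parseRows(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rows), rows[0].OrgNode, rows[1].Questions["Q1"])
}
```

encoding/csv already rejects records whose field count differs from the header, so indexing rec[i] by header position is safe here.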

Each row in our data represents the answers, org structure assignment and demographic data of a single employee. To make it more visual:

[Figure: tabular dataset visualization]

As you can see, this kind of data fits nicely into a dataframe structure, which is used by many modern data processing frameworks (e.g. pandas).


I know that there are Go libraries implementing dataframes, but for learning purposes I decided to stick to Go's standard library as much as possible, so I'll be coding my own dataframe.
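A hand-rolled dataframe can start very small. The sketch below is one possible shape, assuming a column-oriented layout where each column is a slice stored by name and filtering returns row indices; DataFrame, NewDataFrame, AddStrings, AddInts and Where are hypothetical names, not a real library's API.

```go
package main

import "fmt"

// DataFrame is a hypothetical, minimal column-oriented table: every
// column is a slice of the same length, stored under its column name.
type DataFrame struct {
	nrows   int
	strCols map[string][]string // e.g. EmpID, OrgNode
	intCols map[string][]int    // e.g. Q1..Qx, D1..Dx
}

func NewDataFrame(nrows int) *DataFrame {
	return &DataFrame{
		nrows:   nrows,
		strCols: map[string][]string{},
		intCols: map[string][]int{},
	}
}

// AddStrings attaches a string column, rejecting length mismatches.
func (df *DataFrame) AddStrings(name string, vals []string) error {
	if len(vals) != df.nrows {
		return fmt.Errorf("column %s: got %d values, want %d", name, len(vals), df.nrows)
	}
	df.strCols[name] = vals
	return nil
}

// AddInts attaches an int column, rejecting length mismatches.
func (df *DataFrame) AddInts(name string, vals []int) error {
	if len(vals) != df.nrows {
		return fmt.Errorf("column %s: got %d values, want %d", name, len(vals), df.nrows)
	}
	df.intCols[name] = vals
	return nil
}

// Where returns the indices of the rows for which keep reports true.
// Returning indices instead of copying rows keeps filtering cheap.
func (df *DataFrame) Where(keep func(row int) bool) []int {
	var idx []int
	for i := 0; i < df.nrows; i++ {
		if keep(i) {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	df := NewDataFrame(3)
	_ = df.AddStrings("OrgNode", []string{"N01.", "N01.01.", "N01.01."})
	_ = df.AddInts("Q1", []int{5, 4, 2})

	org := df.strCols["OrgNode"]
	finance := df.Where(func(i int) bool { return org[i] == "N01.01." })
	fmt.Println(finance) // [1 2]
}
```

Returning indices lets many cuts share the same underlying columns; a fuller version would expose column accessors instead of reading the maps directly as main does here.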

Calculating cuts

The combination of the OrgNode and demographic data columns constitutes something called a Cut. It's a "slice" of survey data (not in the Go sense, where a slice is a dynamically sized view into an array) that lets you calculate results for a particular subset of the organization and demographic data. For example, you could ask a question like:
Give me results for all female Finance employees that are working for more than five years at our company.


Assuming that:

  • gender is represented by the D1 column, with the female option encoded as 1
  • tenure is represented by the D2 column, with 5+ years encoded as 4

you could construct the following (pseudo-code) Cut:

Cut {
  OrgNode: "N01.01.",
  D1: 1,
  D2: 4,
  FilterType: Rollup,
}


There's one more thing in this cut that we haven't covered yet: FilterType. It's an enumeration with two options, Rollup and Direct.


The Rollup filter type means that when filtering employees, you include every employee that belongs to the specified organizational unit or any of its subordinate units. The Finance team has three subordinate units: Accounts Payable, Accounts Receivable and Payroll, so a Rollup filter for Finance (N01.01.) would include all of them. From the application code's point of view, you would basically test whether the employee's unit starts with the N01.01. prefix; if so, the employee is counted in the cut's results.


A Direct filter, on the other hand, includes only employees whose org unit assignment is exactly N01.01.; nobody from Accounts Payable, Accounts Receivable or Payroll would be included in the cut's results.
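Here is one way the pseudo-code Cut and both filter types might look in Go, sketched under a few assumptions: the Cut, FilterType and Matches names, representing the demographic requirements as a map from column header to code, and the example unit N01.01.02. are all hypothetical. Rollup is the prefix test described above (strings.HasPrefix), Direct is plain equality.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterType models the Rollup / Direct enumeration with Go's iota idiom.
type FilterType int

const (
	Rollup FilterType = iota // the unit plus all its subordinate units
	Direct                   // exactly this unit only
)

// Cut is a hypothetical Go version of the pseudo-code Cut: an org unit,
// required demographic codes (e.g. "D1" -> 1) and a filter type.
type Cut struct {
	OrgNode      string
	Demographics map[string]int
	FilterType   FilterType
}

// Matches reports whether an employee with the given org unit and
// demographic answers belongs to the cut.
func (c Cut) Matches(orgNode string, demo map[string]int) bool {
	switch c.FilterType {
	case Rollup:
		// The trailing "." in the notation stops N01.01. from matching
		// a unit like N01.010., which a bare "N01.01" prefix would match.
		if !strings.HasPrefix(orgNode, c.OrgNode) {
			return false
		}
	case Direct:
		if orgNode != c.OrgNode {
			return false
		}
	default:
		return false // unknown filter type: match nothing
	}
	for col, want := range c.Demographics {
		if got, ok := demo[col]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	cut := Cut{OrgNode: "N01.01.", Demographics: map[string]int{"D1": 1, "D2": 4}, FilterType: Rollup}
	demo := map[string]int{"D1": 1, "D2": 4}
	fmt.Println(cut.Matches("N01.01.02.", demo)) // true: a subordinate unit rolls up
	cut.FilterType = Direct
	fmt.Println(cut.Matches("N01.01.02.", demo)) // false: Direct needs exactly N01.01.
}
```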

Other business requirements

To provide an appropriate level of anonymization, you'd usually set a minimum number of respondents required before aggregated results are shown. For instance, setting this threshold to 10 means that any cut with fewer than 10 respondents will not be reported. My application should take this threshold as an input and apply it in its calculations.
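A minimal sketch of the threshold rule, assuming a hypothetical summarize helper that receives one answer per respondent in a cut: it returns ok=false instead of a result whenever the cut is below the threshold. The mean is only an example statistic; CutResult and summarize are illustrative names.

```go
package main

import "fmt"

// CutResult is a hypothetical aggregate for one question within one cut.
type CutResult struct {
	Respondents int
	Mean        float64
}

// summarize averages one answer per respondent, but reports ok=false
// when the cut has fewer than minN respondents, so small groups are
// suppressed instead of reported.
func summarize(answers []int, minN int) (CutResult, bool) {
	if len(answers) == 0 || len(answers) < minN {
		return CutResult{}, false
	}
	sum := 0
	for _, a := range answers {
		sum += a
	}
	return CutResult{
		Respondents: len(answers),
		Mean:        float64(sum) / float64(len(answers)),
	}, true
}

func main() {
	threshold := 10
	small := []int{4, 5, 3}
	if _, ok := summarize(small, threshold); !ok {
		fmt.Println("suppressed: fewer than", threshold, "respondents")
	}
	big := []int{5, 4, 4, 3, 5, 2, 4, 5, 4, 4}
	res, ok := summarize(big, threshold)
	fmt.Println(ok, res.Respondents, res.Mean)
}
```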


For reporting purposes, you might need to group some answers together when reporting distributions. For example, you might want to count 1s & 2s together (both representing unfavorable sentiment), as well as 4s & 5s (favorable sentiment), and present their aggregated distribution. The application should support such aggregations.
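The favorable/unfavorable grouping can be sketched as a function that counts raw answers once and then sums those counts per bucket. The Bucket type and the distribution function are hypothetical; in the app, the bucket definitions would presumably come from the schema file.

```go
package main

import "fmt"

// Bucket is a hypothetical schema entry grouping several raw answer
// values under one reporting label.
type Bucket struct {
	Label  string
	Values []int
}

// distribution returns, for each bucket, the share of answers whose value
// falls into it. Answers outside every bucket still count in the total.
func distribution(answers []int, buckets []Bucket) map[string]float64 {
	out := make(map[string]float64, len(buckets))
	if len(answers) == 0 {
		return out
	}
	counts := make(map[int]int) // raw value -> number of answers
	for _, a := range answers {
		counts[a]++
	}
	for _, b := range buckets {
		n := 0
		for _, v := range b.Values {
			n += counts[v]
		}
		out[b.Label] = float64(n) / float64(len(answers))
	}
	return out
}

func main() {
	buckets := []Bucket{
		{Label: "unfavorable", Values: []int{1, 2}},
		{Label: "neutral", Values: []int{3}},
		{Label: "favorable", Values: []int{4, 5}},
	}
	answers := []int{5, 4, 4, 3, 2, 1, 5, 4}
	fmt.Println(distribution(answers, buckets))
	// map[favorable:0.625 neutral:0.125 unfavorable:0.25]
}
```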

Input / output shapes

There will be two inputs required from the user:

  • a CSV file with EmpID, OrgNode, Qx (questions) and Dx (demographics) column data
  • a JSON / YAML file describing the survey's schema: question texts, minVal and maxVal for question and demographic data, and information about how question results should be aggregated.

As for outputs, given the above input, the app should generate a file with aggregated results. I'm not sure yet about its final form; I'm considering CSV / JSON / Excel XLSX formats.

How did I come up with the idea for this app?

I used to work at a company that had a branch devoted to conducting, analyzing and reporting employee engagement results (back then it was called Aon, but it was recently acquired by another company and transformed into Kincentric).


Processing this kind of data is a well-known problem for me: I know the algorithms, so I can fully focus on coding them in Go. Hopefully this reduces the chances of getting stuck and not finishing the project.

Parallelization of problem

When processing such data in a real-world scenario, you'd have a list of cuts for which you'd like to calculate results. Depending on the size of the organizational structure and the amount of demographic data available, that could be tens, hundreds, or even hundreds of thousands of different cuts.


Each cut is totally independent of every other cut; the only thing they have in common is the dataset. This means that if the calculations take too long, you can run them in parallel. When tackling this in Excel, that could mean splitting the cuts into multiple parts and running each part on a separate machine in parallel. When I coded a similar application in Python, I handled the parallelization with multiprocessing.


Once I dive deeper into Go, I hope to find a nice way to speed up the computations (goroutines?).
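If goroutines do turn out to be the answer, one common pattern is a fixed pool of worker goroutines pulling cut indices from a channel. The sketch below assumes heavily simplified, hypothetical types (one answer per employee, Rollup-only cuts given as org prefixes, names like evaluateAll invented for illustration); because every cut only reads the shared dataset and each worker writes to its own result slot, no mutex is needed.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// Hypothetical minimal types: one org unit and one answer per employee.
type employee struct {
	orgNode string
	answer  int
}

type result struct {
	cut   string
	count int
	sum   int
}

// evaluate computes one Rollup cut. It only reads the shared dataset,
// so many goroutines can call it concurrently without locking.
func evaluate(cut string, data []employee) result {
	r := result{cut: cut}
	for _, e := range data {
		if strings.HasPrefix(e.orgNode, cut) {
			r.count++
			r.sum += e.answer
		}
	}
	return r
}

// evaluateAll fans the cuts out to a fixed pool of worker goroutines.
// Each result is written to its own index, so output order matches input.
func evaluateAll(cuts []string, data []employee, workers int) []result {
	if workers < 1 {
		workers = 1 // with no workers, sending on jobs would block forever
	}
	results := make([]result, len(cuts))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = evaluate(cuts[i], data)
			}
		}()
	}
	for i := range cuts {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	data := []employee{{"N01.", 5}, {"N01.01.", 4}, {"N01.01.02.", 2}, {"N01.02.", 3}}
	cuts := []string{"N01.", "N01.01.", "N01.02."}
	for _, r := range evaluateAll(cuts, data, runtime.NumCPU()) {
		fmt.Println(r.cut, r.count, r.sum)
	}
}
```

runtime.NumCPU() is just a reasonable default pool size for CPU-bound work like this; the best value is worth measuring on real data.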

What's next?

Now that this post has given a brief introduction to the business problem, the next posts in this series will dive straight into my clunky Go code for this app.


See you soon,
Kuba