Notes from a CTO #15: My First Job and the Time I Rediscovered Databases & SQL Joins
My first job is always close to my heart, and this story is about my proudest achievement there.
1. Thought of the Month
This week, let's discuss how I rediscovered databases and SQL joins.
This might seem like a trivial topic and common knowledge, but you need to understand a few things to empathize with my situation. I studied industrial engineering and only took one programming subject during my bachelor's degree, barely passing with 33 marks (the cutoff was 32). Now, let's begin the story.
After finishing my industrial engineering degree, I applied for a business intelligence position at Kaymu, an e-commerce company that was just starting up (now known as Daraz) and my first real job. They had an aptitude test with 52 questions to be completed in 20 minutes. Later, I received a call from HR saying my aptitude score didn't meet their requirements for the business intelligence role, but they had a logistics manager position available if I was interested. (I later found out I was the second-highest scorer but lacked experience, so HR rejected me for BI.) I agreed but mentioned that I'd like to move to BI in a few months. The HR representative said okay and that we'd discuss it with the manager later.
I had one interview with Jai Patel, which was very casual. I still remember it was in a simple room with a sofa. The HR person asked me to sit and went to get Jai. Jai came back with a laptop, carrying it in a peculiar way: open and balanced on the fingertips of one hand. He didn't seem pleased that HR had interrupted his daily operations (people came to the door three times during the interview to ask for something). We just had a casual conversation; I had prepared so much about logistics, but he didn't ask anything related to it. He just asked how I would handle certain situations. Although he didn't look pleased, maybe he didn't have any other options. I got a call from HR at the end of the day asking when I could join. I said in a week and joined accordingly.
It was a typical startup, 1.5 years old, with around 50 people and an open floor concept. When I entered the office, HR gave me a brief 15-minute onboarding, and Jai assigned me a chair next to him. We had a big table for the sales team, where Rajeev (MD) and Jai were sitting. This concept is something I fell in love with: no matter how big the company is, I will make sure to spend half my time working at a shared table with people. It helps me get a pulse on people: who's frustrated, who's trying to hide things from me, who's hardworking, and who's faking it. It helps me choose which tasks to assign to whom.
The work of a logistics manager was very boring, as you mainly create product pickup sheets. For those who don't know how e-commerce started in a country like Nepal, where everything ran on cash on delivery and people either didn't know how to use smartphones properly or didn't have one at all, here are the steps:
We onboard sellers by getting their details. Some team members go to their shops to take pictures.
Another team creates listings and uploads them to the website.
When orders come in, the pickup team goes to the seller and gives them a slip with the handwritten order details.
The seller searches for the product (hoping it hasn't already been sold) and takes a slip of paper as proof.
The pickup team brings the products to our Kaymu office.
The next team packs the orders (sometimes Jai and I help if there are many orders).
We pass these on to a third-party delivery service. They deliver the products and collect the cash.
During the next pickup, people give cash and get a slip of paper as proof of payment (if the slip has multiple orders, they just cross one out).
All of this was managed in one Excel workbook with various sheets and VLOOKUP functions. Nothing was deleted; we had separate sheets for orders, pickups, in-transit items, deliveries, payments, and refunds. The workbook had more than 100K rows, and all other values were looked up from a master sheet, making it very slow. Saving took 5 minutes. We used to back up that Excel file to Dropbox twice a day; the entire Kaymu operation depended on it. If we messed up, there was no other record. Yes, we had a backend called BOB, but it had its own issues. Order statuses weren't reversible, so if someone mistakenly marked an order with the wrong status, we were in trouble. We never trusted BOB except for getting fresh orders.
Jai added some improvements, eliminating handwritten bills and creating order slips directly in Excel, with barcodes and other details. He left after 6 months, and Abinash and I became the owners of that sheet. Our orders started to grow rapidly, and Excel began to crash and have other issues. At one point, it took 20 minutes just to save, and sometimes the save would fail entirely.
Rajeev came to me one day and said, "Bikram, this needs to be solved. I don't know how, but there must be a way." So I took it upon myself to solve it. I used to travel to the office by public bus, a 45-minute ride. I would leave a bit late in the evening, usually 45 minutes after office hours, as traffic was lighter then and I could get a seat on the way home. There are two places where I think very clearly: on the commode and during my ride home.
I had been thinking about this problem for a few weeks, visualizing tables and sheets in the air. Then one day on the ride home, it hit me that all of this was like a Jenga tower. All these columns held the same type of data, and I was just matching data between sheets. I visualized the columns and realized I just had to match rows and keep the items from the next column. It was like combining two Jenga towers. I still remember making columns in the air and visualizing different sheets, playing with them like two Jenga towers and kicking some values from one tower to the next. That was how I discovered database tables and SQL joins. It seems stupid now, but at that point, moving from VLOOKUP to databases and joins was such a difficult concept. I thought I was stupid not to have thought of this solution earlier, so I also wrote about it on my blog (with more detail on why Excel people pick up bad habits from VLOOKUP), and it became my top page, as it was a question many people who moved from Excel to R or Python searched for. It gave me a little comfort to know that I was not the only stupid person.
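For readers coming from Excel, here is a minimal sketch of that "two Jenga towers" idea in Python with SQLite. It is not the database from this story, and the table names, columns, and rows are made up for illustration. It shows a LEFT JOIN doing what a VLOOKUP on an order ID would do between two sheets.

```python
import sqlite3

# Hypothetical tables standing in for two Excel sheets; all names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, product TEXT);
    CREATE TABLE pickups (order_id INTEGER, picked_by TEXT);
    INSERT INTO orders  VALUES (1, 'kettle'), (2, 'phone case'), (3, 'shoes');
    INSERT INTO pickups VALUES (1, 'Team A'), (3, 'Team B');
""")

# LEFT JOIN acts like a VLOOKUP keyed on order_id: every order is kept,
# and an order with no matching pickup gets NULL (None in Python).
rows = conn.execute("""
    SELECT o.order_id, o.product, p.picked_by
    FROM orders AS o
    LEFT JOIN pickups AS p ON p.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

One difference worth knowing: VLOOKUP returns only the first match it finds, while a join returns one row for every match, so duplicate keys show up instead of being silently hidden.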
I went home, did some research, and figured out I could use MS Access. I learned about real SQL and databases later, a story for another day. I learned Access in 5-7 days, replicated that Excel sheet in the next two weeks, even made forms, and battle-tested it 3-4 times completely. I think I remade that database more than 5 times before I was finally happy with the result. It addressed all the edge cases of Excel and had new features like a form to update single orders so people wouldn't change tables directly like in Excel, a process for bulk uploading tables, and instructional videos.
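The idea behind the single-order form can be sketched in a few lines. This is only an illustration in Python with SQLite, not the Access form I built; the `orders` table, its columns, and the `update_order_status` helper are hypothetical. The point is that people go through one narrow function that touches exactly one row instead of editing the table by hand.

```python
import sqlite3

def update_order_status(conn, order_id, new_status):
    """Change the status of one order; return True if exactly one row was updated."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE orders SET status = ? WHERE order_id = ?",
            (new_status, order_id),
        )
    return cur.rowcount == 1

# Hypothetical data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "picked_up"), (2, "packed")])
conn.commit()

updated = update_order_status(conn, 2, "in_transit")   # existing order
missing = update_order_status(conn, 99, "in_transit")  # unknown order, nothing changes
print(updated, missing)
```

Because the update is keyed on a single order ID, a typo cannot overwrite a whole column the way a bad paste can in a spreadsheet.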
That system stood the test of time and was used for the next 5 years until it was replaced by Alibaba's system. Daraz had a new system called Delivery Hero, but it failed to replace the Access database: the database covered too many nitty-gritty details, performance was not an issue, and the team never fully adapted to the new system.
Here is a picture of the database dashboard: I thought it out, learned the tools, built it, and implemented it in 4 weeks, along with all my day-to-day tasks.
Link to other dashboards I built at Daraz.
2. Podcasts/Essays
Lately, I have started thinking about LLMs and AGI. One interesting point I always think about is that we didn't get cars by trying to make a better cheetah or get planes by wanting to build a better bird. The same is true with LLMs—how do we get from LLM to AGI? What is AGI? Can we consider a calculator a kind of AGI since it is better at math than all of us?
To delve deeper into how we think and what makes us human, here's a good article on the subject:
How can we develop transformative tools for thought?
3. Interesting links
Repos:
open-webui: The best open-source LLM chat interface I have found, with all the required functions.
Logfire: Always been a big fan of good logging, tracking, and monitoring tools. How do I know if someone is a good programmer? It’s by how good they are at logging. This tool comes from the folks who built Pydantic.
Twinny & continue: I have been looking for an open-source GitHub Copilot alternative, and these two look like good options.
outlines: Converting LLM output to a structured format is always painful. With the latest models like LLAMA-70b and GPT-4, it's getting better. I am always looking for a library to make life easier.
skyvern: Automation using LLMs excites me, with the hope of making life better given the many hours we spend repeating the same things again and again. I hope to build some hobby projects on automation using LLMs soon.
For more, follow me on GitHub: bkrmdahal
Articles:
Clay’s Path to Product-Market Fit — A 7-Year 'Overnight Success': Clay is one of the startups that is really inspiring in how they have been able to solve problems and also ride the AI wave. I always thought they pivoted in the last 1.5 years, but it's good to know it was a seven-year overnight success. A good example of how they found PMF and stayed focused.
Fine-tuning LLM: We have not gotten good results from the fine-tuning experiment, but I still think there is a lot of potential in fine-tuning once we find the right balance of weight and task.
Multi token prediction: Inference speed is still a big issue for LLMs when you have to process a lot of tokens and also output a lot of tokens. Any novel approach to speed it up is always interesting.
Why the Pandemic Probably Started in a Lab, in 5 Key Points: I am extremely careful about conspiracy theories. The first conspiracy theory I believed was that we didn’t land on the moon. That documentary was next-level. After learning more and improving my understanding, I don't want to have a closed mind in the future. When the COVID lab theory came out, I outright rejected it, thinking people can't be that stupid, but I should not underestimate people's stupidity. The middle of a pandemic was not the right time to discuss it and divide the world with more hate. But now we are in a better position. Let's try to find the cause so we can stop future pandemics. Here is one article that caught my attention. I'm not endorsing it, but if leading experts say it's a possibility, let's keep our minds open.
4. Quotes/Books
First, solve the problem. Then, write the code.
– John Johnson
Lately, I have been extremely careful about taking on new projects. Even hobby projects, I approach with caution, trying to solve them without writing code by using simple LLM prompts or no-code solutions before writing real code. As developers, we love to code, but lines of code are not equal to the problems we solve and the impact we make. Usually, it might even be the reverse.
Memes from our Slack (two memes, because we have too many in the channel)
For ML engineers, YOLO v10 just came out
That’s it for this edition. I hope you find it useful.
Best,
Bikram Dahal
P.S. If you learned something new today, please share “Notes from a CTO” with your friends and spread the love. ✌🏻