Tech Blog Menu

CoveoBlitz - BlitzDay

By

Update: Links to the Blitz registration page were added.

Blitzday

Saturday. 7 AM. Normally I’d be sleeping in. But alas, this Saturday is not like your regular Saturday. I’m wide awake and super hyped. No, this isn’t the first chapter of my new fantasy book, it’s real. It’s Blitzday. I’m ready to crush through code like never before.

While sitting in the classroom booked by Coveo for Blitz, I finally learned that all my research and practice over the holidays had paid off. The challenge announced for the Blitz was to build a search engine! Well…it was more than JUST a search engine, but my teammates were taking care of the other elements of the challenge.

If memory serves me correctly, here’s what we had to do:

  • Crawl many text files.
    • The faster your crawling is, the more points you get.
  • Index said text files.
    • The less memory you use, the more points you get.
  • Answer search queries.
    • The faster you are, the more points you get.
  • Support complex search queries (AND, OR, NOT, etc.)
  • Sort results by relevance.
  • Have a nice UI that also supports mobile.
    • Keep in mind it’s 2011, this was HUGE. Sent from my Blackberry.

We were a team of four ready for almost everything. Because of my research and practice, I owned the index process. Eric, who was already well-versed in front-end development, started working on our UI.

It might sound crazy, but back in the day there weren’t many options for front-end development. Mobile wasn’t as ubiquitous as it is today, so Eric had a handful of tasks for the little 8 hours we had. Etienne, who had previously dealt with web service development, handled the communication between the web interface and the index. He did this using WCF (is this still used today?). Francis was in charge of crawling the content. He needed all the luck he could get - there were a lot of files, and Coveo’s requirements were quite aggressive!

Knowing that splitting the work up and merging near the end is, in professional terms, a piss poor strategy, we knew that we needed some way to sync our work. We used Subversion (SVN) as a source control. For you kids out there who get to play with fancy “distributed” source control, you should know that life was harder back in the day. To make sure that we wouldn’t go on huge tangent, bikeshedding our way to the last minute, we used a pomodoro (more on why we used a pomodoro later) as a way to time our work and force us to synchronize at regular intervals.

Let’s start the day!

The Scramble

Okay, okay, I’ll admit it. I didn’t attend Coveo Blitz just for the challenge. I wanted that top prize. My laptop was 💩, and an Alienware laptop was the holy grail. I needed to go ALL OUT. You’re probably thinking the same thing I did, going with an index + reverse index coupled
with a word frequency metric for relevancy was an amateur move. I went all out with a full Latent Semantic Index. Can someone say “#YOLO?” (yeah that’s an anachronism, it didn’t exist back in the day).

Francis started right away on crawling and Etienne was finishing the SVN setup, then quickly moved to the server communication part. Eric downloaded the latest version of jQuery and got ready to setup a nice UI. We choose C# on backend side, because we’re all more or less comfortable with it, and, at the time, it was the safest choice for us.

Pre-Lunch

Remember the pomodoro? It’s more than a delicious sauce; it’s a technique. So a pomodoro is 25 minutes (+ the sync part) which means we had a few quick “sprints” of work before lunch time. I won’t dissect every sprint to save you time, and mostly because it was 7 years ago and I don’t really remember.

What I first did is build a word to document index and a reverse index (using fake documents). This would be useful later to provide results to searches. It wasn’t a working search engine yet and every pomodoro sync made me feel more and more nervous; it was not going well for me. Francis managed to crawl all the documents (it was slow but functional, similar to a sloth, or more akin to what he’s like to deal with in real life). Hopefully, I could ask him to do the integration between my part and his, so I could focus on getting something searchable. Eric and Etienne were doing quite well. Etienne managed to get the server up and running quite quickly allowing him and Eric to send each other fake queries and fake replies. Eric was getting a nice UI up and running.

I needed to get that search searching, or else the drive home will be very bad…and lacking Alienware laptops.

Lunch

Hahahahahahahhahahahahahaha. Did you think we’d stop to eat or that we even had time for that? If I recall correctly, the Coveo crew started bringing food to us seeing that no one would leave their desk for any reasons. (Coffee doesn’t count. It’s Saturday and I’m normally still sleeping at that time, so it’s a matter of life and death I guess at that point).

Post-Lunch

Hindsight is 20/20, I guess I should have divided the day by coffee intake, oh well.

Back to the story: It’s mid-day. We have a webpage, a server, and we’re slowly crawling documents (Francis is on it, he assures us), Etienne’s server is receiving queries and he is starting to parse complex query expressions while Eric is still adding nice visual features to the website. On my end I have a non queryable data structure which is taking a lot of memory resources and taking what feels like an infinite time to answer. AKA, it’s not working. Let’s do…this?

While I’m finally managing to get some answers out of a big data structure, I can hear my friends achieving great stuff next to me. Francis is crawling faster and faster. Etienne, after realizing that parsing query isn’t a trivial problem, starts to pair program with Eric. This could be a problem.

Oh, wait, it’s not. Why? Pomodoro technique, baby! What do you know, hard problems can be solved quickly if you put two smart people on it at the same time. While all of this is unfolding, I manage to get some queries out. It’s still slow, and it’s still taking a large amount of memory. Did I mention there are only two hours left? I better get optimizing!

FULL STOP.

What the…the…Coveo team is celebrating.

How is that even possible?

Every year Coveo gets a team of their employees to compete with the students, but they must be ninjas or something (I work with them now, I can confirm they’re basically ninjas.). While we’re slaving away, barely scraping any points on the evaluation charts, they’re celebrating and drinking beer because they’re done. Remember that student cockiness of mine? Yeah, it’s sobbing loudly in a corner while slowly rocking back and forth. Meanwhile, I’m trying to fit my damn data structure in a sane amount of memory while still answering queries in a timely manner.

One hour to go.

As per Coveo Blitz tradition, “The Final Countdown” by Europe is playing over the loud speakers. The pressure is on. Are we fast enough? Do we have enough features? How many points do the other teams have? Did I leave the oven on? MUST. KEEP. CODING.

Time’s up.

Coveo’s correcting team is going around and I can hear them praising other teams for doing incredible things. Like crawling the entire corpus of documents in less than 30 seconds while others had awesome search results with tuneable relevancy. While my dreams of taking a new laptop home are fading fast, I find myself trying to ease the blow. I can just buy another laptop, like an Acer! Acer’s are as good as Alienware, right? Right?

The Wait.

Before announcing the winners, Coveo brought us to their offices so we can have a tour and drink a beer (potentially several in case we lose). The stress level is rising, and in that moment I finally understood the introductory verse to Eminem’s hit song, “Lose Yourself.” While I didn’t have vomit on my sweater, my knees were indeed weak, and arms heavy.

In The End

Sadly, this isn’t some sort of “underdog” feel good kind of story. Nope. We lost.

I'm crying

Well, even in the face of failure, especially when you’re not getting that sweet Alienware laptop, there are still lessons to take away from all of this.

First, try to sync up as often as possible. Obviously, there is something to be said about productivity and interruptions, but syncing up often helps the group stay on track. Knowing where your team is at also allows you to ask for help and prevent being stuck for too long.

Second, getting something up and running as quickly as possible is an easy way to reduce the stress.

Francis managed to get his crawler up and running really quickly, which was an easy way for us to calm down. At least we have something working. My portion of the work didn’t work until late in the day. That was stressful and a huge risk.

do {Ship early, ship often} while(blitz).

Last, but not least: Enjoy the hell out of it! I created a stronger bond with my friends, met talented people who I now work with and, this is not negligible, got the chance to write that I participated in such a cool event on my quite empty early-career-resume.

Alright! Off I go, I still have some final touch ups to do on this year’s edition of the Blitz. You did register, right?! If not, make sure to join us next year or to follow the event on our Facebook live stream!

Andy out, Blitz on.

Written by
Machine Learning Platform Developer