Machine learning can be an opaque undertaking. As algorithms grow more and more complex, we need specialized tools to answer questions like, "Why did the computer think this was spam?" or "Why did your service recommend this movie to me?" In my last post, I wrote about a model that was elegantly straightforward: the information content of an item is the negative logarithm of its frequency. Newer models, however, are far less transparent: most famously, Google's DeepDream pattern-recognition software produces phantasmagoric images that are at once completely fascinating and entirely unfathomable.
- machine learning
Clojure is great, and it has a ton of features that help developers increase productivity. In this post we show how to configure Clojure to use a repository manager as a proxy and how to set up a Clojure project to deploy JAR files for shared usage.
Clojure has a great development flow, but a few tools aren't included by default. By adding a few dependencies and plugins to ~/.lein/profiles.clj, you can make your development workflow smoother, quicker, and more effective.
Clojure, like many Lisps, sometimes struggles to attract newcomers who claim it's "hard to read". Any paradigm shift requires time, but I myself struggled to read Clojure I had written early on. Nested parentheses and REPL-driven development made results come quickly, but the code often looked ugly. However, the thread operator -> and all of its cousins fix that.
In this post, I'll demonstrate my all-time favorite natural language processing (NLP) trick: "surprisal", a statistical measure of the unlikeliness of any event, which can be applied to just about anything that you can count. Scala is a wonderful language for this sort of data crunching, largely because of Apache Spark, a powerful distributed computing framework. For this post, I'll be using Apache Zeppelin as an interactive, web-based shell around Spark. If anyone's interested in following along, I encourage you to download a Zeppelin binary distribution and have fun!
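As a rough illustration of the idea, here is a minimal sketch of surprisal computed from raw counts. It is written in plain Python rather than the Scala and Spark setup the post describes, and the `surprisal` helper and the toy sentence are assumptions made up for this example. It takes surprisal to be the negative base-2 logarithm of an item's relative frequency, so rarer items score higher.

```python
import math
from collections import Counter


def surprisal(counts, item):
    """Return the surprisal of `item` in bits: -log2(relative frequency)."""
    total = sum(counts.values())
    return -math.log2(counts[item] / total)


# Toy corpus (hypothetical): 8 words, with "the" appearing twice.
words = "the cat sat on the mat at noon".split()
counts = Counter(words)

print(surprisal(counts, "the"))  # 2 of 8 words -> 2.0 bits
print(surprisal(counts, "cat"))  # 1 of 8 words -> 3.0 bits
```

The same function works for anything countable (words, clicks, error codes), since it only needs a table of counts.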
At Spantree we periodically do hackathons, sometimes for internal projects, at other times for non-profit organizations. Every year on Martin Luther King, Jr. Day we hold our annual hackathon for social good, which is always exciting.
We were recently tasked with setting up a container management solution on Google Compute Engine (GCE). After we stood up Mesos and Marathon on CoreOS, our initial tests passed and we deployed several Docker apps to Marathon without trouble. We ran into a snag, however, when switching from a public registry to a private Google Container Registry.
Starting out with Jekyll seemed intimidating - I’d be working in a text editor and running code from the command line - but it doesn’t get much more difficult than that.
As we continue to evaluate Clojure, new avenues present themselves. In the past, highly concurrent processing at Spantree has been done with Node.js or a JVM solution on top of Jetty. While these work, I never found them particularly easy or enjoyable to use.
It’s possible that containers and container management tools like Docker will be the single most important thing to happen to the data center since the mainstream adoption of hardware virtualization in the 90s. In the past 12 months, the technology has matured beyond powering large-scale startups like Twitter and Yelp and found its way into the data centers of major banks, retailers and even NASA. When I first heard about Docker a couple years ago, I started off as a skeptic. I blew it off as skillful marketing hype around an old concept of Linux containers. But after incorporating it successfully into several projects at Spantree I am now a convert. It’s saved my team an enormous amount of time, money and headaches and has become the underpinning of our technical stack.
We've been using Docker on our projects recently to ease development and deployment processes. Here are a few tips based on what we learned building and maintaining Docker infrastructure for production.
Since its launch in 2009, Spantree’s team has shared a passion for volunteerism. As the second largest non-profit market in the country, Chicago is home to many worthy organizations, and with a diverse range of personal interests and passions among us, we felt a unanimous pull to use our skills and talents to serve our community as a team. With the spirit of social justice in mind, it seemed fitting to dedicate our community building efforts toward a special project each year on Dr. Martin Luther King, Jr. Day. Thus began our annual Hackathon for Social Good.
Reading from an Excel file is surprisingly easy in Clojure. We'll see an example in Groovy and then compare it with one in Clojure.
I love games, and I love game theory. Recently Spantree started doing lightning talks, which gave me the chance to share some of my favorite group "games".
- client relations
Always interested in new approaches to testing on the front end, I perked up my ears when a functional testing framework called Sparrow.js popped up in my Twitter stream. In essence, Sparrow allows you to run Selenium-style tests defined with Jasmine-style syntax. The immediate appeal for me was that front-end developers with experience in Jasmine could easily side-step into using Sparrow to write tests for broad user interactions that span multiple pages. I played around with it for an afternoon, and here's what I learned.
- functional testing
In every professional software engineering job I've had, data transformation has played a role. Most recently, at Spantree we needed to get a massive corpus of synonyms into the file format readable by Elasticsearch.
Rendering HTML templates in Ratpack is very straightforward. We'll use the built-in templating features of Ratpack's Groovy module to render everything from very simple templates to complex templates with sub-templates and templates with dynamic data.
Ratpack is a "simple, capable, toolkit for creating high performance applications" on the JVM.
It was originally inspired by Sinatra, but it has taken on a life of its own with very interesting concepts.
It is virtually a treasure trove of neat tricks with Groovy and its @CompileStatic feature. It has not reached 1.0 yet, and it is under heavy development, with releases on the 1st of every month.
Ideally, in a production system, everything works perfectly. Services never mysteriously crash, free memory is constantly available, and CPU load rarely spikes above 50%. Unfortunately, this is not always the case.
Backbone.js does a great job of providing structure to complex front-end applications, but oftentimes we find we need to do more to further abstract domain logic so it does not depend on the UI layer or the backend. In this article, we talk about how to apply the repository pattern to encapsulate interactions with the backend.
Sometimes there is no amount of styling you can apply to Google Maps to make it look the way you want. To really control what is on the page, you will need to create your own maps and serve them up using your own tile server. But where do you start?
With the rise in popularity of Google Search Appliance and Elasticsearch, many companies are interested in finding the best search solution for them. At Spantree, we've been asked several times whether the GSA or Elasticsearch is the better solution, so we have decided to cover the strengths and weaknesses of each. While both are solid search tools that can meet most needs, they specialize in very different domains. To understand those strengths and weaknesses, it's important to note the general philosophy of each technology.