Hi fellow geeks,
Here I am living a new professional experience, this time in a company whose core business is audio/video streaming and real-time audio recognition. I’m starting on a couple of very interesting projects. One deals with real-time speaker recognition, while the other aims to be an automatic music-matching service powered by a revolutionary (or so they say) algorithm that calculates similarities and determines distances within a large universe of tracks.
Both projects share some algorithms and, most of all, a workflow. Both have to work in a distributed way, and by that I mean having multiple different jobs running in parallel on several machines, exhausting each machine’s processing units (CPUs/cores) while persisting the resulting data in a distributed filesystem.
The choices here were pretty obvious to us: JMS and Hadoop FS.
JMS is a messaging API that supports both point-to-point queues and the publish-subscribe pattern. What we’ll basically have is a bunch of Message-Driven Beans (MDBs) per machine (let’s call it a minion) that will receive messages with jobs to process. These jobs are sent from another application (let’s call it the master) that’s responsible for load-balancing the queueing of those messages, maintaining a state machine, etc.
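To make the master/minion flow a bit more concrete, here is a minimal sketch using only the plain JDK. It is not JMS code: a `BlockingQueue` stands in for the JMS queue and worker threads stand in for the MDBs, and every name in it (`MasterMinionSketch`, `runJobs`, the `STOP` marker) is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-JDK stand-in for the master/minion flow; not real JMS code.
class MasterMinionSketch {

    // Hypothetical marker telling a worker that no more jobs are coming.
    private static final String STOP = "__STOP__";

    // The "master" enqueues jobCount messages; the workers (our stand-in
    // for MDBs on a minion) drain the queue in parallel.
    // Returns the ids of the jobs that were processed.
    static List<String> runJobs(int jobCount, int workers) {
        BlockingQueue<String> jobQueue = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Minion side: each worker blocks until a message arrives.
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String job = jobQueue.take();
                        if (job.equals(STOP)) {
                            return;
                        }
                        processed.add(job); // real work would happen here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Master side: queue the jobs, then one STOP marker per worker.
            for (int i = 0; i < jobCount; i++) {
                jobQueue.put("job-" + i);
            }
            for (int w = 0; w < workers; w++) {
                jobQueue.put(STOP);
            }
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        // One worker per core, in the spirit of "exhausting" the machine.
        int cores = Runtime.getRuntime().availableProcessors();
        List<String> done = runJobs(20, cores);
        System.out.println(done.size() + " jobs processed by " + cores + " workers");
    }
}
```

In the real setup the queue lives in the JMS provider and the container decides how many MDB instances consume from it, so treat this only as a picture of the message flow.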
Now, if you’ve worked with JMS before, or at least with sockets, connections and sessions, you’ll know for sure that reusing such resources is mandatory. Provisioning a new physical connection, session or acceptor on every client request will unleash hell on you soon enough: not only is it heavy on resources but, worse, it will cripple your application’s throughput. You won’t be getting many pats on the back, that’s for sure ;-)
And I faced exactly these issues very recently, only because I failed to understand how things work in a container-managed environment. Let me explain...
JMS objects were designed to be re-used, right? Right. Now, what about an application running on an application server? Imagine, for instance, that you have an EJB acting as a service that sends messages.
I thought “hell, yeah! EJBs allow me to do some start-up and tear-down operations (@PostConstruct and @PreDestroy) so that’s where I’m going to manage the JMS objects that I wish to reuse”. Did you think this too? Well, you’re wrong in a way. You don’t need this! The container does it for you... but unfortunately that’s not obvious to a developer at first. Actually, holding on to those objects yourself may cause issues with session transaction management as well. And that’s why I had to look further for an explanation! Here’s what I got.
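To picture what "the container does it for you" can mean, here is a small plain-JDK sketch of the idea as I understand it: the factory hands out cheap handles, and closing a handle returns the expensive physical connection to a pool instead of destroying it. This is not a real JMS provider or any container's actual implementation; `PooledFactorySketch`, `Handle`, `PhysicalConnection` and `physicalCreated()` are all hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Plain-JDK illustration of "handle over a pool"; not a real JMS provider.
class PooledFactorySketch {

    // Hypothetical expensive resource (think: a physical broker connection).
    static final class PhysicalConnection {
        final int id;

        PhysicalConnection(int id) {
            this.id = id;
        }
    }

    // Cheap handle given to application code. close() gives the physical
    // connection back to the pool instead of tearing it down.
    final class Handle implements AutoCloseable {
        private PhysicalConnection physical;

        Handle(PhysicalConnection physical) {
            this.physical = physical;
        }

        String send(String text) {
            if (physical == null) {
                throw new IllegalStateException("handle already closed");
            }
            return "sent '" + text + "' via physical#" + physical.id;
        }

        @Override
        public void close() {
            if (physical != null) {
                release(physical);
                physical = null;
            }
        }
    }

    private final Deque<PhysicalConnection> idle = new ArrayDeque<>();
    private int physicalCreated = 0;

    // Looks like "create" to the caller, but reuses an idle physical
    // connection whenever one is available.
    synchronized Handle createConnection() {
        PhysicalConnection p = idle.pollFirst();
        if (p == null) {
            p = new PhysicalConnection(++physicalCreated);
        }
        return new Handle(p);
    }

    private synchronized void release(PhysicalConnection p) {
        idle.addFirst(p);
    }

    synchronized int physicalCreated() {
        return physicalCreated;
    }

    public static void main(String[] args) {
        PooledFactorySketch factory = new PooledFactorySketch();
        // Create-use-close on every request, as you would inside an EJB method.
        for (int i = 0; i < 1000; i++) {
            try (Handle h = factory.createConnection()) {
                h.send("job-" + i);
            }
        }
        // prints "physical connections created: 1"
        System.out.println("physical connections created: " + factory.physicalCreated());
    }
}
```

Under that assumption, creating and closing the JMS objects inside each business method is cheap, and there is nothing left for @PostConstruct and @PreDestroy to manage.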
Hope this will help others.