Where We Are Now

 

There have been increasing reports of connectivity problems on the site, including DJs being disconnected, users being kicked out of rooms, and chat messages being delayed or dropped. We are well aware of these issues and are doing everything we can to fix them as quickly as possible and get Turntable back to normal.

We initially thought that the problem was caused either by the new server we added in anticipation of the iPhone client launch or by the change in traffic characteristics that resulted from the iPhone clients. We use software that allows us to serve thousands of clients from one machine, but that sort of software tends to have problems when any of those clients are slow or block. After digging into it further, though, we realized that this isn’t the problem we’re experiencing.
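For readers curious why a slow or blocking client can hurt that kind of server, here is a minimal sketch. It assumes a single-threaded event loop and uses Python's asyncio purely as a stand-in, since this post does not name the software we run. The `handle` and `serve` functions and the client names are hypothetical.

```python
import asyncio
import time
from typing import List


async def handle(name: str, delay: float, blocking: bool, done: List[str]) -> None:
    """Hypothetical per-client handler that records when it finishes."""
    if blocking:
        # A blocking call holds the whole event loop; no other client runs.
        time.sleep(delay)
    else:
        # A cooperative wait hands control back to the loop.
        await asyncio.sleep(delay)
    done.append(name)


async def serve(blocking_slow_client: bool) -> List[str]:
    """Serve one slow client and two fast ones; return the completion order."""
    done: List[str] = []
    await asyncio.gather(
        handle("slow", 0.05, blocking_slow_client, done),
        handle("fast1", 0.0, False, done),
        handle("fast2", 0.0, False, done),
    )
    return done


blocked_order = asyncio.run(serve(blocking_slow_client=True))
cooperative_order = asyncio.run(serve(blocking_slow_client=False))
print(blocked_order)      # the slow client finishes first; the fast ones had to wait
print(cooperative_order)  # the fast clients finish first
```

When the slow client blocks, every other client on the same loop waits behind it, which is the general failure mode described above. As noted, this turned out not to be the cause of the current problems.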

We believe that the problems we have been seeing are all manifestations of the same issue but, unfortunately, it has been harder to track down than we anticipated. Since ruling out the other suspected causes, we’ve been poring over bug reports and server logs, dumping TCP traffic, debugging code on both the client side and the backend, and working with the authors of third-party libraries. We think that we are closing in on the problem, gradually shrinking the size of the haystack.

For the technically inclined, we are seeing flakiness in our websocket connection from the browser to our servers. Sometimes this results in missing “heartbeat” messages, which makes the server think that you’re disconnected, causing it to remove you from a DJ spot or from the room. Sometimes the affected messages are chat messages, which arrive late or never come through at all, making your chat experience flaky.
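To make the heartbeat failure mode concrete, here is a minimal sketch of how a server might decide that a client is gone. It is an illustration under assumptions, not our actual code: the `HeartbeatMonitor` class, the 30-second timeout, and the client names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time per client (hypothetical helper)."""

    timeout_seconds: float  # assumed value; the post does not state one
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, client_id: str, now: float) -> None:
        """Note that a heartbeat from client_id arrived at time `now`."""
        self.last_seen[client_id] = now

    def expire(self, now: float) -> List[str]:
        """Remove and return clients whose last heartbeat is older than the timeout."""
        stale = sorted(
            client_id
            for client_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
        for client_id in stale:
            del self.last_seen[client_id]
        return stale


monitor = HeartbeatMonitor(timeout_seconds=30.0)
monitor.record("dj_a", now=0.0)
monitor.record("dj_b", now=0.0)
# Both clients send a heartbeat at t=25, but dj_b's is lost in transit.
monitor.record("dj_a", now=25.0)
dropped = monitor.expire(now=40.0)
print(dropped)  # dj_b is treated as disconnected even though it is still online
```

The point of the sketch is that the server cannot tell a lost heartbeat from a departed user, so a flaky connection looks the same as someone leaving the room.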

In addition to this, we had a bug in our code that was causing communication problems between our servers, resulting in problems with things like search and upload. We initially thought that this problem was related to the bigger socket bug, but we recently discovered that it was a separate issue, found the source of the problem, and rolled out a code fix that should make those services much more stable.

We have finally managed to replicate the main problem in our development and testing environments, which is a major step toward being able to fix it. All available engineers are working to debug this code, trying every technique we know of to track down a bug, from capturing and analyzing traffic to inserting as many debugging statements as possible in the code to pinpoint where the error occurs.

Thank you for staying with us while we bring this problem under control.

atomly, Sr. Software Engineer
