net-at-hand™
all about what we are doing to make Net-at-hand a great web publishing system.
Post-mortem post
All of the Net-at-hand websites were down for about two hours today (from 10:20 am CDT until around 12:25 pm CDT). It began when I started receiving notifications that the Net-at-hand application was failing to connect to the database. After restarting the server and confirming that the problem was with the database, I decided to stop the system temporarily and rebuild the database.
In the three years that Net-at-hand has been running on this server I have never rebuilt the database, so it makes sense that some of the data files might have become corrupted. I made a fresh backup of the database and rebuilt the data tables from that backup. I also took the opportunity to remove some old data to help keep the database slim. No data was lost during this outage.
While I was working on this, I fired up the new database server that Net-at-hand is going to be migrating to. This server is dedicated to the database alone and will greatly help the performance of the system as Net-at-hand continues to grow. I had the new server at the ready in case rebuilding the database did not fix the connection problems, but the rebuild did fix them, so the move to the new server is back on its normal schedule.
The timetable for migrating the Net-at-hand system to the new infrastructure is progressing on schedule, and I plan to have it all in place within the next couple of months.
I am sorry for the downtime (especially in the middle of the work day). Please accept my sincere apology.
—Mark (Net-at-hand founder)
One good thing that did come out of this is that I was able to test (under extreme pressure) the process I’ve been planning for migrating Net-at-hand to the new system. With what I learned from this situation, the downtime for the migration should be just a few minutes (a couple of application restarts) instead of the couple of hours that I was planning.