Archive for the ‘Useful Stuff’ Category

What Does Robots.txt Do?

Wednesday, December 28th, 2011

The robots.txt file contains instructions that tell search engine robots what to do with a particular website. While search engine robots follow the instructions in that file, spam bots simply ignore it in most cases.

A web robot is a program that checks the content of a web page. Before a robot crawls a website, it first checks the robots.txt file for instructions. The “Disallow” command, for example, tells the robot not to visit a given set of pages on the site. Web administrators use this file to keep bots from indexing parts of a website for different reasons: they do not want the content to be accessible to other users, the website is under construction, or a certain part of the content must be hidden from the public.

While search engines such as Google use robots to index web content and can easily be restricted and instructed through robots.txt, spammers use spambots to harvest e-mail addresses, for example, and do not follow the instructions in the robots.txt file. They look for and follow keywords that might be related to an e-mail address, such as “post”, “message”, “journal” and so on. What sets a spambot apart is that it comes from many IP addresses and poses as different user agents, so it can hardly be blocked. Some spambots even use search engines such as Google to look for particular information on a web page.

Fortunately, there are still things that can be done to prevent spambots from scanning your website and stealing information. Neil Gunton came up with a Spambot Trap which blocks spambots and allows the good search engine spiders to visit your website.

Still, if you would like to tell the regular search engine bots which pages to index and which to skip, be careful not to block the search engines completely. If you put in the wrong commands, your website will have no chance of showing up anywhere in search results. If you don’t have a robots.txt file at all, the web robot is free to index everything on your website.

Here’s a list of some ready-to-use basic commands for the robots.txt file:

  • Exclude a file from a certain search engine:
User-agent: Googlebot
Disallow: /private/privatefile.htm
  • Exclude a section/page of your site from all web robots:
User-agent: *
Disallow: /newsection/
  • Stop all bots from indexing any part of your website:
User-agent: *
Disallow: /
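
One way to check rules like these before uploading them is to feed them to a parser and ask which URLs a given bot may fetch. Below is a minimal sketch using Python's standard urllib.robotparser module. The example.com URLs and the bot name "SomeOtherBot" are made up for illustration, and real crawlers may treat edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Two of the rule sets from the list above, combined into one file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/privatefile.htm

User-agent: *
Disallow: /newsection/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs on a placeholder domain.
checks = [
    ("Googlebot", "http://example.com/private/privatefile.htm"),
    ("SomeOtherBot", "http://example.com/newsection/page.html"),
    ("SomeOtherBot", "http://example.com/index.html"),
]
for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:12} {url} -> {verdict}")
```

Note that in this parser a bot matching a named group follows only that group, so the "Googlebot" rules above do not inherit the "/newsection/" restriction from the "*" group.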

If you wish to add more complicated instructions, you can follow Thomas Brunt’s instructions.

If you go through your server logs and see a suspicious host, you can run it through our Blacklist Checker. It will tell you if the domain or IP has been blacklisted. If this is the case, you can simply block this host from coming back.

The Importance of the WordPress Expires Header

Thursday, December 15th, 2011

The importance of the “expires header” is growing as web page designs become richer in scripts, images, Flash, etc.

As web designs grow more complex, a web page takes longer to load, which is why the site needs an expires header. It makes components such as stylesheets, images and scripts cacheable; in other words, it prevents unnecessary HTTP requests after the first page view and so reduces load time.

The “expires header” must carry a date, and it’s important that this date is in the future. A far future Expires header tells the browser how long to keep a web page component cached. If a past date is set, caching simply does not occur. Note that expires headers do not affect the load time of the website the first time the user opens it.

Here’s how to add a far future expires header in WordPress:

If the server is Apache, you can use the “ExpiresDefault” directive, which is part of mod_expires. For example, [ExpiresDefault "access plus 2 months"] means that the file expires two months after it is accessed. The time period can range from seconds to years.

In order to add the header, however, you need to add the following code to the .htaccess file:

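# Expires header (requires mod_expires)
# ExpiresActive must be On, otherwise ExpiresDefault has no effect
ExpiresActive On
<FilesMatch "\.(ico|jpg|jpeg|png|gif|js|css|swf)$">
ExpiresDefault "access plus 2 hours"
</FilesMatch>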

or

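# Expires header by MIME type (A2592000 = 30 days after access)
ExpiresActive On
# Everything not listed below expires immediately
ExpiresDefault A0
ExpiresByType image/gif A2592000
ExpiresByType image/png A2592000
# .jpg and .jpeg files are both served as image/jpeg
ExpiresByType image/jpeg A2592000
ExpiresByType image/x-icon A2592000
ExpiresByType text/css A2592000
ExpiresByType text/javascript A2592000
ExpiresByType application/javascript A2592000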
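
To make the "A" notation less cryptic, here is a small sketch of the arithmetic behind it: the Expires date is simply the access time plus the given number of seconds. It uses only Python's standard library. The access time is an illustrative assumption, and the function name is made up for this example; it is not how Apache itself is implemented.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_value(access_time: datetime, seconds: int) -> str:
    """Return 'access time plus N seconds' formatted as an HTTP date."""
    return format_datetime(access_time + timedelta(seconds=seconds), usegmt=True)


# A2592000 means 2,592,000 seconds after access: 30 days * 24 h * 3600 s.
THIRTY_DAYS = 30 * 24 * 3600

# Illustrative access time (an assumption, not taken from a real log).
accessed = datetime(2011, 12, 15, 12, 0, 0, tzinfo=timezone.utc)

print(expires_value(accessed, THIRTY_DAYS))  # Sat, 14 Jan 2012 12:00:00 GMT
print(expires_value(accessed, 0))            # A0: expires at the moment of access
```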

It’s important to remember that with the “expires header” the files are “saved” in the browser until the expiration date. Thus, you should use the header only on images, Flash and other files that will not change before the expiry date. If you are, for instance, changing the pictures on the home page on a regular basis, it is not a good idea to set an expires header on them. The header will cache them for the period you have selected, and there is no point in caching something that is going to change sooner than that.

Here’s an example:

In the results above we can see that the date is set in the past, which means that the search engines, proxy servers and browsers will always consider the page out of date and try to fetch a fresh copy. This can lead to unnecessary server load. To avoid this problem simply stick to the rule mentioned above and always set a future date.

To check the expiration date of any web page, you can use our free HTTP headers test. It will return the HTTP header (the initial response of a web page, invisible to the end user) where you can find the expiration date.
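
Once you have the header value in hand, the "is this date in the past?" check can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name and the sample values are made up, and treating unparseable values such as "0" as already expired is a choice made for this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_already_expired(expires_header: str, now: datetime) -> bool:
    """True when the Expires value is in the past or cannot be parsed."""
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # Values such as "0" or "-1" are not dates; treat them as expired.
        return True
    if expires.tzinfo is None:
        # No usable zone information: assume UTC for the comparison.
        expires = expires.replace(tzinfo=timezone.utc)
    return expires <= now


# Illustrative "current" time and sample header values.
now = datetime(2011, 12, 15, 12, 0, 0, tzinfo=timezone.utc)
for value in ("Thu, 01 Jan 1970 00:00:00 GMT", "Sat, 14 Jan 2012 12:00:00 GMT", "0"):
    state = "already expired" if is_already_expired(value, now) else "cacheable"
    print(f"Expires: {value!r} -> {state}")
```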

Forgot to Pay for Your Domain?

Tuesday, November 29th, 2011

Many people who run a business website overlook the importance of the domain name and its expiration date, and they only hear about it when the website is already down. Well, it is crucial to avoid domain name expiration because it could harm both your business and reputation significantly.

If you forget to renew your domain, you will stop receiving e-mails at your domain; the website itself will simply be gone and replaced with ads. The best case scenario is that you lose the site for a while but are still able to register the domain again.

The most unpleasant part, however, is that someone else can take your domain and register it as their own. Imagine how many regular visitors you will lose if you have to register under a different name because someone else took yours. If your website administrator doesn’t know the expiration date, they will have to run a lot of checks in order to identify the problem. At first it is not always obvious what happened to the site. Thus, it’s going to take a while until the problem is identified and a lot more time to fix it.

So, again, the domain name is probably the most important part of your online presence. It is part of your brand. Protecting your domain name is also one of the easiest things to get right online.

Here’s a short checklist of actions you can take to protect your online presence:

  • Set up domain name auto-renewal in your billing system for the expiration day. GoDaddy and a lot of other domain registrars offer this service.
  • Register the domain name for years ahead. Dot Com domains can be registered for up to 10 years! This way you will not have to worry about insufficient funds on the day of your auto-renewal, because if the payment doesn’t go through, your domain will expire. The risk here is forgetting how long you have prepaid the domain for and when to renew it again.
  • Always keep the payment options in your account up to date. If your card expires, your domain won’t be able to auto-renew.
  • Ensure that you have an up-to-date e-mail address set up for your domain renewal, because you usually receive notifications when the domain is close to expiration. Make sure you have access to that mailbox.
  • If you hire a 3rd party to develop your site, make sure they register the domain name in your company’s name and put your contact details in the WHOIS records.
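
If you keep the expiration date on file, a reminder check is plain date arithmetic. The following is a minimal sketch only: the function names, the 30-day warning window and the sample dates are all assumptions made for illustration, not a registrar API.

```python
from datetime import date


def days_until_expiry(expires_on: date, today: date) -> int:
    """Number of days left before the domain expires (negative if past)."""
    return (expires_on - today).days


def renewal_status(expires_on: date, today: date, warn_days: int = 30) -> str:
    """Classify a domain as 'expired', 'renew now' or 'ok'."""
    remaining = days_until_expiry(expires_on, today)
    if remaining < 0:
        return "expired"
    if remaining <= warn_days:
        return "renew now"
    return "ok"


# Hypothetical expiration date, checked on three different days.
expires_on = date(2012, 1, 15)
for today in (date(2011, 11, 29), date(2012, 1, 2), date(2012, 1, 20)):
    print(today, days_until_expiry(expires_on, today), renewal_status(expires_on, today))
```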

If you forget to pay for your domain and your website goes down, that does not necessarily mean that your business is gone. There is a grace period for each domain. You can still get it back if you are quick. One way to find out quickly that your site is down is to use our most basic $5.00/month ping service. If your domain goes down, you will know it within a few seconds.

And last, save your online business by simply watching out for your domain name, because neither your services and products nor your web design will matter if your site is just not there.

Protect Your Online Presence

Friday, September 30th, 2011

Google have recently come up with a new feature called “Authorship markup” which, they say, will connect the author to the particular content in order to give it more credibility.

The Authorship markup encourages quality content by helping authors rank better in the search results, according to Sagar Kamdar, Google Product Manager. For this purpose, the markup connects the web content to the Google Profile of its author and then back to the particular web page. This way, when the content shows up in the search results, the author is identified, and the reader even sees a photo of the author displayed alongside when an image is available. Content then looks more trustworthy and credible, and the website content is better protected.

Google say Authorship markup is quite a new project, and is yet to be improved and simplified. Still, they claim to have made this feature “as easy to implement as possible”. Their first users of this markup have been The New York Times, The Washington Post, CNET and more. Google also claim to have gone even a step further by adding this Authorship markup to everything hosted by YouTube and Blogger. In the future, however, these two platforms will include this feature automatically.

While Google created a feature to protect your website content, WebSitePulse perfected its monitoring service to help you keep an eye on any type of server and network device connected to the Internet, and measure the performance and availability of your websites and applications. Give it a try!

SSD vs. HDD for Business

Wednesday, July 6th, 2011

Is SSD the solution for the ever widening gap between current hard drive technology and CPU advancements? Is this the next step in data storage and are SSD drives here to stay? How reliable are they? Are they actually worth it?

This is just a handful of questions from a huge, huge batch. SSDs are still pretty expensive for everyday use. They are still the domain of computer enthusiasts and early adopters. Leaving the money question aside, let’s check whether they are a good solution for business workstations and high-end server hardware.

While SSD drives might not be the best choice for personal computing, they might be great for server hardware. In an average laptop, you might be better off with a 5400rpm drive. Most users don’t see any battery life improvement. In fact, a 5400rpm drive can drain the battery even less than an SSD drive. Unlike the SSD drive, an HDD can spin down and reduce the battery drain to a minimum. SSD drives might use more power when running idle, but that is only an issue for personal computers and laptops. Server hard drives rarely stay idle.

SSDs are in fact great for database servers. Most requests are extremely small in size and are often random in nature. With SSD drives there is no mechanical latency to limit the performance. They offer great speed improvements even over 15000rpm drives. The lack of moving parts reduces the heat coming from the drive, thus requiring less power to cool down a server rack.

Many data centers cut costs on cooling. This way they stress their hardware more and need to replace it more often, but it pays off when you look at the power bill. You can have the best of both worlds with SSDs. They generate almost no heat at all. Claims of a 50% lower electricity bill might not be too farfetched when you consider the reduced power required to cool down the server racks.

The life expectancy of an SSD drive is said to extend to 50 years, which is pretty hard to believe and most likely not applicable to servers. Unfortunately, SSDs as we know them have been around only for a couple of years, so no one really knows. The life cycle is limited by the number of write cycles. This is why there are a lot of server applications for SSDs where information is only read from them.

Let’s not forget they do the job faster. This means that they complete tasks faster than traditional storage devices. Ideally, this could reduce the number of disks required in an installation. This is highly unlikely before larger SSDs become available, but it is one of those features that will make a difference once the technology improves.

You get higher performance, high reliability, power savings, a more than reasonable lifespan, and a hefty price tag. Depending on the scale of implementation, the last one might not hold either, considering the lower power bill.

If you plan to upgrade your installation, it might be wise to wait for a while. Prices are said to go down by 50% by the end of the year. Early adopters who have chosen to use SSDs in their web and database servers rarely complain, and speed is never the issue.

Raspberry Pi

Thursday, June 30th, 2011

I thought that after the SheevaPlug some time would pass before we saw even smaller computers intended for light IT tasks. This time the project is even smaller. It is called the Raspberry Pi. In its current state it looks like this. You might think the product has a long way to go before it hits the market, but you would be wrong. One of the ideas of the creator is to bring computing closer to students and inspire them to learn more about the hardware and how it works. That is why it will most likely look like this when the first units head out to public schools.

At the moment, it is marketed as the 25 USD PC. It has a 700MHz ARM CPU, 128MB of RAM, an HDMI port and a USB port. You can attach a keyboard, a mouse or any other USB device. You can connect a hub. You can see that there is also an image sensor in the middle. It is a 12MP CMOS sensor. You can also insert a memory card in order to have some place to install the OS.

Despite its size, this little gizmo is fully capable of playing HD video. For this feature I love and hate the device at the same time. I love the fact that such a small contraption can become part of my home cinema installation, and that is exactly the reason for me to hate it.

This ultra small PC is intended for educational purposes. It is for hobbyists who tinker with it to power a weekend project or to run some kind of server. It is mostly for people willing to build on top of it. I would hate to see it treated as a cheap HD player.

I can’t wait to get my hands on it. It is cheap enough for anyone to go ahead and play around with, without worrying too much about not being great with the soldering iron. The CMOS sensor cam can probably have a wide variety of applications – light detector, face recognition, webcam. By being that small, the Raspberry Pi enables enthusiasts to try out many new projects.

I will definitely try to run something light, such as a telnet server, or a webcam, with the cam being the server. The latter’s retail price is about $100+. Here is a thought – if the device keeps the promised price, it will cost less than my personal hosting plan. I can move everything there and just pay for the IP address I’m already using.

How would you use or mod this device?

HTTP Archive

Friday, June 10th, 2011

I stumbled upon httparchive.org last month and I think anyone concerned with the performance of their website should take a look at it. It will give you a good idea of the current status of the web. There are some pretty interesting figures on it. The best way to describe the site is by quoting the title on their homepage: “The HTTP Archive tracks how the Web is built.” As simple as that.

The averages on that site are calculated by using raw data from all the sites listed in Alexa 500, Alexa US 500, Alexa 10,000, Fortune 500, Global 500 and Quantcast10K. To get accurate data, each site is loaded 9 times. Then the data is parsed and fed into the database. What you get are some pretty interesting figures.

Httparchive.org provides intimate data for the most popular websites on the web. Not that the data is unavailable to anyone with the right toolset. It is just presented pretty well. I personally enjoy the filmstrip tool, which shows you how a site loads and what is visible through the different stages.

I am not too surprised to see that Steve Souders is the person behind the website. For the few of you who haven’t heard of him, he is the guy who came up with YSlow and, yes, he works at Google.

Just to get a taste of the information on the site, take a look at the interesting stats. If you think you are using too much CSS or JS, look again. You might be surprised. The fact that 56% of these sites don’t have cache control still keeps me awake at night.

There are usually two ways to learn a good lesson:

  • Through your own mistakes
  • Through other people’s mistakes
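
A figure like that cache-control percentage boils down to counting responses that carry no caching headers. Here is a rough sketch of such a count. How the HTTP Archive actually defines and computes its figure is not stated here, so the rule used below (neither Cache-Control nor Expires present) and the sample data are assumptions for illustration only.

```python
from typing import Dict, List


def lacks_cache_headers(headers: Dict[str, str]) -> bool:
    """True when a response has neither Cache-Control nor Expires."""
    names = {name.lower() for name in headers}
    return "cache-control" not in names and "expires" not in names


def share_without_caching(responses: List[Dict[str, str]]) -> float:
    """Percentage of responses that carry no caching headers."""
    if not responses:
        return 0.0
    missing = sum(1 for headers in responses if lacks_cache_headers(headers))
    return 100.0 * missing / len(responses)


# Made-up sample of response headers, one dict per response.
sample = [
    {"Content-Type": "text/css", "Cache-Control": "max-age=2592000"},
    {"Content-Type": "image/png"},
    {"Content-Type": "text/html", "Expires": "Sat, 14 Jan 2012 12:00:00 GMT"},
    {"Content-Type": "application/javascript"},
]
print(f"{share_without_caching(sample):.0f}% of sample responses have no caching headers")
```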

Needless to say, the latter sounds better. On httparchive.org you can see what some of the top players did right, and where they went wrong. It is a great point of reference for web designers, web developers, system administrators and even business owners.

Browser emulation and a good part of the information there are sourced from webpagetest.org. You should definitely go and check that site too. You can see how your website looks in different browsers and under different circumstances.

The Plugins That Work – Dropbox CDN and wp Time Machine

Friday, May 20th, 2011

There are two great WordPress plugins that everyone should know about. The first one is called Dropbox CDN. “Dropbox” as in “Dropbox – the personal file backup service, helping me work on my files from my PC and Mobile Phone”.

Dropbox CDN (Content Delivery Network) enables you to host your WordPress theme CSS, JS and images on Dropbox. This is huge! Setting up additional hosting space involves extra costs, time to manage, backups. A lot of hassle! With this plugin you enjoy all this for free. The reason why this is so great is that common bloggers can now have their blog load faster and be viewed by more people at once. Using a CDN is listed as a great practice by YSlow, no matter the scale of your online presence. It is worth getting a Dropbox account only for the sake of using this plugin. If you are not too happy with your hosting provider, or you are using a free service, this is a good moment to try out the plugin to increase your website’s performance.

Personal blogs rarely exceed 10GB of traffic per day, which is the limitation you get with Dropbox. I would say this is more than enough for any blog.

One thing still bugs me. What happened to Box.net? They had a great start, but came a few years before the mobile phone app hype. Maybe it is that, maybe it’s not. The fact is people out there are constantly coming up with new stuff and this little plugin really made my day.

The second plugin I would recommend is WP Time Machine. This one I heard about from a friend of mine, who is currently writing reconsideration letters to Google after his WordPress got hacked. It took him 3 hours to recover. That is what happens when you don’t update and monitor your website. Guess who signed up for the Free Trial earlier today :) . Back to the plugin. The reason I mention it here is that it is another plugin employing the services of Dropbox. This one lets you choose where to back up. You can use Dropbox, Amazon’s AWS S3 or a remote FTP of your liking.

Both plugins are compatible with the latest version of WordPress. If you would like to be extra cautious with your site and not pay a single cent, try out our Free Service for Life and tell us what you think about it.

HTTP Fingerprinting with HttPrint

Wednesday, March 30th, 2011

HttPrint is a web server fingerprinting tool by Net Square. It gathers details about a web server and draws a pretty decent conclusion about which web server software is in use. Identification is based on the implementation differences in the HTTP protocol.

In a previous post I discussed server masking as a way to protect yourself against crackers and potential threats. This tool kind of goes the other way around. There are two ways to look at it: one – as a weapon for crackers; two – as a way for you to make sure that you masked your server properly and that nothing gives you away.

HttPrint goes beyond the banner string of a web server and looks at other characteristics before it jumps to conclusions. The tool looks at the HTTP protocol’s behavior and the way it is implemented by the server. This includes HTTP header field ordering, forbidden operation response, improper HTTP version response and improper protocol response.

Another giveaway is the default usage of ETags on some servers. I don’t think the tool is actually using this as a signal at the moment, but it is a good giveaway. Lighttpd usually has this enabled by default; as for Apache, you need to turn it on manually.

One of the other great features of the tool is the HTML reports. When will that come in useful? Well, when you are running the console version (probably automated) and you would like to see eye-friendly reports in your browser. You will also get a confidence percentage for each result. The tool uses multiple signals, so results might or might not be true, but they will surely be at least an educated guess.
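
To give a feel for one of those signals, header field ordering, here is a minimal sketch that extracts the order of header names from a raw HTTP response and compares two servers. It is not HttPrint's actual algorithm. The two raw responses and their server names are invented for the example; a real fingerprint would combine many more signals.

```python
from typing import List


def header_order(raw_response: str) -> List[str]:
    """Return header field names in the order the server sent them."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")[1:]  # skip the status line
    return [line.split(":", 1)[0].strip() for line in lines if ":" in line]


# Two hypothetical responses that differ only in header ordering.
RESPONSE_A = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Wed, 30 Mar 2011 10:00:00 GMT\r\n"
    "Server: ExampleServerA\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)
RESPONSE_B = (
    "HTTP/1.1 200 OK\r\n"
    "Server: ExampleServerB\r\n"
    "Content-Type: text/html\r\n"
    "Date: Wed, 30 Mar 2011 10:00:00 GMT\r\n"
    "\r\n"
    "<html></html>"
)

order_a = header_order(RESPONSE_A)
order_b = header_order(RESPONSE_B)
print(order_a)
print(order_b)
print("same ordering" if order_a == order_b else "different ordering")
```

Because the ordering comes from how the server software builds its responses, it stays the same even when the Server banner string is changed or removed.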

According to the developers of mod_security, they can successfully fool HttPrint, which to me sounds like a relief. After all, this is mostly a defensive security measure.

If you need your server protected and accurately monitored for any suspicious behavior, drop us a line and we will come up with a great plan, depending on your server monitoring needs.

Custom Error Pages

Tuesday, March 22nd, 2011

As you most probably know, each year we have several seasonal report periods during which we monitor the leaders in the retailing industry to see if their online performance matches their reputation. This year is no different and last month we published the results for this Valentine’s day online retailer monitoring.

One of the interesting cases that caught our attention is the site of Victoria’s Secret. Their uptime fell short of 100%, but the downtime recorded for their transaction occurred regularly (every 2 days), always around 5 a.m., and usually lasted about 15 minutes. Since 5 a.m. is clearly not the busiest shopping time of the day, it was most likely regular, scheduled site maintenance. However, this was not indicated either in their error message or elsewhere on their site. Instead, next to the picture of the stunningly beautiful Alessandra Ambrosio stood the awkward downtime excuse “We’re sorry, our site is temporarily unavailable.”

This instance got me thinking about the importance of customizing the error pages which your customers will inevitably run into sooner or later, be it due to site maintenance or a navigation error. In the example above, if the company had simply changed the message and informed their customers that this was planned maintenance, their uptime percentage would have reached 100%.

Customized error pages help you retain your visitors and even attract new ones who have landed on your page by sheer chance or a typing mistake. Most visitors leave the site when they get to an error page, and only a handful will try a different URL. That is why the custom page should provide visitors with:
  • a link to the correct page that they might be looking for,
  • a search box that will help them find the page they need,
  • a sincere and/or fun explanation or image.

The customized error page is a great way to reassure your visitors that they have come to the right place. Furthermore, it gives your future prospects one more reason to remember your website and return to it and even recommend it later.

Aside from helping you save face, a customized error page can help you monitor your visitors’ behavior and see what kind of information people are looking for on your website. All you need to do is set the error page to report the broken link to the webmaster. This way you will be able to fix and upgrade your website accordingly and in a timely manner. Some sites also set their custom 404 pages to return a 200 OK response in the header, so that search engines do not treat them as error pages and they get indexed and appear in the SERPs. Keep in mind that this is not good practice: the 404 status should still be indicated in the header in such cases.
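
Serving a friendly page and sending the correct status are not mutually exclusive. The sketch below is a tiny WSGI application in Python that returns a custom HTML body while keeping the 404 status in the header. The page content, the PAGES table and the function names are all made up for illustration; your own site will generate its pages differently.

```python
import html

# Hypothetical site content: known paths and their page bodies.
PAGES = {"/": b"<h1>Home</h1>"}


def not_found_page(path: str) -> bytes:
    """Build a friendly error page; the requested path is escaped for safety."""
    body = (
        "<h1>Oops, nothing here</h1>"
        f"<p>We could not find <code>{html.escape(path)}</code>.</p>"
        '<p>Try the <a href="/">home page</a> or use the site search.</p>'
    )
    return body.encode("utf-8")


def app(environ, start_response):
    """WSGI app: custom body for missing pages, but still a 404 status."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        status, body = "404 Not Found", not_found_page(path)
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the app can be passed to make_server from Python's wsgiref.simple_server module and requested with any browser or HTTP header checker.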

Below you can see 15 original and fun examples of custom error pages. If you would like to see even more, we recommend clicking here and here.

 
