Importance of security for website owners

As technology evolves, hackers evolve along with it. More and more phishing emails land in our mailboxes, trying to obtain our financial information. To organize these kinds of attacks, cyber criminals also use artificial intelligence, machine learning and big data.

Cyber criminals often use malware to do as much damage as possible to computers or websites. Malware has already been detected in plugins used in the backend of websites. We also see more and more ransomware attacks, in which every device in a company is encrypted and a high ransom must be paid before anything can be decrypted.

Therefore, as a business owner you must take several measures to make it hard for a cyber criminal to reach your site.

At Bestwordpress Design we have extended our website offering with several measures, which are explained below.

Knowledge is control

Firewall as a gatekeeper

WordPress websites are often hosted with a hosting company. These companies compete on attractive pricing to get your website, email and domain name onto their platform. Somewhere at the bottom of their website they mention extra security features, but it remains your responsibility to take those measures.

As a web design company, we always advise routing your website traffic through an intermediary server that scans it before it reaches the hosting company. This intermediary server has a firewall that inspects the traffic and blocks it when threats are detected.

This firewall can be configured with several rules to protect your WordPress website. Here are some examples of firewall rules; a simple sketch of this kind of filtering logic follows the list:

  • Geolocation: you can block IP addresses from countries that cyber criminals are known to operate from. Or, if you are a local business, you can allow traffic only from IP addresses in your own country. This also heavily reduces the chance of spam messages on your website.
  • Blocking bad bots: bots are software programs that scan the internet for new web pages. A cyber criminal can use bots to try to log into your WordPress website. With a firewall rule, you could block all bots except the Google and Facebook bots; those two mainly serve your SEO and social media links.
  • Preventing DDoS attacks: a DDoS (Distributed Denial of Service) attack attempts to make an online service or website unavailable by flooding it with unwanted traffic from multiple computers.
  • Preventing attackers from injecting malicious scripts into your web pages, so that they cannot access important data such as your customers' credit card information.
  • Preventing data breaches.
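To make the geolocation and bad-bot rules concrete, here is a minimal sketch in Python of the kind of decision such a firewall makes for each incoming request. This is not how the firewall on the intermediary server is actually implemented; the country lookup, the country codes and the allowed bot names are placeholder assumptions, and a real firewall applies these rules at the network edge with a GeoIP database.

    # Minimal sketch of firewall-style filtering: block requests from unwanted
    # countries and from bots that are not on an allow list.
    ALLOWED_COUNTRIES = {"BE", "NL", "FR"}  # example: a local business
    ALLOWED_BOTS = ("Googlebot", "facebookexternalhit")  # SEO and social media previews

    def lookup_country(ip_address: str) -> str:
        """Placeholder for a GeoIP lookup; real firewalls use a GeoIP database."""
        raise NotImplementedError

    def allow_request(ip_address: str, user_agent: str) -> bool:
        # Geolocation rule: only accept traffic from allowed countries.
        if lookup_country(ip_address) not in ALLOWED_COUNTRIES:
            return False
        # Bad-bot rule: if the client looks like a bot, it must be on the allow list.
        looks_like_bot = "bot" in user_agent.lower() or "crawl" in user_agent.lower()
        if looks_like_bot and not any(name in user_agent for name in ALLOWED_BOTS):
            return False
        return True

Note that user-agent strings can be spoofed, which is why real firewalls combine rules like these with IP reputation and reverse-DNS checks.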

Other Threats

Man-in-the-middle attack

In a man-in-the-middle attack, attackers place themselves between two devices (often a web browser and a web server) and intercept or modify communications between the two. The attackers can then collect information as well as impersonate either of the two parties. In addition to websites, these attacks can target email communications, DNS lookups, and public WiFi networks. Typical targets of man-in-the-middle attacks include SaaS businesses, e-commerce businesses, and users of financial apps.

You can think of a man-in-the-middle attacker like a rogue postal worker who sits in a post office and intercepts letters written between two people. This postal worker can read private messages and even edit the contents of those letters before passing them along to their intended recipients.

In a more modern example, a man-in-the-middle attacker can sit between a user and the website they want to visit, and collect their username and password. This can be done by targeting the HTTP connection between the user and the website; hijacking this connection lets an attacker act as a proxy, collecting and modifying information being sent between the user and the site. Alternatively, the attacker can steal a user’s cookies (small pieces of data created by a website and stored on a user’s computer for identification and other purposes). These stolen cookies can be used to hijack a user’s session, letting an attacker impersonate that user on the site.
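On the website side, the most effective defence against this scenario is to serve everything over HTTPS and to mark session cookies so that they never travel over plain HTTP and cannot be read by injected scripts. The snippet below is a minimal sketch that uses Flask purely for illustration; the route, cookie name and token value are hypothetical and not part of any specific setup described here.

    # Minimal sketch: hardening a session cookie against interception.
    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/login", methods=["POST"])
    def login():
        resp = make_response("Logged in")
        # Secure: the cookie is only ever sent over HTTPS, so a man-in-the-middle
        # on plain HTTP never sees it. HttpOnly: scripts cannot read it.
        # SameSite limits cross-site sending.
        resp.set_cookie("session", "opaque-session-token",
                        secure=True, httponly=True, samesite="Strict")
        # HSTS: tell browsers to refuse plain-HTTP connections to this site.
        resp.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
        return resp

    if __name__ == "__main__":
        # In production this would sit behind an HTTPS-terminating proxy.
        app.run()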

Man-in-the-middle attacks can also target DNS servers. The DNS lookup process is what allows web browsers to find websites by translating domain names into IP addresses. In DNS man-in-the-middle attacks such as DNS spoofing and DNS hijacking, an attacker can compromise the DNS lookup process and send users to the wrong sites, often sites that distribute malware and/or collect sensitive information.
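One way to reduce the risk of a spoofed DNS answer is to resolve names over an encrypted channel instead of plain DNS. The sketch below queries Cloudflare's public DNS-over-HTTPS endpoint using only the Python standard library; the hostname is just an example, and this is an illustration of the idea rather than a complete DNS client.

    # Minimal sketch: resolving a hostname over DNS-over-HTTPS, so the answer
    # cannot be silently altered in transit the way a plain DNS reply can.
    import json
    import urllib.request

    def resolve_over_https(hostname: str) -> list:
        url = "https://cloudflare-dns.com/dns-query?name={}&type=A".format(hostname)
        req = urllib.request.Request(url, headers={"Accept": "application/dns-json"})
        with urllib.request.urlopen(req) as resp:
            answer = json.load(resp)
        # Each entry in "Answer" carries the resolved record in its "data" field.
        return [record["data"] for record in answer.get("Answer", [])]

    print(resolve_over_https("example.com"))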

Prevent hackers from intercepting your connection

Malicious Bot activity

A bot is a software application that is programmed to do certain tasks. Bots are automated, which means they run according to their instructions without a human user needing to start them up. Bots often imitate or replace a human user’s behavior. Typically, they do repetitive tasks, and they can do them much faster than human users could.

Bots usually operate over a network; more than half of Internet traffic is bots scanning content, interacting with web pages, chatting with users, or looking for attack targets. Some bots are useful, such as search engine bots that index content for search or customer service bots that help users. Other bots are “bad” and are programmed to break into user accounts, scan the web for contact information for sending spam, or perform other malicious activities. If it’s connected to the Internet, a bot will have an associated IP address.


Bots can be:

  • Chatbots: Bots that simulate human conversation by responding to certain phrases with programmed responses
  • Web crawlers (Google bots): Bots that scan content on web pages all over the Internet
  • Social bots: Bots that operate on social media platforms
  • Malicious bots: Bots that scrape content, spread spam content, or carry out credential stuffing attacks


What is malicious bot activity?


Any automated actions by a bot that violate a website owner’s intentions, the site’s Terms of Service, or the site’s robots.txt rules for bot behavior can be considered malicious. Bots that attempt to carry out cybercrime, such as identity theft or account takeover, are also “bad” bots. While some of these activities are illegal, bots do not have to break any laws to be considered malicious.

In addition, excessive bot traffic can overwhelm a web server’s resources, slowing or stopping service for the legitimate human users trying to use a website or an application. Sometimes this is intentional and takes the form of a DoS or DDoS attack.

Malicious bot activity includes:

  • Credential stuffing
  • Web/content scraping
  • DoS or DDoS attacks
  • Brute force password cracking
  • Inventory hoarding
  • Spam content
  • Email address harvesting
  • Click fraud

To carry out these attacks and disguise the source of the attack traffic, bad bots may be distributed in a botnet, meaning copies of the bot are running on multiple devices, often without the knowledge of the device owners. Because each device has its own IP address, botnet traffic comes from many different IP addresses, making it more difficult to identify and block the source of the malicious bot traffic.
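A common first line of defence against this kind of automated traffic is rate limiting per IP address. The sliding-window limiter below is a minimal sketch with arbitrary example thresholds; because botnet traffic arrives from many addresses at once, it has to be combined with other signals, but it does blunt brute-force and credential-stuffing attempts from any single address.

    # Minimal sketch of per-IP rate limiting with a sliding window.
    # Thresholds are arbitrary examples; in practice this usually runs on the
    # firewall or reverse proxy rather than inside the website itself.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_REQUESTS_PER_WINDOW = 30

    _recent = defaultdict(deque)

    def is_rate_limited(ip_address: str) -> bool:
        now = time.monotonic()
        timestamps = _recent[ip_address]
        # Drop requests that have fallen out of the time window.
        while timestamps and now - timestamps[0] > WINDOW_SECONDS:
            timestamps.popleft()
        if len(timestamps) >= MAX_REQUESTS_PER_WINDOW:
            return True  # too many requests from this IP: block or challenge it
        timestamps.append(now)
        return False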

Manipulated robots.txt files


A robots.txt file is a set of instructions for bots. This file is included in the source files of most websites. Robots.txt files are mostly intended for managing the activities of good bots like web crawlers, since bad bots aren’t likely to follow the instructions.

Think of a robots.txt file as being like a “Code of Conduct” sign posted on the wall at a gym, a bar, or a community center: The sign itself has no power to enforce the listed rules, but “good” patrons will follow the rules, while “bad” ones are likely to break them and get themselves banned.

A bot is an automated computer program that interacts with websites and applications. There are good bots and bad bots, and one type of good bot is called a web crawler bot. These bots “crawl” web pages and index the content so that it can show up in search engine results. A robots.txt file helps manage the activities of these web crawlers so that they don’t overtax the web server hosting the website, or index pages that aren’t meant for public view.


How does a robots.txt file work?


A robots.txt file is just a text file with no HTML markup code (hence the .txt extension). The robots.txt file is hosted on the web server just like any other file on the website. In fact, the robots.txt file for any given website can typically be viewed by typing the full URL for the homepage and then adding /robots.txt. The file isn’t linked to anywhere else on the site, so users aren’t likely to stumble upon it, but most web crawler bots will look for this file first before crawling the rest of the site.

While a robots.txt file provides instructions for bots, it can’t actually enforce the instructions. A good bot, such as a web crawler or a news feed bot, will attempt to visit the robots.txt file first before viewing any other pages on a domain, and will follow the instructions. A bad bot will either ignore the robots.txt file or will process it in order to find the web pages that are forbidden.

A web crawler bot will follow the most specific set of instructions in the robots.txt file. If there are contradictory commands in the file, the bot will follow the more granular command.
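This is also how a well-behaved crawler checks the rules in practice. Python's standard library ships a parser for the Robots Exclusion Protocol; the short sketch below fetches a site's robots.txt and asks whether a given user agent may fetch a given page. The URLs are placeholders.

    # Minimal sketch: how a well-behaved bot checks robots.txt before crawling,
    # using Python's built-in Robots Exclusion Protocol parser.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")  # placeholder site
    rp.read()  # fetch and parse the file

    # Ask whether a specific user agent may fetch a specific page.
    print(rp.can_fetch("Googlebot", "https://www.example.com/private/page.html"))
    print(rp.can_fetch("*", "https://www.example.com/blog/"))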


What protocols are used in a robots.txt file?


In networking, a protocol is a format for providing instructions or commands. Robots.txt files use a couple of different protocols. The main protocol is called the Robots Exclusion Protocol. This is a way to tell bots which web pages and resources to avoid. Instructions formatted for this protocol are included in the robots.txt file.

The other protocol used for robots.txt files is the Sitemaps protocol. This can be considered a robot’s inclusion protocol. Sitemaps show a web crawler which pages they can crawl. This helps ensure that a crawler bot won’t miss any important pages.

What is a user agent? What does ‘User-agent: *’ mean?


Any person or program active on the Internet will have a “user agent,” or an assigned name. For human users, this includes information like the browser type and the operating system version but no personal information; it helps websites show content that’s compatible with the user’s system. For bots, the user agent (theoretically) helps website administrators know what kind of bots are crawling the site.

In a robots.txt file, website administrators are able to provide specific instructions for specific bots by writing different instructions for bot user agents. For instance, if an administrator wants a certain page to show up in Google search results but not Bing searches, they could include two sets of commands in the robots.txt file: one set preceded by “User-agent: Bingbot” and one set preceded by “User-agent: Googlebot”.

A robots.txt file can also contain the line “User-agent: *”. The asterisk represents a “wild card” user agent, and it means that the instructions that follow apply to every bot, not to any specific bot. The sketch below shows both approaches.
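The file content below is illustrative only: one group of rules per named user agent, a wildcard group for all other bots, and a Sitemap line for the Sitemaps protocol mentioned earlier. It is parsed with the same standard-library parser as above.

    # Minimal sketch: per-user-agent rules in robots.txt. Paths are illustrative.
    from urllib.robotparser import RobotFileParser

    robots_txt = """\
    Sitemap: https://www.example.com/sitemap.xml

    User-agent: Googlebot
    Disallow: /not-for-google/

    User-agent: Bingbot
    Disallow: /not-for-bing/

    User-agent: *
    Disallow: /admin/
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    print(rp.can_fetch("Googlebot", "/not-for-bing/"))  # True: only its own group applies
    print(rp.can_fetch("Bingbot", "/not-for-bing/"))    # False: blocked for Bingbot
    print(rp.can_fetch("SomeOtherBot", "/admin/"))      # False: the wildcard group applies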

Common search engine bot user agent names include:

Google:

  • Googlebot
  • Googlebot-Image (for images)
  • Googlebot-News (for news)
  • Googlebot-Video (for video)

Bing

  • Bingbot
  • MSNBot-Media (for images and video)

Baidu

  • Baiduspider

Therefore, it is important to allow certain user agents and not others. If you want your pages to be indexed and your traffic tracked for SEO purposes, you want to allow the different Google and Bing user agents listed above and block the ones you do not need.

What are good bots?

There are many kinds of good bots, each designed for different tasks. Here are some examples:

  • Search engine bots: also known as web crawlers or spiders, these bots “crawl,” or review, content on almost every website on the Internet, and then index that content so that it can show up in search engine results for relevant user searches. They are operated by search engines like Google, Bing, or Yandex.
  • Copyright bots: Bots that crawl platforms or websites looking for content that may violate copyright law. These bots can be operated by any person or company who owns copyrighted material. Copyright bots can look for duplicated text, music, images, or even videos.
  • Site monitoring bots: These bots monitor website metrics – for example, monitoring for backlinks or system outages – and can alert users of major changes or downtime. For instance, Cloudflare operates a crawler bot called Always Online that tells the Cloudflare network to serve a cached version of a webpage if the origin server is down.
  • Commercial bots: Bots operated by commercial companies that crawl the Internet for information. These bots may be operated by market research companies monitoring news reports or customer reviews, ad networks optimizing the places where they display ads, or SEO agencies that crawl clients’ websites.
  • Feed bots: These bots crawl the Internet looking for newsworthy content to add to a platform’s news feed. Content aggregator sites or social media networks may operate these bots.
  • Chatbots: Chatbots imitate human conversation by answering users with pre-programmed responses. Some chatbots are complex enough to carry on lengthy conversations.
  • Personal assistant bots, like Siri or Alexa: although these programs are much more advanced than the typical bot, they are bots nonetheless: computer programs that browse the web for data.

Good bots vs. bad bots

Web properties need to make sure they aren’t blocking these kinds of bots as they attempt to filter out malicious bot traffic. It’s especially important that search engine web crawler bots don’t get blocked, because without them a website can’t show up in search results.

Bad bots can steal data, break into user accounts, submit junk data through online forms, and perform other malicious activities. Types of bad bots include credential stuffing bots, content scraping bots, spam bots, and click fraud bots.
