Service Level Agreement

1.    Definitions

Word

Definition

First Reply Time

How soon a Haylix engineer acknowledges an alert and attempts to either:

  • alert the customer’s escalation contacts, by email or phone call, of a severity level; or
  • respond to a customer’s ticket request by email.

Update Interval

How often a Haylix engineer will update the customer’s escalation points on current progress by email. This may include messages stating “No update”, meaning nothing has changed since the previous update. A ticket in pending or on hold will not have a committed update interval time.

Analyse

How soon a Haylix engineer will send out a report regarding the issue. A ticket in pending pauses the committed Analyse time.

Resolution Time

Resolution Time is considered achieved if the incident is repaired or recovered, Haylix escalates the issue to the correct vendor, a solution to fix the incident is proposed, or a workaround is implemented that reduces the urgency of the incident or eliminates it. Resolution Time is only achievable if a solution is deemed within Haylix’s control; it does not include any uncontrollable third-party SLA, infrastructure rebuilds or vendor outages. A ticket in pending or on hold will not count towards the time to resolution.

Severity 1

The entire Service is down and is inaccessible for external/public access.

Severity 2

The Service is experiencing considerably degraded performance and functionality is severely impacted. The business and customers are considerably impacted. Failure of redundant infrastructure (e.g. load-balanced instances) is a Severity 2 issue only where it impacts negatively on the Service.

Severity 3

The Service has slightly degraded performance, i.e. the Service is functioning at a speed that is considered less than optimal. Errors are present that do not significantly impact the functionality of the Service or business processes. Non-critical Service components may be failing; however, the majority of the site is functional. The business is unaffected. Failure of redundant instances that does not cause Service failure is considered a Severity 3 issue.

Change Request

If a ticket is logged for a service that requires a change to the environment or a significant amount of work, a change request must be filled out and authorisation from the customer must be confirmed before the work is carried out.


2.    Priority Matrix & SLAs

The priority matrix below outlines what SLAs are applicable for the service.

                          Impact
Urgency     Severity 1    Severity 2    Severity 3
High        Urgent        High          Medium
Medium      High          Medium        Low
Low         Medium        Low           Low

Urgency levels:

  • High: The service needs to be up and running ASAP. The whole business is affected.
  • Medium: The service is critical but is only affecting some users.
  • Low: Impacted services are non-critical. This is only affecting a small number of people or is in the UAT/Staging/Dev environments.
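The matrix above can be read as a simple lookup from an urgency level and a severity level to a ticket priority. The following is a minimal sketch only: the dictionary layout and the function name ticket_priority are illustrative assumptions, not part of any Haylix tooling.

```python
# Priority matrix from section 2, keyed by (urgency, severity).
# The structure and names here are illustrative assumptions.
PRIORITY_MATRIX = {
    ("High", 1): "Urgent",
    ("High", 2): "High",
    ("High", 3): "Medium",
    ("Medium", 1): "High",
    ("Medium", 2): "Medium",
    ("Medium", 3): "Low",
    ("Low", 1): "Medium",
    ("Low", 2): "Low",
    ("Low", 3): "Low",
}


def ticket_priority(urgency: str, severity: int) -> str:
    """Return the priority for an urgency level and a severity level (1-3)."""
    key = (urgency.strip().capitalize(), severity)
    try:
        return PRIORITY_MATRIX[key]
    except KeyError:
        raise ValueError(
            f"unknown urgency/severity: {urgency!r}, {severity!r}"
        ) from None


print(ticket_priority("High", 1))    # Urgent
print(ticket_priority("medium", 3))  # Low
```

Rejecting unknown combinations with an error, rather than falling back to a default priority, keeps a mistyped severity from silently downgrading a ticket.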

Only Urgent tickets have a 24/7 SLA. The SLA for all other ticket types is business hours only. Business hours exclude public holidays unless otherwise previously agreed between Haylix and the customer.

Urgency 

SLA

Urgent (24/7)

First Reply Time: 1 Hour 
Time to Identify Resolution: 8 Hours 
Update Interval: 2 Hours
Analyse: 1 calendar Week

High

First Reply Time: 2 Hours 
Time to Identify Resolution: 24 Hours 
Update Interval: 24 Hours 

Normal

First Reply Time: 24 Hours 
Time to Identify Resolution: 48 Hours 
Update Interval: 48 Hours 

Low

First Reply Time: 48 Hours 
Time to Identify Resolution: 1 week 
Update Interval: 1 Calendar Week 
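As a worked illustration of the Urgent (24/7) targets, and of the rule in the definitions that time spent in pending or on hold does not count towards resolution, the sketch below subtracts paused periods from the elapsed time. It is a sketch under stated assumptions: the names URGENT_SLA, sla_time_used and urgent_resolution_breached are hypothetical, paused periods are assumed not to overlap, and the other priorities are left out because they run on business hours, which this document does not define.

```python
from datetime import datetime, timedelta

# Urgent (24/7) targets from the SLA table. Other priorities are measured
# in business hours, which this sketch does not model.
URGENT_SLA = {
    "first_reply": timedelta(hours=1),
    "resolution": timedelta(hours=8),
    "update_interval": timedelta(hours=2),
    "analyse": timedelta(weeks=1),
}


def sla_time_used(opened, now, paused=()):
    """Elapsed time between opened and now, minus pending/on-hold periods.

    `paused` is an iterable of (start, end) datetime pairs that are
    assumed not to overlap each other.
    """
    used = now - opened
    for start, end in paused:
        # Clip each paused period to the [opened, now] window.
        start, end = max(start, opened), min(end, now)
        if end > start:
            used -= end - start
    return used


def urgent_resolution_breached(opened, now, paused=()):
    """True if an Urgent ticket has used more than its 8-hour target."""
    return sla_time_used(opened, now, paused) > URGENT_SLA["resolution"]


opened = datetime(2020, 7, 1, 9, 0)
now = datetime(2020, 7, 1, 18, 30)
on_hold = [(datetime(2020, 7, 1, 12, 0), datetime(2020, 7, 1, 14, 0))]

print(sla_time_used(opened, now, on_hold))               # 7:30:00
print(urgent_resolution_breached(opened, now, on_hold))  # False
print(urgent_resolution_breached(opened, now))           # True
```

In this example the ticket has been open for nine and a half hours, but two of those were on hold, so only seven and a half hours count against the 8-hour target.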


3.    Major Security Incident Response Process

3.1      Overview

This section describes the processes that should be followed to notify the client and determine the resolution steps of a security incident. The key objective is to prevent a serious loss of profits, public confidence or information assets by providing an immediate, effective and skilful response to any unexpected event involving computer information systems, networks or databases provided or managed by Haylix.

A security incident is a violation or imminent threat of violation of computer security policies, a malicious attack, or a violation of standard security practices.

This plan outlines the steps to follow in the event of a security incident relating to the infrastructure provided by Haylix and includes, but is not limited to:

  • Secure data is compromised
  • Unauthorised access is gained to the hosted environment
  • Attempted or actual attacks are detected
  • Suspicious behaviour is detected

Some examples include:

  • Breach of Personal Information
  • Denial of Service / Distributed Denial of Service
  • Excessive Port Scans
  • Firewall Breach
  • Virus Outbreak

Importantly, this document does not cover a security incident that occurs client-side on infrastructure or computers that are not managed by Haylix, unless that incident could cause an incident within the Haylix-managed infrastructure. An example of an incident that would be out of scope:

Users are tricked into opening a “report” sent via email that is actually malware; running the tool has infected their computers and established connections with an external host.

3.2      Response Process

The Haylix team is authorised to take appropriate steps deemed necessary to contain, mitigate or resolve a security incident.

The Haylix team will be the central point of contact for reporting computer incidents or intrusions, and all communications to the client will be via a Haylix Incident Lead, being either the Haylix Managing Director or Operations Manager. The lead on-call engineer, with the support of the Haylix team, will carry out a preliminary analysis of the incident to determine initial information and begin planning the appropriate response.

3.3      Notification Process

Once the incident is discovered by the Haylix team, the initial notification will be to the Haylix Incident Lead. Contact with the client will follow the documented process for severity 1 incidents in the client’s incident escalation document.

3.4      Information Sharing

Haylix will not disclose any information to the media or similar channels, including digital channels. It is the client’s responsibility to disclose any information outside its organisation.

3.5      Law Enforcement

Where laws have been broken in accessing the environment during the security incident, Haylix should notify the necessary law enforcement bodies once this has been discussed with the client.

3.6      Notification Requirements

When notifying the client, the following information must be provided. Where urgency is paramount, the client must be notified of the incident first, and a follow-up communication should then be provided to the Haylix Incident Lead, who in turn will manage communications with the client. The initial summary communication does not need to be an extensively detailed report but must at least provide the client with a summary of the event. If a client decision is required, the information provided must be of sufficient detail to enable that decision to be made. The initial summary should address the following:

  • The current status of the incident (new, in progress, forwarded for investigation, resolved, etc.)
  • A summary of the incident
  • Indicators related to the incident
  • Other incidents related to this incident
  • Actions taken by all incident handlers on this incident
  • Chain of custody, if applicable
  • Impact assessments related to the incident
  • Contact information for other involved parties (e.g., if a partner application (WCMS) has been breached)
  • A list of evidence gathered during the incident investigation
  • Comments from incident handlers
  • Proposed next steps to be taken (e.g., rebuild the host, upgrade an application).
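The items in the list above could be captured as a structured record, so that an incomplete summary is easy to spot before it is sent. This is a hedged sketch only: the class name IncidentSummary, its field names and the missing_sections helper are hypothetical and do not describe the Haylix ticketing system.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentSummary:
    """Initial summary sent to the client; all names are illustrative."""
    status: str   # e.g. "new", "in progress", "resolved"
    summary: str
    indicators: list = field(default_factory=list)
    related_incidents: list = field(default_factory=list)
    actions_taken: list = field(default_factory=list)
    chain_of_custody: list = field(default_factory=list)  # if applicable
    impact_assessments: list = field(default_factory=list)
    other_contacts: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    handler_comments: list = field(default_factory=list)
    next_steps: list = field(default_factory=list)

    def missing_sections(self):
        """Names of the list sections that are still empty."""
        return [name for name, value in vars(self).items()
                if isinstance(value, list) and not value]


report = IncidentSummary(
    status="new",
    summary="Excessive port scans detected against the hosted environment.",
    next_steps=["Review firewall rules"],
)
print(report.missing_sections())
```

An empty section is not necessarily an error (chain of custody, for example, applies only in some incidents), so the helper reports gaps for the incident handler to review rather than rejecting the summary.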

Haylix must notify the client immediately that a suspected incident has occurred and then follow up with further communications once this information is at hand. Gathering this information is the utmost priority of the team lead, to ensure adequate communication is provided to the client.

3.7      Tracking the Incident

Initial communications should be by phone with the client, as per the incident escalation diagram. Where the client is not contactable, the Haylix Incident Lead will confirm resolution plans. All communications must also be added to a support ticket to track the progress of the incident. Where communications with the client continue via email, these must also be via the Haylix ticketing system.

3.8      Functional Impact of the Incident

Incidents targeting IT systems typically impact the business functionality that those systems provide, resulting in some type of negative impact to the users of those systems. Incident handlers should consider how the incident will impact the existing functionality of the affected systems. Incident handlers should consider not only the current functional impact of the incident, but also the likely future functional impact of the incident if it is not immediately contained.

 

3.9      Information Impact of the Incident

Incidents may affect the confidentiality, integrity, and availability of the organisation’s information. For example, a malicious agent may exfiltrate sensitive information. Incident handlers should consider how this information exfiltration will impact the organisation’s overall mission. An incident that results in the exfiltration of sensitive information may also affect other organisations if any of the data pertained to a partner organisation.

3.10  Recoverability from the Incident

The size of the incident and the type of resources it affects will determine the amount of time and resources that must be spent on recovering from that incident. In some instances, it is not possible to recover from an incident (e.g., if the confidentiality of sensitive information has been compromised) and it would not make sense to spend limited resources on an elongated incident handling cycle, unless that effort was directed at ensuring that a similar incident did not occur in the future. In other cases, an incident may require far more resources to handle than what an organisation has available.

3.11  Information Impact Categories

Category

Definition

None

No information was exfiltrated, changed, deleted, or otherwise compromised

Privacy Breach

Sensitive personally identifiable information (PII) of taxpayers, employees, beneficiaries, etc. was accessed or exfiltrated

Proprietary Breach

Unclassified proprietary information, such as protected critical infrastructure information (PCII), was accessed or exfiltrated

Integrity Loss

Sensitive or proprietary information was changed or deleted

3.12  Recoverability Effort Categories

Category

Definition

Regular

Time to recovery is predictable with existing resources

Supplemented

Time to recovery is predictable with additional resources

Extended

Time to recovery is unpredictable; additional resources and outside help are needed

Not Recoverable

Recovery from the incident is not possible (e.g., sensitive data exfiltrated and posted publicly); launch investigation

The recoverability from the incident determines the possible responses that the team may take when handling the incident. An incident with a high functional impact and low effort to recover from is an ideal candidate for immediate action from the team.
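The triage rule in the paragraph above can be sketched as a small check. The recoverability categories are taken from section 3.12; the functional impact scale (None, Low, Medium, High) and the reading of "low effort" as the Regular category are assumptions made for illustration, since this document does not define them.

```python
from enum import Enum


class Recoverability(Enum):
    """Recoverability effort categories from section 3.12."""
    REGULAR = "Regular"
    SUPPLEMENTED = "Supplemented"
    EXTENDED = "Extended"
    NOT_RECOVERABLE = "Not Recoverable"


class FunctionalImpact(Enum):
    """Assumed scale: this document does not define functional impact levels."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def is_immediate_action_candidate(impact, effort):
    """High functional impact combined with the lowest recovery effort.

    Treating "low effort" as Recoverability.REGULAR is an assumption.
    """
    return impact is FunctionalImpact.HIGH and effort is Recoverability.REGULAR


print(is_immediate_action_candidate(FunctionalImpact.HIGH, Recoverability.REGULAR))   # True
print(is_immediate_action_candidate(FunctionalImpact.HIGH, Recoverability.EXTENDED))  # False
```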

Effective Date: July 2020
