Haylix – Cloud. Automated. Always.

SERVICE LEVEL AGREEMENT

1. Definitions

First Reply Time: How soon a Haylix Engineer acknowledges an alert and either alerts the customer’s escalation contacts of a severity level by email or phone call, or responds to a customer’s ticket request by email.
Update Interval: How often a Haylix engineer will update the customer’s escalation contacts on current progress by email. This may include messages stating “No update”, meaning nothing has changed since the previous update. A ticket in pending or on hold will not have a committed update interval time.
Analyse: How soon a Haylix engineer will send out a report regarding the issue. A ticket in pending pauses the committed analyse time.
Resolution Time: Resolution Time is considered achieved if the incident is repaired or recovered, Haylix escalates the issue to the correct vendor, a solution to fix the incident is proposed, or a work-around is implemented that reduces the urgency or eliminates the incident. Resolution Time is only achievable if a solution is deemed within Haylix’s control; it does not include any uncontrollable third-party SLA, infrastructure rebuilds or vendor outages. A ticket in pending or on hold will not count towards the time to resolution.
Severity 1: The Service is unavailable and cannot be accessed as intended.
Severity 2: The Service is experiencing considerably degraded performance and functionality is severely impacted. The business and customers are considerably impacted. Failure of redundant infrastructure (e.g. load-balanced instances) is a Severity 2 issue only where it negatively impacts the Service.
Severity 3: The Service has slightly degraded performance, i.e. the Service is functioning at a speed that is less than optimal. Errors are present that do not significantly impact the functionality of the Service or business processes. Non-critical Service components may be failing, however the majority of the site is functional. The business is unaffected. Failure of redundant instances that does not cause Service failure is considered a Severity 3.
Production Outage: The production environment is unresponsive from two (2) of our three (3) Australian and New Zealand monitoring locations for five (5) consecutive minutes (an illustrative detection rule is sketched after these definitions).
Initial acknowledgement: Initial acknowledgement of a ticket or outage by Haylix.
Change Request: If a ticket is logged for a service that requires a change to the environment or a significant amount of work, a change request must be completed and authorisation from the customer confirmed before the work is carried out. Change requests are tracked as a related ticket in Jira.
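
The Production Outage definition above can be read as a simple monitoring rule: declare an outage only when at least two of the three monitoring locations have reported the production environment unresponsive for five consecutive minutes. The sketch below is illustrative only and is not part of this agreement; the location names and the one-minute sampling interval are assumptions.

    # Illustrative sketch only: the Production Outage rule from the definitions above.
    # The location names and one-minute sampling interval are assumptions.
    MONITORING_LOCATIONS = ["sydney", "melbourne", "auckland"]  # hypothetical probes
    CONSECUTIVE_MINUTES = 5
    LOCATIONS_REQUIRED = 2

    def is_production_outage(history):
        """history maps each monitoring location to one reachability sample per
        minute (True = responsive), with the most recent sample last."""
        unresponsive = 0
        for location in MONITORING_LOCATIONS:
            window = history.get(location, [])[-CONSECUTIVE_MINUTES:]
            # This location counts only if every sample in the five-minute window failed.
            if len(window) == CONSECUTIVE_MINUTES and not any(window):
                unresponsive += 1
        return unresponsive >= LOCATIONS_REQUIRED

    # Example: two of three locations have failed every check for the last five minutes.
    history = {
        "sydney":    [False, False, False, False, False],
        "melbourne": [True, False, False, False, False, False],
        "auckland":  [True, True, True, True, True],
    }
    print(is_production_outage(history))  # True -> a Production Outage would be declared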

2. Priority Matrix & SLAs

The priority matrix and the SLA targets for each urgency level are set out in the tables at the end of Section 3 of this document.

3. Major Security Incident Response Process

3.1      Overview

This section describes the processes that should be followed to notify the client and determine the resolution steps of a security incident. The key objective is to prevent a serious loss of profits, public confidence or information assets by providing an immediate, effective and skilful response to any unexpected event involving Haylix-provided or Haylix-managed computer information systems, networks or databases. A security incident is a violation or imminent threat of violation of computer security policies, a malicious attack, or a violation of standard security practices. This plan outlines the steps to follow in the event of a security incident relating to the infrastructure provided by Haylix and includes, but is not limited to:
  • Secure data is compromised
  • Unauthorised access is gained to the hosted environment
  • Attempted or actual attacks are detected
  • Suspicious behaviour is detected
Some examples include:
  • Breach of Personal Information
  • Denial of Service / Distributed Denial of Service
  • Excessive Port Scans
  • Firewall Breach
  • Virus Outbreak
Importantly, this document does not cover a security incident that occurs client-side on infrastructure or computers that are not managed by Haylix, unless that incident has the ability to cause an incident within the Haylix-managed infrastructure. An example of an incident that would be out of scope: users are tricked into opening a “report” sent via email that is actually malware; running the tool infects their computers and establishes connections with an external host.

3.2      Response Process

The Haylix team is authorised to take the steps deemed necessary to contain, mitigate or resolve a security incident. The Haylix team will be the central point of contact for reporting computer incidents or intrusions, and all communications to the client will be via a Haylix Incident Lead, being either the Haylix Managing Director or the Operations Manager. A preliminary analysis of the incident will be carried out by the lead on-call engineer, with the support of the Haylix team, to determine initial information and begin planning the appropriate response.

3.3      Notification Process

Once an incident is discovered by the Haylix team, the initial notification will be to the Haylix Incident Lead. Contact with the client will be via the documented process within the client’s incident escalation document for Severity 1 incidents.

3.4      Information Sharing

Haylix will not disclose any information to the media or similar channels, including digital channels. It is the client’s responsibility to disclose any information outside its organisation.

3.5      Law Enforcement

Where laws have been broken in accessing the environment during the security incident, Haylix should notify the necessary law enforcement bodies once this has been discussed with the client.

3.6      Notification Requirements

When notifying the client, the following information must be provided. Where urgency is paramount, the client must be notified of the incident immediately; follow-up communication should then be provided to the Haylix Incident Lead, who in turn will manage communications with the client. The initial summary communication does not need to be an extensive report but must at least provide the client with a summary of the event. If a client decision is required, the information provided must be of sufficient detail to enable such a decision to be made. The initial summary should address the following:
  • The current status of the incident (new, in progress, forwarded for investigation, resolved, etc.)
  • A summary of the incident
  • Indicators related to the incident
  • Other incidents related to this incident
  • Actions taken by all incident handlers on this incident
  • Chain of custody, if applicable
  • Impact assessments related to the incident
  • Contact information for other involved parties (e.g., If a partner application (WCMS) has been breached)
  • A list of evidence gathered during the incident investigation
  • Comments from incident handlers
  • Proposed next steps to be taken (e.g., rebuild the host, upgrade an application).
Haylix must notify the client immediately that a suspected incident has occurred and then follow up with further communications once this information is at hand. Gathering this information is the utmost priority of the team lead, to ensure adequate communication is provided to the client.

3.7      Tracking the Incident

Communications should initially be via phone contact with the client, as per the incident escalation diagram. Where the client is not contactable, the Haylix Incident Lead will confirm resolution plans. All communications must also be added to a support ticket to track the progress of the incident. Where communications with the client continue via email, these must also be via the Haylix ticketing system.

3.8      Functional Impact of the Incident

Incidents targeting IT systems typically impact the business functionality that those systems provide, resulting in some type of negative impact on the users of those systems. Incident handlers should consider not only the current functional impact of the incident on the affected systems, but also the likely future functional impact if it is not immediately contained.

3.9      DR Failover (where applicable)

The failover process shall commence upon the customer’s declaration of a disaster event. The failover process shall be executed in accordance with the agreed DR plan, which defines the sequence of steps, roles and responsibilities for the failover operation. Both parties shall maintain an up-to-date and mutually approved DR plan, which includes clear instructions for initiating, executing and validating the failover process. The failback process shall commence at the customer’s discretion, no less than 48 hours following the initial failover.

3.10     Uptime

Both the Client and Haylix acknowledge that the published uptime information and SLAs are ultimately controlled by the Cloud Provider or other third party. Haylix can work with the Client to provide recommendations and implement architectural and service adjustments within the Cloud Provider environment to align service uptime with commercial and business considerations. A 99.95% uptime target is provided for Production environments that are architected by Haylix with appropriate High Availability (HA) and protected by Haylix PROTECT Safe Place. Uptime is measured monthly, excluding planned maintenance, application-level issues, and any issues that are not within the scope of Haylix’s control.
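
As an illustration only (not a contractual formula), the downtime allowance implied by a 99.95% monthly uptime target can be estimated as follows; the sketch assumes a 30-day month and ignores the exclusions listed above.

    # Illustrative only: downtime allowance implied by a 99.95% monthly uptime target.
    # Assumes a 30-day month; planned maintenance and out-of-scope issues are excluded
    # from the measurement, as described above.
    TARGET_UPTIME = 0.9995
    minutes_in_month = 30 * 24 * 60                                  # 43,200 minutes
    allowed_downtime = minutes_in_month * (1 - TARGET_UPTIME)
    print(f"Allowed downtime: {allowed_downtime:.1f} minutes per 30-day month")  # ~21.6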

3.11     Third-Party Disclaimer

The site is managed under the standard Managed Services Agreement and is further supported by the SLAs provided within this document and by the Cloud Provider or third-party vendor’s SLA. For further information, refer to the published SLAs of the relevant Cloud Provider (e.g. Microsoft Azure or Amazon Web Services (AWS)).

3.12     Information Impact of the Incident

Incidents may affect the confidentiality, integrity, and availability of the organisation’s information. For example, a malicious agent may exfiltrate sensitive information. Incident handlers should consider how this information exfiltration will impact the organisation’s overall mission. An incident that results in the exfiltration of sensitive information may also affect other organisations if any of the data pertained to a partner organisation.

3.13  Recoverability from the Incident

The size of the incident and the type of resources it affects will determine the amount of time and resources that must be spent on recovering from that incident. In some instances, it is not possible to recover from an incident (e.g., if the confidentiality of sensitive information has been compromised) and it would not make sense to spend limited resources on an elongated incident handling cycle, unless that effort was directed at ensuring that a similar incident did not occur in the future. In other cases, an incident may require far more resources to handle than what an organisation has available.

3.14  Information Impact Categories

Category             Definition
None                 No information was exfiltrated, changed, deleted, or otherwise compromised
Privacy Breach       Sensitive personally identifiable information (PII) of taxpayers, employees, beneficiaries, etc. was accessed or exfiltrated
Proprietary Breach   Unclassified proprietary information, such as protected critical infrastructure information (PCII), was accessed or exfiltrated
Integrity Loss       Sensitive or proprietary information was changed or deleted

3.15  Recoverability Effort Categories

Category          Definition
Regular           Time to recovery is predictable with existing resources
Supplemented      Time to recovery is predictable with additional resources
Extended          Time to recovery is unpredictable; additional resources and outside help are needed
Not Recoverable   Recovery from the incident is not possible (e.g., sensitive data exfiltrated and posted publicly); launch investigation

The recoverability from the incident determines the possible responses that the team may take when handling the incident. An incident with a high functional impact and low effort to recover from is an ideal candidate for immediate action from the team.
The priority matrix below outlines what SLAs are applicable for the service.

Urgency: High (the service needs to be up and running ASAP; the whole business is affected)
    Severity 1 impact: Urgent      Severity 2 impact: High      Severity 3 impact: Medium

Urgency: Medium (the service is critical but is only affecting some users)
    Severity 1 impact: High      Severity 2 impact: Medium      Severity 3 impact: Low

Urgency: Low (impacted services are non-critical; only a small number of people are affected, or the issue is in the UAT/Staging/Dev environment)
    Severity 1 impact: Medium      Severity 2 impact: Low      Severity 3 impact: Low

Only Urgent tickets have a 24/7 SLA; the SLAs for all other ticket types apply during business hours only. Business hours exclude public holidays unless otherwise previously agreed between Haylix and the customer.

Urgent (24/7)
    Initial acknowledgement: 30 minutes
    Time to Identify Resolution: 8 hours
    Update Interval: 2 hours
    Analyse: 1 calendar week

High
    Initial acknowledgement: 2 hours
    Time to Identify Resolution: 24 hours
    Update Interval: 24 hours

Normal
    Initial acknowledgement: 24 hours
    Time to Identify Resolution: 48 hours
    Update Interval: 48 hours

Low
    Initial acknowledgement: 48 hours
    Time to Identify Resolution: 1 week
    Update Interval: 1 calendar week
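
For teams that want to apply these tables programmatically, the following sketch encodes the priority matrix and the initial-acknowledgement targets above. It is illustrative only and not part of this agreement; the function and variable names are hypothetical, and it assumes that the matrix result “Medium” corresponds to the “Normal” SLA row.

    # Illustrative sketch only: encodes the priority matrix and acknowledgement targets
    # above. Names are hypothetical; "Medium" in the matrix is assumed to map to the
    # "Normal" SLA row.
    from datetime import timedelta

    # (urgency, impact severity) -> ticket priority, as per the priority matrix
    PRIORITY_MATRIX = {
        ("High", 1): "Urgent",  ("High", 2): "High",     ("High", 3): "Medium",
        ("Medium", 1): "High",  ("Medium", 2): "Medium", ("Medium", 3): "Low",
        ("Low", 1): "Medium",   ("Low", 2): "Low",       ("Low", 3): "Low",
    }

    # ticket priority -> initial acknowledgement target, as per the SLA table
    ACKNOWLEDGEMENT_TARGET = {
        "Urgent": timedelta(minutes=30),  # 24/7
        "High":   timedelta(hours=2),     # business hours
        "Medium": timedelta(hours=24),    # "Normal" row; business hours
        "Low":    timedelta(hours=48),    # business hours
    }

    def acknowledgement_target(urgency, severity):
        """Return the initial acknowledgement target for a ticket."""
        priority = PRIORITY_MATRIX[(urgency, severity)]
        return ACKNOWLEDGEMENT_TARGET[priority]

    print(acknowledgement_target("High", 1))  # 0:30:00 -> an Urgent (24/7) ticket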

Effective June 2023