CareGroup
Daniel Stone
Ohio Dominican University
Abstract
CareGroup, a healthcare system in Massachusetts, experienced a network outage in November 2002. The outage was triggered by a researcher who unintentionally installed a rogue software program on the system and overwhelmed the network. A summary of the case documents the causes and possible fixes. Recommendations are then developed based on the state of the network and the staff at the time of the system failure. Lastly, a backup plan is discussed, comparing dial-up modems, prevalent at the turn of the 21st century, with cloud backup, prevalent in 2013.
Keywords: CareGroup, Information Technology (IT),
system outage, rogue software application, backup plan
CareGroup Case
Although "health care did not suffer"
(Applegate, L., Austin, R. and Soule, D., 2009), the three-and-a-half day
complete collapse of the CareGroup's Information Technology (IT) system
strained this health-care system committed
to providing individualized care to patients through a wide array of
services. Between the morning and
afternoon of November 13th, 2002, the CareGroup's IT system went from the 21st
century back to the 1970s. Reverting to
a paper-based system during the outage, the four major CareGroup medical facilities'
IT systems were originally problematic.
In 1998, John Halamka became Chief Information Officer (CIO) of CareGroup,
and devised a backup system in an attempt to mitigate possible damages for the Year 2000 problem (Y2K).
However, this backup system was out-of-date and was missing
components. By 2002, a common system for
all hospitals were created for standard mode procedures (Applegate, L., Austin,
R. and Soule, D., 2009).
The three-and-a-half-day complete collapse of CareGroup's IT system was started by a researcher who was experimenting with a rogue software program that had not been tested (Applegate et al., 2009). A recommended fix is a stricter network security policy that prevents employees who are not trained IT professionals from installing untested software on the CareGroup network (Cisco Systems, Inc., 2005). The homemade application caused huge data transfers that took over the centrally located network switch (Applegate et al., 2009). Because the network was physically redundant, traffic that did not come from the rogue program had alternative paths to travel even though the core switch had been disabled by the homemade application.
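To illustrate the kind of safeguard that could flag this failure mode early, the sketch below checks per-host traffic counts for any single machine moving an outsized share of the network's bytes, as the researcher's homemade application did. The threshold, host names, and the way the counts are collected are assumptions made for illustration, not details from the case.

# A minimal sketch (illustrative, not CareGroup's actual monitoring): given
# per-host byte counts gathered over one polling interval by whatever
# collector is available, flag any host responsible for an outsized share
# of total traffic before it can overwhelm a core switch.
from typing import Dict, List

SHARE_THRESHOLD = 0.40  # hypothetical: alert when one host moves >40% of bytes


def find_bandwidth_hogs(bytes_by_host: Dict[str, int]) -> List[str]:
    """Return hosts whose share of total traffic exceeds the alert threshold."""
    total = sum(bytes_by_host.values())
    if total == 0:
        return []
    return [host for host, count in bytes_by_host.items()
            if count / total > SHARE_THRESHOLD]


if __name__ == "__main__":
    # Fictional sample interval: one workstation dwarfs everything else.
    sample = {
        "research-pc-07": 9_800_000_000,  # runaway homemade application
        "lab-server-01": 450_000_000,
        "ward-3-station": 120_000_000,
    }
    for host in find_bandwidth_hogs(sample):
        print(f"ALERT: {host} is monopolizing network bandwidth")

Run on a schedule against counters exported from the switches, a check of this kind would surface bandwidth monopolization by one user or application long before it produced a multi-day outage.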
While CareGroup's network was blessed with redundancy that created alternate routes for data to travel, it was also cursed by it. The smaller networks had been added in a way that caused the overall network to confuse the primary and backup travel routes (Applegate et al., 2009). A recommended fix is a more robust network design coupled with simulated downtime drills for a total network collapse (Halamka, 2008).
A wide array of problems arose while the network was down, such as the lack of a workable communication plan for a system outage (Halamka, 2008). Paper forms from the Y2K changeover were used, forcing the younger staff to take direction from the older staff. All the paper-based information gathered during the outage was converted into electronic form within 48 hours after the network was returned to normal. About 300 tests conducted during the outage could not be matched with their specimens, forcing staff to retest patients at CareGroup's expense (Applegate et al., 2009). A recommended fix is to remain in backup mode until the network's restoration can be assured by an outside vendor.
Finally, 24 hours after the homemade application took over the network, the original networking manufacturer, Cisco Systems, Inc., was contacted. Once on site, Cisco brought in enough manpower and equipment to build an entire redundant core network if that was needed to get CareGroup's IT system up and running again (Applegate et al., 2009). Working around the clock, Cisco installed a larger and more modern switch while upgrading the router. With the network fully restored three-and-a-half days after the rogue software program took over, the paper-based backup process was gradually phased out over the next 24 hours (Berinato, 2003). A recommended fix is an established relationship with Cisco that would have kept CareGroup apprised of its network's shortcomings (Halamka, 2008).
Recommendations
There were several problems that played a part in CareGroup's marginal network. These problems presented challenges in delivering services to patients, threatened to disrupt the organization's cash flow, and took away from the organization's bottom line. In particular, these problems were: (a) a lack of security policies; (b) an unclear backup plan and its implementation in the event of a system outage; (c) a reluctance to replace inferior leadership of the organization's network team; (d) a lack of knowledge sharing about the network with other members of the IT team; (e) a lack of knowledge of the system's capabilities; (f) a nonexistent relationship with the original networking manufacturer, Cisco; and (g) a failure to contact that vendor promptly when a problem arose. It is imperative that the following recommendations be considered to prevent another out-of-service situation.
Security Policy. CareGroup's safeguards were too lax, allowing employees who were not IT professionals to install unapproved software on company computers and exposing the organization to instability (Halamka, 2008). A rogue software program installed by a researcher unintentionally crippled CareGroup's IT system (Applegate et al., 2009). Without a clearly defined security policy, the accessibility of CareGroup's network was compromised, resulting in the network being disabled (Halamka, 2008). It is recommended that usage policy statements outlining users' roles and responsibilities with regard to security be implemented and clearly understood by everyone at CareGroup (Cisco Systems, Inc., 2005). Furthermore, it is recommended that a risk analysis be conducted and that, through consensus, the organization establish its degree of comfort with homemade applications being tested on the CareGroup network (Cisco Systems, Inc., 2005). Lastly, in regard to institutional control, it is recommended that a security team structure be put in place with participants from each of CareGroup's areas of operation (Cisco Systems, Inc., 2005). The representatives on the team should be well-versed in security policy. At the time of the system outage, the entire network was so saturated that no one at CareGroup could discover the root cause of the problem (Halamka, 2008).
Best Practices in Human Resources. It is recommended that CareGroup act swiftly in removing employees who are not performing to expectations. John Halamka, CIO of CareGroup, felt that terminating CareGroup's single point of human contact would result in an outage (Halamka, 2008). By hesitating to act proactively to reinforce leadership of the network team, he ended up with a system outage anyway (Halamka, 2008). By not removing the mediocre leadership of the network team, CareGroup sent a message to dedicated staff, to new and prospective hires, and to competitors near and far that mediocrity was being tolerated. Investors and shareholders could interpret this weakness as a sign that CareGroup is in decline (Walters, 2001). In addition to sending the wrong message to CareGroup's stakeholders and competitors, retaining underperforming staff drains the organization's resources, primarily management's time and effort. Management's time and effort would be better spent on training and preparing quality staff for new opportunities (Walters, 2001).
Best Practices in Training. It is recommended that CareGroup train the entire IT team in all functions of the IT department so that there are multiple points of human contact who are knowledgeable about the inner workings of the network's systems. At the time of the outage, Mr. Halamka was also cognizant that the other members of the IT team could not configure the network due to poor documentation and a lack of information sharing from the leadership of the network team (Halamka, 2008). Not only had the leadership of the network team poorly documented how the network operated, but this person was also not knowledgeable about best practices for redundant network cores and the prevention of bandwidth monopolization by any one user or application (Halamka, 2008). In the years that followed Y2K, a network was expected to incorporate many kinds of impressive services that were, at the time, troublesome to support. The leadership of an organization's network team has to stay abreast of best practices and remain open to upgrades, rather than assuming, as in simpler times, that the network can support any IP service (Avolio, 2000).
Service Partner Relationship Management. It is recommended that CareGroup establish a collaborative partner relationship with its vendors. At the time of the system outage, there was a minimal relationship between CareGroup and Cisco, the manufacturer of CareGroup's network. A Cisco partner was brought in to document the network but, unfortunately, the network failure occurred before the partner had finished documenting it (Halamka, 2008). Instead of being just another vendor that CareGroup purchased from, Cisco needs to be a partner that is involved in all aspects of CareGroup's network infrastructure. Had CareGroup and Cisco mutually nurtured the relationship, Cisco could have warned Mr. Halamka about the network's vulnerabilities (Halamka, 2008).
Backup Plan
Hundreds of tests conducted during the outage could not be matched with their specimens, forcing staff to retest patients at CareGroup's expense (Applegate et al., 2009). On the one hand, there were so many mismatched specimens because the paper system was not robust enough to move between backup mode and standard mode several times. On the other hand, CareGroup's operations team was given mixed signals to switch from backup mode to standard mode prematurely, several times, before the network's stability had been assured (Applegate et al., 2009). When the data are intact but inaccessible, the end goal is to be able to both send and receive data electronically, not just record it on paper.
One solution to the problem, and one preferable for stable cash flow and business continuity, is for CareGroup's operations to be carried out in backup mode via dial-up modems. While the interaction would be far slower than in standard mode, it is faster than the paper system, and the network would have remained operational (Austin & Henningsson, 2013). In the early 2000s, when the outage took place, dial-up modems for connecting to networks were prevalent. The benefit of dial-up connectivity over the paper system is that both backup mode and standard mode remain digital, with data collected and processed electronically in both modes, not 48 hours after returning to standard mode, as was the case in November 2002 (Applegate et al., 2009). Far fewer, if any, specimen mismatches would likely result. After the outage, CareGroup obtained extra analog telephone capacity and added modem capabilities to 50 computers (Austin & Henningsson, 2013).
Laptops, tablets, smartphones, and Wi-Fi hotspot connectivity are prevalent today in comparison to dial-up modems and would be a more realistic investment in 2013. Cisco, the manufacturer of CareGroup's network, projected that cloud workloads would grow from just 21% of all data-center workloads in 2010 to 57% by 2015 (Williams, 2012). Cloud computing is therefore more realistic in 2013 than dial-up modems and analog telephone capacity. Allowing essential staff at CareGroup to access data stored remotely via cloud backup is a more sustainable backup plan and would take the place of analog telephones and outdated modem capabilities on computers (Williams, 2012).
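As an illustration of how such a cloud backup could work, the sketch below mirrors a completed lab result to cloud object storage so that essential staff could still retrieve it over any available internet connection if the on-site network failed. It uses the AWS SDK for Python as one example of a cloud storage interface; the bucket name, record format, and key layout are assumptions made for this sketch, not details of CareGroup's environment.

# A minimal sketch (illustrative only): copy one lab result record to
# off-site cloud object storage as it is completed, so the data remain
# readable even if the hospital network is down.
import json
from datetime import datetime, timezone

import boto3  # AWS SDK for Python; any comparable cloud storage SDK would do

s3 = boto3.client("s3")
BACKUP_BUCKET = "caregroup-lab-results-backup"  # hypothetical bucket name


def backup_lab_result(specimen_id: str, result: dict) -> str:
    """Copy one lab result to off-site storage, keyed by specimen ID and time."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"lab-results/{specimen_id}/{timestamp}.json"
    s3.put_object(
        Bucket=BACKUP_BUCKET,
        Key=key,
        Body=json.dumps(result).encode("utf-8"),
        ContentType="application/json",
    )
    return key


if __name__ == "__main__":
    # Example: back up a single fictional result record.
    backup_lab_result("SPEC-12345", {"test": "CBC", "status": "complete"})

Restoring during an outage would then mean reading the most recent object for each specimen from the bucket, rather than waiting 48 hours to re-enter paper records after the network returns to normal.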
Conclusion
A well-intentioned researcher who was testing a homemade software application on the CareGroup network triggered the system outage in November 2002. The overall network was a complicated blend of networks because of the way the networks were connected each time a new hospital was added to the CareGroup system (Applegate et al., 2009). In combination with this trigger, the main problems were an outdated and unclear backup plan, a security policy that tolerated staff using untested software, a lack of documentation of the network's inner workings, a wealth of knowledge about the network's inner workings held by one person, and a fledgling relationship with Cisco (Halamka, 2008). The recommendations to avoid future issues are to implement usage policy statements outlining users' roles and responsibilities with regard to security, act swiftly in removing personnel who are not performing to expectations, train the entire IT team in all functions of the IT department so that there are multiple points of human contact knowledgeable about the inner workings of the network's systems, establish a collaborative partner relationship with vendors, and contact the vendor promptly when an issue arises (Halamka, 2008).
After the outage, CareGroup obtained analog telephone capacity and added modem capabilities to dozens of computers (Austin & Henningsson, 2013). This backup plan was more current than CareGroup's earlier backup plan of paper forms, a carryover from Y2K. However, a more prevalent backup plan in 2013 is to allow essential staff at CareGroup access to data stored remotely via cloud backup. While dial-up modems were an update to the paper system in 2002 (Austin & Henningsson, 2013), dial-up computing would be out of date in 2013.
References
Applegate, L., Austin, R., & Soule, D. (2009). Corporate information strategy and management: Text and cases (8th ed., pp. 322-338).
Austin, R., & Henningsson, S. (2013). CareGroup case discussion: IT risk management and robust operations. Copenhagen Business School, April 2013.
Avolio, F. M. (2000). Network Computing for IT and by IT: Best practices in network security. Retrieved from http://www.networkcomputing.com/1105/1105f2.html
Berinato, S. (2003). CIO: All systems down. Retrieved from http://www.cio.com.au/article/65115/all_systems_down/
Cisco Systems, Inc. (2005). Network security policy: Best practices white paper. Retrieved from http://www.cisco.com/en/US/tech/tk869/tk769/technologies_white_paper09186a008014f945.shtml#t1
Halamka, J. (2008). Life as a Healthcare CIO: The CareGroup network outage. Retrieved from http://geekdoctor.blogspot.com/2008/03/caregroup-network-outage.html
Walters, C. (2001). HR Works, Inc.: The costs of not firing a mediocre employee. Retrieved from http://www.hrworks-inc.com/our-solutions/recruiting/recruiting-articles/104-the-costs-of-not-firing-a-mediocre-employee
Williams, S. (2012). The Motley Fool: 5 companies to play the cloud computing revolution. Retrieved from http://www.fool.com/investing/general/2012/08/13/5-companies-to-play-the-cloud-computing-revolution.aspx