CareGroup
Daniel Stone
Ohio Dominican University
Abstract
CareGroup, a healthcare system in Massachusetts, experienced a network outage in November 2002. The outage was triggered by a researcher who unintentionally installed a rogue software program on the system and overwhelmed the network. A summary of the case documents the causes and possible fixes. Recommendations are then developed based on the state of the network and the staff at the time of the system failure. Lastly, a backup plan is discussed, comparing dial-up modems, prevalent at the turn of the 21st century, with cloud backup, prevalent in 2013.
Keywords: CareGroup, Information Technology (IT),
system outage, rogue software application, backup plan
CareGroup Case
Although "health care did not suffer"
(Applegate, L., Austin, R. and Soule, D., 2009), the three-and-a-half day
complete collapse of the CareGroup's Information Technology (IT) system
strained this health-care system committed
to providing individualized care to patients through a wide array of
services. Between the morning and
afternoon of November 13th, 2002, the CareGroup's IT system went from the 21st
century back to the 1970s. Reverting to
a paper-based system during the outage, the four major CareGroup medical facilities'
IT systems were originally problematic.
In 1998, John Halamka became Chief Information Officer (CIO) of CareGroup,
and devised a backup system in an attempt to mitigate possible damages for the Year 2000 problem (Y2K).
However, this backup system was out-of-date and was missing
components. By 2002, a common system for
all hospitals were created for standard mode procedures (Applegate, L., Austin,
R. and Soule, D., 2009).
The three-and-a-half-day complete collapse of CareGroup's IT system was started by a researcher who was experimenting with a rogue software program that had not been tested (Applegate et al., 2009). A recommended fix is a stricter network security policy that prevents employees who are not trained IT professionals from installing untested software on the CareGroup network (Cisco Systems, Inc., 2005). The homemade application caused huge data transfers that took over the centrally located network switch (Applegate et al., 2009). Because the network was physically redundant, traffic that did not come from the rogue program had alternative paths to travel even though the core switch had been disabled by the homemade application.
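To illustrate the kind of safeguard that could flag this failure mode early, the sketch below checks per-host traffic counts for any single machine moving an outsized share of the network's bytes, as the researcher's homemade application did. The threshold, host names, and the way the counts are collected are assumptions made for illustration, not details from the case.

# A minimal sketch (illustrative, not CareGroup's actual monitoring): given
# per-host byte counts gathered over one polling interval by whatever
# collector is available, flag any host responsible for an outsized share
# of total traffic before it can overwhelm a core switch.
from typing import Dict, List

SHARE_THRESHOLD = 0.40  # hypothetical: alert when one host moves >40% of bytes


def find_bandwidth_hogs(bytes_by_host: Dict[str, int]) -> List[str]:
    """Return hosts whose share of total traffic exceeds the alert threshold."""
    total = sum(bytes_by_host.values())
    if total == 0:
        return []
    return [host for host, count in bytes_by_host.items()
            if count / total > SHARE_THRESHOLD]


if __name__ == "__main__":
    # Fictional sample interval: one workstation dwarfs everything else.
    sample = {
        "research-pc-07": 9_800_000_000,  # runaway homemade application
        "lab-server-01": 450_000_000,
        "ward-3-station": 120_000_000,
    }
    for host in find_bandwidth_hogs(sample):
        print(f"ALERT: {host} is monopolizing network bandwidth")

Run on a schedule against counters exported from the switches, a check of this kind would surface bandwidth monopolization by one user or application long before it produced a multi-day outage.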
While CareGroup's network was blessed with redundancy that created alternate routes for data to travel, it was also cursed by it. The smaller networks had been added in a way that caused the overall network to confuse the primary and backup travel routes (Applegate et al., 2009). A recommended fix is a more robust network design coupled with simulated downtime drills for a total network collapse (Halamka, 2008).
A wide array of problems arose while the network was down, such as the lack of a workable communication plan for a system outage (Halamka, 2008). Paper forms from the Y2K changeover were used, forcing the younger staff to take direction from the older staff. All the paper-based information gathered during the outage was converted into electronic form within 48 hours after the network was returned to normal. About 300 tests conducted during the outage could not be matched with their specimens, forcing staff to retest patients at CareGroup's expense (Applegate et al., 2009). A recommended fix is to remain in backup mode until the network's restoration can be assured by an outside vendor.
Finally, 24 hours after the homemade application took over the network, the original networking manufacturer, Cisco Systems, Inc., was contacted. Once on site, Cisco brought in enough manpower and equipment to build an entire redundant core network if that was needed to get CareGroup's IT system up and running again (Applegate et al., 2009). Working around the clock, Cisco installed a larger and more modern switch while upgrading the router. With the network fully restored three-and-a-half days after the rogue software program took over, the paper-based backup process was gradually phased out over the next 24 hours (Berinato, 2003). A recommended fix is an established relationship with Cisco that would have kept CareGroup apprised of its network's shortcomings (Halamka, 2008).
Recommendations
There were several problems that played a part in CareGroup's marginal network. These problems presented challenges in delivering services to patients, threatened to disrupt the organization's cash flow, and took away from the organization's bottom line. In particular, these problems were: (a) a lack of security policies; (b) an unclear backup plan and its implementation in the event of a system outage; (c) a reluctance to replace inferior leadership of the organization's network team; (d) a lack of knowledge sharing about the network with other members of the IT team; (e) a lack of knowledge of the system's capabilities; (f) a nonexistent relationship with the original networking manufacturer, Cisco; and (g) a failure to contact that vendor promptly when a problem arose. It is imperative that the following recommendations be considered to prevent another out-of-service situation.
Security Policy. CareGroup's safeguards were too lax, allowing employees who were not IT professionals to install unapproved software on company computers and exposing the organization to instability (Halamka, 2008). A rogue software program installed by a researcher unintentionally crippled CareGroup's IT system (Applegate et al., 2009). Without a clearly defined security policy, the accessibility of CareGroup's network was compromised, resulting in the network being disabled (Halamka, 2008). It is recommended that usage policy statements outlining users' roles and responsibilities with regard to security be implemented and clearly understood by everyone at CareGroup (Cisco Systems, Inc., 2005). Furthermore, it is recommended that a risk analysis be conducted and that, through consensus, the organization establish its degree of comfort with homemade applications being tested on the CareGroup network (Cisco Systems, Inc., 2005). Lastly, in regard to institutional control, it is recommended that a security team structure be put in place with participants from each of CareGroup's areas of operation (Cisco Systems, Inc., 2005). The representatives on the team should be well-versed in security policy. At the time of the system outage, the entire network was so saturated that no one at CareGroup could discover the root cause of the problem (Halamka, 2008).
Best Practices in Human Resources. It is recommended that CareGroup act swiftly in removing employees who are not performing to expectations. John Halamka, CIO of CareGroup, felt that terminating CareGroup's single point of human contact would result in an outage (Halamka, 2008). By hesitating to act proactively to reinforce leadership of the network team, he ended up with a system outage anyway (Halamka, 2008). By not removing the mediocre leadership of the network team, CareGroup sent a message to dedicated staff, to new and prospective hires, and to competitors near and far that mediocrity was being tolerated. Investors and shareholders could interpret this weakness as a sign that CareGroup is in decline (Walters, 2001). In addition to sending the wrong message to CareGroup's stakeholders and competitors, retaining underperforming staff drains the organization's resources, primarily management's time and effort. Management's time and effort would be better spent on training and preparing quality staff for new opportunities (Walters, 2001).
Best Practices in Training. It is recommended that CareGroup train the entire IT team in all functions of the IT department so that there are multiple points of human contact who are knowledgeable about the inner workings of the network's systems. At the time of the outage, Mr. Halamka was also cognizant that the other members of the IT team could not configure the network due to poor documentation and a lack of information sharing from the leadership of the network team (Halamka, 2008). Not only had the leadership of the network team poorly documented how the network operated, but this person was also not knowledgeable about best practices for redundant network cores and the prevention of bandwidth monopolization by any one user or application (Halamka, 2008). In the years that followed Y2K, a network was expected to incorporate many kinds of impressive services that were, at the time, troublesome to support. The leadership of an organization's network team has to stay abreast of best practices and remain open to upgrades, rather than assuming, as in simpler times, that the network can support any IP service (Avolio, 2000).
Service Partner Relationship Management. It is recommended that CareGroup establish a collaborative partner relationship with its vendors. At the time of the system outage, there was a minimal relationship between CareGroup and Cisco, the manufacturer of CareGroup's network. A Cisco partner was brought in to document the network but, unfortunately, the network failure occurred before the partner had finished documenting it (Halamka, 2008). Instead of being just another vendor that CareGroup purchased from, Cisco needs to be a partner that is involved in all aspects of CareGroup's network infrastructure. Had CareGroup and Cisco mutually nurtured the relationship, Cisco could have warned Mr. Halamka about the network's vulnerabilities (Halamka, 2008).
Backup Plan
Hundreds of tests conducted during the outage could not be matched with their specimens, forcing staff to retest patients at CareGroup's expense (Applegate et al., 2009). On the one hand, there were so many mismatched specimens because the paper system was not robust enough to move between backup mode and standard mode several times. On the other hand, CareGroup's operations team was given mixed signals to switch from backup mode to standard mode prematurely, several times, before the network's stability had been assured (Applegate et al., 2009). When the data are intact but inaccessible, the end goal is to be able to both send and receive data electronically, not just record it on paper.
One solution to the problem, and one preferable for stable cash flow and business continuity, is for CareGroup's operations to be carried out in backup mode via dial-up modems. While the interaction would be far slower than in standard mode, it is faster than the paper system, and the network would have remained operational (Austin & Henningsson, 2013). In the early 2000s, when the outage took place, dial-up modems for connecting to networks were prevalent. The benefit of dial-up connectivity over the paper system is that both backup mode and standard mode remain digital, with data collected and processed electronically in both modes, not 48 hours after returning to standard mode, as was the case in November 2002 (Applegate et al., 2009). Far fewer, if any, specimen mismatches would likely result. After the outage, CareGroup obtained extra analog telephone capacity and added modem capabilities to 50 computers (Austin & Henningsson, 2013).
Laptops, tablets, smartphones, and Wi-Fi hotspot connectivity are prevalent today in comparison to dial-up modems and would be a more realistic investment in 2013. Cisco, the manufacturer of CareGroup's network, projected that cloud workloads would grow from just 21% of all data-center workloads in 2010 to 57% by 2015 (Williams, 2012). Cloud computing is therefore more realistic in 2013 than dial-up modems and analog telephone capacity. Allowing essential staff at CareGroup to access data stored remotely via cloud backup is a more sustainable backup plan and would take the place of analog telephones and outdated modem capabilities on computers (Williams, 2012).
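As an illustration of how such a cloud backup could work, the sketch below mirrors a completed lab result to cloud object storage so that essential staff could still retrieve it over any available internet connection if the on-site network failed. It uses the AWS SDK for Python as one example of a cloud storage interface; the bucket name, record format, and key layout are assumptions made for this sketch, not details of CareGroup's environment.

# A minimal sketch (illustrative only): copy one lab result record to
# off-site cloud object storage as it is completed, so the data remain
# readable even if the hospital network is down.
import json
from datetime import datetime, timezone

import boto3  # AWS SDK for Python; any comparable cloud storage SDK would do

s3 = boto3.client("s3")
BACKUP_BUCKET = "caregroup-lab-results-backup"  # hypothetical bucket name


def backup_lab_result(specimen_id: str, result: dict) -> str:
    """Copy one lab result to off-site storage, keyed by specimen ID and time."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"lab-results/{specimen_id}/{timestamp}.json"
    s3.put_object(
        Bucket=BACKUP_BUCKET,
        Key=key,
        Body=json.dumps(result).encode("utf-8"),
        ContentType="application/json",
    )
    return key


if __name__ == "__main__":
    # Example: back up a single fictional result record.
    backup_lab_result("SPEC-12345", {"test": "CBC", "status": "complete"})

Restoring during an outage would then mean reading the most recent object for each specimen from the bucket, rather than waiting 48 hours to re-enter paper records after the network returns to normal.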
Conclusion
A well-intentioned researcher who was testing a homemade software application on the CareGroup network triggered the system outage in November 2002. The overall network was a complicated blend of networks because of the way the networks were connected each time a new hospital was added to the CareGroup system (Applegate et al., 2009). In combination with this trigger, the main problems were an outdated and unclear backup plan, a security policy that tolerated staff using untested software, a lack of documentation of the network's inner workings, a wealth of knowledge about the network's inner workings held by one person, and a fledgling relationship with Cisco (Halamka, 2008). The recommendations to avoid future issues are to implement usage policy statements outlining users' roles and responsibilities with regard to security, act swiftly in removing personnel who are not performing to expectations, train the entire IT team in all functions of the IT department so that there are multiple points of human contact knowledgeable about the inner workings of the network's systems, establish a collaborative partner relationship with vendors, and contact the vendor promptly when an issue arises (Halamka, 2008).
After the outage, CareGroup obtained analog telephone capacity and added modem capabilities to dozens of computers (Austin & Henningsson, 2013). This backup plan was more current than CareGroup's earlier backup plan of paper forms, a carryover from Y2K. However, a more prevalent backup plan in 2013 is to allow essential staff at CareGroup access to data stored remotely via cloud backup. While dial-up modems were an update to the paper system in 2002 (Austin & Henningsson, 2013), dial-up computing would be out of date in 2013.
References
Applegate, L., Austin, R., & Soule, D. (2009). Corporate information strategy and management: Text and cases (8th ed., pp. 322-338).
Austin, R., & Henningsson, S. (2013). CareGroup case discussion: IT risk management and robust operations. Copenhagen Business School, April 2013.
Avolio, F. M. (2000). Network Computing for IT and by IT: Best practices in network security. Retrieved from http://www.networkcomputing.com/1105/1105f2.html
Berinato, S. (2003). CIO: All systems down. Retrieved from http://www.cio.com.au/article/65115/all_systems_down/
Cisco Systems, Inc. (2005). Network security policy: Best practices white paper. Retrieved from http://www.cisco.com/en/US/tech/tk869/tk769/technologies_white_paper09186a008014f945.shtml#t1
Halamka, J. (2008). Life as a Healthcare CIO: The CareGroup network outage. Retrieved from http://geekdoctor.blogspot.com/2008/03/caregroup-network-outage.html
Walters, C. (2001). HR Works, Inc.: The costs of not firing a mediocre employee. Retrieved from http://www.hrworks-inc.com/our-solutions/recruiting/recruiting-articles/104-the-costs-of-not-firing-a-mediocre-employee
Williams, S. (2012). The Motley Fool: 5 companies to play the cloud computing revolution. Retrieved from http://www.fool.com/investing/general/2012/08/13/5-companies-to-play-the-cloud-computing-revolution.aspx