The role of fault management in the embedded system design

Abstract: In the last decade, the world of telecommunications has seen the value of services definitively established and the value of connectivity decline. This change in how the network (and its available hardware resources) is used has led to continuous, seemingly unlimited growth in data traffic, increased income for service providers, and a steady erosion of operators' income from voice and Short Message Service (SMS) traffic.

The change in mobile service consumption is evident to operators. Today the market is in the hands of over-the-top (OTT) media content delivery companies (Google, Meta, Netflix, Amazon, etc.). The fifth generation of mobile networks (5G), the latest generation of mobile architecture, is essentially the means by which operators can invest in system infrastructure to take part in this prosperous service business.

With the advent of 5G, the worlds of cloud and telecommunications have found their meeting point. This is paving the way for new infrastructures and services, such as smart cities, Industry 4.0, Industry 5.0, and Augmented Reality (AR)/Virtual Reality (VR). People, infrastructures, and devices are being connected to provide services that we can barely imagine today. However, such a highly interconnected system requires high levels of reliability and resilience.

Hardware reliability has improved since the 1990s. At the same time, new technologies in the nanometer domain and the growing complexity of on-chip systems have made fault management critical. It is needed to guarantee the quality of the service offered to the customer and the sustainability of the network infrastructure.

Our first contribution in this thesis is a review of the fault management implementation framework for the radio access network (RAN) domain. Our approach introduces a holistic vision of fault management that places increasing emphasis on the recovery action, the central target of the proposed framework. A further contribution reinforces this focus on recovery. We revisited the taxonomy of faults in mobile systems to improve the outcome of the recovery action. In our view, this action must be propagated across the different layers of an embedded system (hardware, firmware, middleware, and software). Applying the new framework and the new taxonomy in practice led to a distinctive contribution of this thesis: a new algorithm for managing system memory errors, both temporary (soft) and permanent (hard).

The holistic vision of error management introduced in this thesis involves hardware that proactively manages faults. Fault management can only be implemented efficiently if the hardware design takes error-handling techniques and methodologies into account. Another contribution of this thesis is therefore the definition of fault management requirements for RAN embedded system hardware design.

Another primary function of the proposed fault management framework is fault prediction. Recognizing error patterns lets the system react in time, even before an error condition occurs. It also lets the system identify the topology of the error and apply more targeted, and therefore more efficient, recovery actions. Operating temperature is always a critical characteristic of embedded RAN systems, because base stations must work under very different temperature conditions. The working temperature also directly affects the system's probability of error. For this reason, we also contribute a machine-learning algorithm for predicting the working temperature of base stations in radio access networks. This is a first step towards a more sophisticated implementation of error prevention and prediction.