Satellite OBC Selection Guide (VII): Fault Tolerance – Addressing Failures Caused by Space Radiation

Radiation in the space environment (e.g., cosmic rays, solar protons) can trigger Single Event Effects (SEE) in OBC electronic components. These include:

- Single Event Upset (SEU): a particle flips a stored bit and corrupts memory or register data.
- Single Event Latchup (SEL): a high-current state that can burn out the component if power is not removed.
- Single Event Functional Interruption (SEFI): a device stops operating correctly (for example, it hangs or resets) until it is reset or power-cycled.

The OBC’s fault tolerance is therefore critical to the reliability of the satellite mission.
 
 

Implementation Layers of OBC Fault Tolerance: Hardware and Software

 

OBC fault-tolerant design must cover both hardware and software to form a complete fault-protection system:
 

1. Hardware Layer: Fault-Tolerant Design from Components to Architecture

 

  • Triple Modular Redundancy (TMR): In the FPGA (Field-Programmable Gate Array), each logic function is implemented by three identical modules, and a voting circuit outputs the majority result. If radiation makes one module produce a wrong value, the two correct modules outvote it, so the logic continues to operate correctly. For example, the FPGA of STM MICROSATPRO uses TMR to mitigate logic errors caused by SEU.
  • Error Detection and Correction (EDAC): EDAC circuits are integrated with memory (e.g., RAM, flash memory) to check data integrity in real time. When radiation flips a stored bit (SEU), the EDAC circuit automatically detects and corrects the error, preventing corrupted data from affecting OBC operation. EDAC is a standard fault-tolerance technique for OBC memory and is especially valuable in long-duration missions, where the total number of upsets grows with mission time.
  • Latchup Current Limiter (LCL): An LCL is added to the OBC’s power supply unit to protect against Single Event Latchup (SEL). SEL causes a sharp rise in a component’s supply current that can burn out the device. When the current exceeds a set threshold, the LCL quickly cuts off power. This protects the component and clears the latch-up. The LCL then restores power so the component can resume operation.
  • Hardware Redundancy: Core components (e.g., processors, power modules) are duplicated. For example, two identical processors are fitted: one is active while the other is on standby. If the active processor fails, control switches to the standby processor so that tasks can continue with minimal interruption.
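
In an FPGA, TMR is built in logic gates, but the behavior of the voter can be modeled in a few lines. The following is a minimal Python sketch for illustration only, not HDL or flight code. The function names `tmr_vote` and `tmr_mismatch` are hypothetical. It shows a bitwise 2-of-3 majority vote, plus a mask that flags where the three copies disagree.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies of a word.

    Each output bit takes the value held by at least two of the inputs,
    so a flipped bit in any single copy is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


def tmr_mismatch(a: int, b: int, c: int) -> int:
    """Mask of bit positions where the three copies do not all agree.

    A non-zero mask means at least one copy differs from the others; a design
    could use such a signal to log the upset or trigger repair (assumption).
    """
    return (a ^ b) | (a ^ c)


if __name__ == "__main__":
    good = 0b1011_0010
    upset = good ^ 0b0000_0100  # copy B suffers an SEU on bit 2
    voted = tmr_vote(good, upset, good)
    print(f"voted={voted:#010b} mismatch={tmr_mismatch(good, upset, good):#010b}")
    # voted=0b10110010 mismatch=0b00000100
```

The voter masks one faulty copy at a time. If two copies end up wrong in the same bit, the vote is wrong. This is why TMR is commonly combined with periodic repair (scrubbing) of the redundant copies.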
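
The detect-and-correct step of EDAC can be illustrated with a Hamming(7,4) code, the simplest single-error-correcting code. Three check bits protect four data bits, and the syndrome computed on read gives the position of a flipped bit. This is a minimal sketch under that assumption, with hypothetical function names. Practical OBC EDAC typically protects whole memory words with wider SEC-DED codes implemented in the memory controller.

```python
# Positions 1..7 of a Hamming(7,4) codeword; parity bits sit at positions 1, 2 and 4.
DATA_POSITIONS = (3, 5, 6, 7)


def _syndrome(word: int) -> int:
    """XOR of the (1-based) positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit pos-1 holds position pos)."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    s = _syndrome(word)
    for p in (1, 2, 4):  # set parity bits so the overall syndrome becomes 0
        if s & p:
            word |= 1 << (p - 1)
    return word


def hamming74_decode(word: int) -> tuple[int, int]:
    """Correct up to one flipped bit; return (data nibble, error position or 0)."""
    s = _syndrome(word)
    if s:
        word ^= 1 << (s - 1)  # a non-zero syndrome is the position of the flipped bit
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> (pos - 1)) & 1:
            nibble |= 1 << i
    return nibble, s


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    read_back = stored ^ 0b100  # an SEU flips the bit at position 3
    print(hamming74_decode(read_back))  # (11, 3)
```

With two flipped bits in the same word, this code miscorrects. That is one reason practical EDAC adds a double-error-detection bit (SEC-DED) and periodically rewrites (scrubs) memory before errors accumulate.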
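
The LCL itself is an analog circuit, but its trip-and-restore behavior can be modeled as a small state machine sampled at a fixed rate. The class name, threshold and off-time below are hypothetical, chosen only to show the sequence:

1. Trip on overcurrent.
2. Hold the load unpowered long enough for the latch-up to clear.
3. Re-enable the load.

```python
from dataclasses import dataclass, field


@dataclass
class LatchupCurrentLimiter:
    """Behavioral model of an LCL protecting one load (illustrative only)."""
    trip_threshold_ma: float  # hypothetical trip level in mA
    off_time_samples: int     # hypothetical number of samples to hold power off
    powered: bool = True
    trips: int = field(default=0, init=False)
    _off_remaining: int = field(default=0, init=False)

    def step(self, measured_ma: float) -> bool:
        """Process one current sample; return True if the load is powered afterwards."""
        if self.powered:
            if measured_ma > self.trip_threshold_ma:
                # Overcurrent: cut power. Removing power is what clears a latch-up.
                self.powered = False
                self._off_remaining = self.off_time_samples
                self.trips += 1
        else:
            # While off, count down the hold time, then re-apply power.
            self._off_remaining -= 1
            if self._off_remaining <= 0:
                self.powered = True
        return self.powered


if __name__ == "__main__":
    lcl = LatchupCurrentLimiter(trip_threshold_ma=300.0, off_time_samples=3)
    samples = [120, 130, 900, 0, 0, 0, 125]  # the 900 mA sample models an SEL
    print([lcl.step(ma) for ma in samples])
    # [True, True, False, False, False, True, True]
```

This model restores power automatically, as described above. Some current limiters instead stay off until a command re-enables them.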
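
The switch-over decision for a redundant processor pair can be sketched as a small supervisor. It watches a health signal from the active unit and hands control to the standby unit when that signal reports a failure. Here the health signal is a hypothetical boolean, for example the result of a heartbeat check, and the unit names are placeholders. This is an illustrative model only. In a real OBC this logic must not depend on the processor it monitors, otherwise a failed processor could block its own replacement.

```python
class RedundantPair:
    """Supervisor for an active/standby processor pair (illustrative model)."""

    def __init__(self, names: tuple[str, str] = ("CPU-A", "CPU-B")) -> None:
        self.names = names
        self.active = 0  # index of the unit currently in control
        self.failed: set[int] = set()
        self.switchovers: list[tuple[str, str]] = []

    def check(self, active_healthy: bool) -> str:
        """Evaluate one health report from the active unit; return the unit now in control."""
        if not active_healthy:
            self.failed.add(self.active)
            standby = 1 - self.active
            if standby in self.failed:
                raise RuntimeError("no healthy unit left")
            self.switchovers.append((self.names[self.active], self.names[standby]))
            self.active = standby
        return self.names[self.active]


if __name__ == "__main__":
    pair = RedundantPair()
    print([pair.check(h) for h in (True, True, False, True)])
    # ['CPU-A', 'CPU-A', 'CPU-B', 'CPU-B']
```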

2. Software Layer: Fault Recovery and Protection

 

  • Watchdog Timer (WDT): The WDT is the last line of defense against software faults. It is a timer that counts down from a preset value (e.g., 30 seconds). During normal operation, the OBC software must periodically reset the WDT ("kick" it) to restart the countdown. Radiation or a software fault may leave the program stuck in an infinite loop or otherwise unable to reset the WDT in time. In that case the countdown expires, and the WDT triggers a restart that returns the OBC to normal operation.
  • Software Redundancy: A backup copy of the software is kept in the OBC’s memory. For example, the main memory stores the running software image and a backup memory stores an identical copy. If radiation damages the image in main memory (detected, for example, by a checksum mismatch), the software can be loaded from the backup memory instead, keeping the OBC operational.
  • Task Monitoring and Retry: A task-monitoring module in the software tracks the execution status of core tasks (e.g., data compression, attitude control) in real time. If a task times out or fails, the monitoring module triggers a retry so that a transient fault does not prevent the task from completing.
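
The countdown-and-kick behavior of a WDT can be modeled with a simulated clock. This is a minimal Python sketch: the class and method names are hypothetical, and the 30-second timeout is the example value from the text. A real WDT is an independent hardware counter, not code running on the processor it supervises.

```python
class WatchdogTimer:
    """Simulated watchdog: fires if not kicked within timeout_s (illustrative model)."""

    def __init__(self, timeout_s: float = 30.0) -> None:
        self.timeout_s = timeout_s
        self.remaining_s = timeout_s
        self.resets = 0

    def kick(self) -> None:
        """Called by healthy software to restart the countdown."""
        self.remaining_s = self.timeout_s

    def tick(self, dt_s: float) -> bool:
        """Advance simulated time by dt_s; return True if the watchdog fired a restart."""
        self.remaining_s -= dt_s
        if self.remaining_s <= 0:
            self.resets += 1
            self.remaining_s = self.timeout_s  # the restart re-arms the watchdog
            return True
        return False


if __name__ == "__main__":
    wdt = WatchdogTimer(timeout_s=30.0)
    healthy = []
    for _ in range(10):  # healthy software kicks every 10 s
        healthy.append(wdt.tick(10.0))
        wdt.kick()
    hung = [wdt.tick(10.0) for _ in range(3)]  # software hangs: no more kicks
    print(any(healthy), hung, wdt.resets)  # False [False, False, True] 1
```

A common refinement is to kick the watchdog only after every monitored task has reported progress. That way, a single stuck task also leads to a restart.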
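
Choosing between the primary and backup software images requires some way to tell that an image is damaged. The following is a minimal sketch, assuming each image is stored with a CRC-32 computed when it was written. The checksum choice, the function name `select_boot_image` and the image contents are hypothetical. The loader takes the first image whose CRC still matches and otherwise falls back to the next copy.

```python
import zlib


def select_boot_image(images: list[tuple[bytes, int]]) -> int:
    """Return the index of the first image whose CRC-32 matches its stored value.

    `images` is ordered by preference: primary first, then backups.
    """
    for index, (payload, stored_crc) in enumerate(images):
        if zlib.crc32(payload) == stored_crc:
            return index
    raise RuntimeError("no intact software image available")


if __name__ == "__main__":
    image = b"flight software build 1"  # stand-in for a real binary image
    crc = zlib.crc32(image)              # recorded when the image was stored
    damaged = bytearray(image)
    damaged[5] ^= 0x01                   # simulate a radiation-induced bit flip
    print(select_boot_image([(bytes(damaged), crc), (image, crc)]))  # 1 (boot from backup)
```

After booting from the backup, the OBC can rewrite the primary copy from the intact one so that both copies are available again.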
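
A task monitor with bounded retries can be sketched as a wrapper. It treats any of the following as a failed attempt:

- the task raises an exception;
- the task returns False;
- the task overruns its deadline.

The names and limits are hypothetical. This sketch only detects an overrun after the task returns, whereas a monitor on a real-time OS would pre-empt or abort a hung task. After the last retry it raises an exception so the caller can escalate, for example to a safe mode.

```python
import time
from typing import Callable


def run_with_retry(task: Callable[[], bool], max_attempts: int = 3,
                   deadline_s: float = 1.0) -> int:
    """Run `task` until it succeeds within `deadline_s`; return the successful attempt number.

    An attempt fails if the task raises, returns False, or takes longer than
    `deadline_s` (checked after it returns). Raises RuntimeError when every
    attempt fails, so the caller can escalate.
    """
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            ok = bool(task())
        except Exception:
            ok = False
        if ok and time.monotonic() - start <= deadline_s:
            return attempt
    raise RuntimeError(f"task failed after {max_attempts} attempts")


if __name__ == "__main__":
    calls = 0

    def flaky_compression() -> bool:
        """Hypothetical task that fails twice (e.g., transient upsets) before succeeding."""
        global calls
        calls += 1
        return calls >= 3

    print(run_with_retry(flaky_compression, max_attempts=5))  # 3
```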

Typical Fault-Tolerant OBC Example: STM MICROSATPRO

 

STM MICROSATPRO is an OBC with a high level of fault tolerance that combines the hardware and software techniques described above:

  • Uses a fault-tolerant FT-LEON3 processor core with built-in radiation-tolerance features.
  • The FPGA implements TMR to mitigate logic errors.
  • Memory is protected by EDAC, which corrects bit flips.
  • The power supply unit includes an LCL to protect against SEL.
  • The software layer provides WDT and software redundancy for automatic fault recovery.
 
In summary, OBC fault tolerance requires a combination of hardware and software. Hardware design contains faults at the source before they can propagate. Software design ensures rapid recovery from faults that do occur. Together they keep the satellite operating reliably in a radiation environment.
