The details depend on the alarm facilities available. In general, the values we might want to generate alarms on fall into the following categories:

  1. DC values with hi/lo limits (e.g. voltages, temperatures)
  2. "Should never happen" error conditions (e.g. "L1A received while in Busy state" or "EvN mismatch")
  3. "Rate metered" conditions, e.g. "L1A received while in OFW state", which may count at a low rate without indicating a problem
  4. Things which should (approximately) match, e.g. STATUS.GENERAL.L1A_COUNT and STATUS.LSC.SFPx.BUILT_EVENT_COUNT.
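For category 4, a minimal sketch of an "approximately equal" comparison between two counters. The function name and the tolerance defaults are assumptions for illustration; suitable values depend on how many events can be in flight between the two reads.

```python
def counts_match(a, b, rel_tol=0.001, abs_tol=10):
    """Return True if two event counters agree within a tolerance.

    abs_tol absorbs events in flight between the two register reads;
    rel_tol scales the allowance for large counts.
    Both defaults are placeholders, not tuned values.
    """
    return abs(a - b) <= max(abs_tol, rel_tol * max(a, b))


# Hypothetical snapshot values, e.g. STATUS.GENERAL.L1A_COUNT
# versus STATUS.LSC.SFPx.BUILT_EVENT_COUNT
l1a_count = 1_000_000
built_count = 999_996
print(counts_match(l1a_count, built_count))
```

Taking the larger of the absolute and relative allowances keeps the check from being too strict at the start of a run, when both counts are small.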

The alarms currently being used are defined here: https://svnweb.cern.ch/cern/wsvn/cmshcos/trunk/hcalAlarm/conf/

Hardware alarm conditions

These are unrelated to any run in progress and indicate possible hardware failures on the board. We can establish high and low limits for both warning and error levels.

STATUS.FPGA.DIE_TEMP      V6 die temperature in units of 0.1 degrees Celsius
STATUS.FPGA.MV_0V75_VREF  0.75V DDR3_Vref power voltage in millivolts
STATUS.FPGA.MV_0V75_VTT   0.75V DDR3_Vtt power voltage in millivolts
STATUS.FPGA.MV_12V0       12V power voltage in millivolts
STATUS.FPGA.MV_1V0        1.0V analog power voltage in millivolts
STATUS.FPGA.MV_1V0_BRAM   1.0V VccBRAM power voltage in millivolts
STATUS.FPGA.MV_1V0_INT    1.0V Vccint power voltage in millivolts
STATUS.FPGA.MV_1V2        1.2V analog power voltage in millivolts
STATUS.FPGA.MV_1V5        1.5V power voltage in millivolts
STATUS.FPGA.MV_1V8_AUX    1.8V VccAux power voltage in millivolts
STATUS.FPGA.MV_1V8_GTX    1.8V VccAuxGTX power voltage in millivolts
STATUS.FPGA.MV_2V0        2.0V VccAuxIO power voltage in millivolts
STATUS.FPGA.MV_2V5        2.5V power voltage in millivolts
STATUS.FPGA.MV_3V3        3.3V power voltage in millivolts
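A two-level hi/lo limit check on these values could be sketched as follows. The limit numbers in the table are placeholders chosen for illustration, not real AMC13 limits, and the function and table names are hypothetical.

```python
from enum import Enum


class Level(Enum):
    OK = 0
    WARNING = 1
    ERROR = 2


def check_limits(value, warn_lo, warn_hi, err_lo, err_hi):
    """Classify a reading against warning and error windows."""
    if value < err_lo or value > err_hi:
        return Level.ERROR
    if value < warn_lo or value > warn_hi:
        return Level.WARNING
    return Level.OK


# (warn_lo, warn_hi, err_lo, err_hi); placeholder numbers only
LIMITS = {
    "STATUS.FPGA.MV_3V3": (3200, 3400, 3135, 3465),  # millivolts
    "STATUS.FPGA.DIE_TEMP": (100, 700, 0, 850),      # 0.1 degree Celsius
}


def check_all(readings):
    """Check every reading that has an entry in LIMITS."""
    return {
        name: check_limits(readings[name], *LIMITS[name])
        for name in LIMITS
        if name in readings
    }


print(check_all({"STATUS.FPGA.MV_3V3": 3420, "STATUS.FPGA.DIE_TEMP": 455}))
```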

State timers

These are 64-bit timers (each has _LO and _HI words) that count the total time the AMC13 has spent in the TTS BSY/OFW/SYN states. How to alarm on these is a policy decision, but any time spent in SYN is probably an error.

STATUS.GENERAL.BUSY_TIME_LO        busy time counter
STATUS.GENERAL.OF_WARN_TIME_LO     L1A overflow warning time counter
STATUS.GENERAL.SYNC_LOST_TIME_LO   L1A sync lost time counter
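Reading one of these timers as a single 64-bit value might look like the sketch below. It assumes each word is 32 bits wide and that the _HI register name is the _LO name with the suffix swapped; `read_reg` is a hypothetical callable standing in for whatever register-read facility is available.

```python
def combine_words(lo, hi):
    """Join two 32-bit words into one 64-bit value."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def read_counter64(read_reg, name_lo):
    """Read a 64-bit counter given the name of its _LO word.

    read_reg is a hypothetical function: register name -> int.
    The _HI word is read twice so that a carry out of the _LO word
    between the two reads does not produce a torn value.
    """
    if not name_lo.endswith("_LO"):
        raise ValueError("expected a register name ending in _LO")
    name_hi = name_lo[:-3] + "_HI"
    hi = read_reg(name_hi)
    lo = read_reg(name_lo)
    hi_again = read_reg(name_hi)
    if hi_again != hi:
        # _LO wrapped during the read; take a fresh _LO with the new _HI
        lo = read_reg(name_lo)
        hi = hi_again
    return combine_words(lo, hi)


# Usage with a fake register map in place of real hardware access
fake = {"STATUS.GENERAL.BUSY_TIME_LO": 0x10, "STATUS.GENERAL.BUSY_TIME_HI": 0x1}
print(hex(read_counter64(fake.__getitem__, "STATUS.GENERAL.BUSY_TIME_LO")))
```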

L1A when there shouldn't be any

These count the number of L1As seen in a state where none should arrive. The catch is that the OFW and even the BSY counters are reported to count at a low rate because of excessive latency in the GT response to TTS, so these probably need a software rate meter.

STATUS.GENERAL.L1A_WHEN_BSY_LO  L1A received when in BSY state
STATUS.GENERAL.L1A_WHEN_OFW_LO  L1A received when in OFW state
STATUS.GENERAL.L1A_WHEN_SYN_LO  L1A received when in SYN state
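A software rate meter for these counters could be as simple as the sketch below: it turns successive counter readings into a rate and flags the rate if it exceeds a threshold. The class name and the threshold are assumptions; an acceptable rate would have to be determined from experience with the GT latency.

```python
class RateMeter:
    """Convert successive counter readings into a rate and compare to a limit."""

    def __init__(self, max_rate_hz):
        self.max_rate_hz = max_rate_hz
        self._last = None  # (time in seconds, count) from the previous poll

    def update(self, count, now_s):
        """Return (rate_hz, alarm) for the interval since the previous call.

        The first call only records a baseline and returns (0.0, False).
        """
        prev = self._last
        self._last = (now_s, count)
        if prev is None:
            return 0.0, False
        dt = now_s - prev[0]
        dcount = count - prev[1]
        if dt <= 0 or dcount < 0:
            # Clock went backwards or the counter was reset; start over
            return 0.0, False
        rate = dcount / dt
        return rate, rate > self.max_rate_hz


# Usage: poll a counter such as L1A_WHEN_OFW with a 1 Hz placeholder limit
meter = RateMeter(max_rate_hz=1.0)
meter.update(100, now_s=0.0)
print(meter.update(105, now_s=10.0))
```

In real use `now_s` would come from something like `time.monotonic()`; it is passed in explicitly here so the behaviour is easy to test.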

Errors in the data

STATUS.AMC.AMC_CRC_ERR	AMC event CRC error detected
STATUS.AMC01.BP_CRC_ERR	Backplane link CRC error

AMC links

The counters in this section should all be zero all the time, but we may need a rate-meter limit so that a few counts can pass without raising an alarm.

These indicate problems with the data sent by the uHTR.

EVN_MISMATCH_COUNTER_LO              AMC EvN mismatch counter
ORN_MISMATCH_COUNTER_LO              AMC OrN mismatch counter
BCN_MISMATCH_COUNTER_LO              AMC BcN mismatch counter
BAD_EVENTLENGTH_COUNTER_LO           AMC bad event length counter
TRAILER_EVN_MISMATCH_COUNTER_LO      AMC event trailer EvN mismatch error counter
BAD_AMC_CRC_COUNTER_LO               Bad CRC on event from AMC
AMC_TTS_DISC_COUNTER_LO              TTS state 'disconnected' from AMC
AMC_TTS_SYNC_COUNTER_LO              TTS state 'sync lost' from AMC
AMC_TTS_ERR_COUNTER_LO               TTS state 'error' from AMC

These indicate problems with data received at the AMC13 end of the backplane.

AMC13_EVN_MISMATCH_LO                HTR event EVN mismatch counter
AMC13_BCN_MISMATCH_LO                HTR event BCN mismatch counter
AMC13_ORN_MISMATCH_LO                HTR event ORN mismatch counter
AMC13_BAD_LENGTH_LO                  AMC bad event length counter
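For counters that are expected to stay constant, one option is to watch many of them at once and report only those that moved by more than a small allowance since the last poll. This is a sketch under that assumption; the class name and the allowance value are hypothetical.

```python
class IncrementWatcher:
    """Report counters that increased by more than an allowance between polls."""

    def __init__(self, allowance=0):
        self.allowance = allowance
        self._previous = {}  # register name -> count at the last poll

    def poll(self, snapshot):
        """snapshot maps register name -> current count.

        Returns {name: increment} for counters whose increment since the
        previous poll exceeds the allowance. A counter seen for the first
        time only sets its baseline.
        """
        offenders = {}
        for name, count in snapshot.items():
            prev = self._previous.get(name)
            if prev is not None and count - prev > self.allowance:
                offenders[name] = count - prev
        self._previous.update(snapshot)
        return offenders


# Usage with made-up readings
watcher = IncrementWatcher(allowance=2)
watcher.poll({"EVN_MISMATCH_COUNTER_LO": 0, "AMC13_BAD_LENGTH_LO": 5})
print(watcher.poll({"EVN_MISMATCH_COUNTER_LO": 10, "AMC13_BAD_LENGTH_LO": 5}))
```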

-- EricHazen - 07 Oct 2015

Topic revision: r2 - 09 Oct 2015 - EricHazen
 