HCAL Operations

HCAL Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/HCALWikiHome

HCAL Contact List: https://twiki.cern.ch/twiki/bin/view/CMS/HcalContactList

DOC on call: https://twiki.cern.ch/twiki/bin/view/CMS/HcalDOCHowTo

DAQ info: https://indico.cern.ch/event/519748/

On Call Cheat Sheet

Setting up Tunnels

Follow the HCAL Connectivity twiki to get tunnels working for the lxplus, 904, and p5 networks. On Windows, the best choice is to run Linux in VirtualBox. If you are already at CERN and connecting to lxplus from a Windows computer, PuTTY works well too.

Setup Environment

Bash files

Snippets

Getting the HCAL config files at Point 5 (should be the same for 904):
$ cfgcvs checkout HcalCfg

CVS is very similar to svn:
$ cvs commit -m "Comment here"
$ cvs tag -F pro DTC.cfg
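The commit-then-tag sequence can be wrapped in a small helper. This is a minimal sketch, not an existing HCAL script: the names run and publish_cfg and the DRY_RUN switch are made up for illustration, and it assumes you are inside a checked-out HcalCfg directory. The -F option of cvs tag moves the pro tag if it already exists.

```shell
# Hypothetical helper (not part of ~hcalsw/bin): commit one file and move the
# "pro" tag to the new revision. Set DRY_RUN=1 to only print the commands.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

publish_cfg() {
    file=${1:-}
    message=${2:-}
    if [ -z "$file" ] || [ -z "$message" ]; then
        echo "usage: publish_cfg <file> <commit message>" >&2
        return 2
    fi
    # Only tag if the commit succeeded.
    run cvs commit -m "$message" "$file" || return 1
    run cvs tag -F pro "$file"
}
```

For example, DRY_RUN=1 publish_cfg DTC.cfg "Comment here" prints the two cvs commands without running them, which is a cheap check before touching the pro tag.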

MCH Commands

$ telnet hcal-mch-20 (host names follow the hcal-card-crate-additionaloption pattern used in HcalCfg/uTCA/connection.xml)
>> show_fru
(State M4 means the card is on)
>> shutdown <fru_number>
(Run show_fru again to make sure the state of that particular card goes to M1)
>> fru_start <fru_number>
 
>> exit
 
 
To shut down all cards:
>> shutdown all
To reboot (this also restarts the MCH, so you will be disconnected; wait ten seconds or so, then telnet back in to check the status):
>> reboot

Run Control Websites

P5: http://cmsrc-hcal.cms:16000/rcms/gui/servlet/RunGroupChooserServlet

CMS TOP: http://cmsrc-top.cms:10000/rcms/gui/servlet/RunGroupChooserServlet

904: http://cms904rc-hcal.cms904:16000/rcms/gui/servlet/RunGroupChooserServlet

H2: http://cmshcalTB02:16000/rcms/gui/servlet/RunningConfigurationServlet

Building 28: http://cmshcal21:16000/rcms/gui/servlet/RunningConfigurationServlet

The Configuration Chooser sets up the run. Once on the configuration screen, go through these steps in order: set enable parameters, initialize, configure, start. Stop the run before killing it.

uHTR

The uHTR tool is pretty straightforward. The main thing to watch out for is that the back and front FPGA firmware must not be mixed up (and they can be…). The front and back images should also match in both the detector part (HF, HBHE) and the number after it, which is the link speed (1600 = 1.6 Gbps, 4800 = 4.8 Gbps). An example of a matching pair:
uhtr_front_HBHE1600_1_00_07.mcs.xz
uhtr_back_HBHE1600_1_00_00.mcs.xz
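The front/back matching rule can be checked from the file names before flashing. The sketch below is only an illustration: the function names are hypothetical, and it assumes every firmware file follows the uhtr_<side>_<flavor>_<version> naming seen in the two example files.

```shell
# Which FPGA the image is for ("front" or "back"), taken from the file name.
fw_side() {
    basename "$1" | cut -d_ -f2
}

# Detector + link speed (e.g. "HBHE1600"), taken from the file name.
fw_flavor() {
    basename "$1" | cut -d_ -f3
}

# Succeeds only if the first file is a front image, the second is a back
# image, and both carry the same detector/speed flavor.
check_uhtr_pair() {
    front=${1:-}
    back=${2:-}
    if [ "$(fw_side "$front")" != "front" ] || [ "$(fw_side "$back")" != "back" ]; then
        echo "ERROR: expected a front image first and a back image second" >&2
        return 1
    fi
    if [ "$(fw_flavor "$front")" != "$(fw_flavor "$back")" ]; then
        echo "ERROR: flavor mismatch: $(fw_flavor "$front") vs $(fw_flavor "$back")" >&2
        return 1
    fi
    echo "OK: $(fw_flavor "$front")"
}
```

With the example pair, check_uhtr_pair uhtr_front_HBHE1600_1_00_07.mcs.xz uhtr_back_HBHE1600_1_00_00.mcs.xz prints OK: HBHE1600. The version numbers are deliberately not compared, since the example pair differs there.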

Correct use for P5/904:
$ uHTRtool.exe -c crate:slot (for example: uHTRtool.exe -c 52:10)
It is not yet clear what the shell script (uHTRtool.sh) does.

AMC13

Connecting to an AMC13: There is a shell script at p5 and 904 that will use the connection file and then the -i option:

AMC13Tool2.exe -c ~hcalsw/uTCA.connections.pro.xml -i hcal.crate$1.amc13 ${@:2}

This is the p5 script. The 904 version is almost identical, but the connection file is located at hcalsw/HcalCfg/uTCA/connections.xml. To connect, just run:

$ ~hcalsw/bin/AMC13Tool2.sh crate#
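To make the argument handling of the p5 wrapper explicit: the first argument is the crate number, and everything after it is passed through to AMC13Tool2.exe. The sketch below only prints the resulting command line and does not run the tool; amc13_cmdline is a hypothetical name, and shift is used here in place of the ${@:2} expansion in the real script.

```shell
# Print the command line that the p5 wrapper would build for a given crate.
# Illustration only: nothing is executed.
amc13_cmdline() {
    if [ $# -lt 1 ]; then
        echo "usage: amc13_cmdline <crate> [extra AMC13Tool2.exe arguments]" >&2
        return 2
    fi
    crate=$1
    shift   # what remains corresponds to ${@:2} in the wrapper script
    cmd="AMC13Tool2.exe -c ~hcalsw/uTCA.connections.pro.xml -i hcal.crate${crate}.amc13"
    if [ $# -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}
```

For crate 52, amc13_cmdline 52 shows that the wrapper selects the connection ID hcal.crate52.amc13 from the connection file.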

6/24/16: To add the AMC13Tool2.sh script to hcalsw, commands were run as the hcalsw user:
$ sudo -u hcalsw then_a_command
Example to edit the shell script:
$ sudo -u hcalsw emacs AMC13Tool2.sh

Log Files

Go to the 904 or p5 network and ssh to the run control machine:
$ ssh cms904rc-hcal (for 904)
$ ssh cmsrc-hcal (for p5)

Run the Handsaw script (located in ~hcalsw/bin/):
$ Handsaw.pl /var/log/rcms/hcalpro/Logs_hcalpro.xml
$ tail -f /var/log/rcms/hcalpro/Logs_hcalpro.xml | Handsaw.pl

The tail version streams the errors as they arrive and is probably the best one to watch.

Parsing through old logs: Copy the log file you want to look at. Old logs are usually compressed (.gz). To unpack them:
$ gunzip file.gz

Once you have the log file, you can use Handsaw and less to look through it:
$ Handsaw.pl logfile.xml | less -R
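The unpack-then-page steps can also be done in one pipeline that leaves the .gz file untouched, since Handsaw.pl reads from standard input (as in the tail example). This is a sketch under assumptions: view_log, HANDSAW, and LOG_PAGER are hypothetical names, and Handsaw.pl is assumed to be on your PATH.

```shell
# Hypothetical helper: page through a plain or gzipped log with Handsaw.
# HANDSAW and LOG_PAGER can be overridden; the defaults follow the commands
# used in this section.
view_log() {
    log=${1:-}
    if [ -z "$log" ] || [ ! -r "$log" ]; then
        echo "usage: view_log <logfile.xml | logfile.xml.gz>" >&2
        return 2
    fi
    case "$log" in
        *.gz) gunzip -c "$log" ;;   # decompress to stdout, keep the .gz intact
        *)    cat "$log" ;;
    esac | ${HANDSAW:-Handsaw.pl} | ${LOG_PAGER:-less -R}
}
```

Usage would be view_log followed by the path to your copy of the log. Working on a copy, as described above, is still the safer habit.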

Elog + Shiftlist

Main Page: https://cmsonline.cern.ch/webcenter/portal/cmsonline/pages_common?wc.contentSource=

Direct to Elog: https://cmsonline.cern.ch/webcenter/portal/cmsonline/pages_common/elog

Elog: Common -> Elog -> Subsystems -> Hcal -> Hcal, Hcal904, etc

Shiftlist: Common -> Shiftlist

System Manager

This long-lived application writes IP addresses to the uTCA cards based on their crate and slot. It should detect when a card is moved or power cycled and write the new IP address. This does not work properly for the AMC13 yet; the application needs to be restarted if an AMC13 is exchanged or moved.

904:

P5: (Use on hcalutca01)

$ sudo systemctl restart sysmgr.service
Outdated:
$ ~hcalsw/bin/restart_sysmgr.sh
$ sudo -u hcalpro ~hcalpro/scripts/Service_fix.sh
$ sudo -u hcalpro sysmgr ~hcalsw/config_files/Sysmgr/sysmgrHCAL.conf

Random

Killing Stale Xdaq Processes

When a run is destroyed, some of its processes may be left behind. These will interfere with the next run and cause errors on initialize.

Processes NOT to kill:

root 35246 18.7 7.3 6499672 2399100 ? Ssl Apr20 11439:33 /opt/xdaq/bin/xdaq.exe -h srv-s2f17-19-01.cms -p 9950 -u file.append:/var/log/hcal.xaad.log -e /opt/xdaq/share/hcal/profile/xaad.profile -z hcal

root 35247 8.2 0.2 4319652 91404 ? Ssl Apr20 5045:51 /opt/xdaq/bin/xdaq.exe -h srv-s2f17-19-01.cms -p 9999 -u file.append:/var/log/hcal.jobcontrol.log -e /opt/xdaq/share/hcal/profile/jobcontrol.profile -z hcal

Examples of stale xdaq processes to kill:

hcalpro 41955 9.1 0.2 4261576 93072 ? Sl 22:37 1:01 /opt/xdaq/bin/xdaq.exe -h hcalutca01.cms -p 15002 -s 294342 -u xml://cmsrc-hcal.cms:16010 -l INFO

hcalpro 13038 4.1 0.3 2588468 122480 ? Sl Sep15 71:14 /opt/xdaq/bin/xdaq.exe -h hcalutca01.cms -p 16789 -u xml://cmsrc-hcal.cms:16010 -l INFO

Do not kill processes that run under root, such as job control, controlhub, xaad, etc. Be careful with pkill and similar tools that match on the name xdaq, since they will pick up those processes as well.
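One way to avoid matching the root-owned processes is to filter the ps aux output by owner before deciding what to kill. The sketch below is an assumption-laden illustration, not an existing script: stale_xdaq_candidates is a hypothetical name, and it treats every non-root xdaq.exe process as a candidate, which you still need to check by hand against any run that is currently active.

```shell
# Read `ps aux` output on stdin and print "PID USER PORT" for every xdaq.exe
# process that is NOT owned by root. The port is the value after "-p".
# These are only candidates: confirm each one before killing it.
stale_xdaq_candidates() {
    awk '$1 != "root" && /\/xdaq\.exe/ {
        port = "?"
        for (i = 1; i < NF; i++) if ($i == "-p") port = $(i + 1)
        print $2, $1, port
    }'
}
```

Run it as ps aux | stale_xdaq_candidates, then kill the listed PIDs one at a time instead of using pkill on the name.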

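$ ~hcalsw/bin/dump_all.sh
See if the relevant card is pingable. If a uHTR or AMC13 is not pingable, restart sysmgr using the script on the hcalutca01 machines (note that the System Manager section above lists this script as outdated in favor of sudo systemctl restart sysmgr.service):
$ ~hcalsw/bin/restart_sysmgr.sh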

Basically ~hcalsw/bin/ contains all the magical scripts to do most things.

Crate Locations and FEDS: http://cmsdoc.cern.ch/cms/HCAL/document/CountingHouse/Crates/Crate_interfaces_transition.htm

Dump: ~hcalsw/bin/dump_all.sh

$ ps aux (just to check what is there)
If there are no stale xdaq processes, it should be fine to just run:
$ sudo service xdaqd restart

Otherwise, stop the service, clean up, and start it again:
$ sudo service xdaqd stop
$ pgrep xdaq (just to check what is there)
$ pkill xdaq
$ sudo service xdaqd start

THESE CHANGE QUITE OFTEN:

Restart Services 904

I have begun working on a service fix script at 904. To run it, type:

sudo -u hcalpro ~hcalpro/scripts/Service_fix.sh

The currently functioning options are:

-tomcat -sysmgr

An option that may be functioning is:

-ccmserver

Please use this script and email or elog any issues or desired functionality.

Restart System Manager p5

Here is the command to start the system manager. It should be used sparingly. In most cases (such as swapping uHTRs), the system manager will not need to be restarted.

sudo -u hcalpro sysmgr ~hcalsw/config_files/Sysmgr/sysmgrHCAL.conf

-- DanielArcaro - 21 Jun 2017


Topic revision: r6 - 21 Jul 2017 - DanielArcaro
 