Please log AMC13 test activity below, blog style (new entries at top)

2016-02-12, dgastler

It turns out that cms1 is not very fast and is the bottleneck for these speed tests.

Dominique instructed me on how to get the FEROL to drop the data when it lands and now the CPU use has dropped dramatically and the throughput went way up.

Here are the commands to throw away the data:

>lspci -v
        02:03.0 Memory controller: CMS DAQ group Device fea0 (rev 01)
        Flags: 66MHz, slow devsel
        Memory at dd210000 (32-bit, non-prefetchable) [size=64K]
        Memory at dd200000 (32-bit, non-prefetchable) [size=64K]
>sudo busybox devmem 0xdd210000 32 0

With 1952 Byte events, I was able to max out the AMC13's random event rate at 130kHz.

To try to max out the link, I increased the event size until I got backpressure. This happened at FAKE_DATA_SIZE of 0x40 (6560bytes) with a max rate of about 75kHz.

This gives a throughput of 500MBps, which is what we would expect.
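The throughput figure is just event size times event rate. A minimal sketch of that arithmetic, using the numbers quoted in this entry (the function name is made up for illustration, and 1 MB is taken as 10^6 bytes):

```python
# Throughput = event fragment size x event rate.
EVENT_SIZE_BYTES = 6560   # fragment size quoted above for FAKE_DATA_SIZE 0x40
MAX_RATE_HZ = 75_000      # approximate max rate quoted above


def throughput_mb_per_s(event_size_bytes, rate_hz):
    """Return throughput in MB/s, taking 1 MB = 1e6 bytes (assumption)."""
    return event_size_bytes * rate_hz / 1e6


print(throughput_mb_per_s(EVENT_SIZE_BYTES, MAX_RATE_HZ))  # 492.0
```

So "500MBps" here is a rounding of roughly 492 MB/s.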

2016-02-10, dgastler

Stressing the FEROL with data from an amc13 with no writing to disk.

I was seeing a low max rate ~200Hz on 25kB event fragments, so I wanted to investigate.

Double-checking my math for event sizes against both the AMC13's read event and the FEROL's write to disk, I have verified my event fragment size equation for the AMC13's fake data generator:

((AMC13_FAKE_DATA_SIZE+3)*nAMCs + (4+nAMCs))*8 Bytes per event fragment.

Choosing 0x10 for the fake data size gives 1952byte events.
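The equation above can be checked with a few lines of Python. This is only a sketch: the function name is made up, and nAMCs = 12 is an assumption that happens to reproduce the 1952 bytes quoted above.

```python
def fragment_size_bytes(fake_data_size, n_amcs):
    """Event fragment size in bytes, per the equation in this entry."""
    return ((fake_data_size + 3) * n_amcs + (4 + n_amcs)) * 8


# Assuming 12 enabled AMCs:
print(fragment_size_bytes(0x10, 12))  # 1952
```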

Sending this to the FEROL and increasing the rate from 10 Hz, I didn't see any time in L1A_IN_OFW or OVERFLOW_WARNING until setting the AMC13 to 50 kHz.

With these settings the AMC13 reports an effective rate of about 30 kHz.

2016-02-10, dgastler

Work on FEROL on CMS1

To get a fresh start on the xdaq software, I purged cms1 of xdaq using:

% sudo yum erase "daq-*"
% sudo yum clean all

CMS1 is already pointing to the xdaq 12 repo, so after the purge, I did the following to re-install xdaq

% sudo yum groupinstall extern_coretools
% sudo yum groupinstall coretools
% sudo yum groupinstall extern_powerpack
% sudo yum groupinstall powerpack
% sudo yum groupinstall database_worksuite
% sudo yum groupinstall general_worksuite
% sudo yum groupinstall hardware_worksuite

After this, the xpci driver no longer worked. I installed all daq-ferol* and daq-xpci* packages from the repo to no effect.

This is because the installed package was built for another kernel, so we now (unlike before) have to compile the driver from source (https://twiki.cern.ch/twiki/bin/viewauth/CMS/CMSFedkitManual).

At this point the fedKit.py script will run and write to disk most of the time, but only for certain event sizes.

Here is a working AMC13 sequence:

Using AMC13 software ver:41528
>rg
General reset
>rc
Counter reset
>en 1-12 t f
parsed list "1-12" as mask 0xfff
Enabling TTS as TTC for loop-back
Enabling fake data
AMC13 out of run mode
AMC13 is back in run mode and ready
>wv CONF.AMC.FAKE_DATA_SIZE 0x10
Write to register CONF.AMC.FAKE_DATA_SIZE
>daq 1
SFP0 enabled
Best to do a DAQ reset (rd) after changing link settings
>fed 0 100
>lt c
>wv CONF.AMC.FAKE_DATA_SIZE 0x20
Write to register CONF.AMC.FAKE_DATA_SIZE

Increasing the event size worked until 0x100, where the system stopped recording data.

2016-01-21, dgastler

Trying to find the size where the events break.

First, run the fedKit.py with no writing to disk, but set it to dump the next several events "e100".

Then run this script on the connected AMC13

rg
rc
en 1-12 f t
localL1A o 1 1
daq 1
fed 0 100

#size in bytes: ((( CONF.AMC.FAKE_DATA_SIZE + 4)*12) + (2+1+1+12))*8
#2048
wv CONF.AMC.FAKE_DATA_SIZE 0x10
lt 1

#25088
wv CONF.AMC.FAKE_DATA_SIZE 0x100
lt 1
#26624
wv CONF.AMC.FAKE_DATA_SIZE 0x110
lt 1
#28160
wv CONF.AMC.FAKE_DATA_SIZE 0x120
lt 1
#29696
wv CONF.AMC.FAKE_DATA_SIZE 0x130
lt 1
#31232
wv CONF.AMC.FAKE_DATA_SIZE 0x140
lt 1
#32768
wv CONF.AMC.FAKE_DATA_SIZE 0x150
lt 1
#34304
wv CONF.AMC.FAKE_DATA_SIZE 0x160
lt 1
#35840
wv CONF.AMC.FAKE_DATA_SIZE 0x170
lt 1
#37376
wv CONF.AMC.FAKE_DATA_SIZE 0x180
lt 1

This will generate events of increasing size until the "copy worker" dies with the error here:

21 Jan 2016 17:48:11.625 [139692899636992] FATAL edu.bu.cms1.p: 33001.pt::frl::Application.instance(1) <> - Caught exception: pt::frl::exception::Exception 'Overflow of i2o: 32696' raised at processCopy(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:301)
21 Jan 2016 17:48:11.626 [139693435836160] ERROR edu.bu.cms1.p: 33001.executive::Application.lid(0)  - Unhandled exception occured
Caught exception: pt::frl::exception::Exception 'failed to process copy for stream on index: 0 for copy worker id:0 - current frl event queue elements: 3 last processed input event dump: [[00000000]   F8 01 F9 01 FA 01 FB 01   FC 01 FD 01 FE 01 FF 01   ........ ........
[00000010]   00 02 01 02 02 02 03 02   04 02 05 02 06 02 07 02   ........ ........
[00000020]   08 02 09 02 0A 02 0B 02   0C 02 0D 02 0E 02 0F 02   ........ ........
[00000030]   10 02 11 02 12 02 13 02   14 02 15 02 16 02 17 02   ........ ........
[00000040]   18 02 19 02 1A 02 1B 02   1C 02 1D 02 1E 02 1F 02   ........ ........
[00000050]   20 02 21 02 22 02 23 02   24 02 25 02 26 02 27 02   ........ ........
[00000060]   28 02 29 02 2A 02 2B 02   2C 02 2D 02 2E 02 2F 02   ........ ........
[00000070]   30 02 31 02 32 02 33 02   34 02 35 02 36 02 37 02   0.1.2.3. 4.5.6.7.
[00000080]   38 02 39 02 3A 02 3B 02   3C 02 3D 02 3E 02 3F 02   8.9..... ........
[00000090]   40 02 41 02 42 02 43 02   44 02 45 02 46 02 47 02   ..A.B.C. D.E.F.G.
[000000a0]   48 02 49 02 4A 02 4B 02   4C 02 4D 02 4E 02 4F 02   H.I.J.K. L.M.N.O.
[000000b0]   50 02 51 02 52 02 53 02   54 02 55 02 56 02 57 02   P.Q.R.S. T.U.V.W.
[000000c0]   58 02 59 02 5A 02 5B 02   5C 02 5D 02 5E 02 5F 02   X.Y.Z... ........
[000000d0]   60 02 61 02 62 02 63 02   64 02 65 02 66 02 67 02   ..a.b.c. d.e.f.g.
[000000e0]   68 02 69 02 6A 02 6B 02   6C 02 6D 02 6E 02 6F 02   h.i.j.k. l.m.n.o.
[000000f0]   70 02 71 02 72 02 73 02   74 02 75 02 76 02 77 02   p.q.r.s. t.u.v.w.
]' raised at process(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:583);
        originated by pt::frl::exception::Exception 'Overflow of i2o: 32696' raised at processCopy(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:301)

The last working event was event 6, which was 31232 bytes long (FAKE_DATA_SIZE 0x140). The next event crashes the copy worker.
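The byte counts in the script comments can be regenerated and compared with the 32696 in the "Overflow of i2o" message. This is a rough sketch: it uses the size formula from the script comment, and treating 32696 as a byte limit is an assumption, not something the log states.

```python
I2O_LIMIT = 32696  # number from the 'Overflow of i2o' message; assumed to be bytes


def script_size_bytes(fake_data_size, n_amcs=12):
    """Size formula as written in the script comment of this entry."""
    return ((fake_data_size + 4) * n_amcs + (2 + 1 + 1 + n_amcs)) * 8


def first_size_over_limit(sizes, limit=I2O_LIMIT):
    """Return the first FAKE_DATA_SIZE whose event exceeds the limit, else None."""
    for size in sizes:
        if script_size_bytes(size) > limit:
            return size
    return None


sizes = range(0x100, 0x181, 0x10)
for size in sizes:
    print(f"0x{size:x}: {script_size_bytes(size)} bytes")
print(hex(first_size_over_limit(sizes)))  # 0x150
```

Under that assumption, 0x140 (31232 bytes) still fits and 0x150 (32768 bytes) is the first size over the limit, which matches where the copy worker died.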

2016-01-21, dgastler

Trying to upgrade the firmware on the FEROL from feb 00020 to p1_ferol_feb00027_1444987060.svf, following https://twiki.cern.ch/twiki/bin/viewauth/CMS/UFEDKIT#Firmware_update

When using the fedKit.py write to disk option, it doesn't always start, and when it does, it won't write to disk and the "BU" xdaq icon has the error:

Caught exception: exception::DiskWriting 'Failed to get value from EVM at http://cms1.bu.edu: 33001/urn:xdaq-application:lid=12/eventCountForLumiSection?ls=3: Couldn't resolve host name' raised at getValueFromEVM(/usr/local/src/xdaq/baseline12/trunk/daq/evb/src/common/bu/RUproxy.cc:539)
The log file has:
21 Jan 2016 15:35:54.755 [139942756808448] FATAL edu.bu.cms1.p: 33001.evb::BU.instance(0) <> - Failed: Caught exception: exception::DiskWriting 'Failed to get value from EVM at http://cms1.bu.edu: 33001/urn:xdaq-application:lid=12/eventCountForLumiSection?ls=3: Couldn't resolve host name' raised at getValueFromEVM(/usr/local/src/xdaq/baseline12/trunk/daq/evb/src/common/bu/RUproxy.cc:539)
21 Jan 2016 15:35:54.755 [139942756808448] WARN  edu.bu.cms1.p: 33001.evb::BU.instance(0).RcmsStateNotifier <> - Unable to notify state change. RCMS state listener has not been found. Has findRcmsStateListener() been called at least once? State:Failed
21 Jan 2016 15:35:54.760 [139943433701120] ERROR edu.bu.cms1.p: 33001.executive::Application.lid(0)  - Unhandled exception occured
Caught exception: exception::DiskWriting 'Failed to get value from EVM at http://cms1.bu.edu: 33001/urn:xdaq-application:lid=12/eventCountForLumiSection?ls=3: Couldn't resolve host name' raised at getValueFromEVM(/usr/local/src/xdaq/baseline12/trunk/daq/evb/src/common/bu/RUproxy.cc:539)

2016-01-19, dgastler

  • With single events I've been ramping up the FAKE_DATA_SIZE register to see when xdaq breaks. With fake amcs 1-12 enabled,
>lt 1
detected number after 'lt'
Sending 1 local triggers
Trigger: 0 left
>wv CONF.AMC.FAKE_DATA_SIZE 0x20
Write to register CONF.AMC.FAKE_DATA_SIZE
>lt 1
detected number after 'lt'
Sending 1 local triggers
Trigger: 0 left
>wv CONF.AMC.FAKE_DATA_SIZE 0x40
Write to register CONF.AMC.FAKE_DATA_SIZE
>lt 1
detected number after 'lt'
Sending 1 local triggers
Trigger: 0 left
>wv CONF.AMC.FAKE_DATA_SIZE 0x80
Write to register CONF.AMC.FAKE_DATA_SIZE
>lt 1
detected number after 'lt'
Sending 1 local triggers
Trigger: 0 left
>wv CONF.AMC.FAKE_DATA_SIZE 0x100
Write to register CONF.AMC.FAKE_DATA_SIZE
>lt 1
detected number after 'lt'
Sending 1 local triggers
Trigger: 0 left
>wv CONF.AMC.FAKE_DATA_SIZE 0x200
Write to register CONF.AMC.FAKE_DATA_SIZE
>lt 1
detected number after 'lt'
Sending 1 local triggers
Trigger: 0 left

This broke on the last one causing:

19 Jan 2016 18:00:32.963 [140717694822144] ERROR edu.bu.cms1.p: 33001.executive::Application.lid(0)  - Unhandled exception occured
Caught exception: pt::frl::exception::Exception 'failed to process copy for stream on index: 0 for copy worker id:0 - current frl event queue elements: 2 last processed input event dump: [[00000000]   47 5A 00 00 80 00 01 FE   00 00 00 64 00 00 00 06   GZ...... ...d....
[00000010]   08 64 40 1F 06 00 00 51   10 12 F9 01 40 83 C1 10   .d.....Q ........
[00000020]   00 00 01 00 03 02 00 0F   00 00 02 00 03 02 00 0F   ........ ........
[00000030]   00 00 03 00 03 02 00 0F   00 00 04 00 03 02 00 0F   ........ ........
[00000040]   00 00 05 00 03 02 00 0F   00 00 06 00 03 02 00 0F   ........ ........
[00000050]   00 00 07 00 03 02 00 0F   00 00 08 00 03 02 00 0F   ........ ........
[00000060]   00 00 09 00 03 02 00 0F   00 00 0A 00 03 02 00 0F   ........ ........
[00000070]   00 00 0B 00 03 02 00 0F   00 00 0C 00 03 02 00 0F   ........ ........
[00000080]   03 02 40 1F 06 00 00 01   00 00 21 91 06 00 07 00   ........ ........
[00000090]   08 00 09 00 0A 00 0B 00   0C 00 0D 00 0E 00 0F 00   ........ ........
[000000a0]   10 00 11 00 12 00 13 00   14 00 15 00 16 00 17 00   ........ ........
[000000b0]   18 00 19 00 1A 00 1B 00   1C 00 1D 00 1E 00 1F 00   ........ ........
[000000c0]   20 00 21 00 22 00 23 00   24 00 25 00 26 00 27 00   ........ ........
[000000d0]   28 00 29 00 2A 00 2B 00   2C 00 2D 00 2E 00 2F 00   ........ ........
[000000e0]   30 00 31 00 32 00 33 00   34 00 35 00 36 00 37 00   0.1.2.3. 4.5.6.7.
[000000f0]   38 00 39 00 3A 00 3B 00   3C 00 3D 00 3E 00 3F 00   8.9..... ........
]' raised at process(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:583);
        originated by pt::frl::exception::Exception 'Overflow of i2o: 32696' raised at processCopy(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:301)
19 Jan 2016 18:08:32.848 [140716833482496] WARN  edu.bu.cms1.p: 33001.pt::frl::Application.instance(1) <> - Caught exception: tcpla::exception::FailedToReceive 'Failed to receive, connection event reset in PublicServicePoint 10.0.0.5:34000 : UUID = '74997085-96d3-4e4b-af75-f3a72507ca06'' raised at processSocketEntries(/cmsnfshome0/nfshome0/dsimelev/baseline12/trunk/daq/tcpla/src/common/PublicServicePoint.cc:920);
        originated by tcpla::exception::ConnectResetByPeer 'Connection reset by peer in PublicServicePoint 10.0.0.5:34000 on file descriptor 9 socket error: Connection reset by peer' raised at receive(/cmsnfshome0/nfshome0/dsimelev/baseline12/trunk/daq/tcpla/src/common/PublicServicePoint.cc:560)

  • Tried with 6 fake AMCs, which worked at 10 Hz; upping to 100 Hz crashed the xdaq "copy worker".

Copy worker failed with

19 Jan 2016 16:15:37.684 [140287254984448] FATAL edu.bu.cms1.p: 33001.pt::frl::Application.instance(1) <> - Caught exception: pt::frl::exception::Exception 'Overflow of i2o: 32696' raised at processCopy(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:301)
19 Jan 2016 16:15:37.685 [140287800329984] ERROR edu.bu.cms1.p: 33001.executive::Application.lid(0)  - Unhandled exception occured
Caught exception: pt::frl::exception::Exception 'failed to process copy for stream on index: 0 for copy worker id:0 - current frl event queue elements: 1 last processed input event dump: [[00000000]   47 5A 00 00 80 00 01 FE   00 00 00 64 00 00 00 01   GZ...... ...d....
[00000010]   08 64 50 56 01 00 00 51   00 ED 57 22 C0 81 61 10   .dPV...Q ..W...a.
[00000020]   00 00 01 00 03 04 00 0F   00 00 02 00 03 04 00 0F   ........ ........
[00000030]   00 00 03 00 03 04 00 0F   00 00 04 00 03 04 00 0F   ........ ........
[00000040]   00 00 05 00 03 04 00 0F   00 00 06 00 03 04 00 0F   ........ ........
[00000050]   03 04 50 56 01 00 00 01   00 00 D0 7E 06 00 07 00   ..PV.... ........
[00000060]   08 00 09 00 0A 00 0B 00   0C 00 0D 00 0E 00 0F 00   ........ ........
[00000070]   10 00 11 00 12 00 13 00   14 00 15 00 16 00 17 00   ........ ........
[00000080]   18 00 19 00 1A 00 1B 00   1C 00 1D 00 1E 00 1F 00   ........ ........
[00000090]   20 00 21 00 22 00 23 00   24 00 25 00 26 00 27 00   ........ ........
[000000a0]   28 00 29 00 2A 00 2B 00   2C 00 2D 00 2E 00 2F 00   ........ ........
[000000b0]   30 00 31 00 32 00 33 00   34 00 35 00 36 00 37 00   0.1.2.3. 4.5.6.7.
[000000c0]   38 00 39 00 3A 00 3B 00   3C 00 3D 00 3E 00 3F 00   8.9..... ........
[000000d0]   40 00 41 00 42 00 43 00   44 00 45 00 46 00 47 00   ..A.B.C. D.E.F.G.
[000000e0]   48 00 49 00 4A 00 4B 00   4C 00 4D 00 4E 00 4F 00   H.I.J.K. L.M.N.O.
[000000f0]   50 00 51 00 52 00 53 00   54 00 55 00 56 00 57 00   P.Q.R.S. T.U.V.W.
]' raised at process(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:583);
        originated by pt::frl::exception::Exception 'Overflow of i2o: 32696' raised at processCopy(/usr/local/src/xdaq/baseline12/trunk/daq/pt/frl/src/common/CopyWorker.cc:301)

  • With 2 fake AMCs, rates up to 4 kHz were OK; above that the buffer filled but nothing crashed.

  • Upping the rate from 10 Hz to 10 kHz was fine in this configuration; upping to 20 kHz overloaded but did not crash the system.

We have a flow that has worked now; will test how reliable it is. There is no data being saved to disk. Here is the setup of that run:

On the AMC13 (after a reconfigureFPGAs):

>rg
General reset
>rc
Counter reset
>en 1 f t
parsed list "1" as mask 0x1
Enabling fake data
Enabling TTS as TTC for loop-back
AMC13 out of run mode
AMC13 is back in run mode and ready
>daq 1
SFP0 enabled
Best to do a DAQ reset (rd) after changing link settings
>fed 0 100
>localL1A r 10 10
Configure LocalL1A enabled mode=2 burst=10 rate=10 rules=0
WARNING: AMC13 random trigger mode, setting burst to 1 L1A per burst

FedKit:

[dan@cms1 FEROL]$ /opt/xdaq/bin/fedKit.py 
Welcome to the optical FEDkit
=============================
Please select the data source to be used:
  1 - Real AMC13 data source        (L6G_SOURCE)
  2 - Generator core of the AMC13   (L6G_CORE_GENERATOR_SOURCE)
  3 - Loopback at the FEROL         (L6G_LOOPBACK_GENERATOR_SOURCE)
  4 - SLINK data source             (SLINK_SOURCE)
=> [1] 
Please enter the FED id you want to read out [100]: 
Do you want to write the data to disk? [No]:
Starting run...
Running. Point your browser to http://cms1.bu.edu:33001

m   display this Menu
f#  dump the next # FED fragments incl DAQ headers (default 1)
e#  dump the next # Events with FED data only (default 1)
q   stop the run and Quit
            
=> 

Back on the AMC13:

>lt c
Enable continuous local triggers

Screenshot_-_01192016_-_103020_AM.png Screenshot_-_01192016_-_103040_AM.png

Fed dump file: http://bucms.bu.edu/twiki/pub/Main/AMC13DebugLogdump_run000052_event00005873_fed0100.txt

Event dump file: http://bucms.bu.edu/twiki/pub/Main/AMC13DebugLogdump_run000052_event00006104.txt

2015-12-11, dgastler

Using an AMC13 + FEROL + 10Gb nic on cms1 I was able to get the save to disk mode to start by using the following startup procedure.

On the amc13:

reconfigureFPGAs
wait...
rg
rc
en 1-12 f t
daq 1
fed 0 100
localL1A o 1 1

FEDKit:

fedKit.py

Welcome to the optical FEDkit
=============================
Please select the data source to be used:
  1 - Real AMC13 data source        (L6G_SOURCE)
  2 - Generator core of the AMC13   (L6G_CORE_GENERATOR_SOURCE)
  3 - Loopback at the FEROL         (L6G_LOOPBACK_GENERATOR_SOURCE)
  4 - SLINK data source             (SLINK_SOURCE)
=> [1] 
Please enter the FED id you want to read out [100]: 
Do you want to write the data to disk? [Yes]:
Please enter the directory where the data shall be written [/tmp]: 
Starting run...
Running. Point your browser to http://cms1.bu.edu:33001

m   display this Menu
f#  dump the next # FED fragments incl DAQ headers (default 1)
e#  dump the next # Events with FED data only (default 1)
q   stop the run and Quit

AMC13:

  lt 5

This still doesn't write to disk, but it does get to the user menu for the fedKit.py script.

2015-12-11, dgastler

Using an AMC13 + FEROL + 10Gb nic on cms1

Hardware: Connect the AMC13 daq link 1 to the FEROL SFP described on the following pages.

http://cms-frl.home.cern.ch/cms-frl/ferol/ferol.html

https://twiki.cern.ch/twiki/bin/viewauth/CMS/UFEDKIT

This needs to use the "insert SFP name" SFP for the 5gbit link

Also connect the FEROL's 10Gb link to the 10Gb nic in cms1 using the "find name" SFP.

Software: On CMS1 open two terminals so you can run the fedKit.py and AMC13Tool2.exe.

Setup the AMC13 for fake data taking:

>rg
General reset
>rc
Counter reset
>daq 1
SFP0 enabled
Best to do a DAQ reset (rd) after changing link settings
>rd
DAQ reset
>en 1-12 t f
parsed list "1-12" as mask 0xfff
Enabling TTS as TTC for loop-back
Enabling fake data
AMC13 out of run mode
AMC13 is back in run mode and ready
>localL1A o 1 1
Configure LocalL1A enabled mode=0 burst=1 rate=1 rules=0

Setup the FEROL in the daq account as follows:

[daq@cms1 FEROL]$ fedKit.py 
Welcome to the optical FEDkit
=============================
Please select the data source to be used:
  1 - Real AMC13 data source        (L6G_SOURCE)
  2 - Generator core of the AMC13   (L6G_CORE_GENERATOR_SOURCE)
  3 - Loopback at the FEROL         (L6G_LOOPBACK_GENERATOR_SOURCE)
  4 - SLINK data source             (SLINK_SOURCE)
=> [1] 
Please enter the FED id you want to read out [1234]: 
Do you want to write the data to disk? [No]:
Starting run...
Running. Point your browser to http://cms1.bu.edu:33001

m   display this Menu
f#  dump the next # FED fragments incl DAQ headers (default 1)
e#  dump the next # Events with FED data only (default 1)
q   stop the run and Quit
            
=> 

At this point open cms1.bu.edu:33000 in a web browser and go to the FEROL's monitoring page. Now sending triggers in AMC13Tool2.exe should cause the event counter to increment on this page.

2015-11-10, hazen

(At CERN) Successful test of two 5Gb links with different FED numbers. Partial success testing 10Gb links. For details:

https://twiki.cern.ch/twiki/bin/view/Main/AMC13DAQTestingLog

2015-11-02, Hazen

Testing firmware 0x4037. Six AMC13 (S/N 284, 303, 306, 259, 308, 309) in AMC1-AMC6. All set for random size from 0x400-0x1000. Using TTTT with random rate at 0x800, which gives a rate of ~ 24kHz with lots of time in OFW.

See a significant number of "AMC13_BAD_LENGTH" and "BP_ERR" in AMC1 (S/N 284).

With random set to 0x8000 on TTTT (2550 Hz) and 12xAMC with fixed size 0x400 all is good. With size random from 0x400-0x3000 all good, rate 0x1000 (9.6kHz).

Set to 0x2000 with random 0x400-0x3000 rate is 9.2kHz with a few OFW counts.

2015-10-97, Hazen

Finding apparent bugs in AMC13 monitor buffer. Here are some notes for someone to pick up the trail. First, read Monitor Buffer documentation. Then, review the data format documentation.

One can set the size of fake data by writing to CONF.AMC.FAKE_DATA_SIZE. By default this is set to 0x400 which results in unsegmented (single-block) events. If you set this larger than the somewhat arbitrary value of 0x13fb (about 40k bytes) then you get a segmented event. The table below illustrates this:

FAKE_DATA_SIZE   Blocks   Block 0 size   Block 1 size   Block 2 size
<= 0x13fb        1        0x13fe         -              -
0x13fc           2        0x1000         0x3ff          -
0x23fb           2        0x1000         0x13fe         -
0x23fc           3        0x1000         0x1000         0x3ff
0x3000           3        0x1000         0x1000         0x1003
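
The rows of this table are all reproduced by one simple splitting rule. The sketch below is inferred from the table only, not taken from the firmware documentation, so treat the constants and the function name as assumptions: the block sizes always sum to FAKE_DATA_SIZE + 3, every non-final block is 0x1000, and the final block is at most 0x13fe.

```python
MAX_LAST_BLOCK = 0x13fe  # largest block size appearing in the table
FULL_BLOCK = 0x1000      # size of every non-final block in the table


def block_sizes(fake_data_size):
    """Split an event into blocks following the pattern seen in the table."""
    remaining = fake_data_size + 3  # table totals are always FAKE_DATA_SIZE + 3
    blocks = []
    while remaining > MAX_LAST_BLOCK:
        blocks.append(FULL_BLOCK)
        remaining -= FULL_BLOCK
    blocks.append(remaining)
    return blocks


for size in (0x13fb, 0x13fc, 0x23fb, 0x23fc, 0x3000):
    print(hex(size), [hex(b) for b in block_sizes(size)])
```

With this rule the number of monitor buffers an event needs is simply len(block_sizes(size)).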

It's a bit confusing, but essentially you can generate events with different number of blocks by setting FAKE_DATA_SIZE. The bugs start when you have e.g. 2 free buffers and try to store an event which requires 3. Here is the command sequence to trigger this apparent bug:

  > wv CONF.AMC.FAKE_DATA_SIZE 0x100       # small size for single blocks
  > lt 0x3fe                               # generate 1k - 2 triggers
  > st 2 Monitor_Buffer                    # two empty buffers (0x400 - 2)
   Monitor_Buffer| COUNT| VALUE|
               --|------|------|
        AVAILABLE|      |     1|
            EMPTY|      |     0|
       UNREAD_EVT| 0x3FE|      |
  >wv CONF.AMC.FAKE_DATA_SIZE 0x3000
  >lt
  >st 2 Monitor_Buffer
   Monitor_Buffer| COUNT| VALUE|
               --|------|------|
        AVAILABLE|      |     0|           # if AVAILABLE=0 FULL should be 1
            EMPTY|      |     0|
       UNREAD_EVT| 0x3FE|      |
  >re                                      # read one event, freeing one buffer
  >st 2 Monitor_Buffer
   Monitor_Buffer| COUNT| VALUE|
               --|------|------|
        AVAILABLE|      |     0|
            EMPTY|      |     0|
             FULL|      |     1|           # why is it full now?
       UNREAD_EVT| 0x400|      |           # why did this go up?

2015-09-29, Gastler

New NIC installed as eth6 on CMS1. It should come up as 10.0.0.5. Running fedKit.py with option 3 (loopback at ferol) seemed to work, but I haven't gotten any data out using the f and e commands.

2015-09-24, hazen/gastler

FEROL connected in mini-crate to CMS4. Shows up in lspci as

  06:04.0 Memory controller: Device ecd6:fea0 (rev 01)

But cms4 has FNAL Scientific Linux and thus no xDAQ. Moving the minicrate/card to cms1 even though the 10GbE NIC doesn't work there. (The 10GbE NIC is in l3ibm and it does work there, but this is also FNAL Linux.)

Hook the fibers in "loop-back" on the FEROL (referring to the Manual)

2015-09-01, hazen

Trying to get HCAL xDAQ running. First challenge is to get a source tree which will build. First, manage to finally get SVN to stop demanding my password every time, using a variation of the instructions here:

http://information-technology.web.cern.ch/book/how-start-working-svn/accessing-svn-repository#accessing-sshlinux

Not certain what I did differently, but had to explicitly concatenate my id_rsa.pub to ~/.ssh/authorized_keys. Now it completes with no password prompts!

Here's the install/build sequence:

$ source /home/daqowner/dist/etc/env.sh
$ perl installDAQ_12_4_9.perl --mode=teststand \
 --ownsource=${HOME}/src/12_4_9 \
 --svnuser=ehazen \
 --packages=all
$ cd src/12_4_9/hcal
$ #---- EDIT MAKEFILE to add hcalUTCA to SUBPACKAGES ----
$ make
$ make install

Note: the top-level Makefile has to be edited by hand. This change should go in SVN.

Using files from here: http://ohm.bu.edu/~hazen/CMS/TestStand/xDAQ_2015-08-31/

Note that the hcalUTCA module statement in the XML must appear before the hcalTrig as hcalTrig depends on hcalUTCA and if it comes first it will look in LD_LIBRARY_PATH or somewhere evil and get the wrong so file.

Also found that David's code hack in DTCManager to set the calibration trigger orbit gap has no error check whatsoever and will fail badly for most values of the orbit gap settings. For now edit DTCManager.cc to:

    // Enable calibration events in orbit gap if config doc says so
    if (m_calibEnable){
      m_dtc->write(amc13::AMC13Simple::T1,"CONF.CAL_ENABLE",1);
      try {
        // Set calibration orbit gap window // subtract 3456 since only writing to editable part
        m_dtc->write(amc13::AMC13Simple::T1,"CONF.CAL_WINDOW_LOWER_PROG", m_calibLower-3456);
        m_dtc->write(amc13::AMC13Simple::T1,"CONF.CAL_WINDOW_UPPER_PROG", m_calibUpper-3456);
      } catch( std::exception& e) {
        LOG4CPLUS_ERROR(getApplicationLogger(), ::toolbox::toString("DTCManager::init() setting orbit gap to %d-%d", m_calibLower, m_calibUpper));
      }
    }
    else m_dtc->write(amc13::AMC13Simple::T1,"CONF.CAL_ENABLE",0);

2015-06-30, dzou

While installing LabTools (impact, chipscope, etc.), there was a problem getting the Xilinx USB Cable to work. Tried the typical troubleshooting suggested in the Xilinx forums, including manual install scripts, but the cable was still not working (the power light should turn on when plugged in if the drivers are installed correctly and permissions are set correctly).

The eventual problem ended up being this (courtesy of Jeroen Hegeman):

The problem was as usual in a detail: in the USB settings file deployed by Xilinx here

/etc/udev/rules.d/xusbdfwu.rules

an environment variable name was spelled in capitals instead of lower-case: $TEMPFILE instead of $tmpfile. Took me a while to find that again.

2015-04-08, hazen

Installed Magnum mini-crate with FEROL and 10GbE NIC (I think it's an Emulex OCe10102-N).

Downloaded OneConnect-Flash-10.0.803.37.iso from emulex.com. Made a USB stick, but the MoBo can't boot from it, so burned it to CD-ROM. But when you boot from the CD-ROM it can't find the utilities it needs!

Here is what I managed:

Copy ISO to both CDROM and USB stick
insert CDROM only and boot from it (F12 for boot menu)
wait for it to time out, failing to find CDROM
insert usbstick

  # mount /dev/sdb /usbdrive
  # cd usbdrive/UFI
  # flash -x -f 10.0.803.31.ufi

reboot

Now the fs#@ing computer hangs on some Emulex firmware handshake and won't boot!

2015-03-17, hazen

Attempting to get FEROL (uFEDKIT) working. Dan has installed a 10GbE NIC in CMS1 and configured it at 10.0.0.5. It responds to pings.

Now using S/N 98. Update firmware to latest 0x401d/0x0027.

Tried running the /opt/xdaq/bin/fedKit.py script but it fails.

2015-03-09, hazen

Testing firmware 0x222, 0x27 with software rel 34315 on CMS4. Seeing uHAL timeouts while reading data using 'df' command. Example:

  $ AMC13Tool2.exe -c 192.168.1.82
  $ Address table path "/home/hazen/work/amc13/amc13/etc/amc13" set from AMC13_ADDRESS_TABLE_PATH
  use_ch false
  Created URI from IP address:
    T2: ipbusudp-2.0://192.168.1.82:50001
    T1: ipbusudp-2.0://192.168.1.83:50001
  Using AMC13 software ver:34315
  > en 1-12 f t
  > localL1A o 1 10
  > lt c
  > lt d
  > df test.dat 9999
    ...
   Read 12340 words
   Wrote 12342 words to test.dat
   calling readEvent (53)...

  Caught microHAL exception.
  	Timeout (1000 milliseconds) occurred for UDP receive from target with URI: ipbusudp-2.0://192.168.1.83:50001

This problem does not occur when connecting from cms1.bu.edu. Tried both build 32534 (my current) and Dan's test build with uHAL 2.4. Back to cms4, update to 34988 (trunk) and re-build. No improvement.

Seems like some sort of network problem.

Use controlHub, all is well.

2015-02-09, hazen

TTC firmware superficially working. Lots of clean-up on address table to do. Need a "repeat" feature which is now documented here: AMC13AddressTable (new P_Repeat and P_Offset items). Temporarily implemented using a perl script csv_expand_repeat.pl; should eventually build into the C++.

Today: program new 0x4015 firmware which should fix some DAQLDC register readout bug. Test T2 TTC capture.

2015-02-03, hazen

Testing new spartan TTC firmware 0x26. Move S/N 39 to MCH2 site in Crate 2. Reprogram to 0x21d/0x26.

2015-01-22, dzou

Strange behavior of SN50 board.

  • NVMem IP address saved as 192.168.1.65 on T2
  • When trying to change it with storeConfig, get error: ' Given data "256" is invalid. '
  • Created a special connection file with T2 and T1 ip addresses specified.
  • When checking 'fv' gives: *0: SN: 95 T1v: 021c T2v: 0029 cf: /home/dzou/amc13/amc13/etc/amc13/connectionSN50
    • reports a strange SN, and T2 has some sort of special T2 version.
  • solder blobs correspond to SN50
  • MMC version out of date (current is 2.2 but it has 2.1)
  • Updating MMC to check if problem is fixed...
    • MMC updated to 2.2, eeprom erased
    • T2 NV memory now no longer holds an IP address so uses default (192.168.1.154)
    • Still cannot write to T2 NV memory (' Given data "256" is invalid. ' error)

This sounds like a pure software problem (esh)

2015-01-13, hazen

Looking at S/N 75/49 returned from Cornell with an alleged TTC data problem. Install in MCH2 site in crate 2.

2015-01-06, hazen

Investigating S/N above 128. Take board#99, change to 0xe3 (227). Won't power up properly in crate 2. Move to crate 1 slot 2. Looks good according to scanCrate.pl:

2: MMC: 2.2 IP: 192.168.2.56 192.168.2.57 vv: 0x1002 sv: 0x0021 sn: 227

So the IP last octets are 255-2*(sn & 127) and 254-2*(sn & 127). This is OK except that we should probably avoid S/N 128, as it would result in last octet = 255/254.
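
A quick check of that octet rule, as a minimal sketch (the function name is made up; only the arithmetic comes from the line above):

```python
def ip_last_octets(serial_number):
    """Return the (lower, upper) last IP octets for a given serial number."""
    low7 = serial_number & 127  # only the low 7 bits of the S/N are used
    return 254 - 2 * low7, 255 - 2 * low7


print(ip_last_octets(227))  # (56, 57)
print(ip_last_octets(128))  # (254, 255)
```

S/N 227 gives 56 and 57, matching the scanCrate.pl line above, and S/N 128 lands on 254/255.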

MAC addresses set as follows

Nmap scan report for rfc1918.address.not.used.bu.edu (192.168.2.56)
Host is up (0.000064s latency).
MAC Address: 08:00:30:F3:01:9C (Network Research)
Nmap scan report for rfc1918.address.not.used.bu.edu (192.168.2.57)
Host is up (0.000074s latency).
MAC Address: 08:00:30:F3:00:5C (Network Research)

The documented scheme is:

  bits 0-5     complement of serial number bits 0-5
  bit 6        =0 for T2 board (lower IP)
               =1 for T1 board (upper IP)
  bit 7, 8     serial number bits 6, 7 (not complemented for S/N < 64 only)

I'm confused.

Update T1 FW per Wu's suggestion (to 0x21b) and now it looks better:

Nmap scan report for rfc1918.address.not.used.bu.edu (192.168.2.56)
Host is up (0.000087s latency).
MAC Address: 08:00:30:F3:01:9C (Network Research)
Nmap scan report for rfc1918.address.not.used.bu.edu (192.168.2.57)
Host is up (0.000093s latency).
MAC Address: 08:00:30:F3:01:DC (Network Research)

2014-11-13, hazen

Possible FEROL installation. What about this adapter?

http://www.startech.com/Cards-Adapters/Slot-Extension/PCI-Express-to-PCI-Adapter-Card~PEX1PCI1

2014-11-07, hazen

Recompiling AMC13 T1 Firmware

In Vivado v2014.3 (64-bit) on Ubuntu 14.04:

Download file: http://physics.bu.edu/~wusx/download/AMC13/AMC13XG_HCALv0x400a.xpr.zip

Unzip and open the project. Find the K7version constant and change it to 0x4aaa, then recompile with default settings. Find the .bit file and copy it to /tmp. Execute the TCL command:

write_cfgmem -loadbit "up 0x0 /tmp/AMC13_T1.bit" -format MCS -size 128 /tmp/AMC13_T1_test.mcs

(guessed at some arguments). Rename, program into flash. It works!

http://ohm.bu.edu/~hazen/CMS/AMC13/AMC13_T1_v0x4aaa_utilization.txt

http://ohm.bu.edu/~hazen/CMS/AMC13/AMC13XG_T1_v0x800a_utilization.txt

2014-09-02, dzou

Suggestion for troubleshooting AMC13 with uHTR initialization:

  1. Reload AMC13 firmware
  2. Disable uHTR DAQ Path
  3. Power cycle modules
-- OR --
  1. Reload firmware for module using software (in principle this should have the same effect as a power cycle)
  2. Rerun initialization

2014-08-21, Eric

To run csv_to_xml.pl script in ...amc13/dev_tools/cfg/addrTableTools, need to install perl module Tree::Simple. Do this with:

   $ sudo yum install perl-Tree-Simple

2014-08-19, Dan, Eric

Started to get xDAQ working. Jeremy now provides a perl script standalone_setup.pl in ...hcalUpgrade/test in the HCAL xDAQ release. It creates standalone-daq.xml which is then used by run-standalone.sh to start xDAQ. How to answer the questions from standalone_setup.pl

Question                                                 How to answer
Crate Number of the System                               Used to find names of boards in the connection file
Location of the XML connections file                     File name (with path if needed) of the connection file
Orbit delay for data alignment of links                  Who knows, enter a small integer
Set of slots to use, separated by commas                 1-based list of slots
Pipeline length for pulling out the data from the UHTR   Who knows, enter a small integer

Sample connection file: sample_xdaq_connections.xml. Set for crate number 1.

Must edit standalone-daq.xml because of a bug somewhere... in the UHTR { ... } section find the line "if ( SLOT in..." and change to:

  if ( SLOT == 2 ) {

If using local triggers also add to DTC{...} section:

  localTtcSignal=true
  internalPeriodicTriggerEnable=true

(once only) start control hub:

  sudo /etc/init.d/controlhub start

Then: sh run-standalone.sh. Should see ~30 lines of messages with no errors, ending in "Ready.".

Start web browser on cms1 (e.g. "ssh -X cms1; firefox --no-remote").

Go to "localhost:40010" (xDAQ main page). Navigate to:

* hcalSupervisor

* Control Panel

* Setup (again, ~30 lines of messages ending in "New state is:Ready")

* Start

Go back to "localhost:40010" in another tab, open "hcalDTCmanager".

* Peek into...

* Basic View

2014-08-08, dzou

AMC13 old software: Changing all references to 1-based enumeration of AMC slots:
  • Status display:
    • AMC Link Status:
      • Enabled links
      • Locked links
    • Port Status:
      • AMC Link Versions incorrect
      • Unsynced AMC Ports
    • AMC Bc0 Status: Bc0s locked
    • AMC Counters
  • Address Tables and Mr. Wu Spec files?

Places to change code:

  1. singleBit_OnOff is a method that checks bits and populates a vector of strings indicating which AMC ports have their relevant bit set (in 0-based enumeration). All Status display items from AMC Link Status through Bc0 Status use this to enumerate. (Actions.cc, line 2625)
    • Up front my guess at the fix:
      1. (ss bit shift by i+1? and the if statement to i<=10) or
      2. (loop i starting at i=1 and make the appropriate shifts throughout loop)
  2. ctr64_status_amc is the method used in AMC Counters. (Actions.cc, line 2657, ena_amcs[i] )
    • ena_amcs is populated by singleBit_OnOff, so if singleBit_OnOff is made to give a vector of 1-based strings, then it may not be necessary to change ctr64_status_amc at all.
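
A minimal sketch of the proposed change, written in Python rather than the C++ of Actions.cc. The function name ports_with_bit_set and its arguments are made up for illustration and are not the real singleBit_OnOff signature; the only idea taken from the notes above is that the label for bit i becomes i+1 when 1-based enumeration is wanted.

```python
def ports_with_bit_set(value, n_ports=12, one_based=True):
    """Return zero-padded port labels for every set bit among the low n_ports bits.

    Hypothetical stand-in for singleBit_OnOff, not the actual implementation.
    """
    offset = 1 if one_based else 0
    return ["%02d" % (i + offset) for i in range(n_ports) if (value >> i) & 1]


# A register with only bit 1 set: old 0-based label vs new 1-based label
print(ports_with_bit_set(0x2, one_based=False))  # ['01']
print(ports_with_bit_set(0x2, one_based=True))   # ['02']
```

If the labels are produced 1-based in this one place, callers that only print the labels (as ctr64_status_amc appears to do with ena_amcs) would not need their own shift.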

2014-07-18, dzou

Multiple users have experienced problems with receiving/sending Bc0s from their AMC13. Unfortunately, we do not have the same hardware and we are experiencing difficulties providing support, as it is hard to determine if the issue is in fact due to the AMC13.

Below is a place to document tests using our setup with the AMC13 and uHTR to determine which setups work and which don't, in hopes of illuminating possible problems on the AMC13 side.

All tests using uHTR running most recent firmware (front: 0.C.0, back: 0.E.22) to receive Bc0. AMC13 used is Serial #86 in AMC13 slot of old crate.

The uHTR sends a signal back after receiving Bc0s, so the AMC13 can tell whether the Bc0s are locked.

AMC13 T1 firmware   TTC source   Bc0s locked?   Notes
0x209               TTT          yes            1, 2
0x10a               TTT          yes            1, 2

Note 1: Sending triggers does not successfully lead to building events to the monitor buffer.
Note 2: Bc0s only lock successfully if setup is done after a complete power cycle of the boards (physically removed from crate).

2014-07-03, hazen

Trying Wu's new 0x1001 version. Not obvious that it works. Meanwhile, try to reprogram S/N90 T2, but somehow this fails and the module is bricked.

2014-07-02, hazen

Working on flash programming exceptions problem. Add a "readRepeat" command to new AMC13Tool which takes chip / address / count. Occasionally it fails:

>readRepeat 0 0x1080 100000
terminate called after throwing an instance of 'uhal::exception::UdpTimeout'
  what():  Timeout (1000 milliseconds) occurred for UDP receive from target with URI: ipbusudp-2.0://192.168.1.90:50001

Take GbE switch out (connect yellow cable direct from CMS2 to Vadatech MCH). Now flash programming is reliable. Maybe the TTT was causing trouble? Disconnect it.

2014-07-02, hazen

S/N 93 flakes out when installed in crate with many other boards (T1 doesn't configure).

Now we have thirteen AMC13 in the crate at once:

Using AMC13Tool found at /home/hazen/bin/AMC13Tool.exe
 1: MMC: 2.2    IP:      192.168.1.62     192.168.1.63 vv: 0x0109 sv: 0x0021 sn:  96 temp:  732
 2: MMC: 2.2    IP:      192.168.1.70     192.168.1.71 vv: 0x0109 sv: 0x0021 sn:  92 temp:  627
 3: MMC: 2.2    IP:     192.168.1.126    192.168.1.127 vv: 0x0108 sv: 0x0021 sn:  64 temp:  761
 4: MMC: 2.2    IP:      192.168.1.60     192.168.1.61 vv: 0x0109 sv: 0x0021 sn:  97 temp:  708
 5: MMC: 2.2    IP:      192.168.1.54     192.168.1.55 vv: 0x0109 sv: 0x0021 sn: 100 temp:  595
 6: MMC: 2.2    IP:      192.168.1.52     192.168.1.53 vv: 0x0109 sv: 0x0021 sn: 101 temp:  686
 7: MMC: 2.2    IP:      192.168.1.72     192.168.1.73 vv: 0x1000 sv: 0x0020 sn:  91 temp:  386
 8: MMC: 2.2    IP:      192.168.1.78     192.168.1.79 vv: 0x1000 sv: 0x0020 sn:  88 temp:  413
 9: MMC: 2.2    IP:      192.168.1.76     192.168.1.77 vv: 0x1000 sv: 0x0020 sn:  89 temp:  370
10: MMC: 2.2    IP:      192.168.1.74     192.168.1.75 vv: 0x1000 sv: 0x0020 sn:  90 temp:  400
11: MMC: 2.2    IP:     192.168.1.106    192.168.1.107 vv: 0x0108 sv: 0x0021 sn:  74 temp:  706
12: MMC: 2.2    IP:      192.168.1.56     192.168.1.57 vv: 0x0109 sv: 0x0021 sn:  99 temp:  668
13: MMC: 2.2    IP:      192.168.1.90     192.168.1.91 vv: 0x0209 sv: 0x0021 sn:  82 temp:  702
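
A listing like this is easy to sort by firmware version with a few lines of Python. This is only a sketch: the regular expression was written against the lines pasted above, the field names (ip_a, ip_b, etc.) are made up here, and it does not assume what "vv" and "sv" stand for.

```python
import re
from collections import defaultdict

# Pattern fitted to the scan lines shown above; group names are illustrative only
LINE_RE = re.compile(
    r"^\s*(?P<slot>\d+):\s+MMC:\s+(?P<mmc>\S+)\s+IP:\s+(?P<ip_a>\S+)\s+(?P<ip_b>\S+)"
    r"\s+vv:\s+(?P<vv>0x[0-9a-fA-F]+)\s+sv:\s+(?P<sv>0x[0-9a-fA-F]+)"
    r"\s+sn:\s+(?P<sn>\d+)\s+temp:\s+(?P<temp>\d+)"
)


def serials_by_vv(text):
    """Group board serial numbers by the 'vv' field of each scan line."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # silently skip anything that is not a scan line
            groups[match.group("vv")].append(int(match.group("sn")))
    return dict(groups)


sample = """\
 1: MMC: 2.2    IP:      192.168.1.62     192.168.1.63 vv: 0x0109 sv: 0x0021 sn:  96 temp:  732
 7: MMC: 2.2    IP:      192.168.1.72     192.168.1.73 vv: 0x1000 sv: 0x0020 sn:  91 temp:  386
"""
print(serials_by_vv(sample))  # {'0x0109': [96], '0x1000': [91]}
```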

2014-07-01, aguld

Testing boards 51, 52, 68, 72, 78, 79, 83, 94

51: T1 has a short in the power supply. When doing chipscope test with T2, the error counter is changing while it is supposed to remain constant.

52: T1 is working well, T2 was shorted out when the 12V power was turned on.

68: The blue indicator light does not turn off when the handle is released, T2 is good. T2 is currently with S/N 78's T1 and T3. Does not have a T3 with it.

72: T1 is working well. When T2 is used on any working T1, the blue indicator light does not turn off when the handle is pushed in.

78: T1 is good. When T2 is used on any working T1, the blue indicator light shuts off after a few seconds, even when the handle is not pushed in.

79: Board was powered up in test stand with T1 and T2 in the wrong slots. T2 does not work at all, but T1 works fine.

83: T1 works. T2 has a shorted 3.3V power and has been put away.

94: T1 has power supply damage and is likely permanently damaged. T2 is probably still usable.

2014-06-30, hazen

Trying to fill the 2nd crate. Install AMC13 in slots 1-8 and 13. Scan crate behaves badly (last couple of boards report 255's for IP addresses). Pull slots 1-4 so only 5-8 and 13 are filled. Slot 8 still has a problem. Only 7-8 and 13 filled, still slot 8 is bad.

Now things are totally flaky. Suspect an MCH or software problem. After a few minutes things are ok again with one module.

Accessing Vadatech MCH: web server at http://192.168.2.2:8080 on CMS2 works. Web interface is horrible but with enough poking one can find and recognize the AMC13 sensors and readings.

2014-06-12, dzou, hazen, aguld

Setting up Vadatech MCH (UTC002)

Experienced initial problems/confusion with communicating with Vadatech MCH due to a combination of a bad Ethernet cable and the IP address of the local machine being set to the same IP address as what the Gigabit Ethernet port on the MCH was set to.

Connecting using serial port using minicom:

  • Was able to use serial port cable to connect using minicom w/ the following settings (taken from Vadatech MCH "Getting Started" guide)
    • 115200 baud, 8 data bits, no parity, and 1 stop bit
    • In /etc/rc.d/rc.conf found information on IP addresses for Ethernet ports (interface0 is the 10/100 Ethernet port, interface 1 is the Gigabit Ethernet port)
    • Gigabit Ethernet ports set to (192.168.1.2)
    • 10/100 Ethernet port set to (192.168.2.2)
  • eth1 of cmssun4 was set to (192.168.1.2), which conflicted with the Gigabit Ethernet port IP
    • change eth1 of cmssun4 to 192.168.1.4

Connecting to Gigabit Ethernet port directly to local computer:

  • Connected MCH Gigabit Ethernet port directly to cmssun4 and was able to ping Gigabit Ethernet (192.168.1.2)
    • Was able to connect to AMC13 that was in Vadatech crate, using VM running SLC6 on cmssun4.
      • AMC13Tool able to connect to AMC13.
      • Python scripts (readIPs, applyConfig, etc) confirmed working
      • Used (192.168.1.2) as MCH host IP for python scripts.
    • Not able to ping 10/100 port, when connected to Gigabit Ethernet port (NOTE: This is likely because the broadcast of cmssun's eth1 is not set to communicate w/ address outside of 192.168.1 while cms2 is set to for all addresses between 192.168.1.-192.168.255.):
ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2) 56(84) bytes of data.
From 128.197.254.113 icmp_seq=1 Time to live exceeded
From 128.197.254.113 icmp_seq=2 Time to live exceeded
From 128.197.254.113 icmp_seq=3 Time to live exceeded

Ethernet cable problems when originally connecting Vadatech MCH to switch:

  • Ethernet cable that was originally being used to connect from Vadatech and the switch was bad, adding to problems/confusion
    • LEDs on switch and lights on Gigabit ethernet port did not light up when cable was connected
  • Switched out Ethernet cable to a known working cable, LEDs on both the switch and the Gigabit Ethernet port turn on.

Currently, both the Vadatech MCH crate and the original crate (using a different brand of MCH) are working and connected to the switch.

  • Both cmssun4 and cms2 are able to ping Gigabit Ethernet port and connect to AMC13 in Vadatech crate.
    • cms2 seems to be able to ping 10/100 port (192.168.2.2), but cmssun4 cannot (NOTE: This is likely because the broadcast of cmssun's eth1 is not set to communicate w/ address outside of 192.168.1 while cms2 is set to for all addresses between 192.168.1.-192.168.255.).
  • Connection only works when connecting using Gigabit Ethernet port.
  • Connecting to 10/100 port does not seem to work (cannot ping any of the Vadatech crate IPs).
  • NOTE: The AMC13 slot seems to be finicky; it is difficult to install the AMC13 into the slot properly, although it does seem to work once installed properly.
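
The suspected reason cmssun4 cannot ping the 10/100 port can be checked on paper with Python's ipaddress module. This is a sketch under assumptions: the prefix lengths (/24 for cmssun4 eth1, /16 for a cms2-like setup) are guesses consistent with the note above, not values read from the machines, and reachable_on_link is a made-up helper name.

```python
import ipaddress


def reachable_on_link(local_ip, prefix_len, target_ip):
    """True if target_ip lies inside the subnet configured on the local interface."""
    iface = ipaddress.ip_interface("%s/%d" % (local_ip, prefix_len))
    return ipaddress.ip_address(target_ip) in iface.network


# Assumed /24 on cmssun4 eth1 (192.168.1.4): GbE port is on-link, 10/100 port is not
print(reachable_on_link("192.168.1.4", 24, "192.168.1.2"))  # True
print(reachable_on_link("192.168.1.4", 24, "192.168.2.2"))  # False
# The same address with an assumed /16 mask would cover both ports
print(reachable_on_link("192.168.1.4", 16, "192.168.2.2"))  # True
```

With the narrower mask the ping to 192.168.2.2 would leave through the default route instead of eth1, which fits the "Time to live exceeded" replies coming from 128.197.254.113.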

2014-06-11, dzou

uHTR and AMC13 backplane test

Using AMC13 SN 86 in slot AMC13 (Firmware: T1: 0x207, T2: 0x21)

Using uHTR SN 7 in slot AMC2 (Firmware: Front: 00.0c.00, Back: 00.0e.10)

uHTR is able to receive clock. But the link does not report a lock, and there are various error counters incrementing (single, multi, BC0/BcN mismatch):

AMC13 status:

*****AMC13 Status*****
Status display detail level: 1
Control 0: 56000009
  DAQLSC Link Down
  Monitor Buffer Empty
Control 1: 02070105
  TTS out is TTC signal out
  Generate Internal L1A
  Run Mode
AMC Link Status: 70000002
  AMC13 Enabled Inputs: 01
  --No AMC links locked--
AMC Port Status: 0fff0000
  --All AMC Link Versions Correct--
  Unsynced AMC Ports: 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11
AMC Bc0 Status: 00000018
  --No BC0s locked--
Local Trigger Control: 00000000
  Periodic L1A every 0x1 orbit at BX = 500
  0x1 trigger per burst
EVB Counters:  (All 32-bit counters read 0x0)
                      Run time [0048]: 0000000b fe3aec10
                    Ready time [004a]: 0000000b fe3be392
                     Busy time [004c]: 00000000 00000001
            L1A ovfl warn time [0050]: 00000000 00000001
AMC Counters:
                                       <---Link 01----->
              Single Bit Error [0042]: 00000001 77701dd0
               Multi Bit Error [0044]: 00000000 a6c5db68
                  BC0 Mismatch [0046]: 00000000 00a543fe
                  BcN Mismatch [0048]: 00000000 00000280
                        Resend [004a]: 00000000 003bf6b2

uHTR status and uHTR DTC status:
 ID[0] IP[192.168.115.8]  Type: uHTR
 ID of uHTR (-1 for exiting the tool) ::  [0]
 uHTR baseboard V1.0   Serial number 65535
 Front Firmware revision : HF-4800 (41) 00.0c.00
 Back Firmware revision : HF-4800 (41) 00.0e.10
  Clock expected at    25.0000 MHz :    25.0000 MHz (front)     25.0000 MHz (back)
  Clock expected at   100.0000 MHz :   100.0000 MHz (front)    100.0000 MHz (back)
  Clock expected at    40.0800 MHz :    40.0004 MHz (front)     40.0004 MHz (back)
  Clock expected at    80.1600 MHz :    80.0008 MHz (front)     80.0008 MHz (back)
  Clock expected at   120.2400 MHz :   120.0012 MHz (front)    120.0012 MHz (back)
  Clock expected at   160.3200 MHz :   160.0016 MHz (front)    160.0016 MHz (back)
  Clock expected at   240.4800 MHz :   240.0024 MHz (front)    240.0024 MHz (back)
  Clock expected at   320.6400 MHz :   320.0031 MHz (front)    320.0032 MHz (back)
  Clock expected at    11.0000 kHz :    11.2230 kHz (front)     11.2240 kHz (back)
  Clock expected at     0.1100 kHz :     0.0000 kHz (front)      0.0000 kHz (back)
  Clock expected at    40.0800 MHz :    40.0004 MHz (front)     40.0004 MHz (back)
  Clock expected at    40.0800 MHz :    40.0004 MHz (front)     40.0004 MHz (back)
  Clock expected at   240.4800 MHz :   240.0024 MHz (front)    320.0032 MHz (back)



   STATUS       Status summary of the uHTR card
   LINK         Status and control of frontend links
   DTC          TTC link and information received from AMC13/DTC
   CLOCK        Clock module work
   SENSOR       I2C sensors and controls
   TRIG         Trigger-path work
   DAQ          DAQ-path work
   TEST         Functionality tests of uHTR Board
   FLASH        Flash programming and readback menu
   LUMI         LUMI-DAQ work
   EXIT         Exit this tool
 > dtc

   STATUS       Status of the DTC/TTC
   ERR_RESET    Reset the error counters
   DELAY        Set the delay of the TTC stream manually
   QUIT         Back to top menu
 > status
 ================================================
                     Front FPGA:   Back FPGA:  
 Event Num        :      5052417      5052417
 Lumi Nibble      :            0            0
 Lumi Section     :            0            0
 CMS Run          :            0            0
 LHC Fill         :            0            0
 RATE_40MHz (MHz) :        40.00        40.00
 RATE_ORBIT (kHz) :        11.22        11.22
 Bunch Count      :         1225         2333
 BC0 Error        :        37034        58266
 Single Error     :        29428        56570
 Double Error     :        36388        36832
 TTC Stream Phase :            0            0
 TTC Stream Phase :       Locked       Locked 

2014-05-28, dzou, aguld

AMC13 boards with issues:
  • SN 78
    • Error when trying to configure Device 1 (T2) in Chipscope
    • When trying to reprogram MMC, Error occurs upon attempting to read Device:
Unable to enter programming mode. The read device ID does not match the slected device or any other supported devices. 
Please verify device selection, interface settings, target power and connections to the target device

Severity:                Error
ComponentID:        20100
StatusCode:          13101
ModuleName:         TCF (TCF command: Device:startSession failed.)

Unexpected JTAG ID 0xFFFFFFFE (expected 0x01edd03f).
  • SN 83
    • While in teststand, after turning on 12V power, meter reads that the board is drawing more than 12V (up to ~15V)
    • 12 V power supply indicator changes from green to orange
  • SN 88
    • Connecting via console fails.
    • Windows detects new hardware when we connect a USB serial port to the console port of the AMC13 board, but there is an error
      • "USB Device not recognized"
      • Looking in Device Manager of computer, no USB Serial Port displayed (so cannot connect using putty/minicom)
  • SN 94
    • Failure to connect through JTAG cable using chipscope.
    • No Device detected error message
  • SN 95
    • Was able to program firmware on to board and connect using AMC13Tool
    • However, the board is acting strangely
      • Basic commands do not seem to be working
      • T1 firmware reads 0x0 (instead of 0x108, which is what is programmed onto the board)
      • Event builder counters seem to be stuck at non-zero values (general and counter resets do not reset them)
Pick an action (h for menu): st

*****AMC13 Status*****
Status display detail level: 1
Control 0: 00320020
  TTC Not Ready
Control 1: 00000000  (All bits read 0x0)
AMC Link Status: 00ffffff
  AMC13 Enabled Inputs: 00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11
  AMC Input links locked: 00, 01, 02, 03, 04, 05, 06, 07
AMC Port Status: 00000036
  AMC Link Versions incorrect: 01, 02, 04, 05
  --All AMC Ports Synced--
AMC Bc0 Status: 0c62c231
  Ports w/ Bc0 Locked: 01, 05, 06, 10, 11
EVB Counters:
                 SDRAM Page No [000c]:          000c34ff
             Unread SDRAM Evts [000e]:          54834023
               uHTR CRC Errors [000f]:          00340f7f
               (All 64-bit counters read 0x0)

2014-05-24, hazen

Checking out new data format

Using EricReadTest v0.1 (in amc13simple). Enter commands:

amc13> i 3 0x20000         initialize two inputs for fakes length 0x20000
amc13> t                   generate one trigger
amc13> r 0xd               read size
  0000000d:  0000400a      size is 400a 32-bit words (0x2005 64-bit words)
amc13> d 0x20000 0x20      dump header

Here's the header dump. Seems OK, but AMC1 payload header wrong

amc13>rd 0x20000 0x20
000000: 510000011f400008    CDF Header
000001: 1020000000016dc0    block header, nAMC=2
000002: 2e02000300010000    AMC1 info size=0x20003  2=first block  e=EPV
000003: 2e02000300020000    AMC2 info size=0x20003  2=first block  e=EPV

Here is the AMC1 payload

000004: 010000011f420003    AMC1 header length=0x20003 (upper byte incorrect)
000005: 0007000616dc0000    OrN, ID ok, counter start with 0007
000006: 000b000a00090008
000007: 000f000e000d000c

Here is the AMC2 payload

022002: 3ff73ff63ff53ff4
022004: 3ffb3ffa3ff93ff8
022006: 3fff3ffe3ffd3ffc  Trailer is missing(?)
022008: 020000011f420003  AMC2 header size=0x20003
02200a: 0007000616dc0000  AMC2 header 
02200c: 000b000a00090008
02200e: 000f000e000d000c
022010: 0013001200110010

Here is the end of the block

amc13>rd 0x24004 0x20
024004: 3ffb3ffa3ff93ff8
024006: 3fff3ffe3ffd3ffc   end of payload
024008: cef9355d000011f4   This looks like the block trailer
02400a: cef9355d000011f4   past end rubbish

Now on to the next block

amc13>w 0xc 0
amc13>rd 0x20000 0x20
020000: 1020000000016dc0   block header 
020002: 3a00100000110000   AMC1 info  size=1000 3=mid block a=EV (no P)
020004: 3e00100000120000   AMC2 info  size=1000 3=mid block e=EPV

020006: 4003400240014000   AMC1 payload
020008: 4007400640054004

Second AMC

amc13>rd 0x22000 0x20
022000: 7ff77ff67ff57ff4
022002: 7ffb7ffa7ff97ff8
022004: 7fff7ffe7ffd7ffc
022006: 4003400240014000   Hmm, seems like a transition, but no header/trailer
022008: 4007400640054004

End of block

amc13>rd 0x24000 0x20
024000: 7ff77ff67ff57ff4
024002: 7ffb7ffa7ff97ff8
024004: 7fff7ffe7ffd7ffc
024006: fb2ba02f001011f4   block trailer for block 1 ok
024008: 0f870f860f850f84   past end
02400a: 0f8b0f8a0f890f88
02400c: 0f8f0f8e0f8d0f8c

Block 3


amc13>rd 0x20000 0x20
020000: 1020000000056540  Block header
020002: 3e00100000210000  AMC1 info
020004: 3e00100000220000  AMC2 info
020006: 8003800280018000
020008: 8007800680058004
02000a: 800b800a80098008

022000: bff7bff6bff5bff4
022002: bffbbffabff9bff8
022004: bfffbffebffdbffc
022006: 8003800280018000  no header
022008: 8007800680058004

024000: bff7bff6bff5bff4
024002: bffbbffabff9bff8
024004: bfffbffebffdbffc
024006: a2e1d516002011f4  block trailer
024008: 0f870f860f850f84
02400a: 0f8b0f8a0f890f88

Try size=0x1800 (1-1/2 blocks)

amc13>i 3 0x1800
amc13>t
amc13>r 0xd
0000000d:  0000400a
amc13>rd 0x20000 0x10
020000: 510000011f400008  CDF header
020002: 102000000002ab50  Block header
020004: 2e00180300010000  AMC1 info
020006: 2e00180300020000  AMC2 info
020008: 010000011f401803  AMC1 header
02000a: 000700062ab50000  payload
02000c: 000b000a00090008

022006: 3fff3ffe3ffd3ffc  end payload
022008: 020000011f401803  AMC2 header
02200a: 000700062ab50000  payload
02200c: 000b000a00090008
02200e: 000f000e000d000c
022010: 0013001200110010
022012: 0017001600150014

024004: 3ffb3ffa3ff93ff8
024006: 3fff3ffe3ffd3ffc
024008: 77303f30000011f4  Block trailer
02400a: 77303f30000011f4

Next block

amc13>w 0xc 0
amc13>r 0xd
0000000d:  00002016
amc13>rd 0x20000 0x10
020000: 102000000002ab50  Block header
020002: 1f00080300110000  AMC header size=803 1F=end EPVC
020004: 1f00080300120000  AMC header size=803 1F=end EPVC
020006: 4003400240014000  payload  end should be 006 + (2*803) = 0x100c
020008: 4007400640054004
02000a: 400b400a40094008
02000c: 400f400e400d400c
02000e: 4013401240114010

021000: 5ff75ff65ff55ff4
021002: 5ffb5ffa5ff95ff8
021004: 5fff5ffe5ffd5ffc
021006: 6003600260016000
021008: 6007600660056004
02100a: 42c466d001001803  Here is the trailer
02100c: 4003400240014000  continue AMC2 payload
02100e: 4007400640054004
021010: 400b400a40094008
021012: 400f400e400d400c
021014: 4013401240114010
021016: 4017401640154014

022000: 5feb5fea5fe95fe8
022002: 5fef5fee5fed5fec
022004: 5ff35ff25ff15ff0
022006: 5ff75ff65ff55ff4
022008: 5ffb5ffa5ff95ff8
02200a: 5fff5ffe5ffd5ffc
02200c: 6003600260016000
02200e: 6007600660056004
022010: a8b0cc1e01001803
022012: c84d1ccb001011f4  
022014: a000301063220000  CDF trailer
022016: a000301063220000  end
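
The AMC info words annotated in the dumps above can be pulled apart mechanically. The sketch below is in Python and follows only the annotations in this entry: top nibble 2/3/1 = first/mid/end block, next nibble = E, P, V, C flags, and the following 24 bits = size. The "block" and "amc" fields in the low word are my reading of the dumps (e.g. 0011 in the second block for AMC1), not something checked against the firmware spec, and decode_amc_info is a made-up name.

```python
def decode_amc_info(word):
    """Split a 64-bit AMC info word into the fields annotated in the dumps above."""
    position = {0x2: "first", 0x3: "mid", 0x1: "end"}.get((word >> 60) & 0xF, "unknown")
    flag_bits = (word >> 56) & 0xF
    flags = "".join(
        name for name, mask in (("E", 8), ("P", 4), ("V", 2), ("C", 1)) if flag_bits & mask
    )
    return {
        "position": position,
        "flags": flags,
        "size": (word >> 32) & 0xFFFFFF,
        "block": (word >> 20) & 0xFF,  # inferred from the dumps, unverified
        "amc": (word >> 16) & 0xF,     # inferred from the dumps, unverified
    }


for word in (0x2E02000300010000, 0x3A00100000110000, 0x1F00080300120000):
    info = decode_amc_info(word)
    print("%016x" % word, info["position"], info["flags"], hex(info["size"]),
          "block", info["block"], "amc", info["amc"])
```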

2014-05-19, hazen

HOWTO add new fields to the board database

  1. Check out a test copy somewhere cgi scripts can be run (David can do this in his public_html on ohm)
    1. svn co https://edf.bu.edu/svn/edf/AMC13/boards
  2. Point browser at the copy you have checked out to be sure it works, e.g.:
    1. http://edf.bu.edu/~hazen/DBTest/boards/view
  3. Edit the file ...boards/tools/CMSConf.pm
    1. follow the instructions to edit the variables flist, bSQL and invDef.
  4. Modify the database itself
    1. change to directory ...boards
    2. enter command
      $ sqlite3 test.db
    3. enter commands
      sqlite> alter table amc13 add column dna varchar(30);
      sqlite> .quit
      (e.g. to add column "dna")

Test the new feature thoroughly. See me to replace the live database.
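
Before touching test.db, the ALTER TABLE step can be rehearsed on a throwaway in-memory database. This is a sketch using Python's standard sqlite3 module; the one-column starting schema (sn) is a placeholder and is not the real amc13 table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway DB, not the checked-out test.db
conn.execute("create table amc13 (sn integer)")  # hypothetical minimal schema
conn.execute("alter table amc13 add column dna varchar(30)")

# pragma table_info returns one row per column; index 1 is the column name
columns = [row[1] for row in conn.execute("pragma table_info(amc13)")]
print(columns)  # ['sn', 'dna']
conn.close()
```

Existing rows get NULL in the new column, so the cgi scripts should be checked against boards that have no value for it yet.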

2014-05-05, dzou, wusx

Notes from wusx: initial test of AMC13XG on the special test stand (using ChipScope): Open ChipScope (v14.5 or later) and load the project file D:\vproject\testAMC\testAMC.runs\impl_3\testAMC.cpj. Configure T2 and then T1 with the files amc13_t2test.bit and amc13_t1.bit in D:\vproject\testAMC\testAMC.runs\impl_3. Open the VIO consoles of MYVIO0 and MYVIO2 and set prbssel to 0x111111111111; amc1 through amc12 in MYVIO0 should accumulate some counts and then stop changing. If any channel keeps counting continuously, the corresponding AMC Tx/Rx has a problem. amc_en in the same window should read 0xfff. Connect the TTC signal to the bottom SFP and you should be able to see TTC clock signals on the terminating resistors on the test stand.

2014-04-29, dzou, wusx

Power Supply Problems on AMC13 SN54 and SN57
  • MMC programmed to 2.1
  • Putting board into crate and pushing in handle, front panel lights turn off, but then turn back on again after a few moments.

Hardware Problem on AMC13 SN51

  • T1 has a short
  • There was an attempt to solder a fix, which initially seemed to have fixed that problem, but the problem resurfaced upon testing the board
New Board Problems w/ AMC13 Config python scripts
  • Tests using SN 53, which is otherwise working using firmware (T1: 0x107, T2:0x1d)
    • SN-based IP applied upon powerup. Can be pinged and connected to via AMC13Tool
    • readIPs.py only reads 0's for IP address
    • applyConfig and storeConfig also seem to not be working
    • readNVMem does seem to be working.
 > ./storeConfig.py --slot=1 --ip=192.168.1.148
Storing IP addresses to board in slot 1 from host 192.168.1.41
Unable to send RAW command (channel=0x7 netfn=0x32 lun=0x0 cmd=0x40 rsp=0xc3): Timeout
substr outside of string at ./config_tools/amc13config line 137.
Use of uninitialized value in split at ./config_tools/amc13config line 137.
Use of uninitialized value in multiplication (*) at ./config_tools/amc13config line 140.
Use of uninitialized value in multiplication (*) at ./config_tools/amc13config line 140.
Configuration Too Large 38 > 0 (with header) at ./config_tools/amc13config line 140.

2014-04-28, Dave

Testing new AMC13 boards:
  • SN 56: MMC programmed, 12V, bit file programmed. No IP address assigned.
  • SN 53: MMC programmed, 12V, bit file programmed. No IP address assigned.
  • SN 58: MMC programmed, 12V, bit file programmed. No IP address assigned.
Solution:
  • After MMC programmed, connect console and perform eeperase and mreset
  • Program configuration file (.bit file), w/o power cycling, connect using SN-based default IP address
  • The python scripts currently do not work for boards w/ the new T2; this may have been why it seemed that IP addresses were not assigned.
  • above SN are working now

2014-03-13, Dave

Testing storeConfig problems
  • SN 34 in slot 4 has IP address 192.168.1.100 (all IP addresses are stated for the T2, T1 is assumed 1 higher unless otherwise stated) and has that IP address in EEPROM.
    • Stored it w/ storeConfig to 192.168.1.230
      • Using readNVmem.py confirmed this change
    • Cycled crate power
      • Checked the IP address using readIPs: 192.168.1.230
      • Checked the IP address stored using readNVmem: 192.168.1.230
    • Next stored 192.168.1.250 but instead just did a handle reset
      • Checked the IP address using readIPs: 192.168.1.230 (kept the old ip)
      • Checked the IP address stored using readNVmem: 192.168.1.250
    • Applied and Stored IP address of 192.168.1.222 and did handle reset
      • Checked the IP address using readIPs: 192.168.1.230 (reverted back to the old ip)
      • Checked the IP address stored using readNVmem: 192.168.1.222
      • Somehow, with a handle reset, it keeps a copy of IP address that was in the EEPROM the last time the crate was power cycled
  • This behavior seems to be present w/ all boards.
    • stored IP address is applied when the crate power is reset, however it does not get applied when doing just a simple handle reset.
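
The observations above fit a simple picture: the address in use after a handle reset is a copy of the EEPROM contents taken at the last crate power cycle. The toy model below (Python, class and method names made up) reproduces the sequence seen with SN 34. It models the observed behaviour only and is not a description of what the MMC firmware actually does.

```python
class ObservedIpModel:
    """Toy model of the observed IP behaviour, not of the real MMC logic."""

    def __init__(self, eeprom_ip):
        self.eeprom_ip = eeprom_ip   # what readNVmem reports
        self.boot_copy = eeprom_ip   # copy taken at the last crate power cycle
        self.active_ip = eeprom_ip   # what readIPs reports

    def store(self, ip):
        self.eeprom_ip = ip

    def apply(self, ip):
        self.active_ip = ip

    def power_cycle(self):
        self.boot_copy = self.eeprom_ip
        self.active_ip = self.boot_copy

    def handle_reset(self):
        self.active_ip = self.boot_copy  # EEPROM is not re-read in this model


board = ObservedIpModel("192.168.1.100")
board.store("192.168.1.230")
board.power_cycle()
print(board.active_ip, board.eeprom_ip)  # 192.168.1.230 192.168.1.230
board.store("192.168.1.250")
board.handle_reset()
print(board.active_ip, board.eeprom_ip)  # 192.168.1.230 192.168.1.250
board.apply("192.168.1.222")
board.store("192.168.1.222")
board.handle_reset()
print(board.active_ip, board.eeprom_ip)  # 192.168.1.230 192.168.1.222
```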

2014-03-13, Dave

Experienced connectivity issues in CMS Lab, particularly with cmssun4 and cms2. Connectivity to each other and to the outside world was spotty. Both cms2 and cmssun4 (and the other machines in CMS Lab) are connected to a D-Link switch which then connects to the outside world:

  • First noticed internet connectivity issues on cmssun4
    • Internet connection was inconsistent and would periodically drop or slow down (once every few minutes)
    • Could ping to bu.edu w/ ping times of ~0.7 ms but would periodically pause and/or see ping times of ~100+ ms
  • ssh connection from cmssun4 to cms2 would periodically slow down
    • Connection did not time out usually, instead a dramatic lag would be introduced
    • Could ping to cms2.bu.edu w/ ping times of ~0.2 ms but would periodically pause and/or see ping times of ~100+ ms
  • cms2 seemed to also have problems connecting to the outside world
    • Eric experienced problems connecting to cms2 remotely in the morning
    • Could ping to bu.edu w/ ping times of ~1 ms but would periodically pause and/or see ping times of ~100+ ms
  • Plugged cms2 and cmssun4 into a new switch and plugged that switch into the original D-Link switch
    • Connection between cms2 and cmssun4 did not seem to improve (same ping stats as above)
    • Unplugged ethernet cables back to original configuration
  • Mysteriously, the connection problems seem to have fixed themselves.
    • A virtual machine was running during the problems; it was stopped and shortly after, the problem was fixed. May be a coincidence; some tests should be done.

2014-03-12, Eric

Guoan fixed the disk space problem through some partition-rearranging magic a few weeks ago.

Dan moved the uTCA crate into a small rack borrowed from Ed.

HOWTO set fan speeds on the crate:

[cms2] /home/hazen > telnet 192.168.1.41
Trying 192.168.1.41...
Connected to 192.168.1.41.
Escape character is '^]'.


Welcome to NAT-MCH

nat> fan_ctl
FAN control:
  print help menu:     0
  get fan properties:  1
  get fan speed level: 2
  set fan speed level: 3
  set silent:          4
  set loud:            5
  set minimum:         6
  set maximum:         7
Enter mode (RET=3/0x3): 

You can use obvious choices from this list, or enter 3 and pick a number apparently on the scale 1-10

2013-12-05, Eric

Working on cms2

  • /export/home is full. Note that /dev/sda is 1000G but only 100G is allocated as root. Create a new empty 800G partition as /dev/sda4 . No FS yet.
  • Reboot to get console alive (bypass yum auto-update).
  • Attempt to do updates. Requires root pw which is Guoan one starting with !

2013-11-13, David

Broadcast command (EvN, OrN, BcN reset) problems
  • Using AMC13 SN40 (firmware, T1:0x94, T2:0x17 ), uHTR SN7 in slot AMC2, and TTT SN2, broadcast commands issue resets as expected and EvN/OrN match and BcN match with an offset of 1.
  • Using same setup but with AMC13 SN42(firmware, T1:0x95, T2:0x18 ), broadcast commands do not seem to reset the number for the uHTR
  • Rolling back SN42 to old firmware ( T1:0x94, T2:0x17 ) and retesting:
    • Rolling back to old firmware does not fix the broadcast command resets
    • Problem is perhaps specific to the SN42 board?
Updated uHTR to newest versions (front: HF 1.6Gbps 0.A.62, back: HF 1.6Gbps 0.C.22)
  • Initial CrateTest run (w/ SN 40 using old firmware) had strange uHTR EvN behavior
  • Subsequent CrateTest runs have correct EvN matches (perhaps the uHTR was not fully initialized before the initial run?)
    • Note: EvN and BcN now match. It seems though that the OrN now has an offset of at least 1:
        FED:   0 EvN: 0da000  BcN: 224  OrN: 0002bcd2  TTS: 0/0 EvTyp: 1  CalTyp: 0 Size: 236
        UHTR  1 [ 444] EvN 0da000 BcN 224 OrN 13
        FED:   0 EvN: 0dc000  BcN: a78  OrN: 0002c1ac  TTS: 0/0 EvTyp: 1  CalTyp: 0 Size: 236
        UHTR  1 [ 444] EvN 0dc000 BcN a78 OrN 0d
  • Retesting SN42 with new uHTR firmware:
    • System works with AMC13 SN42 w/ old firmware
    • Problem related to resets and link disconnects (make sure to disable link between amc13 and uHTR before removing hardware)

2013-10-21, David

Issue regarding CrateTest
  • After running random_100kHz_rules triggers without prescaling but with backpressure (in order to check the system's response to backpressure), the setup went into an unusable state
  • TTC Bc0s error register incrementing
    • Reboot of AMC13 gets rid of incrementing error
    • However, attempts to set up the system and run the CrateTest cause the TTC Bc0s error to increment again
    • Reloading from flash seems to stop the error from returning
  • After reloading the AMC13 from flash, attempts to set up the system for CrateTest runs continue to fail
    • Links to uHTR report Bc0 lock; however, no events were being built
  • Eventually reloaded uHTR back FPGA from flash
    • Setting up system again now works
  • There is perhaps a bug somewhere in the system, which has trouble dealing with overflow and backpressure
    • If triggers are coming too fast, ideally the system should be able to go into the Busy state, stop triggers, and then go back to Ready once the event building/dumping has caught up
    • It seems that if we go to a busy state, it stays there
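
For reference, the "ideal" behaviour described above can be written down as a toy simulation. Everything in this Python sketch is illustrative: the buffer thresholds (high_mark, low_mark), the drain rate and the function name are made up and say nothing about how the AMC13 event builder is implemented. The point is only that a healthy system leaves Busy again once the backlog has drained.

```python
def simulate_busy(triggers_per_tick, high_mark=4, low_mark=1, drain_per_tick=1):
    """Toy model: go Busy when the backlog is high, back to Ready once it drains."""
    buffered = 0
    busy = False
    history = []
    for n_triggers in triggers_per_tick:
        if not busy:
            buffered += n_triggers  # triggers are only accepted while Ready
        buffered = max(0, buffered - drain_per_tick)
        if buffered >= high_mark:
            busy = True
        elif buffered <= low_mark:
            busy = False
        history.append("Busy" if busy else "Ready")
    return history


# A burst of fast triggers followed by a quiet period
print(simulate_busy([3, 3, 3, 3, 0, 0, 0, 0]))
```

What we see in the crate corresponds to the last step never happening: once Busy, the state never returns to Ready.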

2013-09-13, David and Eric

We are experiencing IPBus2 errors during block reading, which has also been experienced by collaborators at both Bristol and CERN.
  • There seems to be a random chance for a block read of the events to fail (for example during a dump for events to file).
  • Failures will occur after a variable amount of block reads
  • Failures do not seem to be related to corrupt data. A successful workaround seems to be that if a block read fails, rereading the same event immediately afterwards will succeed.
  • Created test functions in AMC13Tool to either block read through all events (i.e. block reads followed by page turns) or block read the same event over and over.
    • Failures occur in both cases, after a variable number of block reads, typically a few hundred, but they can occur as early as after a few tens of reads or as late as after 1000+ block reads.
    • If the size of the block read is reduced (e.g. reading only 1/10th of the total number of words per event; for the fake events created by the AMC13 that is about 620 words), the rate at which the error occurs seems to increase.
      • Consistently failing after only a few block reads (typically less than 10)
      • With this reduced block read size, errors can be experienced by manually doing successive block reads on the events saved in buffer (e.g. brv 0x4000 620)
  • Introducing a sleep between successive block reads seems to decrease the rate of failures, but only marginally.
    • A one second sleep may only double or triple the average number of block reads before failure for a given block read size.
    • But the delay severely slows down reading or dumping a large number of events, and is therefore not a viable workaround.
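
The retry workaround noted above (re-read the same event immediately after a failed block read) could look roughly like the sketch below. `read_block` is a hypothetical callable standing in for whatever performs the IPbus block read; it is not an AMC13Tool or uHAL function, and the `IOError` failure type is an assumption.

```python
# Sketch of the retry workaround. `read_block` is a hypothetical stand-in for the
# real IPbus block read; the exception type is assumed for illustration.
def read_with_retry(read_block, address, n_words, max_attempts=3):
    """Call read_block(address, n_words), retrying immediately on failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return read_block(address, n_words), attempt
        except IOError as err:
            last_error = err
    raise IOError(f"block read failed {max_attempts} times") from last_error


# Stand-in reader that fails on its first call and then succeeds.
calls = {"n": 0}

def flaky_read(address, n_words):
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("simulated IPbus transaction failure")
    return [0] * n_words

words, attempts = read_with_retry(flaky_read, 0x4000, 620)
print(len(words), attempts)
```

Retrying at once, rather than sleeping between reads, matches the observation that a sleep helps only marginally.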

2013-09-13, Nic E, David and Eric

Nic E is here with his VT892 crate, PM, MCH and AMC13XG# 34 to diagnose dead AMC13 problems.

  • (Yesterday) Wu looks at #39 T1 board and finds that the +12V draws excessive current above about 1V:

AMC13XG #39 is dead, most likely because one of the components on the 12V line is shorted to ground, so there is no power supply for the module. There are four power ICs and eight capacitors connected to the 12V line. We need to remove these ICs one by one to find the culprit. The capacitors are unlikely to be the source of the problem. The ICs are U9, U16, U19 and U13. The caps are C146, C147, C148, C161, C162, C186, C187 and C159.

  • confirm that #34 doesn't power up in our crate. MMC complains about power out of spec and requests payload power shutdown.

Combined working parts of broken boards (SN39 T2 and SN34 T1). MCH info of combination in BU crate:

nat> show_fru

FRU Information:
----------------
 FRU  Device  State  Name
==========================================
  0   MCH      M4    NMCH-CM
  3   mcmc1    M4    NAT-MCH-MCMC
 12   AMC8     M4    BU AMC13
 40   CU1      M4    VT VT095
 41   CU2      M4    VT VT095
 50   PM1      M4    VT UTC010
==========================================
nat> show_sensorinfo 12
Sensor Information for AMC 8
========================================================
  #   SDRType  Sensor Entity Inst  Value   State  Name
--------------------------------------------------------
  0   MDevLoc          0xc1  0x68                 BU AMC13
  0   Full     0xf2    0xc1  0x68  0x01        Hotswap
  1   Full     Temp    0xc1  0x68  27.00   ok     T2 Temp
  2   Full     Voltage 0xc1  0x68  12.54   ok     +12V
  3   Full     Voltage 0xc1  0x68  3.315   ok     +3.3V
  4   Full     Voltage 0xc1  0x68  1.2000  ok     +1.2V
  5   Full     0x08    0xc1  0x68   0x00     0x02  Pwr Good
  6   Full     0x15    0xc1  0x68   0x00     0x01  Alarm Level
  7   Full     0xc0    0xc1  0x68   0x00     0x2d  FPGA Config
--------------------------------------------------------

A capacitor on SN39 T1 was found to be causing the short and was replaced. MCH printout while in Cornell crate:

FRU Information:
----------------
 FRU  Device  State  Name
==========================================
  0   MCH      M4    NMCH-CM
  3   mcmc1    M4    NAT-MCH-MCMC
  9   AMC5     M4    BU AMC13
 40   CU1      M4    VT VT095
 41   CU2      M4    VT VT095
 51   PM2      M4    VT UTC010
==========================================
nat> show_sensorinfo 9
Sensor Information for AMC 5
==================================================================
  #   SDRType  Sensor Entity Inst  Value   State    Name
------------------------------------------------------------------
  0   MDevLoc          0xc1  0x65                   BU AMC13
  0   Full     0xf2    0xc1  0x65  0x01             Hotswap
  1   Full     Temp    0xc1  0x65  26.00     ok     T2 Temp
  2   Full     Voltage 0xc1  0x65  12.48     ok     +12V
  3   Full     Voltage 0xc1  0x65  3.315     ok     +3.3V
  4   Full     Voltage 0xc1  0x65  1.2000    ok     +1.2V
  5   Full     0x08    0xc1  0x65  0x00       0x02  Pwr Good
  6   Full     0x15    0xc1  0x65  0x00       0x01  Alarm Level
  7   Full     0xc0    0xc1  0x65  0x00       0x2d  FPGA Config
------------------------------------------------------------------

2013-09-13, David

Investigating second dead AMC13XG #34 (from Cornell). Board does not seem to be powering up properly. Swapped T1 and T2 board with components from a working AMC13XG (#43).

  • Combination of SN34 T2 board with SN43 T1 board seems to exhibit original issue w/ SN34 board.
  • Combination of SN34 T1 board with SN43 T2 board seems to power up and can be communicated with (used AMC13Tool).
  • Problem possibly isolated to T2. Note that this is the opposite behavior exhibited by the dead AMC13XG #39 investigated previously.

Additional check of the working parts from the broken Cornell boards

  • Combination of SN39 T2 board with SN34 T1 board seems to power up and can be communicated with (used AMC13Tool).

Setting Cornell MCH I/P address 192.168.1.5

2013-09-12, David

Investigating dead AMC13XG #39 (from Cornell). Board does not seem to be powering up properly. Swapped T1 and T2 board with components from a working AMC13XG (#43).

  • Combination of SN39 T2 board with SN43 T1 board seems to power up and can be communicated with (used AMC13Tool).
  • Combination of SN39 T1 board with SN43 T2 board seems to exhibit original issue w/ SN39 board.
  • Problem possibly isolated to T1

2013-07-25, Eric and Ben

  • Studying the production-test-kills-the-AMC13 phenomenon
  • Work with SN44, firmware ver 0x17/0x8a
  • Run some simple command-line tests (log). Things work as we expect.
  • Try running the production test. Various transient errors happen, then we get into the known failure mode where no triggers are seen (log).
  • Found that running the production test leaves the "Local Trigger Control" register (0x1c) set to 0xc0000000, which causes the known trigger failure.
    • Quick fix: write "wv 0x1c 0" to set Local Trigger Control back to default
  • Found that multiple runs of the production test can cause the memory error state logged above. In this state, AMC13 data is nonsensical.
    • Desperate fix: write "L" to reconfigure flash
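
To see which bits the stuck register value corresponds to, a short decode helps. This is a minimal sketch; the meaning of the individual bits is not stated in this entry and should be taken from the AMC13 register documentation.

```python
def set_bits(value, width=32):
    """Return the positions of the bits that are set in value (bit 0 = LSB)."""
    return [bit for bit in range(width) if (value >> bit) & 1]

stuck_value = 0xC0000000  # value left in register 0x1c by the production test
print(set_bits(stuck_value))
```

Only the top two bits of the 32-bit register are set, so writing 0 to 0x1c clears both.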

2013-07-12, dzou

  • High rates test w/ AMC13, uHTR, and TTT:
    • uHTR in slot AMC10 (SN 4) working and sending data; uHTR in slot AMC2 (SN 7) not working: events are not being built and the back error light on the front panel is blinking

2013-07-01, dzou, dickens

  • Clock Test while changing overall crate temperature:
    • Covered top, bottom, and front of crate w/ insulating material to raise internal temperature of crate (~30 degrees)
    • Placed thermistor in crate, near AMC13 (SN43) to measure internal temperature
    • Removed insulating material to allow crate temperature to cool, while measuring clock temperature
    • Data: ClockTestCrateTemp.xlsx

2013-06-26, dickens, dzou

  • Clock Test on input to mLVDS on SN 43:
    • Characteristic exponential curves. Maximum shift ~170 ps. Compared to maximum shift of ~100 ps on SN 33 board.
  • Clock Test on input to mLVDS on SN 43 w/ temperature readout on U2:
    • Characteristic exponential curves. Maximum shift ~250 ps. Compared to maximum shift of ~100 ps on SN 33 board.

  • Clock Test on input to mLVDS on SN 43 /w controlled temperature via resistor on U2. Notes:
    • AMC13 booted up at 0 s, U2 chip allowed to reach thermal eq.
    • Begin heating U2 to 35 deg. C at 800 s
    • Begin heating U2 to 40 at 1600 s
    • Begin heating U2 to 45 at 2468 s

clockdelaytest_inputofU25.jpg

***Clock delay values normalized to U2 thermal eq. value (at ~600 s)

2013-06-26, dickens, dzou

  • Measured the voltage across capacitor 95 (C95), which should have a voltage drop of 3.3 V in parallel with the U2 chip
    • One wire soldered on to each side of the capacitor, each wired to a voltmeter
    • After keeping the AMC13 off for a very long time, voltmeter reads 3.333 V immediately after handle is pressed in, fluctuating between 3.333 V and 3.334 V consistently
    • After a quick handle reset, voltmeter reads constant 3.33 V immediately once again
  • Clock Test with SFP cool down:
    • Board on - warmed up to equilibrium
    • Took out SFP, cool down for 5 min
    • Put back SFP and took delay measurements
    • Barring the first measurement (directly after power on), measurements are within 3 ps of a 195 ps delay (graph below)

2013-06-25, dickens, dzou

  • 40 MHz, 900 mV peak-to-peak square wave. Testing probe parts:
    • Both probes that connect directly to the board have worked in previous tests.
    • Oscilloscope Trigger Setup Notes:
      • Chann 2 AWG output signal, Chann 4 probe signal
      • Trigger on Chann 4 Width, Greater than 10.0, Level: -22mV
    • WL-PBus SN 1392: working, SN 1491: working
    • D310 Connector, SN 2067 and 2072: both working
  • Clock Delay test w/ temperature control:
    • Probe on input to U3, Temperature control on U3
    • Delay measured at various temperatures: 30, 35, and 40 Celsius

  • Clock Delay test w/ temperature set at 40 deg C, power off for roughly 8 min, while keeping chip at 40 deg C. Turning on board still gives exponential Clock shift:

Plot of Delay plotted against time after power on

2013-06-24, dickens, dzou

  • Measured the temperature of the U2 chip on T1 before, during, and after the start up of AMC13 SN 33. This provides some insight into the operating range for future temperature-controlled tests. Please note, however, that the thermistor device functions as a heat sink, so we can expect the actual operating temperature of the chip to be somewhat higher. Setup:
    • U2 covered in heat sink compound and held in thermal contact with thermistor
    • Thermistor connected to arduino provides temperature readout which goes to laptop

Data sheet:

tempU2_1.jpg

  • ***Notes on data:
    • thermistor placed in contact with chip at ~20 s
    • AMC13 booted on at ~70 s
    • bump in data from ~240 s to ~290 s due to loss of thermal contact
    • AMC13 turned off at ~370 s

2013-06-19, dzou, hazen

  • Move probe to CLK output of ADN2814 CDR
  • Data on Sheet 6 of: ClockPhaseShift.xlsx
  • Height of roughly 200 ps (similar to after SY89872/ U4, see previous entry in log)

Plot of clock phase at various power off times

  • Study datasheets for tempco or phase/delay specs
    • ADN2814: no obvious information, but it's pretty long and detailed so I may have missed something
    • SY89832: 300-500ps delay; 200ps risetime; no tempco information
    • SY89872: 450-700ps delay; 130ps risetime. There is a prop delay vs temp plot; 25ps over 60 deg C more or less linear (0.42ps/deg C)
    • DS91M125: 3-8ns delay; risetime 2ns per M-LVDS spec. There is a prop delay vs temp plot; slope is 10ps/deg C
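
As a rough check on the datasheet numbers above, the sketch below computes the SY89872 slope and asks how large a temperature change each chip alone would need to produce a shift of the size seen in this entry (roughly 200 ps). It assumes a purely linear delay-vs-temperature relation and ignores every other contribution, so it is an order-of-magnitude estimate only.

```python
# Slopes taken from the datasheet notes above.
sy89872_slope = 25.0 / 60.0   # ps per deg C (25 ps over 60 deg C)
ds91m125_slope = 10.0         # ps per deg C

def delta_t_for_shift(shift_ps, slope_ps_per_degc):
    """Temperature change (deg C) needed for a given delay shift, linear model."""
    return shift_ps / slope_ps_per_degc

print(f"SY89872 slope: {sy89872_slope:.2f} ps/deg C")
for shift_ps in (100.0, 200.0):
    print(
        f"{shift_ps:.0f} ps shift: "
        f"DS91M125 needs {delta_t_for_shift(shift_ps, ds91m125_slope):.0f} deg C, "
        f"SY89872 needs {delta_t_for_shift(shift_ps, sy89872_slope):.0f} deg C"
    )
```

On these numbers a shift of a few hundred ps would need a temperature change of hundreds of degrees on the SY89872 alone, but only tens of degrees on the DS91M125.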

2013-06-17, dzou, hazen

  • Move probe to LVDS pair between U3 and U4 on T1 (Fanout and divider chip) and repeat measurements.
  • Typical exponential behavior of phase shift. Height of roughly 200 ps (after power off of 5 mins).
  • Data in Sheet 5 of: ClockPhaseShift.xlsx

Plot of clock phase at various power off times

2013-06-14 dzou, dickens

  • Updated TTT firmware and tested the range of TTT SN 2. Found results similar to the 2013-06-13 entry: the AMC13 clock stops working somewhere between 80 and 90 MHz, and lock is lost at around 120 MHz.

  • Continuing Clock Phase Shift testing:
    • Placed probe in clock path prior to the m-LVDS (T2 board -- U25 input @ R18) on SN 33
    • Observed an unexpected phase behavior when changing amplitude of AWG:
      • Shifting from 900mV peak to peak to 600mV causes the signals (AWG and probe) to go out of phase and stay there even after returning to 900mV. (delay of roughly 9.58 ns)
      • Shifting back down to 600 mV puts it back in phase again. (delay of roughly 3.38 ns)
      • (will include screen shots if necessary)
    • Continued testing with settings described in 2013-06-11 entry
    • Data in Sheet 3 of: ClockPhaseShift.xlsx
Plot of clock phase at various power off times

  • Second probe location (independent from above) Moved probe to output of SFP receiver for TTC input (two vias next to U2 on T1).
    • Repeat power-cycle tests
    • See essentially no shift with power cycling

2013-06-13 dickens

  • AMC13XG and TTT frequency range tests:
    • Found the upper limit of clock frequency manageable by AMC13XG to be 85.1 MHz. Periodic "gaps" of constant amplitude voltage appear in the AMC13XG signal thereafter. These gaps become more frequent as the clock frequency increases.
      • Setup is identical to the one described in the "2013-06-11 dzou, hazen" entry, but with a variable frequency output by the AWG.
    • Found the upper limit of clock frequency manageable by TTT to be 124 MHz, which is where jitters begin. Jitters become more pronounced around 125 MHz and the TTT loses lock by 125.7 MHz.
      • The setup for this test keeps all connections of the previous test, with one exception: the B channel of the TTT is connected to itself, creating a feedback loop, rather than to the AMC13 TTC port by 4ft fiber.

2013-06-12 dzou

  • Additional test runs made for investigating clock phase shift:
    • Made initial test run after over night power off (~16 hours)
    • Made three test runs w/ 10s power off and then one test run for various times of power off: 1, 2, 5, 10 mins (see below).
    • Data in Sheet 2 of: ClockPhaseShift.xlsx
    • ClockPhaseShift.pdf (pg 3 and 4)
  • Observations not included in test runs:
    • Phase shift after 1 hour of power on: 4.524
Plot of clock phase at various power off times

2013-06-11 dzou, hazen

  • Measuring TTC Clock Phase Shift - Notes on Setup:
    • Using Agilent AWG 40 MHz 900mV peak to peak output as external clock
    • AWG connect by 2 ft onto T on channel 4 of oscilloscope (LeCroy 725 ZI 2.5 GHz)
    • 4ft long cable from T to external clock to TTT
    • D310 differential probe soldered to FClkA on receiver AMC13 at backplane connector w/ 100 ohm between the pair
    • AMC13 receiving clock is in slot 8 (SN 43)
    • SN 40 in AMC13 with 4ft duplex fiber to B chann of TTT

  • Saved oscilloscope screen shot clocks2.jpg 4.47 ns (after being turned on for a while, before any power cycling)
  • Cycling handle power of AMC13 that is providing clock (SN 40) - power down for 30 secs (phase at 4.34 ns) (clocks3.jpg)
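
To put the two scope readings above in context, the sketch below converts the change in delay into picoseconds and into a fraction of the 40 MHz clock period. The only inputs are the 4.47 ns and 4.34 ns values and the 40 MHz AWG frequency from this entry.

```python
clock_mhz = 40.0
period_ns = 1e3 / clock_mhz            # 25 ns period at 40 MHz

before_ns = 4.47                       # after being on for a while
after_ns = 4.34                        # after a 30 s power down

shift_ps = (before_ns - after_ns) * 1e3
shift_deg = (before_ns - after_ns) / period_ns * 360.0

print(f"shift: {shift_ps:.0f} ps = {shift_deg:.2f} degrees of a {period_ns:.0f} ns period")
```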

Plot of a few test runs. Results are similar in magnitude to those seen at CERN.

2013-06-06 dzou

  • The update to T1 firmware version 0x8a seems to fix the errors from the 2013-06-04 debug log. With 0x8a, the production test can run consecutively on the same board w/o a power reset.
  • The board currently in slot AMC13 seems to be SN 40 but is labelled w/ a front panel w/ SN 37. This should be corrected.

2013-06-04 David, Eric, Ben

  • Trying to test S/N 34 in slot 4. Observe that after running Charlie's production test that we can't run it again, and in fact even after power cycling can't ping T1. Try eeperase and mreset on the MMC, still no go!
  • Try the same treatment on S/N 39. Both FPGAs still respond to ping. But the test fails in a different way the 2nd time.
  • Mysteries abound. Could be hardware, MMC or production test software or a combination of them. In any case shipping anything under these conditions seems like a poor idea.
  • The production test seems not to work twice in a row without spitting errors, but seems to work after a handle power reset. An error will still often occur on the first attempt if you do not wait long enough for the AMC13 to boot up (2 min was sufficient).
  • Had similar problems while trying to program the FPGA firmware for S/N 44 in slot 4. After programming the FPGAs and loading, continued to get a "T1 not found at this location" error. Note that prior to this, we were able to program S/N 43 in slot 3 with no problems.
  • Errors arise while trying to read from the flash: "T1 Error : IPbus Transaction failure during write to register 'FLASH_CMD'" or "T2 Error : IPbus Transaction failure during write to register 'FLASH_WBUF'"
  • Trying to program S/N 45 in slot 3 using 0x89. After programming FPGA, T1 is unreachable. Revert to T1 FPGA of 0x87 and T1 is working again.

2013-05-30 hazen

  • NAT MCH in crate seems to have died! Symptom was that the Ethernet switch worked (IPBus access to the AMC modules was fine) but couldn't access the MCH by telnet, and the LAN-to-ipmi bridge didn't work. Also, couldn't connect by serial cable to the WinXP machine.
  • Replaced with spare. MCH I/P is now 192.168.1.41. Works. However, the serial access using PUTTY on WinXP machine still doesn't work. Can talk to it fine using minicom on cms2 with port /dev/ttyACM0. Go figure.

2013-05-10 hill and wu

Edit by David Zou on 2013-06-07: The IP address has been recently changed so if you use any of the IPMI commands described below, make sure to use the new IP address:

192.168.1.41

MMC IPMI commands for reading an AMC13 in slot 3 (IPMB 0x76)
  • Read Spartan Address
       ipmitool -H 192.168.1.11 -U '' -P '' -T 0x82 -b 7 -t 0x76 raw 0x32 0x34 0 0 0 20
       
  • Read Virtex Address
       ipmitool -H 192.168.1.11 -U '' -P '' -T 0x82 -b 7 -t 0x76 raw 0x32 0x34 1 0 0 20
       
  • Write IP Address 192.168.1.100 to SPI Port 0
       ipmitool -H 192.168.1.11 -U '' -P '' -T 0x82 -b 7 -t 0x76 raw 0x32 0x33 0 0 0 11 03 0xff 0xff 00 00 0xc0 0xa8 0x01 0x64 0x00 0x00
       
  • Write IP Address 192.168.1.101 to SPI Port 1
       ipmitool -H 192.168.1.11 -U '' -P '' -T 0x82 -b 7 -t 0x76 raw 0x32 0x33 1 0 0 11 03 0xff 0xff 00 00 0xc0 0xa8 0x01 0x65 0x00 0x00
       
  • Configure Spartan FPGA from SPI 0
       ipmitool -H 192.168.1.11 -U '' -P '' -T 0x82 -b 7 -t 0x76 raw 0x32 0x32 0 0x22
       
  • Configure Virtex FPGA from SPI 1
       ipmitool -H 192.168.1.11 -U '' -P '' -T 0x82 -b 7 -t 0x76 raw 0x32 0x32 1 0x22
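
In the two "Write IP Address" commands above, the four bytes just before the trailing `0x00 0x00` are the IP address written out in hex (`0xc0 0xa8 0x01 0x64` is 192.168.1.100). A small helper, sketched here, produces those four bytes for any address. The helper name `ip_to_raw_bytes` is made up for this example, and the other bytes of the payload are not explained in this entry, so they are left alone.

```python
import ipaddress

def ip_to_raw_bytes(ip):
    """Return the four bytes of an IPv4 address as 0x-prefixed hex strings."""
    return [f"0x{byte:02x}" for byte in ipaddress.IPv4Address(ip).packed]

print(" ".join(ip_to_raw_bytes("192.168.1.100")))
print(" ".join(ip_to_raw_bytes("192.168.1.101")))
```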
       

2013-05-03 hill

Procedure for initializing the DAQ Path on the uHTR board and getting it to send data for a module in slot #2 in the uTCA crate:

$ cd hcal/hcalUHTR
$ ./bin/linux/x86_64_slc5/uHTRtool.exe 192.168.115.8
 (0) IP[192.168.115.8]  Type: UHTR 
 ID of uHTR (-1 for exiting the tool) :: 
 Front Firmware revision : (00) 00.04.13 
 Back Firmware revision : (00) 00.06.10 
  Clock expected at    25.0000 MHz :    25.0000 MHz (front)     25.0000 MHz (back) 
  Clock expected at   100.0000 MHz :   100.0000 MHz (front)    100.0000 MHz (back) 
  Clock expected at    40.0800 MHz :    39.9999 MHz (front)     39.9999 MHz (back) 
  Clock expected at    80.1600 MHz :    79.9998 MHz (front)     79.9998 MHz (back) 
  Clock expected at   120.2400 MHz :   119.9997 MHz (front)    119.9997 MHz (back) 
  Clock expected at   160.3200 MHz :   159.9997 MHz (front)    159.9997 MHz (back) 
  Clock expected at   240.4800 MHz :   239.9995 MHz (front)    239.9995 MHz (back) 
  Clock expected at   320.6400 MHz :   319.9994 MHz (front)    319.9993 MHz (back) 
  Clock expected at    11.0000 kHz :    11.2230 kHz (front)     11.2230 kHz (back) 
  Clock expected at     0.1100 kHz :     0.0000 kHz (front)      0.0000 kHz (back) 
  Clock expected at    40.0800 MHz :    39.9999 MHz (front)     39.9999 MHz (back) 
  Clock expected at    40.0800 MHz :    39.9999 MHz (front)     39.9999 MHz (back) 
  Clock expected at   240.4800 MHz :     0.0000 MHz (front)     56.7837 MHz (back) 



   STATUS       Status summary of the uHTR card
   LINK         Status and control of frontend links
   DTC          Information received from DTC
   CLOCK        Clock module work
   SENSOR       I2C sensors and controls
   TRIG         Trigger-path work
   DAQ          DAQ-path work
   TEST         Functionality tests of uHTR Board
   FLASH        Flash programming and readback menu
   LUMI         LUMI-DAQ work
   EXIT         Exit this tool
 > daq

   STATUS       Status of the DAQ path
   SPY          Read the DAQ path spy
   CTL          Control the DAQ path
   F2B          F2B DAQ Link Operations
   QUIT         Back to top menu
 > ctl

DAQ F2B Links
  0 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  1 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  2 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 9424 (0.000000e+00 Hz)
DAQ Path : ENABLED       ZS(per sample)
   Last EVN: 10   OrN : 2983852  Header Occupancy : 0  (Peak : 1)
   Samples: 10   Presamples : 4  Pipeline Length : 50
   ZS Mask (one means ignore) : 0x   0 
   TP Samples: 14   TP Presamples : 11  
   TP ZS : TP_NZS  
   Module Id : 0 (0x0)   BC Offset : 0


 (1) Set Module Id  (2) Set BC Offset       (3) Set NSAMPLES
 (4) Set PRESAMPLES (5) Set Pipeline Length (6) Set ZS Mask 
 (7) Enable DAQ Path (toggle)   (8) Reset DAQ Path 
 (9) Toggle NZS    (10) Toggle Mark-And-Pass ZS    (11) Toggle ZS Sum-By-Two
 (12) Dump ZS Thresholds   (13) Edit ZS Thresholds   (14) Uniform ZS
 (15) Set TP PRESAMPLES  (16) Set TP SAMPLES
 (17) Toggle ZS for TP (18) Toggle SOI-only for TP

 (  Anything else will just return to the original menu )

Selection :  [-1] 3
  New nsamples :  [10] 10

DAQ F2B Links
  0 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  1 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  2 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 9424 (0.000000e+00 Hz)
DAQ Path : ENABLED       ZS(per sample)
   Last EVN: 10   OrN : 3737623  Header Occupancy : 0  (Peak : 1)
   Samples: 10   Presamples : 4  Pipeline Length : 50
   ZS Mask (one means ignore) : 0x   0 
   TP Samples: 14   TP Presamples : 11  
   TP ZS : TP_NZS  
   Module Id : 0 (0x0)   BC Offset : 0


 (1) Set Module Id  (2) Set BC Offset       (3) Set NSAMPLES
 (4) Set PRESAMPLES (5) Set Pipeline Length (6) Set ZS Mask 
 (7) Enable DAQ Path (toggle)   (8) Reset DAQ Path 
 (9) Toggle NZS    (10) Toggle Mark-And-Pass ZS    (11) Toggle ZS Sum-By-Two
 (12) Dump ZS Thresholds   (13) Edit ZS Thresholds   (14) Uniform ZS
 (15) Set TP PRESAMPLES  (16) Set TP SAMPLES
 (17) Toggle ZS for TP (18) Toggle SOI-only for TP

 (  Anything else will just return to the original menu )

 Selection :  [-1] 4
  New presamples :  [4] 4

DAQ F2B Links
  0 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  1 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  2 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 9424 (0.000000e+00 Hz)
DAQ Path : ENABLED       ZS(per sample)
   Last EVN: 10   OrN : 3803540  Header Occupancy : 0  (Peak : 1)
   Samples: 10   Presamples : 4  Pipeline Length : 50
   ZS Mask (one means ignore) : 0x   0 
   TP Samples: 14   TP Presamples : 11  
   TP ZS : TP_NZS  
   Module Id : 0 (0x0)   BC Offset : 0


 (1) Set Module Id  (2) Set BC Offset       (3) Set NSAMPLES
 (4) Set PRESAMPLES (5) Set Pipeline Length (6) Set ZS Mask 
 (7) Enable DAQ Path (toggle)   (8) Reset DAQ Path 
 (9) Toggle NZS    (10) Toggle Mark-And-Pass ZS    (11) Toggle ZS Sum-By-Two
 (12) Dump ZS Thresholds   (13) Edit ZS Thresholds   (14) Uniform ZS
 (15) Set TP PRESAMPLES  (16) Set TP SAMPLES
 (17) Toggle ZS for TP (18) Toggle SOI-only for TP

 (  Anything else will just return to the original menu )

 Selection :  [-1] 5
  New pipeline length :  [50] 50

DAQ F2B Links
  0 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  1 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  2 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 9424 (0.000000e+00 Hz)
DAQ Path : ENABLED       ZS(per sample)
   Last EVN: 10   OrN : 3842107  Header Occupancy : 0  (Peak : 1)
   Samples: 10   Presamples : 4  Pipeline Length : 50
   ZS Mask (one means ignore) : 0x   0 
   TP Samples: 14   TP Presamples : 11  
   TP ZS : TP_NZS  
   Module Id : 0 (0x0)   BC Offset : 0


 (1) Set Module Id  (2) Set BC Offset       (3) Set NSAMPLES
 (4) Set PRESAMPLES (5) Set Pipeline Length (6) Set ZS Mask 
 (7) Enable DAQ Path (toggle)   (8) Reset DAQ Path 
 (9) Toggle NZS    (10) Toggle Mark-And-Pass ZS    (11) Toggle ZS Sum-By-Two
 (12) Dump ZS Thresholds   (13) Edit ZS Thresholds   (14) Uniform ZS
 (15) Set TP PRESAMPLES  (16) Set TP SAMPLES
 (17) Toggle ZS for TP (18) Toggle SOI-only for TP

 (  Anything else will just return to the original menu )

 Selection :  [-1] 8

DAQ F2B Links
  0 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  1 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 8675 (0.000000e+00 Hz)
  2 : Status = f  Errors = 19349  (0.000000e+00 Hz)  Words = 9424 (0.000000e+00 Hz)
DAQ Path : ENABLED       ZS(per sample)
   Last EVN: 0   OrN : 39  Header Occupancy : 0  (Peak : 0)
   Samples: 10   Presamples : 4  Pipeline Length : 50
   ZS Mask (one means ignore) : 0x   0 
   TP Samples: 14   TP Presamples : 11  
   TP ZS : TP_NZS  
   Module Id : 0 (0x0)   BC Offset : 0


 (1) Set Module Id  (2) Set BC Offset       (3) Set NSAMPLES
 (4) Set PRESAMPLES (5) Set Pipeline Length (6) Set ZS Mask 
 (7) Enable DAQ Path (toggle)   (8) Reset DAQ Path 
 (9) Toggle NZS    (10) Toggle Mark-And-Pass ZS    (11) Toggle ZS Sum-By-Two
 (12) Dump ZS Thresholds   (13) Edit ZS Thresholds   (14) Uniform ZS
 (15) Set TP PRESAMPLES  (16) Set TP SAMPLES
 (17) Toggle ZS for TP (18) Toggle SOI-only for TP

 (  Anything else will just return to the original menu )

 Selection :  [-1] 7

If you now send L1As to the uHTR, it produces events! Very good!
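
A side note on the clock table printed when uHTRtool starts in the session above: every measured frequency is low by about the same factor. The sketch below compares the measured/expected ratios with 40.00/40.08, which is what one would see if the crate were running from a 40.00 MHz reference rather than the 40.08 MHz the tool expects. That interpretation is an inference from the printed numbers, not something stated in this log.

```python
# (expected MHz, measured front MHz) pairs copied from the uHTRtool startup printout.
clocks = [
    (40.08, 39.9999),
    (80.16, 79.9998),
    (120.24, 119.9997),
    (160.32, 159.9997),
    (240.48, 239.9995),
    (320.64, 319.9994),
]

ratios = [measured / expected for expected, measured in clocks]
reference_ratio = 40.0 / 40.08  # hypothesis: 40.00 MHz reference instead of 40.08 MHz

for (expected, measured), ratio in zip(clocks, ratios):
    ppm = (ratio - 1.0) * 1e6
    print(f"{expected:7.2f} MHz expected: ratio {ratio:.6f} ({ppm:.0f} ppm)")
print(f"40.00 / 40.08 = {reference_ratio:.6f}")
```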

2013-03-11 hill

Trouble getting the TTCvi to work. The VME crate had been shut down for at least two weeks prior to this test. Otherwise, it was never touched...

Procedure which fails:

Reset the Xilinx trigger board

$ source ~/environ.sh    (to set the environment)
$ cd ~/TTS_ctrl
$ ./periodic_120hz
Orbit Length:       -o 3563 BX
Trigger Delay:      -d 500 BX
Orbit Count:        -n 0 orbits
Trigger Spacing:    -s 25 BX
Triggers per orbit: -t 1 triggers
Repeat period:      -r 100 orbits
Random threshold:   -p 0 / 65535
TTS latency         -l 0 BXn (0 sec)
TTS sample mask     -m 0
TTC cmd BCN         -w 1000
Allow L1A in gap    -g 0
Trigger rule 1: not enabled
Trigger rule 2: not enabled
Trigger rule 3: not enabled
Trigger rule 4: not enabled
Rule enable mask: 0x0
$ cd ../ttc
$ DCCdiagnose.exe -x setup_ttc.dcc
DCCdiagnose.exe revision 27 Mar 2009
Load TTCvi_base = 0xf10000
Load vme_slot = 0x13
Load log_level = error
Load hal_path = /home/daqowner/dist/hal/hcal/
Load vme_bus = caen:0
Load ttc_bus = caen:0
HAL search path set to /opt/xdaq/hal/hcal/
Overriding default HAL path with /home/daqowner/dist/hal/hcal/ from PROGRAMMER_HAL_PATH
Looking for HAL addresstable files in directory /home/daqowner/dist/hal/hcal/
(change by setting PROGRAMMER_HAL_PATH environment variable)
INFO:  Logger set up
V2718 firmware : 2.00
A2818 firmware : 0.06
VMELibRelease  : 2.30.2
INFO:  busAdapter set up
INFO:  DCC constructed
DCC1 created
INFO:  DCC connected
INFO:  TTCvi set up
DCCdiagnose.exe - dcc->setupHAL()
(/home/daqowner/dist/hal/hcal/DCC_LRB.dat,/home/daqowner/dist/hal/hcal/DCC_LTB.dat,/home/daqowner/dist/hal/hcal/DCC_log12.dat,/home/daqowner/dist/hal/hcal/DCC_log123_conf.dat,/home/daqowner/dist/hal/hcal/DCC_logicboardv4_2c.dat
DCC::initialize()...
DCC::getMainAddressMap()... DCC::getMasterDevice()
DCC revision is 2c36
[Script setup_ttc.dcc start]
#
# DCC script to setup TTCvi
#
    ttc/write 0x82 0xf000    # reset BGO fifos
    ttc/write 0x80 0xff64        # enable external orbit, disable triggers
    ttc/write 0x92 10            # inhibit 0 delay (250ns)
    ttc/write 0x94 10       # inhibit 0 duration (250ns)
    ttc/write BData0 0x00800000  # write one word (BCR, cmd=01) to fifo 0
    ttc/write 0x90 0xd           # enable BG0 channel 0
    ttc/cmd 2                    # send ECR
    ttc/cmd 0x28                 # send OCR
    ttc/trig 4          # disable L1A
TTC L1A source set to 4 (VME)
q
$ DCCdiagnose.exe
>ttc/trig 1
TTC L1A source set to 1 (panel input)

No triggers are sent, and the TTCvi panel has LEDs lit which I have never seen lit before, namely "L1A Req" (a yellow light) and "Req0" (a red light...this one may have been lit before and I may just not have noticed).
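
For reference, the trigger rate that the periodic_120hz settings printed above should give once triggers do flow can be estimated from the orbit length, triggers per orbit, and repeat period. Two things are assumptions here: the bunch clock frequency (40.00 MHz and 40.08 MHz are both tried) and whether "-o 3563" means 3563 BX or BX 0 through 3563 (3564 BX).

```python
def periodic_rate_hz(clock_hz, orbit_bx, triggers_per_orbit, repeat_orbits):
    """Average trigger rate when triggers fire in one orbit out of every repeat_orbits."""
    orbit_hz = clock_hz / orbit_bx
    return orbit_hz * triggers_per_orbit / repeat_orbits

for clock_hz in (40.0e6, 40.08e6):        # assumed bunch clock frequencies
    for orbit_bx in (3563, 3564):         # two readings of the -o setting
        rate = periodic_rate_hz(clock_hz, orbit_bx, triggers_per_orbit=1, repeat_orbits=100)
        print(f"{clock_hz / 1e6:.2f} MHz, {orbit_bx} BX/orbit: {rate:.2f} Hz")
```

All four combinations come out at roughly 112 Hz, a little under the nominal 120 Hz in the script name.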

After letting it run for awhile like that, this is what the AMC13 status looks like:

EVB Counters:  (All 32-bit counters read 0x0)
                   TTC BC0 err [0044]: 00000000 00000014
                      Run time [0048]: 00000008 901aa25a
                    Ready time [004a]: 00000008 901aa258
                     Busy time [004c]: 00000000 00000001
            L1A ovfl warn time [0050]: 00000000 00000001
AMC Counters:
                                       <---Link 00-----> <---Link 01-----> <---Link 02-----> <---Link 03-----> <---Link 04-----> <---Link 05----->
              Single Bit Error [0042]: 00000000 2a969f7d 00000000 010c7ed6 00000000 00000000 00000000 2cfa0526 00000000 1f6064d3 00000000 246de015
               Multi Bit Error [0044]: 00000000 2979f006 00000000 010ceb14 00000000 00000000 00000000 2b92e914 00000000 1c20314c 00000000 22ce9cc2
                        Resend [004a]: 00000000 000034d9 00000000 00000004 00000000 00000000 00000000 0003b62b 00000000 0001c064 00000000 00013702
                    Data Abort [0056]: 00000000 0000018f 00000000 00000000 00000000 00000000 00000000 00005cba 00000000 00000013 00000000 000006a5
                 Counter Abort [0058]: 00000000 0000056f 00000000 0000056f 00000000 00000000 00000000 00005c3a 00000000 0000001c 00000000 000003dd
                     SEQ Abort [0060]: 00000000 00000a23 00000000 00000000 00000000 00000000 00000000 0001026d 00000000 0000004c 00000000 00000d8a
                     CRC Abort [0062]: 00000000 00000a27 00000000 00000000 00000000 00000000 00000000 000104c9 00000000 0000004c 00000000 00000d8a
                   Frame Abort [0064]: 00000000 00000a27 00000000 00000000 00000000 00000000 00000000 000104ca 00000000 0000004c 00000000 00000d8a
                  K Char Abort [0066]: 00000000 000006af 00000000 00000000 00000000 00000000 00000000 0000b1c5 00000000 0000002f 00000000 00000a70


                                       <---Link 06-----> <---Link 07-----> <---Link 08-----> <---Link 09-----> <---Link 10-----> <---Link 11----->
              Single Bit Error [0042]: 00000000 0d36f3be 00000000 274afb1e 00000000 2540bcdb 00000000 26500e4a 00000000 2a4f9392 00000000 1b3271bf
               Multi Bit Error [0044]: 00000000 0c796460 00000000 26120309 00000000 247c3658 00000000 2537700e 00000000 28a7e404 00000000 1b0e5e0e
                        Resend [004a]: 00000000 00000182 00000000 00025f24 00000000 00003199 00000000 000298db 00000000 00015b55 00000000 00005991
                    Data Abort [0056]: 00000000 00000047 00000000 00002d40 00000000 00000100 00000000 0000109d 00000000 000013b7 00000000 000000c0
                 Counter Abort [0058]: 00000000 00000019 00000000 00000789 00000000 000000bd 00000000 0000028c 00000000 00000169 00000000 00000038
                     SEQ Abort [0060]: 00000000 0000008a 00000000 00004003 00000000 000002c5 00000000 00001df4 00000000 000024bb 00000000 00000131
                     CRC Abort [0062]: 00000000 0000008a 00000000 000040f8 00000000 000002c9 00000000 00001df8 00000000 00002714 00000000 00000132
                   Frame Abort [0064]: 00000000 0000008a 00000000 000040f8 00000000 000002c9 00000000 00001df8 00000000 00002714 00000000 00000132
                  K Char Abort [0066]: 00000000 00000060 00000000 0000349b 00000000 000001b9 00000000 000012fe 00000000 000014cd 00000000 000000f5
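
The status dump above prints each counter as two 32-bit hex words. Assuming the first word is the high half (an assumption; the dump does not say), the EVB time counters can be combined and compared as below. The units of the counters are not given in this entry.

```python
def combine(high_word, low_word):
    """Join two 32-bit hex strings into one 64-bit integer (high word first, assumed)."""
    return (int(high_word, 16) << 32) | int(low_word, 16)

run_time = combine("00000008", "901aa25a")
ready_time = combine("00000008", "901aa258")
busy_time = combine("00000000", "00000001")

print(f"run   = {run_time}")
print(f"ready = {ready_time}")
print(f"run - ready = {run_time - ready_time}, busy = {busy_time}")
print(f"ready fraction = {ready_time / run_time:.9f}")
```

Read this way, the AMC13 was Ready for essentially the whole run.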

Ummmm...I fixed it, I guess. All I did was leave the system alone for a second to read about TTCvi, and it started working all of a sudden. Huh. Well, I'm not complaining! The LED lighting didn't have anything to do with the problem, apparently. Perhaps the system needs to take a minute to gather itself after being shut off for awhile.

2012-11-26, hazen/hill

Changing VadaTech IP addresses:

  • eth1 (GbE): currently 192.168.40.250, changing to 192.168.1.2
  • eth0 (10/100): currently 192.168.1.252, changing to 192.168.2.2

# net interface 0
export SYSCFG_IFACE0=y
export INTERFACE0="eth0"
export IPADDR0="192.168.2.2"
export NETMASK0="255.255.255.0"
export BROADCAST0="192.168.2.255"
export GATEWAY0="192.168.1.1"
export NAMESERVER0="0.0.0.0"
# net interface 1
export SYSCFG_IFACE1=y
export INTERFACE1="eth1"
export IPADDR1="192.168.1.2"
export NETMASK1="255.255.255.0"
export BROADCAST1="192.168.1.255"
export GATEWAY1="0.0.0.0"
export NAMESERVER1="0.0.0.0"
  • After setting this configuration, we are unable to ping the 10/100 port. We are, however, able to ping the GbE port and talk to AMC13s, but not the mCTR2s. We suspect that the MMC code for the mCTR2s is out of date. Eric is going to contact Tom Gorski on this one.
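
A quick consistency check of the settings above can be sketched with Python's standard ipaddress module. This is only a sanity check under one assumption (that a gateway of 0.0.0.0 means "not set"); it does not show why the 10/100 port stopped answering pings.

```python
import ipaddress

# Values copied from the rc.conf excerpt above.
interfaces = {
    "eth0": {"ip": "192.168.2.2", "netmask": "255.255.255.0",
             "broadcast": "192.168.2.255", "gateway": "192.168.1.1"},
    "eth1": {"ip": "192.168.1.2", "netmask": "255.255.255.0",
             "broadcast": "192.168.1.255", "gateway": "0.0.0.0"},
}

def check_interface(cfg):
    """Return a list of inconsistencies found in one interface's settings."""
    net = ipaddress.ip_network(f"{cfg['ip']}/{cfg['netmask']}", strict=False)
    problems = []
    if ipaddress.ip_address(cfg["broadcast"]) != net.broadcast_address:
        problems.append("broadcast does not match subnet")
    gw = ipaddress.ip_address(cfg["gateway"])
    # Assumption: 0.0.0.0 means "no gateway set" on this interface.
    if not gw.is_unspecified and gw not in net:
        problems.append(f"gateway {gw} is outside {net}")
    return problems

report = {name: check_interface(cfg) for name, cfg in interfaces.items()}
for name, problems in report.items():
    print(name, problems or "ok")
```

With the values as logged, the check flags that GATEWAY0 (192.168.1.1) is not on eth0's own 192.168.2.0/24 subnet, which may be worth a second look.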

2012-11-21, hill

Attempting an initial installation of the VadaTech Commercial MCH card for the uTCA crate.

Procedure:

  1. Install the VadaTech module in the MCH2
  2. Initial attempt at IPbus-based communication with the uTCA crate results in successful pinging and control of the AMC13 in MCH1 and also the AMC13 in a non-MCH slot, but unsuccessful pinging of mCTR2s
  3. Relevant Documents are...
  4. 10/100 Ethernet has the default IP address 192.168.1.252, which conflicts with our IP assignment scheme for the AMC13s.
  5. According to the 'Getting Started Guide', the default IP address of the GbE is 192.168.40.250. Unsuccessful ping to this address.
  6. Run Eric's pinger.py to see what I can talk to. Within the range 192.168.1.1 to 192.168.1.255, I can only see the AMC13s. This was to check that the VadaTech MCH card didn't magically take on the same IP address as the NAT MCH card
  7. Despite my not being able to ping the GbE, I am going to try and connect anyway, as the documentation suggests.
          [cms2] /home/chill90 > ssh root@192.168.40.250
          ssh: connect to host 192.168.40.250 port 22: Connection timed out
          [cms2] /home/chill90 > ssh 192.168.40.250
          ssh: connect to host 192.168.40.250 port 22: Connection timed out
          
    Not surprisingly, no luck here. I get similar results if I try and access the IP address via a web browser
  8. Next try connecting via the 10/100 Ethernet Port at 192.168.1.252. This pings successfully!
  9. Eric has now taken over.
  10. Telnet into the card successfully.
    • > telnet 192.168.1.252
    • Username: root
    • Password: root
  11. The HUB card will be running Linux. The command-line interface is based on the IPMI v2.0. The procedure to reassign the IP addresses of the ethernet ports is as follows
    1. Open the file /etc/rc.d/rc.conf for editing
    2. net interface 0 is the 10/100 Ethernet port. Edit IPADDR0 and/or NETMASK0 and/or BROADCAST0 as desired
    3. net interface 1 is the GbE port. Edit IPADDR1 and/or NETMASK1 and/or BROADCAST1 as desired
    4. Only one of either GATEWAY0 or GATEWAY1 should be set!! The MCH will use the device with the set 'GATEWAY' value to send traffic to other subnets and networks
    5. Power cycle the MCH for the changes to take effect
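
The "only one GATEWAY" rule in step 4 can be checked before power cycling. The sketch below parses export lines in the style of rc.conf; the helper names parse_exports and gateways_set are made up for illustration, and treating an empty value or 0.0.0.0 as "not set" is an assumption.

```python
import re

def parse_exports(text):
    """Collect NAME=value pairs from shell-style 'export' lines."""
    pattern = re.compile(r'^\s*export\s+(\w+)=(?:"([^"]*)"|(\S*))\s*$')
    settings = {}
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            value = m.group(2) if m.group(2) is not None else m.group(3)
            settings[m.group(1)] = value
    return settings

def gateways_set(settings):
    """Names of GATEWAYn variables that carry a real address."""
    # Assumption: an empty value or 0.0.0.0 counts as "not set".
    return sorted(k for k, v in settings.items()
                  if k.startswith("GATEWAY") and v not in ("", "0.0.0.0"))

sample = '''
# net interface 0
export IPADDR0="192.168.2.2"
export GATEWAY0="192.168.1.1"
# net interface 1
export IPADDR1="192.168.1.2"
export GATEWAY1="0.0.0.0"
'''
active = gateways_set(parse_exports(sample))
print(active)
```

More than one name in the resulting list would mean both gateways are set.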

2012-10-26, hazen

NOTE: Don't design any software to use this feature yet. Some small details are likely to change, but the concept will remain.

Working on IP address setting by IPMI. Using Tom's recipe:

export IPMITOOLARGS="-H 192.168.1.11 -P \"\" -T 0x82 -B 0 -b 7"

# read Spartan IP address
>  ipmitool $IPMITOOLARGS -t 0xa4 raw 0x32 0x34 0 0 7 4
f6 01 a8 c0

# read Virtex IP address (bus addr byte determined empirically!)
>  ipmitool $IPMITOOLARGS -t 0xa4 raw 0x32 0x34 1 0 7 4
f7 01 a8 c0

Changing the address:

# set spartan address low byte to 128
> ipmitool $IPMITOOLARGS -t 0xa4 raw 0x32 0x33 0 0 7 1 0x80
# set virtex address low byte to 129
> ipmitool $IPMITOOLARGS -t 0xa4 raw 0x32 0x33 1 0 7 1 0x81

[cms2] /home/hazen > ping 192.168.1.128
PING 192.168.1.128 (192.168.1.128) 56(84) bytes of data.
64 bytes from 192.168.1.128: icmp_seq=1 ttl=64 time=0.240 ms
64 bytes from 192.168.1.128: icmp_seq=2 ttl=64 time=0.060 ms

It works!
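
The four bytes returned by the read command appear to be the IP address with the low byte first, which is consistent with the board answering at 192.168.1.128 after the low byte was set to 0x80. A small sketch for decoding a reply, assuming that byte order (ip_from_ipmi_reply is a made-up helper name):

```python
import ipaddress

def ip_from_ipmi_reply(reply):
    """Convert a reply such as 'f6 01 a8 c0' (low byte first) to dotted quad."""
    raw = bytes(int(tok, 16) for tok in reply.split())
    # Reverse so the most significant byte comes first.
    return str(ipaddress.IPv4Address(raw[::-1]))

spartan = ip_from_ipmi_reply("f6 01 a8 c0")
virtex = ip_from_ipmi_reply("f7 01 a8 c0")
print(spartan, virtex)
```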

2012-09-28, eric

***This has since been fixed. There was a discrepancy between our local Makefile and the release Makefile.

Charlie has complained of ghosts in the machine. All is OK if I do the following:

  • Go to cms1, run ./periodic_12hz
  • Run DCCdiagnose.exe, ttc/trig 1 (light blinks)
  • Go to AMC13Tool_3 directory, run AMC13Tool
  • do init.amc
  • turn on triggers for a while

Attempting to reproduce the EvN mismatch symptom described by Jeremy so Wu can look at it by remote control. First, update mCTR2s to firmware 0.5.20.

My AMC13Tool has dependency problems. Create a directory AMC13Tool_4 and copy Charlie's 11_5_5 exe and .so files there. That seems OK, but the stock AMC13Tool one gets with ". ~daqowner/dist/etc/env.sh; AMC13Tool.exe" segfaults immediately.

2012-09-14, eric and charlie

Attempting to reproduce the result below and look at uHTR spy buffer using tools here. Perl script check_xxxx.pl loops generating one software L1A and checking using dump_DTC.exe for EvN errors.

Found one! AMC13 Data is here. uHTR DAQ spy output:

> spy
  0000 2 00ff 
  0001 3 ffff Event number 16777215 (0xffffff)
  0002 3 8000 
  0003 3 03ee Orbit number 0, submodule number 1006 (0x3ee)
  0004 3 6000 Format 6, BCN 0 (0x0)
  0005 3 0017 Presamples 2, TP words 0 
  0006 3 4511 Unsuppressed 0, compact mode 1, firmware rev 0x511
  0007 3 0028 Flavor 0, Pipeline length 40 
  0008 3 2000 NS 4 WC 0
  0009 3 ffff 
  000a 3 0000 
  000b 1 ff00 

This is not helpful as the EvN is for some reason all 1's.

2012-09-13, eric and charlie

Testing V0.5.11 uCTR firmware with AMC13 V=0x17 S=0xb firmware

At 1kHz fixed trigger rate for 1s, take some data. (here: warning! binary file).

The uHTRs stay in sync with the AMC13, but there are some strangely corrupted events.

Starting to read file...
First EvN is 1 (0x000001)
AMC13 EvN = 0x00000012  uHTR(10) EvN = 0x00451112 !!
AMC13 EvN = 0x0000004e  uHTR(4) EvN = 0x0045114e !!
AMC13 EvN = 0x00000218  uHTR(10) EvN = 0x0045111a !!
After reading 940 events, Last EvN is 940 (0x0003ac)

Note the 4511's. Here is a dump of EvN 12

AMC13 EvN = 0x00000012  uHTR(10) EvN = 0x00451112 !!
FED:   0 EvN: 000012  BcN: 1d5  OrN: 0008a747  TTS: 0/0 EvTyp: 1  CalTyp: 0 Size: 26
UHTR  4 [  12] EvN 000012 BcN 1d5 OrN 17
  0: 0412
  1: 0000
  2: 8000
  3: bbee
  4: 61d5
  5: 0017
  6: 4511
  7: 0028
  8: 2000
  9: ffff
 10: 0000
 11: 1208
UHTR 10 [  12] EvN 451112 BcN 1d7 OrN 04
  0: 0012
  1: 4511
  2: bbee
  3: 2000
  4: 61d7
  5: 0000
  6: 4539
  7: 2000
  8: ffff
  9: 0000
 10: 1208
 11: 0000

Note that in the 2nd uHTR payload the high byte of the 1st word is 00 (should be 0a) plus the 2nd word is 4511 (should be 0000). The 8000 word is missing, and the rest of the words are shifted up by one. The word count in the AMC13 header (see below) is 0b instead of 0c, with a zero fill word added (correctly) by the AMC13.
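
To compare payloads quickly, the header fields can be pulled out of the 16-bit words. The bit positions in this sketch are inferred from the two dumps above and the spy output in the 2012-09-14 entry, not from a format specification, and the field name "id" for the high byte of word 0 is a guess.

```python
def decode_uhtr_header(words):
    """Decode a few header fields from a list of 16-bit payload words."""
    return {
        "id": words[0] >> 8,                         # high byte of word 0
        "evn": (words[1] << 8) | (words[0] & 0xFF),  # 24-bit event number
        "orn": words[3] >> 11,                       # top 5 bits of word 3
        "bcn": words[4] & 0xFFF,                     # low 12 bits of word 4
    }

# First six words of each payload, copied from the dump of EvN 12.
good = decode_uhtr_header([0x0412, 0x0000, 0x8000, 0xBBEE, 0x61D5, 0x0017])
bad = decode_uhtr_header([0x0012, 0x4511, 0xBBEE, 0x2000, 0x61D7, 0x0000])

for name, hdr in (("uHTR 4", good), ("uHTR 10", bad)):
    print(name, {k: hex(v) for k, v in hdr.items()})
```

Decoded this way, the first payload gives EvN 0x12, BcN 0x1d5, OrN 0x17 and the second gives EvN 0x451112, BcN 0x1d7, OrN 0x04, matching the values printed by the dump tool. That fits the picture of the 8000 word being dropped and the 4511 firmware word landing in the EvN field.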

Below is a raw dump of the data from the AMC13 excerpted. (deadbeef and following word count added by software). Note the 4511 ffff

000770 deadbeef 0000001a 1d500008 51000012
000780 008a7470 00000000 00000010 00170410
000790 00000000 00000000 0000c00c 00000000
0007a0 00000000 0000c00b 00000412 bbee8000
0007b0 001761d5 00284511 ffff2000 12080000
0007c0 45110012 2000bbee 000061d7 20004539
0007d0 0000ffff 00001208 ad8c0000 a000000d
0007e0 deadbeef 0000001a 1d500008 51000013
0007f0 008a7530 00000000 00000010 00170410
000800 00000000 00000000 0000c00c 00000000
000810 00000000 0000c00c 00000413 1bee8000
000820 001761d5 00284511 ffff2000 13080000
000830 00000a13 1bee8000 001761d5 00284511
000840 ffff2000 13080000 766b0000 a000000d

2012-08-31, hazen

Working on AMC13 "pre-ship" certification test. Firmware versions up through virtex=0x10 have LSC/LDC implemented and can send DAQ data in the old "Wu" format. Enable by turning on "SLINK Enable" (bit 1 in reg 1). There are secret counters which monitor the LDC:

0x10 LDC_accept_cntr
0x11 LDC_abort_cntr
0x12 LDC_ACK_cntr
0x13 LDC_event_cntr
0x14 LDC_word_cntr
0x15 LDC_CRC_bad_cntr
0x16 LDC_SEQ_bad_cntr
0x17 LDC_wc_bad_cntr
0x18 LDC_frame_bad_cntr
0x19 LDC_buf_ovf_cntr
0x1a LDC_CMSCRC_err

2012-08-06, hazen

Download Jeremy's HEAD version from 8/2/12. Compile 3 test BIT/MCS files:

  • Default (jumper/flash set IP address)
  • Fixed I/P address 192.168.1.34
  • Fixed I/P address 192.168.115.254

Flash the 1.34 version. Responds to ping and mCTR2tool ok. Now try the fixed 115.254 version. Argh, the "reload" command once again doesn't work. Cycle crate power. It works... can talk to board at above IP.

Now try flashing the "default" firmware. "reload" works. Can still access at 115.254. Verify that SPI flash IP address is 1.32. Remove jumper J1204 and re-power. Still at 115.254 (sigh!). Maybe messed up compiling. Confusion: Slot 5 board had no J1204, but one was installed on the board in slot 11. Unplug slot 11 board altogether. Still get a ping response from 115.254.

Re-install J1204 jumper on slot 4 and re-power. Still get response from 115.254.

Compile new version with original code but jumper-controlled backup address changed from 115.254 to 115.240. Program this one to flash. (J1204 is installed). Do "reload". Responds at 115.240 as expected. Remove J1204 and re-power. Still at 115.240. Sigh.

Flash same firmware into other board. Same results.

Try setting IP address to 192.168.115.40 (maybe 115 is somehow special). No change.

Per Jeremy's request, program the 0.4.03 version from the web page into slot 11 board. Once again "reload" doesn't work. Power cycle. Can't access the board at all.

2012-08-02, hazen

Re-flash '2012-07-16a' version into both modules. Fix a few bugs in AMC13DaqTest.exe so it can handle nested scripts correctly. Run a few tests with script AMC13DaqTest/test.amc and see same event size in both mCTR. More tomorrow.

2012-08-01, hazen

Porting to ISE 14. Removed cores defined in ipcore_dir and substitute ones from Wu_Cores_30July.zip e-mailed by Wu. Import into ISE 14.1 and re-compile. Compiles OK and superficially works. However, seeing some discrepancies in event length between old and new firmwares.

2012-07-16, hazen

Merged changes are here: ctr2_merge_Wu.zip. Now add them to project and re-compile. Program into two mCTR2 in slots 5, 11 at IP addresses 192.168.1.32 and 192.168.1.40.

2012-07-13, hazen

Wu has made some minor changes to the code to fix the link reset problem. His changes are here: ctr2_Wu_Fix.zip but must be merged with my firmware from here: ctr2_trunk_with_DAQ_2012-07-12a_esh.zip.

I will start on this next week.

2012-07-12, hazen

Temporary solution to IP addressing: Assign a totally fixed address. In ctr2_uhtr1600.v just set ip_addr to a fixed value. Build two firmwares with addresses 192.168.1.32 and 192.168.1.40. The board now in slot 5 is 32 and slot 11 is 40.

DAQ links do not work with this firmware: ctr2_trunk_with_DAQ_2012-07-12a_esh.zip

Will ask Wu for help.

2012-07-11, hazen

Add two lines below to UCF and recompile per Wu's suggestion:

  NET "*/DAQ_Link_wu/UsrClk" TNM_NET = "DAQ_UsrClk";
  TIMESPEC "TS_DAQ_UsrClk" = PERIOD "DAQ_UsrClk" 8ns HIGH 50%;

MiniCTR2 Programming:

(instructions below from MN)

  1) Program the 4.02 firmware bitfile from JTAG.
  2) mCTR2tool.exe 192.168.115.254 -t uhtr
  3) select your device (1).  Then FLASH -> PROGRAM (you'll need an MCS file)
      If Wu's firmware is branched from ours after 3.01, then you can do this with Wu's firmware.
  To change the IP address, use the SPICFG menu of mCTR2tool. 

Here is what I did:

  • Generate MCS file with "generic parallel flash" and zero offset
  • mCTR2tool.exe 192.168.1.32 -t uhtr
  • "FLASH" then "PROG"

Doesn't work. It's a 128M SPI flash. That works better!

IP Addressing is a problem. NAT-MCH doesn't seem to route packets to any addresses outside 192.168.1.xxx, even though the settings are now correct:

[cms2] /home/hazen/work/uHTR_Firmware > telnet 192.168.1.11
Trying 192.168.1.11...
Connected to 192.168.1.11.
Escape character is '^]'.

Welcome to NAT-MCH

nat> ifconfig
network interface nat0:
  IP address:        192.168.1.11
  broadcast address: 192.168.255.255
  netmask:           255.255.0.0
nat>
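
The claim that the settings "are now correct" can be double-checked: with a 255.255.0.0 netmask, both the 192.168.1.x and 192.168.115.x addresses should fall inside the MCH's network. A short sketch using Python's ipaddress module, with the values taken from the ifconfig output above:

```python
import ipaddress

# NAT-MCH address and netmask as reported by ifconfig.
mch = ipaddress.ip_interface("192.168.1.11/255.255.0.0")
print(mch.network)                     # network the MCH believes it is on
print(mch.network.broadcast_address)   # should match the reported broadcast

reachable = {}
for target in ("192.168.1.32", "192.168.115.254"):
    reachable[target] = ipaddress.ip_address(target) in mch.network
    print(target, reachable[target])
```

Both targets are inside 192.168.0.0/16, so the addressing itself does not explain the routing problem.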

2012-07-10, hazen

Attempting to integrate Wu's DAQ link into uHTR 4.02 release under ISE 13.3 on VirtualBox ampere.bu.edu on Eric's desktop machine.

  • Rename Wu's MiniCTR.vhd to DAQ_Link_wu.vhd and put in .../ctr2/ctr2_uhtr1600/DAQ_Path/ in project
    • Edit to remove chipscope stuff
  • Edit DAQ_Link.vhd to instantiate DAQ_Link_wu.vhd
  • Copy and add other sources:
    • Add CRC16D16.vhd
    • Add Hamming.vhd
    • copy ipcore_dir tree
    • Add dataFIFO.ngc, DataBuf.ngc, TDP16_16.ngc, SDP16_16.ngc

2012-07-09, hazen

Basic DAQ event readout now seems to work. Have added several useful commands to my new tool AMC13DaqTest which is currently only in ~hazen/src/11_4_1.

HOWTO record test data

Log on to CMS1 as user daq and start DCCdiagnose.exe to control TTC system. Log on to CMS2 and run AMC13DaqTest.exe. For now:

  $ source ~hazen/environ.sh
  $ /src/11_5_1/hcal/hcalUpgrade/amc13/bin/linux/x86_64_slc5/AMC13DaqTest.exe

cms1 (DCCdiagnose.exe)   cms2 (AMC13DaqTest.exe)   Notes
                         en 5,11                   Enable AMC inputs (will change to 0,10 at some point). Also resets AMC13
ttc/trig 4                                         Disable TTC triggers (set to VME control only)
ttc/cmd 2                                          Issue TTC Event Count Reset command
ttc/l1a 10                                         Generate 10 triggers
                         st                        Display AMC13 status
                                                     Ctrl 0: 03000001
                                                     LSC Link Down
                                                     Ctrl 1: 000e0001
                                                     run mode
                                                     AMC Link status: 04100410
                                                     Mon buffer page: 00000000  Evts: 0000000a  words: 00000712
                         df test.dat               Dump events to file

You can dump the file in hex as follows:

   $ od -Ax -t x8 test.dat | less
   000000 00000712deadbeef 51000001c8c00008
   000010 000000059af5ddb0 000e041000000010
   000020 0000000000000000 000000000000c701
   000030 0000c70100000000 0800800000000401
   000040 0007000600058c8c 000b000a00090008
   000050 000f000e000d000c 0013001200110010
   000060 0017001600150014 001b001a00190018
   000070 001f001e001d001c 0023002200210020
     ....
   011af0 06f306f206f106f0 06f706f606f506f4
   011b00 06fb06fa06f906f8 705f06fe06fd06fc
   011b10 0000000000000a00 a00003891aca0000
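
Reading the od output as little-endian, each event in the file seems to start with the 32-bit marker 0xdeadbeef followed by a 32-bit word count (0x712 here, matching the "words" value from st), then that many 32-bit words. If the last line shown is the end of the file, the size also works out: 10 events of 8 + 4*0x712 bytes is 0x11b20 bytes. A minimal reader sketch under those assumptions (read_events is a made-up name, and the demo data is synthetic):

```python
import io
import struct

MARKER = 0xDEADBEEF

def read_events(stream):
    """Yield each event as a tuple of 32-bit words (marker and count stripped)."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # clean end of file
        marker, nwords = struct.unpack("<II", header)
        if marker != MARKER:
            raise ValueError(f"bad marker 0x{marker:08x}")
        payload = stream.read(4 * nwords)
        if len(payload) < 4 * nwords:
            raise ValueError("truncated event")
        yield struct.unpack(f"<{nwords}I", payload)

# Synthetic two-event stream, made up purely to exercise the reader.
demo = struct.pack("<II3I", MARKER, 3, 1, 2, 3) + struct.pack("<II1I", MARKER, 1, 7)
events = list(read_events(io.BytesIO(demo)))
print([len(e) for e in events])
```

For a real file, open it in binary mode and pass the file object instead of the BytesIO buffer.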

2012-07-08, hazen

New firmware version 0xe. Still see problem #2 below. Investigating.

2012-07-07, hazen

Trying to reproduce a DAQ firmware problem.

Test #1 - failure to reset

(set Xilinx board to deliver fast triggers at 20kHz)

cms1 (DCCdiagnose.exe)   cms2 (AMC13Tool.exe)   Notes
ttc/trig 1                                      Enable triggers
ttc/trig 4                                      Disable triggers
                         wv 3 410               Enable AMC inputs
                         wv 1 1                 Set run mode
                         wv 0 1                 Reset all
                         rv d                   Read word count
                         0xd 0x712              FAIL: Should be zero!
                         wv 0 1                 Reset all
                         rv d                   Read word count
                         0xd 0                  Always zero 2nd time

This fails intermittently.

Test 2 with AMC13DaqTest.exe

cms1 (DCCdiagnose.exe)   cms2 (AMC13DaqTest.exe)   Notes
                         en 5,11                   Enable AMC inputs 5, 11 and reset AMC13
ttc/cmd 1                                          Reset EvN to 1
ttc/l1a                                            Make one trigger
                         rd                        Read and display event
                         ne                        Advance to next event
ttc/l1a                                            Make trigger
                         rv 0x4000 10              See corrupted data

This fails consistently.

It is probably a software problem, but I seem to be competing for use of the TTC system, so I will stop for now.

2012-07-05, hazen

Debugged AMC13 event readout code, added file dump feature to AMC13DaqTest.cc. Discovered a firmware issue which causes data to appear incorrect after the first event, reported to Wu.

E-mailed Jeremy to ask about word order question.

2012-07-01, hazen

Install new release 11_5_0 in daqowner. Do not set as default yet. Oops, 11_5_0 is buggy. Update immediately to 11_5_1 and make it the default.

2012-06-29, hazen

Checked in to CVS several updates of work done with Charlie.

HyperDAQ

    DTCManager.cc/hh were updated to include Charlie's new HyperDAQ. This required substantial changes to the address tables. These changes were done to the existing files AMC13_AddressTable_S6/V6.txt.

Run Control

    AMC13.cc/hh were updated to add startRun(), endRun(), AMCInputEnable(), nextEventSize(), readNextEvent().

2012-06-27, rohlf

Fix AMC13Tool to support hex version names in firmware files. Fork address tables with names AMC13_AddressTable_S7/V8.txt. FIXME: This address table is now incompatible with the one used by the rest of the software!

2012-06-12, hazen

First release firmware and instructions for DAQ in AMC13:

to set up L1A:
set L1a to about 20KHz
start dccdiagnose.exe
ttc/trig 4 (turn off L1A)

run amc13tool
wv 3 410 (enable amc5 and amc11)
wv 1 1 (set run bit)
wv 0 1 (reset all)

ttc/cmd 0x28 (reset orbit number)
ttc/trig 1 (enable L1A)

this will take 0x800 events in the memory and throw away all when the buffer is full.

In general, the initialization and read out remains basically the same as with DCC2.

2012-05-24, rohlf

Successfully updated Spartan firmware from v3 to v6 at point 5. This was accomplished remotely in Bat. 40 at CERN with AMC13Tool by first writing Spartan v6 to flash at 0x000000 (its old location) as if programming the Header, power cycling the AMC13 module with the NAT tool, reprogramming the flash Header (0x000000), Golden (0x100000), Spartan (0x200000), and Virtex (0x400000), and then issuing the software command to reconfigure from flash.

2012-05-24, rohlf

Update firmware in cDAQ lab at CERN using python software. Older software (now named amc13_python_backup-23-may2012) was used to program Spartan firmware v6 into 0x0, power cycle the crate, and program Header (0x000000), Golden (0x100000), Spartan (0x200000), and Virtex (0x400000). In the process, a bug was discovered in programming the header, due to incorrect erase of a single page. After power cycling again, the configuration was verified.

2012-04-27, hazen

Install HCAL xDAQ 11.4.0 (as daqowner) per instructions:

To install on a teststand (as daqowner) :
      wget http://cmshcalweb01.cern.ch/hcalsw/release/installDAQ_11_4_0.perl
      perl installDAQ_11_4_0.perl --mode=teststand
      ~daqowner/common/bin/pickRelease.sh (choose 11.4.0)

You can make a code-development area on a teststand or USC using
      perl installDAQ_11_4_0.perl --mode=[teststand|usc]
--ownsource=${HOME}/src/11_4_0 --packages=hcalUpgrade --cvsuser=[your afs id]
You can list multiple packages, separated by commas.

2012-01-10, hazen

Trying to program a new AMC13 MMC. Plug in to crate, connect JTAG ICE 3 cable to JTAG connector. Fails to program. Try to force power on with "pwr_on 9" (it's in AMC slot 4) but doesn't help.

Note:

  pwr_on [fru #]
  pwr_off [fru #]

  where [fru #] is the fru number, from 5 to 16. That means if you
  want to switch on e.g. AMC2 use the command "pwr_on 6".
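
Going by the note above (FRU 5 to 16, AMC2 is FRU 6), the mapping is simply the 1-based AMC number plus 4. A tiny sketch, with amc_to_fru as a made-up helper name:

```python
def amc_to_fru(amc):
    """Map a 1-based AMC number (1..12) to the MCH FRU number (5..16)."""
    if not 1 <= amc <= 12:
        raise ValueError(f"AMC number out of range: {amc}")
    return amc + 4

for amc in (1, 2, 12):
    print(f"AMC{amc} -> pwr_on {amc_to_fru(amc)}")
```

By this mapping "pwr_on 9" addresses AMC5 in 1-based numbering, which would correspond to slot 4 only if slots are counted from zero.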

Remembering wisdom of Tom Gorski:

    Maybe grasping at straws here, but if you look at the reset circuit for the AVR on the MMC schematic page, you see that it is important for the IPMI_ENABLE# line to be driven low, in order to release reset. You didn't say what your power situation was to the card. If you had it in a rig with an MCH trying to talk to the card, then the IPMI_ENABLE# line would indeed be zero, and the FET would not be pulling reset low, and the problem is probably somewhere else. On the other hand, if you have the card in a rig with a dumb power supply, then AVR reset will be pulled low unless a shunt is installed at JMP4. Since reset goes to the JTAG connector you might be able to check this with a voltmeter.

This should be taken care of by the uTCA backplane, no? Meanwhile, trying to program the thing in Wu's test fixture. Doesn't work. Checking to be sure jumpers are installed. AHA ... the required ECO (soldered wire) on the T3 board was missing. Now all is OK. One can indeed program the MMC on a new board in a MicroTCA crate.

Another observation: an AMC won't power up fully without the front panel hardware installed, as the handle switch must be depressed for the negotiation with the MMC to complete.

2011-12-06, heister

cms2: SLC5 updated, installation fine tuned, PyChips and microHAL installed for the "daq" user, all tested and works

2011-12-06, hazen

Install 2nd NIC on cmssun2 (old Dell computer, Ubuntu 10.04 OS). Cable to AMC13 in uTCA crate. Install IPBus test firmware from Wu. Install PyChips. Works! (I think).

Added AMC13FlashProgramming page.

2011-12-02, hazen

Started this log. One dead NAT-MCH at Boston. 2nd NAT-MCH shipped from MN by Jeremy.

Reset I/P address to 192.168.1.11 and connect to CMS1. One can now do cms1> telnet 192.168.1.11. But, this MCH has firmware v2.0 which doesn't work with AMC13 (symptom is both red/green LEDs blink forever on AMC13 with MMC firmware V1.2).

Updating MCH firmware to V2.10 received from NAT on 9/20/11. Briefly, the procedure is:

  unzip the e-mail attachment somewhere
  copy the "bin" file to /tftpboot/mch on cms1.bu.edu
  connect to the MCH using the USB console and type "update_firmware" and follow the prompts
  type "reboot" to load the new firmware

It works! Now the AMC13 seems to be up and running with the MCH.

-- EricHazen - 02 Dec 2011

Topic attachments

Attachment                                      Size     Date                  Who                Comment
Screenshot_-_01192016_-_103020_AM.png           28.9 K   19 Jan 2016 - 15:31   DanielGastler      2016-01-19 A working FEROL at BU: Webpage 1
Screenshot_-_01192016_-_103040_AM.png           45.2 K   19 Jan 2016 - 15:32   DanielGastler      2016-01-19 A working FEROL at BU: Webpage 2
tempU2_1.jpg                                    52.1 K   24 Jun 2013 - 22:19   BenjaminBDickens
dump_run000052_event00006104.txt                64.2 K   19 Jan 2016 - 15:42   DanielGastler      2016-01-19 A working FEROL at BU: Event dump
dump_run000052_event00005873_fed0100.txt        65.4 K   19 Jan 2016 - 15:43   DanielGastler      2016-01-19 A working FEROL at BU: Fed dump
clockdelaytest_inputofU25.jpg                   78.0 K   01 Jul 2013 - 21:35   BenjaminBDickens
Topic revision: r152 - 12 Feb 2016 - DanielGastler
 