Windows Processes are Hanging

We are seeing new failures in our build environment ever since we have upgraded to the latest version of Joulescope. Are builds are hanging in the step that executes this command:

C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 70 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “C:\CBAT_Test\Labview_7.5_PUMA\Executable\Labview_Main_PUMA.exe - RunConnectivityTest COM10 PUMA 200 spare spare” --timedelay 15

We added additional parameters to your sample code for embedded commands into the data capture. Its almost like the subprocess method no longer works properly. What’s even more strange, rolling back to an earlier library versions of pyjoulescope_driver and joulescope do no fix the issue. Are we missing a library we need to roll back?

Hi @Nate - Sorry to hear that you are having issues with the script no longer working. You should be able to use C:\Python\python.exe -m pip freeze to display versions. What version of joulescope and pyjoulescope_driver are you seeing the failure with?

If you roll all the way back to joulescope version 0.9.11, do you see this issue?

C:\Python\python.exe -m pip uninstall pyjoulescope_driver joulescope 
C:\Python\python.exe -m pip install joulescope==0.9.11

pyjls==0.4.2
pyjoulescope-driver==1.0.5
joulescope==1.0.9

rollback to 0.9.11 works.

1 Like

Ok, so what if you upgrade to the latest?

C:\Python\python.exe -m pip uninstall pyjoulescope_driver joulescope 
C:\Python\python.exe -m pip install -U joulescope 

If you run the pip freeze command, you should see:

joulescope==1.0.12
pyjoulescope_driver==1.1.0

Does this still freeze in your CI run?

Also, double-check that the python version you are using from the command line is the same as the one used by the CI test. Even better, log the result of python3 -VV and pip freeze in your CI run.

Freezes up with the latest library, 0.9.11 works.

Step 4/8 : HW Self Test (Command Line)

[10:10:18] Step 5/8 : Commanded LTE Current Profile (Command Line) (running for 4m:48s)

[10:10:18] [Step 5/8] Starting: C:\CBAT_Test\Labview_6.5_Sequence\Executable\Labview_Test_Sequence.exe - C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope.txt

[10:10:18] [Step 5/8] in directory: C:\BuildAgent\work\8c2e29fe287ce4cc

[10:10:19] [Step 5/8] Starting Labview_Test_Sequence C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope.txt 11/17/2022 10:10:19 AM

[10:10:20] [Step 5/8] JS110-002636

[10:10:20] [Step 5/8]

[10:10:20] [Step 5/8] Python 3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)]

[10:10:20] [Step 5/8]

[10:10:21] [Step 5/8] async-generator==1.10

[10:10:21] [Step 5/8] attrs==21.4.0

[10:10:21] [Step 5/8] beautifulsoup4==4.10.0

[10:10:21] [Step 5/8] bs4==0.0.1

[10:10:21] [Step 5/8] certifi==2021.10.8

[10:10:21] [Step 5/8] cffi==1.15.0

[10:10:21] [Step 5/8] charset-normalizer==2.0.9

[10:10:21] [Step 5/8] cryptography==36.0.2

[10:10:21] [Step 5/8] cycler==0.11.0

[10:10:21] [Step 5/8] fonttools==4.28.3

[10:10:21] [Step 5/8] fpdf==1.7.2

[10:10:21] [Step 5/8] h11==0.13.0

[10:10:21] [Step 5/8] idna==3.3

[10:10:21] [Step 5/8] img2pdf==0.4.3

[10:10:21] [Step 5/8] joulescope==1.0.12

[10:10:21] [Step 5/8] kiwisolver==1.3.2

[10:10:21] [Step 5/8] lxml==4.6.4

[10:10:21] [Step 5/8] matplotlib==3.5.0

[10:10:21] [Step 5/8] numpy==1.23.4

[10:10:21] [Step 5/8] outcome==1.1.0

[10:10:21] [Step 5/8] packaging==21.3

[10:10:21] [Step 5/8] pandas==1.3.4

[10:10:21] [Step 5/8] pikepdf==4.1.0

[10:10:21] [Step 5/8] Pillow==8.4.0

[10:10:21] [Step 5/8] psutil==5.8.0

[10:10:21] [Step 5/8] pycparser==2.21

[10:10:21] [Step 5/8] pyjls==0.4.2

[10:10:21] [Step 5/8] pyjoulescope-driver==1.1.0

[10:10:21] [Step 5/8] pymonocypher==3.1.0.0

[10:10:21] [Step 5/8] pyOpenSSL==22.0.0

[10:10:21] [Step 5/8] pyparsing==3.0.6

[10:10:21] [Step 5/8] PyPDF2==1.26.0

[10:10:21] [Step 5/8] pypiwin32==223

[10:10:21] [Step 5/8] PySocks==1.7.1

[10:10:21] [Step 5/8] python-dateutil==2.8.2

[10:10:21] [Step 5/8] pytz==2021.3

[10:10:21] [Step 5/8] pywebio==1.5.0

[10:10:21] [Step 5/8] pywin32==302

[10:10:21] [Step 5/8] reportlab==3.6.3

[10:10:21] [Step 5/8] requests==2.26.0

[10:10:21] [Step 5/8] scipy==1.7.3

[10:10:21] [Step 5/8] selenium==4.1.3

[10:10:21] [Step 5/8] setuptools-scm==6.3.2

[10:10:21] [Step 5/8] six==1.16.0

[10:10:21] [Step 5/8] sniffio==1.2.0

[10:10:21] [Step 5/8] sortedcontainers==2.4.0

[10:10:21] [Step 5/8] soupsieve==2.3.1

[10:10:21] [Step 5/8] tomli==1.2.2

[10:10:21] [Step 5/8] tornado==6.1

[10:10:21] [Step 5/8] trio==0.20.0

[10:10:21] [Step 5/8] trio-websocket==0.9.2

[10:10:21] [Step 5/8] ua-parser==0.10.0

[10:10:21] [Step 5/8] urllib3==1.26.7

[10:10:21] [Step 5/8] user-agents==2.2.0

[10:10:21] [Step 5/8] wsproto==1.1.0

[10:10:21] [Step 5/8]

[10:10:24] [Step 5/8]

[10:10:24] [Step 5/8] wait 3

Thanks for the detail! I am at my contract manufacturer today, and I might not be able to investigate this until tomorrow.

I did notice that the python version is 3.10.0, which is rather old. The 3.10.x branch has had a bunch of bug fixes changelog. Not sure if they affect this, but that could be one thing to try. If you are going to upgrade, I recommend staying on 3.10.8 rather than going to 3.11.0, which is still very new.

I tried the upgrade to python 3.10.8. Still hangs.

Commanded LTE Current Profile (Command Line) (running for 3m:12s)

[10:44:55] [Step 5/8] Starting: C:\CBAT_Test\Labview_6.5_Sequence\Executable\Labview_Test_Sequence.exe - C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope.txt

[10:44:55] [Step 5/8] in directory: C:\BuildAgent\work\8c2e29fe287ce4cc

[10:44:57] [Step 5/8] Starting Labview_Test_Sequence C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope.txt 11/17/2022 10:44:56 AM

[10:44:57] [Step 5/8] JS110-002636

[10:44:57] [Step 5/8]

[10:44:57] [Step 5/8] Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

[10:44:57] [Step 5/8]

[10:44:58] [Step 5/8] async-generator==1.10

[10:44:58] [Step 5/8] attrs==21.4.0

[10:44:58] [Step 5/8] beautifulsoup4==4.10.0

[10:44:58] [Step 5/8] bs4==0.0.1

[10:44:58] [Step 5/8] certifi==2021.10.8

[10:44:58] [Step 5/8] cffi==1.15.0

[10:44:58] [Step 5/8] charset-normalizer==2.0.9

[10:44:58] [Step 5/8] cryptography==36.0.2

[10:44:58] [Step 5/8] cycler==0.11.0

[10:44:58] [Step 5/8] fonttools==4.28.3

[10:44:58] [Step 5/8] fpdf==1.7.2

[10:44:58] [Step 5/8] h11==0.13.0

[10:44:58] [Step 5/8] idna==3.3

[10:44:58] [Step 5/8] img2pdf==0.4.3

[10:44:58] [Step 5/8] joulescope==1.0.12

[10:44:58] [Step 5/8] kiwisolver==1.3.2

[10:44:58] [Step 5/8] lxml==4.6.4

[10:44:58] [Step 5/8] matplotlib==3.5.0

[10:44:58] [Step 5/8] numpy==1.23.4

[10:44:58] [Step 5/8] outcome==1.1.0

[10:44:58] [Step 5/8] packaging==21.3

[10:44:58] [Step 5/8] pandas==1.3.4

[10:44:58] [Step 5/8] pikepdf==4.1.0

[10:44:58] [Step 5/8] Pillow==8.4.0

[10:44:58] [Step 5/8] psutil==5.8.0

[10:44:58] [Step 5/8] pycparser==2.21

[10:44:58] [Step 5/8] pyjls==0.4.2

[10:44:58] [Step 5/8] pyjoulescope-driver==1.1.0

[10:44:58] [Step 5/8] pymonocypher==3.1.0.0

[10:44:58] [Step 5/8] pyOpenSSL==22.0.0

[10:44:58] [Step 5/8] pyparsing==3.0.6

[10:44:58] [Step 5/8] PyPDF2==1.26.0

[10:44:58] [Step 5/8] pypiwin32==223

[10:44:58] [Step 5/8] PySocks==1.7.1

[10:44:58] [Step 5/8] python-dateutil==2.8.2

[10:44:58] [Step 5/8] pytz==2021.3

[10:44:58] [Step 5/8] pywebio==1.5.0

[10:44:58] [Step 5/8] pywin32==302

[10:44:58] [Step 5/8] reportlab==3.6.3

[10:44:58] [Step 5/8] requests==2.26.0

[10:44:58] [Step 5/8] scipy==1.7.3

[10:44:58] [Step 5/8] selenium==4.1.3

[10:44:58] [Step 5/8] setuptools-scm==6.3.2

[10:44:58] [Step 5/8] six==1.16.0

[10:44:58] [Step 5/8] sniffio==1.2.0

[10:44:58] [Step 5/8] sortedcontainers==2.4.0

[10:44:58] [Step 5/8] soupsieve==2.3.1

[10:44:58] [Step 5/8] tomli==1.2.2

[10:44:58] [Step 5/8] tornado==6.1

[10:44:58] [Step 5/8] trio==0.20.0

[10:44:58] [Step 5/8] trio-websocket==0.9.2

[10:44:58] [Step 5/8] ua-parser==0.10.0

[10:44:58] [Step 5/8] urllib3==1.26.7

[10:44:58] [Step 5/8] user-agents==2.2.0

[10:44:58] [Step 5/8] wsproto==1.1.0

[10:44:58] [Step 5/8]

[10:45:01] [Step 5/8]

[10:45:01] [Step 5/8] wait 3

1 Like

Hi @Nate - I tried playing with the script, and this seems to work from the command line with both the JS110 and JS220:

python python_datalogger.py --duration 70 --frequency 50000 --product_tag hi --timedelay 15 --command "python -VV"

Does the same command work from the command line but fail from CI?

It looks like you have access to stdout. Can you try to isolate what is hanging? For example, add a print before device.start, before time.sleep, before subprocess.run, and after subprocess.run.

What does Task Manager show when this hang occurs? I see this:

Actual utilization seems to fluctuate from 5% to 17%.

It works from the command line. The print statements don’t seem to help when the LabVIEW command is embedded in the datalogger.

Here is the sequence file I am running in the CI environment:

joulescope capture --duration 0.00001 junk.jls
C:\CBAT_Test\Labview_7.2_Relay_Control\Executable\CMD_Relay_Control.exe - RELAY_CONTROL Relay_Serial_Number Off On On Off True
wait 3
C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 1 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “python -VV” --timedelay 1
wait 3
C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 70 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “C:\CBAT_Test\Labview_7.5_PUMA\Executable\Labview_Main_PUMA.exe - RunConnectivityTest COM10 PUMA 200 spare spare” --timedelay 15
wait 3
C:\CBAT_Test\Labview_7.2_Relay_Control\Executable\CMD_Relay_Control.exe - RELAY_CONTROL Relay_Serial_Number Off Off Off Off True

When I am running the old version of Joulescope library, this is my output from the CI:

[08:05:29] [Step 5/8] Starting: C:\CBAT_Test\Labview_6.5_Sequence\Executable\Labview_Test_Sequence.exe - C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope_DEBUG.txt

[08:05:29] [Step 5/8] in directory: C:\BuildAgent\work\8c2e29fe287ce4cc

[08:05:30] [Step 5/8] Starting Labview_Test_Sequence C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope_DEBUG.txt 11/18/2022 8:05:30 AM

[08:05:30] [Step 5/8] done capturing data: quit from stop duration

[08:05:30] [Step 5/8]

[08:05:33] [Step 5/8]

[08:05:33] [Step 5/8] wait 3

[08:05:38] [Step 5/8] Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]

[08:05:38] [Step 5/8] Current Joulescopes: [<joulescope.driver.Device object at 0x00000262C6890190>]

[08:05:38] [Step 5/8]

[08:05:38] [Step 5/8] C:\CBAT_Test\Datalogs\20221118_080537_CMIUV2_LTE_COM_AVG.jls

[08:05:38] [Step 5/8]

[08:05:38] [Step 5/8] Before device.start

[08:05:38] [Step 5/8] Before time.sleep

[08:05:38] [Step 5/8] Before subprocess.run

[08:05:38] [Step 5/8] After subprocess.run

[08:05:38] [Step 5/8] Capturing data: type CTRL-C to stop

[08:05:38] [Step 5/8]

[08:05:38] [Step 5/8] wait 3

[08:06:51] [Step 5/8] Current Joulescopes: [<joulescope.driver.Device object at 0x0000025CFFF80190>]

[08:06:51] [Step 5/8]

[08:06:51] [Step 5/8] C:\CBAT_Test\Datalogs\20221118_080541_CMIUV2_LTE_COM_AVG.jls

[08:06:51] [Step 5/8]

[08:06:51] [Step 5/8] Before device.start

[08:06:51] [Step 5/8] Before time.sleep

[08:06:51] [Step 5/8] Before subprocess.run

[08:06:51] [Step 5/8] After subprocess.run

[08:06:51] [Step 5/8] Capturing data: type CTRL-C to stop

[08:06:51] [Step 5/8]

[08:06:51] [Step 5/8] wait 3

[08:06:57] [Step 5/8]

[08:06:57] [Step 5/8] Finished Labview_Test_Sequence 11/18/2022 8:06:57 AM

[08:06:58] [Step 5/8] Process exited with code 0

When I upgrade the Joulescope library running the same sequence file in the CI:

[08:09:10] [Step 5/8] Starting: C:\CBAT_Test\Labview_6.5_Sequence\Executable\Labview_Test_Sequence.exe - C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope_DEBUG.txt
[08:09:10] [Step 5/8] in directory: C:\BuildAgent\work\8c2e29fe287ce4cc
[08:09:11] [Step 5/8] Starting Labview_Test_Sequence C:\CBAT_Test\Labview_6.5_Sequence\Test_Sequence\Current_Profile_LTE_command_Joulescope_DEBUG.txt 11/18/2022 8:09:11 AM
[08:09:11] [Step 5/8] done capturing data: quit from stop duration
[08:09:11] [Step 5/8]
[08:09:15] [Step 5/8]
[08:09:15] [Step 5/8] wait 3
[08:09:19] [Step 5/8] Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
[08:09:19] [Step 5/8] Current Joulescopes: [<joulescope.v1.js110.DeviceJs110 object at 0x0000027BDE741BA0>]
[08:09:19] [Step 5/8]
[08:09:19] [Step 5/8] C:\CBAT_Test\Datalogs\20221118_080918_CMIUV2_LTE_COM_AVG.jls
[08:09:19] [Step 5/8]
[08:09:19] [Step 5/8] Before device.start
[08:09:19] [Step 5/8] Before time.sleep
[08:09:19] [Step 5/8] Before subprocess.run
[08:09:19] [Step 5/8] After subprocess.run
[08:09:19] [Step 5/8] Capturing data: type CTRL-C to stop
[08:09:19] [Step 5/8]
[08:09:19] [Step 5/8] wait 3
[08:21:47] [Step 5/8] Process exited with code 1
[08:21:47] [Step 5/8] Step Commanded LTE Current Profile (Command Line) interrupted
[08:21:47] Step 6/8: Python Current Waveform Analysis (Command Line)
[08:21:47] [Step 6/8] Build step Python Current Waveform Analysis (Command Line) is skipped because the previous step was interrupted
[08:21:47] Step 7/8: Run Python Bipartisan Judge (Command Line)
[08:21:47] [Step 7/8] Build step Run Python Bipartisan Judge (Command Line) is skipped because the previous step was interrupted

I have to click stop because the build in CI is hanging.

Thoughts?

Ok, if I understand correctly, this is the command that is hanging:

C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 1 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “python -VV” --timedelay 1

I think that this is your test command. The message order is a little confusing. I see done capturing data: quit from stop duration before device.start, so I am not sure what is happening. I am not sure exactly what to make of the log file output.

For the command, you should ensure that you use the same python:

C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 1 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “C:\Python\python.exe -VV” --timedelay 1

At 08:09:19, it reaches the data collection while not quit_: spin loop. If the command launched at 08:09:11, I have no idea why it took until 08:09:19 to get here. I do not know why it would exit correctly from the command line, but not from your CI environment.

Maybe more print statements can help isolate the problem?

  • Inside on_stop
  • before finally
  • before device.stop
  • before recorder.close
  • before device.close
  • after device.close

I also noticed that the script you sent earlier has inconsistent use of sys.exit. You probably want to have run return an integer, 0=success, any other value on failure, and then:

if __name__ == '__main__':
    rc = run()
    print(f'return_code = {rc}')
    sys.exit(rc)

The else case should return a code, not sys.exit with a string.

I figured out quit from stop duration. It comes from the joulescope capture command. Is it possible to also show the commands in the CI log? It’s difficult to know what output comes from what command.

I think that this simplified sequence of commands does what you want:

joulescope parameter_set --config auto i_range=on
C:\CBAT_Test\Labview_7.2_Relay_Control\Executable\CMD_Relay_Control.exe - RELAY_CONTROL Relay_Serial_Number Off On On Off True
wait 3
C:\Python\python.exe -VV
wait 3
C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 70 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “C:\CBAT_Test\Labview_7.5_PUMA\Executable\Labview_Main_PUMA.exe - RunConnectivityTest COM10 PUMA 200 spare spare” --timedelay 15
wait 3
C:\CBAT_Test\Labview_7.2_Relay_Control\Executable\CMD_Relay_Control.exe - RELAY_CONTROL Relay_Serial_Number Off Off Off Off True
wait 3

C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 1 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “python -VV” --timedelay 1

wait 3

C:\Python\python.exe C:\CBAT_Test\Python_Datalogger\python_datalogger.py --duration 70 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG --command “C:\CBAT_Test\Labview_7.5_PUMA\Executable\Labview_Main_PUMA.exe - RunConnectivityTest COM10 PUMA 200 spare spare” --timedelay 15

These are the two lines of interest from the sequence file. The first is stripped down to just showing the python version. This line executes perfectly.

|[08:09:19]|[Step 5/8] Before device.start|
|[08:09:19]|[Step 5/8] Before time.sleep|
|[08:09:19]|[Step 5/8] Before subprocess.run|
|[08:09:19]|[Step 5/8] After subprocess.run|
|[08:09:19]|[Step 5/8] Capturing data: type CTRL-C to stop|

Its when the next line is executed (the line with the embedded command “C:\CBAT_Test\Labview_7.5_PUMA\Executable\Labview_Main_PUMA.exe - RunConnectivityTest COM10 PUMA 200 spare spare”) that the CI hangs.

So are you saying that the second invocation has no output at all? I would expect to at least see Current Joulescopes:.

I wonder if the back-to-back invocations are causing an issue. I tried this:

set CMD=python python_datalogger.py --duration 2 --frequency 50000 --product_tag CMIUV2_LTE_COM_AVG
%CMD% && %CMD% && %CMD% && %CMD%

… but it worked perfectly :frowning:

I understand that you need the --command to run a valid test, but if you remove it temporarily, does the CI then run to completion?

No. I wonder how the subprocess method handles a NONE argument.

I wonder if the CI buffers the messages and it hangs before it can display them?

Matt,

One thing I could help explore is “what is it exactly in the call to the Labview_Main_PUMA.exe” that the new Joulescope library doesn’t like. We can strip it down to just pass out a print statement and work up towards the COM port its acquiring to the calls its making under the hood.

Before doing that, were you able to get more log information out of CI? Ideally, with the commands and print showing exactly where things hang?

Just to confirm, if you create a batch file with the exact same commands and run them at the Windows command line, everything works?

With downgraded Joulescope library from the command line I get this:

This is the signal I expect. When the LabVIEW code is running that tells the radio to transmit, I see activity in a dialog box.

image

The read counter is incrementing:

image

When I upgrage python and run everything from the command line:

Watching it, the Read loops for bootup stuff incremented to 499, but the read loops for command replys never increments and the dialog box doesn’t fill up. The 200 second timeout for the LabVIEW doesn’t seem to work.

Does that clarify whats happening?