Jekyll
2022-08-21T01:20:12+00:00
https://alexmalins.com/feed.xml
Alex Malins
Data Scientist, Kraken Technologies Japan
Using pandas isin() method with python holidays package
2022-04-03T06:48:48+00:00
https://alexmalins.com/blog/using-pandas-isin-method-with-python-holidays-package
<p>When working with a pandas time series, it is useful to know which dates in the time series are public holidays. This information may be used for data analysis, feature construction, improving models, etc.</p>
<p>Pandas has a built-in calendar for identifying <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html?#timeseries-holiday">US Federal Holidays</a>. However, if you live elsewhere in the world, you are out of luck: pandas requires that you manually create a holiday calendar for other countries.</p>
<p>Thankfully a superb Python package called <em><a href="https://pypi.org/project/holidays/">holidays</a></em> has your back. Holidays offers fast and efficient evaluation of holiday dates for over 80 countries worldwide. Using holidays is simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">holidays</span>
<span class="c1"># Create a dict-like object for England's public holidays
</span><span class="n">uk_holidays</span> <span class="o">=</span> <span class="n">holidays</span><span class="p">.</span><span class="n">UK</span><span class="p">(</span><span class="n">subdiv</span><span class="o">=</span><span class="s">"England"</span><span class="p">)</span>
<span class="c1"># Check some dates:
</span><span class="s">"2022-04-15"</span> <span class="ow">in</span> <span class="n">uk_holidays</span> <span class="c1"># True, Good Friday 2022
</span><span class="s">"2022-04-14"</span> <span class="ow">in</span> <span class="n">uk_holidays</span> <span class="c1"># False, the day before is just a normal working day
</span></code></pre></div></div>
<p>Holidays automatically calculates on-the-fly whether a date is a public holiday. You do not need to pre-specify the date range you are interested in, although this is possible (see later).</p>
<p>In the example so far, holidays has automatically determined all the public holidays in England in 2022:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">uk_holidays</span><span class="p">.</span><span class="n">items</span><span class="p">()):</span>
    <span class="k">print</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span>
</code></pre></div></div>
<p>Result:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2022-01-01 New Year's Day
2022-01-03 New Year's Day (Observed)
2022-04-15 Good Friday
2022-04-18 Easter Monday
2022-05-02 May Day
2022-06-02 Spring Bank Holiday
2022-06-03 Platinum Jubilee of Elizabeth II
2022-08-29 Late Summer Bank Holiday
2022-12-25 Christmas Day
2022-12-26 Boxing Day
2022-12-27 Christmas Day (Observed)
</code></pre></div></div>
<h3 id="pandas-time-series-example">Pandas time series example</h3>
<p>Now let’s look at a pandas time series and consider the values on holidays. Let’s use a dataset on motorway traffic in England downloaded from <a href="https://webtris.highwaysengland.co.uk/">WebTRIS</a>. The data are daily counts of clockwise traffic on the M25 between junctions 21 & 22 in 2021:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"m25_traffic_data_2021.csv"</span><span class="p">)</span>
<span class="n">df</span><span class="p">[</span><span class="s">"Date"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">to_datetime</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">"Date"</span><span class="p">])</span>
<span class="n">df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Date</th>
<th>Vehicles per day</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2021-01-01</td>
<td>15221</td>
</tr>
<tr>
<th>1</th>
<td>2021-01-02</td>
<td>25920</td>
</tr>
<tr>
<th>2</th>
<td>2021-01-03</td>
<td>23805</td>
</tr>
<tr>
<th>3</th>
<td>2021-01-04</td>
<td>43649</td>
</tr>
<tr>
<th>4</th>
<td>2021-01-05</td>
<td>46113</td>
</tr>
</tbody>
</table>
</div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">matplotlib</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="n">matplotlib</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"seaborn-white"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"figure.figsize"</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mf">8.8</span><span class="p">,</span> <span class="mf">3.6</span><span class="p">)</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"Date"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"Vehicles per day"</span><span class="p">)</span>
</code></pre></div></div>
<figure class="align-center">
<a href="https://alexmalins.com/images/traffic1.png"><img src="https://alexmalins.com/images/traffic1.png" style="width: 660px" class="align-center" /></a>
<figcaption>Traffic between J21 & J22 on the M25 in 2021.</figcaption>
</figure>
<p>From the time series we can immediately see there is:</p>
<ul>
<li>a weekly seasonality,</li>
<li>an effect of the easing of lockdown restrictions throughout early 2021,</li>
<li>a sensor downtime that occurred in early July,</li>
<li>and effects from the Christmas and New Year holiday periods.</li>
</ul>
<p>Let’s pick out the data on public holidays using the holidays package:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s">"Date"</span><span class="p">].</span><span class="n">isin</span><span class="p">(</span><span class="n">uk_holidays</span><span class="p">)]</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Date</th>
<th>Vehicles per day</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
</div>
<p>Oh no! No results are reported and something is clearly wrong.</p>
<p>There is, unfortunately, an <a href="https://github.com/pandas-dev/pandas/issues/46123">incompatibility issue</a> between holidays and the pandas <code class="language-plaintext highlighter-rouge">isin()</code> method. It prevents holidays from dynamically building its internal calendar on-the-fly.</p>
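<p>To see how this kind of failure can arise, here is a toy sketch in plain Python. The <code class="language-plaintext highlighter-rouge">LazyCalendar</code> class is hypothetical: it is not the holidays package's implementation, nor pandas' internals. It only assumes a dict-like object that fills itself in when asked <code class="language-plaintext highlighter-rouge">date in calendar</code>. Code that reads the stored keys without performing a membership test never triggers that step.</p>

```python
from datetime import date


class LazyCalendar(dict):
    """Toy stand-in for a lazily populated holiday calendar (hypothetical)."""

    def _populate(self, year):
        # Pretend every year has a single holiday on 1 January.
        self[date(year, 1, 1)] = "New Year's Day"

    def __contains__(self, key):
        # A membership test is the trigger that fills in the key's year.
        if not any(d.year == key.year for d in self.keys()):
            self._populate(key.year)
        return super().__contains__(key)


calendar = LazyCalendar()
new_year = date(2021, 1, 1)

# Copying the stored keys never calls __contains__, so nothing is generated
# and the lookup against the copy fails.
snapshot = set(calendar)
found_in_snapshot = new_year in snapshot

# An explicit membership test triggers population for 2021 and succeeds.
found_in_calendar = new_year in calendar

print(found_in_snapshot, found_in_calendar)
```

<p>The same date is missed when looked up in a copy of the keys, but found when the calendar itself is asked.</p>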
<p>Broadly there are two ways to circumvent the problem. Let us look at each in turn.</p>
<h3 id="use-apply--lambda-function-instead">Use <code class="language-plaintext highlighter-rouge">apply()</code> & lambda function instead</h3>
<p>The issue with the <code class="language-plaintext highlighter-rouge">isin()</code> method is that under the hood it is not performing <code class="language-plaintext highlighter-rouge">date in uk_holidays</code>. My preferred method instead is to use the <code class="language-plaintext highlighter-rouge">apply()</code> method to force such operations explicitly:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s">"Date"</span><span class="p">].</span><span class="nb">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">d</span><span class="p">:</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">uk_holidays</span><span class="p">)]</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Date</th>
<th>Vehicles per day</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2021-01-01</td>
<td>15221</td>
</tr>
<tr>
<th>91</th>
<td>2021-04-02</td>
<td>52987</td>
</tr>
<tr>
<th>94</th>
<td>2021-04-05</td>
<td>43391</td>
</tr>
<tr>
<th>122</th>
<td>2021-05-03</td>
<td>50961</td>
</tr>
<tr>
<th>150</th>
<td>2021-05-31</td>
<td>69383</td>
</tr>
<tr>
<th>241</th>
<td>2021-08-30</td>
<td>72068</td>
</tr>
<tr>
<th>358</th>
<td>2021-12-25</td>
<td>32066</td>
</tr>
<tr>
<th>359</th>
<td>2021-12-26</td>
<td>47780</td>
</tr>
<tr>
<th>360</th>
<td>2021-12-27</td>
<td>55506</td>
</tr>
<tr>
<th>361</th>
<td>2021-12-28</td>
<td>54345</td>
</tr>
</tbody>
</table>
</div>
<p>Success! Now we have the traffic data on the public holidays. Note that the <code class="language-plaintext highlighter-rouge">uk_holidays</code> object has also been updated with the 2021 holidays:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">uk_holidays</span><span class="p">.</span><span class="n">years</span> <span class="c1"># {2021, 2022}
</span></code></pre></div></div>
<p>If the dates are the index of the data frame rather than a column, like so:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df2</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">df2</span> <span class="o">=</span> <span class="n">df2</span><span class="p">.</span><span class="n">set_index</span><span class="p">(</span><span class="s">"Date"</span><span class="p">)</span>
<span class="n">df2</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Vehicles per day</th>
</tr>
<tr>
<th>Date</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>2021-01-01</th>
<td>15221</td>
</tr>
<tr>
<th>2021-01-02</th>
<td>25920</td>
</tr>
<tr>
<th>2021-01-03</th>
<td>23805</td>
</tr>
<tr>
<th>2021-01-04</th>
<td>43649</td>
</tr>
<tr>
<th>2021-01-05</th>
<td>46113</td>
</tr>
</tbody>
</table>
</div>
<p>it is possible to use <code class="language-plaintext highlighter-rouge">index.map()</code> & the lambda function to select the holiday dates:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df2</span><span class="p">[</span><span class="n">df2</span><span class="p">.</span><span class="n">index</span><span class="p">.</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">d</span><span class="p">:</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">uk_holidays</span><span class="p">)]</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Vehicles per day</th>
</tr>
<tr>
<th>Date</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>2021-01-01</th>
<td>15221</td>
</tr>
<tr>
<th>2021-04-02</th>
<td>52987</td>
</tr>
<tr>
<th>2021-04-05</th>
<td>43391</td>
</tr>
<tr>
<th>2021-05-03</th>
<td>50961</td>
</tr>
<tr>
<th>2021-05-31</th>
<td>69383</td>
</tr>
<tr>
<th>2021-08-30</th>
<td>72068</td>
</tr>
<tr>
<th>2021-12-25</th>
<td>32066</td>
</tr>
<tr>
<th>2021-12-26</th>
<td>47780</td>
</tr>
<tr>
<th>2021-12-27</th>
<td>55506</td>
</tr>
<tr>
<th>2021-12-28</th>
<td>54345</td>
</tr>
</tbody>
</table>
</div>
<h3 id="pre-specify-years-to-holidays">Pre-specify years to holidays</h3>
<p>Instead of relying on on-the-fly calculation of holiday dates, you can initialize the holidays calendar with all the holiday data you need. This way <code class="language-plaintext highlighter-rouge">isin()</code> works as expected:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Initialize with holidays for 2021 & 2022 (range() excludes its stop value)
</span><span class="n">uk_holidays</span> <span class="o">=</span> <span class="n">holidays</span><span class="p">.</span><span class="n">UK</span><span class="p">(</span><span class="n">subdiv</span><span class="o">=</span><span class="s">"England"</span><span class="p">,</span> <span class="n">years</span><span class="o">=</span><span class="nb">range</span><span class="p">(</span><span class="mi">2021</span><span class="p">,</span> <span class="mi">2023</span><span class="p">))</span>
<span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s">"Date"</span><span class="p">].</span><span class="n">isin</span><span class="p">(</span><span class="n">uk_holidays</span><span class="p">)]</span>
</code></pre></div></div>
<div>
<style scoped="">
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Date</th>
<th>Vehicles per day</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>2021-01-01</td>
<td>15221</td>
</tr>
<tr>
<th>91</th>
<td>2021-04-02</td>
<td>52987</td>
</tr>
<tr>
<th>94</th>
<td>2021-04-05</td>
<td>43391</td>
</tr>
<tr>
<th>122</th>
<td>2021-05-03</td>
<td>50961</td>
</tr>
<tr>
<th>150</th>
<td>2021-05-31</td>
<td>69383</td>
</tr>
<tr>
<th>241</th>
<td>2021-08-30</td>
<td>72068</td>
</tr>
<tr>
<th>358</th>
<td>2021-12-25</td>
<td>32066</td>
</tr>
<tr>
<th>359</th>
<td>2021-12-26</td>
<td>47780</td>
</tr>
<tr>
<th>360</th>
<td>2021-12-27</td>
<td>55506</td>
</tr>
<tr>
<th>361</th>
<td>2021-12-28</td>
<td>54345</td>
</tr>
</tbody>
</table>
</div>
<p>So this method also works, but it requires that you know the full date range of the time series in advance in order to set up the holidays object.</p>
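<p>One way to avoid hard-coding the years is to derive them from the dates themselves. The sketch below uses plain <code class="language-plaintext highlighter-rouge">datetime.date</code> objects, and the three sample dates are invented for illustration. With a data frame, the values of the <code class="language-plaintext highlighter-rouge">Date</code> column would play the same role.</p>

```python
from datetime import date

# Hypothetical sample of dates standing in for the "Date" column.
dates = [date(2021, 1, 1), date(2021, 12, 28), date(2022, 4, 15)]

first_year = min(d.year for d in dates)
last_year = max(d.year for d in dates)

# range() excludes its stop value, so add 1 to include the final year.
years = range(first_year, last_year + 1)
year_list = list(years)
print(year_list)

# The range could then be passed on when creating the calendar, e.g.
# holidays.UK(subdiv="England", years=years)
```

<p>Computing the range this way keeps the seeded years in step with the data if the time series is later extended.</p>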
<h3 id="bringing-it-all-together">Bringing it all together</h3>
<p>Now that we can identify the holiday dates in the time series, let’s plot them on the graph:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[</span><span class="s">"Holiday traffic"</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">"Vehicles per day"</span><span class="p">][</span><span class="n">df</span><span class="p">[</span><span class="s">"Date"</span><span class="p">].</span><span class="nb">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">d</span><span class="p">:</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">uk_holidays</span><span class="p">)]</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"Date"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="s">"Vehicles per day"</span><span class="p">,</span> <span class="s">"Holiday traffic"</span><span class="p">],</span> <span class="n">style</span><span class="o">=</span><span class="p">[</span><span class="s">"-"</span><span class="p">,</span> <span class="s">"o"</span><span class="p">])</span>
</code></pre></div></div>
<figure class="align-center">
<a href="https://alexmalins.com/images/traffic2.png"><img src="https://alexmalins.com/images/traffic2.png" style="width: 660px" class="align-center" /></a>
<figcaption>Orange dots show traffic on public holiday dates.</figcaption>
</figure>
<p>We see that traffic on public holidays on this stretch of motorway is generally lower than the surrounding days.</p>
<p>The holidays package can be used to analyze the effects of holidays in time series more deeply.</p>
<h3 id="tldr">TLDR</h3>
<p>The pandas <code class="language-plaintext highlighter-rouge">isin()</code> method is incompatible with the on-the-fly generation of holiday dates feature of the holidays package.</p>
<p>Use a lambda function and the <code class="language-plaintext highlighter-rouge">apply()</code> method instead, or the <code class="language-plaintext highlighter-rouge">index.map()</code> method if your dates are the index of the pandas series / data frame.</p>
<p>Another option is to seed the holidays calendar beforehand with all the years you are interested in. This way <code class="language-plaintext highlighter-rouge">isin()</code> will work as expected, but take care to seed every year that appears in your data, otherwise you will get incorrect results.</p>
Alex Malins
Installing PyNE on Windows via WSL
2020-11-11T06:48:48+00:00
https://alexmalins.com/blog/installing-pyne-on-windows-via-wsl
<p>Although <a href="https://pyne.io/">PyNE</a> is officially supported only on Linux and Mac, it is possible to build and use it on Windows using the Windows Subsystem for Linux (WSL) and the Ubuntu terminal app from the Microsoft Store. These instructions describe how to install a basic PyNE build (no MOAB, DAGMC or OpenMC interfaces) on Windows via WSL. They presume only some basic knowledge of the Linux and Windows command lines. They were written for Windows 10 (1909), Ubuntu 20.04 and PyNE 0.7.3 - mileage for other releases may vary.</p>
<h3 id="step-1---enable-wsl-in-windows">Step 1 - Enable WSL in Windows</h3>
<p>Open Windows PowerShell with administrator privileges (right click: Run as administrator) and run:</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">dism.exe</span> <span class="na">/online /enable-feature /featurename</span><span class="nl">:Microsoft</span><span class="na">-Windows-Subsystem-Linux /all /norestart
</span></code></pre></div></div>
<p>This enables WSL 1 in Windows. Now restart your PC.</p>
<p>(Alternatively, you could go on to upgrade to WSL 2 at this point by following these <a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10">instructions</a>. WSL 2 offers more advanced <a href="https://docs.microsoft.com/en-us/windows/wsl/wsl2-faq">features</a> that might be useful to you, such as GPU support, but they are not necessary for PyNE.)</p>
<h3 id="step-2---install-ubuntu-2004-lts-from-the-microsoft-store">Step 2 - Install Ubuntu 20.04 LTS from the Microsoft Store</h3>
<p>From the Microsoft Store app, search for <em>Ubuntu</em> and install the Ubuntu 20.04 LTS app.</p>
<h3 id="step-3---run-ubuntu-2004-lts-and-install-libraries-needed-by-pyne">Step 3 - Run Ubuntu 20.04 LTS and install libraries needed by PyNE</h3>
<p>Run the Ubuntu 20.04 LTS app for the first time from the Windows Start menu, wait for it to set itself up, then enter a UNIX username and a password. Run the following commands in the terminal:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt update
<span class="nb">sudo </span>apt upgrade
<span class="nb">sudo </span>apt <span class="nb">install </span>python3-pip build-essential cmake gfortran libblas-dev liblapack-dev libeigen3-dev libhdf5-dev hdf5-tools
pip3 <span class="nb">install</span> <span class="nt">--user</span> <span class="nv">numpy</span><span class="o">==</span>1.17.4 scipy cython nose tables matplotlib future
</code></pre></div></div>
<p>This will install the libraries needed for a basic build of PyNE.</p>
<p>Now edit the <code class="language-plaintext highlighter-rouge">.bashrc</code> file using a text editor (<code class="language-plaintext highlighter-rouge">nano</code>, <code class="language-plaintext highlighter-rouge">emacs</code>, <code class="language-plaintext highlighter-rouge">vim</code> etc.) and add the following commands to the end of the file:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">alias </span><span class="nv">pip</span><span class="o">=</span>pip3
<span class="nb">alias </span><span class="nv">python</span><span class="o">=</span>python3
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/.local/bin:</span><span class="nv">$PATH</span><span class="s2">"</span>
<span class="nb">export </span><span class="nv">LD_LIBRARY_PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$HOME</span><span class="s2">/.local/lib:/usr/lib/x86_64-linux-gnu/hdf5/serial:</span><span class="nv">$LD_LIBRARY_PATH</span><span class="s2">"</span>
</code></pre></div></div>
<p>Close the Ubuntu terminal.</p>
<h3 id="step-4---download-the-pyne-source-code">Step 4 - Download the PyNE source code</h3>
<p>Open your favourite browser in Windows and download a <code class="language-plaintext highlighter-rouge">.tar.gz</code> file containing the PyNE source code from <a href="https://github.com/pyne/pyne/releases">https://github.com/pyne/pyne/releases</a>. The latest release at the time of writing is <code class="language-plaintext highlighter-rouge">0.7.3</code>.</p>
<h3 id="step-5---copy-the-source-code-to-ubuntu-unpack-and-build-pyne">Step 5 - Copy the source code to Ubuntu, unpack, and build PyNE</h3>
<p>Reopen an Ubuntu 20.04 LTS terminal window and copy the source tarball into the Ubuntu WSL file system via:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cp</span> /mnt/c/DOWNLOAD_DIR/pyne-0.7.3.tar.gz <span class="nb">.</span>
</code></pre></div></div>
<p>Edit the drive letter and <code class="language-plaintext highlighter-rouge">DOWNLOAD_DIR</code> as appropriate to match where you downloaded the source file to. Unpack the source code and navigate into the source directory via:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tar</span> <span class="nt">-xf</span> pyne-0.7.3.tar.gz
<span class="nb">cd </span>pyne-0.7.3/
</code></pre></div></div>
<p>Now build PyNE and install locally:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python setup.py <span class="nb">install</span> <span class="nt">--user</span>
</code></pre></div></div>
<p>If the build succeeds, you can now run the following commands to make the nuclear data:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd
</span>nuc_data_make
</code></pre></div></div>
<p>An optional final step here is to run the tests:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd </span>pyne-0.7.3/tests/
./travis-run-tests.sh
</code></pre></div></div>
<h3 id="step-6---using-pyne-in-python-on-wsl">Step 6 - Using PyNE in Python on WSL</h3>
<p>You should now be able to use PyNE within Python on the Ubuntu 20.04 LTS app, e.g.:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">pyne</span> <span class="kn">import</span> <span class="n">data</span>
<span class="o">>>></span> <span class="n">data</span><span class="p">.</span><span class="n">half_life</span><span class="p">(</span><span class="s">"H-3"</span><span class="p">)</span>
<span class="mf">388789632.0</span>
</code></pre></div></div>
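<p>As a quick sanity check on that number, the sketch below converts it to years, assuming the value is in seconds. It uses only the figure printed above and does not need PyNE to run. The constant name is just an illustrative choice.</p>

```python
# Value printed by data.half_life("H-3") in the session above,
# assumed here to be in seconds.
half_life_seconds = 388789632.0

# One Julian year is 365.25 days of 86400 seconds each.
SECONDS_PER_JULIAN_YEAR = 365.25 * 86400

half_life_years = half_life_seconds / SECONDS_PER_JULIAN_YEAR
print(f"{half_life_years:.2f} years")
```

<p>The result of about 12.3 years is in line with the commonly quoted half-life of tritium, which suggests the nuclear data were built correctly.</p>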
<h3 id="notes-and-alternative-windows-install-options-updated-june-06-2021">Notes and alternative Windows install options [Updated June 06, 2021]</h3>
<p>PyNE can be a little finicky to build, but persistence pays off. If you encounter any errors during builds, the Issues and PR sections of the PyNE GitHub <a href="https://github.com/pyne/pyne">repo</a>, and the <a href="https://groups.google.com/forum/#!forum/pyne-dev">pyne-dev</a> and <a href="https://groups.google.com/forum/#!forum/pyne-users">pyne-users</a> mailing lists are great sources of information.</p>
<p>The above instructions were based on the PyNE Ubuntu Install Scripts from <a href="https://github.com/pyne/install_scripts">https://github.com/pyne/install_scripts</a>. An alternative starting from Step 3 onwards would be to download and run the <code class="language-plaintext highlighter-rouge">ubuntu.sh</code> install script available from that repository. This will install MOAB, DAGMC and OpenMC as well.</p>
<p>Another alternative that is easier than compiling from source is to install <a href="https://gist.github.com/kauffmanes/5e74916617f9993bc3479f401dfec7da">Anaconda on WSL</a>, then install PyNE from <a href="https://anaconda.org/conda-forge/pyne">conda-forge</a>. See discussion on this option in <a href="https://groups.google.com/g/pyne-dev/c/dJVAc6K4L7w">these</a> <a href="https://groups.google.com/g/pyne-dev/c/MzhxFaYA9N4">threads</a> on the pyne-dev mailing list.</p>
<p>Finally, Gurdeep Kamal (<a href="https://www.tokamakenergy.co.uk/">Tokamak Energy</a>) reported on the MCNP users mailing list that PyNE can be installed on Windows using the <a href="https://pyne.io/install/vb.html">VirtualBox image</a>.</p>
Alex Malins
Interactive rebase to switch base branches mid GitHub pull request
2020-08-23T13:46:15+00:00
https://alexmalins.com/blog/interactive-rebase-during-github-pull-request
<p>A situation occurred the other week that I had not encountered before when working on a GitHub pull request for the <a href="https://pyne.io/">PyNE</a> Python package. The PR <a href="https://github.com/pyne/pyne/pull/1270">(#1270)</a> added a new function to initialize materials by supplying activities of radionuclides. It complements existing methods to create materials from masses or atom fractions.</p>
<p>The new lines of code were in my branch for the PR which was forked from the default <code class="language-plaintext highlighter-rouge">develop</code> branch on PyNE’s GitHub repository. Unbeknown to me at the time, the PyNE team were undertaking a hackathon to prepare for a new PyNE release, and active development was ongoing on a separate branch in the main repository (<code class="language-plaintext highlighter-rouge">0.7.0-rc</code>). Normally everything would be brought in line easily by rebasing the branch for PR (#1270) to the <code class="language-plaintext highlighter-rouge">0.7.0-rc</code> branch ready for merging the PR.</p>
<p>In this instance, however, there had previously been another PR <a href="https://github.com/pyne/pyne/pull/1264">(#1264)</a> merged into the <code class="language-plaintext highlighter-rouge">develop</code> branch subsequent to <code class="language-plaintext highlighter-rouge">0.7.0-rc</code> being branched off. This meant that after rebasing my branch to <code class="language-plaintext highlighter-rouge">0.7.0-rc</code>, the new PR (#1270) branch contained both the commits for the new activity-based material initializations and the irrelevant PR (#1264) commits. Schematically it looked like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>origin develop branch:   A---C
                          \   \
new PR (#1270) branch:     \   C-D
                            \ /
origin 0.7.0-rc branch:    A-B
</code></pre></div></div>
<p>After rebasing to <code class="language-plaintext highlighter-rouge">0.7.0-rc</code>, the head branch for PR (#1270) presented both the <code class="language-plaintext highlighter-rouge">C</code> commits from PR (#1264) and the <code class="language-plaintext highlighter-rouge">D</code> commits associated with the new activity-to-material functionality.</p>
<p>The way to tidy this up so that only the relevant <code class="language-plaintext highlighter-rouge">D</code> commits for the new functionality were part of PR (#1270) was to perform a git interactive rebase:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git rebase <span class="nt">-i</span> HEAD~x
</code></pre></div></div>
<p><a href="https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase">Interactive rebase</a>, among other things, lets you track back and delete previous git commits from the code lineage. Note <code class="language-plaintext highlighter-rouge">x</code> here is the number of commits you choose to track back and process during the interactive rebase. With interactive rebase, the <code class="language-plaintext highlighter-rouge">C</code> commits could be deleted from the head branch for PR (#1270), leaving only the relevant <code class="language-plaintext highlighter-rouge">D</code> commits, thus tidying everything up.</p>
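<p>During an interactive rebase, git opens a todo list in your editor with one line per commit, oldest first, each starting with a command such as <code class="language-plaintext highlighter-rouge">pick</code>. Changing a line's command to <code class="language-plaintext highlighter-rouge">drop</code> (or deleting the line) removes that commit. The sketch below is a toy model of that edit in Python; it does not run git, and the hashes and commit messages are invented for illustration.</p>

```python
# Hypothetical todo list as git might present it (oldest commit first).
# The hashes and messages are invented for illustration.
todo = [
    "pick 1a2b3c4 C: change from PR #1264",
    "pick 5d6e7f8 C: another change from PR #1264",
    "pick 9a8b7c6 D: add activity-based material initialization",
]


def drop_commits(todo_lines, marker):
    """Switch 'pick' to 'drop' for lines whose message starts with marker."""
    edited = []
    for line in todo_lines:
        command, sha, message = line.split(" ", 2)
        if message.startswith(marker):
            command = "drop"
        edited.append(f"{command} {sha} {message}")
    return edited


edited_todo = drop_commits(todo, "C:")

# Only the lines still marked 'pick' would be replayed onto the new base.
kept = [line for line in edited_todo if line.startswith("pick")]
for line in edited_todo:
    print(line)
```

<p>In practice the edit is done by hand in the editor; after saving and closing the file, git replays only the commits that remain picked.</p>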
<p>Credit goes to <a href="https://github.com/gonuke">Paul Wilson</a> and <a href="https://github.com/bam241">Baptiste Mouginot</a> from the University of Wisconsin-Madison for guidance on how to use git interactive rebase to perform this fix.</p>
Alex Malins
Updating the version of a port on MacPorts
2016-02-05T11:33:19+00:00
https://alexmalins.com/blog/updating-the-version-of-a-port-on-macports
<p>If you come across a piece of software on MacPorts with an out of date version, here are instructions for how to update it. First open up a terminal window and identify the port’s maintainer:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>port info <span class="nt">--maintainers</span> &lt;portname&gt;
</code></pre></div></div>
<p>If the port has a maintainer, the easiest option is to send them a polite email requesting that the port be brought up to date.</p>
<p>If the port has no maintainer (nomaintainer@macports.org), or allows open maintenance (openmaintainer@macports.org), you can upgrade the port yourself via the following steps. If successful, you can then submit a new ticket to <a href="https://trac.macports.org/">http://trac.macports.org/</a> so that the update is propagated onto the MacPorts server for the benefit of other users.</p>
<p>First change directory into the port folder:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> <span class="si">$(</span>port <span class="nb">dir</span> &lt;portname&gt;<span class="si">)</span>
</code></pre></div></div>
<p>Back up the port build file:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo cp </span>Portfile Portfile.orig
</code></pre></div></div>
<p>Use a text editor to edit the software version in the Portfile to the latest version number (e.g. version 1.4.1 to 1.5):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano Portfile
</code></pre></div></div>
<p>If the Portfile revision entry is not set to 0, set it to 0 as this is the first revision for the updated version. Now download the source code of the latest version of the port and find its checksums with the following command:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>port <span class="nt">-d</span> checksum &lt;portname&gt;
</code></pre></div></div>
<p>Copy the checksums (rmd160 and sha256) that are printed out for the latest version and paste them over the old checksums in the Portfile. Similarly, if the Portfile contains a livecheck.md5 entry, fetch this for the latest version and propagate it into the Portfile:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>port <span class="nt">-d</span> livecheck &lt;portname&gt;
</code></pre></div></div>
<p>You can now compile and install the latest version on your local machine by:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>port <span class="nt">-d</span> <span class="nb">install</span> &lt;portname&gt;
</code></pre></div></div>
<p>If the compile is successful and the updated software functions correctly, you can then file an update. First create a diff file for the changes to the Portfile:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>diff <span class="nt">-u</span> Portfile.orig Portfile | <span class="nb">sudo tee</span> &lt;portname&gt;.diff
</code></pre></div></div>
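<p>To make the backup, edit and diff steps concrete, here is a minimal sketch that runs them on a scratch file rather than a real port. The Portfile contents are a heavily trimmed, hypothetical example (a real Portfile also carries checksums, dependencies and so on), the port name <code class="language-plaintext highlighter-rouge">example</code> is made up, and <code class="language-plaintext highlighter-rouge">awk</code> stands in for the manual edit in a text editor.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -eu

# Work in a scratch directory so no real port is touched
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical, trimmed-down Portfile for a port called "example"
printf '%s\n' \
    'name                example' \
    'version             1.4.1' \
    'revision            2' | tee Portfile

# Back up the port build file
cp Portfile Portfile.orig

# Bump the version from 1.4.1 to 1.5 and reset the revision to 0
awk '$1 == "version"  { sub(/1\.4\.1$/, "1.5") }
     $1 == "revision" { sub(/[0-9]+$/, "0") }
     { print }' Portfile.orig | tee Portfile

# Create the diff file to attach to the ticket
diff -u Portfile.orig Portfile | tee example.diff
</code></pre></div></div>
<p>The resulting diff should show only the version and revision lines changing. In a real update the checksum lines would change too, using the values printed by the checksum command.</p>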
<p>Now make a new ticket at <a href="https://trac.macports.org/">http://trac.macports.org/</a>. Give the ticket a title like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;portname&gt; @&lt;current_version&gt;: update to &lt;latest_version&gt;
</code></pre></div></div>
<p>CC the email address associated with the port, which you identified at the beginning of this process. Attach the &lt;portname&gt;.diff file. All being well, one of the MacPorts team will use your patch and update the port to the latest version on the MacPorts server within a few days.</p>
<h3 id="notes">Notes</h3>
<p>This post was based on the instructions in the MacPorts howto/Upgrade guide: <a href="https://trac.macports.org/wiki/howto/Upgrade">https://trac.macports.org/wiki/howto/Upgrade</a></p>Alex MalinsIf you come across a piece of software on MacPorts with an out of date version, here are instructions for how to update it. First open up a terminal window and identify the port’s maintainer:Converting a scanned TIFF document to PDF and creating text searchable PDFs2014-12-08T12:07:51+00:002014-12-08T12:07:51+00:00https://alexmalins.com/blog/converting-a-scanned-tiff-document-to-pdf-and-creating-text-searchable-pdfs<p>You can follow these steps to create a text searchable PDF document if your scanner only outputs TIFF files. If your scanner creates PDF files but doesn’t perform OCR to make text searchable, skip to the last step.</p>
<h3 id="convert-tiff-to-pdf">Convert TIFF to PDF</h3>
<p><a href="https://imagemagick.org/index.php">ImageMagick</a> comes with a command line tool <a href="http://www.imagemagick.org/script/convert.php">magick</a> to do this.</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">magick</span> <span class="nb">convert</span> <span class="kd">scanned</span>.tiff <span class="kd">scanned</span>.pdf
</code></pre></div></div>
<p>Executing this command creates a PDF file from a TIFF created by a scanner.</p>
<h3 id="optional-rotate-the-pdf-pages"><em>Optional: Rotate the PDF Pages</em></h3>
<p>Sometimes the scanned pages will need rotating to the correct orientation. Use <a href="https://www.pdflabs.com/tools/pdftk-server/">PDFtk</a> to rotate the pages. Rotating all the pages in the scanned PDF by 90º anti-clockwise is achieved with the following command:</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">pdftk</span> <span class="kd">scanned</span>.pdf <span class="kd">cat</span> <span class="m">1</span><span class="na">-endwest </span><span class="kd">output</span> <span class="kd">rotated</span>.pdf
</code></pre></div></div>
<p>Individual pages can be selected and rotated as necessary, see <a href="https://www.pdflabs.com/docs/pdftk-cli-examples/">PDFtk examples</a>.</p>
<h3 id="perform-optical-character-recognition">Perform Optical Character Recognition</h3>
<p>For this step I resort to a copy of <a href="https://acrobat.adobe.com/us/en/acrobat/acrobat-pro.html">Acrobat Pro</a>.</p>
<p>It would have been nice if I had succeeded in achieving good quality output for this step using open source software. Solutions do exist, mainly using <a href="https://github.com/tesseract-ocr/tesseract">Tesseract</a> to do OCR and then forming a new PDF file with a text searchable layer hidden underneath the scanned images. See e.g. <a href="http://www.konradvoelkel.com/2013/03/scan-to-pdfa/">Voelkel’s</a> and <a href="https://github.com/fritz-hh/OCRmyPDF">OCRmyPDF</a> solutions.</p>
<p>However, despite several attempts, I couldn’t get satisfactory results with either. The OCR output I was getting from Tesseract was lower in quality than Acrobat’s. Acrobat also has the advantage that it applies small rotations to the pages to make sure the text is horizontal. So eventually I gave up on the open source route and now use Acrobat.</p>
<p>Note Acrobat can perform OCR on any PDF file. This is very useful for making old journal articles text searchable if the download offered by the publisher is not.</p>
<h3 id="edits">Edits</h3>
<p>24<sup>th</sup> August 2018 - Updated ImageMagick command “convert” to “magick convert”</p>Alex MalinsYou can follow these steps to create a text searchable PDF document if your scanner only outputs TIFF files. If your scanner creates PDF files but doesn’t perform OCR to make text searchable, skip to the last step.Removing pesky PDF title pages and other PDF tricks2014-12-07T11:43:31+00:002014-12-07T11:43:31+00:00https://alexmalins.com/blog/remove-pesky-pdf-title-pages-and-other-pdf-tricks<p>Taking out pages from PDF files, binding PDFs together, and removing PDF security. Here are a couple of free programmes to perform these tasks.</p>
<h3 id="taking-out-pages-from-a-pdf">Taking out pages from a PDF</h3>
<figure class="align-right">
<a href="https://alexmalins.com/images/pdftk.png"><img src="https://alexmalins.com/images/pdftk.png" style="width: 345px" class="align-right" /></a>
<figcaption>PDFtk command to delete the first page of a PDF.</figcaption>
</figure>
<p>Sometimes PDFs contain a title page or blank pages that you might want to remove. <a href="https://www.pdflabs.com/tools/pdftk-server/">PDFtk Server</a> is a useful command line tool that can take pages out from a PDF file. To remove the title page from a PDF file, run the following command:</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">pdftk</span> <span class="kd">input</span>.pdf <span class="kd">cat</span> <span class="m">2</span><span class="na">-end </span><span class="kd">output</span> <span class="kd">output</span>.pdf
</code></pre></div></div>
<p>Set the pages you want to maintain within the document after the <em>cat</em> command, and change <em>input.pdf</em> and <em>output.pdf</em> to suit.</p>
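<p>If the page to remove is not the first one, the page ranges to keep have to be worked out on either side of it. The helper below is a small sketch of that arithmetic. The function name <code class="language-plaintext highlighter-rouge">keep_ranges</code> is my own invention, not part of PDFtk, and it assumes the document has more than one page and that you already know the total page count.</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># keep_ranges DROP TOTAL
# Print the page ranges to pass to "pdftk ... cat" so that page DROP
# is removed from a document with TOTAL pages (TOTAL must exceed 1).
keep_ranges() {
    drop=$1
    total=$2
    if [ "$drop" -eq 1 ]; then
        # Dropping the first page: keep everything from page 2 onwards
        echo "2-end"
    elif [ "$drop" -eq "$total" ]; then
        # Dropping the last page: keep everything before it
        echo "1-$((total - 1))"
    else
        # Dropping a middle page: keep the ranges on either side
        echo "1-$((drop - 1)) $((drop + 1))-end"
    fi
}

keep_ranges 1 10    # prints: 2-end
keep_ranges 5 10    # prints: 1-4 6-end
keep_ranges 10 10   # prints: 1-9
</code></pre></div></div>
<p>The output can then be spliced into the command, for example <code class="language-plaintext highlighter-rouge">pdftk input.pdf cat $(keep_ranges 5 10) output output.pdf</code>, leaving the substitution unquoted so that the two ranges are passed as separate arguments.</p>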
<h3 id="merging-pdf-files">Merging PDF files</h3>
<p>You may wish to combine a journal article with its supplementary information, or with comment letters and author responses. PDFtk can do this too. To combine two PDF files into one:</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">pdftk</span> <span class="kd">input1</span>.pdf <span class="kd">input2</span>.pdf <span class="kd">cat</span> <span class="kd">output</span> <span class="kd">combined</span>.pdf
</code></pre></div></div>
<p>PDFtk can also perform a host of other useful tasks - see its <a href="https://www.pdflabs.com/docs/pdftk-cli-examples/">examples page</a>.</p>
<h3 id="removing-pdf-security">Removing PDF Security</h3>
<p>Sometimes PDF files come with encryption settings that prevent you from doing the above, or from performing <a href="https://en.wikipedia.org/wiki/Optical_character_recognition">OCR</a> on the document. I find that <a href="http://qpdf.sourceforge.net/">QPDF</a> works well for removing the security settings, even if you do not know the PDF password. To remove the security settings from a file run:</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">qpdf</span> <span class="o">-</span><span class="na">-decrypt </span><span class="kd">input</span>.pdf <span class="kd">output</span>.pdf
</code></pre></div></div>
<p>However, QPDF does not always succeed. An alternative is to use a PDF printer driver to ‘print’ an unencrypted PDF. Examples of such software for Windows include CutePDF, Bullzip, PDFCreator etc. Preview on Mac will also ‘Export as PDF’. <a href="https://www.ghostscript.com/">Ghostscript</a> can also perform this task:</p>
<div class="language-bat highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">gs</span> <span class="na">-q -dNOPAUSE -dBATCH -sDEVICE</span><span class="o">=</span><span class="kd">pdfwrite</span> <span class="na">-sOutputFile</span><span class="o">=</span><span class="kd">output</span>.pdf <span class="na">-c </span>.setpdfwrite <span class="na">-f </span><span class="kd">input</span>.pdf
</code></pre></div></div>
<h3 id="more-powerful-manipulation-of-pdfs">More Powerful Manipulation of PDFs</h3>
<p>The python package <a href="https://pypi.org/project/pyPdf/">pyPdf</a> can be used to perform PDF file operations with the benefit of easy scripting ability. For instance when merging many hundreds of PDF sections of a larger document, each with a filename describing one part of the document (front matter, initial pages with Roman numerals, main pages…), I wrote a python script to merge the files in the correct order.</p>
<h3 id="edits">Edits</h3>
<p>9<sup>th</sup> June 2015 - Added ghostscript command to print to a new PDF file</p>Alex MalinsTaking out pages from PDF files, binding PDFs together, and removing PDF security. Here are a couple of free programmes to perform these tasks.