darcs

Issue 2721 all transaction commands fail on Windows with ghc >= 9.6

Title: all transaction commands fail on Windows with ghc >= 9.6
Priority:
Status: resolved
Milestone:
Resolved in: 2.18.1
Superseder:
Nosy List: bfrk
Assigned To:
Topics:

Created on 2024-05-18.20:11:25 by bfrk, last changed 2024-05-24.15:02:11 by noreply.

Messages
msg23827 (view) Author: bfrk Date: 2024-05-18.20:11:25
The error message is

  parse error in file _darcs/rebase
  not enough input

I verified that somehow an empty _darcs/rebase gets created before the
transaction starts. This causes revertTentativeRebase to copy the file to
_darcs/rebase.tentative, which is then read and parsed, causing this failure.
I also verified that there is no _darcs/rebase before the command is run.

How this happens and why only on Windows with ghc >= 9.6 is completely mysterious 
to me.
msg23832 (view) Author: ganesh Date: 2024-05-18.21:20:40
I'm seeing the same behaviour when I try it locally
(the empty `_darcs/rebase` in particular.)

I note that on Linux, after running a similar command
(like `darcs add`), we end up with `_darcs/rebase` containing
"rebase 0.2 {\n}". That surprised me a bit.
msg23833 (view) Author: bfrk Date: 2024-05-18.21:36:15
The rebase patch is nowadays handled in a generic way with no distinction 
between "a rebase is in progress" and "no rebase in progress", at least not in 
the core Repository layer. That is, "no rebase in progress" is represented as 
"rebase state with no patches", which is what you see. We do handle the case of 
"no _darcs/rebase present" with an exception handler during 
revertTentativeRebase.
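
A minimal sketch of the kind of handler described above, assuming the fallback is to write an empty rebase state when the source file is missing. The name `revertTentativeSketch`, the example paths, and the fallback behaviour are illustrative assumptions, not the actual darcs code; the file content is the one quoted earlier in this thread.

```haskell
import Control.Exception (catch, throwIO)
import System.Directory (copyFile)
import System.IO.Error (isDoesNotExistError)

-- Hypothetical stand-in for the revert step: copy the rebase file to its
-- tentative counterpart. If the source does not exist, write an empty
-- rebase state instead (assumed fallback); rethrow any other I/O error.
revertTentativeSketch :: FilePath -> FilePath -> IO ()
revertTentativeSketch src tentative =
  copyFile src tentative `catch` (\e ->
    if isDoesNotExistError e
      then writeFile tentative "rebase 0.2 {\n}"
      else throwIO e)

main :: IO ()
main = do
  -- Example paths in the current directory, chosen for illustration only.
  revertTentativeSketch "rebase.example" "rebase.example.tentative"
  readFile "rebase.example.tentative" >>= putStrLn
```

A handler like this only helps if copyFile really throws when the source is missing; if the copy silently succeeds, the fallback branch is never taken.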
msg23834 (view) Author: ganesh Date: 2024-05-18.22:31:07
It looks like it's actually being created inside revertTentativeRebase.

I'll debug more inside that tomorrow.
msg23835 (view) Author: bfrk Date: 2024-05-19.08:46:26
I have verified that indeed System.Directory.copyFile is buggy: instead 
of throwing an exception if the source file does not exist, it succeeds 
and creates an empty target file. Test program:

import System.Directory
main = copyFile "nonexistent" "shouldnotexist"

On Linux this fails with:

test: nonexistent: copyFile:atomicCopyFileContents:withReplacementFile:copyFileToHandle:openFdAt: does not exist (No such file or directory)
msg23836 (view) Author: ganesh Date: 2024-05-19.09:07:36
Confirmed on my local machine - on Windows, the copyFile succeeds and
both files exist with size 0.

On Linux, the copyFile fails and neither file exists after.
msg23837 (view) Author: bfrk Date: 2024-05-19.09:47:17
The bug seems *not* to be in the directory package: 1.3.7.1 can be used with 
both ghc-9.4 and ghc-9.6 but succeeds with the former and fails with the 
latter. The likely culprit is Win32 (or perhaps base).
msg23838 (view) Author: ganesh Date: 2024-05-19.10:22:40
Now that we know about it, I guess we can guard any copyFile
operations appropriately? And then figure out getting the GHC
bug fixed separately (that would take time to roll out in any
case)
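
One way such a guard could look, as a sketch only: check for the source file first and raise the usual does-not-exist error if it is missing. `guardedCopyFile` is a hypothetical name and this is not the patch discussed in this thread.

```haskell
import Control.Monad (unless)
import System.Directory (copyFile, doesFileExist)
import System.IO.Error (doesNotExistErrorType, mkIOError, tryIOError)

-- Hypothetical wrapper: refuse to copy when the source file is missing,
-- so callers see a does-not-exist error regardless of what the
-- underlying copyFile does on the current platform.
guardedCopyFile :: FilePath -> FilePath -> IO ()
guardedCopyFile src dst = do
  exists <- doesFileExist src
  unless exists $
    ioError (mkIOError doesNotExistErrorType "guardedCopyFile" Nothing (Just src))
  copyFile src dst

main :: IO ()
main = do
  -- With no file named "nonexistent" present, the guard should throw.
  r <- tryIOError (guardedCopyFile "nonexistent" "shouldnotexist")
  putStrLn (either (const "OK: guard threw") (const "BAD: guard did not throw") r)
```

The check and the copy are two separate steps, so a file removed in between would still reach copyFile; for a workaround of this kind that window is probably acceptable.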
msg23840 (view) Author: bfrk Date: 2024-05-19.12:46:05
Okay, I have a patch that does this. Will run the ci and if it works send 
it. Could you report the issue upstream with what information we have? 
This is a pretty serious bug IMO. I can hardly believe that we are the 
first to run into that.
msg23841 (view) Author: ganesh Date: 2024-05-19.13:04:11
I've just checked GHC 9.10.1 and it has the same issue.
msg23842 (view) Author: ganesh Date: 2024-05-19.13:04:59
> Could you report the issue upstream with what information we have? 
> This is a pretty serious bug IMO. I can hardly believe that we are the 
> first to run into that.

Agreed. Yes, I'll do the reporting.
msg23845 (view) Author: bfrk Date: 2024-05-19.13:17:50
I'd also be glad if you could help debug issue2720. I tried to get to 
the bottom of it yesterday but gave up because doing it via the CI is just 
too much hassle.
msg23855 (view) Author: ganesh Date: 2024-05-19.19:37:44
I think it is directory: even with GHC 9.4.8, directory 1.3.7.1
is ok and 1.3.8.0 is not ok. (with later GHCs you have to use
directory 1.3.8.1 because 1.3.8.0 has a version bound that excludes 
them)

I'll try to track it down in directory's git repo.
msg23860 (view) Author: bfrk Date: 2024-05-19.20:12:54
But then why does this build succeed (even though the result has the 
same behavior i.e. failure on Windows):

ben@home[1]:.../darcs/screened>cabal build -w ghc-9.6 --constraint 'directory<1.3.7.2'
Resolving dependencies...
Build profile: -w ghc-9.6.2 -O1
In order, the following will be built (use -v for more details):
 - directory-1.3.7.1 (lib:directory) (requires build)
 - ...

?
msg23862 (view) Author: ganesh Date: 2024-05-19.20:40:33
I'm going back and checking darcs itself now, but I've been working with
this test program:

module Main where

import Control.Exception
import System.Directory
import System.IO.Error (isDoesNotExistError)

-- Run an action; on a does-not-exist error run the handler instead.
onMissing :: IO a -> IO a -> IO a
onMissing handler action =
  action `catch` (\e -> if isDoesNotExistError e then handler else throwIO e)

-- Report whether a file that should be absent is present.
checkAbsent :: FilePath -> IO ()
checkAbsent f = doesFileExist f >>= \b ->
  putStrLn (if b then "BAD: " ++ f ++ " exists" else "OK: " ++ f ++ " does not exist")

main :: IO ()
main = do
  let files = ["nonexistent.source", "shouldnotexist.target"]
  -- start from a clean state
  mapM_ (onMissing (return ()) . removeFile) files
  -- copying a missing source must throw and must not create the target
  onMissing (putStrLn "OK: copyFile threw")
    (copyFile "nonexistent.source" "shouldnotexist.target" >> putStrLn "BAD: copyFile did not throw")
  mapM_ checkAbsent files
  -- clean up whatever the copy may have created
  mapM_ (onMissing (return ()) . removeFile) files

and constraining Win32 with:

cabal run --with-compiler=/c/ghcup/ghc/9.4.8/bin/ghc --constraint 'Win32==2.13.3.0' copyfile-bug

and the problem seems to definitely start with this commit in directory:

commit 78b3e596502e180f228299f10726e13837d0a352 (HEAD)
Author: Julian Ospald <hasufell@posteo.de>
Date:   Sun May 8 20:19:46 2022 +0200
msg23863 (view) Author: ganesh Date: 2024-05-19.20:53:47
Confirmed that the tests pass with

 cabal run --enable-tests test:darcs-test --constraint 'directory<1.3.7.2' -w /c/ghcup/bin/ghc-9.6 -- -t EXAMPLE

and fail with

 cabal run --enable-tests test:darcs-test --constraint 'directory==1.3.8.1' -w /c/ghcup/bin/ghc-9.6 -- -t EXAMPLE

btw it's *just* directory 1.3.8.0 that excludes GHC 9.6+, directory 1.3.7.1 supports it as does 1.3.8.1.
msg23864 (view) Author: ganesh Date: 2024-05-19.21:02:47
https://github.com/haskell/directory/issues/177
msg23868 (view) Author: bfrk Date: 2024-05-19.21:21:29
I just ran the CI (just the EXAMPLE.sh test) w/o my workaround patch but 
with directory constrained to < 1.3.7.2. I thought I had done that but 
apparently I did not or made a mistake because it actually looks good. 
Running the full test suite now for Windows with ghc-9.4, 9.6, and 9.8.

So a much simpler workaround would be to just limit directory to < 
1.3.7.2.
msg23869 (view) Author: bfrk Date: 2024-05-19.21:23:25
Or (perhaps clearer) to < 1.3.8.
msg23870 (view) Author: ganesh Date: 2024-05-19.21:30:11
I think restricting directory might leave us with similar problems
with Stackage/distributions as we have/had with tls.
msg23871 (view) Author: bfrk Date: 2024-05-19.21:38:06
Good point. Unless they fix this issue soon and make a new release. We 
can then allow < 1.3.8 and (say) >= 1.3.9 (or whatever its name will be) 
and hope other packages in stackage catch up quickly...
msg23875 (view) Author: ganesh Date: 2024-05-20.07:19:42
It's been fixed already and shipped in 1.3.8.5.

I'm getting quite a lot of test failures locally but those could just
be deficiencies in my Windows setup.
msg23876 (view) Author: bfrk Date: 2024-05-20.07:34:48
For the moment I have obliterated my complicated workaround from 
screened. I guess we will be fine excluding a few versions of directory.
msg23900 (view) Author: noreply Date: 2024-05-24.15:02:09
The following patch sent by Ben Franksen <ben.franksen@online.de> updated issue issue2721 with
status=resolved;resolvedin=2.18.1

Hash: 48b4d37e2d3ae4773260f7c7afd823bd1fa78024
Author: Ben Franksen <ben.franksen@online.de>
* resolve issue2721 by excluding certain versions of directory package

  Versions 1.3.8 up to 1.3.8.4 of the directory package have a bug in their
  implementation of copyFile on Windows.
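
To make the excluded range concrete, here is a small sketch of a predicate over version numbers, based on the range stated above and the fix shipping in 1.3.8.5. The name `isAffected` is hypothetical, and the exact constraint used in the darcs cabal file is not shown in this thread.

```haskell
import Data.Version (Version, makeVersion, showVersion)

-- Versions of directory from 1.3.8 up to 1.3.8.4 are reported as
-- affected; 1.3.8.5 is reported as containing the fix.
isAffected :: Version -> Bool
isAffected v = v >= makeVersion [1, 3, 8] && v < makeVersion [1, 3, 8, 5]

main :: IO ()
main = mapM_ report
  [ makeVersion [1, 3, 7, 1]
  , makeVersion [1, 3, 8, 0]
  , makeVersion [1, 3, 8, 4]
  , makeVersion [1, 3, 8, 5]
  ]
  where
    -- Print each version with its classification.
    report v = putStrLn (showVersion v ++ ": " ++
                         (if isAffected v then "excluded" else "allowed"))
```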
History
Date                 User     Action  Args
2024-05-18 20:11:25  bfrk     create
2024-05-18 21:20:40  ganesh   set     messages: + msg23832
2024-05-18 21:36:16  bfrk     set     messages: + msg23833
2024-05-18 22:31:07  ganesh   set     messages: + msg23834
2024-05-19 08:46:27  bfrk     set     messages: + msg23835
2024-05-19 09:07:37  ganesh   set     messages: + msg23836
2024-05-19 09:47:17  bfrk     set     messages: + msg23837
2024-05-19 10:22:40  ganesh   set     messages: + msg23838
2024-05-19 12:46:05  bfrk     set     messages: + msg23840
2024-05-19 13:04:11  ganesh   set     messages: + msg23841
2024-05-19 13:04:59  ganesh   set     messages: + msg23842
2024-05-19 13:17:51  bfrk     set     messages: + msg23845
2024-05-19 19:37:44  ganesh   set     messages: + msg23855
2024-05-19 20:12:54  bfrk     set     messages: + msg23860
2024-05-19 20:40:33  ganesh   set     messages: + msg23862
2024-05-19 20:53:48  ganesh   set     messages: + msg23863
2024-05-19 21:02:47  ganesh   set     messages: + msg23864
2024-05-19 21:21:29  bfrk     set     messages: + msg23868
2024-05-19 21:23:25  bfrk     set     messages: + msg23869
2024-05-19 21:30:11  ganesh   set     messages: + msg23870
2024-05-19 21:38:06  bfrk     set     messages: + msg23871
2024-05-20 07:19:42  ganesh   set     messages: + msg23875
2024-05-20 07:34:48  bfrk     set     messages: + msg23876
2024-05-24 15:02:11  noreply  set     status: unknown -> resolved
                                      messages: + msg23900
                                      resolvedin: 2.18.1