Yes, GitHub's language detection is definitely messed up, especially in the C language family. A while ago, a C or C++ source file would be detected as Objective-C if it had a variable called "id" (I think that's been fixed, though).
I know it's not truly this simple, but if the file extension is ".ino", I feel like your detection algorithm should be free to treat that as a massive hint that it's Arduino code.
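This is a rough sketch of the idea: an unambiguous extension acts as a strong prior, and content keywords only nudge the score. It assumes a simple additive scoring scheme. The weights, keyword table, and `detect_language` function are all hypothetical and are not how GitHub's detector actually works.

```python
import re
from pathlib import Path

# Hypothetical table: extensions that strongly suggest a language.
EXTENSION_HINTS = {".ino": "Arduino", ".m": "Objective-C", ".c": "C", ".cpp": "C++"}
EXTENSION_WEIGHT = 100  # hypothetical: one extension match outweighs many keyword hits

# Hypothetical table: tokens that only weakly suggest a language.
KEYWORD_HINTS = {
    "id": "Objective-C",
    "@interface": "Objective-C",
    "#include": "C",
    "setup": "Arduino",
    "loop": "Arduino",
}
KEYWORD_WEIGHT = 1


def detect_language(filename: str, source: str) -> str:
    """Return the highest-scoring language, or "Unknown" if nothing matched."""
    scores: dict[str, int] = {}

    # Strong signal: the file extension.
    ext = Path(filename).suffix.lower()
    if ext in EXTENSION_HINTS:
        lang = EXTENSION_HINTS[ext]
        scores[lang] = scores.get(lang, 0) + EXTENSION_WEIGHT

    # Weak signals: keyword tokens, optionally prefixed with @ or #.
    for token in re.findall(r"[@#]?\w+", source):
        if token in KEYWORD_HINTS:
            lang = KEYWORD_HINTS[token]
            scores[lang] = scores.get(lang, 0) + KEYWORD_WEIGHT

    return max(scores, key=scores.get) if scores else "Unknown"


print(detect_language("blink.ino", "int id = 13; void setup() {} void loop() {}"))
print(detect_language("main.c", "int id = 0;"))
```

With these weights, a stray `id` variable in a `.c` or `.ino` file can't flip the result to Objective-C. Content keywords only decide the outcome when the extension is missing or unknown.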